[PATCH OLK-6.6 00/63] Add FUSE Passthrough Support

Amir Goldstein (57):
  fs: rename __mnt_{want,drop}_write*() helpers
  fs: export mnt_{get,put}_write_access() to modules
  fuse: factor out helper fuse_truncate_update_attr()
  fuse: allocate ff->release_args only if release is needed
  fuse: break up fuse_open_common()
  fuse: prepare for failing open response
  fuse: introduce inode io modes
  fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
  ovl: use simpler function to convert iocb to rw flags
  ovl: propagate IOCB_APPEND flag on writes to realfile
  ovl: punt write aio completion to workqueue
  ovl: protect copying of realinode attributes to ovl inode
  fs: get mnt_writers count for an open backing file's real path
  fs: create helper file_user_path() for user displayed mapped file path
  fs: store real path instead of fake path in backing file f_path
  ovl: add helper ovl_file_modified()
  ovl: split ovl_want_write() into two helpers
  ovl: reorder ovl_want_write() after ovl_inode_lock()
  ovl: do not open/llseek lower file with upper sb_writers held
  ovl: do not encode lower fh with upper sb_writers held
  ovl: add permission hooks outside of do_splice_direct()
  splice: remove permission hook from do_splice_direct()
  splice: move permission hook out of splice_direct_to_actor()
  splice: move permission hook out of splice_file_to_pipe()
  splice: remove permission hook from iter_file_splice_write()
  remap_range: move permission hooks out of do_clone_file_range()
  remap_range: move file_start_write() to after permission hook
  btrfs: move file_start_write() to after permission hook
  coda: change locking order in coda_file_write_iter()
  fs: move file_start_write() into vfs_iter_write()
  fs: move permission hook out of do_iter_write()
  fs: move permission hook out of do_iter_read()
  fs: move kiocb_start_write() into vfs_iocb_iter_write()
  fs: create __sb_write_started() helper
  fs: create file_write_started() helper
  fs: create {sb,file}_write_not_started() helpers
  fs: prepare for stackable filesystems backing file helpers
  fs: factor out backing_file_{read,write}_iter() helpers
  fs: factor out backing_file_splice_{read,write}() helpers
  fs: factor out backing_file_mmap() helper
  fuse: factor out helper for FUSE_DEV_IOC_CLONE
  fuse: introduce FUSE_PASSTHROUGH capability
  fuse: implement ioctls to manage backing files
  fuse: prepare for opening file in passthrough mode
  fuse: implement open in passthrough mode
  fuse: implement read/write passthrough
  fuse: implement splice read/write passthrough
  fuse: implement passthrough for mmap
  fuse: fix wrong ff->iomode state changes from parallel dio write
  fuse: fix parallel dio write on file open in passthrough mode
  fuse: verify zero padding in fuse_backing_map
  fuse: respect FOPEN_KEEP_CACHE on opendir
  ovl: fix dentry reference leak after changes to underlying layers
  ovl: relax WARN_ON in ovl_verify_area()
  remap_range: merge do_clone_file_range() into vfs_clone_file_range()
  fs: pass offset and result to backing_file end_write() callback
  fuse: update inode size after extending passthrough write

Bernd Schubert (3):
  fuse: create helper function if DIO write needs exclusive lock
  fuse: add fuse_dio_lock/unlock helper functions
  fuse: disable the combination of passthrough and writeback cache

Ed Tsai (1):
  backing-file: convert to using fops->splice_write

Vegard Nossum (1):
  fs: fix __sb_write_started() kerneldoc formatting

yangyun (1):
  fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set

 fs/fuse/fuse_i.h                  | 151 +++++++++++--
 fs/internal.h                     |  21 +-
 fs/overlayfs/overlayfs.h          |  38 +++-
 include/linux/backing-file.h      |  42 ++++
 include/linux/fs.h                |  89 ++++++--
 include/linux/fsnotify.h          |   3 +-
 include/linux/mount.h             |   4 +-
 include/uapi/linux/fuse.h         |  23 +-
 arch/arc/kernel/troubleshoot.c    |   6 +-
 drivers/block/loop.c              |   2 -
 drivers/target/target_core_file.c |  10 +-
 fs/backing-file.c                 | 339 ++++++++++++++++++++++++++++
 fs/btrfs/ioctl.c                  |  12 +-
 fs/cachefiles/io.c                |   5 +-
 fs/coda/file.c                    |   2 -
 fs/file_table.c                   |  12 +-
 fs/fuse/dev.c                     |  98 +++++---
 fs/fuse/dir.c                     |  49 +++-
 fs/fuse/file.c                    | 362 ++++++++++++++++++++----------
 fs/fuse/inode.c                   |  37 +++
 fs/fuse/iomode.c                  | 276 +++++++++++++++++++++++
 fs/fuse/passthrough.c             | 355 +++++++++++++++++++++++++++++
 fs/inode.c                        |   8 +-
 fs/namespace.c                    |  36 +--
 fs/nfsd/vfs.c                     |   7 +-
 fs/open.c                         |  70 +++---
 fs/overlayfs/copy_up.c            | 168 +++++++++-----
 fs/overlayfs/dir.c                |  60 +++--
 fs/overlayfs/export.c             |   7 +-
 fs/overlayfs/file.c               | 230 ++++---------------
 fs/overlayfs/inode.c              |  57 +++--
 fs/overlayfs/namei.c              |  37 ++-
 fs/overlayfs/super.c              |  47 ++--
 fs/overlayfs/util.c               |  75 ++++++-
 fs/proc/base.c                    |   2 +-
 fs/proc/nommu.c                   |   2 +-
 fs/proc/task_mmu.c                |   4 +-
 fs/proc/task_nommu.c              |   2 +-
 fs/read_write.c                   | 156 ++++++++-----
 fs/remap_range.c                  |  44 ++--
 fs/splice.c                       |  82 ++++---
 fs/super.c                        |   1 +
 kernel/acct.c                     |   4 +-
 kernel/trace/trace_output.c       |   2 +-
 MAINTAINERS                       |   9 +
 fs/Kconfig                        |   4 +
 fs/Makefile                       |   1 +
 fs/fuse/Kconfig                   |  11 +
 fs/fuse/Makefile                  |   2 +
 fs/overlayfs/Kconfig              |   1 +
 50 files changed, 2290 insertions(+), 775 deletions(-)
 create mode 100644 include/linux/backing-file.h
 create mode 100644 fs/backing-file.c
 create mode 100644 fs/fuse/iomode.c
 create mode 100644 fs/fuse/passthrough.c

-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.7-rc1
commit 3e15dcf77b23b8e9b9b7f3c0d4def8fe9c12c534
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Before exporting these helpers to modules, make their names more
meaningful.

The names mnt_{get,put}_write_access*() were chosen because they rhyme
with the inode {get,put}_write_access() helpers, which have a very close
meaning for the inode object.

Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@bra...
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230908132900.2983519-2-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/internal.h         | 12 ++++++------
 include/linux/mount.h |  4 ++--
 fs/inode.c            |  8 ++++----
 fs/namespace.c        | 34 +++++++++++++++++-----------------
 fs/open.c             |  2 +-
 kernel/acct.c         |  4 ++--
 6 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index d64ae03998cc..8260c738980c 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -73,8 +73,8 @@ extern int sb_prepare_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
-extern int __mnt_want_write_file(struct file *);
-extern void __mnt_drop_write_file(struct file *);
+int mnt_get_write_access_file(struct file *file);
+void mnt_put_write_access_file(struct file *file);
 
 extern void dissolve_on_fput(struct vfsmount *);
 extern bool may_mount(void);
@@ -101,7 +101,7 @@ static inline void put_file_access(struct file *file)
 		i_readcount_dec(file->f_inode);
 	} else if (file->f_mode & FMODE_WRITER) {
 		put_write_access(file->f_inode);
-		__mnt_drop_write(file->f_path.mnt);
+		mnt_put_write_access(file->f_path.mnt);
 	}
 }
 
@@ -130,9 +130,9 @@ static inline void sb_start_ro_state_change(struct super_block *sb)
 	 * mnt_is_readonly() making sure if mnt_is_readonly() sees SB_RDONLY
 	 * cleared, it will see s_readonly_remount set.
 	 * For RW->RO transition, the barrier pairs with the barrier in
-	 * __mnt_want_write() before the mnt_is_readonly() check. The barrier
-	 * makes sure if __mnt_want_write() sees MNT_WRITE_HOLD already
-	 * cleared, it will see s_readonly_remount set.
+	 * mnt_get_write_access() before the mnt_is_readonly() check.
+	 * The barrier makes sure if mnt_get_write_access() sees MNT_WRITE_HOLD
+	 * already cleared, it will see s_readonly_remount set.
 	 */
 	smp_wmb();
 }
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 4f40b40306d0..ac3dd2876197 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -92,8 +92,8 @@ extern bool __mnt_is_readonly(struct vfsmount *mnt);
 extern bool mnt_may_suid(struct vfsmount *mnt);
 
 extern struct vfsmount *clone_private_mount(const struct path *path);
-extern int __mnt_want_write(struct vfsmount *);
-extern void __mnt_drop_write(struct vfsmount *);
+int mnt_get_write_access(struct vfsmount *mnt);
+void mnt_put_write_access(struct vfsmount *mnt);
 
 extern struct vfsmount *fc_mount(struct fs_context *fc);
 extern struct vfsmount *vfs_create_mount(struct fs_context *fc);
diff --git a/fs/inode.c b/fs/inode.c
index 46a31aba7933..ad7445342ee9 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2052,7 +2052,7 @@ void touch_atime(const struct path *path)
 	if (!sb_start_write_trylock(inode->i_sb))
 		return;
 
-	if (__mnt_want_write(mnt) != 0)
+	if (mnt_get_write_access(mnt) != 0)
 		goto skip_update;
 	/*
 	 * File systems can error out when updating inodes if they need to
@@ -2064,7 +2064,7 @@ void touch_atime(const struct path *path)
 	 * of the fs read only, e.g. subvolumes in Btrfs.
 	 */
 	inode_update_time(inode, S_ATIME);
-	__mnt_drop_write(mnt);
+	mnt_put_write_access(mnt);
 skip_update:
 	sb_end_write(inode->i_sb);
 }
@@ -2177,9 +2177,9 @@ static int __file_update_time(struct file *file, int sync_mode)
 	struct inode *inode = file_inode(file);
 
 	/* try to update time settings */
-	if (!__mnt_want_write_file(file)) {
+	if (!mnt_get_write_access_file(file)) {
 		ret = inode_update_time(inode, sync_mode);
-		__mnt_drop_write_file(file);
+		mnt_put_write_access_file(file);
 	}
 
 	return ret;
diff --git a/fs/namespace.c b/fs/namespace.c
index 190fbb026ff0..815adb5d1490 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -320,16 +320,16 @@ static int mnt_is_readonly(struct vfsmount *mnt)
  * can determine when writes are able to occur to a filesystem.
  */
 /**
- * __mnt_want_write - get write access to a mount without freeze protection
+ * mnt_get_write_access - get write access to a mount without freeze protection
  * @m: the mount on which to take a write
  *
  * This tells the low-level filesystem that a write is about to be performed to
  * it, and makes sure that writes are allowed (mnt it read-write) before
  * returning success. This operation does not protect against filesystem being
- * frozen. When the write operation is finished, __mnt_drop_write() must be
+ * frozen. When the write operation is finished, mnt_put_write_access() must be
  * called. This is effectively a refcount.
  */
-int __mnt_want_write(struct vfsmount *m)
+int mnt_get_write_access(struct vfsmount *m)
 {
 	struct mount *mnt = real_mount(m);
 	int ret = 0;
@@ -391,7 +391,7 @@ int mnt_want_write(struct vfsmount *m)
 	int ret;
 
 	sb_start_write(m->mnt_sb);
-	ret = __mnt_want_write(m);
+	ret = mnt_get_write_access(m);
 	if (ret)
 		sb_end_write(m->mnt_sb);
 	return ret;
@@ -399,15 +399,15 @@ int mnt_want_write(struct vfsmount *m)
 EXPORT_SYMBOL_GPL(mnt_want_write);
 
 /**
- * __mnt_want_write_file - get write access to a file's mount
+ * mnt_get_write_access_file - get write access to a file's mount
  * @file: the file who's mount on which to take a write
  *
- * This is like __mnt_want_write, but if the file is already open for writing it
+ * This is like mnt_get_write_access, but if @file is already open for write it
  * skips incrementing mnt_writers (since the open file already has a reference)
  * and instead only does the check for emergency r/o remounts. This must be
- * paired with __mnt_drop_write_file.
+ * paired with mnt_put_write_access_file.
  */
-int __mnt_want_write_file(struct file *file)
+int mnt_get_write_access_file(struct file *file)
 {
 	if (file->f_mode & FMODE_WRITER) {
 		/*
@@ -418,7 +418,7 @@ int __mnt_want_write_file(struct file *file)
 			return -EROFS;
 		return 0;
 	}
-	return __mnt_want_write(file->f_path.mnt);
+	return mnt_get_write_access(file->f_path.mnt);
 }
 
 /**
@@ -435,7 +435,7 @@ int mnt_want_write_file(struct file *file)
 	int ret;
 
 	sb_start_write(file_inode(file)->i_sb);
-	ret = __mnt_want_write_file(file);
+	ret = mnt_get_write_access_file(file);
 	if (ret)
 		sb_end_write(file_inode(file)->i_sb);
 	return ret;
@@ -443,14 +443,14 @@ int mnt_want_write_file(struct file *file)
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
 
 /**
- * __mnt_drop_write - give up write access to a mount
+ * mnt_put_write_access - give up write access to a mount
  * @mnt: the mount on which to give up write access
  *
  * Tells the low-level filesystem that we are done
  * performing writes to it. Must be matched with
- * __mnt_want_write() call above.
+ * mnt_get_write_access() call above.
  */
-void __mnt_drop_write(struct vfsmount *mnt)
+void mnt_put_write_access(struct vfsmount *mnt)
 {
 	preempt_disable();
 	mnt_dec_writers(real_mount(mnt));
@@ -467,20 +467,20 @@ void __mnt_drop_write(struct vfsmount *mnt)
  */
 void mnt_drop_write(struct vfsmount *mnt)
 {
-	__mnt_drop_write(mnt);
+	mnt_put_write_access(mnt);
 	sb_end_write(mnt->mnt_sb);
 }
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
-void __mnt_drop_write_file(struct file *file)
+void mnt_put_write_access_file(struct file *file)
 {
 	if (!(file->f_mode & FMODE_WRITER))
-		__mnt_drop_write(file->f_path.mnt);
+		mnt_put_write_access(file->f_path.mnt);
 }
 
 void mnt_drop_write_file(struct file *file)
 {
-	__mnt_drop_write_file(file);
+	mnt_put_write_access_file(file);
 	sb_end_write(file_inode(file)->i_sb);
 }
 EXPORT_SYMBOL(mnt_drop_write_file);
diff --git a/fs/open.c b/fs/open.c
index f9ac703ec1b2..575cc1406709 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -895,7 +895,7 @@ static int do_dentry_open(struct file *f,
 		error = get_write_access(inode);
 		if (unlikely(error))
 			goto cleanup_file;
-		error = __mnt_want_write(f->f_path.mnt);
+		error = mnt_get_write_access(f->f_path.mnt);
 		if (unlikely(error)) {
 			put_write_access(inode);
 			goto cleanup_file;
diff --git a/kernel/acct.c b/kernel/acct.c
index a58a284f1f38..84e151e8a65e 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -269,7 +269,7 @@ static int acct_on(struct filename *pathname)
 		filp_close(file, NULL);
 		return PTR_ERR(internal);
 	}
-	err = __mnt_want_write(internal);
+	err = mnt_get_write_access(internal);
 	if (err) {
 		mntput(internal);
 		kfree(acct);
@@ -294,7 +294,7 @@ static int acct_on(struct filename *pathname)
 	old = xchg(&ns->bacct, &acct->pin);
 	mutex_unlock(&acct->lock);
 	pin_kill(old);
-	__mnt_drop_write(mnt);
+	mnt_put_write_access(mnt);
 	mntput(mnt);
 	return 0;
 }
-- 
2.39.2
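
Note on the patch above: the kerneldoc calls the get/put pair
"effectively a refcount". The userspace sketch below illustrates only
that pairing rule. It is not kernel code; struct toy_mount and the
toy_*() functions are hypothetical names made up for this note, and the
real helpers also deal with per-cpu counters, MNT_WRITE_HOLD and memory
barriers, all of which are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount: only what the sketch needs. */
struct toy_mount {
	bool readonly;	/* models the read-only check */
	int writers;	/* models the writer count */
};

/* Models mnt_get_write_access(): refuse on a r/o mount, else count a writer. */
static int toy_get_write_access(struct toy_mount *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

/* Models mnt_put_write_access(): must pair with a successful get. */
static void toy_put_write_access(struct toy_mount *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/* Models the touch_atime() call site: skip the update if access is refused. */
static bool toy_touch(struct toy_mount *m)
{
	if (toy_get_write_access(m) != 0)
		return false;
	/* ... the timestamp update would happen here ... */
	toy_put_write_access(m);
	return true;
}
```

As in touch_atime(), a failed get must not be followed by a put, so the
writer count only ever changes on the success path.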

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.7-rc1
commit ddf9e2ff67a910acde1d000e76b7e31267599539
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Overlayfs is going to use these helpers to get write access on the upper
mount during the entire copy up, without taking freeze protection on the
upper sb for the entire copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230908132900.2983519-3-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 815adb5d1490..012c3c4461e7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -376,6 +376,7 @@ int mnt_get_write_access(struct vfsmount *m)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(mnt_get_write_access);
 
 /**
  * mnt_want_write - get write access to a mount
@@ -456,6 +457,7 @@ void mnt_put_write_access(struct vfsmount *mnt)
 	mnt_dec_writers(real_mount(mnt));
 	preempt_enable();
 }
+EXPORT_SYMBOL_GPL(mnt_put_write_access);
 
 /**
  * mnt_drop_write - give up write access to a mount
-- 
2.39.2

From: Bernd Schubert <bschubert@ddn.com>

mainline inclusion
from mainline-v6.9-rc1
commit 699cf8246ee4c2c524f18c2e395909d16e7fda1b
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

This makes the code a bit easier to read and makes it easier to add
further conditions under which an exclusive lock is needed.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/fuse/file.c | 63 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fca2be898336..bfdfe5d2c484 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1305,6 +1305,47 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 	return res;
 }
 
+static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
+}
+
+/*
+ * @return true if an exclusive lock for direct IO writes is needed
+ */
+static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	/* Server side has to advise that it supports parallel dio writes. */
+	if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
+		return true;
+
+	/*
+	 * Append will need to know the eventual EOF - always needs an
+	 * exclusive lock.
+	 */
+	if (iocb->ki_flags & IOCB_APPEND)
+		return true;
+
+	/*
+	 * Combination of page access and direct-io is difficult, shared locks
+	 * actually introduce a conflict.
+	 */
+	if (get_fuse_conn(inode)->direct_io_allow_mmap)
+		return true;
+
+	/* Parallel dio beyond EOF is not supported, at least for now. */
+	if (fuse_io_past_eof(iocb, from))
+		return true;
+
+	return false;
+}
+
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
@@ -1581,26 +1622,12 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return res;
 }
 
-static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
-					       struct iov_iter *iter)
-{
-	struct inode *inode = file_inode(iocb->ki_filp);
-
-	return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
-}
-
 static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
-	struct file *file = iocb->ki_filp;
-	struct fuse_file *ff = file->private_data;
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
-	bool exclusive_lock =
-		!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
-		get_fuse_conn(inode)->direct_io_allow_mmap ||
-		iocb->ki_flags & IOCB_APPEND ||
-		fuse_direct_write_extending_i_size(iocb, from);
+	bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
 
 	/*
 	 * Take exclusive lock if
@@ -1614,10 +1641,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	else {
 		inode_lock_shared(inode);
 
-		/* A race with truncate might have come up as the decision for
-		 * the lock type was done without holding the lock, check again.
+		/*
+		 * Previous check was without any lock and might have raced.
 		 */
-		if (fuse_direct_write_extending_i_size(iocb, from)) {
+		if (fuse_io_past_eof(iocb, from)) {
 			inode_unlock_shared(inode);
 			inode_lock(inode);
 			exclusive_lock = true;
-- 
2.39.2
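
Note on the patch above: the four conditions in
fuse_dio_wr_exclusive_lock() can be exercised outside the kernel with a
small model. The sketch below is only an illustration under stated
assumptions: struct toy_dio_write and the toy_*() functions are
hypothetical, the kernel state (open flags, iocb flags, connection
setting, i_size) is flattened into plain fields, and offsets are
unsigned here although ki_pos is a signed loff_t in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the state the kernel helper reads. */
struct toy_dio_write {
	bool parallel_writes;	/* server set FOPEN_PARALLEL_DIRECT_WRITES */
	bool append;		/* IOCB_APPEND is set on the iocb */
	bool allow_mmap;	/* fc->direct_io_allow_mmap */
	uint64_t pos;		/* write offset (ki_pos) */
	uint64_t count;		/* bytes to write */
	uint64_t i_size;	/* current file size */
};

/* Mirrors fuse_io_past_eof(): does the write end beyond the current EOF? */
static bool toy_io_past_eof(const struct toy_dio_write *w)
{
	return w->pos + w->count > w->i_size;
}

/* Mirrors the order of checks in fuse_dio_wr_exclusive_lock(). */
static bool toy_dio_wr_exclusive_lock(const struct toy_dio_write *w)
{
	if (!w->parallel_writes)
		return true;	/* server did not opt in */
	if (w->append)
		return true;	/* append needs to know the eventual EOF */
	if (w->allow_mmap)
		return true;	/* page access combined with direct io */
	if (toy_io_past_eof(w))
		return true;	/* extending writes are not parallel */
	return false;
}
```

A write that ends exactly at i_size is not "past EOF" in this model, so
it is the only shape of write that stays on the shared-lock path when
the other three conditions are clear.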

From: Bernd Schubert <bschubert@ddn.com>

mainline inclusion
from mainline-v6.9-rc1
commit 9bbb6717dfd286a2861ca33273f4d7c3e65423b0
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

So far this is just a helper that moves complex locking logic out of
fuse_direct_write_iter(). It is especially needed by the next patch in
the series, which adds the fuse inode cache IO mode and brings in even
more locking complexity.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/fuse/file.c | 61 ++++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bfdfe5d2c484..0fa83bdea673 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1346,6 +1346,37 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	return false;
 }
 
+static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+			  bool *exclusive)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	*exclusive = fuse_dio_wr_exclusive_lock(iocb, from);
+	if (*exclusive) {
+		inode_lock(inode);
+	} else {
+		inode_lock_shared(inode);
+		/*
+		 * Previous check was without inode lock and might have raced,
+		 * check again.
+		 */
+		if (fuse_io_past_eof(iocb, from)) {
+			inode_unlock_shared(inode);
+			inode_lock(inode);
+			*exclusive = true;
+		}
+	}
+}
+
+static void fuse_dio_unlock(struct inode *inode, bool exclusive)
+{
+	if (exclusive) {
+		inode_unlock(inode);
+	} else {
+		inode_unlock_shared(inode);
+	}
+}
+
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
@@ -1627,30 +1658,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
-	bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
-
-	/*
-	 * Take exclusive lock if
-	 * - Parallel direct writes are disabled - a user space decision
-	 * - Parallel direct writes are enabled and i_size is being extended.
-	 * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP).
-	 *   This might not be needed at all, but needs further investigation.
-	 */
-	if (exclusive_lock)
-		inode_lock(inode);
-	else {
-		inode_lock_shared(inode);
-
-		/*
-		 * Previous check was without any lock and might have raced.
-		 */
-		if (fuse_io_past_eof(iocb, from)) {
-			inode_unlock_shared(inode);
-			inode_lock(inode);
-			exclusive_lock = true;
-		}
-	}
+	bool exclusive;
 
+	fuse_dio_lock(iocb, from, &exclusive);
 	res = generic_write_checks(iocb, from);
 	if (res > 0) {
 		if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
@@ -1661,10 +1671,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			fuse_write_update_attr(inode, iocb->ki_pos, res);
 		}
 	}
-	if (exclusive_lock)
-		inode_unlock(inode);
-	else
-		inode_unlock_shared(inode);
+	fuse_dio_unlock(inode, exclusive);
 
 	return res;
 }
-- 
2.39.2
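
Note on the patch above: fuse_dio_lock() decides on the lock type
without holding the lock, then re-checks the EOF condition once the
shared lock is held and upgrades if the first answer went stale. The
single-threaded sketch below models that "check, lock, re-check"
shape. Everything in it is hypothetical: struct toy_inode replaces the
inode and its rwsem with plain counters, the decision is reduced to the
EOF test, and the stale_size parameter stands in for the i_size value
that the unlocked first check happened to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode with a single-threaded model of its lock. */
struct toy_inode {
	uint64_t i_size;
	int shared;		/* number of shared holders */
	bool exclusive;		/* true while held exclusively */
};

static void toy_lock(struct toy_inode *i)
{
	assert(!i->exclusive && i->shared == 0);
	i->exclusive = true;
}

static void toy_unlock(struct toy_inode *i)
{
	assert(i->exclusive);
	i->exclusive = false;
}

static void toy_lock_shared(struct toy_inode *i)
{
	assert(!i->exclusive);
	i->shared++;
}

static void toy_unlock_shared(struct toy_inode *i)
{
	assert(i->shared > 0);
	i->shared--;
}

static bool toy_past_eof(uint64_t pos, uint64_t count, uint64_t size)
{
	return pos + count > size;
}

/* First check uses the stale size; the re-check reads size under the lock. */
static void toy_dio_lock(struct toy_inode *inode, uint64_t pos, uint64_t count,
			 uint64_t stale_size, bool *exclusive)
{
	*exclusive = toy_past_eof(pos, count, stale_size);
	if (*exclusive) {
		toy_lock(inode);
	} else {
		toy_lock_shared(inode);
		if (toy_past_eof(pos, count, inode->i_size)) {
			/* The unlocked check raced, e.g. with a truncate. */
			toy_unlock_shared(inode);
			toy_lock(inode);
			*exclusive = true;
		}
	}
}

static void toy_dio_unlock(struct toy_inode *inode, bool exclusive)
{
	if (exclusive)
		toy_unlock(inode);
	else
		toy_unlock_shared(inode);
}
```

The out-parameter matters for the same reason as in the patch: the
caller cannot know whether the lock was upgraded, so the unlock side
has to be told which variant is actually held.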

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.9-rc1
commit 0c9d708953d02f74cea05a01cf3e2c8f5a9fbaf4
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

fuse_finish_open() is called from fuse_open_common() and from
fuse_create_open(). In the latter case, the O_TRUNC flag is always
cleared in finish_open() before calling into fuse_finish_open().

Move the bits that update the attribute cache after an O_TRUNC open into
a helper and call this helper from fuse_open_common() directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/fuse/file.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0fa83bdea673..5a9d881d89f3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -204,30 +204,31 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 	else if (ff->open_flags & FOPEN_NONSEEKABLE)
 		nonseekable_open(inode, file);
 
-	if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
-
-		spin_lock(&fi->lock);
-		fi->attr_version = atomic64_inc_return(&fc->attr_version);
-		i_size_write(inode, 0);
-		spin_unlock(&fi->lock);
-		file_update_time(file);
-		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
-	}
 	if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
 		fuse_link_write_file(file);
 }
 
+static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	spin_lock(&fi->lock);
+	fi->attr_version = atomic64_inc_return(&fc->attr_version);
+	i_size_write(inode, 0);
+	spin_unlock(&fi->lock);
+	file_update_time(file);
+	fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
+}
+
 int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	struct fuse_conn *fc = fm->fc;
 	int err;
-	bool is_wb_truncate = (file->f_flags & O_TRUNC) &&
-			  fc->atomic_o_trunc &&
-			  fc->writeback_cache;
-	bool dax_truncate = (file->f_flags & O_TRUNC) &&
-			  fc->atomic_o_trunc && FUSE_IS_DAX(inode);
+	bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc;
+	bool is_wb_truncate = is_truncate && fc->writeback_cache;
+	bool dax_truncate = is_truncate && FUSE_IS_DAX(inode);
 
 	if (fuse_is_bad(inode))
 		return -EIO;
@@ -250,15 +251,18 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 		fuse_set_nowrite(inode);
 
 	err = fuse_do_open(fm, get_node_id(inode), file, isdir);
-	if (!err)
+	if (!err) {
 		fuse_finish_open(inode, file);
+		if (is_truncate)
+			fuse_truncate_update_attr(inode, file);
+	}
 
 	if (is_wb_truncate || dax_truncate)
 		fuse_release_nowrite(inode);
 	if (!err) {
 		struct fuse_file *ff = file->private_data;
 
-		if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
+		if (is_truncate)
 			truncate_pagecache(inode, 0);
 		else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
 			invalidate_inode_pages2(inode->i_mapping);
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.9-rc1
commit e26ee4efbc79610b20e7abe9d96c87f33dacc1ff
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

This removes the need to pass the isdir argument to fuse_file_put().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/fuse/fuse_i.h |  2 +-
 fs/fuse/dir.c    |  2 +-
 fs/fuse/file.c   | 69 +++++++++++++++++++++++++++---------------------
 3 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1bb136bcbe9e..cb3350660d7a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1052,7 +1052,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
  */
 int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
 
-struct fuse_file *fuse_file_alloc(struct fuse_mount *fm);
+struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release);
 void fuse_file_free(struct fuse_file *ff);
 void fuse_finish_open(struct inode *inode, struct file *file);
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 95f9913a3537..b08eb62639dc 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -634,7 +634,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 		goto out_err;
 
 	err = -ENOMEM;
-	ff = fuse_file_alloc(fm);
+	ff = fuse_file_alloc(fm, true);
 	if (!ff)
 		goto out_put_forget_req;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5a9d881d89f3..aecaae0f74cd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -55,7 +55,7 @@ struct fuse_release_args {
 	struct inode *inode;
 };
 
-struct fuse_file *fuse_file_alloc(struct fuse_mount *fm)
+struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release)
 {
 	struct fuse_file *ff;
 
@@ -64,11 +64,13 @@ struct fuse_file *fuse_file_alloc(struct fuse_mount *fm)
 		return NULL;
 
 	ff->fm = fm;
-	ff->release_args = kzalloc(sizeof(*ff->release_args),
-				   GFP_KERNEL_ACCOUNT);
-	if (!ff->release_args) {
-		kfree(ff);
-		return NULL;
+	if (release) {
+		ff->release_args = kzalloc(sizeof(*ff->release_args),
+					   GFP_KERNEL_ACCOUNT);
+		if (!ff->release_args) {
+			kfree(ff);
+			return NULL;
+		}
 	}
 
 	INIT_LIST_HEAD(&ff->write_entry);
@@ -104,14 +106,14 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
 	kfree(ra);
 }
 
-static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
+static void fuse_file_put(struct fuse_file *ff, bool sync)
 {
 	if (refcount_dec_and_test(&ff->count)) {
-		struct fuse_args *args = &ff->release_args->args;
+		struct fuse_release_args *ra = ff->release_args;
+		struct fuse_args *args = (ra ? &ra->args : NULL);
 
-		if (isdir ? ff->fm->fc->no_opendir : ff->fm->fc->no_open) {
-			/* Do nothing when client does not implement 'open' */
-			fuse_release_end(ff->fm, args, 0);
+		if (!args) {
+			/* Do nothing when server does not implement 'open' */
 		} else if (sync) {
 			fuse_simple_request(ff->fm, args);
 			fuse_release_end(ff->fm, args, 0);
@@ -131,15 +133,16 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
+	bool open = isdir ? !fc->no_opendir : !fc->no_open;
 
-	ff = fuse_file_alloc(fm);
+	ff = fuse_file_alloc(fm, open);
 	if (!ff)
 		return ERR_PTR(-ENOMEM);
 
 	ff->fh = 0;
 	/* Default for no-open */
 	ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
-	if (isdir ? !fc->no_opendir : !fc->no_open) {
+	if (open) {
 		struct fuse_open_out outarg;
 		int err;
 
@@ -147,11 +150,13 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 		if (!err) {
 			ff->fh = outarg.fh;
 			ff->open_flags = outarg.open_flags;
-
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
 			return ERR_PTR(err);
 		} else {
+			/* No release needed */
+			kfree(ff->release_args);
+			ff->release_args = NULL;
 			if (isdir)
 				fc->no_opendir = 1;
 			else
@@ -277,7 +282,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 }
 
 static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
-				 unsigned int flags, int opcode)
+				 unsigned int flags, int opcode, bool sync)
 {
 	struct fuse_conn *fc = ff->fm->fc;
 	struct fuse_release_args *ra = ff->release_args;
@@ -295,6 +300,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
 
 	wake_up_interruptible_all(&ff->poll_wait);
 
+	if (!ra)
+		return;
+
 	ra->inarg.fh = ff->fh;
 	ra->inarg.flags = flags;
 	ra->args.in_numargs = 1;
@@ -304,6 +312,13 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
 	ra->args.nodeid = ff->nodeid;
 	ra->args.force = true;
 	ra->args.nocreds = true;
+
+	/*
+	 * Hold inode until release is finished.
+	 * From fuse_sync_release() the refcount is 1 and everything's
+	 * synchronous, so we are fine with not doing igrab() here.
+	 */
+	ra->inode = sync ? NULL : igrab(&fi->inode);
 }
 
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
@@ -313,14 +328,12 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 	struct fuse_release_args *ra = ff->release_args;
 	int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;
 
-	fuse_prepare_release(fi, ff, open_flags, opcode);
+	fuse_prepare_release(fi, ff, open_flags, opcode, false);
 
-	if (ff->flock) {
+	if (ra && ff->flock) {
 		ra->inarg.release_flags |= FUSE_RELEASE_FLOCK_UNLOCK;
 		ra->inarg.lock_owner = fuse_lock_owner_id(ff->fm->fc, id);
 	}
-	/* Hold inode until release is finished */
-	ra->inode = igrab(inode);
 
 	/*
 	 * Normally this will send the RELEASE request, however if
@@ -331,7 +344,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 	 * synchronous RELEASE is allowed (and desirable) in this case
 	 * because the server can be trusted not to screw up.
 	 */
-	fuse_file_put(ff, ff->fm->fc->destroy, isdir);
+	fuse_file_put(ff, ff->fm->fc->destroy);
 }
 
 void fuse_release_common(struct file *file, bool isdir)
@@ -366,12 +379,8 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
 		       unsigned int flags)
 {
 	WARN_ON(refcount_read(&ff->count) > 1);
-	fuse_prepare_release(fi, ff, flags, FUSE_RELEASE);
-	/*
-	 * iput(NULL) is a no-op and since the refcount is 1 and everything's
-	 * synchronous, we are fine with not doing igrab() here"
-	 */
-	fuse_file_put(ff, true, false);
+	fuse_prepare_release(fi, ff, flags, FUSE_RELEASE, true);
+	fuse_file_put(ff, true);
 }
 EXPORT_SYMBOL_GPL(fuse_sync_release);
 
@@ -935,7 +944,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 		put_page(page);
 	}
 	if (ia->ff)
-		fuse_file_put(ia->ff, false, false);
+		fuse_file_put(ia->ff, false);
 
 	fuse_io_free(ia);
 }
@@ -1728,7 +1737,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 		__free_page(ap->pages[i]);
 
 	if (wpa->ia.ff)
-		fuse_file_put(wpa->ia.ff, false, false);
+		fuse_file_put(wpa->ia.ff, false);
 
 	kfree(ap->pages);
 	kfree(wpa);
@@ -1976,7 +1985,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 	ff = __fuse_write_file_get(fi);
 	err = fuse_flush_times(inode, ff);
 	if (ff)
-		fuse_file_put(ff, false, false);
+		fuse_file_put(ff, false);
 
 	return err;
 }
@@ -2374,7 +2383,7 @@ static int fuse_writepages(struct address_space *mapping,
 		fuse_writepages_send(&data);
 	}
 	if (data.ff)
-		fuse_file_put(data.ff, false, false);
+		fuse_file_put(data.ff, false);
 
 	kfree(data.orig_pages);
 out:
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 7de64d521bf92396b7da8ae0600188ea5d75a4c9 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- fuse_open_common() has a lot of code relevant only for regular files and O_TRUNC in particular. Copy the little bit of remaining code into fuse_dir_open() and stop using this common helper for directory open. Also split out fuse_dir_finish_open() from fuse_finish_open() before we add inode io modes to fuse_finish_open(). Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 5 ----- fs/fuse/dir.c | 25 ++++++++++++++++++++++++- fs/fuse/file.c | 9 ++------- 3 files changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index cb3350660d7a..639fb0aadbd9 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1047,11 +1047,6 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos, size_t count, int opcode); -/** - * Send OPEN or OPENDIR request - */ -int fuse_open_common(struct inode *inode, struct file *file, bool isdir); - struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release); void fuse_file_free(struct fuse_file *ff); void fuse_finish_open(struct inode *inode, struct file *file); diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index b08eb62639dc..5244bc200b74 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1635,7 +1635,30 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode, static int fuse_dir_open(struct inode *inode, struct file *file) { - return fuse_open_common(inode, file, true); + struct fuse_mount *fm = get_fuse_mount(inode); + int err; + + if 
(fuse_is_bad(inode)) + return -EIO; + + err = generic_file_open(inode, file); + if (err) + return err; + + err = fuse_do_open(fm, get_node_id(inode), file, true); + if (!err) { + struct fuse_file *ff = file->private_data; + + /* + * Keep handling FOPEN_STREAM and FOPEN_NONSEEKABLE for + * directories for backward compatibility, though it's unlikely + * to be useful. + */ + if (ff->open_flags & (FOPEN_STREAM | FOPEN_NONSEEKABLE)) + nonseekable_open(inode, file); + } + + return err; } static int fuse_dir_release(struct inode *inode, struct file *file) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index aecaae0f74cd..bf8c6a55c0b6 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -226,7 +226,7 @@ static void fuse_truncate_update_attr(struct inode *inode, struct file *file) fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE); } -int fuse_open_common(struct inode *inode, struct file *file, bool isdir) +static int fuse_open(struct inode *inode, struct file *file) { struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_conn *fc = fm->fc; @@ -255,7 +255,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) if (is_wb_truncate || dax_truncate) fuse_set_nowrite(inode); - err = fuse_do_open(fm, get_node_id(inode), file, isdir); + err = fuse_do_open(fm, get_node_id(inode), file, false); if (!err) { fuse_finish_open(inode, file); if (is_truncate) @@ -353,11 +353,6 @@ void fuse_release_common(struct file *file, bool isdir) (fl_owner_t) file, isdir); } -static int fuse_open(struct inode *inode, struct file *file) -{ - return fuse_open_common(inode, file, false); -} - static int fuse_release(struct inode *inode, struct file *file) { struct fuse_conn *fc = get_fuse_conn(inode); -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit d2c487f150ae00e3cb9faf57aceacc584e0a130c category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In preparation for inode io modes, a server open response could fail due to conflicting inode io modes. Allow returning an error from fuse_finish_open() and handle the error in the callers. fuse_finish_open() is used as the callback of finish_open(), so that FMODE_OPENED will not be set if fuse_finish_open() fails. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 2 +- fs/fuse/dir.c | 8 +++++--- fs/fuse/file.c | 15 ++++++++++----- 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 639fb0aadbd9..e8389d8a53ad 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1049,7 +1049,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos, struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release); void fuse_file_free(struct fuse_file *ff); -void fuse_finish_open(struct inode *inode, struct file *file); +int fuse_finish_open(struct inode *inode, struct file *file); void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff, unsigned int flags); diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 5244bc200b74..ca865e7c4b55 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -696,13 +696,15 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, d_instantiate(entry, inode); fuse_change_entry_timeout(entry, &outentry); fuse_dir_changed(dir); - err = finish_open(file, entry, generic_file_open); + err = generic_file_open(inode, file); + if (!err) { + file->private_data = ff; + err = finish_open(file, entry, 
fuse_finish_open); + } if (err) { fi = get_fuse_inode(inode); fuse_sync_release(fi, ff, flags); } else { - file->private_data = ff; - fuse_finish_open(inode, file); if (fm->fc->atomic_o_trunc && trunc) truncate_pagecache(inode, 0); else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index bf8c6a55c0b6..bd05637c9de3 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -199,7 +199,7 @@ static void fuse_link_write_file(struct file *file) spin_unlock(&fi->lock); } -void fuse_finish_open(struct inode *inode, struct file *file) +int fuse_finish_open(struct inode *inode, struct file *file) { struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); @@ -211,6 +211,8 @@ void fuse_finish_open(struct inode *inode, struct file *file) if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache) fuse_link_write_file(file); + + return 0; } static void fuse_truncate_update_attr(struct inode *inode, struct file *file) @@ -229,7 +231,9 @@ static void fuse_truncate_update_attr(struct inode *inode, struct file *file) static int fuse_open(struct inode *inode, struct file *file) { struct fuse_mount *fm = get_fuse_mount(inode); + struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_conn *fc = fm->fc; + struct fuse_file *ff; int err; bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc; bool is_wb_truncate = is_truncate && fc->writeback_cache; @@ -257,16 +261,17 @@ static int fuse_open(struct inode *inode, struct file *file) err = fuse_do_open(fm, get_node_id(inode), file, false); if (!err) { - fuse_finish_open(inode, file); - if (is_truncate) + ff = file->private_data; + err = fuse_finish_open(inode, file); + if (err) + fuse_sync_release(fi, ff, file->f_flags); + else if (is_truncate) fuse_truncate_update_attr(inode, file); } if (is_wb_truncate || dax_truncate) fuse_release_nowrite(inode); if (!err) { - struct fuse_file *ff = file->private_data; - if (is_truncate) truncate_pagecache(inode, 0); else if 
(!(ff->open_flags & FOPEN_KEEP_CACHE)) -- 2.39.2
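
The commit message above relies on the open callback running before the file is marked as opened. A minimal userspace sketch of that ordering follows, for illustration only. All demo_* names and DEMO_OPENED are hypothetical stand-ins, not kernel API; the sketch only assumes that the "opened" flag is set after the callback returns 0.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_OPENED 0x1	/* hypothetical stand-in for FMODE_OPENED */

struct demo_file {
	unsigned int mode;
	int io_mode_conflict;	/* pretend the server asked for a conflicting io mode */
};

typedef int (*demo_open_cb)(struct demo_file *f);

/* Mark the file opened only after the filesystem callback succeeds. */
static int demo_finish_open(struct demo_file *f, demo_open_cb cb)
{
	int err = cb(f);

	if (err)
		return err;	/* flag stays clear, caller releases the file */
	f->mode |= DEMO_OPENED;
	return 0;
}

/* Stand-in for a filesystem callback that may now fail. */
static int demo_fs_finish_open(struct demo_file *f)
{
	return f->io_mode_conflict ? -EIO : 0;
}
```

With this shape, a caller that sees an error can undo the open without also having to clear an "opened" marker.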

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit cb098dd24bab8a315aa00bab1ccddb6be872156d category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The fuse inode io mode is determined by the mode of its open files/mmaps and parallel dio opens and expressed in the value of fi->iocachectr:
 > 0 - caching io: files open in caching mode or mmap on direct_io file
 < 0 - parallel dio: direct io mode with parallel dio writes enabled
== 0 - direct io: no files open in caching mode and no files mmapped
Note that iocachectr value of 0 might become positive or negative, while non-parallel dio is getting processed. direct_io mmap uses page cache, so first mmap will mark the file as ff->io_opened and increment fi->iocachectr to enter the caching io mode. If the server opens the file in caching mode while it is already open for parallel dio or vice versa the open fails. This allows executing parallel dio when inode is not in caching mode and no mmaps have been performed on the inode in question. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 17 ++++- fs/fuse/file.c | 15 +++++ fs/fuse/iomode.c | 158 +++++++++++++++++++++++++++++++++++++++++++++++ fs/fuse/Makefile | 1 + 4 files changed, 189 insertions(+), 2 deletions(-) create mode 100644 fs/fuse/iomode.c diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e8389d8a53ad..66cc5deece50 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -112,7 +112,7 @@ struct fuse_inode { u64 attr_version; union { - /* Write related fields (regular file only) */ + /* read/write io cache (regular file only) */ struct { /* Files usable in writepage. Protected by fi->lock */ struct list_head write_files; @@ -124,6 +124,9 @@ struct fuse_inode { * (FUSE_NOWRITE) means more writes are blocked */ int writectr; + /** Number of files/maps using page cache */ + int iocachectr; + /* Waitq for writepage completion */ wait_queue_head_t page_waitq; @@ -190,6 +193,8 @@ enum { FUSE_I_BAD, /* Has btime */ FUSE_I_BTIME, + /* Wants or already has page cache IO */ + FUSE_I_CACHE_IO_MODE, }; struct fuse_conn; @@ -247,6 +252,9 @@ struct fuse_file { /** Wait queue head for poll */ wait_queue_head_t poll_wait; + /** Does file hold a fi->iocachectr refcount? */ + enum { IOM_NONE, IOM_CACHED, IOM_UNCACHED } iomode; + /** Has flock been performed on this file? 
*/ bool flock:1; }; @@ -1359,8 +1367,13 @@ int fuse_fileattr_get(struct dentry *dentry, struct fileattr *fa); int fuse_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); -/* file.c */ +/* iomode.c */ +int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff); +int fuse_file_io_open(struct file *file, struct inode *inode); +void fuse_file_io_release(struct fuse_file *ff, struct inode *inode); + +/* file.c */ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid, unsigned int open_flags, bool isdir); void fuse_file_release(struct inode *inode, struct fuse_file *ff, diff --git a/fs/fuse/file.c b/fs/fuse/file.c index bd05637c9de3..84c960f1abcd 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -112,6 +112,9 @@ static void fuse_file_put(struct fuse_file *ff, bool sync) struct fuse_release_args *ra = ff->release_args; struct fuse_args *args = (ra ? &ra->args : NULL); + if (ra && ra->inode) + fuse_file_io_release(ff, ra->inode); + if (!args) { /* Do nothing when server does not implement 'open' */ } else if (sync) { @@ -203,6 +206,11 @@ int fuse_finish_open(struct inode *inode, struct file *file) { struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); + int err; + + err = fuse_file_io_open(file, inode); + if (err) + return err; if (ff->open_flags & FOPEN_STREAM) stream_open(inode, file); @@ -2538,6 +2546,7 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) { struct fuse_file *ff = file->private_data; struct fuse_conn *fc = ff->fm->fc; + int rc; /* DAX mmap is superior to direct_io mmap */ if (FUSE_IS_DAX(file_inode(file))) @@ -2557,6 +2566,11 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) /* MAP_PRIVATE */ return generic_file_mmap(file, vma); } + + /* First mmap of direct_io file enters caching inode io mode. 
*/ + rc = fuse_file_cached_io_start(file_inode(file), ff); + if (rc) + return rc; } if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE)) @@ -3325,6 +3339,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags) INIT_LIST_HEAD(&fi->write_files); INIT_LIST_HEAD(&fi->queued_writes); fi->writectr = 0; + fi->iocachectr = 0; init_waitqueue_head(&fi->page_waitq); fi->writepages = RB_ROOT; diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c new file mode 100644 index 000000000000..a1a836b2aacc --- /dev/null +++ b/fs/fuse/iomode.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * FUSE inode io modes. + * + * Copyright (c) 2024 CTERA Networks. + */ + +#include "fuse_i.h" + +#include <linux/kernel.h> +#include <linux/sched.h> +#include <linux/file.h> +#include <linux/fs.h> + +/* + * Start cached io mode, where parallel dio writes are not allowed. + */ +int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + int err = 0; + + /* There are no io modes if server does not implement open */ + if (!ff->release_args) + return 0; + + spin_lock(&fi->lock); + if (fi->iocachectr < 0) { + err = -ETXTBSY; + goto unlock; + } + WARN_ON(ff->iomode == IOM_UNCACHED); + if (ff->iomode == IOM_NONE) { + ff->iomode = IOM_CACHED; + if (fi->iocachectr == 0) + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state); + fi->iocachectr++; + } +unlock: + spin_unlock(&fi->lock); + return err; +} + +static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + + spin_lock(&fi->lock); + WARN_ON(fi->iocachectr <= 0); + WARN_ON(ff->iomode != IOM_CACHED); + ff->iomode = IOM_NONE; + fi->iocachectr--; + if (fi->iocachectr == 0) + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state); + spin_unlock(&fi->lock); +} + +/* Start strictly uncached io mode where cache access is not allowed */ +static int fuse_file_uncached_io_start(struct inode *inode, struct 
fuse_file *ff) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + int err = 0; + + spin_lock(&fi->lock); + if (fi->iocachectr > 0) { + err = -ETXTBSY; + goto unlock; + } + WARN_ON(ff->iomode != IOM_NONE); + fi->iocachectr--; + ff->iomode = IOM_UNCACHED; +unlock: + spin_unlock(&fi->lock); + return err; +} + +static void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + + spin_lock(&fi->lock); + WARN_ON(fi->iocachectr >= 0); + WARN_ON(ff->iomode != IOM_UNCACHED); + ff->iomode = IOM_NONE; + fi->iocachectr++; + spin_unlock(&fi->lock); +} + +/* Request access to submit new io to inode via open file */ +int fuse_file_io_open(struct file *file, struct inode *inode) +{ + struct fuse_file *ff = file->private_data; + int err; + + /* + * io modes are not relevant with DAX and with server that does not + * implement open. + */ + if (FUSE_IS_DAX(inode) || !ff->release_args) + return 0; + + /* + * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO. + */ + if (!(ff->open_flags & FOPEN_DIRECT_IO)) + ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES; + + /* + * First parallel dio open denies caching inode io mode. + * First caching file open enters caching inode io mode. + * + * Note that if user opens a file open with O_DIRECT, but server did + * not specify FOPEN_DIRECT_IO, a later fcntl() could remove O_DIRECT, + * so we put the inode in caching mode to prevent parallel dio. + */ + if (ff->open_flags & FOPEN_DIRECT_IO) { + if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) + err = fuse_file_uncached_io_start(inode, ff); + else + return 0; + } else { + err = fuse_file_cached_io_start(inode, ff); + } + if (err) + goto fail; + + return 0; + +fail: + pr_debug("failed to open file in requested io mode (open_flags=0x%x, err=%i).\n", + ff->open_flags, err); + /* + * The file open mode determines the inode io mode. 
+ * Using incorrect open mode is a server mistake, which results in + * user visible failure of open() with EIO error. + */ + return -EIO; +} + +/* No more pending io and no new io possible to inode via open/mmapped file */ +void fuse_file_io_release(struct fuse_file *ff, struct inode *inode) +{ + /* + * Last parallel dio close allows caching inode io mode. + * Last caching file close exits caching inode io mode. + */ + switch (ff->iomode) { + case IOM_NONE: + /* Nothing to do */ + break; + case IOM_UNCACHED: + fuse_file_uncached_io_end(inode, ff); + break; + case IOM_CACHED: + fuse_file_cached_io_end(inode, ff); + break; + } +} diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index 0c48b35c058d..b734cc2a5e65 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o obj-$(CONFIG_VIRTIO_FS) += virtiofs.o fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o +fuse-y += iomode.o fuse-$(CONFIG_FUSE_DAX) += dax.o virtiofs-y := virtio_fs.o -- 2.39.2
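
The signed counter described in this patch can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel code: struct io_mode and the io_* helpers are hypothetical names, there is no locking (the patch takes fi->lock), and the per-file iomode tracking is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model of fi->iocachectr; not kernel code. */
struct io_mode {
	int iocachectr;	/* > 0: caching io, < 0: parallel dio, 0: direct io */
};

/* Caching open or first mmap: refused while parallel dio owns the inode. */
static int io_cached_start(struct io_mode *m)
{
	if (m->iocachectr < 0)
		return -ETXTBSY;
	m->iocachectr++;
	return 0;
}

static void io_cached_end(struct io_mode *m)
{
	assert(m->iocachectr > 0);
	m->iocachectr--;
}

/* Parallel dio open: refused while any file uses the page cache. */
static int io_uncached_start(struct io_mode *m)
{
	if (m->iocachectr > 0)
		return -ETXTBSY;
	m->iocachectr--;
	return 0;
}

static void io_uncached_end(struct io_mode *m)
{
	assert(m->iocachectr < 0);
	m->iocachectr++;
}
```

The sign of the counter is the mode, so one integer is enough to make the two modes mutually exclusive while still counting users of whichever mode is active.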

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 205c1d8026835746d8597e1aa70c370e014e83fa category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Instead of denying caching mode on parallel dio open, deny caching open only while parallel dio are in-progress and wait for in-progress parallel dio writes before entering inode caching io mode. This allows executing parallel dio when inode is not in caching mode even if shared mmap is allowed, but no mmaps have been performed on the inode in question. An mmap on direct_io file now waits for all in-progress parallel dio writes to complete, so parallel dio writes together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 5 +++++ fs/fuse/file.c | 41 +++++++++++++++++++++++++++++------------ fs/fuse/iomode.c | 48 ++++++++++++++++++++++++++++++------------------ 3 files changed, 64 insertions(+), 30 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 66cc5deece50..5cd7bd9b9c29 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -130,6 +130,9 @@ struct fuse_inode { /* Waitq for writepage completion */ wait_queue_head_t page_waitq; + /* waitq for direct-io completion */ + wait_queue_head_t direct_io_waitq; + /* List of writepage requestst (pending or sent) */ struct rb_root writepages; }; @@ -1369,6 +1372,8 @@ int fuse_fileattr_set(struct mnt_idmap *idmap, /* iomode.c */ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff); +int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff); +void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file 
*ff); int fuse_file_io_open(struct file *file, struct inode *inode); void fuse_file_io_release(struct fuse_file *ff, struct inode *inode); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 84c960f1abcd..3a1751d7f78e 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1341,6 +1341,7 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from struct file *file = iocb->ki_filp; struct fuse_file *ff = file->private_data; struct inode *inode = file_inode(iocb->ki_filp); + struct fuse_inode *fi = get_fuse_inode(inode); /* Server side has to advise that it supports parallel dio writes. */ if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)) @@ -1353,12 +1354,9 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from if (iocb->ki_flags & IOCB_APPEND) return true; - /* - * Combination of page access and direct-io is difficult, shared locks - * actually introduce a conflict. - */ - if (get_fuse_conn(inode)->direct_io_allow_mmap) - return true; + /* shared locks are not allowed with parallel page cache IO */ + if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state)) + return false; /* Parallel dio beyond EOF is not supported, at least for now. */ if (fuse_io_past_eof(iocb, from)) @@ -1371,6 +1369,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, bool *exclusive) { struct inode *inode = file_inode(iocb->ki_filp); + struct fuse_file *ff = iocb->ki_filp->private_data; *exclusive = fuse_dio_wr_exclusive_lock(iocb, from); if (*exclusive) { @@ -1378,10 +1377,14 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, } else { inode_lock_shared(inode); /* - * Previous check was without inode lock and might have raced, - * check again. + * New parallal dio allowed only if inode is not in caching + * mode and denies new opens in caching mode. This check + * should be performed only after taking shared inode lock. 
+ * Previous past eof check was without inode lock and might + * have raced, so check it again. */ - if (fuse_io_past_eof(iocb, from)) { + if (fuse_io_past_eof(iocb, from) || + fuse_file_uncached_io_start(inode, ff) != 0) { inode_unlock_shared(inode); inode_lock(inode); *exclusive = true; @@ -1389,11 +1392,16 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, } } -static void fuse_dio_unlock(struct inode *inode, bool exclusive) +static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive) { + struct inode *inode = file_inode(iocb->ki_filp); + struct fuse_file *ff = iocb->ki_filp->private_data; + if (exclusive) { inode_unlock(inode); } else { + /* Allow opens in caching mode after last parallel dio end */ + fuse_file_uncached_io_end(inode, ff); inode_unlock_shared(inode); } } @@ -1692,7 +1700,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) fuse_write_update_attr(inode, iocb->ki_pos, res); } } - fuse_dio_unlock(inode, exclusive); + fuse_dio_unlock(iocb, exclusive); return res; } @@ -2552,6 +2560,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) if (FUSE_IS_DAX(file_inode(file))) return fuse_dax_mmap(file, vma); + /* + * FOPEN_DIRECT_IO handling is special compared to O_DIRECT, + * as does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP. + */ if (ff->open_flags & FOPEN_DIRECT_IO) { /* * Can't provide the coherency needed for MAP_SHARED @@ -2567,7 +2579,11 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } - /* First mmap of direct_io file enters caching inode io mode. */ + /* + * First mmap of direct_io file enters caching inode io mode. + * Also waits for parallel dio writers to go into serial mode + * (exclusive instead of shared lock). 
+ */ rc = fuse_file_cached_io_start(file_inode(file), ff); if (rc) return rc; @@ -3341,6 +3357,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags) fi->writectr = 0; fi->iocachectr = 0; init_waitqueue_head(&fi->page_waitq); + init_waitqueue_head(&fi->direct_io_waitq); fi->writepages = RB_ROOT; if (IS_ENABLED(CONFIG_FUSE_DAX)) diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c index a1a836b2aacc..ea47c76b9df1 100644 --- a/fs/fuse/iomode.c +++ b/fs/fuse/iomode.c @@ -13,21 +13,37 @@ #include <linux/fs.h> /* - * Start cached io mode, where parallel dio writes are not allowed. + * Return true if need to wait for new opens in caching mode. + */ +static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi) +{ + return READ_ONCE(fi->iocachectr) < 0; +} + +/* + * Start cached io mode. + * + * Blocks new parallel dio writes and waits for the in-progress parallel dio + * writes to complete. */ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) { struct fuse_inode *fi = get_fuse_inode(inode); - int err = 0; /* There are no io modes if server does not implement open */ if (!ff->release_args) return 0; spin_lock(&fi->lock); - if (fi->iocachectr < 0) { - err = -ETXTBSY; - goto unlock; + /* + * Setting the bit advises new direct-io writes to use an exclusive + * lock - without it the wait below might be forever. 
+ */ + while (fuse_is_io_cache_wait(fi)) { + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state); + spin_unlock(&fi->lock); + wait_event(fi->direct_io_waitq, !fuse_is_io_cache_wait(fi)); + spin_lock(&fi->lock); } WARN_ON(ff->iomode == IOM_UNCACHED); if (ff->iomode == IOM_NONE) { @@ -36,9 +52,8 @@ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) set_bit(FUSE_I_CACHE_IO_MODE, &fi->state); fi->iocachectr++; } -unlock: spin_unlock(&fi->lock); - return err; + return 0; } static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) @@ -56,7 +71,7 @@ static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) } /* Start strictly uncached io mode where cache access is not allowed */ -static int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff) +int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff) { struct fuse_inode *fi = get_fuse_inode(inode); int err = 0; @@ -74,7 +89,7 @@ static int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff return err; } -static void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) +void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -83,6 +98,8 @@ static void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) WARN_ON(ff->iomode != IOM_UNCACHED); ff->iomode = IOM_NONE; fi->iocachectr++; + if (!fi->iocachectr) + wake_up(&fi->direct_io_waitq); spin_unlock(&fi->lock); } @@ -106,21 +123,16 @@ int fuse_file_io_open(struct file *file, struct inode *inode) ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES; /* - * First parallel dio open denies caching inode io mode. * First caching file open enters caching inode io mode. * * Note that if user opens a file open with O_DIRECT, but server did * not specify FOPEN_DIRECT_IO, a later fcntl() could remove O_DIRECT, * so we put the inode in caching mode to prevent parallel dio. 
*/ - if (ff->open_flags & FOPEN_DIRECT_IO) { - if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) - err = fuse_file_uncached_io_start(inode, ff); - else - return 0; - } else { - err = fuse_file_cached_io_start(inode, ff); - } + if (ff->open_flags & FOPEN_DIRECT_IO) + return 0; + + err = fuse_file_cached_io_start(inode, ff); if (err) goto fail; -- 2.39.2
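
The "flag first, then wait" scheme from this patch can be sketched in userspace with a mutex and a condition variable. This is only an approximation: struct io_gate and the gate_* helpers are hypothetical, the condition variable stands in for fi->direct_io_waitq, and the flag test that the patch performs in fuse_dio_wr_exclusive_lock() is folded into gate_uncached_start() for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct io_gate {
	pthread_mutex_t lock;
	pthread_cond_t dio_done;	/* stands in for fi->direct_io_waitq */
	int iocachectr;			/* < 0 while parallel dio writes run */
	bool cache_io_mode;		/* stands in for FUSE_I_CACHE_IO_MODE */
};

#define IO_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Enter caching mode: raise the flag so no new shared writer starts, then wait. */
static void gate_cached_start(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->iocachectr < 0) {
		g->cache_io_mode = true;
		pthread_cond_wait(&g->dio_done, &g->lock);
	}
	g->cache_io_mode = true;
	g->iocachectr++;
	pthread_mutex_unlock(&g->lock);
}

static void gate_cached_end(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->iocachectr > 0);
	if (--g->iocachectr == 0)
		g->cache_io_mode = false;
	pthread_mutex_unlock(&g->lock);
}

/* A parallel dio write asks for shared mode; an error means "go exclusive". */
static int gate_uncached_start(struct io_gate *g)
{
	int err = 0;

	pthread_mutex_lock(&g->lock);
	if (g->cache_io_mode || g->iocachectr > 0)
		err = -ETXTBSY;
	else
		g->iocachectr--;
	pthread_mutex_unlock(&g->lock);
	return err;
}

/* The last in-flight parallel dio write wakes any waiting cached opener. */
static void gate_uncached_end(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->iocachectr < 0);
	if (++g->iocachectr == 0)
		pthread_cond_broadcast(&g->dio_done);
	pthread_mutex_unlock(&g->lock);
}
```

Raising the flag before sleeping is what bounds the wait: writers that arrive later fall back to the exclusive path instead of pushing the counter further below zero.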

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit db5b5e83eee46ec5e3d685282c9e4f38946cb0ea category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Overlayfs implements its own function to translate iocb flags into rw flags, so that they can be passed into another vfs call. With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags") Jens created a 1:1 matching between the iocb flags and rw flags, simplifying the conversion. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/file.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 8be4dc050d1e..1f4dcf3d8540 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -260,20 +260,12 @@ static void ovl_file_accessed(struct file *file) touch_atime(&file->f_path); } -static rwf_t ovl_iocb_to_rwf(int ifl) +#define OVL_IOCB_MASK \ + (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC) + +static rwf_t iocb_to_rw_flags(int flags) { - rwf_t flags = 0; - - if (ifl & IOCB_NOWAIT) - flags |= RWF_NOWAIT; - if (ifl & IOCB_HIPRI) - flags |= RWF_HIPRI; - if (ifl & IOCB_DSYNC) - flags |= RWF_DSYNC; - if (ifl & IOCB_SYNC) - flags |= RWF_SYNC; - - return flags; + return (__force rwf_t)(flags & OVL_IOCB_MASK); } static inline void ovl_aio_put(struct ovl_aio_req *aio_req) @@ -331,8 +323,9 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) old_cred = ovl_override_creds(file_inode(file)->i_sb); if (is_sync_kiocb(iocb)) { - ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, - ovl_iocb_to_rwf(iocb->ki_flags)); + rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags); + + ret = vfs_iter_read(real.file, iter, 
&iocb->ki_pos, rwf); } else { struct ovl_aio_req *aio_req; @@ -398,9 +391,10 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) old_cred = ovl_override_creds(file_inode(file)->i_sb); if (is_sync_kiocb(iocb)) { + rwf_t rwf = iocb_to_rw_flags(ifl); + file_start_write(real.file); - ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, - ovl_iocb_to_rwf(ifl)); + ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf); file_end_write(real.file); /* Update size */ ovl_copyattr(inode); -- 2.39.2
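
Once the two flag sets share bit positions, the per-flag translation collapses into a single mask, which is the point of this patch. A minimal userspace sketch follows. The DEMO_* constants are illustrative stand-ins with assumed values (the real definitions live in the kernel headers), and DEMO_IOCB_DIRECT represents any internal-only bit that must not leak into the rw flags.

```c
#include <assert.h>

/* Illustrative stand-ins; the real values live in the kernel headers. */
enum {
	DEMO_RWF_HIPRI  = 0x01,
	DEMO_RWF_DSYNC  = 0x02,
	DEMO_RWF_SYNC   = 0x04,
	DEMO_RWF_NOWAIT = 0x08,
};

/* iocb flags aligned 1:1 with the rw flags, plus one internal-only bit. */
enum {
	DEMO_IOCB_HIPRI  = DEMO_RWF_HIPRI,
	DEMO_IOCB_DSYNC  = DEMO_RWF_DSYNC,
	DEMO_IOCB_SYNC   = DEMO_RWF_SYNC,
	DEMO_IOCB_NOWAIT = DEMO_RWF_NOWAIT,
	DEMO_IOCB_DIRECT = 1 << 17,
};

#define DEMO_IOCB_MASK \
	(DEMO_IOCB_NOWAIT | DEMO_IOCB_HIPRI | DEMO_IOCB_DSYNC | DEMO_IOCB_SYNC)

/* Old style: translate one flag at a time. */
static int demo_iocb_to_rwf_bitwise(int ifl)
{
	int flags = 0;

	if (ifl & DEMO_IOCB_NOWAIT)
		flags |= DEMO_RWF_NOWAIT;
	if (ifl & DEMO_IOCB_HIPRI)
		flags |= DEMO_RWF_HIPRI;
	if (ifl & DEMO_IOCB_DSYNC)
		flags |= DEMO_RWF_DSYNC;
	if (ifl & DEMO_IOCB_SYNC)
		flags |= DEMO_RWF_SYNC;
	return flags;
}

/* New style: identical bit positions, so masking is the whole conversion. */
static int demo_iocb_to_rwf(int ifl)
{
	return ifl & DEMO_IOCB_MASK;
}
```

Widening the mask by one bit is then all it takes to forward an additional flag.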

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.7-rc1
commit 5f034d34737e8c440bbbd13e5ef283793d841140
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

If ovl file is opened O_APPEND, the underlying realfile is also opened
O_APPEND, so it makes sense to propagate the IOCB_APPEND flags on sync
writes to realfile, just as we do with aio writes.

Effectively, because sync ovl writes are protected by inode lock,
this change only makes a difference if the realfile is written to (size
extending writes) from underneath overlayfs. The behavior in this case
is undefined, so it is ok if we change the behavior (to fail the ovl
IOCB_APPEND write).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/overlayfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 1f4dcf3d8540..f44017d79bce 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -261,7 +261,7 @@ static void ovl_file_accessed(struct file *file)
 }
 
 #define OVL_IOCB_MASK \
-	(IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC)
+	(IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
 
 static rwf_t iocb_to_rw_flags(int flags)
 {
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit 389a4a4a19851211bb9c40d31c664591fb206f69 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- We want to protect concurrent updates of ovl inode size and mtime (i.e. ovl_copyattr()) from aio completion context. Punt write aio completion to a workqueue so that we can protect ovl_copyattr() with a spinlock. Export sb_init_dio_done_wq(), so that overlayfs can use its own dio workqueue to punt aio completions. Suggested-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/8620dfd3-372d-4ae0-aa3f-2fe97dda1bca@kernel.dk/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/file.c | 42 +++++++++++++++++++++++++++++++++++++++++- fs/super.c | 1 + 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index f44017d79bce..09997340bbd3 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -15,10 +15,15 @@ #include <linux/fs.h> #include "overlayfs.h" +#include "../internal.h" /* for sb_init_dio_done_wq */ + struct ovl_aio_req { struct kiocb iocb; refcount_t ref; struct kiocb *orig_iocb; + /* used for aio completion */ + struct work_struct work; + long res; }; static struct kmem_cache *ovl_aio_request_cachep; @@ -302,6 +307,37 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res) orig_iocb->ki_complete(orig_iocb, res); } +static void ovl_aio_complete_work(struct work_struct *work) +{ + struct ovl_aio_req *aio_req = container_of(work, + struct ovl_aio_req, work); + + ovl_aio_rw_complete(&aio_req->iocb, aio_req->res); +} + +static void ovl_aio_queue_completion(struct kiocb *iocb, long res) +{ + struct ovl_aio_req *aio_req = container_of(iocb, + struct ovl_aio_req, iocb); + struct kiocb *orig_iocb = 
aio_req->orig_iocb; + + /* + * Punt to a work queue to serialize updates of mtime/size. + */ + aio_req->res = res; + INIT_WORK(&aio_req->work, ovl_aio_complete_work); + queue_work(file_inode(orig_iocb->ki_filp)->i_sb->s_dio_done_wq, + &aio_req->work); +} + +static int ovl_init_aio_done_wq(struct super_block *sb) +{ + if (sb->s_dio_done_wq) + return 0; + + return sb_init_dio_done_wq(sb); +} + static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; @@ -401,6 +437,10 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) } else { struct ovl_aio_req *aio_req; + ret = ovl_init_aio_done_wq(inode->i_sb); + if (ret) + goto out; + ret = -ENOMEM; aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); if (!aio_req) @@ -409,7 +449,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) aio_req->orig_iocb = iocb; kiocb_clone(&aio_req->iocb, iocb, get_file(real.file)); aio_req->iocb.ki_flags = ifl; - aio_req->iocb.ki_complete = ovl_aio_rw_complete; + aio_req->iocb.ki_complete = ovl_aio_queue_completion; refcount_set(&aio_req->ref, 2); kiocb_start_write(&aio_req->iocb); ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); diff --git a/fs/super.c b/fs/super.c index 42c9754e3add..14862ca3500a 100644 --- a/fs/super.c +++ b/fs/super.c @@ -2160,3 +2160,4 @@ int sb_init_dio_done_wq(struct super_block *sb) destroy_workqueue(wq); return 0; } +EXPORT_SYMBOL_GPL(sb_init_dio_done_wq); -- 2.39.2
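The punt-to-workqueue idea in the patch above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct work`, `struct work_queue`, `wq_queue()` and `wq_drain()` are hypothetical single-threaded stand-ins for the kernel's `work_struct`, `s_dio_done_wq`, `queue_work()` and the worker thread, and the "attribute update" is reduced to adding the result to a size counter. It shows the two steps the patch relies on: the completion callback only records `res` and queues the embedded work item, and the real completion runs later, recovering the request with `container_of()`.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical model of a work item; not the kernel's struct work_struct. */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

/* Hypothetical single-threaded FIFO standing in for sb->s_dio_done_wq. */
struct work_queue {
	struct work *head;
	struct work **tail;
};

static void wq_init(struct work_queue *wq)
{
	wq->head = NULL;
	wq->tail = &wq->head;
}

static void wq_queue(struct work_queue *wq, struct work *w)
{
	w->next = NULL;
	*wq->tail = w;
	wq->tail = &w->next;
}

/* Run every queued item in order; returns how many items ran. */
static int wq_drain(struct work_queue *wq)
{
	int n = 0;

	while (wq->head) {
		struct work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = &wq->head;
		w->fn(w);
		n++;
	}
	return n;
}

/* Model of struct ovl_aio_req: the work item is embedded in the request. */
struct aio_req {
	long res;        /* result saved by the completion callback */
	long *size;      /* attribute updated by the deferred completion */
	struct work work;
};

/* Deferred part: runs from the queue, where taking a lock would be safe. */
static void complete_work(struct work *w)
{
	struct aio_req *req = container_of(w, struct aio_req, work);

	if (req->res > 0)
		*req->size += req->res;
}

/* Immediate part: only stash the result and queue the work item. */
static void queue_completion(struct work_queue *wq, struct aio_req *req,
			     long res)
{
	req->res = res;
	req->work.fn = complete_work;
	wq_queue(wq, &req->work);
}
```

In this model nothing touches the size counter until `wq_drain()` runs, which is the property the patch wants: the attribute update happens in a context that is allowed to take the inode spinlock.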

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit f7621b11e8acc8efa208c9420ff3ecb198b20e29 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- ovl_copyattr() may be called concurrently from aio completion context without any lock and that could lead to overlay inode attributes getting permanently out of sync with real inode attributes. Use ovl inode spinlock to protect ovl_copyattr(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/util.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 0bf3ffcd072f..3bc4a6bc582e 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -1406,6 +1406,7 @@ void ovl_copyattr(struct inode *inode) realinode = ovl_i_path_real(inode, &realpath); real_idmap = mnt_idmap(realpath.mnt); + spin_lock(&inode->i_lock); vfsuid = i_uid_into_vfsuid(real_idmap, realinode); vfsgid = i_gid_into_vfsgid(real_idmap, realinode); @@ -1416,4 +1417,5 @@ void ovl_copyattr(struct inode *inode) inode->i_mtime = realinode->i_mtime; inode_set_ctime_to_ts(inode, inode_get_ctime(realinode)); i_size_write(inode, i_size_read(realinode)); + spin_unlock(&inode->i_lock); } -- 2.39.2
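As a rough userspace analogy for the locking added above, the following sketch copies a pair of attributes under a tiny spinlock so that a concurrent copier could never interleave its stores with ours. The names `struct inode_model`, `model_lock()`, `model_unlock()` and `copyattr_model()` are hypothetical, and the C11 `atomic_flag` spinlock is an assumption standing in for `inode->i_lock`; it is not how the kernel implements `spin_lock()`.

```c
#include <stdatomic.h>

/* The attributes that must stay mutually consistent. */
struct attrs {
	long size;
	long mtime;
};

/* Hypothetical model of the overlay inode with its spinlock. */
struct inode_model {
	atomic_flag lock; /* stand-in for inode->i_lock */
	struct attrs a;
};

static void model_lock(struct inode_model *inode)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&inode->lock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct inode_model *inode)
{
	atomic_flag_clear_explicit(&inode->lock, memory_order_release);
}

/* Copy all attributes as one unit, as ovl_copyattr() does under i_lock. */
static void copyattr_model(struct inode_model *inode, const struct attrs *real)
{
	model_lock(inode);
	inode->a.size = real->size;
	inode->a.mtime = real->mtime;
	model_unlock(inode);
}
```

Without the lock, two copiers racing from completion context could each win one of the two stores, leaving a size from one snapshot and an mtime from another, which is the "permanently out of sync" case the commit message describes.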

From: Amir Goldstein <amir73il@gmail.com> A writeable mapped backing file can perform writes to the real inode. Therefore, the real path mount must be kept writable so long as the writable map exists. This may not be strictly needed for overlayfs private upper mount, but it is correct to take the mnt_writers count in the vfs helper. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231009153712.1566422-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/internal.h | 11 +++++++++-- fs/open.c | 31 +++++++++++++++++++++++++------ 2 files changed, 34 insertions(+), 8 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 8260c738980c..eb6445994599 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -95,13 +95,20 @@ struct file *alloc_empty_file(int flags, const struct cred *cred); struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred); struct file *alloc_empty_backing_file(int flags, const struct cred *cred); +static inline void file_put_write_access(struct file *file) +{ + put_write_access(file->f_inode); + mnt_put_write_access(file->f_path.mnt); + if (unlikely(file->f_mode & FMODE_BACKING)) + mnt_put_write_access(backing_file_real_path(file)->mnt); +} + static inline void put_file_access(struct file *file) { if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { i_readcount_dec(file->f_inode); } else if (file->f_mode & FMODE_WRITER) { - put_write_access(file->f_inode); - mnt_put_write_access(file->f_path.mnt); + file_put_write_access(file); } } diff --git a/fs/open.c b/fs/open.c index 575cc1406709..992540db8639 100644 --- a/fs/open.c +++ b/fs/open.c @@ -870,6 +870,30 @@ SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group) return ksys_fchown(fd, user, group); } +static inline int file_get_write_access(struct file *f) +{ + int error; + + error = get_write_access(f->f_inode); + if (unlikely(error)) +
return error; + error = mnt_get_write_access(f->f_path.mnt); + if (unlikely(error)) + goto cleanup_inode; + if (unlikely(f->f_mode & FMODE_BACKING)) { + error = mnt_get_write_access(backing_file_real_path(f)->mnt); + if (unlikely(error)) + goto cleanup_mnt; + } + return 0; + +cleanup_mnt: + mnt_put_write_access(f->f_path.mnt); +cleanup_inode: + put_write_access(f->f_inode); + return error; +} + static int do_dentry_open(struct file *f, struct inode *inode, int (*open)(struct inode *, struct file *)) @@ -892,14 +916,9 @@ static int do_dentry_open(struct file *f, if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { i_readcount_inc(inode); } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { - error = get_write_access(inode); + error = file_get_write_access(f); if (unlikely(error)) goto cleanup_file; - error = mnt_get_write_access(f->f_path.mnt); - if (unlikely(error)) { - put_write_access(inode); - goto cleanup_file; - } f->f_mode |= FMODE_WRITER; } -- 2.39.2
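The acquire-in-order, release-in-reverse structure of file_get_write_access() can be sketched in userspace as follows. This is a model, not kernel code: `struct wcount`, `wcount_get()`, `wcount_put()` and `get_write_access_model()` are hypothetical names, and a plain counter with a read-only flag stands in for both the inode write count and the mnt_writers count. Passing a non-NULL third argument models the FMODE_BACKING case, where a second mount must also be pinned writable.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a write-access counter (inode or mount). */
struct wcount {
	int writers;   /* write references currently held */
	int read_only; /* when set, new write references are refused */
};

static int wcount_get(struct wcount *w)
{
	if (w->read_only)
		return -EROFS;
	w->writers++;
	return 0;
}

static void wcount_put(struct wcount *w)
{
	w->writers--;
}

/*
 * Model of file_get_write_access(): backing_mnt is NULL for an ordinary
 * file and non-NULL for a backing file (the FMODE_BACKING case).
 */
static int get_write_access_model(struct wcount *inode, struct wcount *mnt,
				  struct wcount *backing_mnt)
{
	int error;

	error = wcount_get(inode);
	if (error)
		return error;
	error = wcount_get(mnt);
	if (error)
		goto cleanup_inode;
	if (backing_mnt) {
		error = wcount_get(backing_mnt);
		if (error)
			goto cleanup_mnt;
	}
	return 0;

	/* Unwind in reverse order of acquisition. */
cleanup_mnt:
	wcount_put(mnt);
cleanup_inode:
	wcount_put(inode);
	return error;
}
```

The point of the unwind labels is that a failure at any step leaves every counter exactly where it started, so the caller in do_dentry_open() only needs a single error path.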

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit 08582d678fcf11fc86188f0b92239d3d49667d8e category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops for mapped files and to display the overlayfs path in /proc/<pid>/maps. In preparation for storing the overlayfs "fake" path instead of the underlying "real" path in struct backing_file, define a noop helper file_user_path() that returns f_path for now. Use the new helper in procfs and kernel logs whenever a path of a mapped file is displayed to users. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231009153712.1566422-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/fs.h | 14 ++++++++++++++ arch/arc/kernel/troubleshoot.c | 6 ++++-- fs/proc/base.c | 2 +- fs/proc/nommu.c | 2 +- fs/proc/task_mmu.c | 4 ++-- fs/proc/task_nommu.c | 2 +- kernel/trace/trace_output.c | 2 +- 7 files changed, 24 insertions(+), 8 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 493c13cf7cd6..08a90dc413e8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2616,6 +2616,20 @@ static inline const struct path *file_real_path(struct file *f) return &f->f_path; } +/* + * file_user_path - get the path to display for memory mapped file + * + * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file + * stored in ->vm_file is a backing file whose f_inode is on the underlying + * filesystem. When the mapped file path is displayed to user (e.g. 
via + * /proc/<pid>/maps), this helper should be used to get the path to display + * to the user, which is the path of the fd that user has requested to map. + */ +static inline const struct path *file_user_path(struct file *f) +{ + return &f->f_path; +} + static inline struct file *file_clone_open(struct file *file) { return dentry_open(&file->f_path, file->f_flags, file->f_cred); diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c index d5b3ed2c58f5..c380d8c30704 100644 --- a/arch/arc/kernel/troubleshoot.c +++ b/arch/arc/kernel/troubleshoot.c @@ -90,10 +90,12 @@ static void show_faulting_vma(unsigned long address) */ if (vma) { char buf[ARC_PATH_MAX]; - char *nm = "?"; + char *nm = "anon"; if (vma->vm_file) { - nm = file_path(vma->vm_file, buf, ARC_PATH_MAX-1); + /* XXX: can we use %pD below and get rid of buf? */ + nm = d_path(file_user_path(vma->vm_file), buf, + ARC_PATH_MAX-1); if (IS_ERR(nm)) nm = "?"; } diff --git a/fs/proc/base.c b/fs/proc/base.c index ac79ba23c0cb..250e2c66348f 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2281,7 +2281,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path) rc = -ENOENT; vma = find_exact_vma(mm, vm_start, vm_end); if (vma && vma->vm_file) { - *path = vma->vm_file->f_path; + *path = *file_user_path(vma->vm_file); path_get(path); rc = 0; } diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c index 4d3493579458..c6e7ebc63756 100644 --- a/fs/proc/nommu.c +++ b/fs/proc/nommu.c @@ -58,7 +58,7 @@ static int nommu_region_show(struct seq_file *m, struct vm_region *region) if (file) { seq_pad(m, ' '); - seq_file_path(m, file, ""); + seq_path(m, file_user_path(file), ""); } seq_putc(m, '\n'); diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ab12eb479c1f..46b4c39a12db 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -297,7 +297,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (anon_name) seq_printf(m, "[anon_shmem:%s]", anon_name->name); else - 
seq_file_path(m, file, "\n"); + seq_path(m, file_user_path(file), "\n"); goto done; } @@ -1976,7 +1976,7 @@ static int show_numa_map(struct seq_file *m, void *v) if (file) { seq_puts(m, " file="); - seq_file_path(m, file, "\n\t= "); + seq_path(m, file_user_path(file), "\n\t= "); } else if (vma_is_initial_heap(vma)) { seq_puts(m, " heap"); } else if (vma_is_initial_stack(vma)) { diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 7cebd397cc26..bce674533000 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -157,7 +157,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) if (file) { seq_pad(m, ' '); - seq_file_path(m, file, ""); + seq_path(m, file_user_path(file), ""); } else if (mm && vma_is_initial_stack(vma)) { seq_pad(m, ' '); seq_puts(m, "[stack]"); diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 2b948d35fb59..70ba8571425c 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -408,7 +408,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm, vmstart = vma->vm_start; } if (file) { - ret = trace_seq_path(s, &file->f_path); + ret = trace_seq_path(s, file_user_path(file)); if (ret) trace_seq_printf(s, "[+0x%lx]", ip - vmstart); -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> A backing file struct stores two paths, one "real" path that refers to f_inode and one "fake" path, which should be displayed to users in /proc/<pid>/maps. There is a lot more potential code that needs to know the "real" path than code that needs to know the "fake" path. Instead of code having to request the "real" path with file_real_path(), store the "real" path in f_path and require code that needs to know the "fake" path to request it with file_user_path(). Replace the file_real_path() helper with a simple const accessor f_path(). After this change, file_dentry() is not expected to observe any files with overlayfs f_path and real f_inode, so the call to ->d_real() should not be needed. Leave the ->d_real() call for now and add an assertion in ovl_d_real() to catch if we made wrong assumptions. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsn... Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231009153712.1566422-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/internal.h | 2 +- include/linux/fs.h | 22 ++++------------------ include/linux/fsnotify.h | 3 +-- fs/file_table.c | 12 ++++++------ fs/open.c | 23 +++++++++++------------ fs/overlayfs/super.c | 16 ++++++++++++---- 6 files changed, 35 insertions(+), 43 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index eb6445994599..273e6fd40d1b 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -100,7 +100,7 @@ static inline void file_put_write_access(struct file *file) put_write_access(file->f_inode); mnt_put_write_access(file->f_path.mnt); if (unlikely(file->f_mode & FMODE_BACKING)) - mnt_put_write_access(backing_file_real_path(file)->mnt); + mnt_put_write_access(backing_file_user_path(file)->mnt); } static inline void put_file_access(struct file *file) diff --git
a/include/linux/fs.h b/include/linux/fs.h index 08a90dc413e8..72bd9a001ff3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2595,26 +2595,10 @@ struct file *dentry_open(const struct path *path, int flags, const struct cred *creds); struct file *dentry_create(const struct path *path, int flags, umode_t mode, const struct cred *cred); -struct file *backing_file_open(const struct path *path, int flags, +struct file *backing_file_open(const struct path *user_path, int flags, const struct path *real_path, const struct cred *cred); -struct path *backing_file_real_path(struct file *f); - -/* - * file_real_path - get the path corresponding to f_inode - * - * When opening a backing file for a stackable filesystem (e.g., - * overlayfs) f_path may be on the stackable filesystem and f_inode on - * the underlying filesystem. When the path associated with f_inode is - * needed, this helper should be used instead of accessing f_path - * directly. -*/ -static inline const struct path *file_real_path(struct file *f) -{ - if (unlikely(f->f_mode & FMODE_BACKING)) - return backing_file_real_path(f); - return &f->f_path; -} +struct path *backing_file_user_path(struct file *f); /* * file_user_path - get the path to display for memory mapped file @@ -2627,6 +2611,8 @@ static inline const struct path *file_real_path(struct file *f) */ static inline const struct path *file_user_path(struct file *f) { + if (unlikely(f->f_mode & FMODE_BACKING)) + return backing_file_user_path(f); return &f->f_path; } diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 0dea8d0fdb0b..912b58bb66c5 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -102,8 +102,7 @@ static inline int fsnotify_file(struct file *file, __u32 mask) if (file->f_mode & (FMODE_NONOTIFY | FMODE_PATH)) return 0; - /* Overlayfs internal files have fake f_path */ - path = file_real_path(file); + path = &file->f_path; return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); } 
diff --git a/fs/file_table.c b/fs/file_table.c index 234284ef72a9..a5a3a385f24c 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -44,10 +44,10 @@ static struct kmem_cache *filp_cachep __read_mostly; static struct percpu_counter nr_files __cacheline_aligned_in_smp; -/* Container for backing file with optional real path */ +/* Container for backing file with optional user path */ struct backing_file { struct file file; - struct path real_path; + struct path user_path; }; static inline struct backing_file *backing_file(struct file *f) @@ -55,11 +55,11 @@ static inline struct backing_file *backing_file(struct file *f) return container_of(f, struct backing_file, file); } -struct path *backing_file_real_path(struct file *f) +struct path *backing_file_user_path(struct file *f) { - return &backing_file(f)->real_path; + return &backing_file(f)->user_path; } -EXPORT_SYMBOL_GPL(backing_file_real_path); +EXPORT_SYMBOL_GPL(backing_file_user_path); static void file_free_rcu(struct rcu_head *head) { @@ -76,7 +76,7 @@ static inline void file_free(struct file *f) { security_file_free(f); if (unlikely(f->f_mode & FMODE_BACKING)) - path_put(backing_file_real_path(f)); + path_put(backing_file_user_path(f)); if (likely(!(f->f_mode & FMODE_NOACCOUNT))) percpu_counter_dec(&nr_files); call_rcu(&f->f_rcuhead, file_free_rcu); diff --git a/fs/open.c b/fs/open.c index 992540db8639..ec0e471ef5cc 100644 --- a/fs/open.c +++ b/fs/open.c @@ -881,7 +881,7 @@ static inline int file_get_write_access(struct file *f) if (unlikely(error)) goto cleanup_inode; if (unlikely(f->f_mode & FMODE_BACKING)) { - error = mnt_get_write_access(backing_file_real_path(f)->mnt); + error = mnt_get_write_access(backing_file_user_path(f)->mnt); if (unlikely(error)) goto cleanup_mnt; } @@ -1179,20 +1179,19 @@ EXPORT_SYMBOL_GPL(kernel_file_open); /** * backing_file_open - open a backing file for kernel internal use - * @path: path of the file to open + * @user_path: path that the user reuqested to open * @flags: open 
flags * @real_path: path of the backing file * @cred: credentials for open * * Open a backing file for a stackable filesystem (e.g., overlayfs). - * @path may be on the stackable filesystem and backing inode on the - * underlying filesystem. In this case, we want to be able to return - * the @real_path of the backing inode. This is done by embedding the - * returned file into a container structure that also stores the path of - * the backing inode on the underlying filesystem, which can be - * retrieved using backing_file_real_path(). + * @user_path may be on the stackable filesystem and @real_path on the + * underlying filesystem. In this case, we want to be able to return the + * @user_path of the stackable filesystem. This is done by embedding the + * returned file into a container structure that also stores the stacked + * file's path, which can be retrieved using backing_file_user_path(). */ -struct file *backing_file_open(const struct path *path, int flags, +struct file *backing_file_open(const struct path *user_path, int flags, const struct path *real_path, const struct cred *cred) { @@ -1203,9 +1202,9 @@ struct file *backing_file_open(const struct path *path, int flags, if (IS_ERR(f)) return f; - f->f_path = *path; - path_get(real_path); - *backing_file_real_path(f) = *real_path; + path_get(user_path); + *backing_file_user_path(f) = *user_path; + f->f_path = *real_path; error = do_dentry_open(f, d_inode(real_path->dentry), NULL); if (error) { fput(f); diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2c056d737c27..f37d2cb86404 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -34,14 +34,22 @@ static struct dentry *ovl_d_real(struct dentry *dentry, struct dentry *real = NULL, *lower; int err; - /* It's an overlay file */ + /* + * vfs is only expected to call d_real() with NULL from d_real_inode() + * and with overlay inode from file_dentry() on an overlay file. 
+ * + * TODO: remove @inode argument from d_real() API, remove code in this + * function that deals with non-NULL @inode and remove d_real() call + * from file_dentry(). + */ if (inode && d_inode(dentry) == inode) return dentry; + else if (inode) + goto bug; if (!d_is_reg(dentry)) { - if (!inode || inode == d_inode(dentry)) - return dentry; - goto bug; + /* d_real_inode() is only relevant for regular files */ + return dentry; } real = ovl_dentry_upper(dentry); -- 2.39.2
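The layout this patch settles on, with the "real" path in f_path and the "fake" user path stored in the container, can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in: `struct path_model` holds a string instead of a vfsmount/dentry pair, `FMODE_BACKING_MODEL` is an arbitrary flag value, and `file_user_path_model()` only mirrors the shape of file_user_path() after this change.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define FMODE_BACKING_MODEL 0x1 /* hypothetical flag value for this model */

/* A path reduced to a name; the kernel's struct path holds mnt + dentry. */
struct path_model {
	const char *name;
};

struct file_model {
	unsigned int f_mode;
	struct path_model f_path; /* "real" path, matching f_inode */
};

/* Container for a backing file, carrying the extra user-visible path. */
struct backing_file_model {
	struct file_model file;
	struct path_model user_path; /* "fake" path shown to users */
};

/* Return the path to display, e.g. what /proc/<pid>/maps would show. */
static const struct path_model *file_user_path_model(struct file_model *f)
{
	if (f->f_mode & FMODE_BACKING_MODEL)
		return &container_of(f, struct backing_file_model,
				     file)->user_path;
	return &f->f_path;
}
```

Code that wants the path matching the inode reads `f_path` directly and needs no helper, which is the design choice the commit message argues for: only the few display sites pay for the indirection.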

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit c002728f608183449673818076380124935e6b9b category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- A simple wrapper for updating ovl inode size/mtime, to conform with ovl_file_accessed(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/file.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 09997340bbd3..acdd79dd4bfa 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -240,6 +240,12 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence) return ret; } +static void ovl_file_modified(struct file *file) +{ + /* Update size/mtime */ + ovl_copyattr(file_inode(file)); +} + static void ovl_file_accessed(struct file *file) { struct inode *inode, *upperinode; @@ -287,10 +293,8 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req) struct kiocb *orig_iocb = aio_req->orig_iocb; if (iocb->ki_flags & IOCB_WRITE) { - struct inode *inode = file_inode(orig_iocb->ki_filp); - kiocb_end_write(iocb); - ovl_copyattr(inode); + ovl_file_modified(orig_iocb->ki_filp); } orig_iocb->ki_pos = iocb->ki_pos; @@ -433,7 +437,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf); file_end_write(real.file); /* Update size */ - ovl_copyattr(inode); + ovl_file_modified(file); } else { struct ovl_aio_req *aio_req; @@ -523,7 +527,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out, file_end_write(real.file); /* Update size */ - ovl_copyattr(inode); + ovl_file_modified(out); revert_creds(old_cred); fdput(real); @@ -604,7 +608,7 @@ static long ovl_fallocate(struct 
file *file, int mode, loff_t offset, loff_t len revert_creds(old_cred); /* Update size */ - ovl_copyattr(inode); + ovl_file_modified(file); fdput(real); @@ -688,7 +692,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in, revert_creds(old_cred); /* Update size */ - ovl_copyattr(inode_out); + ovl_file_modified(file_out); fdput(real_in); fdput(real_out); -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit d08d3b3c2caf6c482703bbc5efaa7b9ae95dea20 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- ovl_get_write_access() gets write access to upper mnt without taking freeze protection on upper sb and ovl_start_write() only takes freeze protection on upper sb. These helpers will be used to break up the large ovl_want_write() scope during copy up into finer grained freeze protection scopes. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/overlayfs.h | 4 ++++ fs/overlayfs/util.c | 26 ++++++++++++++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 981967e507b3..95c4bc67dbe3 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -406,6 +406,10 @@ static inline int ovl_do_getattr(const struct path *path, struct kstat *stat, } /* util.c */ +int ovl_get_write_access(struct dentry *dentry); +void ovl_put_write_access(struct dentry *dentry); +void ovl_start_write(struct dentry *dentry); +void ovl_end_write(struct dentry *dentry); int ovl_want_write(struct dentry *dentry); void ovl_drop_write(struct dentry *dentry); struct dentry *ovl_workdir(struct dentry *dentry); diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 3bc4a6bc582e..f543956d23b4 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -17,12 +17,38 @@ #include <linux/ratelimit.h> #include "overlayfs.h" +/* Get write access to upper mnt - may fail if upper sb was remounted ro */ +int ovl_get_write_access(struct dentry *dentry) +{ + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); + return mnt_get_write_access(ovl_upper_mnt(ofs)); +} + +/* Get write access to upper sb - may block if upper sb is frozen */ +void
ovl_start_write(struct dentry *dentry) +{ + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); + sb_start_write(ovl_upper_mnt(ofs)->mnt_sb); +} + int ovl_want_write(struct dentry *dentry) { struct ovl_fs *ofs = OVL_FS(dentry->d_sb); return mnt_want_write(ovl_upper_mnt(ofs)); } +void ovl_put_write_access(struct dentry *dentry) +{ + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); + mnt_put_write_access(ovl_upper_mnt(ofs)); +} + +void ovl_end_write(struct dentry *dentry) +{ + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); + sb_end_write(ovl_upper_mnt(ofs)->mnt_sb); +} + void ovl_drop_write(struct dentry *dentry) { struct ovl_fs *ofs = OVL_FS(dentry->d_sb); -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit 162d06444070c12827d604a2cb6b6bd98d48cbb0 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/copy_up.c | 13 +++------ fs/overlayfs/dir.c | 60 ++++++++++++++++++------------------------ fs/overlayfs/export.c | 7 +---- fs/overlayfs/inode.c | 57 +++++++++++++++++++-------------------- fs/overlayfs/util.c | 34 +++++++++++++++++++----- 5 files changed, 84 insertions(+), 87 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index dbf7b3cd70ca..cc5fe472d6e3 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -1170,17 +1170,10 @@ static bool ovl_open_need_copy_up(struct dentry *dentry, int flags) int ovl_maybe_copy_up(struct dentry *dentry, int flags) { - int err = 0; - - if (ovl_open_need_copy_up(dentry, flags)) { - err = ovl_want_write(dentry); - if (!err) { - err = ovl_copy_up_flags(dentry, flags); - ovl_drop_write(dentry); - } - } + if (!ovl_open_need_copy_up(dentry, flags)) + return 0; - return err; + return ovl_copy_up_flags(dentry, flags); } int ovl_copy_up_with_data(struct dentry *dentry) diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 54602f0bed8b..68cfb2959f75 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -556,10 +556,6 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode 
*inode, struct cred *override_cred; struct dentry *parent = dentry->d_parent; - err = ovl_copy_up(parent); - if (err) - return err; - old_cred = ovl_override_creds(dentry->d_sb); /* @@ -623,6 +619,10 @@ static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev, .link = link, }; + err = ovl_copy_up(dentry->d_parent); + if (err) + return err; + err = ovl_want_write(dentry); if (err) goto out; @@ -697,28 +697,24 @@ static int ovl_link(struct dentry *old, struct inode *newdir, int err; struct inode *inode; - err = ovl_want_write(old); + err = ovl_copy_up(old); if (err) goto out; - err = ovl_copy_up(old); + err = ovl_copy_up(new->d_parent); if (err) - goto out_drop_write; + goto out; - err = ovl_copy_up(new->d_parent); + err = ovl_nlink_start(old); if (err) - goto out_drop_write; + goto out; if (ovl_is_metacopy_dentry(old)) { err = ovl_set_link_redirect(old); if (err) - goto out_drop_write; + goto out_nlink_end; } - err = ovl_nlink_start(old); - if (err) - goto out_drop_write; - inode = d_inode(old); ihold(inode); @@ -728,9 +724,8 @@ static int ovl_link(struct dentry *old, struct inode *newdir, if (err) iput(inode); +out_nlink_end: ovl_nlink_end(old); -out_drop_write: - ovl_drop_write(old); out: return err; } @@ -888,17 +883,13 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir) goto out; } - err = ovl_want_write(dentry); - if (err) - goto out; - err = ovl_copy_up(dentry->d_parent); if (err) - goto out_drop_write; + goto out; err = ovl_nlink_start(dentry); if (err) - goto out_drop_write; + goto out; old_cred = ovl_override_creds(dentry->d_sb); if (!lower_positive) @@ -923,8 +914,6 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir) if (ovl_dentry_upper(dentry)) ovl_copyattr(d_inode(dentry)); -out_drop_write: - ovl_drop_write(dentry); out: ovl_cache_free(&list); return err; @@ -1128,29 +1117,32 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir, } } - err = ovl_want_write(old); - if (err) - goto out; - err = 
ovl_copy_up(old); if (err) - goto out_drop_write; + goto out; err = ovl_copy_up(new->d_parent); if (err) - goto out_drop_write; + goto out; if (!overwrite) { err = ovl_copy_up(new); if (err) - goto out_drop_write; + goto out; } else if (d_inode(new)) { err = ovl_nlink_start(new); if (err) - goto out_drop_write; + goto out; update_nlink = true; } + if (!update_nlink) { + /* ovl_nlink_start() took ovl_want_write() */ + err = ovl_want_write(old); + if (err) + goto out; + } + old_cred = ovl_override_creds(old->d_sb); if (!list_empty(&list)) { @@ -1283,8 +1275,8 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir, revert_creds(old_cred); if (update_nlink) ovl_nlink_end(new); -out_drop_write: - ovl_drop_write(old); + else + ovl_drop_write(old); out: dput(opaquedir); ovl_cache_free(&list); diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c index 3a17e4366f28..19df4efbf148 100644 --- a/fs/overlayfs/export.c +++ b/fs/overlayfs/export.c @@ -23,12 +23,7 @@ static int ovl_encode_maybe_copy_up(struct dentry *dentry) if (ovl_dentry_upper(dentry)) return 0; - err = ovl_want_write(dentry); - if (!err) { - err = ovl_copy_up(dentry); - ovl_drop_write(dentry); - } - + err = ovl_copy_up(dentry); if (err) { pr_warn_ratelimited("failed to copy up on encode (%pd2, err=%i)\n", dentry, err); diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index 9c42a30317d5..64a18913b1d6 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -32,10 +32,6 @@ int ovl_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (err) return err; - err = ovl_want_write(dentry); - if (err) - goto out; - if (attr->ia_valid & ATTR_SIZE) { /* Truncate should trigger data copy up as well */ full_copy_up = true; @@ -54,7 +50,7 @@ int ovl_setattr(struct mnt_idmap *idmap, struct dentry *dentry, winode = d_inode(upperdentry); err = get_write_access(winode); if (err) - goto out_drop_write; + goto out; } if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID)) @@ -78,6 +74,10 @@ 
int ovl_setattr(struct mnt_idmap *idmap, struct dentry *dentry, */ attr->ia_valid &= ~ATTR_OPEN; + err = ovl_want_write(dentry); + if (err) + goto out_put_write; + inode_lock(upperdentry->d_inode); old_cred = ovl_override_creds(dentry->d_sb); err = ovl_do_notify_change(ofs, upperdentry, attr); @@ -85,12 +85,12 @@ int ovl_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (!err) ovl_copyattr(dentry->d_inode); inode_unlock(upperdentry->d_inode); + ovl_drop_write(dentry); +out_put_write: if (winode) put_write_access(winode); } -out_drop_write: - ovl_drop_write(dentry); out: return err; } @@ -361,27 +361,27 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name, struct path realpath; const struct cred *old_cred; - err = ovl_want_write(dentry); - if (err) - goto out; - if (!value && !upperdentry) { ovl_path_lower(dentry, &realpath); old_cred = ovl_override_creds(dentry->d_sb); err = vfs_getxattr(mnt_idmap(realpath.mnt), realdentry, name, NULL, 0); revert_creds(old_cred); if (err < 0) - goto out_drop_write; + goto out; } if (!upperdentry) { err = ovl_copy_up(dentry); if (err) - goto out_drop_write; + goto out; realdentry = ovl_dentry_upper(dentry); } + err = ovl_want_write(dentry); + if (err) + goto out; + old_cred = ovl_override_creds(dentry->d_sb); if (value) { err = ovl_do_setxattr(ofs, realdentry, name, value, size, @@ -391,12 +391,10 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name, err = ovl_do_removexattr(ofs, realdentry, name); } revert_creds(old_cred); + ovl_drop_write(dentry); /* copy c/mtime */ ovl_copyattr(inode); - -out_drop_write: - ovl_drop_write(dentry); out: return err; } @@ -611,10 +609,6 @@ static int ovl_set_or_remove_acl(struct dentry *dentry, struct inode *inode, struct dentry *upperdentry = ovl_dentry_upper(dentry); struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry); - err = ovl_want_write(dentry); - if (err) - return err; - /* * If ACL is to be removed from a 
lower file, check if it exists in * the first place before copying it up. @@ -630,7 +624,7 @@ static int ovl_set_or_remove_acl(struct dentry *dentry, struct inode *inode, revert_creds(old_cred); if (IS_ERR(real_acl)) { err = PTR_ERR(real_acl); - goto out_drop_write; + goto out; } posix_acl_release(real_acl); } @@ -638,23 +632,26 @@ static int ovl_set_or_remove_acl(struct dentry *dentry, struct inode *inode, if (!upperdentry) { err = ovl_copy_up(dentry); if (err) - goto out_drop_write; + goto out; realdentry = ovl_dentry_upper(dentry); } + err = ovl_want_write(dentry); + if (err) + goto out; + old_cred = ovl_override_creds(dentry->d_sb); if (acl) err = ovl_do_set_acl(ofs, realdentry, acl_name, acl); else err = ovl_do_remove_acl(ofs, realdentry, acl_name); revert_creds(old_cred); + ovl_drop_write(dentry); /* copy c/mtime */ ovl_copyattr(inode); - -out_drop_write: - ovl_drop_write(dentry); +out: return err; } @@ -782,14 +779,14 @@ int ovl_fileattr_set(struct mnt_idmap *idmap, unsigned int flags; int err; - err = ovl_want_write(dentry); - if (err) - goto out; - err = ovl_copy_up(dentry); if (!err) { ovl_path_real(dentry, &upperpath); + err = ovl_want_write(dentry); + if (err) + goto out; + old_cred = ovl_override_creds(inode->i_sb); /* * Store immutable/append-only flags in xattr and clear them @@ -802,6 +799,7 @@ int ovl_fileattr_set(struct mnt_idmap *idmap, if (!err) err = ovl_real_fileattr_set(&upperpath, fa); revert_creds(old_cred); + ovl_drop_write(dentry); /* * Merge real inode flags with inode flags read from @@ -816,7 +814,6 @@ int ovl_fileattr_set(struct mnt_idmap *idmap, /* Update ctime */ ovl_copyattr(inode); } - ovl_drop_write(dentry); out: return err; } diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index f543956d23b4..e9e867119f34 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -679,16 +679,26 @@ int ovl_copy_up_start(struct dentry *dentry, int flags) int err; err = ovl_inode_lock_interruptible(inode); - if (!err && 
ovl_already_copied_up_locked(dentry, flags)) { + if (err) + return err; + + if (ovl_already_copied_up_locked(dentry, flags)) err = 1; /* Already copied up */ - ovl_inode_unlock(inode); - } + else + err = ovl_want_write(dentry); + if (err) + goto out_unlock; + + return 0; +out_unlock: + ovl_inode_unlock(inode); return err; } void ovl_copy_up_end(struct dentry *dentry) { + ovl_drop_write(dentry); ovl_inode_unlock(d_inode(dentry)); } @@ -1091,8 +1101,12 @@ int ovl_nlink_start(struct dentry *dentry) if (err) return err; + err = ovl_want_write(dentry); + if (err) + goto out_unlock; + if (d_is_dir(dentry) || !ovl_test_flag(OVL_INDEX, inode)) - goto out; + return 0; old_cred = ovl_override_creds(dentry->d_sb); /* @@ -1103,10 +1117,15 @@ int ovl_nlink_start(struct dentry *dentry) */ err = ovl_set_nlink_upper(dentry); revert_creds(old_cred); - -out: if (err) - ovl_inode_unlock(inode); + goto out_drop_write; + + return 0; + +out_drop_write: + ovl_drop_write(dentry); +out_unlock: + ovl_inode_unlock(inode); return err; } @@ -1123,6 +1142,7 @@ void ovl_nlink_end(struct dentry *dentry) revert_creds(old_cred); } + ovl_drop_write(dentry); ovl_inode_unlock(inode); } -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.7-rc1
commit c63e56a4a6523fcb1358e1878607d77a40b534bb
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek
take the ovl_inode_lock, without holding upper sb_writers.

In case of a nested lower overlay that uses the same upper fs as this
overlay, lockdep will warn about a (possibly false positive) circular lock
dependency when doing open/llseek of a lower ovl file during copy up with
our upper sb_writers held, because the locking order appears to be the
reverse of the locking order in ovl_copy_up_start():

- lower ovl_inode_lock
- upper sb_writers

Let the copy up "transaction" keep an elevated mnt write count on the
upper mnt, but leave taking upper sb_writers to lower level helpers, only
when they actually need it. This avoids holding upper sb_writers during
lower file open/llseek and prevents the lockdep warning.

Minimizing the scope of upper sb_writers during copy up is also needed
for fixing other possible deadlocks in a following patch.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Conflicts:
	fs/overlayfs/copy_up.c
[Context differences.]
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/copy_up.c | 76 ++++++++++++++++++++++++++++++------------ fs/overlayfs/util.c | 8 +++-- 2 files changed, 61 insertions(+), 23 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index cc5fe472d6e3..8ad22ea5eec0 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -252,7 +252,9 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry, return PTR_ERR(old_file); /* Try to use clone_file_range to clone up within the same fs */ + ovl_start_write(dentry); cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0); + ovl_end_write(dentry); if (cloned == len) goto out_fput; /* Couldn't clone, so now we try to copy the data */ @@ -287,8 +289,12 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry, * it may not recognize all kind of holes and sometimes * only skips partial of hole area. However, it will be * enough for most of the use cases. + * + * We do not hold upper sb_writers throughout the loop to avert + * lockdep warning with llseek of lower file in nested overlay: + * - upper sb_writers + * -- lower ovl_inode_lock (ovl_llseek) */ - if (skip_hole && data_pos < old_pos) { data_pos = vfs_llseek(old_file, old_pos, SEEK_DATA); if (data_pos > old_pos) { @@ -303,9 +309,11 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry, } } + ovl_start_write(dentry); bytes = do_splice_direct(old_file, &old_pos, new_file, &new_pos, this_len, SPLICE_F_MOVE); + ovl_end_write(dentry); if (bytes <= 0) { error = bytes; break; @@ -555,14 +563,16 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c) struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb); struct inode *udir = d_inode(upperdir); + ovl_start_write(c->dentry); + /* Mark parent "impure" because it may now contain non-pure upper */ err = ovl_set_impure(c->parent, upperdir); if (err) - return err; + goto out; err = ovl_set_nlink_lower(c->dentry); if (err) - return err; + goto out; 
inode_lock_nested(udir, I_MUTEX_PARENT); upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir, @@ -581,10 +591,12 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c) } inode_unlock(udir); if (err) - return err; + goto out; err = ovl_set_nlink_upper(c->dentry); +out: + ovl_end_write(c->dentry); return err; } @@ -719,21 +731,19 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) .link = c->link }; - /* workdir and destdir could be the same when copying up to indexdir */ - err = -EIO; - if (lock_rename(c->workdir, c->destdir) != NULL) - goto unlock; - err = ovl_prep_cu_creds(c->dentry, &cc); if (err) - goto unlock; + return err; + ovl_start_write(c->dentry); + inode_lock(wdir); temp = ovl_create_temp(ofs, c->workdir, &cattr); + inode_unlock(wdir); + ovl_end_write(c->dentry); ovl_revert_cu_creds(&cc); - err = PTR_ERR(temp); if (IS_ERR(temp)) - goto unlock; + return PTR_ERR(temp); /* * Copy up data first and then xattrs. Writing data after @@ -741,8 +751,21 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) */ path.dentry = temp; err = ovl_copy_up_data(c, &path); - if (err) + /* + * We cannot hold lock_rename() throughout this helper, because or + * lock ordering with sb_writers, which shouldn't be held when calling + * ovl_copy_up_data(), so lock workdir and destdir and make sure that + * temp wasn't moved before copy up completion or cleanup. + * If temp was moved, abort without the cleanup. 
+ */ + ovl_start_write(c->dentry); + if (lock_rename(c->workdir, c->destdir) != NULL || + temp->d_parent != c->workdir) { + err = -EIO; + goto unlock; + } else if (err) { goto cleanup; + } err = ovl_copy_up_metadata(c, temp); if (err) @@ -779,6 +802,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) ovl_set_flag(OVL_WHITEOUTS, inode); unlock: unlock_rename(c->workdir, c->destdir); + ovl_end_write(c->dentry); return err; @@ -802,9 +826,10 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) if (err) return err; + ovl_start_write(c->dentry); tmpfile = ovl_do_tmpfile(ofs, c->workdir, c->stat.mode); + ovl_end_write(c->dentry); ovl_revert_cu_creds(&cc); - if (IS_ERR(tmpfile)) return PTR_ERR(tmpfile); @@ -815,9 +840,11 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) goto out_fput; } + ovl_start_write(c->dentry); + err = ovl_copy_up_metadata(c, temp); if (err) - goto out_fput; + goto out; inode_lock_nested(udir, I_MUTEX_PARENT); @@ -831,7 +858,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) inode_unlock(udir); if (err) - goto out_fput; + goto out; if (c->metacopy_digest) ovl_set_flag(OVL_HAS_DIGEST, d_inode(c->dentry)); @@ -843,6 +870,8 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) ovl_set_upperdata(d_inode(c->dentry)); ovl_inode_update(d_inode(c->dentry), dget(temp)); +out: + ovl_end_write(c->dentry); out_fput: fput(tmpfile); return err; @@ -893,7 +922,9 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) * Mark parent "impure" because it may now contain non-pure * upper */ + ovl_start_write(c->dentry); err = ovl_set_impure(c->parent, c->destdir); + ovl_end_write(c->dentry); if (err) return err; } @@ -909,6 +940,7 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) if (c->indexed) ovl_set_flag(OVL_INDEX, d_inode(c->dentry)); + ovl_start_write(c->dentry); if (to_index) { /* Initialize nlink for copy up of disconnected dentry */ err = ovl_set_nlink_upper(c->dentry); @@ -923,6 +955,7 @@ static int 
ovl_do_copy_up(struct ovl_copy_up_ctx *c) ovl_dentry_set_upper_alias(c->dentry); ovl_dentry_update_reval(c->dentry, ovl_dentry_upper(c->dentry)); } + ovl_end_write(c->dentry); out: if (to_index) @@ -1011,15 +1044,16 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) * Writing to upper file will clear security.capability xattr. We * don't want that to happen for normal copy-up operation. */ + ovl_start_write(c->dentry); if (capability) { err = ovl_do_setxattr(ofs, upperpath.dentry, XATTR_NAME_CAPS, capability, cap_size, 0); - if (err) - goto out_free; } - - - err = ovl_removexattr(ofs, upperpath.dentry, OVL_XATTR_METACOPY); + if (!err) { + err = ovl_removexattr(ofs, upperpath.dentry, + OVL_XATTR_METACOPY); + } + ovl_end_write(c->dentry); if (err) goto out_free; diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index e9e867119f34..a08901e24096 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -673,6 +673,10 @@ bool ovl_already_copied_up(struct dentry *dentry, int flags) return false; } +/* + * The copy up "transaction" keeps an elevated mnt write count on upper mnt, + * but leaves taking freeze protection on upper sb to lower level helpers. + */ int ovl_copy_up_start(struct dentry *dentry, int flags) { struct inode *inode = d_inode(dentry); @@ -685,7 +689,7 @@ int ovl_copy_up_start(struct dentry *dentry, int flags) if (ovl_already_copied_up_locked(dentry, flags)) err = 1; /* Already copied up */ else - err = ovl_want_write(dentry); + err = ovl_get_write_access(dentry); if (err) goto out_unlock; @@ -698,7 +702,7 @@ int ovl_copy_up_start(struct dentry *dentry, int flags) void ovl_copy_up_end(struct dentry *dentry) { - ovl_drop_write(dentry); + ovl_put_write_access(dentry); ovl_inode_unlock(d_inode(dentry)); } -- 2.39.2
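Editor's note on the patch above: the split between the long-lived mnt write
count and the short-lived sb_writers sections can be sketched in userspace.
This is only a model under stated assumptions. Every model_* name and the
struct are hypothetical and plain counters stand in for the real locks; it
does not call or reproduce any kernel API.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct upper_model {
	int mnt_writers;		/* models the mnt write count */
	int sb_writers;			/* models upper freeze protection */
	bool read_under_sb_writers;	/* set if a lower read ran with sb_writers held */
};

/* Models ovl_get_write_access()/ovl_put_write_access(). */
static void model_get_write_access(struct upper_model *u) { u->mnt_writers++; }
static void model_put_write_access(struct upper_model *u) { u->mnt_writers--; }

/* Models ovl_start_write()/ovl_end_write(). */
static void model_start_write(struct upper_model *u) { u->sb_writers++; }
static void model_end_write(struct upper_model *u) { u->sb_writers--; }

/* Models open/llseek of the lower file, which may take a lower ovl_inode_lock. */
static void model_lower_llseek(struct upper_model *u)
{
	if (u->sb_writers > 0)
		u->read_under_sb_writers = true;
}

/* Copies `chunks` pieces and returns how many were written. */
static int model_copy_up(struct upper_model *u, int chunks)
{
	int written = 0;

	model_get_write_access(u);	/* held for the whole "transaction" */
	for (int i = 0; i < chunks; i++) {
		model_lower_llseek(u);	/* sb_writers is not held here */
		model_start_write(u);	/* taken only around the actual write */
		written++;
		model_end_write(u);
	}
	model_put_write_access(u);
	return written;
}
```

In this model the lower llseek never runs with sb_writers held, which is the
ordering property the commit message is after.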

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc1 commit 5b02bfc1e7e3811c5bf7f0fa626a0694d0dbbd77 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When lower fs is a nested overlayfs, calling encode_fh() on a lower directory dentry may trigger copy up and take sb_writers on the upper fs of the lower nested overlayfs. The lower nested overlayfs may have the same upper fs as this overlayfs, so nested sb_writers lock is illegal. Move all the callers that encode lower fh to before ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Conflicts: fs/overlayfs/copy_up.c fs/overlayfs/overlayfs.h [Context differences.] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/overlayfs.h | 26 ++++++++++++++------ fs/overlayfs/copy_up.c | 52 ++++++++++++++++++++++++---------------- fs/overlayfs/namei.c | 37 +++++++++++++++++++++------- fs/overlayfs/super.c | 20 +++++++++++----- fs/overlayfs/util.c | 11 ++++++++- 5 files changed, 103 insertions(+), 43 deletions(-) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 95c4bc67dbe3..6ac8b69549da 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -636,11 +636,15 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh, int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected, struct dentry *upperdentry, struct ovl_path **stackp); int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry, - enum ovl_xattr ox, struct dentry *real, bool is_upper, - bool set); + enum ovl_xattr ox, const struct ovl_fh *fh, + bool is_upper, bool set); +int ovl_verify_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry, + enum ovl_xattr ox, struct dentry *real, + bool is_upper, bool set); struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry 
*index, bool connected); int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index); +int ovl_get_index_name_fh(const struct ovl_fh *fh, struct qstr *name); int ovl_get_index_name(struct ovl_fs *ofs, struct dentry *origin, struct qstr *name); struct dentry *ovl_get_index_fh(struct ovl_fs *ofs, struct ovl_fh *fh); @@ -652,17 +656,24 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); bool ovl_lower_positive(struct dentry *dentry); +static inline int ovl_verify_origin_fh(struct ovl_fs *ofs, struct dentry *upper, + const struct ovl_fh *fh, bool set) +{ + return ovl_verify_set_fh(ofs, upper, OVL_XATTR_ORIGIN, fh, false, set); +} + static inline int ovl_verify_origin(struct ovl_fs *ofs, struct dentry *upper, struct dentry *origin, bool set) { - return ovl_verify_set_fh(ofs, upper, OVL_XATTR_ORIGIN, origin, - false, set); + return ovl_verify_origin_xattr(ofs, upper, OVL_XATTR_ORIGIN, origin, + false, set); } static inline int ovl_verify_upper(struct ovl_fs *ofs, struct dentry *index, struct dentry *upper, bool set) { - return ovl_verify_set_fh(ofs, index, OVL_XATTR_UPPER, upper, true, set); + return ovl_verify_origin_xattr(ofs, index, OVL_XATTR_UPPER, upper, + true, set); } /* readdir.c */ @@ -827,8 +838,9 @@ int ovl_copy_xattr(struct super_block *sb, const struct path *path, struct dentr int ovl_set_attr(struct ovl_fs *ofs, struct dentry *upper, struct kstat *stat); struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct inode *realinode, bool is_upper); -int ovl_set_origin(struct ovl_fs *ofs, struct dentry *lower, - struct dentry *upper); +struct ovl_fh *ovl_get_origin_fh(struct ovl_fs *ofs, struct dentry *origin); +int ovl_set_origin_fh(struct ovl_fs *ofs, const struct ovl_fh *fh, + struct dentry *upper); /* export.c */ extern const struct export_operations ovl_export_operations; diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 8ad22ea5eec0..75059b830246 100644 --- a/fs/overlayfs/copy_up.c +++ 
b/fs/overlayfs/copy_up.c @@ -434,29 +434,28 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct inode *realinode, return ERR_PTR(err); } -int ovl_set_origin(struct ovl_fs *ofs, struct dentry *lower, - struct dentry *upper) +struct ovl_fh *ovl_get_origin_fh(struct ovl_fs *ofs, struct dentry *origin) { - const struct ovl_fh *fh = NULL; - int err; - /* * When lower layer doesn't support export operations store a 'null' fh, * so we can use the overlay.origin xattr to distignuish between a copy * up and a pure upper inode. */ - if (ovl_can_decode_fh(lower->d_sb)) { - fh = ovl_encode_real_fh(ofs, d_inode(lower), false); - if (IS_ERR(fh)) - return PTR_ERR(fh); - } + if (!ovl_can_decode_fh(origin->d_sb)) + return NULL; + return ovl_encode_real_fh(ofs, d_inode(origin), false); +} + +int ovl_set_origin_fh(struct ovl_fs *ofs, const struct ovl_fh *fh, + struct dentry *upper) +{ + int err; /* * Do not fail when upper doesn't support xattrs. */ err = ovl_check_setxattr(ofs, upper, OVL_XATTR_ORIGIN, fh->buf, fh ? fh->fb.len : 0, 0); - kfree(fh); /* Ignore -EPERM from setting "user.*" on symlink/special */ return err == -EPERM ? 0 : err; @@ -484,7 +483,7 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper, * * Caller must hold i_mutex on indexdir. 
*/ -static int ovl_create_index(struct dentry *dentry, struct dentry *origin, +static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh, struct dentry *upper) { struct ovl_fs *ofs = OVL_FS(dentry->d_sb); @@ -510,7 +509,7 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin, if (WARN_ON(ovl_test_flag(OVL_INDEX, d_inode(dentry)))) return -EIO; - err = ovl_get_index_name(ofs, origin, &name); + err = ovl_get_index_name_fh(fh, &name); if (err) return err; @@ -549,6 +548,7 @@ struct ovl_copy_up_ctx { struct dentry *destdir; struct qstr destname; struct dentry *workdir; + const struct ovl_fh *origin_fh; bool origin; bool indexed; bool metacopy; @@ -649,7 +649,7 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp) * hard link. */ if (c->origin) { - err = ovl_set_origin(ofs, c->lowerpath.dentry, temp); + err = ovl_set_origin_fh(ofs, c->origin_fh, temp); if (err) return err; } @@ -772,7 +772,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) goto cleanup; if (S_ISDIR(c->stat.mode) && c->indexed) { - err = ovl_create_index(c->dentry, c->lowerpath.dentry, temp); + err = ovl_create_index(c->dentry, c->origin_fh, temp); if (err) goto cleanup; } @@ -890,6 +890,8 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) { int err; struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb); + struct dentry *origin = c->lowerpath.dentry; + struct ovl_fh *fh = NULL; bool to_index = false; /* @@ -906,17 +908,25 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) to_index = true; } - if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || to_index) + if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || to_index) { + fh = ovl_get_origin_fh(ofs, origin); + if (IS_ERR(fh)) + return PTR_ERR(fh); + + /* origin_fh may be NULL */ + c->origin_fh = fh; c->origin = true; + } if (to_index) { c->destdir = ovl_indexdir(c->dentry->d_sb); - err = ovl_get_index_name(ofs, c->lowerpath.dentry, &c->destname); + err = ovl_get_index_name(ofs, 
origin, &c->destname); if (err) - return err; + goto out_free_fh; } else if (WARN_ON(!c->parent)) { /* Disconnected dentry must be copied up to index dir */ - return -EIO; + err = -EIO; + goto out_free_fh; } else { /* * Mark parent "impure" because it may now contain non-pure @@ -926,7 +936,7 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) err = ovl_set_impure(c->parent, c->destdir); ovl_end_write(c->dentry); if (err) - return err; + goto out_free_fh; } /* Should we copyup with O_TMPFILE or with workdir? */ @@ -960,6 +970,8 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) out: if (to_index) kfree(c->destname.name); +out_free_fh: + kfree(fh); return err; } diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index 273a39d3e951..2d2ef671b36b 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -507,6 +507,19 @@ static int ovl_verify_fh(struct ovl_fs *ofs, struct dentry *dentry, return err; } +int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry, + enum ovl_xattr ox, const struct ovl_fh *fh, + bool is_upper, bool set) +{ + int err; + + err = ovl_verify_fh(ofs, dentry, ox, fh); + if (set && err == -ENODATA) + err = ovl_setxattr(ofs, dentry, ox, fh->buf, fh->fb.len); + + return err; +} + /* * Verify that @real dentry matches the file handle stored in xattr @name. * @@ -515,9 +528,9 @@ static int ovl_verify_fh(struct ovl_fs *ofs, struct dentry *dentry, * * Return 0 on match, -ESTALE on mismatch, -ENODATA on no xattr, < 0 on error. 
*/ -int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry, - enum ovl_xattr ox, struct dentry *real, bool is_upper, - bool set) +int ovl_verify_origin_xattr(struct ovl_fs *ofs, struct dentry *dentry, + enum ovl_xattr ox, struct dentry *real, + bool is_upper, bool set) { struct inode *inode; struct ovl_fh *fh; @@ -530,9 +543,7 @@ int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry, goto fail; } - err = ovl_verify_fh(ofs, dentry, ox, fh); - if (set && err == -ENODATA) - err = ovl_setxattr(ofs, dentry, ox, fh->buf, fh->fb.len); + err = ovl_verify_set_fh(ofs, dentry, ox, fh, is_upper, set); if (err) goto fail; @@ -548,6 +559,7 @@ int ovl_verify_set_fh(struct ovl_fs *ofs, struct dentry *dentry, goto out; } + /* Get upper dentry from index */ struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index, bool connected) @@ -684,7 +696,7 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index) goto out; } -static int ovl_get_index_name_fh(struct ovl_fh *fh, struct qstr *name) +int ovl_get_index_name_fh(const struct ovl_fh *fh, struct qstr *name) { char *n, *s; @@ -873,20 +885,27 @@ int ovl_path_next(int idx, struct dentry *dentry, struct path *path) static int ovl_fix_origin(struct ovl_fs *ofs, struct dentry *dentry, struct dentry *lower, struct dentry *upper) { + const struct ovl_fh *fh; int err; if (ovl_check_origin_xattr(ofs, upper)) return 0; + fh = ovl_get_origin_fh(ofs, lower); + if (IS_ERR(fh)) + return PTR_ERR(fh); + err = ovl_want_write(dentry); if (err) - return err; + goto out; - err = ovl_set_origin(ofs, lower, upper); + err = ovl_set_origin_fh(ofs, fh, upper); if (!err) err = ovl_set_impure(dentry->d_parent, upper->d_parent); ovl_drop_write(dentry); +out: + kfree(fh); return err; } diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index f37d2cb86404..2c776ccf3025 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -887,15 +887,20 @@ static int ovl_get_indexdir(struct super_block *sb, struct ovl_fs 
*ofs, { struct vfsmount *mnt = ovl_upper_mnt(ofs); struct dentry *indexdir; + struct dentry *origin = ovl_lowerstack(oe)->dentry; + const struct ovl_fh *fh; int err; + fh = ovl_get_origin_fh(ofs, origin); + if (IS_ERR(fh)) + return PTR_ERR(fh); + err = mnt_want_write(mnt); if (err) - return err; + goto out_free_fh; /* Verify lower root is upper root origin */ - err = ovl_verify_origin(ofs, upperpath->dentry, - ovl_lowerstack(oe)->dentry, true); + err = ovl_verify_origin_fh(ofs, upperpath->dentry, fh, true); if (err) { pr_err("failed to verify upper root origin\n"); goto out; @@ -927,9 +932,10 @@ static int ovl_get_indexdir(struct super_block *sb, struct ovl_fs *ofs, * directory entries. */ if (ovl_check_origin_xattr(ofs, ofs->indexdir)) { - err = ovl_verify_set_fh(ofs, ofs->indexdir, - OVL_XATTR_ORIGIN, - upperpath->dentry, true, false); + err = ovl_verify_origin_xattr(ofs, ofs->indexdir, + OVL_XATTR_ORIGIN, + upperpath->dentry, true, + false); if (err) pr_err("failed to verify index dir 'origin' xattr\n"); } @@ -947,6 +953,8 @@ static int ovl_get_indexdir(struct super_block *sb, struct ovl_fs *ofs, out: mnt_drop_write(mnt); +out_free_fh: + kfree(fh); return err; } diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index a08901e24096..cecdd7f2ac38 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -1016,12 +1016,18 @@ static void ovl_cleanup_index(struct dentry *dentry) struct dentry *index = NULL; struct inode *inode; struct qstr name = { }; + bool got_write = false; int err; err = ovl_get_index_name(ofs, lowerdentry, &name); if (err) goto fail; + err = ovl_want_write(dentry); + if (err) + goto fail; + + got_write = true; inode = d_inode(upperdentry); if (!S_ISDIR(inode->i_mode) && inode->i_nlink != 1) { pr_warn_ratelimited("cleanup linked index (%pd2, ino=%lu, nlink=%u)\n", @@ -1059,6 +1065,8 @@ static void ovl_cleanup_index(struct dentry *dentry) goto fail; out: + if (got_write) + ovl_drop_write(dentry); kfree(name.name); dput(index); return; @@ 
-1138,6 +1146,8 @@ void ovl_nlink_end(struct dentry *dentry) { struct inode *inode = d_inode(dentry); + ovl_drop_write(dentry); + if (ovl_test_flag(OVL_INDEX, inode) && inode->i_nlink == 0) { const struct cred *old_cred; @@ -1146,7 +1156,6 @@ void ovl_nlink_end(struct dentry *dentry) revert_creds(old_cred); } - ovl_drop_write(dentry); ovl_inode_unlock(inode); } -- 2.39.2
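Editor's note on the patch above: the reworked ovl_fix_origin() follows an
"encode first, then take write access, then free on every exit path" shape.
A minimal userspace sketch of that shape follows. All model_* names are
hypothetical, a string copy stands in for the xattr write, and, unlike the
kernel code where a NULL fh is a valid "no export support" value, NULL here
only means allocation failure.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct model_fh { char buf[16]; };

static int model_sb_writers;		/* depth of the modelled sb_writers */
static int model_encoded_under_write;	/* nonzero if encode ran with it held */

/* Models ovl_get_origin_fh(), which may recurse into a nested overlay. */
static struct model_fh *model_get_origin_fh(const char *origin)
{
	struct model_fh *fh;

	if (model_sb_writers > 0)
		model_encoded_under_write = 1;
	fh = calloc(1, sizeof(*fh));
	if (fh)
		strncpy(fh->buf, origin, sizeof(fh->buf) - 1);
	return fh;
}

/* Models ovl_want_write()/ovl_drop_write(). */
static int model_want_write(void) { model_sb_writers++; return 0; }
static void model_drop_write(void) { model_sb_writers--; }

/* Same shape as the reworked ovl_fix_origin(): encode, want_write, set. */
static int model_fix_origin(const char *origin, char *xattr, size_t size)
{
	struct model_fh *fh;
	int err;

	if (size == 0)
		return -EINVAL;

	fh = model_get_origin_fh(origin);	/* before any write access */
	if (!fh)
		return -ENOMEM;

	err = model_want_write();
	if (err)
		goto out;

	strncpy(xattr, fh->buf, size - 1);	/* stands in for the xattr write */
	xattr[size - 1] = '\0';
	model_drop_write();
out:
	free(fh);				/* single free for all exit paths */
	return err;
}
```

Moving the free to a shared label is what lets the encode step happen early
without leaking the handle when a later step fails.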

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit ca7ab482401cf0a7497dad05f4918dc64115538b
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

The main callers of do_splice_direct() also call rw_verify_area() for
the entire range that is being copied, e.g. by vfs_copy_file_range()
or do_sendfile() before calling do_splice_direct().

The only caller that does not have those checks for entire range is
ovl_copy_up_file().  In preparation for removing the checks inside
do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file().

For extra safety, perform minimal sanity checks from rw_verify_area()
for non negative offsets also in the copy up do_splice_direct() loop
without calling the file permission hooks.

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-2-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/overlayfs/copy_up.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 75059b830246..b8119520300d 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -230,6 +230,19 @@ static int ovl_copy_fileattr(struct inode *inode, const struct path *old,
 	return ovl_real_fileattr_set(new, &newfa);
 }
 
+static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
+{
+	loff_t tmp;
+
+	if (WARN_ON_ONCE(pos != pos2))
+		return -EIO;
+	if (WARN_ON_ONCE(pos < 0 || len < 0 || totlen < 0))
+		return -EIO;
+	if (WARN_ON_ONCE(check_add_overflow(pos, len, &tmp)))
+		return -EIO;
+	return 0;
+}
+
 static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
 			    struct file *new_file, loff_t len)
 {
@@ -244,13 +257,20 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
 	int error = 0;
 
 	ovl_path_lowerdata(dentry, &datapath);
-	if (WARN_ON(datapath.dentry == NULL))
+	if (WARN_ON_ONCE(datapath.dentry == NULL) ||
+	    WARN_ON_ONCE(len < 0))
 		return -EIO;
 
 	old_file = ovl_path_open(&datapath, O_LARGEFILE | O_RDONLY);
 	if (IS_ERR(old_file))
 		return PTR_ERR(old_file);
 
+	error = rw_verify_area(READ, old_file, &old_pos, len);
+	if (!error)
+		error = rw_verify_area(WRITE, new_file, &new_pos, len);
+	if (error)
+		goto out_fput;
+
 	/* Try to use clone_file_range to clone up within the same fs */
 	ovl_start_write(dentry);
 	cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
@@ -309,6 +329,10 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
 			}
 		}
 
+		error = ovl_verify_area(old_pos, new_pos, this_len, len);
+		if (error)
+			break;
+
 		ovl_start_write(dentry);
 		bytes = do_splice_direct(old_file, &old_pos,
 					 new_file, &new_pos,
-- 
2.39.2
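Editor's note on the patch above: the per-chunk sanity check is easy to try
outside the kernel. The sketch below is a userspace approximation, not the
kernel helper itself. It assumes int64_t in place of loff_t and the
GCC/Clang builtin __builtin_add_overflow() in place of check_add_overflow(),
and it drops the WARN_ON_ONCE() reporting. The name model_verify_area is
hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Userspace approximation of the ovl_verify_area() added by this patch.
 * Rejects mismatched positions, negative values, and pos + len overflow.
 */
static int model_verify_area(int64_t pos, int64_t pos2, int64_t len,
			     int64_t totlen)
{
	int64_t tmp;

	if (pos != pos2)			/* source and destination must advance together */
		return -EIO;
	if (pos < 0 || len < 0 || totlen < 0)
		return -EIO;
	if (__builtin_add_overflow(pos, len, &tmp))
		return -EIO;
	return 0;
}
```

As in the patch, totlen is only checked for sign; the chunk is not compared
against the remaining length.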

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 2a33e2ddc6ebf9b5468091aded8a38f57de9a580
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

All callers of do_splice_direct() have a call to rw_verify_area() for
the entire range that is being copied, e.g. by vfs_copy_file_range() or
do_sendfile() before calling do_splice_direct().

The rw_verify_area() check inside do_splice_direct() is redundant and
is called after sb_start_write(), so it is not "start-write-safe".
Remove this redundant check.

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-3-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/splice.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index d983d375ff11..6e917db6f49a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1166,6 +1166,7 @@ static void direct_file_splice_eof(struct splice_desc *sd)
  *    (splice in + splice out, as compared to just sendfile()). So this helper
  *    can splice directly through a process-private pipe.
  *
+ * Callers already called rw_verify_area() on the entire range.
  */
 long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 		      loff_t *opos, size_t len, unsigned int flags)
@@ -1187,10 +1188,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 	if (unlikely(out->f_flags & O_APPEND))
 		return -EINVAL;
 
-	ret = rw_verify_area(WRITE, out, opos, len);
-	if (unlikely(ret < 0))
-		return ret;
-
 	ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
 	if (ret > 0)
 		*ppos = sd.pos;
-- 
2.39.2
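Editor's note on the patch above: the contract it establishes is "the caller
verifies the whole range once, the splice helper does not re-check". A small
userspace sketch of that contract follows. The model_* functions and the
chunked caller are hypothetical illustrations, not the kernel call chain,
and a counter stands in for the permission hook.

```c
#include <stddef.h>

static int model_verify_calls;	/* how many times the range check ran */

/* Models rw_verify_area(): one range/permission check for the whole range. */
static int model_rw_verify_area(long pos, size_t len)
{
	(void)len;
	model_verify_calls++;
	return pos < 0 ? -1 : 0;
}

/* Models do_splice_direct() after this patch: no check of its own. */
static long model_splice_direct(long *pos, size_t len)
{
	*pos += (long)len;
	return (long)len;
}

/* Models a hypothetical caller: verify once, then splice in chunks. */
static long model_copy_range(long pos, size_t len, size_t chunk)
{
	long done = 0;

	if (chunk == 0)
		return -1;
	if (model_rw_verify_area(pos, len) < 0)
		return -1;

	while (len > 0) {
		size_t this_len = len < chunk ? len : chunk;

		done += model_splice_direct(&pos, this_len);
		len -= this_len;
	}
	return done;
}
```

However many chunks are spliced, the check runs once and runs before any
write section would be entered.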

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit feebea75bdf499aefd11d0df7b02d384a9f92fc1
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

vfs_splice_read() has a permission hook inside rw_verify_area() and
it is called from do_splice_direct() -> splice_direct_to_actor().

The callers of do_splice_direct() (e.g. vfs_copy_file_range()) already
call rw_verify_area() for the entire range, but the other caller of
splice_direct_to_actor() (nfsd) does not.

Add the rw_verify_area() checks in nfsd_splice_read() and use a
variant of vfs_splice_read() without rw_verify_area() check in
splice_direct_to_actor() to avoid the redundant rw_verify_area() checks.

This is needed for fanotify "pre content" events.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-4-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/nfsd/vfs.c |  5 ++++-
 fs/splice.c   | 58 +++++++++++++++++++++++++++++++--------------------
 2 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b3e51d88faff..396dc9116147 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1025,7 +1025,10 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	ssize_t host_err;

 	trace_nfsd_read_splice(rqstp, fhp, offset, *count);
-	host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
+	host_err = rw_verify_area(READ, file, &offset, *count);
+	if (!host_err)
+		host_err = splice_direct_to_actor(file, &sd,
+						  nfsd_direct_splice_actor);
 	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
 }

diff --git a/fs/splice.c b/fs/splice.c
index 6e917db6f49a..6fc2c27e9520 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -944,27 +944,15 @@ static void do_splice_eof(struct splice_desc *sd)
 		sd->splice_eof(sd);
 }

-/**
- * vfs_splice_read - Read data from a file and splice it into a pipe
- * @in:		File to splice from
- * @ppos:	Input file offset
- * @pipe:	Pipe to splice to
- * @len:	Number of bytes to splice
- * @flags:	Splice modifier flags (SPLICE_F_*)
- *
- * Splice the requested amount of data from the input file to the pipe. This
- * is synchronous as the caller must hold the pipe lock across the entire
- * operation.
- *
- * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
- * a hole and a negative error code otherwise.
+/*
+ * Callers already called rw_verify_area() on the entire range.
+ * No need to call it for sub ranges.
  */
-long vfs_splice_read(struct file *in, loff_t *ppos,
-		     struct pipe_inode_info *pipe, size_t len,
-		     unsigned int flags)
+static long do_splice_read(struct file *in, loff_t *ppos,
+			   struct pipe_inode_info *pipe, size_t len,
+			   unsigned int flags)
 {
 	unsigned int p_space;
-	int ret;

 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
@@ -975,10 +963,6 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
 	p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
 	len = min_t(size_t, len, p_space << PAGE_SHIFT);

-	ret = rw_verify_area(READ, in, ppos, len);
-	if (unlikely(ret < 0))
-		return ret;
-
 	if (unlikely(len > MAX_RW_COUNT))
 		len = MAX_RW_COUNT;

@@ -992,6 +976,34 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
 		return copy_splice_read(in, ppos, pipe, len, flags);
 	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
+
+/**
+ * vfs_splice_read - Read data from a file and splice it into a pipe
+ * @in:		File to splice from
+ * @ppos:	Input file offset
+ * @pipe:	Pipe to splice to
+ * @len:	Number of bytes to splice
+ * @flags:	Splice modifier flags (SPLICE_F_*)
+ *
+ * Splice the requested amount of data from the input file to the pipe. This
+ * is synchronous as the caller must hold the pipe lock across the entire
+ * operation.
+ *
+ * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
+ * a hole and a negative error code otherwise.
+ */
+long vfs_splice_read(struct file *in, loff_t *ppos,
+		     struct pipe_inode_info *pipe, size_t len,
+		     unsigned int flags)
+{
+	int ret;
+
+	ret = rw_verify_area(READ, in, ppos, len);
+	if (unlikely(ret < 0))
+		return ret;
+
+	return do_splice_read(in, ppos, pipe, len, flags);
+}
 EXPORT_SYMBOL_GPL(vfs_splice_read);

 /**
@@ -1066,7 +1078,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		size_t read_len;
 		loff_t pos = sd->pos, prev_pos = pos;

-		ret = vfs_splice_read(in, &pos, pipe, len, flags);
+		ret = do_splice_read(in, &pos, pipe, len, flags);
 		if (unlikely(ret <= 0))
 			goto read_failure;

-- 
2.39.2
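
Editor's illustration (not part of the patch): the split between a checked
entry point and an unchecked worker can be modelled in userspace as below.
This is only a sketch under stated assumptions. Every model_* name, the
4096-byte chunk size and the hook_calls counter are hypothetical and exist
only to show that the hook runs once per request rather than once per
sub-range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static int hook_calls;	/* how many times the permission hook ran */

/* Stand-in for rw_verify_area(): counts calls, rejects negative offsets. */
static int model_verify_area(long pos, size_t len)
{
	(void)len;
	hook_calls++;
	return pos < 0 ? -1 : 0;
}

/* Stand-in for do_splice_read(): no hook, moves at most one chunk. */
static long model_do_read(long *pos, size_t len)
{
	size_t chunk = len < 4096 ? len : 4096;

	*pos += (long)chunk;
	return (long)chunk;
}

/* Stand-in for vfs_splice_read(): checked variant for external callers. */
static long model_vfs_read(long *pos, size_t len)
{
	int ret = model_verify_area(*pos, len);

	if (ret < 0)
		return ret;
	return model_do_read(pos, len);
}

/* Stand-in for a direct-splice caller: one check, many unchecked reads. */
static long model_splice_direct(long *pos, size_t len)
{
	long done = 0;
	int ret = model_verify_area(*pos, len);

	if (ret < 0)
		return ret;
	while ((size_t)done < len) {
		long n = model_do_read(pos, len - (size_t)done);

		assert(n > 0);	/* this model never reports EOF */
		done += n;
	}
	return done;
}
```

In this model a 10000-byte request is served in three chunks while the hook
counter advances only once, which is the redundancy the patch removes.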

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit b70d8e2b8ce56c79d9d18d20955e6de1631e9509
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

vfs_splice_read() has a permission hook inside rw_verify_area() and
it is called from splice_file_to_pipe(), which is called from
do_splice() and do_sendfile().

do_sendfile() already has a rw_verify_area() check for the entire range.
do_splice() has a rw_verify_area() check for the splice to file case, but
not for the splice from file case.

Add the rw_verify_area() check for the splice from file case in
do_splice() and use a variant of vfs_splice_read() without the
rw_verify_area() check in splice_file_to_pipe() to avoid the redundant
rw_verify_area() checks.

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-5-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/splice.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 6fc2c27e9520..d4fdd44c0b32 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1239,7 +1239,7 @@ long splice_file_to_pipe(struct file *in,
 	pipe_lock(opipe);
 	ret = wait_for_space(opipe, flags);
 	if (!ret)
-		ret = vfs_splice_read(in, offset, opipe, len, flags);
+		ret = do_splice_read(in, offset, opipe, len, flags);
 	pipe_unlock(opipe);
 	if (ret > 0)
 		wakeup_pipe_readers(opipe);
@@ -1316,6 +1316,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
 			offset = in->f_pos;
 		}

+		ret = rw_verify_area(READ, in, &offset, len);
+		if (unlikely(ret < 0))
+			return ret;
+
 		if (out->f_flags & O_NONBLOCK)
 			flags |= SPLICE_F_NONBLOCK;

-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit d53471ba6f7ae97a4e223539029528108b705af1
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

All the callers of ->splice_write() (e.g. do_splice_direct() and
do_splice()) already check rw_verify_area() for the entire range
and perform all the other checks that are in vfs_iter_write().

Instead of creating another tiny helper for a special caller, just
open-code it.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-6-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/splice.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index d4fdd44c0b32..3fce5f6072dd 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -673,10 +673,13 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		.u.file = out,
 	};
 	int nbufs = pipe->max_usage;
-	struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
-					GFP_KERNEL);
+	struct bio_vec *array;
 	ssize_t ret;

+	if (!out->f_op->write_iter)
+		return -EINVAL;
+
+	array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL);
 	if (unlikely(!array))
 		return -ENOMEM;

@@ -684,6 +687,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,

 	splice_from_pipe_begin(&sd);
 	while (sd.total_len) {
+		struct kiocb kiocb;
 		struct iov_iter from;
 		unsigned int head, tail, mask;
 		size_t left;
@@ -733,7 +737,10 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		}

 		iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
-		ret = vfs_iter_write(out, &from, &sd.pos, 0);
+		init_sync_kiocb(&kiocb, out);
+		kiocb.ki_pos = sd.pos;
+		ret = call_write_iter(out, &kiocb, &from);
+		sd.pos = kiocb.ki_pos;
 		if (ret <= 0)
 			break;

-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit dfad37051ade6ac0d404ef4913f3bd01954ee51c
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In many of the vfs helpers, the file permission hook is called before
taking sb_start_write(), making them "start-write-safe".
do_clone_file_range() is an exception to this rule.

do_clone_file_range() has two callers - vfs_clone_file_range() and
overlayfs. Move the remap_verify_area() checks from do_clone_file_range()
out to vfs_clone_file_range() to make them "start-write-safe".

Overlayfs already has calls to rw_verify_area() with the same security
permission hooks as remap_verify_area() has. The rest of the checks in
remap_verify_area() are irrelevant for overlayfs, which calls
do_clone_file_range() with offset 0 and a positive length.

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-7-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/remap_range.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 87ae4f0dc3aa..42f79cb2b1b1 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -385,14 +385,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 	if (!file_in->f_op->remap_file_range)
 		return -EOPNOTSUPP;

-	ret = remap_verify_area(file_in, pos_in, len, false);
-	if (ret)
-		return ret;
-
-	ret = remap_verify_area(file_out, pos_out, len, true);
-	if (ret)
-		return ret;
-
 	ret = file_in->f_op->remap_file_range(file_in, pos_in,
 			file_out, pos_out, len, remap_flags);
 	if (ret < 0)
@@ -410,6 +402,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 {
 	loff_t ret;

+	ret = remap_verify_area(file_in, pos_in, len, false);
+	if (ret)
+		return ret;
+
+	ret = remap_verify_area(file_out, pos_out, len, true);
+	if (ret)
+		return ret;
+
 	file_start_write(file_out);
 	ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
 				  remap_flags);
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 0b5263d12aed0437c1bdb7ba0be27437fc12c274
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In vfs code, file_start_write() is usually called after the permission
hook in rw_verify_area(). vfs_dedupe_file_range_one() is an exception
to this rule.

In vfs_dedupe_file_range_one(), move file_start_write() to after the
rw_verify_area() checks to make them "start-write-safe".

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-8-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/remap_range.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 42f79cb2b1b1..12131f2a6c9e 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -420,7 +420,7 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 EXPORT_SYMBOL(vfs_clone_file_range);

 /* Check whether we are allowed to dedupe the destination file */
-static bool allow_file_dedupe(struct file *file)
+static bool may_dedupe_file(struct file *file)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct inode *inode = file_inode(file);
@@ -445,24 +445,29 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
 	WARN_ON_ONCE(remap_flags & ~(REMAP_FILE_DEDUP |
 				     REMAP_FILE_CAN_SHORTEN));

-	ret = mnt_want_write_file(dst_file);
-	if (ret)
-		return ret;
-
 	/*
 	 * This is redundant if called from vfs_dedupe_file_range(), but other
 	 * callers need it and it's not performance sesitive...
 	 */
 	ret = remap_verify_area(src_file, src_pos, len, false);
 	if (ret)
-		goto out_drop_write;
+		return ret;

 	ret = remap_verify_area(dst_file, dst_pos, len, true);
 	if (ret)
-		goto out_drop_write;
+		return ret;
+
+	/*
+	 * This needs to be called after remap_verify_area() because of
+	 * sb_start_write() and before may_dedupe_file() because the mount's
+	 * MAY_WRITE need to be checked with mnt_get_write_access_file() held.
+	 */
+	ret = mnt_want_write_file(dst_file);
+	if (ret)
+		return ret;

 	ret = -EPERM;
-	if (!allow_file_dedupe(dst_file))
+	if (!may_dedupe_file(dst_file))
 		goto out_drop_write;

 	ret = -EXDEV;
-- 
2.39.2
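
Editor's illustration (not part of the patch): the "start-write-safe"
ordering can be sketched in userspace as follows. The model_* helpers and
the writers counter are hypothetical stand-ins, assumed here only to show
the rule that the permission hook runs before write access is taken, so the
early error paths have nothing to undo.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not kernel functions. */
static int writers;	/* models the write-access count held by the caller */

static void model_start_write(void)
{
	writers++;
}

static void model_end_write(void)
{
	assert(writers > 0);
	writers--;
}

/* Models a permission hook that may block: no write access may be held. */
static int model_permission_hook(bool allow)
{
	assert(writers == 0);
	return allow ? 0 : -1;
}

/* Order after the patch: hook first, then write access, then the work. */
static int model_dedupe_one(bool hook_allows, bool may_dedupe)
{
	int ret = model_permission_hook(hook_allows);

	if (ret)
		return ret;	/* nothing to drop yet, plain return */

	model_start_write();
	ret = may_dedupe ? 0 : -2;	/* stand-in for the later checks */
	model_end_write();
	return ret;
}
```

The assert inside model_permission_hook() would fire if a caller took write
access first, which is the ordering the patch moves away from.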

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 2f4d8ad82511a336f8a805e3759c5189a25bb286
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In vfs code, file_start_write() is usually called after the permission
hook in rw_verify_area(). btrfs_ioctl_encoded_write() is an exception
to this rule.

Move file_start_write() to after the rw_verify_area() check in encoded
write to make the permission hook "start-write-safe".

This is needed for fanotify "pre content" events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-9-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/btrfs/ioctl.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d61bd816808c..3d1cb018f3c4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4559,29 +4559,29 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool
 	if (ret < 0)
 		goto out_acct;

-	file_start_write(file);
-
 	if (iov_iter_count(&iter) == 0) {
 		ret = 0;
-		goto out_end_write;
+		goto out_iov;
 	}
 	pos = args.offset;
 	ret = rw_verify_area(WRITE, file, &pos, args.len);
 	if (ret < 0)
-		goto out_end_write;
+		goto out_iov;

 	init_sync_kiocb(&kiocb, file);
 	ret = kiocb_set_rw_flags(&kiocb, 0);
 	if (ret)
-		goto out_end_write;
+		goto out_iov;
 	kiocb.ki_pos = pos;

+	file_start_write(file);
+
 	ret = btrfs_do_write_iter(&kiocb, &iter, &args);
 	if (ret > 0)
 		fsnotify_modify(file);

-out_end_write:
 	file_end_write(file);
+out_iov:
 	kfree(iov);
 out_acct:
 	if (ret > 0)
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit e389b76a7ee1b62392ab52c22f9ba81f23145824
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

The coda host file is a backing file for the coda inode on a different
filesystem than the coda inode.

Change the locking order to take the coda inode lock before taking
the backing host file freeze protection, same as in ovl_write_iter()
and in network filesystems that use cachefiles.

Link: https://lore.kernel.org/r/CAOQ4uxjcnwuF1gMxe64WLODGA_MyAy8x-DtqkCUxqVQKk3Xbn...
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-10-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/coda/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/coda/file.c b/fs/coda/file.c
index 42346618b4ed..6e671fc03f64 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,14 +79,14 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (ret)
 		goto finish_write;

-	file_start_write(host_file);
 	inode_lock(coda_inode);
+	file_start_write(host_file);
 	ret = vfs_iter_write(cfi->cfi_container, to, &iocb->ki_pos, 0);
 	coda_inode->i_size = file_inode(host_file)->i_size;
 	coda_inode->i_blocks = (coda_inode->i_size + 511) >> 9;
 	coda_inode->i_mtime = inode_set_ctime_current(coda_inode);
-	inode_unlock(coda_inode);
 	file_end_write(host_file);
+	inode_unlock(coda_inode);

 finish_write:
 	venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
-- 
2.39.2
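
Editor's illustration (not part of the patch): the nesting described above,
with the stacked inode lock outside and the backing file freeze protection
inside, can be modelled with two flags. All names below are hypothetical
userspace stand-ins; the asserts encode the assumed ordering rule and are
not how the kernel enforces it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two locks; not kernel code. */
static bool inode_locked;	/* models inode_lock(coda_inode) */
static bool freeze_held;	/* models file_start_write(host_file) */

static void model_inode_lock(void)
{
	/* Outer lock: freeze protection must not be held yet. */
	assert(!inode_locked && !freeze_held);
	inode_locked = true;
}

static void model_inode_unlock(void)
{
	/* Released last, after freeze protection was dropped. */
	assert(inode_locked && !freeze_held);
	inode_locked = false;
}

static void model_start_write(void)
{
	/* Inner lock: only taken with the inode lock held. */
	assert(inode_locked && !freeze_held);
	freeze_held = true;
}

static void model_end_write(void)
{
	assert(freeze_held);
	freeze_held = false;
}

/* Write path in the order the patch establishes. */
static long model_stacked_write(long len)
{
	long ret;

	model_inode_lock();
	model_start_write();
	ret = len;	/* stand-in for the write to the backing file */
	model_end_write();
	model_inode_unlock();
	return ret;
}
```

Swapping the first two calls in model_stacked_write() trips the assert in
model_start_write(), mirroring the order the patch removes.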

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 269aed7014b3db9acdbc5a5e163d8a6c62e0e770
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

All the callers of vfs_iter_write() call file_start_write() just before
calling vfs_iter_write() except for target_core_file's fd_do_rw().

Move file_start_write() from the callers into vfs_iter_write().
fd_do_rw() calls vfs_iter_write() with a non-regular file, so
file_start_write() is a no-op.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 drivers/block/loop.c              |  2 --
 drivers/target/target_core_file.c | 10 +++-------
 fs/coda/file.c                    |  2 --
 fs/nfsd/vfs.c                     |  2 --
 fs/overlayfs/file.c               |  2 --
 fs/read_write.c                   | 13 ++++++++++---
 6 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 886c63599037..bd0e138ea6b6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -240,9 +240,7 @@ static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)

 	iov_iter_bvec(&i, ITER_SOURCE, bvec, 1, bvec->bv_len);

-	file_start_write(file);
 	bw = vfs_iter_write(file, &i, ppos, 0);
-	file_end_write(file);

 	if (likely(bw == bvec->bv_len))
 		return 0;
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 4e4cf6c34a77..4d447520bab8 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -332,13 +332,11 @@ static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 	}

 	iov_iter_bvec(&iter, is_write, bvec, sgl_nents, len);
-	if (is_write) {
-		file_start_write(fd);
+	if (is_write)
 		ret = vfs_iter_write(fd, &iter, &pos, 0);
-		file_end_write(fd);
-	} else {
+	else
 		ret = vfs_iter_read(fd, &iter, &pos, 0);
-	}
+
 	if (is_write) {
 		if (ret < 0 || ret != data_length) {
 			pr_err("%s() write returned %d\n", __func__, ret);
@@ -469,9 +467,7 @@ fd_execute_write_same(struct se_cmd *cmd)
 	}

 	iov_iter_bvec(&iter, ITER_SOURCE, bvec, nolb, len);
-	file_start_write(fd_dev->fd_file);
 	ret = vfs_iter_write(fd_dev->fd_file, &iter, &pos, 0);
-	file_end_write(fd_dev->fd_file);

 	kfree(bvec);
 	if (ret < 0 || ret != len) {
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 6e671fc03f64..5e5ea5bc3701 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -80,12 +80,10 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
 		goto finish_write;

 	inode_lock(coda_inode);
-	file_start_write(host_file);
 	ret = vfs_iter_write(cfi->cfi_container, to, &iocb->ki_pos, 0);
 	coda_inode->i_size = file_inode(host_file)->i_size;
 	coda_inode->i_blocks = (coda_inode->i_size + 511) >> 9;
 	coda_inode->i_mtime = inode_set_ctime_current(coda_inode);
-	file_end_write(host_file);
 	inode_unlock(coda_inode);

 finish_write:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 396dc9116147..f65bd7b686d8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1165,9 +1165,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 	since = READ_ONCE(file->f_wb_err);
 	if (verf)
 		nfsd_copy_write_verifier(verf, nn);
-	file_start_write(file);
 	host_err = vfs_iter_write(file, &iter, &pos, flags);
-	file_end_write(file);
 	if (host_err < 0) {
 		nfsd_reset_write_verifier(nn);
 		trace_nfsd_writeverf_reset(nn, rqstp, host_err);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index acdd79dd4bfa..352836288943 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -433,9 +433,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (is_sync_kiocb(iocb)) {
 		rwf_t rwf = iocb_to_rw_flags(ifl);

-		file_start_write(real.file);
 		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
-		file_end_write(real.file);
 		/* Update size */
 		ovl_file_modified(file);
 	} else {
diff --git a/fs/read_write.c b/fs/read_write.c
index dd5c90675f51..be8e05e8aa7f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -841,7 +841,7 @@ ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
 EXPORT_SYMBOL(vfs_iter_read);

 static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
-		loff_t *pos, rwf_t flags)
+			     loff_t *pos, rwf_t flags)
 {
 	size_t tot_len;
 	ssize_t ret = 0;
@@ -896,11 +896,18 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 EXPORT_SYMBOL(vfs_iocb_iter_write);

 ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
-		rwf_t flags)
+		       rwf_t flags)
 {
+	int ret;
+
 	if (!file->f_op->write_iter)
 		return -EINVAL;
-	return do_iter_write(file, iter, ppos, flags);
+
+	file_start_write(file);
+	ret = do_iter_write(file, iter, ppos, flags);
+	file_end_write(file);
+
+	return ret;
 }
 EXPORT_SYMBOL(vfs_iter_write);

-- 
2.39.2
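
Editor's illustration (not part of the patch): once the helper brackets the
write itself, callers stop pairing start/end around it. The userspace sketch
below assumes a hypothetical struct model_file with a freeze_depth counter.
The "regular" flag models the commit message's point that
file_start_write() is a no-op for non-regular files such as the block
device used by fd_do_rw().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a file; not the kernel's struct file. */
struct model_file {
	bool regular;		/* only regular files take freeze protection */
	bool has_write_iter;	/* models f_op->write_iter being set */
	int freeze_depth;	/* models the freeze-protection count held */
};

static void model_file_start_write(struct model_file *f)
{
	if (f->regular)
		f->freeze_depth++;
}

static void model_file_end_write(struct model_file *f)
{
	if (f->regular) {
		assert(f->freeze_depth > 0);
		f->freeze_depth--;
	}
}

/* The helper owns the bracket, so callers no longer wrap the call. */
static long model_vfs_iter_write(struct model_file *f, long len)
{
	long ret;

	if (!f->has_write_iter)
		return -EINVAL;

	model_file_start_write(f);
	ret = len;	/* stand-in for the actual write */
	model_file_end_write(f);

	return ret;
}
```

A caller now reduces to a single model_vfs_iter_write() call, and the
counter is balanced on return for both kinds of file.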

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 1c8aa833034a00617866ea4738a40491e3e23902
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In many of the vfs helpers, the rw_verify_area() checks are called before
taking sb_start_write(), making them "start-write-safe".
do_iter_write() is an exception to this rule.

do_iter_write() has two callers - vfs_iter_write() and vfs_writev().
Move rw_verify_area() and other checks from do_iter_write() out to
its callers to make them "start-write-safe".

Also move the fsnotify_modify() hook to align with the similar pattern
used in vfs_write() and other vfs helpers.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-12-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/read_write.c | 86 +++++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 38 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index be8e05e8aa7f..dd1457b9209a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -840,33 +840,6 @@ ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
 }
 EXPORT_SYMBOL(vfs_iter_read);

-static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
-			     loff_t *pos, rwf_t flags)
-{
-	size_t tot_len;
-	ssize_t ret = 0;
-
-	if (!(file->f_mode & FMODE_WRITE))
-		return -EBADF;
-	if (!(file->f_mode & FMODE_CAN_WRITE))
-		return -EINVAL;
-
-	tot_len = iov_iter_count(iter);
-	if (!tot_len)
-		return 0;
-	ret = rw_verify_area(WRITE, file, pos, tot_len);
-	if (ret < 0)
-		return ret;
-
-	if (file->f_op->write_iter)
-		ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
-	else
-		ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
-	if (ret > 0)
-		fsnotify_modify(file);
-	return ret;
-}
-
 ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 			    struct iov_iter *iter)
 {
@@ -898,13 +871,28 @@ EXPORT_SYMBOL(vfs_iocb_iter_write);
 ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
 		       rwf_t flags)
 {
-	int ret;
+	size_t tot_len;
+	ssize_t ret;

+	if (!(file->f_mode & FMODE_WRITE))
+		return -EBADF;
+	if (!(file->f_mode & FMODE_CAN_WRITE))
+		return -EINVAL;
 	if (!file->f_op->write_iter)
 		return -EINVAL;

+	tot_len = iov_iter_count(iter);
+	if (!tot_len)
+		return 0;
+
+	ret = rw_verify_area(WRITE, file, ppos, tot_len);
+	if (ret < 0)
+		return ret;
+
 	file_start_write(file);
-	ret = do_iter_write(file, iter, ppos, flags);
+	ret = do_iter_readv_writev(file, iter, ppos, WRITE, flags);
+	if (ret > 0)
+		fsnotify_modify(file);
 	file_end_write(file);

 	return ret;
@@ -929,20 +917,42 @@ static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
 }

 static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
-		   unsigned long vlen, loff_t *pos, rwf_t flags)
+			  unsigned long vlen, loff_t *pos, rwf_t flags)
 {
 	struct iovec iovstack[UIO_FASTIOV];
 	struct iovec *iov = iovstack;
 	struct iov_iter iter;
-	ssize_t ret;
+	size_t tot_len;
+	ssize_t ret = 0;

-	ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
-	if (ret >= 0) {
-		file_start_write(file);
-		ret = do_iter_write(file, &iter, pos, flags);
-		file_end_write(file);
-		kfree(iov);
-	}
+	if (!(file->f_mode & FMODE_WRITE))
+		return -EBADF;
+	if (!(file->f_mode & FMODE_CAN_WRITE))
+		return -EINVAL;
+
+	ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+			   &iter);
+	if (ret < 0)
+		return ret;
+
+	tot_len = iov_iter_count(&iter);
+	if (!tot_len)
+		goto out;
+
+	ret = rw_verify_area(WRITE, file, pos, tot_len);
+	if (ret < 0)
+		goto out;
+
+	file_start_write(file);
+	if (file->f_op->write_iter)
+		ret = do_iter_readv_writev(file, &iter, pos, WRITE, flags);
+	else
+		ret = do_loop_readv_writev(file, &iter, pos, WRITE, flags);
+	if (ret > 0)
+		fsnotify_modify(file);
+	file_end_write(file);
+out:
+	kfree(iov);
 	return ret;
 }

-- 
2.39.2
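
Editor's illustration (not part of the patch): the reworked vfs_writev()
returns directly while nothing has been allocated and funnels every later
exit through a single cleanup label. A userspace sketch of that shape
follows. model_alloc(), model_free(), live_allocs and the boolean
parameters are hypothetical and stand in for import_iovec(), kfree() and
the real checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical allocation tracker used to show that no path leaks. */
static int live_allocs;

static void *model_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

static long model_writev(bool writable, size_t len, bool verify_ok)
{
	void *iov;
	long ret = 0;

	if (!writable)
		return -EBADF;	/* nothing allocated yet: plain return */

	iov = model_alloc(len ? len : 1);	/* models import_iovec() */
	if (!iov)
		return -ENOMEM;

	if (!len)
		goto out;	/* zero-length write: ret stays 0 */

	if (!verify_ok) {
		ret = -EINVAL;	/* models a failed range check */
		goto out;
	}

	ret = (long)len;	/* stand-in for the actual write */
out:
	model_free(iov);
	return ret;
}
```

Every path taken after model_alloc() succeeds leaves through the out label,
so live_allocs returns to zero whatever the outcome.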

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit b8e1425bae856b189e2365ff795e30fdd9e77049
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

We recently moved the fsnotify hook, rw_verify_area() and other checks
from do_iter_write() out to its two callers.

For consistency, do the same thing for do_iter_read() - move the
rw_verify_area() checks and fsnotify hook to the callers vfs_iter_read()
and vfs_readv().

This aligns those vfs helpers with the pattern used in vfs_read() and
vfs_iocb_iter_read() and the vfs write helpers, where all the checks are
in the vfs helpers and the do_* or call_* helpers do the work.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-13-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/read_write.c | 66 +++++++++++++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index dd1457b9209a..a43dd3915c1a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -775,12 +775,14 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }

-static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
-			    loff_t *pos, rwf_t flags)
+ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
+			   struct iov_iter *iter)
 {
 	size_t tot_len;
 	ssize_t ret = 0;

+	if (!file->f_op->read_iter)
+		return -EINVAL;
 	if (!(file->f_mode & FMODE_READ))
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_READ))
@@ -789,22 +791,20 @@ static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 	tot_len = iov_iter_count(iter);
 	if (!tot_len)
 		goto out;
-	ret = rw_verify_area(READ, file, pos, tot_len);
+	ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
 	if (ret < 0)
 		return ret;

-	if (file->f_op->read_iter)
-		ret = do_iter_readv_writev(file, iter, pos, READ, flags);
-	else
-		ret = do_loop_readv_writev(file, iter, pos, READ, flags);
+	ret = call_read_iter(file, iocb, iter);
 out:
 	if (ret >= 0)
 		fsnotify_access(file);
 	return ret;
 }
+EXPORT_SYMBOL(vfs_iocb_iter_read);

-ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
-			   struct iov_iter *iter)
+ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
+		      rwf_t flags)
 {
 	size_t tot_len;
 	ssize_t ret = 0;
@@ -819,25 +819,16 @@ ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
 	tot_len = iov_iter_count(iter);
 	if (!tot_len)
 		goto out;
-	ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
+	ret = rw_verify_area(READ, file, ppos, tot_len);
 	if (ret < 0)
 		return ret;

-	ret = call_read_iter(file, iocb, iter);
+	ret = do_iter_readv_writev(file, iter, ppos, READ, flags);
 out:
 	if (ret >= 0)
 		fsnotify_access(file);
 	return ret;
 }
-EXPORT_SYMBOL(vfs_iocb_iter_read);
-
-ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
-		      rwf_t flags)
-{
-	if (!file->f_op->read_iter)
-		return -EINVAL;
-	return do_iter_read(file, iter, ppos, flags);
-}
 EXPORT_SYMBOL(vfs_iter_read);

 ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
@@ -900,19 +891,40 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
 EXPORT_SYMBOL(vfs_iter_write);

 static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
-		  unsigned long vlen, loff_t *pos, rwf_t flags)
+			 unsigned long vlen, loff_t *pos, rwf_t flags)
 {
 	struct iovec iovstack[UIO_FASTIOV];
 	struct iovec *iov = iovstack;
 	struct iov_iter iter;
-	ssize_t ret;
+	size_t tot_len;
+	ssize_t ret = 0;

-	ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
-	if (ret >= 0) {
-		ret = do_iter_read(file, &iter, pos, flags);
-		kfree(iov);
-	}
+	if (!(file->f_mode & FMODE_READ))
+		return -EBADF;
+	if (!(file->f_mode & FMODE_CAN_READ))
+		return -EINVAL;
+
+	ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+			   &iter);
+	if (ret < 0)
+		return ret;
+
+	tot_len = iov_iter_count(&iter);
+	if (!tot_len)
+		goto out;
+
+	ret = rw_verify_area(READ, file, pos, tot_len);
+	if (ret < 0)
+		goto out;

+	if (file->f_op->read_iter)
+		ret = do_iter_readv_writev(file, &iter, pos, READ, flags);
+	else
+		ret = do_loop_readv_writev(file, &iter, pos, READ, flags);
+out:
+	if (ret >= 0)
+		fsnotify_access(file);
+	kfree(iov);
 	return ret;
 }

-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 6ae654392bb516a0baa47fed1f085d84e8cad739
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

In vfs code, sb_start_write() is usually called after the permission hook
in rw_verify_area().  vfs_iocb_iter_write() is an exception to this rule,
where kiocb_start_write() is called by its callers.

Move kiocb_start_write() from the callers into vfs_iocb_iter_write() after
the rw_verify_area() checks, to make them "start-write-safe".

The semantics of vfs_iocb_iter_write() is changed, so that the caller is
responsible for calling kiocb_end_write() on completion only if async
iocb was queued.  The completion handlers of both callers were adapted
to this semantic change.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/cachefiles/io.c  | 5 ++---
 fs/overlayfs/file.c | 8 ++++----
 fs/read_write.c     | 7 +++++++
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index e55c7cc6a997..f7a507ddd668 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -268,7 +268,8 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 
 	_enter("%ld", ret);
 
-	kiocb_end_write(iocb);
+	if (ki->was_async)
+		kiocb_end_write(iocb);
 
 	if (ret < 0)
 		trace_cachefiles_io_error(object, inode, ret,
@@ -331,8 +332,6 @@ int __cachefiles_write(struct cachefiles_object *object,
 		ki->iocb.ki_complete = cachefiles_write_complete;
 	atomic_long_add(ki->b_writing, &cache->b_writing);
 
-	kiocb_start_write(&ki->iocb);
-
 	get_file(ki->iocb.ki_filp);
 	cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
 
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 352836288943..0d0ba0240fa5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -292,10 +292,8 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
 	struct kiocb *iocb = &aio_req->iocb;
 	struct kiocb *orig_iocb = aio_req->orig_iocb;
 
-	if (iocb->ki_flags & IOCB_WRITE) {
-		kiocb_end_write(iocb);
+	if (iocb->ki_flags & IOCB_WRITE)
 		ovl_file_modified(orig_iocb->ki_filp);
-	}
 
 	orig_iocb->ki_pos = iocb->ki_pos;
 	ovl_aio_put(aio_req);
@@ -307,6 +305,9 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
 						   struct ovl_aio_req, iocb);
 	struct kiocb *orig_iocb = aio_req->orig_iocb;
 
+	if (iocb->ki_flags & IOCB_WRITE)
+		kiocb_end_write(iocb);
+
 	ovl_aio_cleanup_handler(aio_req);
 	orig_iocb->ki_complete(orig_iocb, res);
 }
@@ -453,7 +454,6 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 		aio_req->iocb.ki_flags = ifl;
 		aio_req->iocb.ki_complete = ovl_aio_queue_completion;
 		refcount_set(&aio_req->ref, 2);
-		kiocb_start_write(&aio_req->iocb);
 		ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
 		ovl_aio_put(aio_req);
 		if (ret != -EIOCBQUEUED)
diff --git a/fs/read_write.c b/fs/read_write.c
index a43dd3915c1a..efd4f0fd878b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -831,6 +831,10 @@ ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
 }
 EXPORT_SYMBOL(vfs_iter_read);
 
+/*
+ * Caller is responsible for calling kiocb_end_write() on completion
+ * if async iocb was queued.
+ */
 ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 			    struct iov_iter *iter)
 {
@@ -851,7 +855,10 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
 	if (ret < 0)
 		return ret;
 
+	kiocb_start_write(iocb);
 	ret = call_write_iter(file, iocb, iter);
+	if (ret != -EIOCBQUEUED)
+		kiocb_end_write(iocb);
 	if (ret > 0)
 		fsnotify_modify(file);
 
-- 
2.39.2
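
Note on the changed contract: the ownership rule for the write reference can be modelled in userspace. The sketch below is not kernel code. `struct model_iocb`, `model_start_write()`, `model_end_write()`, `model_iter_write()` and `model_write_complete()` are hypothetical stand-ins for struct kiocb, kiocb_start_write(), kiocb_end_write(), vfs_iocb_iter_write() and a caller's completion handler, and `MODEL_EIOCBQUEUED` is a stand-in for the kernel's EIOCBQUEUED. It only illustrates who drops the reference in the synchronous and the queued case.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EIOCBQUEUED 529	/* stand-in value, assumption for this model */

/* Hypothetical stand-in for struct kiocb plus a freeze-protection count. */
struct model_iocb {
	int writers;		/* models write access held on behalf of this iocb */
	bool queue_async;	/* knob: does the "filesystem" queue the io? */
	long nbytes;		/* bytes reported when the io completes inline */
};

static void model_start_write(struct model_iocb *iocb)
{
	iocb->writers++;
}

static void model_end_write(struct model_iocb *iocb)
{
	assert(iocb->writers > 0);	/* an unbalanced end would be a bug */
	iocb->writers--;
}

/* Models ->write_iter(): completes inline or reports "queued". */
static long model_call_write_iter(struct model_iocb *iocb)
{
	return iocb->queue_async ? -MODEL_EIOCBQUEUED : iocb->nbytes;
}

/*
 * Models the reworked helper: write access is taken here and is also
 * dropped here, unless the io was queued for async completion.
 */
static long model_iter_write(struct model_iocb *iocb)
{
	long ret;

	model_start_write(iocb);
	ret = model_call_write_iter(iocb);
	if (ret != -MODEL_EIOCBQUEUED)
		model_end_write(iocb);
	return ret;
}

/* Models the caller's completion handler for a queued io. */
static void model_write_complete(struct model_iocb *iocb)
{
	model_end_write(iocb);
}
```

In this model a synchronous write returns with `writers == 0`, while a queued write returns `-MODEL_EIOCBQUEUED` with `writers == 1` until `model_write_complete()` runs, which mirrors the "only if async iocb was queued" wording above.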

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 8802e580ee643e3f63c6b39ff64e7c7baa4a55ba
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Similar to sb_write_started() for use by other sb freeze levels.

Unlike the boolean sb_write_started(), this helper returns a tristate
to distinguish the cases of lockdep disabled or unknown lock state.

This is needed for fanotify "pre content" events.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-15-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 include/linux/fs.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72bd9a001ff3..bacf140e45db 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1691,9 +1691,23 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
 #define __sb_writers_release(sb, lev)	\
 	percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
 
+/**
+ * __sb_write_started - check if sb freeze level is held
+ * @sb: the super we write to
+ * @level: the freeze level
+ *
+ * > 0 sb freeze level is held
+ *   0 sb freeze level is not held
+ * < 0 !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN
+ */
+static inline int __sb_write_started(const struct super_block *sb, int level)
+{
+	return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
+}
+
 static inline bool sb_write_started(const struct super_block *sb)
 {
-	return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1);
+	return __sb_write_started(sb, SB_FREEZE_WRITE);
 }
 
 /**
-- 
2.39.2
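
Note on the tristate: the reason for an int-returning helper next to the boolean one can be shown with a small userspace model. This is only a sketch. `enum model_lock_state`, `model_sb_write_started_level()` and `model_sb_write_started()` are hypothetical names, and the model assumes the unknown state is reported as a negative value, as the kernel-doc in the patch describes.

```c
#include <stdbool.h>

/* Hypothetical lock states, following the kernel-doc in the patch. */
enum model_lock_state {
	MODEL_LOCK_UNKNOWN = -1,	/* lockdep off or lock state unknown */
	MODEL_LOCK_NOT_HELD = 0,
	MODEL_LOCK_HELD = 1,
};

/* Models the tristate helper: the three cases stay distinguishable. */
static int model_sb_write_started_level(enum model_lock_state state)
{
	return (int)state;
}

/*
 * Models the boolean wrapper: the int -> bool conversion maps every
 * nonzero value to true, so "unknown" collapses into "held".
 */
static bool model_sb_write_started(enum model_lock_state state)
{
	return model_sb_write_started_level(state);
}
```

With `MODEL_LOCK_UNKNOWN` the boolean wrapper reports true although nothing is known about the lock, whereas the int-returning model still lets a caller tell "unknown" from "held".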

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 3d5cd4911e04683df8f4439fddd788e00a2510a8
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Convenience wrapper for sb_write_started(file_inode(file)->i_sb), which
has a single occurrence in the code right now.

Document the false negatives of those helpers, which makes them unusable
to assert that sb_start_write() is not held.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-16-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 include/linux/fs.h | 21 +++++++++++++++++++++
 fs/read_write.c    |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index bacf140e45db..a4a895e06fa8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1705,11 +1705,32 @@ static inline int __sb_write_started(const struct super_block *sb, int level)
 	return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
 }
 
+/**
+ * sb_write_started - check if SB_FREEZE_WRITE is held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
 static inline bool sb_write_started(const struct super_block *sb)
 {
 	return __sb_write_started(sb, SB_FREEZE_WRITE);
 }
 
+/**
+ * file_write_started - check if SB_FREEZE_WRITE is held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_started(const struct file *file)
+{
+	if (!S_ISREG(file_inode(file)->i_mode))
+		return true;
+	return sb_write_started(file_inode(file)->i_sb);
+}
+
 /**
  * sb_end_write - drop write access to a superblock
  * @sb: the super we wrote to
diff --git a/fs/read_write.c b/fs/read_write.c
index efd4f0fd878b..804c3bd5207e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1425,7 +1425,7 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
 				struct file *file_out, loff_t pos_out,
 				size_t len, unsigned int flags)
 {
-	lockdep_assert(sb_write_started(file_inode(file_out)->i_sb));
+	lockdep_assert(file_write_started(file_out));
 
 	return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
 				len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-- 
2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.8-rc1
commit 21b32e6a0ab5b174fa1ca2fb4c212577cf405d83
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Create new helpers {sb,file}_write_not_started() that can be used
to assert that sb_start_write() is not held.

This is needed for fanotify "pre content" events.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-17-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 include/linux/fs.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a4a895e06fa8..5f9cfb89106e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1716,6 +1716,17 @@ static inline bool sb_write_started(const struct super_block *sb)
 	return __sb_write_started(sb, SB_FREEZE_WRITE);
 }
 
+/**
+ * sb_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
+static inline bool sb_write_not_started(const struct super_block *sb)
+{
+	return __sb_write_started(sb, SB_FREEZE_WRITE) <= 0;
+}
+
 /**
  * file_write_started - check if SB_FREEZE_WRITE is held
  * @file: the file we write to
@@ -1731,6 +1742,21 @@ static inline bool file_write_started(const struct file *file)
 	return sb_write_started(file_inode(file)->i_sb);
 }
 
+/**
+ * file_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_not_started(const struct file *file)
+{
+	if (!S_ISREG(file_inode(file)->i_mode))
+		return true;
+	return sb_write_not_started(file_inode(file)->i_sb);
+}
+
 /**
  * sb_end_write - drop write access to a superblock
  * @sb: the super we wrote to
-- 
2.39.2
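
Note on why a separate "not started" helper is needed rather than negating the "started" one: both helpers are written so that an assertion passes whenever the state cannot be determined. The userspace sketch below models that. `struct model_file` and the two `model_file_write_*()` functions are hypothetical stand-ins, and the `lock_state` field follows the tristate convention from the earlier patch (> 0 held, 0 not held, < 0 unknown).

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct file and its superblock lock state. */
struct model_file {
	bool is_regular;	/* models S_ISREG(file_inode(file)->i_mode) */
	int lock_state;		/* > 0 held, 0 not held, < 0 unknown */
};

/* Models file_write_started(): true unless known not to be held. */
static bool model_file_write_started(const struct model_file *f)
{
	if (!f->is_regular)
		return true;	/* start-write has no effect on these files */
	return f->lock_state != 0;
}

/* Models file_write_not_started(): true unless known to be held. */
static bool model_file_write_not_started(const struct model_file *f)
{
	if (!f->is_regular)
		return true;
	return f->lock_state <= 0;
}
```

For an unknown state or a non-regular file both models return true, so `!model_file_write_started(f)` is not a substitute for `model_file_write_not_started(f)`: the negation would make an assertion fail exactly in the cases where nothing can be verified.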

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.8-rc1 commit f91a704f7161c2cf0fcd41fa9fbec4355b813fff category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In preparation for factoring out some backing file io helpers from overlayfs, move backing_file_open() into a new file fs/backing-file.c and header. Add a MAINTAINERS entry for stackable filesystems and add a Kconfig FS_STACK which stackable filesystems need to select. For now, the backing_file struct, the backing_file alloc/free functions and the backing_file_real_path() accessor remain internal to file_table.c. We may change that in the future. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/backing-file.h | 17 +++++++++++++ include/linux/fs.h | 3 --- fs/backing-file.c | 48 ++++++++++++++++++++++++++++++++++++ fs/open.c | 38 ---------------------------- fs/overlayfs/file.c | 1 + MAINTAINERS | 9 +++++++ fs/Kconfig | 4 +++ fs/Makefile | 1 + fs/overlayfs/Kconfig | 1 + 9 files changed, 81 insertions(+), 41 deletions(-) create mode 100644 include/linux/backing-file.h create mode 100644 fs/backing-file.c diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h new file mode 100644 index 000000000000..55c9e804f780 --- /dev/null +++ b/include/linux/backing-file.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Common helpers for stackable filesystems and backing files. + * + * Copyright (C) 2023 CTERA Networks. 
+ */ + +#ifndef _LINUX_BACKING_FILE_H +#define _LINUX_BACKING_FILE_H + +#include <linux/file.h> + +struct file *backing_file_open(const struct path *user_path, int flags, + const struct path *real_path, + const struct cred *cred); + +#endif /* _LINUX_BACKING_FILE_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 5f9cfb89106e..9dc42354a781 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2656,9 +2656,6 @@ struct file *dentry_open(const struct path *path, int flags, const struct cred *creds); struct file *dentry_create(const struct path *path, int flags, umode_t mode, const struct cred *cred); -struct file *backing_file_open(const struct path *user_path, int flags, - const struct path *real_path, - const struct cred *cred); struct path *backing_file_user_path(struct file *f); /* diff --git a/fs/backing-file.c b/fs/backing-file.c new file mode 100644 index 000000000000..04b33036f709 --- /dev/null +++ b/fs/backing-file.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Common helpers for stackable filesystems and backing files. + * + * Copyright (C) 2023 CTERA Networks. + */ + +#include <linux/fs.h> +#include <linux/backing-file.h> + +#include "internal.h" + +/** + * backing_file_open - open a backing file for kernel internal use + * @user_path: path that the user reuqested to open + * @flags: open flags + * @real_path: path of the backing file + * @cred: credentials for open + * + * Open a backing file for a stackable filesystem (e.g., overlayfs). + * @user_path may be on the stackable filesystem and @real_path on the + * underlying filesystem. In this case, we want to be able to return the + * @user_path of the stackable filesystem. This is done by embedding the + * returned file into a container structure that also stores the stacked + * file's path, which can be retrieved using backing_file_user_path(). 
+ */ +struct file *backing_file_open(const struct path *user_path, int flags, + const struct path *real_path, + const struct cred *cred) +{ + struct file *f; + int error; + + f = alloc_empty_backing_file(flags, cred); + if (IS_ERR(f)) + return f; + + path_get(user_path); + *backing_file_user_path(f) = *user_path; + error = vfs_open(real_path, f); + if (error) { + fput(f); + f = ERR_PTR(error); + } + + return f; +} +EXPORT_SYMBOL_GPL(backing_file_open); diff --git a/fs/open.c b/fs/open.c index ec0e471ef5cc..4679db501d43 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1177,44 +1177,6 @@ struct file *kernel_file_open(const struct path *path, int flags, } EXPORT_SYMBOL_GPL(kernel_file_open); -/** - * backing_file_open - open a backing file for kernel internal use - * @user_path: path that the user reuqested to open - * @flags: open flags - * @real_path: path of the backing file - * @cred: credentials for open - * - * Open a backing file for a stackable filesystem (e.g., overlayfs). - * @user_path may be on the stackable filesystem and @real_path on the - * underlying filesystem. In this case, we want to be able to return the - * @user_path of the stackable filesystem. This is done by embedding the - * returned file into a container structure that also stores the stacked - * file's path, which can be retrieved using backing_file_user_path(). 
- */ -struct file *backing_file_open(const struct path *user_path, int flags, - const struct path *real_path, - const struct cred *cred) -{ - struct file *f; - int error; - - f = alloc_empty_backing_file(flags, cred); - if (IS_ERR(f)) - return f; - - path_get(user_path); - *backing_file_user_path(f) = *user_path; - f->f_path = *real_path; - error = do_dentry_open(f, d_inode(real_path->dentry), NULL); - if (error) { - fput(f); - f = ERR_PTR(error); - } - - return f; -} -EXPORT_SYMBOL_GPL(backing_file_open); - #define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE)) #define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 0d0ba0240fa5..9c4272d10e99 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -13,6 +13,7 @@ #include <linux/security.h> #include <linux/mm.h> #include <linux/fs.h> +#include <linux/backing-file.h> #include "overlayfs.h" #include "../internal.h" /* for sb_init_dio_done_wq */ diff --git a/MAINTAINERS b/MAINTAINERS index c6a3ac61989c..501062b7875f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8058,6 +8058,15 @@ F: include/linux/fs_types.h F: include/uapi/linux/fs.h F: include/uapi/linux/openat2.h +FILESYSTEMS [STACKABLE] +M: Miklos Szeredi <miklos@szeredi.hu> +M: Amir Goldstein <amir73il@gmail.com> +L: linux-fsdevel@vger.kernel.org +L: linux-unionfs@vger.kernel.org +S: Maintained +F: fs/backing-file.c +F: include/linux/backing-file.h + FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER M: Riku Voipio <riku.voipio@iki.fi> L: linux-hwmon@vger.kernel.org diff --git a/fs/Kconfig b/fs/Kconfig index 9765213bf01d..3d1185e3e6a8 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -18,6 +18,10 @@ config VALIDATE_FS_PARSER config FS_IOMAP bool +# Stackable filesystems +config FS_STACK + bool + config BUFFER_HEAD bool diff --git a/fs/Makefile b/fs/Makefile index 45e54c34c309..81428bad22f0 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -40,6 +40,7 @@ 
obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o +obj-$(CONFIG_FS_STACK) += backing-file.o obj-$(CONFIG_FS_MBCACHE) += mbcache.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig index fec5020c3495..2ac67e04a6fb 100644 --- a/fs/overlayfs/Kconfig +++ b/fs/overlayfs/Kconfig @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only config OVERLAY_FS tristate "Overlay filesystem support" + select FS_STACK select EXPORTFS help An overlay filesystem combines two filesystems - an 'upper' filesystem -- 2.39.2
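
Note on the "container structure" mentioned in the backing_file_open() kernel-doc: the idea of recovering the user-visible path from a file pointer can be sketched in userspace as follows. All names here (`struct model_path`, `struct model_file`, `struct model_backing_file`, `model_container_of()`, `model_backing_file_user_path()`, `model_backing_file_open()`) are hypothetical, and the layout is an assumption made for illustration. The real structure stays internal to file_table.c, as the commit message says.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct path and struct file. */
struct model_path {
	const char *name;
};

struct model_file {
	struct model_path f_path;	/* path of the real (backing) file */
};

/* Container that embeds the file and also stores the user-visible path. */
struct model_backing_file {
	struct model_file file;
	struct model_path user_path;
};

/* Step back from an embedded member to its containing structure. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Accessor: only valid for files that live inside the container. */
static struct model_path *model_backing_file_user_path(struct model_file *f)
{
	return &model_container_of(f, struct model_backing_file, file)->user_path;
}

/* Models the open helper: record both paths, hand out the embedded file. */
static struct model_file *
model_backing_file_open(struct model_backing_file *bf,
			const struct model_path *user_path,
			const struct model_path *real_path)
{
	*model_backing_file_user_path(&bf->file) = *user_path;
	bf->file.f_path = *real_path;
	return &bf->file;
}
```

A caller that only holds the returned `struct model_file *` sees the real path in `f_path` and can still obtain the stacked path through the accessor.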

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.8-rc1 commit a6293b3e285cd0d7692141d7981a5f144f0e2f0b category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Overlayfs submits files io to backing files on other filesystems. Factor out some common helpers to perform io to backing files, into fs/backing-file.c. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpeguhmZbjP3JLqtUy0AdWaHOkAPWeP827BBWwRFEAUgnUc... Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/overlayfs.h | 8 +- include/linux/backing-file.h | 15 +++ fs/backing-file.c | 210 +++++++++++++++++++++++++++++++++++ fs/overlayfs/file.c | 188 +++---------------------------- fs/overlayfs/super.c | 11 +- 5 files changed, 247 insertions(+), 185 deletions(-) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 6ac8b69549da..616d4f2e0fd8 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -414,6 +414,12 @@ int ovl_want_write(struct dentry *dentry); void ovl_drop_write(struct dentry *dentry); struct dentry *ovl_workdir(struct dentry *dentry); const struct cred *ovl_override_creds(struct super_block *sb); + +static inline const struct cred *ovl_creds(struct super_block *sb) +{ + return OVL_FS(sb)->creator_cred; +} + int ovl_can_decode_fh(struct super_block *sb); struct dentry *ovl_indexdir(struct super_block *sb); bool ovl_index_all(struct super_block *sb); @@ -822,8 +828,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir, /* file.c */ extern const struct file_operations ovl_file_operations; -int __init ovl_aio_request_cache_init(void); -void ovl_aio_request_cache_destroy(void); int ovl_real_fileattr_get(const struct path *realpath, struct fileattr *fa); int 
ovl_real_fileattr_set(const struct path *realpath, struct fileattr *fa); int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa); diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h index 55c9e804f780..0648d548a418 100644 --- a/include/linux/backing-file.h +++ b/include/linux/backing-file.h @@ -9,9 +9,24 @@ #define _LINUX_BACKING_FILE_H #include <linux/file.h> +#include <linux/uio.h> +#include <linux/fs.h> + +struct backing_file_ctx { + const struct cred *cred; + struct file *user_file; + void (*accessed)(struct file *); + void (*end_write)(struct file *); +}; struct file *backing_file_open(const struct path *user_path, int flags, const struct path *real_path, const struct cred *cred); +ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, + struct kiocb *iocb, int flags, + struct backing_file_ctx *ctx); +ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, + struct kiocb *iocb, int flags, + struct backing_file_ctx *ctx); #endif /* _LINUX_BACKING_FILE_H */ diff --git a/fs/backing-file.c b/fs/backing-file.c index 04b33036f709..323187a49da3 100644 --- a/fs/backing-file.c +++ b/fs/backing-file.c @@ -2,6 +2,9 @@ /* * Common helpers for stackable filesystems and backing files. * + * Forked from fs/overlayfs/file.c. + * + * Copyright (C) 2017 Red Hat, Inc. * Copyright (C) 2023 CTERA Networks. 
*/ @@ -46,3 +49,210 @@ struct file *backing_file_open(const struct path *user_path, int flags, return f; } EXPORT_SYMBOL_GPL(backing_file_open); + +struct backing_aio { + struct kiocb iocb; + refcount_t ref; + struct kiocb *orig_iocb; + /* used for aio completion */ + void (*end_write)(struct file *); + struct work_struct work; + long res; +}; + +static struct kmem_cache *backing_aio_cachep; + +#define BACKING_IOCB_MASK \ + (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND) + +static rwf_t iocb_to_rw_flags(int flags) +{ + return (__force rwf_t)(flags & BACKING_IOCB_MASK); +} + +static void backing_aio_put(struct backing_aio *aio) +{ + if (refcount_dec_and_test(&aio->ref)) { + fput(aio->iocb.ki_filp); + kmem_cache_free(backing_aio_cachep, aio); + } +} + +static void backing_aio_cleanup(struct backing_aio *aio, long res) +{ + struct kiocb *iocb = &aio->iocb; + struct kiocb *orig_iocb = aio->orig_iocb; + + if (aio->end_write) + aio->end_write(orig_iocb->ki_filp); + + orig_iocb->ki_pos = iocb->ki_pos; + backing_aio_put(aio); +} + +static void backing_aio_rw_complete(struct kiocb *iocb, long res) +{ + struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb); + struct kiocb *orig_iocb = aio->orig_iocb; + + if (iocb->ki_flags & IOCB_WRITE) + kiocb_end_write(iocb); + + backing_aio_cleanup(aio, res); + orig_iocb->ki_complete(orig_iocb, res); +} + +static void backing_aio_complete_work(struct work_struct *work) +{ + struct backing_aio *aio = container_of(work, struct backing_aio, work); + + backing_aio_rw_complete(&aio->iocb, aio->res); +} + +static void backing_aio_queue_completion(struct kiocb *iocb, long res) +{ + struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb); + + /* + * Punt to a work queue to serialize updates of mtime/size. 
+ */ + aio->res = res; + INIT_WORK(&aio->work, backing_aio_complete_work); + queue_work(file_inode(aio->orig_iocb->ki_filp)->i_sb->s_dio_done_wq, + &aio->work); +} + +static int backing_aio_init_wq(struct kiocb *iocb) +{ + struct super_block *sb = file_inode(iocb->ki_filp)->i_sb; + + if (sb->s_dio_done_wq) + return 0; + + return sb_init_dio_done_wq(sb); +} + + +ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, + struct kiocb *iocb, int flags, + struct backing_file_ctx *ctx) +{ + struct backing_aio *aio = NULL; + const struct cred *old_cred; + ssize_t ret; + + if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING))) + return -EIO; + + if (!iov_iter_count(iter)) + return 0; + + if (iocb->ki_flags & IOCB_DIRECT && + !(file->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; + + old_cred = override_creds(ctx->cred); + if (is_sync_kiocb(iocb)) { + rwf_t rwf = iocb_to_rw_flags(flags); + + ret = vfs_iter_read(file, iter, &iocb->ki_pos, rwf); + } else { + ret = -ENOMEM; + aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL); + if (!aio) + goto out; + + aio->orig_iocb = iocb; + kiocb_clone(&aio->iocb, iocb, get_file(file)); + aio->iocb.ki_complete = backing_aio_rw_complete; + refcount_set(&aio->ref, 2); + ret = vfs_iocb_iter_read(file, &aio->iocb, iter); + backing_aio_put(aio); + if (ret != -EIOCBQUEUED) + backing_aio_cleanup(aio, ret); + } +out: + revert_creds(old_cred); + + if (ctx->accessed) + ctx->accessed(ctx->user_file); + + return ret; +} +EXPORT_SYMBOL_GPL(backing_file_read_iter); + +ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, + struct kiocb *iocb, int flags, + struct backing_file_ctx *ctx) +{ + const struct cred *old_cred; + ssize_t ret; + + if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING))) + return -EIO; + + if (!iov_iter_count(iter)) + return 0; + + ret = file_remove_privs(ctx->user_file); + if (ret) + return ret; + + if (iocb->ki_flags & IOCB_DIRECT && + !(file->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; + + /* 
+ * Stacked filesystems don't support deferred completions, don't copy + * this property in case it is set by the issuer. + */ + flags &= ~IOCB_DIO_CALLER_COMP; + + old_cred = override_creds(ctx->cred); + if (is_sync_kiocb(iocb)) { + rwf_t rwf = iocb_to_rw_flags(flags); + + ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf); + if (ctx->end_write) + ctx->end_write(ctx->user_file); + } else { + struct backing_aio *aio; + + ret = backing_aio_init_wq(iocb); + if (ret) + goto out; + + ret = -ENOMEM; + aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL); + if (!aio) + goto out; + + aio->orig_iocb = iocb; + aio->end_write = ctx->end_write; + kiocb_clone(&aio->iocb, iocb, get_file(file)); + aio->iocb.ki_flags = flags; + aio->iocb.ki_complete = backing_aio_queue_completion; + refcount_set(&aio->ref, 2); + ret = vfs_iocb_iter_write(file, &aio->iocb, iter); + backing_aio_put(aio); + if (ret != -EIOCBQUEUED) + backing_aio_cleanup(aio, ret); + } +out: + revert_creds(old_cred); + + return ret; +} +EXPORT_SYMBOL_GPL(backing_file_write_iter); + +static int __init backing_aio_init(void) +{ + backing_aio_cachep = kmem_cache_create("backing_aio", + sizeof(struct backing_aio), + 0, SLAB_HWCACHE_ALIGN, NULL); + if (!backing_aio_cachep) + return -ENOMEM; + + return 0; +} +fs_initcall(backing_aio_init); diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 9c4272d10e99..4405458b82fa 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -16,19 +16,6 @@ #include <linux/backing-file.h> #include "overlayfs.h" -#include "../internal.h" /* for sb_init_dio_done_wq */ - -struct ovl_aio_req { - struct kiocb iocb; - refcount_t ref; - struct kiocb *orig_iocb; - /* used for aio completion */ - struct work_struct work; - long res; -}; - -static struct kmem_cache *ovl_aio_request_cachep; - static char ovl_whatisit(struct inode *inode, struct inode *realinode) { if (realinode != ovl_inode_upper(inode)) @@ -272,84 +259,16 @@ static void ovl_file_accessed(struct file *file) 
touch_atime(&file->f_path); } -#define OVL_IOCB_MASK \ - (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND) - -static rwf_t iocb_to_rw_flags(int flags) -{ - return (__force rwf_t)(flags & OVL_IOCB_MASK); -} - -static inline void ovl_aio_put(struct ovl_aio_req *aio_req) -{ - if (refcount_dec_and_test(&aio_req->ref)) { - fput(aio_req->iocb.ki_filp); - kmem_cache_free(ovl_aio_request_cachep, aio_req); - } -} - -static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req) -{ - struct kiocb *iocb = &aio_req->iocb; - struct kiocb *orig_iocb = aio_req->orig_iocb; - - if (iocb->ki_flags & IOCB_WRITE) - ovl_file_modified(orig_iocb->ki_filp); - - orig_iocb->ki_pos = iocb->ki_pos; - ovl_aio_put(aio_req); -} - -static void ovl_aio_rw_complete(struct kiocb *iocb, long res) -{ - struct ovl_aio_req *aio_req = container_of(iocb, - struct ovl_aio_req, iocb); - struct kiocb *orig_iocb = aio_req->orig_iocb; - - if (iocb->ki_flags & IOCB_WRITE) - kiocb_end_write(iocb); - - ovl_aio_cleanup_handler(aio_req); - orig_iocb->ki_complete(orig_iocb, res); -} - -static void ovl_aio_complete_work(struct work_struct *work) -{ - struct ovl_aio_req *aio_req = container_of(work, - struct ovl_aio_req, work); - - ovl_aio_rw_complete(&aio_req->iocb, aio_req->res); -} - -static void ovl_aio_queue_completion(struct kiocb *iocb, long res) -{ - struct ovl_aio_req *aio_req = container_of(iocb, - struct ovl_aio_req, iocb); - struct kiocb *orig_iocb = aio_req->orig_iocb; - - /* - * Punt to a work queue to serialize updates of mtime/size. 
- */ - aio_req->res = res; - INIT_WORK(&aio_req->work, ovl_aio_complete_work); - queue_work(file_inode(orig_iocb->ki_filp)->i_sb->s_dio_done_wq, - &aio_req->work); -} - -static int ovl_init_aio_done_wq(struct super_block *sb) -{ - if (sb->s_dio_done_wq) - return 0; - - return sb_init_dio_done_wq(sb); -} - static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; struct fd real; - const struct cred *old_cred; ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ovl_creds(file_inode(file)->i_sb), + .user_file = file, + .accessed = ovl_file_accessed, + }; if (!iov_iter_count(iter)) return 0; @@ -358,37 +277,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) if (ret) return ret; - ret = -EINVAL; - if (iocb->ki_flags & IOCB_DIRECT && - !(real.file->f_mode & FMODE_CAN_ODIRECT)) - goto out_fdput; - - old_cred = ovl_override_creds(file_inode(file)->i_sb); - if (is_sync_kiocb(iocb)) { - rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags); - - ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf); - } else { - struct ovl_aio_req *aio_req; - - ret = -ENOMEM; - aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); - if (!aio_req) - goto out; - - aio_req->orig_iocb = iocb; - kiocb_clone(&aio_req->iocb, iocb, get_file(real.file)); - aio_req->iocb.ki_complete = ovl_aio_rw_complete; - refcount_set(&aio_req->ref, 2); - ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter); - ovl_aio_put(aio_req); - if (ret != -EIOCBQUEUED) - ovl_aio_cleanup_handler(aio_req); - } -out: - revert_creds(old_cred); - ovl_file_accessed(file); -out_fdput: + ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags, + &ctx); fdput(real); return ret; @@ -399,9 +289,13 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); struct fd real; - const struct cred *old_cred; ssize_t ret; int ifl = iocb->ki_flags; + 
struct backing_file_ctx ctx = { + .cred = ovl_creds(inode->i_sb), + .user_file = file, + .end_write = ovl_file_modified, + }; if (!iov_iter_count(iter)) return 0; @@ -409,19 +303,11 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) inode_lock(inode); /* Update mode */ ovl_copyattr(inode); - ret = file_remove_privs(file); - if (ret) - goto out_unlock; ret = ovl_real_fdget(file, &real); if (ret) goto out_unlock; - ret = -EINVAL; - if (iocb->ki_flags & IOCB_DIRECT && - !(real.file->f_mode & FMODE_CAN_ODIRECT)) - goto out_fdput; - if (!ovl_should_sync(OVL_FS(inode->i_sb))) ifl &= ~(IOCB_DSYNC | IOCB_SYNC); @@ -430,39 +316,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) * this property in case it is set by the issuer. */ ifl &= ~IOCB_DIO_CALLER_COMP; - - old_cred = ovl_override_creds(file_inode(file)->i_sb); - if (is_sync_kiocb(iocb)) { - rwf_t rwf = iocb_to_rw_flags(ifl); - - ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf); - /* Update size */ - ovl_file_modified(file); - } else { - struct ovl_aio_req *aio_req; - - ret = ovl_init_aio_done_wq(inode->i_sb); - if (ret) - goto out; - - ret = -ENOMEM; - aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); - if (!aio_req) - goto out; - - aio_req->orig_iocb = iocb; - kiocb_clone(&aio_req->iocb, iocb, get_file(real.file)); - aio_req->iocb.ki_flags = ifl; - aio_req->iocb.ki_complete = ovl_aio_queue_completion; - refcount_set(&aio_req->ref, 2); - ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); - ovl_aio_put(aio_req); - if (ret != -EIOCBQUEUED) - ovl_aio_cleanup_handler(aio_req); - } -out: - revert_creds(old_cred); -out_fdput: + ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx); fdput(real); out_unlock: @@ -774,19 +628,3 @@ const struct file_operations ovl_file_operations = { .copy_file_range = ovl_copy_file_range, .remap_file_range = ovl_remap_file_range, }; - -int __init ovl_aio_request_cache_init(void) -{ - 
ovl_aio_request_cachep = kmem_cache_create("ovl_aio_req", - sizeof(struct ovl_aio_req), - 0, SLAB_HWCACHE_ALIGN, NULL); - if (!ovl_aio_request_cachep) - return -ENOMEM; - - return 0; -} - -void ovl_aio_request_cache_destroy(void) -{ - kmem_cache_destroy(ovl_aio_request_cachep); -} diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2c776ccf3025..65ea8fc5b670 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -1556,14 +1556,10 @@ static int __init ovl_init(void) if (ovl_inode_cachep == NULL) return -ENOMEM; - err = ovl_aio_request_cache_init(); - if (!err) { - err = register_filesystem(&ovl_fs_type); - if (!err) - return 0; + err = register_filesystem(&ovl_fs_type); + if (!err) + return 0; - ovl_aio_request_cache_destroy(); - } kmem_cache_destroy(ovl_inode_cachep); return err; @@ -1579,7 +1575,6 @@ static void __exit ovl_exit(void) */ rcu_barrier(); kmem_cache_destroy(ovl_inode_cachep); - ovl_aio_request_cache_destroy(); } module_init(ovl_init); -- 2.39.2
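
Note on struct backing_file_ctx: the helpers take the stacking filesystem's callbacks through a context structure instead of calling overlayfs functions directly. The userspace sketch below models only the synchronous paths. `struct model_file`, `struct model_backing_ctx`, the `model_mark_*()` callbacks and `model_backing_read()`/`model_backing_write()` are hypothetical stand-ins, and the actual io is reduced to returning the byte count.

```c
#include <stddef.h>

/* Hypothetical stand-in for the user-visible (stacked) file. */
struct model_file {
	int accessed;	/* how often the "atime" callback ran */
	int modified;	/* how often the "size/mtime" callback ran */
};

/* Models struct backing_file_ctx: optional callbacks plus their target. */
struct model_backing_ctx {
	struct model_file *user_file;
	void (*accessed)(struct model_file *);
	void (*end_write)(struct model_file *);
};

/* Callbacks a stacking filesystem might supply. */
static void model_mark_accessed(struct model_file *f) { f->accessed++; }
static void model_mark_modified(struct model_file *f) { f->modified++; }

/* Models the sync read path: zero-length reads return early. */
static long model_backing_read(long nbytes, struct model_backing_ctx *ctx)
{
	if (!nbytes)
		return 0;
	/* the read on the backing file would happen here */
	if (ctx->accessed)
		ctx->accessed(ctx->user_file);
	return nbytes;
}

/* Models the sync write path: notify the stacked file after the write. */
static long model_backing_write(long nbytes, struct model_backing_ctx *ctx)
{
	if (!nbytes)
		return 0;
	/* the write to the backing file would happen here */
	if (ctx->end_write)
		ctx->end_write(ctx->user_file);
	return nbytes;
}
```

A reader sets only `.accessed` and a writer only `.end_write`, matching how ovl_read_iter() and ovl_write_iter() fill in their contexts in the diff above. Unset callbacks are skipped.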

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.8-rc1 commit 9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There is not much in those helpers, but it makes sense to have them logically next to the backing_file_{read,write}_iter() helpers as they may grow more common logic in the future. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/backing-file.h | 8 ++++++ fs/backing-file.c | 51 ++++++++++++++++++++++++++++++++++++ fs/overlayfs/file.c | 33 +++++++++-------------- 3 files changed, 72 insertions(+), 20 deletions(-) diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h index 0648d548a418..0546d5b1c9f5 100644 --- a/include/linux/backing-file.h +++ b/include/linux/backing-file.h @@ -28,5 +28,13 @@ ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter, ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, struct kiocb *iocb, int flags, struct backing_file_ctx *ctx); +ssize_t backing_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags, + struct backing_file_ctx *ctx); +ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, size_t len, + unsigned int flags, + struct backing_file_ctx *ctx); #endif /* _LINUX_BACKING_FILE_H */ diff --git a/fs/backing-file.c b/fs/backing-file.c index 323187a49da3..ddd35c1d6c71 100644 --- a/fs/backing-file.c +++ b/fs/backing-file.c @@ -10,6 +10,7 @@ #include <linux/fs.h> #include <linux/backing-file.h> +#include <linux/splice.h> #include "internal.h" @@ -245,6 +246,56 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, } EXPORT_SYMBOL_GPL(backing_file_write_iter); 
+ssize_t backing_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags, + struct backing_file_ctx *ctx) +{ + const struct cred *old_cred; + ssize_t ret; + + if (WARN_ON_ONCE(!(in->f_mode & FMODE_BACKING))) + return -EIO; + + old_cred = override_creds(ctx->cred); + ret = vfs_splice_read(in, ppos, pipe, len, flags); + revert_creds(old_cred); + + if (ctx->accessed) + ctx->accessed(ctx->user_file); + + return ret; +} +EXPORT_SYMBOL_GPL(backing_file_splice_read); + +ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, size_t len, + unsigned int flags, + struct backing_file_ctx *ctx) +{ + const struct cred *old_cred; + ssize_t ret; + + if (WARN_ON_ONCE(!(out->f_mode & FMODE_BACKING))) + return -EIO; + + ret = file_remove_privs(ctx->user_file); + if (ret) + return ret; + + old_cred = override_creds(ctx->cred); + file_start_write(out); + ret = iter_file_splice_write(pipe, out, ppos, len, flags); + file_end_write(out); + revert_creds(old_cred); + + if (ctx->end_write) + ctx->end_write(ctx->user_file); + + return ret; +} +EXPORT_SYMBOL_GPL(backing_file_splice_write); + static int __init backing_aio_init(void) { backing_aio_cachep = kmem_cache_create("backing_aio", diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 4405458b82fa..5ca2fab77cc1 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -9,7 +9,6 @@ #include <linux/xattr.h> #include <linux/uio.h> #include <linux/uaccess.h> -#include <linux/splice.h> #include <linux/security.h> #include <linux/mm.h> #include <linux/fs.h> @@ -329,20 +328,21 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { - const struct cred *old_cred; struct fd real; ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ovl_creds(file_inode(in)->i_sb), + .user_file = in, + .accessed = ovl_file_accessed, + }; ret = ovl_real_fdget(in, &real); if (ret) 
return ret; - old_cred = ovl_override_creds(file_inode(in)->i_sb); - ret = vfs_splice_read(real.file, ppos, pipe, len, flags); - revert_creds(old_cred); - ovl_file_accessed(in); - + ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx); fdput(real); + return ret; } @@ -358,30 +358,23 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { struct fd real; - const struct cred *old_cred; struct inode *inode = file_inode(out); ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ovl_creds(inode->i_sb), + .user_file = out, + .end_write = ovl_file_modified, + }; inode_lock(inode); /* Update mode */ ovl_copyattr(inode); - ret = file_remove_privs(out); - if (ret) - goto out_unlock; ret = ovl_real_fdget(out, &real); if (ret) goto out_unlock; - old_cred = ovl_override_creds(inode->i_sb); - file_start_write(real.file); - - ret = iter_file_splice_write(pipe, real.file, ppos, len, flags); - - file_end_write(real.file); - /* Update size */ - ovl_file_modified(out); - revert_creds(old_cred); + ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx); fdput(real); out_unlock: -- 2.39.2
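
The callback pattern used by backing_file_splice_read() and backing_file_splice_write() can be illustrated with a small userspace model. This is only a sketch: struct model_file, struct model_ctx, model_io() and the note_*() hooks are hypothetical stand-ins, not kernel types or functions, and the credential override and file_start_write()/file_end_write() bracketing are deliberately left out. What it shows is that the helper performs the I/O and then calls the optional hook on the user-visible file, never on the backing file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the user-visible struct file. */
struct model_file {
	int accessed_calls;
	int modified_calls;
};

/* Same shape as struct backing_file_ctx: the user file plus optional hooks. */
struct model_ctx {
	struct model_file *user_file;
	void (*accessed)(struct model_file *);
	void (*end_write)(struct model_file *);
};

static void note_accessed(struct model_file *f) { f->accessed_calls++; }
static void note_modified(struct model_file *f) { f->modified_calls++; }

/* Stand-in for the real transfer: pretends that len bytes were moved. */
static long model_io(size_t len)
{
	return (long)len;
}

/* Read side: do the I/O, then let the stacked fs update atime if it asked to. */
long model_splice_read(size_t len, struct model_ctx *ctx)
{
	long ret;

	assert(ctx != NULL);
	ret = model_io(len);
	if (ctx->accessed)
		ctx->accessed(ctx->user_file);
	return ret;
}

/* Write side: do the I/O, then let the stacked fs refresh size and mtime. */
long model_splice_write(size_t len, struct model_ctx *ctx)
{
	long ret;

	assert(ctx != NULL);
	ret = model_io(len);
	if (ctx->end_write)
		ctx->end_write(ctx->user_file);
	return ret;
}
```

A caller such as overlayfs fills in only the hook that matters for the operation (.accessed for reads and mmap, .end_write for writes); a hook left NULL is simply skipped.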

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.8-rc1 commit f567377e406c032fff0799bde4fdf4a977529b84 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Assert that the file object is allocated in a backing_file container so that file_user_path() could be used to display the user path and not the backing file's path in /proc/<pid>/maps. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/backing-file.h | 2 ++ fs/backing-file.c | 27 +++++++++++++++++++++++++++ fs/overlayfs/file.c | 23 ++++++----------------- 3 files changed, 35 insertions(+), 17 deletions(-) diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h index 0546d5b1c9f5..3f1fe1774f1b 100644 --- a/include/linux/backing-file.h +++ b/include/linux/backing-file.h @@ -36,5 +36,7 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags, struct backing_file_ctx *ctx); +int backing_file_mmap(struct file *file, struct vm_area_struct *vma, + struct backing_file_ctx *ctx); #endif /* _LINUX_BACKING_FILE_H */ diff --git a/fs/backing-file.c b/fs/backing-file.c index ddd35c1d6c71..a681f38d84d8 100644 --- a/fs/backing-file.c +++ b/fs/backing-file.c @@ -11,6 +11,7 @@ #include <linux/fs.h> #include <linux/backing-file.h> #include <linux/splice.h> +#include <linux/mm.h> #include "internal.h" @@ -296,6 +297,32 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, } EXPORT_SYMBOL_GPL(backing_file_splice_write); +int backing_file_mmap(struct file *file, struct vm_area_struct *vma, + struct backing_file_ctx *ctx) +{ + const struct cred *old_cred; + int ret; + + if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)) || + WARN_ON_ONCE(ctx->user_file != vma->vm_file)) + return -EIO; + + 
if (!file->f_op->mmap) + return -ENODEV; + + vma_set_file(vma, file); + + old_cred = override_creds(ctx->cred); + ret = call_mmap(vma->vm_file, vma); + revert_creds(old_cred); + + if (ctx->accessed) + ctx->accessed(ctx->user_file); + + return ret; +} +EXPORT_SYMBOL_GPL(backing_file_mmap); + static int __init backing_aio_init(void) { backing_aio_cachep = kmem_cache_create("backing_aio", diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 5ca2fab77cc1..63f4b22ee955 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -10,7 +10,6 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> -#include <linux/mm.h> #include <linux/fs.h> #include <linux/backing-file.h> #include "overlayfs.h" @@ -412,23 +411,13 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) static int ovl_mmap(struct file *file, struct vm_area_struct *vma) { struct file *realfile = file->private_data; - const struct cred *old_cred; - int ret; - - if (!realfile->f_op->mmap) - return -ENODEV; - - if (WARN_ON(file != vma->vm_file)) - return -EIO; - - vma_set_file(vma, realfile); - - old_cred = ovl_override_creds(file_inode(file)->i_sb); - ret = call_mmap(vma->vm_file, vma); - revert_creds(old_cred); - ovl_file_accessed(file); + struct backing_file_ctx ctx = { + .cred = ovl_creds(file_inode(file)->i_sb), + .user_file = file, + .accessed = ovl_file_accessed, + }; - return ret; + return backing_file_mmap(realfile, vma, &ctx); } static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len) -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit aed918310ea2542059eeab6c74defca95c30f77b category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In preparation to adding more fuse dev ioctls. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/dev.c | 59 ++++++++++++++++++++++++++++----------------------- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 8573d79ef29c..117221e4d986 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2253,43 +2253,50 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new) return 0; } -static long fuse_dev_ioctl(struct file *file, unsigned int cmd, - unsigned long arg) +static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp) { int res; int oldfd; struct fuse_dev *fud = NULL; struct fd f; + if (get_user(oldfd, argp)) + return -EFAULT; + + f = fdget(oldfd); + if (!f.file) + return -EINVAL; + + /* + * Check against file->f_op because CUSE + * uses the same ioctl handler. + */ + if (f.file->f_op == file->f_op) + fud = fuse_get_dev(f.file); + + res = -EINVAL; + if (fud) { + mutex_lock(&fuse_mutex); + res = fuse_device_clone(fud->fc, file); + mutex_unlock(&fuse_mutex); + } + + fdput(f); + return res; +} + +static long fuse_dev_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + void __user *argp = (void __user *)arg; + switch (cmd) { case FUSE_DEV_IOC_CLONE: - if (get_user(oldfd, (__u32 __user *)arg)) - return -EFAULT; + return fuse_dev_ioctl_clone(file, argp); - f = fdget(oldfd); - if (!f.file) - return -EINVAL; - - /* - * Check against file->f_op because CUSE - * uses the same ioctl handler. 
- */ - if (f.file->f_op == file->f_op) - fud = fuse_get_dev(f.file); - - res = -EINVAL; - if (fud) { - mutex_lock(&fuse_mutex); - res = fuse_device_clone(fud->fc, file); - mutex_unlock(&fuse_mutex); - } - fdput(f); - break; default: - res = -ENOTTY; - break; + return -ENOTTY; } - return res; } const struct file_operations fuse_dev_operations = { -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 7dc4e97a4f9a55bae6ed6ab3f96c92921259d59f category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- FUSE_PASSTHROUGH capability to passthrough FUSE operations to backing files will be made available with kernel config CONFIG_FUSE_PASSTHROUGH. When requesting FUSE_PASSTHROUGH, userspace needs to specify the max_stack_depth that is allowed for FUSE on top of backing files. Introduce the flag FOPEN_PASSTHROUGH and backing_id to fuse_open_out argument that can be used when replying to OPEN request, to setup passthrough of io operations on the fuse inode to a backing file. Introduce a refcounted fuse_backing object that will be used to associate an open backing file with a fuse inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Conflicts: fs/fuse/fuse_i.h [Using KABI_RESERVE to skip kabi changes.] 
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 46 +++++++++++++++++++++++++++++++++++---- include/uapi/linux/fuse.h | 14 +++++++++--- fs/fuse/inode.c | 26 ++++++++++++++++++++++ fs/fuse/passthrough.c | 30 +++++++++++++++++++++++++ fs/fuse/Kconfig | 11 ++++++++++ fs/fuse/Makefile | 1 + 6 files changed, 121 insertions(+), 7 deletions(-) create mode 100644 fs/fuse/passthrough.c diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 5cd7bd9b9c29..d7b05e7561b8 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -77,6 +77,15 @@ struct fuse_submount_lookup { struct fuse_forget_link *forget; }; +/** Container for data related to mapping to backing file */ +struct fuse_backing { + struct file *file; + + /** refcount */ + refcount_t count; + struct rcu_head rcu; +}; + /** FUSE inode */ struct fuse_inode { /** Inode data */ @@ -180,8 +189,10 @@ struct fuse_inode { #endif /** Submount specific lookup tracking */ struct fuse_submount_lookup *submount_lookup; - - KABI_RESERVE(1) +#ifdef CONFIG_FUSE_PASSTHROUGH + /** Reference to backing file in passthrough mode */ + struct fuse_backing *fb; +#endif }; /** FUSE inode state bits */ @@ -840,6 +851,12 @@ struct fuse_conn { /* Use pages instead of pointer for kernel I/O */ unsigned int use_pages_for_kvec_io:1; + /** Passthrough support for read/write IO */ + unsigned int passthrough:1; + + /** Maximum stack depth for passthrough backing files */ + int max_stack_depth; + /** The number of requests waiting for completion */ atomic_t num_waiting; @@ -892,8 +909,6 @@ struct fuse_conn { KABI_RESERVE(1) KABI_RESERVE(2) - KABI_RESERVE(3) - KABI_RESERVE(4) }; /* @@ -1384,4 +1399,27 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid, void fuse_file_release(struct inode *inode, struct fuse_file *ff, unsigned int open_flags, fl_owner_t id, bool isdir); +/* passthrough.c */ +static inline struct fuse_backing *fuse_inode_backing(struct fuse_inode *fi) +{ +#ifdef CONFIG_FUSE_PASSTHROUGH + return 
READ_ONCE(fi->fb); +#else + return NULL; +#endif +} + +static inline struct fuse_backing *fuse_inode_backing_set(struct fuse_inode *fi, + struct fuse_backing *fb) +{ +#ifdef CONFIG_FUSE_PASSTHROUGH + return xchg(&fi->fb, fb); +#else + return NULL; +#endif +} + +struct fuse_backing *fuse_backing_get(struct fuse_backing *fb); +void fuse_backing_put(struct fuse_backing *fb); + #endif /* _FS_FUSE_I_H */ diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index e7418d15fe39..7bb6219cfda0 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -211,6 +211,10 @@ * 7.39 * - add FUSE_DIRECT_IO_ALLOW_MMAP * - add FUSE_STATX and related structures + * + * 7.40 + * - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH init flag + * - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag */ #ifndef _LINUX_FUSE_H @@ -246,7 +250,7 @@ #define FUSE_KERNEL_VERSION 7 /** Minor version number of this interface */ -#define FUSE_KERNEL_MINOR_VERSION 39 +#define FUSE_KERNEL_MINOR_VERSION 40 /** The node ID of the root inode */ #define FUSE_ROOT_ID 1 @@ -353,6 +357,7 @@ struct fuse_file_lock { * FOPEN_STREAM: the file is stream-like (no file position at all) * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE) * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode + * FOPEN_PASSTHROUGH: passthrough read/write io for this open file */ #define FOPEN_DIRECT_IO (1 << 0) #define FOPEN_KEEP_CACHE (1 << 1) @@ -361,6 +366,7 @@ struct fuse_file_lock { #define FOPEN_STREAM (1 << 4) #define FOPEN_NOFLUSH (1 << 5) #define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6) +#define FOPEN_PASSTHROUGH (1 << 7) /** * INIT request/reply flags @@ -449,6 +455,7 @@ struct fuse_file_lock { #define FUSE_CREATE_SUPP_GROUP (1ULL << 34) #define FUSE_HAS_EXPIRE_ONLY (1ULL << 35) #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36) +#define FUSE_PASSTHROUGH (1ULL << 37) /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */ #define 
FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP @@ -761,7 +768,7 @@ struct fuse_create_in { struct fuse_open_out { uint64_t fh; uint32_t open_flags; - uint32_t padding; + int32_t backing_id; }; struct fuse_release_in { @@ -877,7 +884,8 @@ struct fuse_init_out { uint16_t max_pages; uint16_t map_alignment; uint32_t flags2; - uint32_t unused[7]; + uint32_t max_stack_depth; + uint32_t unused[6]; }; #define CUSE_INIT_INFO_MAX 4096 diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 735abf426a06..89e77b136879 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -111,6 +111,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) if (IS_ENABLED(CONFIG_FUSE_DAX) && !fuse_dax_inode_alloc(sb, fi)) goto out_free_forget; + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_inode_backing_set(fi, NULL); + return &fi->inode; out_free_forget: @@ -129,6 +132,9 @@ static void fuse_free_inode(struct inode *inode) #ifdef CONFIG_FUSE_DAX kfree(fi->dax); #endif + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_backing_put(fuse_inode_backing(fi)); + kmem_cache_free(fuse_inode_cachep, fi); } @@ -1312,6 +1318,24 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args, fc->create_supp_group = 1; if (flags & FUSE_DIRECT_IO_ALLOW_MMAP) fc->direct_io_allow_mmap = 1; + /* + * max_stack_depth is the max stack depth of FUSE fs, + * so it has to be at least 1 to support passthrough + * to backing files. + * + * with max_stack_depth > 1, the backing files can be + * on a stacked fs (e.g. overlayfs) themselves and with + * max_stack_depth == 1, FUSE fs can be stacked as the + * underlying fs of a stacked fs (e.g. overlayfs). 
+ */ + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) && + (flags & FUSE_PASSTHROUGH) && + arg->max_stack_depth > 0 && + arg->max_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH) { + fc->passthrough = 1; + fc->max_stack_depth = arg->max_stack_depth; + fm->sb->s_stack_depth = arg->max_stack_depth; + } } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1; @@ -1367,6 +1391,8 @@ void fuse_send_init(struct fuse_mount *fm) #endif if (fm->fc->auto_submounts) flags |= FUSE_SUBMOUNTS; + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + flags |= FUSE_PASSTHROUGH; ia->in.flags = flags; ia->in.flags2 = flags >> 32; diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c new file mode 100644 index 000000000000..e8639c0a9ac6 --- /dev/null +++ b/fs/fuse/passthrough.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * FUSE passthrough to backing file. + * + * Copyright (c) 2023 CTERA Networks. + */ + +#include "fuse_i.h" + +#include <linux/file.h> + +struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) +{ + if (fb && refcount_inc_not_zero(&fb->count)) + return fb; + return NULL; +} + +static void fuse_backing_free(struct fuse_backing *fb) +{ + if (fb->file) + fput(fb->file); + kfree_rcu(fb, rcu); +} + +void fuse_backing_put(struct fuse_backing *fb) +{ + if (fb && refcount_dec_and_test(&fb->count)) + fuse_backing_free(fb); +} diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig index 038ed0b9aaa5..8674dbfbe59d 100644 --- a/fs/fuse/Kconfig +++ b/fs/fuse/Kconfig @@ -52,3 +52,14 @@ config FUSE_DAX If you want to allow mounting a Virtio Filesystem with the "dax" option, answer Y. + +config FUSE_PASSTHROUGH + bool "FUSE passthrough operations support" + default y + depends on FUSE_FS + select FS_STACK + help + This allows bypassing FUSE server by mapping specific FUSE operations + to be performed directly on a backing file. + + If you want to allow passthrough operations, answer Y. 
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index b734cc2a5e65..6e0228c6d0cb 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -10,5 +10,6 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o fuse-y += iomode.o fuse-$(CONFIG_FUSE_DAX) += dax.o +fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o virtiofs-y := virtio_fs.o -- 2.39.2
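
The INIT-reply check in process_init_reply() and the get/put rules for fuse_backing can be modelled in plain userspace C. The sketch below makes several assumptions: MODEL_MAX_STACK_DEPTH is taken to equal the kernel's FILESYSTEM_MAX_STACK_DEPTH (assumed to be 2), the CONFIG_FUSE_PASSTHROUGH test is omitted, and the counter is a plain int rather than a refcount_t, so the model is single-threaded only. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FUSE_PASSTHROUGH (1ULL << 37) /* same bit as FUSE_PASSTHROUGH in the patch */
#define MODEL_MAX_STACK_DEPTH  2            /* assumed value of FILESYSTEM_MAX_STACK_DEPTH */

/* Passthrough is enabled only if the flag is set and the depth is in 1..max. */
bool model_passthrough_enabled(uint64_t flags, uint32_t max_stack_depth)
{
	return (flags & MODEL_FUSE_PASSTHROUGH) &&
	       max_stack_depth > 0 &&
	       max_stack_depth <= MODEL_MAX_STACK_DEPTH;
}

/* Hypothetical stand-in for struct fuse_backing. */
struct model_backing {
	int count;
	bool freed;
};

/* Like fuse_backing_get(): take a reference unless the count already hit zero. */
struct model_backing *model_backing_get(struct model_backing *fb)
{
	if (fb && fb->count > 0) {
		fb->count++;
		return fb;
	}
	return NULL;
}

/* Like fuse_backing_put(): drop a reference, free on the last one. */
void model_backing_put(struct model_backing *fb)
{
	if (!fb)
		return;
	assert(fb->count > 0);
	if (--fb->count == 0)
		fb->freed = true; /* the kernel would fput() the file and kfree_rcu() here */
}
```

Note that a depth of 0 in the INIT reply leaves passthrough disabled even when the server sets the flag, which matches the comment in the patch that the depth must be at least 1.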

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 44350256ab943d424d70aa60a34f45060b3a36e8 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file descriptor. If the call succeeds, a backing file identifier is returned. A later change will be using this backing file id in a reply to OPEN request with the flag FOPEN_PASSTHROUGH to setup passthrough of file operations on the open FUSE file to the backing file. The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the backing file by its id. This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH flag is still in progress, the open may fail if the backing file is closed before the fuse file was opened. Setting up backing files requires a server with CAP_SYS_ADMIN privileges. For the backing file to be successfully setup, the backing file must implement both read_iter and write_iter file operations. The limitation on the level of filesystem stacking allowed for the backing file is enforced before setting up the backing file. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/fuse_i.h [Uses kabi_reserve to avoid kabi changes.] 
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 11 ++- include/uapi/linux/fuse.h | 9 +++ fs/fuse/dev.c | 41 ++++++++++++ fs/fuse/inode.c | 5 ++ fs/fuse/passthrough.c | 136 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 201 insertions(+), 1 deletion(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index d7b05e7561b8..16d19fdd0537 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -80,6 +80,7 @@ struct fuse_submount_lookup { /** Container for data related to mapping to backing file */ struct fuse_backing { struct file *file; + struct cred *cred; /** refcount */ refcount_t count; @@ -907,8 +908,12 @@ struct fuse_conn { /* New writepages go into this bucket */ struct fuse_sync_bucket __rcu *curr_bucket; +#ifdef CONFIG_FUSE_PASSTHROUGH + /** IDR for backing files ids */ + struct idr backing_files_map; +#endif + KABI_RESERVE(1) - KABI_RESERVE(2) }; /* @@ -1421,5 +1426,9 @@ static inline struct fuse_backing *fuse_inode_backing_set(struct fuse_inode *fi, struct fuse_backing *fuse_backing_get(struct fuse_backing *fb); void fuse_backing_put(struct fuse_backing *fb); +void fuse_backing_files_init(struct fuse_conn *fc); +void fuse_backing_files_free(struct fuse_conn *fc); +int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map); +int fuse_backing_close(struct fuse_conn *fc, int backing_id); #endif /* _FS_FUSE_I_H */ diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 7bb6219cfda0..1162a47b6a42 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -1057,9 +1057,18 @@ struct fuse_notify_retrieve_in { uint64_t dummy4; }; +struct fuse_backing_map { + int32_t fd; + uint32_t flags; + uint64_t padding; +}; + /* Device ioctls: */ #define FUSE_DEV_IOC_MAGIC 229 #define FUSE_DEV_IOC_CLONE _IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t) +#define FUSE_DEV_IOC_BACKING_OPEN _IOW(FUSE_DEV_IOC_MAGIC, 1, \ + struct fuse_backing_map) +#define FUSE_DEV_IOC_BACKING_CLOSE _IOW(FUSE_DEV_IOC_MAGIC, 2, 
uint32_t) struct fuse_lseek_in { uint64_t fh; diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 117221e4d986..3c842286cdbb 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2285,6 +2285,41 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp) return res; } +static long fuse_dev_ioctl_backing_open(struct file *file, + struct fuse_backing_map __user *argp) +{ + struct fuse_dev *fud = fuse_get_dev(file); + struct fuse_backing_map map; + + if (!fud) + return -EPERM; + + if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + return -EOPNOTSUPP; + + if (copy_from_user(&map, argp, sizeof(map))) + return -EFAULT; + + return fuse_backing_open(fud->fc, &map); +} + +static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp) +{ + struct fuse_dev *fud = fuse_get_dev(file); + int backing_id; + + if (!fud) + return -EPERM; + + if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + return -EOPNOTSUPP; + + if (get_user(backing_id, argp)) + return -EFAULT; + + return fuse_backing_close(fud->fc, backing_id); +} + static long fuse_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -2294,6 +2329,12 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, case FUSE_DEV_IOC_CLONE: return fuse_dev_ioctl_clone(file, argp); + case FUSE_DEV_IOC_BACKING_OPEN: + return fuse_dev_ioctl_backing_open(file, argp); + + case FUSE_DEV_IOC_BACKING_CLOSE: + return fuse_dev_ioctl_backing_close(file, argp); + default: return -ENOTTY; } diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 89e77b136879..d5f9526b76ec 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -952,6 +952,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; fc->max_pages_limit = FUSE_MAX_MAX_PAGES; + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_backing_files_init(fc); + INIT_LIST_HEAD(&fc->mounts); list_add(&fm->fc_entry, &fc->mounts); fm->fc = fc; @@ -982,6 +985,8 @@ void fuse_conn_put(struct fuse_conn *fc) 
WARN_ON(atomic_read(&bucket->count) != 1); kfree(bucket); } + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_backing_files_free(fc); call_rcu(&fc->rcu, delayed_release); } } diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index e8639c0a9ac6..7ec92a54c2e0 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -18,8 +18,11 @@ struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) static void fuse_backing_free(struct fuse_backing *fb) { + pr_debug("%s: fb=0x%p\n", __func__, fb); + if (fb->file) fput(fb->file); + put_cred(fb->cred); kfree_rcu(fb, rcu); } @@ -28,3 +31,136 @@ void fuse_backing_put(struct fuse_backing *fb) if (fb && refcount_dec_and_test(&fb->count)) fuse_backing_free(fb); } + +void fuse_backing_files_init(struct fuse_conn *fc) +{ + idr_init(&fc->backing_files_map); +} + +static int fuse_backing_id_alloc(struct fuse_conn *fc, struct fuse_backing *fb) +{ + int id; + + idr_preload(GFP_KERNEL); + spin_lock(&fc->lock); + /* FIXME: xarray might be space inefficient */ + id = idr_alloc_cyclic(&fc->backing_files_map, fb, 1, 0, GFP_ATOMIC); + spin_unlock(&fc->lock); + idr_preload_end(); + + WARN_ON_ONCE(id == 0); + return id; +} + +static struct fuse_backing *fuse_backing_id_remove(struct fuse_conn *fc, + int id) +{ + struct fuse_backing *fb; + + spin_lock(&fc->lock); + fb = idr_remove(&fc->backing_files_map, id); + spin_unlock(&fc->lock); + + return fb; +} + +static int fuse_backing_id_free(int id, void *p, void *data) +{ + struct fuse_backing *fb = p; + + WARN_ON_ONCE(refcount_read(&fb->count) != 1); + fuse_backing_free(fb); + return 0; +} + +void fuse_backing_files_free(struct fuse_conn *fc) +{ + idr_for_each(&fc->backing_files_map, fuse_backing_id_free, NULL); + idr_destroy(&fc->backing_files_map); +} + +int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map) +{ + struct file *file; + struct super_block *backing_sb; + struct fuse_backing *fb = NULL; + int res; + + pr_debug("%s: fd=%d flags=0x%x\n", __func__, 
map->fd, map->flags); + + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ + res = -EPERM; + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) + goto out; + + res = -EINVAL; + if (map->flags) + goto out; + + file = fget(map->fd); + res = -EBADF; + if (!file) + goto out; + + res = -EOPNOTSUPP; + if (!file->f_op->read_iter || !file->f_op->write_iter) + goto out_fput; + + backing_sb = file_inode(file)->i_sb; + res = -ELOOP; + if (backing_sb->s_stack_depth >= fc->max_stack_depth) + goto out_fput; + + fb = kmalloc(sizeof(struct fuse_backing), GFP_KERNEL); + res = -ENOMEM; + if (!fb) + goto out_fput; + + fb->file = file; + fb->cred = prepare_creds(); + refcount_set(&fb->count, 1); + + res = fuse_backing_id_alloc(fc, fb); + if (res < 0) { + fuse_backing_free(fb); + fb = NULL; + } + +out: + pr_debug("%s: fb=0x%p, ret=%i\n", __func__, fb, res); + + return res; + +out_fput: + fput(file); + goto out; +} + +int fuse_backing_close(struct fuse_conn *fc, int backing_id) +{ + struct fuse_backing *fb = NULL; + int err; + + pr_debug("%s: backing_id=%d\n", __func__, backing_id); + + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ + err = -EPERM; + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) + goto out; + + err = -EINVAL; + if (backing_id <= 0) + goto out; + + err = -ENOENT; + fb = fuse_backing_id_remove(fc, backing_id); + if (!fb) + goto out; + + fuse_backing_put(fb); + err = 0; +out: + pr_debug("%s: fb=0x%p, err=%i\n", __func__, fb, err); + + return err; +} -- 2.39.2
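
From the server side, the two new ioctls might be used roughly as follows. This is a minimal sketch rather than code from the series: the MODEL_* macros and struct model_fuse_backing_map are local copies of the UAPI additions above, so the example builds against headers older than protocol 7.40, and register_backing_fd() and unregister_backing_id() are hypothetical helper names. Errors come back as negative errno values; going by the kernel code above, these include -EPERM (passthrough not negotiated or no CAP_SYS_ADMIN), -EINVAL (non-zero flags), -EBADF, -EOPNOTSUPP, -ELOOP and -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of struct fuse_backing_map as added by the patch. */
struct model_fuse_backing_map {
	int32_t  fd;
	uint32_t flags;
	uint64_t padding;
};

#define MODEL_FUSE_DEV_IOC_MAGIC 229
#define MODEL_FUSE_DEV_IOC_BACKING_OPEN \
	_IOW(MODEL_FUSE_DEV_IOC_MAGIC, 1, struct model_fuse_backing_map)
#define MODEL_FUSE_DEV_IOC_BACKING_CLOSE \
	_IOW(MODEL_FUSE_DEV_IOC_MAGIC, 2, uint32_t)

/* Register backing_fd on the FUSE device; returns a backing id > 0 or -errno. */
int register_backing_fd(int fuse_dev_fd, int backing_fd)
{
	struct model_fuse_backing_map map;
	int id;

	memset(&map, 0, sizeof(map)); /* flags must be zero, padding is cleared too */
	map.fd = backing_fd;

	id = ioctl(fuse_dev_fd, MODEL_FUSE_DEV_IOC_BACKING_OPEN, &map);
	return id < 0 ? -errno : id;
}

/* Drop a previously registered backing id; returns 0 or -errno. */
int unregister_backing_id(int fuse_dev_fd, int backing_id)
{
	uint32_t id = (uint32_t)backing_id;

	if (ioctl(fuse_dev_fd, MODEL_FUSE_DEV_IOC_BACKING_CLOSE, &id) < 0)
		return -errno;
	return 0;
}
```

The id returned by register_backing_fd() is what a later patch in the series expects in fuse_open_out.backing_id together with FOPEN_PASSTHROUGH. Calling unregister_backing_id() only drops the reference held by the id table; the commit message above notes that closing too early can make an in-flight open fail.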

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit fc8ff397b2a91590031ae08534de627f957005cb category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- In preparation for opening file in passthrough mode, store the fuse_open_out argument in ff->args to be passed into fuse_file_io_open() with the optional backing_id member. This will be used for setting up passthrough to backing file on open reply with FOPEN_PASSTHROUGH flag and a valid backing_id. Opening a file in passthrough mode may fail for several reasons, such as missing capability, conflicting open flags or inode in caching mode. Return EIO from fuse_file_io_open() in those cases. The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed - it means that read/write operations will go directly to the server, but mmap will be done to the backing file. 
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 19 ++++++++++++++++--- fs/fuse/dir.c | 12 +++++++----- fs/fuse/file.c | 34 +++++++++++++++------------------- fs/fuse/iomode.c | 44 ++++++++++++++++++++++++++++++++++++++++---- 4 files changed, 78 insertions(+), 31 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 16d19fdd0537..60571a7ffa5e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -214,15 +214,15 @@ enum { struct fuse_conn; struct fuse_mount; -struct fuse_release_args; +union fuse_file_args; /** FUSE specific file data */ struct fuse_file { /** Fuse connection for this file */ struct fuse_mount *fm; - /* Argument space reserved for release */ - struct fuse_release_args *release_args; + /* Argument space reserved for open/release */ + union fuse_file_args *args; /** Kernel file handle guaranteed to be unique */ u64 kh; @@ -324,6 +324,19 @@ struct fuse_args_pages { unsigned int num_pages; }; +struct fuse_release_args { + struct fuse_args args; + struct fuse_release_in inarg; + struct inode *inode; +}; + +union fuse_file_args { + /* Used during open() */ + struct fuse_open_out open_outarg; + /* Used during release() */ + struct fuse_release_args release_args; +}; + #define FUSE_ARGS(args) struct fuse_args args = {} /** The request IO state (for asynchronous processing) */ diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index ca865e7c4b55..1ca3d3643d55 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -619,7 +619,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, FUSE_ARGS(args); struct fuse_forget_link *forget; struct fuse_create_in inarg; - struct fuse_open_out outopen; + struct fuse_open_out *outopenp; struct fuse_entry_out outentry; struct fuse_inode *fi; struct fuse_file *ff; @@ -663,8 +663,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, args.out_numargs = 2; 
args.out_args[0].size = sizeof(outentry); args.out_args[0].value = &outentry; - args.out_args[1].size = sizeof(outopen); - args.out_args[1].value = &outopen; + /* Store outarg for fuse_finish_open() */ + outopenp = &ff->args->open_outarg; + args.out_args[1].size = sizeof(*outopenp); + args.out_args[1].value = outopenp; err = get_create_ext(&args, dir, entry, mode); if (err) @@ -680,9 +682,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, fuse_invalid_attr(&outentry.attr)) goto out_free_ff; - ff->fh = outopen.fh; + ff->fh = outopenp->fh; ff->nodeid = outentry.nodeid; - ff->open_flags = outopen.open_flags; + ff->open_flags = outopenp->open_flags; inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation, &outentry.attr, ATTR_TIMEOUT(&outentry), 0); if (!inode) { diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 3a1751d7f78e..2b047dd351b2 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -49,12 +49,6 @@ static int fuse_send_open(struct fuse_mount *fm, u64 nodeid, return fuse_simple_request(fm, &args); } -struct fuse_release_args { - struct fuse_args args; - struct fuse_release_in inarg; - struct inode *inode; -}; - struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release) { struct fuse_file *ff; @@ -65,9 +59,8 @@ struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release) ff->fm = fm; if (release) { - ff->release_args = kzalloc(sizeof(*ff->release_args), - GFP_KERNEL_ACCOUNT); - if (!ff->release_args) { + ff->args = kzalloc(sizeof(*ff->args), GFP_KERNEL_ACCOUNT); + if (!ff->args) { kfree(ff); return NULL; } @@ -86,7 +79,7 @@ struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release) void fuse_file_free(struct fuse_file *ff) { - kfree(ff->release_args); + kfree(ff->args); mutex_destroy(&ff->readdir.lock); kfree(ff); } @@ -109,7 +102,7 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args, static void fuse_file_put(struct fuse_file *ff, bool sync) { if 
(refcount_dec_and_test(&ff->count)) { - struct fuse_release_args *ra = ff->release_args; + struct fuse_release_args *ra = &ff->args->release_args; struct fuse_args *args = (ra ? &ra->args : NULL); if (ra && ra->inode) @@ -146,20 +139,21 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid, /* Default for no-open */ ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0); if (open) { - struct fuse_open_out outarg; + /* Store outarg for fuse_finish_open() */ + struct fuse_open_out *outargp = &ff->args->open_outarg; int err; - err = fuse_send_open(fm, nodeid, open_flags, opcode, &outarg); + err = fuse_send_open(fm, nodeid, open_flags, opcode, outargp); if (!err) { - ff->fh = outarg.fh; - ff->open_flags = outarg.open_flags; + ff->fh = outargp->fh; + ff->open_flags = outargp->open_flags; } else if (err != -ENOSYS) { fuse_file_free(ff); return ERR_PTR(err); } else { /* No release needed */ - kfree(ff->release_args); - ff->release_args = NULL; + kfree(ff->args); + ff->args = NULL; if (isdir) fc->no_opendir = 1; else @@ -298,7 +292,7 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, unsigned int flags, int opcode, bool sync) { struct fuse_conn *fc = ff->fm->fc; - struct fuse_release_args *ra = ff->release_args; + struct fuse_release_args *ra = &ff->args->release_args; /* Inode is NULL on error path of fuse_create_open() */ if (likely(fi)) { @@ -316,6 +310,8 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, if (!ra) return; + /* ff->args was used for open outarg */ + memset(ff->args, 0, sizeof(*ff->args)); ra->inarg.fh = ff->fh; ra->inarg.flags = flags; ra->args.in_numargs = 1; @@ -338,7 +334,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff, unsigned int open_flags, fl_owner_t id, bool isdir) { struct fuse_inode *fi = get_fuse_inode(inode); - struct fuse_release_args *ra = ff->release_args; + struct fuse_release_args *ra = &ff->args->release_args; int opcode = isdir 
? FUSE_RELEASEDIR : FUSE_RELEASE; fuse_prepare_release(fi, ff, open_flags, opcode, false); diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c index ea47c76b9df1..2161bdf91db2 100644 --- a/fs/fuse/iomode.c +++ b/fs/fuse/iomode.c @@ -31,7 +31,7 @@ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) struct fuse_inode *fi = get_fuse_inode(inode); /* There are no io modes if server does not implement open */ - if (!ff->release_args) + if (!ff->args) return 0; spin_lock(&fi->lock); @@ -103,6 +103,37 @@ void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) spin_unlock(&fi->lock); } +/* + * Open flags that are allowed in combination with FOPEN_PASSTHROUGH. + * A combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO means that read/write + * operations go directly to the server, but mmap is done on the backing file. + * FOPEN_PASSTHROUGH mode should not co-exist with any users of the fuse inode + * page cache, so FOPEN_KEEP_CACHE is a strange and undesired combination. 
+ */ +#define FOPEN_PASSTHROUGH_MASK \ + (FOPEN_PASSTHROUGH | FOPEN_DIRECT_IO | FOPEN_PARALLEL_DIRECT_WRITES | \ + FOPEN_NOFLUSH) + +static int fuse_file_passthrough_open(struct inode *inode, struct file *file) +{ + struct fuse_file *ff = file->private_data; + struct fuse_conn *fc = get_fuse_conn(inode); + int err; + + /* Check allowed conditions for file open in passthrough mode */ + if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) || !fc->passthrough || + (ff->open_flags & ~FOPEN_PASSTHROUGH_MASK)) + return -EINVAL; + + /* TODO: implement backing file open */ + return -EOPNOTSUPP; + + /* First passthrough file open denies caching inode io mode */ + err = fuse_file_uncached_io_start(inode, ff); + + return err; +} + /* Request access to submit new io to inode via open file */ int fuse_file_io_open(struct file *file, struct inode *inode) { @@ -113,7 +144,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode) * io modes are not relevant with DAX and with server that does not * implement open. */ - if (FUSE_IS_DAX(inode) || !ff->release_args) + if (FUSE_IS_DAX(inode) || !ff->args) return 0; /* @@ -123,16 +154,21 @@ int fuse_file_io_open(struct file *file, struct inode *inode) ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES; /* + * First passthrough file open denies caching inode io mode. * First caching file open enters caching inode io mode. * * Note that if user opens a file open with O_DIRECT, but server did * not specify FOPEN_DIRECT_IO, a later fcntl() could remove O_DIRECT, * so we put the inode in caching mode to prevent parallel dio. */ - if (ff->open_flags & FOPEN_DIRECT_IO) + if ((ff->open_flags & FOPEN_DIRECT_IO) && + !(ff->open_flags & FOPEN_PASSTHROUGH)) return 0; - err = fuse_file_cached_io_start(inode, ff); + if (ff->open_flags & FOPEN_PASSTHROUGH) + err = fuse_file_passthrough_open(inode, file); + else + err = fuse_file_cached_io_start(inode, ff); if (err) goto fail; -- 2.39.2
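
Note on the flag check in fuse_file_passthrough_open() above: a minimal
userspace sketch of the FOPEN_PASSTHROUGH_MASK test, not kernel code. The
FOPEN_* bit values are written out here as I recall them from
include/uapi/linux/fuse.h and should be treated as assumptions to verify
against the header; passthrough_flags_ok() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values; check include/uapi/linux/fuse.h before relying on them. */
#define FOPEN_DIRECT_IO              (1u << 0)
#define FOPEN_KEEP_CACHE             (1u << 1)
#define FOPEN_NOFLUSH                (1u << 5)
#define FOPEN_PARALLEL_DIRECT_WRITES (1u << 6)
#define FOPEN_PASSTHROUGH            (1u << 7)

/* Same set of flags as the mask added in fs/fuse/iomode.c by this patch. */
#define FOPEN_PASSTHROUGH_MASK \
	(FOPEN_PASSTHROUGH | FOPEN_DIRECT_IO | FOPEN_PARALLEL_DIRECT_WRITES | \
	 FOPEN_NOFLUSH)

/*
 * Hypothetical helper: true when a passthrough open reply carries only
 * flags from the allowed set, i.e. no bit outside the mask is set.
 */
bool passthrough_flags_ok(uint32_t open_flags)
{
	return (open_flags & ~FOPEN_PASSTHROUGH_MASK) == 0;
}
```

With these values, FOPEN_PASSTHROUGH | FOPEN_DIRECT_IO is accepted, while
FOPEN_PASSTHROUGH | FOPEN_KEEP_CACHE is rejected, which matches the comment
above the mask about page cache users.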

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.9-rc1
commit 4a90451bbc7f7fde94041fbb9ca96dd915069943
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

After getting a backing file id with FUSE_DEV_IOC_BACKING_OPEN ioctl,
a FUSE server can reply to an OPEN request with flag FOPEN_PASSTHROUGH
and the backing file id.

The FUSE server should reuse the same backing file id for all the open
replies of the same FUSE inode and open will fail (with -EIO) if the
server attempts to open the same inode with conflicting io modes or to
set up passthrough to two different backing files for the same FUSE inode.
Using the same backing file id for several different inodes is allowed.

Opening a new file with FOPEN_DIRECT_IO for an inode that is already
open for passthrough is allowed, but only if the FOPEN_PASSTHROUGH flag
and correct backing file id are specified as well.

The read/write IO of such files will not use passthrough operations to
the backing file, but mmap, which does not support direct_io, will use
the backing file instead of using the page cache as it always did.

Even though all FUSE passthrough files of the same inode use the same
backing file as a backing inode reference, each FUSE file opens a unique
instance of a backing_file object to store the FUSE path that was used
to open the inode and the open flags of the specific open file.

The per-file backing_file object is released along with the FUSE file.
The inode associated fuse_backing object is released when the last FUSE
passthrough file of that inode is released AND when the backing file id
is closed by the server using the FUSE_DEV_IOC_BACKING_CLOSE ioctl.

Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 35 ++++++++++++++++++++++++- fs/fuse/file.c | 9 ++++++- fs/fuse/iomode.c | 60 ++++++++++++++++++++++++++++++++++++++----- fs/fuse/passthrough.c | 59 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 155 insertions(+), 8 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 60571a7ffa5e..a00e1be2ab5b 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -270,6 +270,12 @@ struct fuse_file { /** Does file hold a fi->iocachectr refcount? */ enum { IOM_NONE, IOM_CACHED, IOM_UNCACHED } iomode; +#ifdef CONFIG_FUSE_PASSTHROUGH + /** Reference to backing file in passthrough mode */ + struct file *passthrough; + const struct cred *cred; +#endif + /** Has flock been performed on this file? */ bool flock:1; }; @@ -1405,7 +1411,7 @@ int fuse_fileattr_set(struct mnt_idmap *idmap, /* iomode.c */ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff); -int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff); +int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struct fuse_backing *fb); void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff); int fuse_file_io_open(struct file *file, struct inode *inode); @@ -1437,11 +1443,38 @@ static inline struct fuse_backing *fuse_inode_backing_set(struct fuse_inode *fi, #endif } +#ifdef CONFIG_FUSE_PASSTHROUGH struct fuse_backing *fuse_backing_get(struct fuse_backing *fb); void fuse_backing_put(struct fuse_backing *fb); +#else + +static inline struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) +{ + return NULL; +} + +static inline void fuse_backing_put(struct fuse_backing *fb) +{ +} +#endif + void fuse_backing_files_init(struct fuse_conn *fc); void fuse_backing_files_free(struct fuse_conn *fc); int fuse_backing_open(struct fuse_conn *fc, struct 
fuse_backing_map *map); int fuse_backing_close(struct fuse_conn *fc, int backing_id); +struct fuse_backing *fuse_passthrough_open(struct file *file, + struct inode *inode, + int backing_id); +void fuse_passthrough_release(struct fuse_file *ff, struct fuse_backing *fb); + +static inline struct file *fuse_file_passthrough(struct fuse_file *ff) +{ +#ifdef CONFIG_FUSE_PASSTHROUGH + return ff->passthrough; +#else + return NULL; +#endif +} + #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 2b047dd351b2..d192c44e8a5d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -294,6 +294,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, struct fuse_conn *fc = ff->fm->fc; struct fuse_release_args *ra = &ff->args->release_args; + if (fuse_file_passthrough(ff)) + fuse_passthrough_release(ff, fuse_inode_backing(fi)); + /* Inode is NULL on error path of fuse_create_open() */ if (likely(fi)) { spin_lock(&fi->lock); @@ -1380,7 +1383,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, * have raced, so check it again. */ if (fuse_io_past_eof(iocb, from) || - fuse_file_uncached_io_start(inode, ff) != 0) { + fuse_file_uncached_io_start(inode, ff, NULL) != 0) { inode_unlock_shared(inode); inode_lock(inode); *exclusive = true; @@ -2556,6 +2559,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) if (FUSE_IS_DAX(file_inode(file))) return fuse_dax_mmap(file, vma); + /* TODO: implement mmap to backing file */ + if (fuse_file_passthrough(ff)) + return -ENODEV; + /* * FOPEN_DIRECT_IO handling is special compared to O_DIRECT, * as does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP. 
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c index 2161bdf91db2..c653ddcf0578 100644 --- a/fs/fuse/iomode.c +++ b/fs/fuse/iomode.c @@ -17,7 +17,7 @@ */ static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi) { - return READ_ONCE(fi->iocachectr) < 0; + return READ_ONCE(fi->iocachectr) < 0 && !fuse_inode_backing(fi); } /* @@ -45,6 +45,17 @@ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) wait_event(fi->direct_io_waitq, !fuse_is_io_cache_wait(fi)); spin_lock(&fi->lock); } + + /* + * Check if inode entered passthrough io mode while waiting for parallel + * dio write completion. + */ + if (fuse_inode_backing(fi)) { + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state); + spin_unlock(&fi->lock); + return -ETXTBSY; + } + WARN_ON(ff->iomode == IOM_UNCACHED); if (ff->iomode == IOM_NONE) { ff->iomode = IOM_CACHED; @@ -71,12 +82,19 @@ static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) } /* Start strictly uncached io mode where cache access is not allowed */ -int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff) +int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struct fuse_backing *fb) { struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_backing *oldfb; int err = 0; spin_lock(&fi->lock); + /* deny conflicting backing files on same fuse inode */ + oldfb = fuse_inode_backing(fi); + if (oldfb && oldfb != fb) { + err = -EBUSY; + goto unlock; + } if (fi->iocachectr > 0) { err = -ETXTBSY; goto unlock; @@ -84,6 +102,14 @@ int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff) WARN_ON(ff->iomode != IOM_NONE); fi->iocachectr--; ff->iomode = IOM_UNCACHED; + + /* fuse inode holds a single refcount of backing file */ + if (!oldfb) { + oldfb = fuse_inode_backing_set(fi, fb); + WARN_ON_ONCE(oldfb != NULL); + } else { + fuse_backing_put(fb); + } unlock: spin_unlock(&fi->lock); return err; @@ -92,15 +118,20 @@ int fuse_file_uncached_io_start(struct inode 
*inode, struct fuse_file *ff) void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) { struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_backing *oldfb = NULL; spin_lock(&fi->lock); WARN_ON(fi->iocachectr >= 0); WARN_ON(ff->iomode != IOM_UNCACHED); ff->iomode = IOM_NONE; fi->iocachectr++; - if (!fi->iocachectr) + if (!fi->iocachectr) { wake_up(&fi->direct_io_waitq); + oldfb = fuse_inode_backing_set(fi, NULL); + } spin_unlock(&fi->lock); + if (oldfb) + fuse_backing_put(oldfb); } /* @@ -118,6 +149,7 @@ static int fuse_file_passthrough_open(struct inode *inode, struct file *file) { struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_backing *fb; int err; /* Check allowed conditions for file open in passthrough mode */ @@ -125,11 +157,18 @@ static int fuse_file_passthrough_open(struct inode *inode, struct file *file) (ff->open_flags & ~FOPEN_PASSTHROUGH_MASK)) return -EINVAL; - /* TODO: implement backing file open */ - return -EOPNOTSUPP; + fb = fuse_passthrough_open(file, inode, + ff->args->open_outarg.backing_id); + if (IS_ERR(fb)) + return PTR_ERR(fb); /* First passthrough file open denies caching inode io mode */ - err = fuse_file_uncached_io_start(inode, ff); + err = fuse_file_uncached_io_start(inode, ff, fb); + if (!err) + return 0; + + fuse_passthrough_release(ff, fb); + fuse_backing_put(fb); return err; } @@ -138,6 +177,7 @@ static int fuse_file_passthrough_open(struct inode *inode, struct file *file) int fuse_file_io_open(struct file *file, struct inode *inode) { struct fuse_file *ff = file->private_data; + struct fuse_inode *fi = get_fuse_inode(inode); int err; /* @@ -147,6 +187,14 @@ int fuse_file_io_open(struct file *file, struct inode *inode) if (FUSE_IS_DAX(inode) || !ff->args) return 0; + /* + * Server is expected to use FOPEN_PASSTHROUGH for all opens of an inode + * which is already open for passthrough. 
+ */ + err = -EINVAL; + if (fuse_inode_backing(fi) && !(ff->open_flags & FOPEN_PASSTHROUGH)) + goto fail; + /* * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO. */ diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index 7ec92a54c2e0..dc054d2ab13e 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -8,6 +8,7 @@ #include "fuse_i.h" #include <linux/file.h> +#include <linux/backing-file.h> struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) { @@ -164,3 +165,61 @@ int fuse_backing_close(struct fuse_conn *fc, int backing_id) return err; } + +/* + * Setup passthrough to a backing file. + * + * Returns an fb object with elevated refcount to be stored in fuse inode. + */ +struct fuse_backing *fuse_passthrough_open(struct file *file, + struct inode *inode, + int backing_id) +{ + struct fuse_file *ff = file->private_data; + struct fuse_conn *fc = ff->fm->fc; + struct fuse_backing *fb = NULL; + struct file *backing_file; + int err; + + err = -EINVAL; + if (backing_id <= 0) + goto out; + + rcu_read_lock(); + fb = idr_find(&fc->backing_files_map, backing_id); + fb = fuse_backing_get(fb); + rcu_read_unlock(); + + err = -ENOENT; + if (!fb) + goto out; + + /* Allocate backing file per fuse file to store fuse path */ + backing_file = backing_file_open(&file->f_path, file->f_flags, + &fb->file->f_path, fb->cred); + err = PTR_ERR(backing_file); + if (IS_ERR(backing_file)) { + fuse_backing_put(fb); + goto out; + } + + err = 0; + ff->passthrough = backing_file; + ff->cred = get_cred(fb->cred); +out: + pr_debug("%s: backing_id=%d, fb=0x%p, backing_file=0x%p, err=%i\n", __func__, + backing_id, fb, ff->passthrough, err); + + return err ? ERR_PTR(err) : fb; +} + +void fuse_passthrough_release(struct fuse_file *ff, struct fuse_backing *fb) +{ + pr_debug("%s: fb=0x%p, backing_file=0x%p\n", __func__, + fb, ff->passthrough); + + fput(ff->passthrough); + ff->passthrough = NULL; + put_cred(ff->cred); + ff->cred = NULL; +} -- 2.39.2
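
Note on the io mode accounting in fs/fuse/iomode.c above: a single-threaded
userspace model of how fi->iocachectr and the inode's backing reference
interact in this patch. It is a sketch under stated assumptions, not the
kernel implementation: every model_* name is hypothetical, there is no
locking, and where the kernel sleeps on direct_io_waitq the model returns
-EAGAIN instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for fuse_backing, fuse_inode and fuse_file. */
struct model_backing { int refs; };
enum model_iomode { IOM_NONE, IOM_CACHED, IOM_UNCACHED };
struct model_inode {
	int iocachectr;            /* > 0: cached opens, < 0: uncached users */
	struct model_backing *fb;  /* backing object pinned by the inode */
};
struct model_file { enum model_iomode iomode; };

struct model_backing *model_backing_get(struct model_backing *fb)
{
	fb->refs++;
	return fb;
}

void model_backing_put(struct model_backing *fb)
{
	if (fb)
		fb->refs--;
}

/* Cached open: refused once the inode has a backing file. */
int model_cached_io_start(struct model_inode *fi, struct model_file *ff)
{
	if (fi->fb)
		return -ETXTBSY;
	if (fi->iocachectr < 0)
		return -EAGAIN;   /* model only: the kernel waits here */
	if (ff->iomode == IOM_NONE) {
		ff->iomode = IOM_CACHED;
		fi->iocachectr++;
	}
	return 0;
}

/* Uncached open: the caller passes in one reference on fb (or NULL). */
int model_uncached_io_start(struct model_inode *fi, struct model_file *ff,
			    struct model_backing *fb)
{
	if (fi->fb && fi->fb != fb)
		return -EBUSY;    /* conflicting backing file */
	if (fi->iocachectr > 0)
		return -ETXTBSY;  /* inode is in caching mode */
	assert(ff->iomode == IOM_NONE);
	fi->iocachectr--;
	ff->iomode = IOM_UNCACHED;
	if (!fi->fb)
		fi->fb = fb;              /* inode keeps the caller's reference */
	else
		model_backing_put(fb);    /* inode already holds one */
	return 0;
}

/* Last uncached user drops the inode's backing reference. */
void model_uncached_io_end(struct model_inode *fi, struct model_file *ff)
{
	struct model_backing *old = NULL;

	assert(fi->iocachectr < 0 && ff->iomode == IOM_UNCACHED);
	ff->iomode = IOM_NONE;
	if (++fi->iocachectr == 0) {
		old = fi->fb;
		fi->fb = NULL;
	}
	model_backing_put(old);
}
```

The point of the model is the ownership rule: the inode holds exactly one
reference on the backing object regardless of how many passthrough files
are open, and on failure the caller still owns the reference it passed in.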

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 57e1176e6086673d31bf0a0dc58e144c8e65e589 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Use the backing file read/write helpers to implement read/write passthrough to a backing file. After read/write, we invalidate a/c/mtime/size attributes. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 3 ++ fs/fuse/file.c | 18 +++++++---- fs/fuse/passthrough.c | 69 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 84 insertions(+), 6 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index a00e1be2ab5b..dbf0a755cc5e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1477,4 +1477,7 @@ static inline struct file *fuse_file_passthrough(struct fuse_file *ff) #endif } +ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter); +ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *iter); + #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/file.c b/fs/fuse/file.c index d192c44e8a5d..daa94bc4041f 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1716,10 +1716,13 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (FUSE_IS_DAX(inode)) return fuse_dax_read_iter(iocb, to); - if (!(ff->open_flags & FOPEN_DIRECT_IO)) - return fuse_cache_read_iter(iocb, to); - else + /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */ + if (ff->open_flags & FOPEN_DIRECT_IO) return fuse_direct_read_iter(iocb, to); + else if (fuse_file_passthrough(ff)) + return fuse_passthrough_read_iter(iocb, to); + else + return fuse_cache_read_iter(iocb, to); } static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) @@ -1734,10 +1737,13 @@ 
static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (FUSE_IS_DAX(inode)) return fuse_dax_write_iter(iocb, from); - if (!(ff->open_flags & FOPEN_DIRECT_IO)) - return fuse_cache_write_iter(iocb, from); - else + /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */ + if (ff->open_flags & FOPEN_DIRECT_IO) return fuse_direct_write_iter(iocb, from); + else if (fuse_file_passthrough(ff)) + return fuse_passthrough_write_iter(iocb, from); + else + return fuse_cache_write_iter(iocb, from); } static void fuse_writepage_free(struct fuse_writepage_args *wpa) diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index dc054d2ab13e..0e5d316bdad3 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -10,6 +10,75 @@ #include <linux/file.h> #include <linux/backing-file.h> +static void fuse_file_accessed(struct file *file) +{ + struct inode *inode = file_inode(file); + + fuse_invalidate_atime(inode); +} + +static void fuse_file_modified(struct file *file) +{ + struct inode *inode = file_inode(file); + + fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE); +} + +ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + struct fuse_file *ff = file->private_data; + struct file *backing_file = fuse_file_passthrough(ff); + size_t count = iov_iter_count(iter); + ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ff->cred, + .user_file = file, + .accessed = fuse_file_accessed, + }; + + + pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu\n", __func__, + backing_file, iocb->ki_pos, count); + + if (!count) + return 0; + + ret = backing_file_read_iter(backing_file, iter, iocb, iocb->ki_flags, + &ctx); + + return ret; +} + +ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, + struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct fuse_file *ff = file->private_data; + struct file *backing_file = fuse_file_passthrough(ff); 
+ size_t count = iov_iter_count(iter); + ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ff->cred, + .user_file = file, + .end_write = fuse_file_modified, + }; + + pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu\n", __func__, + backing_file, iocb->ki_pos, count); + + if (!count) + return 0; + + inode_lock(inode); + ret = backing_file_write_iter(backing_file, iter, iocb, iocb->ki_flags, + &ctx); + inode_unlock(inode); + + return ret; +} + struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) { if (fb && refcount_inc_not_zero(&fb->count)) -- 2.39.2
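
Note on the dispatch order in fuse_file_read_iter() and
fuse_file_write_iter() above: a small userspace sketch of the precedence the
patch establishes (DAX first, then FOPEN_DIRECT_IO, then passthrough, then
the page cache). The enum and pick_io_path() are hypothetical names used only
for illustration, and the FOPEN_DIRECT_IO value is an assumption taken from
the uapi header as I recall it.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOPEN_DIRECT_IO (1u << 0)   /* assumed value, see uapi fuse.h */

enum io_path { IO_DAX, IO_DIRECT, IO_PASSTHROUGH, IO_CACHED };

/*
 * Hypothetical helper mirroring the if/else chain in the patch:
 * has_backing_file stands for fuse_file_passthrough(ff) != NULL.
 */
enum io_path pick_io_path(bool is_dax, uint32_t open_flags,
			  bool has_backing_file)
{
	if (is_dax)
		return IO_DAX;
	/* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */
	if (open_flags & FOPEN_DIRECT_IO)
		return IO_DIRECT;
	if (has_backing_file)
		return IO_PASSTHROUGH;
	return IO_CACHED;
}
```

A file opened with FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH therefore still sends
read/write to the server; only its mmap uses the backing file.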

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 5ca73468612d8e0767614992da8decc7f9f48926 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This allows passing fstests generic/249 and generic/591. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 6 ++++++ fs/fuse/file.c | 29 ++++++++++++++++++++++++++-- fs/fuse/passthrough.c | 45 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 78 insertions(+), 2 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index dbf0a755cc5e..e71411a20ad5 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1479,5 +1479,11 @@ static inline struct file *fuse_file_passthrough(struct fuse_file *ff) ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter); ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *iter); +ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags); +ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, + size_t len, unsigned int flags); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/file.c b/fs/fuse/file.c index daa94bc4041f..a4ab68c0143d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1746,6 +1746,31 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) return fuse_cache_write_iter(iocb, from); } +static ssize_t fuse_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + struct fuse_file *ff = in->private_data; + + /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */ + if (fuse_file_passthrough(ff) && !(ff->open_flags & 
FOPEN_DIRECT_IO)) + return fuse_passthrough_splice_read(in, ppos, pipe, len, flags); + else + return filemap_splice_read(in, ppos, pipe, len, flags); +} + +static ssize_t fuse_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + struct fuse_file *ff = out->private_data; + + /* FOPEN_DIRECT_IO overrides FOPEN_PASSTHROUGH */ + if (fuse_file_passthrough(ff) && !(ff->open_flags & FOPEN_DIRECT_IO)) + return fuse_passthrough_splice_write(pipe, out, ppos, len, flags); + else + return iter_file_splice_write(pipe, out, ppos, len, flags); +} + static void fuse_writepage_free(struct fuse_writepage_args *wpa) { struct fuse_args_pages *ap = &wpa->ia.ap; @@ -3332,8 +3357,8 @@ static const struct file_operations fuse_file_operations = { .lock = fuse_file_lock, .get_unmapped_area = thp_get_unmapped_area, .flock = fuse_file_flock, - .splice_read = filemap_splice_read, - .splice_write = iter_file_splice_write, + .splice_read = fuse_splice_read, + .splice_write = fuse_splice_write, .unlocked_ioctl = fuse_file_ioctl, .compat_ioctl = fuse_file_compat_ioctl, .poll = fuse_file_poll, diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index 0e5d316bdad3..2b119c592f02 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -9,6 +9,7 @@ #include <linux/file.h> #include <linux/backing-file.h> +#include <linux/splice.h> static void fuse_file_accessed(struct file *file) { @@ -79,6 +80,50 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, return ret; } +ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags) +{ + struct fuse_file *ff = in->private_data; + struct file *backing_file = fuse_file_passthrough(ff); + struct backing_file_ctx ctx = { + .cred = ff->cred, + .user_file = in, + .accessed = fuse_file_accessed, + }; + + pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu, flags=0x%x\n", __func__, + backing_file, ppos ? 
*ppos : 0, len, flags); + + return backing_file_splice_read(backing_file, ppos, pipe, len, flags, + &ctx); +} + +ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, + size_t len, unsigned int flags) +{ + struct fuse_file *ff = out->private_data; + struct file *backing_file = fuse_file_passthrough(ff); + struct inode *inode = file_inode(out); + ssize_t ret; + struct backing_file_ctx ctx = { + .cred = ff->cred, + .user_file = out, + .end_write = fuse_file_modified, + }; + + pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu, flags=0x%x\n", __func__, + backing_file, ppos ? *ppos : 0, len, flags); + + inode_lock(inode); + ret = backing_file_splice_write(pipe, backing_file, ppos, len, flags, + &ctx); + inode_unlock(inode); + + return ret; +} + struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) { if (fb && refcount_inc_not_zero(&fb->count)) -- 2.39.2
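
Note on the backing_file_ctx callbacks used by the read, write and splice
helpers above: the patches pass .accessed on read paths and .end_write on
write paths so that the FUSE inode's cached attributes are invalidated after
I/O on the backing file. The following is a generic sketch of that callback
pattern in userspace. All names are hypothetical, and the point at which the
real backing-file helpers invoke each callback is an assumption here, not
something this series shows.

```c
#include <stddef.h>

/* Hypothetical stand-in for the FUSE file whose attributes get invalidated. */
struct model_file {
	int atime_valid;
	int size_valid;
};

/* Hypothetical analogue of struct backing_file_ctx. */
struct model_ctx {
	struct model_file *user_file;
	void (*accessed)(struct model_file *f);
	void (*end_write)(struct model_file *f);
};

void model_file_accessed(struct model_file *f) { f->atime_valid = 0; }
void model_file_modified(struct model_file *f) { f->size_valid = 0; }

/* Pretend read on the backing file, then notify the stacked file. */
long model_backing_read(long nbytes, const struct model_ctx *ctx)
{
	if (ctx->accessed)
		ctx->accessed(ctx->user_file);
	return nbytes;
}

/* Pretend write on the backing file, then notify the stacked file. */
long model_backing_write(long nbytes, const struct model_ctx *ctx)
{
	if (ctx->end_write)
		ctx->end_write(ctx->user_file);
	return nbytes;
}
```

Leaving a callback NULL, as the read paths do for .end_write, simply skips
that notification in this model.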

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.9-rc1
commit fda0b98ef0a6a2e3fe328b869d53002c8c82001b
category: feature
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

An mmap request for a file open in passthrough mode maps the memory
directly to the backing file.

An mmap of a file in direct io mode usually uses cached mmap and puts
the inode in caching io mode, which denies new passthrough opens of that
inode, because caching io mode conflicts with passthrough io mode.

For the same reason, trying to mmap a direct io file while there is
a passthrough file open on the same inode will fail with -ENODEV.

An mmap of a file in direct io mode also needs to wait for in-progress
parallel dio writes to complete.

If a passthrough file is opened while an mmap of another direct io file
is waiting for parallel dio writes to complete, the wait is aborted and
mmap fails with -ENODEV.

A FUSE server that uses passthrough and direct io opens on the same inode
that may also be mmaped is advised to provide a backing fd also for the
files that are open in direct io mode (i.e. use the flags combination
FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH), so that mmap will always use the
backing file, even if read/write do not passthrough.

Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 1 + fs/fuse/file.c | 13 ++++++++++--- fs/fuse/passthrough.c | 16 ++++++++++++++++ 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e71411a20ad5..f486de526d0e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1485,5 +1485,6 @@ ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos, ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags); +ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/file.c b/fs/fuse/file.c index a4ab68c0143d..35876050e1df 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2584,14 +2584,21 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) { struct fuse_file *ff = file->private_data; struct fuse_conn *fc = ff->fm->fc; + struct inode *inode = file_inode(file); int rc; /* DAX mmap is superior to direct_io mmap */ - if (FUSE_IS_DAX(file_inode(file))) + if (FUSE_IS_DAX(inode)) return fuse_dax_mmap(file, vma); - /* TODO: implement mmap to backing file */ + /* + * If inode is in passthrough io mode, because it has some file open + * in passthrough mode, either mmap to backing file or fail mmap, + * because mixing cached mmap and passthrough io mode is not allowed. + */ if (fuse_file_passthrough(ff)) + return fuse_passthrough_mmap(file, vma); + else if (fuse_inode_backing(get_fuse_inode(inode))) return -ENODEV; /* @@ -2618,7 +2625,7 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) * Also waits for parallel dio writers to go into serial mode * (exclusive instead of shared lock). 
*/ - rc = fuse_file_cached_io_start(file_inode(file), ff); + rc = fuse_file_cached_io_start(inode, ff); if (rc) return rc; } diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index 2b119c592f02..1567f0323858 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -124,6 +124,22 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, return ret; } +ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct fuse_file *ff = file->private_data; + struct file *backing_file = fuse_file_passthrough(ff); + struct backing_file_ctx ctx = { + .cred = ff->cred, + .user_file = file, + .accessed = fuse_file_accessed, + }; + + pr_debug("%s: backing_file=0x%p, start=%lu, end=%lu\n", __func__, + backing_file, vma->vm_start, vma->vm_end); + + return backing_file_mmap(backing_file, vma, &ctx); +} + struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) { if (fb && refcount_inc_not_zero(&fb->count)) -- 2.39.2
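
Note on the mmap decision in fuse_file_mmap() above: a userspace sketch of
the early checks the patch adds, before the existing FOPEN_DIRECT_IO and
cached mmap handling. pick_mmap_path() and the enum are hypothetical names;
the two booleans stand for fuse_file_passthrough(ff) != NULL and
fuse_inode_backing(fi) != NULL respectively.

```c
#include <errno.h>
#include <stdbool.h>

enum mmap_path { MMAP_DAX, MMAP_BACKING, MMAP_FUSE };

/*
 * Hypothetical helper: returns 0 and sets *path, or -ENODEV when the
 * inode is in passthrough io mode but this file has no backing file.
 */
int pick_mmap_path(bool is_dax, bool file_has_backing,
		   bool inode_has_backing, enum mmap_path *path)
{
	if (is_dax) {
		*path = MMAP_DAX;
		return 0;
	}
	if (file_has_backing) {
		*path = MMAP_BACKING;   /* map the backing file directly */
		return 0;
	}
	if (inode_has_backing)
		return -ENODEV;         /* cached mmap would mix io modes */
	*path = MMAP_FUSE;              /* existing direct_io/cached logic */
	return 0;
}
```

This is why the commit message advises servers to reply with
FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH for direct io opens of an inode that may
be mmaped: such files take the MMAP_BACKING branch instead of failing.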

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc5 commit 4864a6dd8320ad856698f93009c89f66ccb1653f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There is confusion about the fuse_file_uncached_io_{start,end} interface. These helpers do two things when called from passthrough open()/release(): 1. Take/drop a negative refcount of fi->iocachectr (inode uncached io mode) 2. Change the ff->iomode state IOM_NONE <-> IOM_UNCACHED (file uncached open) The calls from the parallel dio write path need to take a reference on fi->iocachectr, but they should not change the ff->iomode state, because in this case the fi->iocachectr reference does not stick around until file release(). Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from the parallel dio write path, and rename the fuse_file_*cached_io_{start,end} helpers to fuse_file_*cached_io_{open,release} to clarify the difference. 
Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/fuse_i.h | 7 +++--- fs/fuse/file.c | 12 ++++++----- fs/fuse/inode.c | 1 + fs/fuse/iomode.c | 56 +++++++++++++++++++++++++++++++++--------------- 4 files changed, 51 insertions(+), 25 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f486de526d0e..59983175b348 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1410,9 +1410,10 @@ int fuse_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); /* iomode.c */ -int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff); -int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struct fuse_backing *fb); -void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff); +int fuse_file_cached_io_open(struct inode *inode, struct fuse_file *ff); +int fuse_inode_uncached_io_start(struct fuse_inode *fi, + struct fuse_backing *fb); +void fuse_inode_uncached_io_end(struct fuse_inode *fi); int fuse_file_io_open(struct file *file, struct inode *inode); void fuse_file_io_release(struct fuse_file *ff, struct inode *inode); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 35876050e1df..ea13f544c57f 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1368,7 +1368,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, bool *exclusive) { struct inode *inode = file_inode(iocb->ki_filp); - struct fuse_file *ff = iocb->ki_filp->private_data; + struct fuse_inode *fi = get_fuse_inode(inode); *exclusive = fuse_dio_wr_exclusive_lock(iocb, from); if (*exclusive) { @@ -1383,7 +1383,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, * have raced, so check it again. 
*/ if (fuse_io_past_eof(iocb, from) || - fuse_file_uncached_io_start(inode, ff, NULL) != 0) { + fuse_inode_uncached_io_start(fi, NULL) != 0) { inode_unlock_shared(inode); inode_lock(inode); *exclusive = true; @@ -1394,13 +1394,13 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive) { struct inode *inode = file_inode(iocb->ki_filp); - struct fuse_file *ff = iocb->ki_filp->private_data; + struct fuse_inode *fi = get_fuse_inode(inode); if (exclusive) { inode_unlock(inode); } else { /* Allow opens in caching mode after last parallel dio end */ - fuse_file_uncached_io_end(inode, ff); + fuse_inode_uncached_io_end(fi); inode_unlock_shared(inode); } } @@ -2624,8 +2624,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) * First mmap of direct_io file enters caching inode io mode. * Also waits for parallel dio writers to go into serial mode * (exclusive instead of shared lock). + * After first mmap, the inode stays in caching io mode until + * the direct_io file release. */ - rc = fuse_file_cached_io_start(inode, ff); + rc = fuse_file_cached_io_open(inode, ff); if (rc) return rc; } diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index d5f9526b76ec..a5617b04ebcb 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -175,6 +175,7 @@ static void fuse_evict_inode(struct inode *inode) } } if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) { + WARN_ON(fi->iocachectr != 0); WARN_ON(!list_empty(&fi->write_files)); WARN_ON(!list_empty(&fi->queued_writes)); } diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c index c653ddcf0578..98f1fd523dae 100644 --- a/fs/fuse/iomode.c +++ b/fs/fuse/iomode.c @@ -21,12 +21,13 @@ static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi) } /* - * Start cached io mode. + * Called on cached file open() and on first mmap() of direct_io file. + * Takes cached_io inode mode reference to be dropped on file release. 
* * Blocks new parallel dio writes and waits for the in-progress parallel dio * writes to complete. */ -int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) +int fuse_file_cached_io_open(struct inode *inode, struct fuse_file *ff) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -67,10 +68,9 @@ int fuse_file_cached_io_start(struct inode *inode, struct fuse_file *ff) return 0; } -static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) +static void fuse_file_cached_io_release(struct fuse_file *ff, + struct fuse_inode *fi) { - struct fuse_inode *fi = get_fuse_inode(inode); - spin_lock(&fi->lock); WARN_ON(fi->iocachectr <= 0); WARN_ON(ff->iomode != IOM_CACHED); @@ -82,9 +82,8 @@ static void fuse_file_cached_io_end(struct inode *inode, struct fuse_file *ff) } /* Start strictly uncached io mode where cache access is not allowed */ -int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struct fuse_backing *fb) +int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb) { - struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_backing *oldfb; int err = 0; @@ -99,9 +98,7 @@ int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struc err = -ETXTBSY; goto unlock; } - WARN_ON(ff->iomode != IOM_NONE); fi->iocachectr--; - ff->iomode = IOM_UNCACHED; /* fuse inode holds a single refcount of backing file */ if (!oldfb) { @@ -115,15 +112,29 @@ int fuse_file_uncached_io_start(struct inode *inode, struct fuse_file *ff, struc return err; } -void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) +/* Takes uncached_io inode mode reference to be dropped on file release */ +static int fuse_file_uncached_io_open(struct inode *inode, + struct fuse_file *ff, + struct fuse_backing *fb) { struct fuse_inode *fi = get_fuse_inode(inode); + int err; + + err = fuse_inode_uncached_io_start(fi, fb); + if (err) + return err; + + WARN_ON(ff->iomode != IOM_NONE); + 
ff->iomode = IOM_UNCACHED; + return 0; +} + +void fuse_inode_uncached_io_end(struct fuse_inode *fi) +{ struct fuse_backing *oldfb = NULL; spin_lock(&fi->lock); WARN_ON(fi->iocachectr >= 0); - WARN_ON(ff->iomode != IOM_UNCACHED); - ff->iomode = IOM_NONE; fi->iocachectr++; if (!fi->iocachectr) { wake_up(&fi->direct_io_waitq); @@ -134,6 +145,15 @@ void fuse_file_uncached_io_end(struct inode *inode, struct fuse_file *ff) fuse_backing_put(oldfb); } +/* Drop uncached_io reference from passthrough open */ +static void fuse_file_uncached_io_release(struct fuse_file *ff, + struct fuse_inode *fi) +{ + WARN_ON(ff->iomode != IOM_UNCACHED); + ff->iomode = IOM_NONE; + fuse_inode_uncached_io_end(fi); +} + /* * Open flags that are allowed in combination with FOPEN_PASSTHROUGH. * A combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO means that read/write @@ -163,7 +183,7 @@ static int fuse_file_passthrough_open(struct inode *inode, struct file *file) return PTR_ERR(fb); /* First passthrough file open denies caching inode io mode */ - err = fuse_file_uncached_io_start(inode, ff, fb); + err = fuse_file_uncached_io_open(inode, ff, fb); if (!err) return 0; @@ -216,7 +236,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode) if (ff->open_flags & FOPEN_PASSTHROUGH) err = fuse_file_passthrough_open(inode, file); else - err = fuse_file_cached_io_start(inode, ff); + err = fuse_file_cached_io_open(inode, ff); if (err) goto fail; @@ -236,8 +256,10 @@ int fuse_file_io_open(struct file *file, struct inode *inode) /* No more pending io and no new io possible to inode via open/mmapped file */ void fuse_file_io_release(struct fuse_file *ff, struct inode *inode) { + struct fuse_inode *fi = get_fuse_inode(inode); + /* - * Last parallel dio close allows caching inode io mode. + * Last passthrough file close allows caching inode io mode. * Last caching file close exits caching inode io mode. 
*/ switch (ff->iomode) { @@ -245,10 +267,10 @@ void fuse_file_io_release(struct fuse_file *ff, struct inode *inode) /* Nothing to do */ break; case IOM_UNCACHED: - fuse_file_uncached_io_end(inode, ff); + fuse_file_uncached_io_release(ff, fi); break; case IOM_CACHED: - fuse_file_cached_io_end(inode, ff); + fuse_file_cached_io_release(ff, fi); break; } } -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc5 commit 7cc911262835419fe469ebfae89891c0e97c62ef category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- A parallel dio write takes a negative refcount of fi->iocachectr and so does an open of a file in passthrough mode. The refcount of passthrough mode is associated with the attach/detach of a fuse_backing object to the fuse inode. For a parallel dio write, the backing file is irrelevant, so the call to fuse_inode_uncached_io_start() passes a NULL fuse_backing object. Passing a NULL fuse_backing results in a false -EBUSY error if the file is already open in passthrough mode. Allow taking a negative fi->iocachectr refcount with a NULL fuse_backing, because it does not conflict with an already attached fuse_backing object. Fixes: 4a90451bbc7f ("fuse: implement open in passthrough mode") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/iomode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c index 98f1fd523dae..c99e285f3183 100644 --- a/fs/fuse/iomode.c +++ b/fs/fuse/iomode.c @@ -90,7 +90,7 @@ int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb) spin_lock(&fi->lock); /* deny conflicting backing files on same fuse inode */ oldfb = fuse_inode_backing(fi); - if (oldfb && oldfb != fb) { + if (fb && oldfb && oldfb != fb) { err = -EBUSY; goto unlock; } @@ -101,7 +101,7 @@ int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb) fi->iocachectr--; /* fuse inode holds a single refcount of backing file */ - if (!oldfb) { + if (fb && !oldfb) { oldfb = fuse_inode_backing_set(fi, fb); WARN_ON_ONCE(oldfb != NULL); } else { -- 2.39.2
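Reviewer note: the counter semantics described in the commit message above can be modelled in userspace. The following is a minimal sketch, not the kernel implementation: `struct backing`, `struct inode_model`, `uncached_io_start()` and `uncached_io_end()` are hypothetical stand-ins, locking and wait queues are omitted, and the `iocachectr > 0` condition for -ETXTBSY is an assumption, because the hunk context does not show it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for fuse_backing and fuse_inode. */
struct backing { int id; };

struct inode_model {
	int iocachectr;      /* > 0: cached users, < 0: uncached users */
	struct backing *fb;  /* backing file attached to the inode, if any */
};

/* Models the checks in fuse_inode_uncached_io_start() after the fix. */
static int uncached_io_start(struct inode_model *fi, struct backing *fb)
{
	/* Deny a conflicting backing file; a NULL fb never conflicts. */
	if (fb && fi->fb && fi->fb != fb)
		return -EBUSY;
	/* Assumed condition: uncached mode is denied while cached io is active. */
	if (fi->iocachectr > 0)
		return -ETXTBSY;
	fi->iocachectr--;
	/* Only a passthrough open with no backing attached yet sets it. */
	if (fb && !fi->fb)
		fi->fb = fb;
	return 0;
}

/* Models fuse_inode_uncached_io_end(): the last user detaches the backing. */
static void uncached_io_end(struct inode_model *fi)
{
	assert(fi->iocachectr < 0);
	fi->iocachectr++;
	if (fi->iocachectr == 0)
		fi->fb = NULL;
}
```

In this model, a parallel dio write (NULL backing) on an inode that is already open in passthrough mode succeeds, whereas the pre-fix condition `oldfb && oldfb != fb` would have returned -EBUSY for it.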

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9 commit aef8acd79f363ced098cd3bcde0a5978a52607ad category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Reject a non-zero padding field in struct fuse_backing_map, so that the interface can be extended in the future. Fixes: 44350256ab94 ("fuse: implement ioctls to manage backing files") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/passthrough.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index 1567f0323858..9666d13884ce 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -225,7 +225,7 @@ int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map) goto out; res = -EINVAL; - if (map->flags) + if (map->flags || map->padding) goto out; file = fget(map->fd); -- 2.39.2
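Reviewer note: the validation added by this patch can be sketched in userspace as follows. `struct backing_map_model` and `validate_backing_map()` are hypothetical names; the field layout (32-bit fd, 32-bit flags, 64-bit padding) is an assumption about the uapi struct and should be checked against include/uapi/linux/fuse.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout of the ioctl argument, for illustration only. */
struct backing_map_model {
	int32_t  fd;
	uint32_t flags;
	uint64_t padding;
};

/* The reserved field is part of the fixed-size ABI of the model. */
static_assert(sizeof(struct backing_map_model) == 16,
	      "model is expected to be 16 bytes");

/* Rejects any bits the kernel does not understand yet. */
static int validate_backing_map(const struct backing_map_model *map)
{
	if (map->flags || map->padding)
		return -EINVAL;
	return 0;
}
```

Rejecting non-zero reserved fields up front means a later kernel can give them a meaning without colliding with callers that passed uninitialized memory.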

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.13-rc7 commit 03f275adb8fbd7b4ebe96a1ad5044d8e602692dc category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The re-factoring of fuse_dir_open() missed the need to invalidate directory inode page cache with open flag FOPEN_KEEP_CACHE. Fixes: 7de64d521bf92 ("fuse: break up fuse_open_common()") Reported-by: Prince Kumar <princer@google.com> Closes: https://lore.kernel.org/linux-fsdevel/CAEW=TRr7CYb4LtsvQPLj-zx5Y+EYBmGfM24Su... Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250101130037.96680-1-amir73il@gmail.com Reviewed-by: Bernd Schubert <bernd.schubert@fastmail.fm> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/dir.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 1ca3d3643d55..2b5567142529 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1660,6 +1660,8 @@ static int fuse_dir_open(struct inode *inode, struct file *file) */ if (ff->open_flags & (FOPEN_STREAM | FOPEN_NONSEEKABLE)) nonseekable_open(inode, file); + if (!(ff->open_flags & FOPEN_KEEP_CACHE)) + invalidate_inode_pages2(inode->i_mapping); } return err; -- 2.39.2

From: yangyun <yangyun50@huawei.com> mainline inclusion from mainline-v6.12-rc1 commit 2f3d8ff457982f4055fe8f7bf19d3821ba22c376 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Returning false here may be a typo. The comment says that shared locks are not allowed when this bit is set. If a shared lock is used, the wait in `fuse_file_cached_io_open` may last forever. Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") CC: stable@vger.kernel.org # v6.9 Signed-off-by: yangyun <yangyun50@huawei.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index ea13f544c57f..25d6951ef2c0 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1355,7 +1355,7 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from /* shared locks are not allowed with parallel page cache IO */ if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state)) - return false; + return true; /* Parallel dio beyond EOF is not supported, at least for now. */ if (fuse_io_past_eof(iocb, from)) -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.7-rc7 commit 413ba91089c74207313b315e04cf381ffb5b20e4 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- syzbot exercised the forbidden practice of moving the workdir under lowerdir while overlayfs is mounted and tripped a dentry reference leak. Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held") Reported-and-tested-by: syzbot+8608bb4553edb8c78f41@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/copy_up.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index b8119520300d..b6573fa863e9 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -776,15 +776,16 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) path.dentry = temp; err = ovl_copy_up_data(c, &path); /* - * We cannot hold lock_rename() throughout this helper, because or + * We cannot hold lock_rename() throughout this helper, because of * lock ordering with sb_writers, which shouldn't be held when calling * ovl_copy_up_data(), so lock workdir and destdir and make sure that * temp wasn't moved before copy up completion or cleanup. - * If temp was moved, abort without the cleanup. */ ovl_start_write(c->dentry); if (lock_rename(c->workdir, c->destdir) != NULL || temp->d_parent != c->workdir) { + /* temp or workdir moved underneath us? abort without cleanup */ + dput(temp); err = -EIO; goto unlock; } else if (err) { -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 77a28aa476873048024ad56daf8f4f17d58ee48e category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- syzbot hit an assertion in copy up data loop which looks like it is the result of a lower file whose size is being changed underneath overlayfs. This type of use case is documented to cause undefined behavior, so returning EIO error for the copy up makes sense, but it should not be causing a WARN_ON assertion. Reported-and-tested-by: syzbot+3abd99031b42acf367ef@syzkaller.appspotmail.com Fixes: ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/overlayfs/copy_up.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index b6573fa863e9..5da41eba056d 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -234,11 +234,11 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen) { loff_t tmp; - if (WARN_ON_ONCE(pos != pos2)) + if (pos != pos2) return -EIO; - if (WARN_ON_ONCE(pos < 0 || len < 0 || totlen < 0)) + if (pos < 0 || len < 0 || totlen < 0) return -EIO; - if (WARN_ON_ONCE(check_add_overflow(pos, len, &tmp))) + if (check_add_overflow(pos, len, &tmp)) return -EIO; return 0; } -- 2.39.2
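Reviewer note: the behaviour of ovl_verify_area() after this patch (plain -EIO, no warning) can be approximated in userspace. This is a sketch under assumptions: `verify_area()` is a hypothetical name, `int64_t` stands in for `loff_t`, and the GCC/Clang builtin `__builtin_add_overflow()` stands in for the kernel's `check_add_overflow()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Userspace model of the three sanity checks. Every failure is
 * reported quietly as -EIO, as in the patched kernel function.
 */
static int verify_area(int64_t pos, int64_t pos2, int64_t len, int64_t totlen)
{
	int64_t tmp;

	/* The position must not have changed behind the caller's back. */
	if (pos != pos2)
		return -EIO;
	/* Negative offsets or lengths are never valid. */
	if (pos < 0 || len < 0 || totlen < 0)
		return -EIO;
	/* pos + len must be representable in a signed 64-bit offset. */
	if (__builtin_add_overflow(pos, len, &tmp))
		return -EIO;
	return 0;
}
```

The point of the patch is that these conditions are reachable from userspace (a lower file changing size underneath overlayfs), so they are treated as ordinary errors and not as kernel bugs.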

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.8-rc5 commit 853b8d7597eea4ccaaefbcf0942cd42fc86d542a category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- commit dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") moved the permission hooks from do_clone_file_range() out to its caller vfs_clone_file_range(), but left all the fast sanity checks in do_clone_file_range(). This makes the expensive security hooks be called in situations that they would not have been called before (e.g. fs does not support clone). The only reason for the do_clone_file_range() helper was that overlayfs did not use to be able to call vfs_clone_file_range() from copy up context with sb_writers lock held. However, since commit c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held"), overlayfs just uses an open coded version of vfs_clone_file_range(). Merge do_clone_file_range() into vfs_clone_file_range(), restoring the original order of checks as it was before the regressing commit and adapt the overlayfs code to call vfs_clone_file_range() before the permission hooks that were added by commit ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()"). Note that in the merge of do_clone_file_range(), the file_start_write() context was reduced to cover ->remap_file_range() without holding it over the permission hooks, which was the reason for doing the regressing commit in the first place. 
Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com Fixes: dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/fs.h | 3 --- fs/overlayfs/copy_up.c | 14 ++++++-------- fs/remap_range.c | 31 +++++++++---------------------- 3 files changed, 15 insertions(+), 33 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 9dc42354a781..d2b0534fe85f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2166,9 +2166,6 @@ int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *count, unsigned int remap_flags); -extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t len, unsigned int remap_flags); extern loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 5da41eba056d..d02b4e3ae724 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -265,20 +265,18 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry, if (IS_ERR(old_file)) return PTR_ERR(old_file); + /* Try to use clone_file_range to clone up within the same fs */ + cloned = vfs_clone_file_range(old_file, 0, new_file, 0, len, 0); + if (cloned == len) + goto out_fput; + + /* Couldn't clone, so now we try to copy the data */ error = rw_verify_area(READ, old_file, &old_pos, len); if (!error) error = 
rw_verify_area(WRITE, new_file, &new_pos, len); if (error) goto out_fput; - /* Try to use clone_file_range to clone up within the same fs */ - ovl_start_write(dentry); - cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0); - ovl_end_write(dentry); - if (cloned == len) - goto out_fput; - /* Couldn't clone, so now we try to copy the data */ - /* Check if lower fs supports seek operation */ if (old_file->f_mode & FMODE_LSEEK) skip_hole = true; diff --git a/fs/remap_range.c b/fs/remap_range.c index 12131f2a6c9e..cd4dbc11125f 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -367,9 +367,9 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, } EXPORT_SYMBOL(generic_remap_file_range_prep); -loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t len, unsigned int remap_flags) +loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t len, unsigned int remap_flags) { loff_t ret; @@ -385,23 +385,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, if (!file_in->f_op->remap_file_range) return -EOPNOTSUPP; - ret = file_in->f_op->remap_file_range(file_in, pos_in, - file_out, pos_out, len, remap_flags); - if (ret < 0) - return ret; - - fsnotify_access(file_in); - fsnotify_modify(file_out); - return ret; -} -EXPORT_SYMBOL(do_clone_file_range); - -loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t len, unsigned int remap_flags) -{ - loff_t ret; - ret = remap_verify_area(file_in, pos_in, len, false); if (ret) return ret; @@ -411,10 +394,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in, return ret; file_start_write(file_out); - ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len, - remap_flags); + ret = file_in->f_op->remap_file_range(file_in, pos_in, + file_out, pos_out, len, remap_flags); file_end_write(file_out); + 
if (ret < 0) + return ret; + fsnotify_access(file_in); + fsnotify_modify(file_out); return ret; } EXPORT_SYMBOL(vfs_clone_file_range); -- 2.39.2

From: Vegard Nossum <vegard.nossum@oracle.com> mainline inclusion from mainline-v6.8-rc1 commit c39e2ae3943d4ee278af4e1b1dcfd5946da1089b category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When running 'make htmldocs', I see the following warning: Documentation/filesystems/api-summary:14: ./include/linux/fs.h:1659: WARNING: Definition list ends without a blank line; unexpected unindent. The official guidance [1] seems to be to use lists, which will prevent both the "unexpected unindent" warning as well as ensure that each line is formatted on a separate line in the HTML output instead of being all considered a single paragraph. [1]: https://docs.kernel.org/doc-guide/kernel-doc.html#return-values Fixes: 8802e580ee64 ("fs: create __sb_write_started() helper") Cc: Amir Goldstein <amir73il@gmail.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Link: https://lore.kernel.org/r/20231228100608.3123987-1-vegard.nossum@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/fs.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index d2b0534fe85f..d0a1537049d7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1696,9 +1696,9 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level) * @sb: the super we write to * @level: the freeze level * - * > 0 sb freeze level is held - * 0 sb freeze level is not held - * < 0 !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN + * * > 0 - sb freeze level is held + * * 0 - sb freeze level is not held + * * < 0 - !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN */ static inline int __sb_write_started(const struct super_block *sb, int level) { -- 2.39.2

From: Bernd Schubert <bschubert@ddn.com> mainline inclusion from mainline-v6.11-rc7 commit 3ab394b363c5fd14b231e335fb6746ddfb93aaaa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The current design and handling of passthrough works without fuse caching, which conflicts with FUSE_WRITEBACK_CACHE. Fixes: 7dc4e97a4f9a ("fuse: introduce FUSE_PASSTHROUGH capability") Cc: stable@kernel.org # v6.9 Signed-off-by: Bernd Schubert <bschubert@ddn.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- fs/fuse/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index a5617b04ebcb..d771ce993e3e 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1333,11 +1333,16 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args, * on a stacked fs (e.g. overlayfs) themselves and with * max_stack_depth == 1, FUSE fs can be stacked as the * underlying fs of a stacked fs (e.g. overlayfs). + * + * Also don't allow the combination of FUSE_PASSTHROUGH + * and FUSE_WRITEBACK_CACHE, current design doesn't handle + * them together. */ if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) && (flags & FUSE_PASSTHROUGH) && arg->max_stack_depth > 0 && - arg->max_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH) { + arg->max_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH && + !(flags & FUSE_WRITEBACK_CACHE)) { fc->passthrough = 1; fc->max_stack_depth = arg->max_stack_depth; fm->sb->s_stack_depth = arg->max_stack_depth; -- 2.39.2
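Reviewer note: the negotiation condition after this patch can be written as a small predicate. This is a sketch only: the `MODEL_*` constants are hypothetical and their values are illustrative, not the real FUSE_* or FILESYSTEM_MAX_STACK_DEPTH values, and the CONFIG_FUSE_PASSTHROUGH build-time check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits and limit, chosen for illustration only. */
#define MODEL_WRITEBACK_CACHE  (UINT64_C(1) << 0)
#define MODEL_PASSTHROUGH      (UINT64_C(1) << 1)
#define MODEL_MAX_STACK_DEPTH  2u

/*
 * Models the condition in process_init_reply(): passthrough is only
 * enabled when requested, with a valid stack depth, and without
 * writeback cache.
 */
static bool passthrough_enabled(uint64_t flags, uint32_t max_stack_depth)
{
	return (flags & MODEL_PASSTHROUGH) &&
	       max_stack_depth > 0 &&
	       max_stack_depth <= MODEL_MAX_STACK_DEPTH &&
	       !(flags & MODEL_WRITEBACK_CACHE);
}
```

A server that requests both capabilities therefore still mounts, but, per the hunk above, `fc->passthrough` is left unset.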

From: Amir Goldstein <amir73il@gmail.com> mainline inclusion from mainline-v6.12-rc5 commit f03b296e8b516dbd63f57fc9056c1b0da1b9a0ff category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- This is needed for extending fuse inode size after fuse passthrough write. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-fsdevel/CAJfpegs=cvZ_NYy6Q_D42XhYS=Sjj5poM1b5T... Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> --- include/linux/backing-file.h | 2 +- fs/backing-file.c | 8 ++++---- fs/fuse/passthrough.c | 6 +++--- fs/overlayfs/file.c | 9 +++++++-- 4 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h index 3f1fe1774f1b..98e0b6c30193 100644 --- a/include/linux/backing-file.h +++ b/include/linux/backing-file.h @@ -16,7 +16,7 @@ struct backing_file_ctx { const struct cred *cred; struct file *user_file; void (*accessed)(struct file *); - void (*end_write)(struct file *); + void (*end_write)(struct file *, loff_t, ssize_t); }; struct file *backing_file_open(const struct path *user_path, int flags, diff --git a/fs/backing-file.c b/fs/backing-file.c index a681f38d84d8..247fd7217b42 100644 --- a/fs/backing-file.c +++ b/fs/backing-file.c @@ -57,7 +57,7 @@ struct backing_aio { refcount_t ref; struct kiocb *orig_iocb; /* used for aio completion */ - void (*end_write)(struct file *); + void (*end_write)(struct file *, loff_t, ssize_t); struct work_struct work; long res; }; @@ -86,7 +86,7 @@ static void backing_aio_cleanup(struct backing_aio *aio, long res) struct kiocb *orig_iocb = aio->orig_iocb; if (aio->end_write) - aio->end_write(orig_iocb->ki_filp); + aio->end_write(orig_iocb->ki_filp, iocb->ki_pos, res); orig_iocb->ki_pos = 
iocb->ki_pos; backing_aio_put(aio); @@ -216,7 +216,7 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter, ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf); if (ctx->end_write) - ctx->end_write(ctx->user_file); + ctx->end_write(ctx->user_file, iocb->ki_pos, ret); } else { struct backing_aio *aio; @@ -291,7 +291,7 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe, revert_creds(old_cred); if (ctx->end_write) - ctx->end_write(ctx->user_file); + ctx->end_write(ctx->user_file, ppos ? *ppos : 0, ret); return ret; } diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index 9666d13884ce..f0f87d1c9a94 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -18,7 +18,7 @@ static void fuse_file_accessed(struct file *file) fuse_invalidate_atime(inode); } -static void fuse_file_modified(struct file *file) +static void fuse_passthrough_end_write(struct file *file, loff_t pos, ssize_t ret) { struct inode *inode = file_inode(file); @@ -63,7 +63,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct backing_file_ctx ctx = { .cred = ff->cred, .user_file = file, - .end_write = fuse_file_modified, + .end_write = fuse_passthrough_end_write, }; pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu\n", __func__, @@ -110,7 +110,7 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, struct backing_file_ctx ctx = { .cred = ff->cred, .user_file = out, - .end_write = fuse_file_modified, + .end_write = fuse_passthrough_end_write, }; pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu, flags=0x%x\n", __func__, diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 63f4b22ee955..5c5bf9822f27 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -232,6 +232,11 @@ static void ovl_file_modified(struct file *file) ovl_copyattr(file_inode(file)); } +static void ovl_file_end_write(struct file *file, loff_t pos, ssize_t ret) +{ + ovl_file_modified(file); +} + static void ovl_file_accessed(struct file 
*file) { struct inode *inode, *upperinode; @@ -292,7 +297,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) struct backing_file_ctx ctx = { .cred = ovl_creds(inode->i_sb), .user_file = file, - .end_write = ovl_file_modified, + .end_write = ovl_file_end_write, }; if (!iov_iter_count(iter)) @@ -362,7 +367,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out, struct backing_file_ctx ctx = { .cred = ovl_creds(inode->i_sb), .user_file = out, - .end_write = ovl_file_modified, + .end_write = ovl_file_end_write, }; inode_lock(inode); -- 2.39.2

From: Amir Goldstein <amir73il@gmail.com>

mainline inclusion
from mainline-v6.12-rc5
commit 20121d3f58f06e977ca43eb6efe1fb23b1d2f6d9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

yangyun reported that libfuse test test_copy_file_range() copies zero
bytes from a newly written file when fuse passthrough is enabled.

The reason is that extending passthrough write is not updating the fuse
inode size and when vfs_copy_file_range() observes a zero size inode,
it returns without calling the filesystem copy_file_range() method.

Fix this by adjusting the fuse inode size after an extending passthrough
write.

This does not provide cache coherency of fuse inode attributes and
backing inode attributes, but it should prevent situations where fuse
inode size is too small, causing read/copy to be wrongly shortened.

Reported-by: yangyun <yangyun50@huawei.com>
Closes: https://github.com/libfuse/libfuse/issues/1048
Fixes: 57e1176e6086 ("fuse: implement read/write passthrough")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/fuse/passthrough.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index f0f87d1c9a94..d1b570d39501 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -22,7 +22,7 @@ static void fuse_passthrough_end_write(struct file *file, loff_t pos, ssize_t re
 {
 	struct inode *inode = file_inode(file);
 
-	fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
+	fuse_write_update_attr(inode, pos, ret);
 }
 
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter)
-- 
2.39.2
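For readers following the reasoning in the commit message above, here is a
small userspace model of the failure and the fix. It is only a sketch under
stated assumptions, not kernel code: struct model_inode, model_end_write() and
model_copy_len() are hypothetical names, and the copy-length clamp is a
simplification of what the message describes vfs_copy_file_range() doing when
it observes a too-small inode size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached fuse inode size; not a kernel type. */
struct model_inode {
	int64_t size;
};

/*
 * Model of the end_write step after a passthrough write. 'pos' is the file
 * position after the write and 'written' is the write's return value.
 * The cached size grows only if the write succeeded and ended past the
 * current end of file. Returns true if the cached size changed.
 */
static bool model_end_write(struct model_inode *inode, int64_t pos,
			    int64_t written)
{
	assert(inode != NULL);
	if (written > 0 && pos > inode->size) {
		inode->size = pos;
		return true;
	}
	return false;
}

/*
 * Simplified model of the length check done before a copy: the request is
 * clamped to the cached size of the source, so a stale size of zero means
 * zero bytes are copied no matter what the backing file contains.
 */
static int64_t model_copy_len(const struct model_inode *src, int64_t pos,
			      int64_t len)
{
	assert(src != NULL);
	if (pos >= src->size)
		return 0;
	return len < src->size - pos ? len : src->size - pos;
}
```

With a cached size of 0, model_copy_len() returns 0 for any request, which
mirrors the reported symptom. After model_end_write(&ino, 4096, 4096) the same
request is clamped to 4096 instead. A failed write (negative 'written') or a
write that ends inside the file leaves the cached size untouched.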

From: Ed Tsai <ed.tsai@mediatek.com>

mainline inclusion
from mainline-v6.11-rc6
commit 996b37da1e0f51314d4186b326742c2a95a9f0dd
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Filesystems may define their own splice write. Therefore, use the file
fops instead of invoking iter_file_splice_write() directly.

Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>
Link: https://lore.kernel.org/r/20240708072208.25244-1-ed.tsai@mediatek.com
Fixes: 5ca73468612d ("fuse: implement splice read/write passthrough")
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>
---
 fs/backing-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index 247fd7217b42..e1a4b23b255b 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -280,13 +280,16 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
 	if (WARN_ON_ONCE(!(out->f_mode & FMODE_BACKING)))
 		return -EIO;
 
+	if (!out->f_op->splice_write)
+		return -EINVAL;
+
 	ret = file_remove_privs(ctx->user_file);
 	if (ret)
 		return ret;
 
 	old_cred = override_creds(ctx->cred);
 	file_start_write(out);
-	ret = iter_file_splice_write(pipe, out, ppos, len, flags);
+	ret = out->f_op->splice_write(pipe, out, ppos, len, flags);
 	file_end_write(out);
 	revert_creds(old_cred);
 
-- 
2.39.2
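The dispatch pattern this patch switches to can be illustrated in plain
userspace C. This is a sketch only: struct model_file, struct model_file_ops
and the model_* functions are hypothetical and heavily reduced, and the only
behavior taken from the patch is "return -EINVAL when the file has no
splice_write method, otherwise call the file's own method".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_file;

/* Hypothetical, reduced operations table with a single optional method. */
struct model_file_ops {
	long (*splice_write)(struct model_file *out, size_t len);
};

/* Hypothetical file object: an ops pointer plus a byte counter. */
struct model_file {
	const struct model_file_ops *ops;
	size_t written;
};

/*
 * Dispatch through the file's own method instead of hard-coding one
 * implementation. A file without a splice_write method is rejected.
 */
static long model_backing_splice_write(struct model_file *out, size_t len)
{
	assert(out != NULL && out->ops != NULL);
	if (!out->ops->splice_write)
		return -EINVAL;
	return out->ops->splice_write(out, len);
}

/* Example of a filesystem-specific method that the dispatcher will pick up. */
static long model_custom_splice_write(struct model_file *out, size_t len)
{
	out->written += len;
	return (long)len;
}
```

A file whose ops table sets splice_write to model_custom_splice_write has that
function called by the dispatcher. A file whose table leaves the pointer NULL
gets -EINVAL back instead of silently falling through to a generic
implementation it never asked for.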

Feedback: The patch(es) that you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request.
Pull request link: https://gitee.com/openeuler/kernel/pulls/16136
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/KFF...