Changes since v1:
* Corrected patch header format.

This patch set enables erofs/fscache/cachefiles to support on-demand mode,
including readahead, domain sharing, failover, etc.

In addition to patches from the open source community, we have made some
compatibility modifications and quality hardening for this feature:
* Added dynamic switches for erofs_enabled and cachefiles_ondemand_enabled.
* Changed the cache file directory structure to match the mainline one, so
  that an existing cache can still be found after a kernel upgrade or
  downgrade.
* Used the same xattrs as the mainline, preventing cache invalidation caused
  by a kernel upgrade or downgrade.
* Various bugfixes...
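For readers new to the on-demand protocol, here is a rough sketch of the
user-daemon side. It follows the upstream UAPI that this series backports
(include/uapi/linux/cachefiles.h in the diffstat below); the helpers
remember_fd(), anon_fd_of(), fetch_object_size() and fill_range() are
hypothetical stand-ins for the daemon's own bookkeeping and fetch backend,
and the backport may differ in detail, so treat this as an illustration
rather than the exact interface:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/cachefiles.h>

/* hypothetical helpers provided by a real daemon implementation */
extern void remember_fd(unsigned int object_id, int fd);
extern int anon_fd_of(unsigned int object_id);
extern unsigned long long fetch_object_size(struct cachefiles_open *load);
extern void fill_range(int fd, unsigned long long off, unsigned long long len);

/* single-threaded skeleton, no error handling */
int main(void)
{
	char buf[4096], cmd[64];
	int devfd = open("/dev/cachefiles", O_RDWR);

	/* configure the cache, then bind it in on-demand mode */
	write(devfd, "dir /var/cache/fscache", 22);
	write(devfd, "tag mycache", 11);
	write(devfd, "bind ondemand", 13);

	for (;;) {
		struct cachefiles_msg *msg = (void *)buf;

		read(devfd, buf, sizeof(buf));	/* blocks until a request arrives */
		switch (msg->opcode) {
		case CACHEFILES_OP_OPEN: {
			/* load->fd is the anon fd the kernel installed for this
			 * object; remember it, then answer with the object size */
			struct cachefiles_open *load = (void *)msg->data;

			remember_fd(msg->object_id, load->fd);
			snprintf(cmd, sizeof(cmd), "copen %u,%llu",
				 msg->msg_id, fetch_object_size(load));
			write(devfd, cmd, strlen(cmd));
			break;
		}
		case CACHEFILES_OP_READ: {
			/* fetch [off, off + len) into the cache file via the
			 * anon fd, then tell the kernel the range is ready */
			struct cachefiles_read *req = (void *)msg->data;
			int fd = anon_fd_of(msg->object_id);

			fill_range(fd, req->off, req->len);
			ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg->msg_id);
			break;
		}
		case CACHEFILES_OP_CLOSE:
			close(anon_fd_of(msg->object_id));
			break;
		}
	}
}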
Al Viro (1):
  erofs: fix handling kern_mount() failure

Baokun Li (29):
  openeuler_defconfig: enable erofs ondemand for x86 and arm64
  fscache: fix reference count leakage during abort init
  fscache: fix assertion failure in fscache_put_object()
  erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
  erofs: remove erofs_fscache_netfs
  fscache: rename new_location to new_version
  cachefiles: use mainline xattr in ondemand mode
  cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  cachefiles: add consistency check for copen/cread
  cachefiles: stop sending new request when dropping object
  cachefiles: flush all requests for the object that is being dropped
  fscache: limit fscache_object_max_active to avoid blocking
  cachefiles: add spin_lock for cachefiles_ondemand_info
  cachefiles: never get a new anon fd if ondemand_id is valid
  cachefiles: make on-demand read killable
  fscache: set the default value of object_max_active to 256
  cachefiles: flush all requests after setting CACHEFILES_DEAD
  cachefiles: call cachefiles_ondemand_init_object() out of dir inode lock
  cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  cachefiles: disallow to complete open requests with uninitialised ondemand_id
  cachefiles: clear FSCACHE_COOKIE_NO_DATA_YET for the new ondemand object
  erofs: correct the blknr/blkoff calculation in erofs_read_raw_page()
  fscache: fix op leak due to abort init after parent ready
  cachefiles: cyclic allocation of msg_id to avoid reuse
  fscache: fix assertion failure in cachefiles_put_object()
  cachefiles: prefault in user pages to aovid ABBA deadlock

David Howells (1):
  cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode

Dawei Li (1):
  erofs: protect s_inodes with s_inode_list_lock for fscache

Gao Xiang (26):
  erofs: clean up file headers & footers
  erofs: introduce chunk-based file on-disk format
  erofs: support reading chunk-based uncompressed files
  erofs: fix double free of 'copied'
  erofs: introduce erofs_sb_has_xxx() helpers
  erofs: decouple basic mount options from fs_context
  erofs: add multiple device support
  erofs: clean up erofs_map_blocks tracepoints
  erofs: introduce meta buffer operations
  erofs: use meta buffers for inode operations
  erofs: use meta buffers for super operations
  erofs: use meta buffers for xattr operations
  erofs: use meta buffers for zmap operations
  erofs: fix misbehavior of unsupported chunk format check
  erofs: add fscache mode check helper
  erofs: register fscache volume
  erofs: add fscache context helper functions
  erofs: add anonymous inode caching metadata for data blobs
  erofs: register fscache context for primary data blob
  erofs: register fscache context for extra data blobs
  erofs: implement fscache-based metadata read
  erofs: implement fscache-based data read for non-inline layout
  erofs: implement fscache-based data read for inline layout
  erofs: add 'fsid' mount option
  erofs: scan devices from device table
  erofs: fix order >= MAX_ORDER warning due to crafted negative i_size

Hou Tao (2):
  erofs: check the uniqueness of fsid in shared domain in advance
  cachefiles: flush ondemand_object_worker during clean object

Jeffle Xu (9):
  erofs: use meta buffers for erofs_read_superblock()
  cachefiles: notify the user daemon when looking up cookie
  cachefiles: unbind cachefiles gracefully in on-demand mode
  cachefiles: notify the user daemon when withdrawing cookie
  cachefiles: implement on-demand read
  cachefiles: enable on-demand read mode
  anolis: cachefiles: skip check for n_children in on-demand mode
  erofs: make erofs_map_blocks() generally available
  erofs: leave compressed inodes unsupported in fscache mode for now

Jia Zhu (12):
  cachefiles: narrow the scope of flushed requests when releasing fd
  erofs: use kill_anon_super() to kill super in fscache mode
  erofs: code clean up for fscache
  erofs: introduce fscache-based domain
  erofs: introduce a pseudo mnt to manage shared cookies
  erofs: Support sharing cookies in the same domain
  erofs: introduce 'domain_id' mount option
  anolis: cachefiles: introduce object ondemand state
  anolis: cachefiles: extract ondemand info field from cachefiles_object
  anolis: cachefiles: resend an open request if the read request's object is closed
  anolis: cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  anolis: cachefiles: add restore command to recover inflight ondemand read requests

Jingbo Xu (18):
  anolis: cachefiles: replace BUG_ON() with WARN_ON()
  anolis: cachefiles: fix volume key setup for cachefiles_open
  anolis: cachefiles: refactor cachefiles_ondemand_daemon_read()
  anolis: erofs: fix the name of erofs_fscache_super_index_def
  anolis: cachefiles: maintain a file descriptor to the backing file
  anolis: fscache,cachefiles: add fscache_prepare_read() helper
  anolis: erofs: implement fscache-based data readahead
  erofs: fix use-after-free of fsid and domain_id string
  erofs: remove unused EROFS_GET_BLOCKS_RAW flag
  anolis: cachefiles: optimize on-demand IO path with buffer IO
  anolis: fscache: export fscache_object_wq
  anolis: cachefiles: reset object->private to NULL when it's freed
  anolis: cachefiles: add missing lock protection when polling
  anolis: cachefiles: fix potential NULL in error path
  erofs: relinquish volume with mutex held
  erofs: maintain cookies of share domain in self-contained list
  erofs: remove unused device mapping in meta routine
  erofs: unify anonymous inodes for blob

Mikulas Patocka (1):
  wait_on_bit: add an acquire memory barrier

Sun Ke (1):
  cachefiles: fix error return code in cachefiles_ondemand_copen()

Xin Yin (1):
  cachefiles: make on-demand request distribution fairer

Yu Kuai (10):
  fscache: add new helper to determine cachefile location
  fscache: generate new key_hash for new location
  cachefiles: factor out helper to generate acc from cachefiles_cook_key()
  cachefiles: factor out helper to generate csum from cachefiles_cook_key()
  cachefiles: use volume key directly in cachefiles_cook_key() for new location
  cachefiles: skip acc in cachefiles_cook_key() for new location
  cachefiles: skip volum csum cachefiles_cook_key() for new location
  cachefiles: use key_hash as csum for new location
  cachefiles: handle the unprintable case for new location in cachefiles_cook_key()
  erofs: switch to use new location

Yue Hu (1):
  erofs: don't use erofs_map_blocks() any more

Zizhi Wo (20):
  erofs: fix page unlock advance during readahead
  erofs: add erofs switch to better control it
  erofs: add erofs_ondemand switch
  fscache: fix kernel BUG at __fscache_read_or_alloc_page
  fscache: add a waiting mechanism when duplicate cookies are detected
  fscache: Fix trace UAF in fscache_cookie_put()
  cachefiles: Fix NULL pointer dereference in object->file
  cachefiles: Restrict monitor calls to read_page
  cachefiles: Fix cookie reference count leakage error
  fscache: Add the unhash_cookie mechanism to fscache_drop_object()
  cachefiles: Add restrictions to cachefiles_daemon_cull()
  cachefiles: Change the mark inactive sequence in erofs ondemand mode
  cachefiles: Set object to close if ondemand_id < 0 in copen
  s390: provide arch_test_bit_acquire() for architecture s390
  wait_on_bit: Add wait_on_bit_acquire() to provide memory barrier
  fscache: add a memory barrier for FSCACHE_COOKIE_LOOKING_UP
  fscache/cachefiles: add a memory barrier for waking and waiting
  fscache/cachefiles: add a memory barrier for page_write
  fscache: modify fscache_hash_cookie() to enhance security
  cachefiles: modify inappropriate error return value in cachefiles_daemon_secctx
 Documentation/atomic_bitops.txt          |  10 +-
 Documentation/filesystems/erofs.rst      |  28 +-
 arch/arm64/configs/openeuler_defconfig   |   9 +-
 arch/s390/include/asm/bitops.h           |   7 +
 arch/x86/configs/openeuler_defconfig     |   9 +-
 arch/x86/include/asm/bitops.h            |  21 +
 fs/Makefile                              |   2 +
 fs/cachefiles/Kconfig                    |  12 +
 fs/cachefiles/Makefile                   |   1 +
 fs/cachefiles/bind.c                     |  28 +-
 fs/cachefiles/daemon.c                   | 134 +++-
 fs/cachefiles/interface.c                |  34 +-
 fs/cachefiles/internal.h                 | 132 ++-
 fs/cachefiles/key.c                      | 167 +++-
 fs/cachefiles/namei.c                    |  45 +-
 fs/cachefiles/ondemand.c                 | 752 ++++++++++++++++++
 fs/cachefiles/rdwr.c                     | 201 ++++-
 fs/cachefiles/xattr.c                    | 179 ++++-
 fs/erofs/Kconfig                         |  35 +-
 fs/erofs/Makefile                        |   2 +-
 fs/erofs/compress.h                      |   2 -
 fs/erofs/data.c                          | 259 ++++--
 fs/erofs/decompressor.c                  |   5 +-
 fs/erofs/dir.c                           |   2 -
 fs/erofs/erofs_fs.h                      |  68 +-
 fs/erofs/fscache.c                       | 568 +++++++++++++
 fs/erofs/inode.c                         |  98 ++-
 fs/erofs/internal.h                      | 161 +++-
 fs/erofs/namei.c                         |   2 -
 fs/erofs/super.c                         | 353 ++++++--
 fs/erofs/tagptr.h                        |   3 -
 fs/erofs/utils.c                         |   2 -
 fs/erofs/xattr.c                         | 141 +---
 fs/erofs/xattr.h                         |   2 -
 fs/erofs/zdata.c                         |  35 +-
 fs/erofs/zdata.h                         |   1 -
 fs/erofs/zmap.c                          |  48 +-
 fs/erofs/zpvec.h                         |   2 -
 fs/fs_ctl.c                              |  43 +
 fs/fscache/cookie.c                      | 149 ++--
 fs/fscache/internal.h                    |   1 +
 fs/fscache/main.c                        |  17 +-
 fs/fscache/object.c                      |  39 +-
 fs/fscache/page.c                        |  90 ++-
 .../bitops/instrumented-non-atomic.h     |  12 +
 include/asm-generic/bitops/non-atomic.h  |  14 +
 include/linux/fs.h                       |  17 +
 include/linux/fscache-cache.h            |   8 +
 include/linux/fscache.h                  |  60 +-
 include/linux/wait_bit.h                 |  25 +
 include/trace/events/cachefiles.h        |  10 +-
 include/trace/events/erofs.h             |   7 +-
 include/uapi/linux/cachefiles.h          |  69 ++
 kernel/sched/wait_bit.c                  |  40 +
 lib/radix-tree.c                         |   1 +
 55 files changed, 3614 insertions(+), 548 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 fs/erofs/fscache.c
 create mode 100644 fs/fs_ctl.c
 create mode 100644 include/uapi/linux/cachefiles.h
From: Yue Hu <huyue2@yulong.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/edd7e1eee421972399681ee65fb8447...
--------------------------------
ANBZ: #1666
commit 8137824eddd2e790c61c70c20d70a087faca95fa upstream.
Currently, erofs_map_blocks() will be called only from erofs_{bmap, read_raw_page} which are all for uncompressed files. So, the compression branch in erofs_map_blocks() is pointless. Let's remove it and use erofs_map_blocks_flatmode() directly. Also update related comments.
Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/data.c     | 19 ++-----------------
 fs/erofs/internal.h |  6 ++----
 2 files changed, 4 insertions(+), 21 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index ea4f693bee22..76c47c34b09f 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -109,21 +109,6 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	return err;
 }
 
-int erofs_map_blocks(struct inode *inode,
-		     struct erofs_map_blocks *map, int flags)
-{
-	if (erofs_inode_is_data_compressed(EROFS_I(inode)->datalayout)) {
-		int err = z_erofs_map_blocks_iter(inode, map, flags);
-
-		if (map->mpage) {
-			put_page(map->mpage);
-			map->mpage = NULL;
-		}
-		return err;
-	}
-	return erofs_map_blocks_flatmode(inode, map, flags);
-}
-
 static inline struct bio *erofs_read_raw_page(struct bio *bio,
 					      struct address_space *mapping,
 					      struct page *page,
@@ -159,7 +144,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
 		erofs_blk_t blknr;
 		unsigned int blkoff;
 
-		err = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+		err = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW);
 		if (err)
 			goto err_out;
 
@@ -326,7 +311,7 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
 			return 0;
 	}
 
-	if (!erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW))
+	if (!erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW))
 		return erofs_blknr(map.m_pa);
 
 	return 0;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5b3a9f5c282d..c0eb1145dfde 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -258,7 +258,7 @@ extern const struct address_space_operations erofs_raw_access_aops;
 extern const struct address_space_operations z_erofs_aops;
 
 /*
- * Logical to physical block mapping, used by erofs_map_blocks()
+ * Logical to physical block mapping
  *
  * Different with other file systems, it is used for 2 access modes:
  *
@@ -305,7 +305,7 @@ struct erofs_map_blocks {
 	struct page *mpage;
 };
 
-/* Flags used by erofs_map_blocks() */
+/* Flags used by erofs_map_blocks_flatmode() */
 #define EROFS_GET_BLOCKS_RAW	0x0001
 
 /* zmap.c */
@@ -327,8 +327,6 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
 /* data.c */
 struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr);
 
-int erofs_map_blocks(struct inode *, struct erofs_map_blocks *, int);
-
 /* inode.c */
 static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
 {
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/7aa3cb9337a8e35dad1eb21ed1798e2...
--------------------------------
ANBZ: #1666
commit c5fcb51111b85323cafe3f02784f7f0bf6a7cf07 upstream.
- Remove my outdated misleading email address;
- Get rid of all unnecessary trailing newline by accident.
Link: https://lore.kernel.org/r/20210602160634.10757-1-xiang@kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/Kconfig        | 1 -
 fs/erofs/compress.h     | 2 --
 fs/erofs/data.c         | 2 --
 fs/erofs/decompressor.c | 2 --
 fs/erofs/dir.c          | 2 --
 fs/erofs/erofs_fs.h     | 2 --
 fs/erofs/inode.c        | 2 --
 fs/erofs/internal.h     | 2 --
 fs/erofs/namei.c        | 2 --
 fs/erofs/super.c        | 2 --
 fs/erofs/tagptr.h       | 3 ---
 fs/erofs/utils.c        | 2 --
 fs/erofs/xattr.c        | 2 --
 fs/erofs/xattr.h        | 1 -
 fs/erofs/zdata.c        | 2 --
 fs/erofs/zdata.h        | 1 -
 fs/erofs/zmap.c         | 2 --
 fs/erofs/zpvec.h        | 2 --
 18 files changed, 34 deletions(-)
diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig index 74b0aaa7114c..f68447fbd246 100644 --- a/fs/erofs/Kconfig +++ b/fs/erofs/Kconfig @@ -89,4 +89,3 @@ config EROFS_FS_CLUSTER_PAGE_LIMIT into 8k-unit, hard limit should not be configured less than 2. Otherwise, the image will be refused to mount on this kernel. - diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h index 3d452443c545..e9b1aa0afe0e 100644 --- a/fs/erofs/compress.h +++ b/fs/erofs/compress.h @@ -2,7 +2,6 @@ /* * Copyright (C) 2019 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_FS_COMPRESS_H #define __EROFS_FS_COMPRESS_H @@ -57,4 +56,3 @@ int z_erofs_decompress(struct z_erofs_decompress_req *rq, struct list_head *pagepool);
#endif - diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 76c47c34b09f..60f74fad069b 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "internal.h" #include <linux/prefetch.h> @@ -323,4 +322,3 @@ const struct address_space_operations erofs_raw_access_aops = { .readahead = erofs_raw_access_readahead, .bmap = erofs_bmap, }; - diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 36693924db18..964b074d0182 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2019 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "compress.h" #include <linux/module.h> @@ -350,4 +349,3 @@ int z_erofs_decompress(struct z_erofs_decompress_req *rq, return z_erofs_shifted_transform(rq, pagepool); return z_erofs_decompress_generic(rq, pagepool); } - diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c index 2776bb832127..eee9b0b31b63 100644 --- a/fs/erofs/dir.c +++ b/fs/erofs/dir.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "internal.h"
@@ -139,4 +138,3 @@ const struct file_operations erofs_dir_fops = { .read = generic_read_dir, .iterate_shared = erofs_readdir, }; - diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index e8d04d808fa6..dcf463c7010c 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -4,7 +4,6 @@ * * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_FS_H #define __EROFS_FS_H @@ -317,4 +316,3 @@ static inline void erofs_check_ondisk_layout_definitions(void) }
#endif - diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index 0a94a52a119f..5f9c6fb79ac0 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "xattr.h"
@@ -373,4 +372,3 @@ const struct inode_operations erofs_fast_symlink_iops = { .listxattr = erofs_listxattr, .get_acl = erofs_get_acl, }; - diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index c0eb1145dfde..4ab73f7263c8 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_INTERNAL_H #define __EROFS_INTERNAL_H @@ -401,4 +400,3 @@ static inline void z_erofs_exit_zip_subsystem(void) {} #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#endif /* __EROFS_INTERNAL_H */ - diff --git a/fs/erofs/namei.c b/fs/erofs/namei.c index 5f8cc7346c69..624b1b89867a 100644 --- a/fs/erofs/namei.c +++ b/fs/erofs/namei.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "xattr.h"
@@ -247,4 +246,3 @@ const struct inode_operations erofs_dir_iops = { .listxattr = erofs_listxattr, .get_acl = erofs_get_acl, }; - diff --git a/fs/erofs/super.c b/fs/erofs/super.c index f31a08d86be8..ad760c6c0182 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include <linux/module.h> #include <linux/buffer_head.h> @@ -608,4 +607,3 @@ module_exit(erofs_module_exit); MODULE_DESCRIPTION("Enhanced ROM File System"); MODULE_AUTHOR("Gao Xiang, Chao Yu, Miao Xie, CONSUMER BG, HUAWEI Inc."); MODULE_LICENSE("GPL"); - diff --git a/fs/erofs/tagptr.h b/fs/erofs/tagptr.h index a72897c86744..64ceb7270b5c 100644 --- a/fs/erofs/tagptr.h +++ b/fs/erofs/tagptr.h @@ -1,8 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* * A tagged pointer implementation - * - * Copyright (C) 2018 Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_FS_TAGPTR_H #define __EROFS_FS_TAGPTR_H @@ -107,4 +105,3 @@ tagptr_init(o, cmpxchg(&ptptr->v, o.v, n.v)); }) *ptptr; })
#endif /* __EROFS_FS_TAGPTR_H */ - diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c index 5c11199d753a..f22cfa31a3c3 100644 --- a/fs/erofs/utils.c +++ b/fs/erofs/utils.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "internal.h" #include <linux/pagevec.h> @@ -294,4 +293,3 @@ void erofs_exit_shrinker(void) unregister_shrinker(&erofs_shrinker_info); } #endif /* !CONFIG_EROFS_FS_ZIP */ - diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c index 47314a26767a..8dd54b420a1d 100644 --- a/fs/erofs/xattr.c +++ b/fs/erofs/xattr.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include <linux/security.h> #include "xattr.h" @@ -709,4 +708,3 @@ struct posix_acl *erofs_get_acl(struct inode *inode, int type) return acl; } #endif - diff --git a/fs/erofs/xattr.h b/fs/erofs/xattr.h index 815304bd335f..366dcb400525 100644 --- a/fs/erofs/xattr.h +++ b/fs/erofs/xattr.h @@ -2,7 +2,6 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_XATTR_H #define __EROFS_XATTR_H diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index da133950d514..38776205cd2f 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "zdata.h" #include "compress.h" @@ -1362,4 +1361,3 @@ const struct address_space_operations z_erofs_aops = { .readpage = z_erofs_readpage, .readahead = z_erofs_readahead, }; - diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h index 68c9b29fc0ca..68da309d5ad7 100644 --- a/fs/erofs/zdata.h +++ b/fs/erofs/zdata.h @@ -2,7 +2,6 @@ /* * Copyright (C) 2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_FS_ZDATA_H #define __EROFS_FS_ZDATA_H diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index a5537a9f8f36..6b9987205c90 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -2,7 +2,6 @@ /* * Copyright (C) 2018-2019 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #include "internal.h" #include <asm/unaligned.h> @@ -476,4 +475,3 @@ int z_erofs_map_blocks_iter(struct inode *inode, DBG_BUGON(err < 0 && err != -ENOMEM); return err; } - diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h index 52898176ef31..b05464f4a808 100644 --- a/fs/erofs/zpvec.h +++ b/fs/erofs/zpvec.h @@ -2,7 +2,6 @@ /* * Copyright (C) 2018 HUAWEI, Inc. * https://www.huawei.com/ - * Created by Gao Xiang gaoxiang25@huawei.com */ #ifndef __EROFS_FS_ZPVEC_H #define __EROFS_FS_ZPVEC_H @@ -158,4 +157,3 @@ z_erofs_pagevec_dequeue(struct z_erofs_pagevec_ctor *ctor, return tagptr_unfold_ptr(t); } #endif -
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/efa097b6d8c0014a5994e4959e56eff...
--------------------------------
ANBZ: #1666
commit 2a9dc7a8fec6ca287e2c038f9441e24269e10b5f upstream.
Currently, uncompressed data except for tail-packing inline is consecutive on disk.
In order to support chunk-based data deduplication, add a new corresponding inode data layout.
In the future, the data source of chunks can be either (un)compressed.
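To make the layout concrete, here is a minimal plain-C sketch of the index
arithmetic this format implies (it mirrors the lookup added in the next
patch; inode_meta_end stands for the inode's start offset plus its inode
and xattr sizes, and the types are simplified):

#include <stdint.h>

#define EROFS_CHUNK_FORMAT_BLKBITS_MASK	0x001F
#define EROFS_CHUNK_FORMAT_INDEXES	0x0020

/* byte position of the index slot covering file_offset: slots are laid
 * out right after the inode + xattrs, aligned to the slot size (8-byte
 * chunk indexes, or bare 4-byte block addresses) */
static uint64_t chunk_slot_pos(uint64_t inode_meta_end, uint16_t format,
			       uint32_t blkbits, uint64_t file_offset)
{
	uint32_t chunkbits = blkbits + (format & EROFS_CHUNK_FORMAT_BLKBITS_MASK);
	uint32_t unit = (format & EROFS_CHUNK_FORMAT_INDEXES) ? 8 : 4;
	uint64_t chunknr = file_offset >> chunkbits;	/* which chunk */

	return ((inode_meta_end + unit - 1) & ~(uint64_t)(unit - 1)) +
	       (uint64_t)unit * chunknr;
}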
Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.co...
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 Documentation/filesystems/erofs.rst | 16 ++++++++--
 fs/erofs/erofs_fs.h                 | 48 +++++++++++++++++++++++++++--
 2 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst index bf145171c2bf..ed2d9b33a94f 100644 --- a/Documentation/filesystems/erofs.rst +++ b/Documentation/filesystems/erofs.rst @@ -153,13 +153,14 @@ may not. All metadatas can be now observed in two different spaces (views):
Xattrs, extents, data inline are followed by the corresponding inode with proper alignment, and they could be optional for different data mappings. - _currently_ total 4 valid data mappings are supported: + _currently_ total 5 data layouts are supported:
== ==================================================================== 0 flat file data without data inline (no extent); 1 fixed-sized output data compression (with non-compacted indexes); 2 flat file data with tail packing data inline (no extent); - 3 fixed-sized output data compression (with compacted indexes, v5.3+). + 3 fixed-sized output data compression (with compacted indexes, v5.3+); + 4 chunk-based file (v5.15+). == ====================================================================
The size of the optional xattrs is indicated by i_xattr_count in inode @@ -211,6 +212,17 @@ Note that apart from the offset of the first filename, nameoff0 also indicates the total number of directory entries in this block since it is no need to introduce another on-disk field at all.
+Chunk-based file +---------------- +In order to support chunk-based data deduplication, a new inode data layout has +been supported since Linux v5.15: Files are split in equal-sized data chunks +with ``extents`` area of the inode metadata indicating how to get the chunk +data: these can be simply as a 4-byte block address array or in the 8-byte +chunk index form (see struct erofs_inode_chunk_index in erofs_fs.h for more +details.) + +By the way, chunk-based files are all uncompressed for now. + Compression ----------- Currently, EROFS supports 4KB fixed-sized output transparent file compression, diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index dcf463c7010c..a13f17d33da8 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -4,6 +4,7 @@ * * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021, Alibaba Cloud */ #ifndef __EROFS_FS_H #define __EROFS_FS_H @@ -17,7 +18,10 @@ * be incompatible with this kernel version. */ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x00000001 -#define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING +#define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004 +#define EROFS_ALL_FEATURE_INCOMPAT \ + (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \ + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
/* 128-byte erofs on-disk super block */ struct erofs_super_block { @@ -51,13 +55,16 @@ struct erofs_super_block { * inode, [xattrs], last_inline_data, ... | ... | no-holed data * 3 - inode compression D: * inode, [xattrs], map_header, extents ... | ... - * 4~7 - reserved + * 4 - inode chunk-based E: + * inode, [xattrs], chunk indexes ... | ... + * 5~7 - reserved */ enum { EROFS_INODE_FLAT_PLAIN = 0, EROFS_INODE_FLAT_COMPRESSION_LEGACY = 1, EROFS_INODE_FLAT_INLINE = 2, EROFS_INODE_FLAT_COMPRESSION = 3, + EROFS_INODE_CHUNK_BASED = 4, EROFS_INODE_DATALAYOUT_MAX };
@@ -77,6 +84,19 @@ static inline bool erofs_inode_is_data_compressed(unsigned int datamode) #define EROFS_I_ALL \ ((1 << (EROFS_I_DATALAYOUT_BIT + EROFS_I_DATALAYOUT_BITS)) - 1)
+/* indicate chunk blkbits, thus 'chunksize = blocksize << chunk blkbits' */ +#define EROFS_CHUNK_FORMAT_BLKBITS_MASK 0x001F +/* with chunk indexes or just a 4-byte blkaddr array */ +#define EROFS_CHUNK_FORMAT_INDEXES 0x0020 + +#define EROFS_CHUNK_FORMAT_ALL \ + (EROFS_CHUNK_FORMAT_BLKBITS_MASK | EROFS_CHUNK_FORMAT_INDEXES) + +struct erofs_inode_chunk_info { + __le16 format; /* chunk blkbits, etc. */ + __le16 reserved; +}; + /* 32-byte reduced form of an ondisk inode */ struct erofs_inode_compact { __le16 i_format; /* inode format hints */ @@ -94,6 +114,9 @@ struct erofs_inode_compact {
/* for device files, used to indicate old/new device # */ __le32 rdev; + + /* for chunk-based files, it contains the summary info */ + struct erofs_inode_chunk_info c; } i_u; __le32 i_ino; /* only used for 32-bit stat compatibility */ __le16 i_uid; @@ -122,6 +145,9 @@ struct erofs_inode_extended {
/* for device files, used to indicate old/new device # */ __le32 rdev; + + /* for chunk-based files, it contains the summary info */ + struct erofs_inode_chunk_info c; } i_u;
/* only used for 32-bit stat compatibility */ @@ -191,6 +217,19 @@ static inline unsigned int erofs_xattr_entry_size(struct erofs_xattr_entry *e) e->e_name_len + le16_to_cpu(e->e_value_size)); }
+/* represent a zeroed chunk (hole) */ +#define EROFS_NULL_ADDR -1 + +/* 4-byte block address array */ +#define EROFS_BLOCK_MAP_ENTRY_SIZE sizeof(__le32) + +/* 8-byte inode chunk indexes */ +struct erofs_inode_chunk_index { + __le16 advise; /* always 0, don't care for now */ + __le16 device_id; /* back-end storage id, always 0 for now */ + __le32 blkaddr; /* start block address of this inode chunk */ +}; + /* available compression algorithm types (for h_algorithmtype) */ enum { Z_EROFS_COMPRESSION_LZ4 = 0, @@ -307,9 +346,14 @@ static inline void erofs_check_ondisk_layout_definitions(void) BUILD_BUG_ON(sizeof(struct erofs_inode_extended) != 64); BUILD_BUG_ON(sizeof(struct erofs_xattr_ibody_header) != 12); BUILD_BUG_ON(sizeof(struct erofs_xattr_entry) != 4); + BUILD_BUG_ON(sizeof(struct erofs_inode_chunk_info) != 4); + BUILD_BUG_ON(sizeof(struct erofs_inode_chunk_index) != 8); BUILD_BUG_ON(sizeof(struct z_erofs_map_header) != 8); BUILD_BUG_ON(sizeof(struct z_erofs_vle_decompressed_index) != 8); BUILD_BUG_ON(sizeof(struct erofs_dirent) != 12); + /* keep in sync between 2 index structures for better extendibility */ + BUILD_BUG_ON(sizeof(struct erofs_inode_chunk_index) != + sizeof(struct z_erofs_vle_decompressed_index));
BUILD_BUG_ON(BIT(Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS) < Z_EROFS_VLE_CLUSTER_TYPE_MAX - 1);
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/c5305cc0a9c206b10825c1b00b6f84c...
--------------------------------
ANBZ: #1666
commit c5aa903a59db274554718cddfda9039913409ec9 upstream.
Add runtime support for chunk-based uncompressed files described in the previous patch.
Link: https://lore.kernel.org/r/20210820100019.208490-2-hsiangkao@linux.alibaba.co...
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/data.c     | 92 +++++++++++++++++++++++++++++++++++++++------
 fs/erofs/inode.c    | 18 ++++++++-
 fs/erofs/internal.h |  5 +++
 3 files changed, 103 insertions(+), 12 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 60f74fad069b..deed631dad0e 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -2,6 +2,7 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021, Alibaba Cloud */ #include "internal.h" #include <linux/prefetch.h> @@ -59,13 +60,6 @@ static int erofs_map_blocks_flatmode(struct inode *inode, nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE); lastblk = nblocks - tailendpacking;
- if (offset >= inode->i_size) { - /* leave out-of-bound access unmapped */ - map->m_flags = 0; - map->m_plen = 0; - goto out; - } - /* there is no hole in flatmode */ map->m_flags = EROFS_MAP_MAPPED;
@@ -100,14 +94,90 @@ static int erofs_map_blocks_flatmode(struct inode *inode, goto err_out; }
-out: map->m_llen = map->m_plen; - err_out: trace_erofs_map_blocks_flatmode_exit(inode, map, flags, 0); return err; }
+static int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags) +{ + struct super_block *sb = inode->i_sb; + struct erofs_inode *vi = EROFS_I(inode); + struct erofs_inode_chunk_index *idx; + struct page *page; + u64 chunknr; + unsigned int unit; + erofs_off_t pos; + int err = 0; + + if (map->m_la >= inode->i_size) { + /* leave out-of-bound access unmapped */ + map->m_flags = 0; + map->m_plen = 0; + goto out; + } + + if (vi->datalayout != EROFS_INODE_CHUNK_BASED) + return erofs_map_blocks_flatmode(inode, map, flags); + + if (vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES) + unit = sizeof(*idx); /* chunk index */ + else + unit = EROFS_BLOCK_MAP_ENTRY_SIZE; /* block map */ + + chunknr = map->m_la >> vi->chunkbits; + pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize + + vi->xattr_isize, unit) + unit * chunknr; + + page = erofs_get_meta_page(inode->i_sb, erofs_blknr(pos)); + if (IS_ERR(page)) + return PTR_ERR(page); + + map->m_la = chunknr << vi->chunkbits; + map->m_plen = min_t(erofs_off_t, 1UL << vi->chunkbits, + roundup(inode->i_size - map->m_la, EROFS_BLKSIZ)); + + /* handle block map */ + if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES)) { + __le32 *blkaddr = page_address(page) + erofs_blkoff(pos); + + if (le32_to_cpu(*blkaddr) == EROFS_NULL_ADDR) { + map->m_flags = 0; + } else { + map->m_pa = blknr_to_addr(le32_to_cpu(*blkaddr)); + map->m_flags = EROFS_MAP_MAPPED; + } + goto out_unlock; + } + /* parse chunk indexes */ + idx = page_address(page) + erofs_blkoff(pos); + switch (le32_to_cpu(idx->blkaddr)) { + case EROFS_NULL_ADDR: + map->m_flags = 0; + break; + default: + /* only one device is supported for now */ + if (idx->device_id) { + erofs_err(sb, "invalid device id %u @ %llu for nid %llu", + le16_to_cpu(idx->device_id), + chunknr, vi->nid); + err = -EFSCORRUPTED; + goto out_unlock; + } + map->m_pa = blknr_to_addr(le32_to_cpu(idx->blkaddr)); + map->m_flags = EROFS_MAP_MAPPED; + break; + } +out_unlock: + unlock_page(page); + put_page(page); +out: + map->m_llen = map->m_plen; + return err; +} + static inline struct bio *erofs_read_raw_page(struct bio *bio, struct address_space *mapping, struct page *page, @@ -143,7 +213,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, erofs_blk_t blknr; unsigned int blkoff;
- err = erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW); + err = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); if (err) goto err_out;
@@ -310,7 +380,7 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block) return 0; }
- if (!erofs_map_blocks_flatmode(inode, &map, EROFS_GET_BLOCKS_RAW)) + if (!erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW)) return erofs_blknr(map.m_pa);
return 0; diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index 5f9c6fb79ac0..ec2f7aac1085 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -2,6 +2,7 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021, Alibaba Cloud */ #include "xattr.h"
@@ -122,7 +123,9 @@ static struct page *erofs_read_inode(struct inode *inode, /* total blocks for compressed files */ if (erofs_inode_is_data_compressed(vi->datalayout)) nblks = le32_to_cpu(die->i_u.compressed_blocks); - + else if (vi->datalayout == EROFS_INODE_CHUNK_BASED) + /* fill chunked inode summary info */ + vi->chunkformat = le16_to_cpu(die->i_u.c.format); kfree(copied); break; case EROFS_INODE_LAYOUT_COMPACT: @@ -160,6 +163,8 @@ static struct page *erofs_read_inode(struct inode *inode, inode->i_size = le32_to_cpu(dic->i_size); if (erofs_inode_is_data_compressed(vi->datalayout)) nblks = le32_to_cpu(dic->i_u.compressed_blocks); + else if (vi->datalayout == EROFS_INODE_CHUNK_BASED) + vi->chunkformat = le16_to_cpu(dic->i_u.c.format); break; default: erofs_err(inode->i_sb, @@ -169,6 +174,17 @@ static struct page *erofs_read_inode(struct inode *inode, goto err_out; }
+ if (vi->datalayout == EROFS_INODE_CHUNK_BASED) { + if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_ALL)) { + erofs_err(inode->i_sb, + "unsupported chunk format %x of nid %llu", + vi->chunkformat, vi->nid); + err = -EOPNOTSUPP; + goto err_out; + } + vi->chunkbits = LOG_BLOCK_SIZE + + (vi->chunkformat & EROFS_CHUNK_FORMAT_BLKBITS_MASK); + } inode->i_mtime.tv_sec = inode->i_ctime.tv_sec; inode->i_atime.tv_sec = inode->i_ctime.tv_sec; inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec; diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4ab73f7263c8..83d6c01575f8 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -2,6 +2,7 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021, Alibaba Cloud */ #ifndef __EROFS_INTERNAL_H #define __EROFS_INTERNAL_H @@ -209,6 +210,10 @@ struct erofs_inode {
union { erofs_blk_t raw_blkaddr; + struct { + unsigned short chunkformat; + unsigned char chunkbits; + }; #ifdef CONFIG_EROFS_FS_ZIP struct { unsigned short z_advise;
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/6dcb63ca8ce52fb63d736624f9b8e01...
--------------------------------
ANBZ: #1666
commit 1266b4a7ecb679587dc4d098abe56ea53313d569 upstream.
Dan reported a new smatch warning [1] "fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'"
Due to new chunk-based format handling logic, the error path can be called after kfree(copied).
Set "copied = NULL" after kfree(copied) to fix this.
[1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com
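The pattern is worth spelling out, since it recurs whenever a buffer is
freed early in one branch while a shared error path frees it too; a
condensed userspace illustration (free() standing in for kfree(), which is
likewise a no-op on NULL):

#include <stdlib.h>

static int parse_inode(size_t len)
{
	char *copied = malloc(len);

	if (!copied)
		return -1;
	/* ... one branch is done with the buffer early ... */
	free(copied);
	copied = NULL;	/* the one-line fix: kill the dangling pointer */

	/* ... a later failure falls through to the shared cleanup ... */
	free(copied);	/* now a no-op; a double free before the fix */
	return 0;
}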
Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com
Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/inode.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index ec2f7aac1085..24b2e5da0893 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -127,6 +127,7 @@ static struct page *erofs_read_inode(struct inode *inode,
 			/* fill chunked inode summary info */
 			vi->chunkformat = le16_to_cpu(die->i_u.c.format);
 		kfree(copied);
+		copied = NULL;
 		break;
 	case EROFS_INODE_LAYOUT_COMPACT:
 		vi->inode_isize = sizeof(struct erofs_inode_compact);
From: Gao Xiang <hsiangkao@redhat.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/005f1649d3387a5ae14da062bfb1b92...
--------------------------------
ANBZ: #1666
commit de06a6a375414be03ce5b1054f2d836591923a1d upstream.
Introduce erofs_sb_has_xxx() to make long checks short, especially for later big pcluster & LZMA features.
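For reference, the first instance below,
EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING),
expands to:

static inline bool erofs_sb_has_lz4_0padding(struct erofs_sb_info *sbi)
{
	return sbi->feature_incompat & EROFS_FEATURE_INCOMPAT_LZ4_0PADDING;
}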
Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/decompressor.c | 3 +--
 fs/erofs/internal.h     | 9 +++++++++
 fs/erofs/super.c        | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 964b074d0182..45be046c322f 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -133,8 +133,7 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out,
 	support_0padding = false;
 
 	/* decompression inplace is only safe when 0padding is enabled */
-	if (EROFS_SB(rq->sb)->feature_incompat &
-	    EROFS_FEATURE_INCOMPAT_LZ4_0PADDING) {
+	if (erofs_sb_has_lz4_0padding(EROFS_SB(rq->sb))) {
 		support_0padding = true;
 
 		while (!src[inputmargin & ~PAGE_MASK])
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 83d6c01575f8..1406a897938c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -187,6 +187,15 @@ static inline erofs_off_t iloc(struct erofs_sb_info *sbi, erofs_nid_t nid)
 	return blknr_to_addr(sbi->meta_blkaddr) + (nid << sbi->islotbits);
 }
 
+#define EROFS_FEATURE_FUNCS(name, compat, feature) \
+static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \
+{ \
+	return sbi->feature_##compat & EROFS_FEATURE_##feature; \
+}
+
+EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING)
+EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
+
 /* atomic flag definitions */
 #define EROFS_I_EA_INITED_BIT	0
 #define EROFS_I_Z_INITED_BIT	1
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index ad760c6c0182..299bd0865c6d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -148,7 +148,7 @@ static int erofs_read_superblock(struct super_block *sb)
 	}
 
 	sbi->feature_compat = le32_to_cpu(dsb->feature_compat);
-	if (sbi->feature_compat & EROFS_FEATURE_COMPAT_SB_CHKSUM) {
+	if (erofs_sb_has_sb_chksum(sbi)) {
 		ret = erofs_superblock_csum_verify(sb, data);
 		if (ret)
 			goto out;
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/88c61829ed8ca6bbb996831f2110b08...
--------------------------------
ANBZ: #1666
commit e62424651f43cb37e17ca26a7ee9ee42675f24bd upstream.
Previously, EROFS mount options are all in the basic types, so erofs_fs_context can be directly copied with assignment. However, when the multiple device feature is introduced, it's hard to handle multiple device information like the other basic mount options.
Let's separate basic mount option usage from fs_context, thus multiple device information can be handled gracefully then.
No logic changes.
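The motivation can be sketched in a few lines: a scalar-only options struct
can be moved from fs_context to the superblock with one assignment, while
the device state introduced later in this series holds a pointer that must
be handed over rather than duplicated (the devs member below anticipates
that later patch):

struct erofs_mount_opts {	/* scalars only: a value copy is safe */
	unsigned char cache_strategy;
	unsigned int max_sync_decompress_pages;
	unsigned int mount_opt;
};

struct erofs_fs_context {
	struct erofs_mount_opts opt;
	struct erofs_dev_context *devs;	/* added by the multi-device patch;
					 * "sbi->ctx = *ctx" would alias this
					 * pointer instead of transferring it */
};

/* so fill_super after this patch copies just the scalar part:
 *	sbi->opt = ctx->opt;
 */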
Link: https://lore.kernel.org/r/20211007070224.12833-1-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/internal.h | 16 ++++++++++------
 fs/erofs/super.c    | 40 +++++++++++++++++++--------------------
 fs/erofs/xattr.c    |  4 ++--
 fs/erofs/zdata.c    |  4 ++--
 4 files changed, 33 insertions(+), 31 deletions(-)
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 1406a897938c..eeb7adc500c5 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -46,7 +46,7 @@ typedef u64 erofs_off_t; /* data type for filesystem-wide blocks number */ typedef u32 erofs_blk_t;
-struct erofs_fs_context { +struct erofs_mount_opts { #ifdef CONFIG_EROFS_FS_ZIP /* current strategy of how to use managed cache */ unsigned char cache_strategy; @@ -57,7 +57,13 @@ struct erofs_fs_context { unsigned int mount_opt; };
+struct erofs_fs_context { + struct erofs_mount_opts opt; +}; + struct erofs_sb_info { + struct erofs_mount_opts opt; /* options */ + #ifdef CONFIG_EROFS_FS_ZIP /* list for all registered superblocks, mainly for shrinker */ struct list_head list; @@ -92,8 +98,6 @@ struct erofs_sb_info { u8 volume_name[16]; /* volume name */ u32 feature_compat; u32 feature_incompat; - - struct erofs_fs_context ctx; /* options */ };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) @@ -103,9 +107,9 @@ struct erofs_sb_info { #define EROFS_MOUNT_XATTR_USER 0x00000010 #define EROFS_MOUNT_POSIX_ACL 0x00000020
-#define clear_opt(ctx, option) ((ctx)->mount_opt &= ~EROFS_MOUNT_##option) -#define set_opt(ctx, option) ((ctx)->mount_opt |= EROFS_MOUNT_##option) -#define test_opt(ctx, option) ((ctx)->mount_opt & EROFS_MOUNT_##option) +#define clear_opt(opt, option) ((opt)->mount_opt &= ~EROFS_MOUNT_##option) +#define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option) +#define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)
enum { EROFS_ZIP_CACHE_DISABLED, diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 299bd0865c6d..9747fb0d6ca2 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -198,14 +198,14 @@ static int erofs_read_superblock(struct super_block *sb) static void erofs_default_options(struct erofs_fs_context *ctx) { #ifdef CONFIG_EROFS_FS_ZIP - ctx->cache_strategy = EROFS_ZIP_CACHE_READAROUND; - ctx->max_sync_decompress_pages = 3; + ctx->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND; + ctx->opt.max_sync_decompress_pages = 3; #endif #ifdef CONFIG_EROFS_FS_XATTR - set_opt(ctx, XATTR_USER); + set_opt(&ctx->opt, XATTR_USER); #endif #ifdef CONFIG_EROFS_FS_POSIX_ACL - set_opt(ctx, POSIX_ACL); + set_opt(&ctx->opt, POSIX_ACL); #endif }
@@ -246,9 +246,9 @@ static int erofs_fc_parse_param(struct fs_context *fc, case Opt_user_xattr: #ifdef CONFIG_EROFS_FS_XATTR if (result.boolean) - set_opt(ctx, XATTR_USER); + set_opt(&ctx->opt, XATTR_USER); else - clear_opt(ctx, XATTR_USER); + clear_opt(&ctx->opt, XATTR_USER); #else errorfc(fc, "{,no}user_xattr options not supported"); #endif @@ -256,16 +256,16 @@ static int erofs_fc_parse_param(struct fs_context *fc, case Opt_acl: #ifdef CONFIG_EROFS_FS_POSIX_ACL if (result.boolean) - set_opt(ctx, POSIX_ACL); + set_opt(&ctx->opt, POSIX_ACL); else - clear_opt(ctx, POSIX_ACL); + clear_opt(&ctx->opt, POSIX_ACL); #else errorfc(fc, "{,no}acl options not supported"); #endif break; case Opt_cache_strategy: #ifdef CONFIG_EROFS_FS_ZIP - ctx->cache_strategy = result.uint_32; + ctx->opt.cache_strategy = result.uint_32; #else errorfc(fc, "compression not supported, cache_strategy ignored"); #endif @@ -354,6 +354,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) return -ENOMEM;
sb->s_fs_info = sbi; + sbi->opt = ctx->opt; err = erofs_read_superblock(sb); if (err) return err; @@ -365,13 +366,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_op = &erofs_sops; sb->s_xattr = erofs_xattr_handlers;
- if (test_opt(ctx, POSIX_ACL)) + if (test_opt(&sbi->opt, POSIX_ACL)) sb->s_flags |= SB_POSIXACL; else sb->s_flags &= ~SB_POSIXACL;
- sbi->ctx = *ctx; - #ifdef CONFIG_EROFS_FS_ZIP xa_init(&sbi->managed_pslots); #endif @@ -415,12 +414,12 @@ static int erofs_fc_reconfigure(struct fs_context *fc)
DBG_BUGON(!sb_rdonly(sb));
- if (test_opt(ctx, POSIX_ACL)) + if (test_opt(&ctx->opt, POSIX_ACL)) fc->sb_flags |= SB_POSIXACL; else fc->sb_flags &= ~SB_POSIXACL;
- sbi->ctx = *ctx; + sbi->opt = ctx->opt;
fc->sb_flags |= SB_RDONLY; return 0; @@ -448,7 +447,6 @@ static int erofs_init_fs_context(struct fs_context *fc) erofs_default_options(fc->fs_private);
fc->ops = &erofs_context_ops; - return 0; }
@@ -568,26 +566,26 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf) static int erofs_show_options(struct seq_file *seq, struct dentry *root) { struct erofs_sb_info *sbi __maybe_unused = EROFS_SB(root->d_sb); - struct erofs_fs_context *ctx __maybe_unused = &sbi->ctx; + struct erofs_mount_opts *opt __maybe_unused = &sbi->opt;
#ifdef CONFIG_EROFS_FS_XATTR - if (test_opt(ctx, XATTR_USER)) + if (test_opt(opt, XATTR_USER)) seq_puts(seq, ",user_xattr"); else seq_puts(seq, ",nouser_xattr"); #endif #ifdef CONFIG_EROFS_FS_POSIX_ACL - if (test_opt(ctx, POSIX_ACL)) + if (test_opt(opt, POSIX_ACL)) seq_puts(seq, ",acl"); else seq_puts(seq, ",noacl"); #endif #ifdef CONFIG_EROFS_FS_ZIP - if (ctx->cache_strategy == EROFS_ZIP_CACHE_DISABLED) + if (opt->cache_strategy == EROFS_ZIP_CACHE_DISABLED) seq_puts(seq, ",cache_strategy=disabled"); - else if (ctx->cache_strategy == EROFS_ZIP_CACHE_READAHEAD) + else if (opt->cache_strategy == EROFS_ZIP_CACHE_READAHEAD) seq_puts(seq, ",cache_strategy=readahead"); - else if (ctx->cache_strategy == EROFS_ZIP_CACHE_READAROUND) + else if (opt->cache_strategy == EROFS_ZIP_CACHE_READAROUND) seq_puts(seq, ",cache_strategy=readaround"); #endif return 0; diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c index 8dd54b420a1d..59148ffc0c1f 100644 --- a/fs/erofs/xattr.c +++ b/fs/erofs/xattr.c @@ -429,7 +429,7 @@ static int shared_getxattr(struct inode *inode, struct getxattr_iter *it)
static bool erofs_xattr_user_list(struct dentry *dentry) { - return test_opt(&EROFS_SB(dentry->d_sb)->ctx, XATTR_USER); + return test_opt(&EROFS_SB(dentry->d_sb)->opt, XATTR_USER); }
static bool erofs_xattr_trusted_list(struct dentry *dentry) @@ -476,7 +476,7 @@ static int erofs_xattr_generic_get(const struct xattr_handler *handler,
switch (handler->flags) { case EROFS_XATTR_INDEX_USER: - if (!test_opt(&sbi->ctx, XATTR_USER)) + if (!test_opt(&sbi->opt, XATTR_USER)) return -EOPNOTSUPP; break; case EROFS_XATTR_INDEX_TRUSTED: diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 38776205cd2f..17b7c8ecf9a5 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -611,7 +611,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, goto err_out;
/* preload all compressed pages (maybe downgrade role if necessary) */ - if (should_alloc_managed_pages(fe, sbi->ctx.cache_strategy, map->m_la)) + if (should_alloc_managed_pages(fe, sbi->opt.cache_strategy, map->m_la)) cache_strategy = DELAYEDALLOC; else cache_strategy = DONTALLOC; @@ -1307,7 +1307,7 @@ static void z_erofs_readahead(struct readahead_control *rac) struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
unsigned int nr_pages = readahead_count(rac); - bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages); + bool sync = (nr_pages <= sbi->opt.max_sync_decompress_pages); struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode); struct page *page, *head = NULL; LIST_HEAD(pagepool);
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/72920d6cd9da7f1cdb45f1c21025a90...
--------------------------------
ANBZ: #1666
commit dfeab2e95a75a424adf39992ac62dcb9e9517d4a upstream.
In order to support multi-layer container images, add multiple device feature to EROFS. Two ways are available to use for now:
- Devices can be mapped into 32-bit global block address space; - Device ID can be specified with the chunk indexes format.
Note that it assumes no extent would cross device boundary and mkfs should take care of it seriously.
In the future, a dedicated device manager could be introduced then thus extra devices can be automatically scanned by UUID as well.
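A worked example of the first mode (simplified from erofs_map_dev() in the
hunk below; which_device() is a hypothetical flattened stand-in for the
kernel's idr walk): with the primary device holding blocks 0..99 and one
extra device registered with mapped_blkaddr = 100 and blocks = 50, global
block 120 resolves to local block 20 on that extra device:

#include <stdint.h>

struct dev_range { uint32_t mapped_blkaddr, blocks; };

static int which_device(const struct dev_range *devs, int ndevs,
			uint32_t blkaddr, uint32_t *local)
{
	for (int i = 0; i < ndevs; i++) {
		if (devs[i].mapped_blkaddr &&
		    blkaddr >= devs[i].mapped_blkaddr &&
		    blkaddr < devs[i].mapped_blkaddr + devs[i].blocks) {
			*local = blkaddr - devs[i].mapped_blkaddr;
			return i + 1;	/* extra device id */
		}
	}
	*local = blkaddr;
	return 0;		/* primary device */
}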
Link: https://lore.kernel.org/r/20211014081010.43485-1-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
[ Huang Jianan: remove dax and iomap related code based on 5.10. ]
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 Documentation/filesystems/erofs.rst |  12 ++-
 fs/erofs/Kconfig                    |  26 +++--
 fs/erofs/data.c                     |  68 ++++++++++---
 fs/erofs/erofs_fs.h                 |  25 ++++-
 fs/erofs/internal.h                 |  34 ++++++-
 fs/erofs/super.c                    | 153 ++++++++++++++++++++++++++--
 fs/erofs/zdata.c                    |  20 +++-
 7 files changed, 291 insertions(+), 47 deletions(-)
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst index ed2d9b33a94f..693d7a32845a 100644 --- a/Documentation/filesystems/erofs.rst +++ b/Documentation/filesystems/erofs.rst @@ -19,9 +19,10 @@ It is designed as a better filesystem solution for the following scenarios: immutable and bit-for-bit identical to the official golden image for their releases due to security and other considerations and
- - hope to save some extra storage space with guaranteed end-to-end performance - by using reduced metadata and transparent file compression, especially - for those embedded devices with limited memory (ex, smartphone); + - hope to minimize extra storage space with guaranteed end-to-end performance + by using compact layout, transparent file compression and direct access, + especially for those embedded devices with limited memory and high-density + hosts with numerous containers;
Here is the main features of EROFS:
@@ -51,7 +52,9 @@ Here is the main features of EROFS: - Support POSIX.1e ACLs by using xattrs;
- Support transparent file compression as an option: - LZ4 algorithm with 4 KB fixed-sized output compression for high performance. + LZ4 algorithm with 4 KB fixed-sized output compression for high performance; + + - Multiple device support for multi-layer container images.
The following git tree provides the file system user-space tools under development (ex, formatting tool mkfs.erofs): @@ -84,6 +87,7 @@ cache_strategy=%s Select a strategy for cached decompression from now on: It still does in-place I/O decompression for the rest compressed physical clusters. ========== ============================================= +device=%s Specify a path to an extra device to be used together. =================== =========================================================
On-disk details diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig index f68447fbd246..4dc00a2320e4 100644 --- a/fs/erofs/Kconfig +++ b/fs/erofs/Kconfig @@ -5,16 +5,22 @@ config EROFS_FS depends on BLOCK select LIBCRC32C help - EROFS (Enhanced Read-Only File System) is a lightweight - read-only file system with modern designs (eg. page-sized - blocks, inline xattrs/data, etc.) for scenarios which need - high-performance read-only requirements, e.g. Android OS - for mobile phones and LIVECDs. - - It also provides fixed-sized output compression support, - which improves storage density, keeps relatively higher - compression ratios, which is more useful to achieve high - performance for embedded devices with limited memory. + EROFS (Enhanced Read-Only File System) is a lightweight read-only + file system with modern designs (e.g. no buffer heads, inline + xattrs/data, chunk-based deduplication, multiple devices, etc.) for + scenarios which need high-performance read-only solutions, e.g. + smartphones with Android OS, LiveCDs and high-density hosts with + numerous containers; + + It also provides fixed-sized output compression support in order to + improve storage density as well as keep relatively higher compression + ratios and implements in-place decompression to reuse the file page + for compressed data temporarily with proper strategies, which is + quite useful to ensure guaranteed end-to-end runtime decompression + performance under extremely memory pressure without extra cost. + + See the documentation at file:Documentation/filesystems/erofs.rst + for more details.
If unsure, say N.
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index deed631dad0e..0e897cfe96ca 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -112,6 +112,7 @@ static int erofs_map_blocks(struct inode *inode, erofs_off_t pos; int err = 0;
+ map->m_deviceid = 0; if (map->m_la >= inode->i_size) { /* leave out-of-bound access unmapped */ map->m_flags = 0; @@ -158,14 +159,8 @@ static int erofs_map_blocks(struct inode *inode, map->m_flags = 0; break; default: - /* only one device is supported for now */ - if (idx->device_id) { - erofs_err(sb, "invalid device id %u @ %llu for nid %llu", - le16_to_cpu(idx->device_id), - chunknr, vi->nid); - err = -EFSCORRUPTED; - goto out_unlock; - } + map->m_deviceid = le16_to_cpu(idx->device_id) & + EROFS_SB(sb)->device_id_mask; map->m_pa = blknr_to_addr(le32_to_cpu(idx->blkaddr)); map->m_flags = EROFS_MAP_MAPPED; break; @@ -178,6 +173,46 @@ static int erofs_map_blocks(struct inode *inode, return err; }
+int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) +{ + struct erofs_dev_context *devs = EROFS_SB(sb)->devs; + struct erofs_device_info *dif; + int id; + + /* primary device by default */ + map->m_bdev = sb->s_bdev; + + if (map->m_deviceid) { + down_read(&devs->rwsem); + dif = idr_find(&devs->tree, map->m_deviceid - 1); + if (!dif) { + up_read(&devs->rwsem); + return -ENODEV; + } + map->m_bdev = dif->bdev; + up_read(&devs->rwsem); + } else if (devs->extra_devices) { + down_read(&devs->rwsem); + idr_for_each_entry(&devs->tree, dif, id) { + erofs_off_t startoff, length; + + if (!dif->mapped_blkaddr) + continue; + startoff = blknr_to_addr(dif->mapped_blkaddr); + length = blknr_to_addr(dif->blocks); + + if (map->m_pa >= startoff && + map->m_pa < startoff + length) { + map->m_pa -= startoff; + map->m_bdev = dif->bdev; + break; + } + } + up_read(&devs->rwsem); + } + return 0; +} + static inline struct bio *erofs_read_raw_page(struct bio *bio, struct address_space *mapping, struct page *page, @@ -210,6 +245,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, struct erofs_map_blocks map = { .m_la = blknr_to_addr(current_block), }; + struct erofs_map_dev mdev; erofs_blk_t blknr; unsigned int blkoff;
@@ -217,6 +253,14 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, if (err) goto err_out;
+ mdev = (struct erofs_map_dev) { + .m_deviceid = map.m_deviceid, + .m_pa = map.m_pa, + }; + err = erofs_map_dev(sb, &mdev); + if (err) + goto err_out; + /* zero out the holed page */ if (!(map.m_flags & EROFS_MAP_MAPPED)) { zero_user_segment(page, 0, PAGE_SIZE); @@ -229,8 +273,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* for RAW access mode, m_plen must be equal to m_llen */ DBG_BUGON(map.m_plen != map.m_llen);
- blknr = erofs_blknr(map.m_pa); - blkoff = erofs_blkoff(map.m_pa); + blknr = erofs_blknr(mdev.m_pa); + blkoff = erofs_blkoff(mdev.m_pa);
/* deal with inline page */ if (map.m_flags & EROFS_MAP_META) { @@ -239,7 +283,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
DBG_BUGON(map.m_plen > PAGE_SIZE);
- ipage = erofs_get_meta_page(inode->i_sb, blknr); + ipage = erofs_get_meta_page(sb, blknr);
if (IS_ERR(ipage)) { err = PTR_ERR(ipage); @@ -275,7 +319,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, bio = bio_alloc(GFP_NOIO, nblocks);
bio->bi_end_io = erofs_readendio; - bio_set_dev(bio, sb->s_bdev); + bio_set_dev(bio, mdev.m_bdev); bio->bi_iter.bi_sector = (sector_t)blknr << LOG_SECTORS_PER_BLOCK; bio->bi_opf = REQ_OP_READ | (ra ? REQ_RAHEAD : 0); diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index a13f17d33da8..b15a62bf2c46 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -19,9 +19,24 @@ */ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x00000001 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004 +#define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE 0x00000008 #define EROFS_ALL_FEATURE_INCOMPAT \ (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \ - EROFS_FEATURE_INCOMPAT_CHUNKED_FILE) + EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \ + EROFS_FEATURE_INCOMPAT_DEVICE_TABLE) + +#define EROFS_SB_EXTSLOT_SIZE 16 + +struct erofs_deviceslot { + union { + u8 uuid[16]; /* used for device manager later */ + u8 userdata[64]; /* digest(sha256), etc. */ + } u; + __le32 blocks; /* total fs blocks of this device */ + __le32 mapped_blkaddr; /* map starting at mapped_blkaddr */ + u8 reserved[56]; +}; +#define EROFS_DEVT_SLOT_SIZE sizeof(struct erofs_deviceslot)
/* 128-byte erofs on-disk super block */ struct erofs_super_block { @@ -42,7 +57,10 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - __u8 reserved2[44]; + __le16 reserved2; + __le16 extra_devices; /* # of devices besides the primary device */ + __le16 devt_slotoff; /* startoff = devt_slotoff * devt_slotsize */ + __u8 reserved3[38]; };
/* @@ -226,7 +244,7 @@ static inline unsigned int erofs_xattr_entry_size(struct erofs_xattr_entry *e) /* 8-byte inode chunk indexes */ struct erofs_inode_chunk_index { __le16 advise; /* always 0, don't care for now */ - __le16 device_id; /* back-end storage id, always 0 for now */ + __le16 device_id; /* back-end storage id (with bits masked) */ __le32 blkaddr; /* start block address of this inode chunk */ };
@@ -354,6 +372,7 @@ static inline void erofs_check_ondisk_layout_definitions(void) /* keep in sync between 2 index structures for better extendibility */ BUILD_BUG_ON(sizeof(struct erofs_inode_chunk_index) != sizeof(struct z_erofs_vle_decompressed_index)); + BUILD_BUG_ON(sizeof(struct erofs_deviceslot) != 128);
BUILD_BUG_ON(BIT(Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS) < Z_EROFS_VLE_CLUSTER_TYPE_MAX - 1); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index eeb7adc500c5..954a1cfb9509 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -13,6 +13,7 @@ #include <linux/pagemap.h> #include <linux/bio.h> #include <linux/buffer_head.h> +#include <linux/idr.h> #include <linux/magic.h> #include <linux/slab.h> #include <linux/vmalloc.h> @@ -46,6 +47,14 @@ typedef u64 erofs_off_t; /* data type for filesystem-wide blocks number */ typedef u32 erofs_blk_t;
+struct erofs_device_info { + char *path; + struct block_device *bdev; + + u32 blocks; + u32 mapped_blkaddr; +}; + struct erofs_mount_opts { #ifdef CONFIG_EROFS_FS_ZIP /* current strategy of how to use managed cache */ @@ -57,13 +66,20 @@ struct erofs_mount_opts { unsigned int mount_opt; };
+struct erofs_dev_context { + struct idr tree; + struct rw_semaphore rwsem; + + unsigned int extra_devices; +}; + struct erofs_fs_context { struct erofs_mount_opts opt; + struct erofs_dev_context *devs; };
struct erofs_sb_info { struct erofs_mount_opts opt; /* options */ - #ifdef CONFIG_EROFS_FS_ZIP /* list for all registered superblocks, mainly for shrinker */ struct list_head list; @@ -77,11 +93,15 @@ struct erofs_sb_info { /* pseudo inode to manage cached pages */ struct inode *managed_cache; #endif /* CONFIG_EROFS_FS_ZIP */ - u32 blocks; + struct erofs_dev_context *devs; + u64 total_blocks; + u32 primarydevice_blocks; + u32 meta_blkaddr; #ifdef CONFIG_EROFS_FS_XATTR u32 xattr_blkaddr; #endif + u16 device_id_mask; /* valid bits of device id to be used */
/* inode slot unit size in bit shift */ unsigned char islotbits; @@ -198,6 +218,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \ }
EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING) +EROFS_FEATURE_FUNCS(device_table, incompat, INCOMPAT_DEVICE_TABLE) EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
/* atomic flag definitions */ @@ -317,6 +338,7 @@ struct erofs_map_blocks { erofs_off_t m_pa, m_la; u64 m_plen, m_llen;
+ unsigned short m_deviceid; unsigned int m_flags;
struct page *mpage; @@ -341,8 +363,16 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode, } #endif /* !CONFIG_EROFS_FS_ZIP */
+struct erofs_map_dev { + struct block_device *m_bdev; + + erofs_off_t m_pa; + unsigned int m_deviceid; +}; + /* data.c */ struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr); +int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
/* inode.c */ static inline unsigned long erofs_inode_hash(erofs_nid_t nid) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 9747fb0d6ca2..8ba942170b65 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -11,6 +11,7 @@ #include <linux/crc32c.h> #include <linux/fs_context.h> #include <linux/fs_parser.h> +#include <linux/blkdev.h> #include "xattr.h"
#define CREATE_TRACE_POINTS @@ -121,6 +122,78 @@ static bool check_layout_compatibility(struct super_block *sb, return true; }
+static int erofs_init_devices(struct super_block *sb, + struct erofs_super_block *dsb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + unsigned int ondisk_extradevs; + erofs_off_t pos; + struct page *page = NULL; + struct erofs_device_info *dif; + struct erofs_deviceslot *dis; + void *ptr; + int id, err = 0; + + sbi->total_blocks = sbi->primarydevice_blocks; + if (!erofs_sb_has_device_table(sbi)) + ondisk_extradevs = 0; + else + ondisk_extradevs = le16_to_cpu(dsb->extra_devices); + + if (ondisk_extradevs != sbi->devs->extra_devices) { + erofs_err(sb, "extra devices don't match (ondisk %u, given %u)", + ondisk_extradevs, sbi->devs->extra_devices); + return -EINVAL; + } + if (!ondisk_extradevs) + return 0; + + sbi->device_id_mask = roundup_pow_of_two(ondisk_extradevs + 1) - 1; + pos = le16_to_cpu(dsb->devt_slotoff) * EROFS_DEVT_SLOT_SIZE; + down_read(&sbi->devs->rwsem); + idr_for_each_entry(&sbi->devs->tree, dif, id) { + erofs_blk_t blk = erofs_blknr(pos); + struct block_device *bdev; + + if (!page || page->index != blk) { + if (page) { + kunmap(page); + unlock_page(page); + put_page(page); + } + + page = erofs_get_meta_page(sb, blk); + if (IS_ERR(page)) { + up_read(&sbi->devs->rwsem); + return PTR_ERR(page); + } + ptr = kmap(page); + } + dis = ptr + erofs_blkoff(pos); + + bdev = blkdev_get_by_path(dif->path, + FMODE_READ | FMODE_EXCL, + sb->s_type); + if (IS_ERR(bdev)) { + err = PTR_ERR(bdev); + goto err_out; + } + dif->bdev = bdev; + dif->blocks = le32_to_cpu(dis->blocks); + dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr); + sbi->total_blocks += dif->blocks; + pos += EROFS_DEVT_SLOT_SIZE; + } +err_out: + up_read(&sbi->devs->rwsem); + if (page) { + kunmap(page); + unlock_page(page); + put_page(page); + } + return err; +} + static int erofs_read_superblock(struct super_block *sb) { struct erofs_sb_info *sbi; @@ -166,7 +239,7 @@ static int erofs_read_superblock(struct super_block *sb) if (!check_layout_compatibility(sb, dsb)) goto out;
- sbi->blocks = le32_to_cpu(dsb->blocks); + sbi->primarydevice_blocks = le32_to_cpu(dsb->blocks); sbi->meta_blkaddr = le32_to_cpu(dsb->meta_blkaddr); #ifdef CONFIG_EROFS_FS_XATTR sbi->xattr_blkaddr = le32_to_cpu(dsb->xattr_blkaddr); @@ -187,7 +260,9 @@ static int erofs_read_superblock(struct super_block *sb) ret = -EFSCORRUPTED; goto out; } - ret = 0; + + /* handle multiple devices */ + ret = erofs_init_devices(sb, dsb); out: kunmap(page); put_page(page); @@ -213,6 +288,7 @@ enum { Opt_user_xattr, Opt_acl, Opt_cache_strategy, + Opt_device, Opt_err };
@@ -228,15 +304,17 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = { fsparam_flag_no("acl", Opt_acl), fsparam_enum("cache_strategy", Opt_cache_strategy, erofs_param_cache_strategy), + fsparam_string("device", Opt_device), {} };
static int erofs_fc_parse_param(struct fs_context *fc, struct fs_parameter *param) { - struct erofs_fs_context *ctx __maybe_unused = fc->fs_private; + struct erofs_fs_context *ctx = fc->fs_private; struct fs_parse_result result; - int opt; + struct erofs_device_info *dif; + int opt, ret;
opt = fs_parse(fc, erofs_fs_parameters, param, &result); if (opt < 0) @@ -270,6 +348,25 @@ static int erofs_fc_parse_param(struct fs_context *fc, errorfc(fc, "compression not supported, cache_strategy ignored"); #endif break; + case Opt_device: + dif = kzalloc(sizeof(*dif), GFP_KERNEL); + if (!dif) + return -ENOMEM; + dif->path = kstrdup(param->string, GFP_KERNEL); + if (!dif->path) { + kfree(dif); + return -ENOMEM; + } + down_write(&ctx->devs->rwsem); + ret = idr_alloc(&ctx->devs->tree, dif, 0, 0, GFP_KERNEL); + up_write(&ctx->devs->rwsem); + if (ret < 0) { + kfree(dif->path); + kfree(dif); + return ret; + } + ++ctx->devs->extra_devices; + break; default: return -ENOPARAM; } @@ -355,6 +452,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi; sbi->opt = ctx->opt; + sbi->devs = ctx->devs; + ctx->devs = NULL; + err = erofs_read_superblock(sb); if (err) return err; @@ -425,9 +525,32 @@ static int erofs_fc_reconfigure(struct fs_context *fc) return 0; }
+static int erofs_release_device_info(int id, void *ptr, void *data) +{ + struct erofs_device_info *dif = ptr; + + if (dif->bdev) + blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL); + kfree(dif->path); + kfree(dif); + return 0; +} + +static void erofs_free_dev_context(struct erofs_dev_context *devs) +{ + if (!devs) + return; + idr_for_each(&devs->tree, &erofs_release_device_info, NULL); + idr_destroy(&devs->tree); + kfree(devs); +} + static void erofs_fc_free(struct fs_context *fc) { - kfree(fc->fs_private); + struct erofs_fs_context *ctx = fc->fs_private; + + erofs_free_dev_context(ctx->devs); + kfree(ctx); }
static const struct fs_context_operations erofs_context_ops = { @@ -439,13 +562,20 @@ static const struct fs_context_operations erofs_context_ops = {
static int erofs_init_fs_context(struct fs_context *fc) { - fc->fs_private = kzalloc(sizeof(struct erofs_fs_context), GFP_KERNEL); - if (!fc->fs_private) - return -ENOMEM; + struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
- /* set default mount options */ - erofs_default_options(fc->fs_private); + if (!ctx) + return -ENOMEM; + ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL); + if (!ctx->devs) { + kfree(ctx); + return -ENOMEM; + } + fc->fs_private = ctx;
+ idr_init(&ctx->devs->tree); + init_rwsem(&ctx->devs->rwsem); + erofs_default_options(ctx); fc->ops = &erofs_context_ops; return 0; } @@ -465,6 +595,7 @@ static void erofs_kill_sb(struct super_block *sb) sbi = EROFS_SB(sb); if (!sbi) return; + erofs_free_dev_context(sbi->devs); kfree(sbi); sb->s_fs_info = NULL; } @@ -551,7 +682,7 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_type = sb->s_magic; buf->f_bsize = EROFS_BLKSIZ; - buf->f_blocks = sbi->blocks; + buf->f_blocks = sbi->total_blocks; buf->f_bfree = buf->f_bavail = 0;
buf->f_files = ULLONG_MAX; diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 17b7c8ecf9a5..eeb286935f8e 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -1162,8 +1162,9 @@ static void z_erofs_submit_queue(struct super_block *sb, struct z_erofs_decompressqueue *q[NR_JOBQUEUES]; void *bi_private; z_erofs_next_pcluster_t owned_head = f->clt.owned_head; - /* since bio will be NULL, no need to initialize last_index */ + /* bio is NULL initially, so no need to initialize last_{index,bdev} */ pgoff_t last_index; + struct block_device *last_bdev; unsigned int nr_bios = 0; struct bio *bio = NULL;
@@ -1175,6 +1176,7 @@ static void z_erofs_submit_queue(struct super_block *sb, q[JQ_SUBMIT]->head = owned_head;
do { + struct erofs_map_dev mdev; struct z_erofs_pcluster *pcl; pgoff_t cur, end; unsigned int i = 0; @@ -1186,7 +1188,13 @@ static void z_erofs_submit_queue(struct super_block *sb,
pcl = container_of(owned_head, struct z_erofs_pcluster, next);
- cur = pcl->obj.index; + /* no device id here, thus it will always succeed */ + mdev = (struct erofs_map_dev) { + .m_pa = blknr_to_addr(pcl->obj.index), + }; + (void)erofs_map_dev(sb, &mdev); + + cur = erofs_blknr(mdev.m_pa); end = cur + BIT(pcl->clusterbits);
/* close the main owned chain at first */ @@ -1202,7 +1210,8 @@ static void z_erofs_submit_queue(struct super_block *sb, if (!page) continue;
- if (bio && cur != last_index + 1) { + if (bio && (cur != last_index + 1 || + last_bdev != mdev.m_bdev)) { submit_bio_retry: submit_bio(bio); bio = NULL; @@ -1210,9 +1219,10 @@ static void z_erofs_submit_queue(struct super_block *sb,
if (!bio) { bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES); - bio->bi_end_io = z_erofs_decompressqueue_endio; - bio_set_dev(bio, sb->s_bdev); + + bio_set_dev(bio, mdev.m_bdev); + last_bdev = mdev.m_bdev; bio->bi_iter.bi_sector = (sector_t)cur << LOG_SECTORS_PER_BLOCK; bio->bi_private = bi_private;
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d5953c04b593dc0fff4807f50d3c2df...
--------------------------------
ANBZ: #1666
commit 469407a3b5ed9390cfacb0363d1cc926a51f6a14 upstream.
Since the new chunk-based file type has been introduced, there is no need to keep the flatmode-specific tracepoints.
Rename to erofs_map_blocks instead.
Link: https://lore.kernel.org/r/20211209012918.30337-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/data.c              | 39 ++++++++++++++++--------------------
 include/trace/events/erofs.h |  4 ++--
 2 files changed, 19 insertions(+), 24 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 0e897cfe96ca..125698c6f6e2 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -49,20 +49,16 @@ static int erofs_map_blocks_flatmode(struct inode *inode, struct erofs_map_blocks *map, int flags) { - int err = 0; erofs_blk_t nblocks, lastblk; u64 offset = map->m_la; struct erofs_inode *vi = EROFS_I(inode); bool tailendpacking = (vi->datalayout == EROFS_INODE_FLAT_INLINE);
- trace_erofs_map_blocks_flatmode_enter(inode, map, flags); - nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE); lastblk = nblocks - tailendpacking;
/* there is no hole in flatmode */ map->m_flags = EROFS_MAP_MAPPED; - if (offset < blknr_to_addr(lastblk)) { map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la; map->m_plen = blknr_to_addr(lastblk) - offset; @@ -74,30 +70,23 @@ static int erofs_map_blocks_flatmode(struct inode *inode, vi->xattr_isize + erofs_blkoff(map->m_la); map->m_plen = inode->i_size - offset;
- /* inline data should be located in one meta block */ - if (erofs_blkoff(map->m_pa) + map->m_plen > PAGE_SIZE) { + /* inline data should be located in the same meta block */ + if (erofs_blkoff(map->m_pa) + map->m_plen > EROFS_BLKSIZ) { erofs_err(inode->i_sb, "inline data cross block boundary @ nid %llu", vi->nid); DBG_BUGON(1); - err = -EFSCORRUPTED; - goto err_out; + return -EFSCORRUPTED; } - map->m_flags |= EROFS_MAP_META; } else { erofs_err(inode->i_sb, "internal error @ nid: %llu (size %llu), m_la 0x%llx", vi->nid, inode->i_size, map->m_la); DBG_BUGON(1); - err = -EIO; - goto err_out; + return -EIO; } - - map->m_llen = map->m_plen; -err_out: - trace_erofs_map_blocks_flatmode_exit(inode, map, flags, 0); - return err; + return 0; }
static int erofs_map_blocks(struct inode *inode, @@ -112,6 +101,7 @@ static int erofs_map_blocks(struct inode *inode, erofs_off_t pos; int err = 0;
+ trace_erofs_map_blocks_enter(inode, map, flags); map->m_deviceid = 0; if (map->m_la >= inode->i_size) { /* leave out-of-bound access unmapped */ @@ -120,8 +110,10 @@ static int erofs_map_blocks(struct inode *inode, goto out; }
- if (vi->datalayout != EROFS_INODE_CHUNK_BASED) - return erofs_map_blocks_flatmode(inode, map, flags); + if (vi->datalayout != EROFS_INODE_CHUNK_BASED) { + err = erofs_map_blocks_flatmode(inode, map, flags); + goto out; + }
if (vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES) unit = sizeof(*idx); /* chunk index */ @@ -133,9 +125,10 @@ static int erofs_map_blocks(struct inode *inode, vi->xattr_isize, unit) + unit * chunknr;
page = erofs_get_meta_page(inode->i_sb, erofs_blknr(pos)); - if (IS_ERR(page)) - return PTR_ERR(page); - + if (IS_ERR(page)) { + err = PTR_ERR(page); + goto out; + } map->m_la = chunknr << vi->chunkbits; map->m_plen = min_t(erofs_off_t, 1UL << vi->chunkbits, roundup(inode->i_size - map->m_la, EROFS_BLKSIZ)); @@ -169,7 +162,9 @@ static int erofs_map_blocks(struct inode *inode, unlock_page(page); put_page(page); out: - map->m_llen = map->m_plen; + if (!err) + map->m_llen = map->m_plen; + trace_erofs_map_blocks_exit(inode, map, flags, 0); return err; }
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h index db4f2cec8360..717ddd17acb1 100644 --- a/include/trace/events/erofs.h +++ b/include/trace/events/erofs.h @@ -169,7 +169,7 @@ DECLARE_EVENT_CLASS(erofs__map_blocks_enter, __entry->flags ? show_map_flags(__entry->flags) : "NULL") );
-DEFINE_EVENT(erofs__map_blocks_enter, erofs_map_blocks_flatmode_enter, +DEFINE_EVENT(erofs__map_blocks_enter, erofs_map_blocks_enter, TP_PROTO(struct inode *inode, struct erofs_map_blocks *map, unsigned flags),
@@ -221,7 +221,7 @@ DECLARE_EVENT_CLASS(erofs__map_blocks_exit, show_mflags(__entry->mflags), __entry->ret) );
-DEFINE_EVENT(erofs__map_blocks_exit, erofs_map_blocks_flatmode_exit, +DEFINE_EVENT(erofs__map_blocks_exit, erofs_map_blocks_exit, TP_PROTO(struct inode *inode, struct erofs_map_blocks *map, unsigned flags, int ret),
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/5af84129f73c550f84ce568646603d0...
--------------------------------
ANBZ: #1666
commit fdf80a4793021c2f27953b3075f401a497519ba4 upstream.
In order to support subpage and folio for all uncompressed files, introduce meta buffer descriptors, which can be effectively stored on the stack, in place of the meta page operations.
This converts the uncompressed data path to meta buffers.
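A minimal usage sketch of the new API (illustrative only, not taken verbatim from the diff below; "sb", "blkaddr" and "pos" stand in for whatever the caller already has at hand):

	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;	/* on-stack descriptor */
	void *kaddr;

	kaddr = erofs_read_metabuf(&buf, sb, blkaddr, EROFS_KMAP);
	if (IS_ERR(kaddr))
		return PTR_ERR(kaddr);
	/* ... parse the metadata at kaddr + erofs_blkoff(pos) ... */
	erofs_put_metabuf(&buf);	/* kunmaps and drops the page reference */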
Link: https://lore.kernel.org/r/20220102040017.51352-2-hsiangkao@linux.alibaba.com
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
[ Huang Jianan: kill the unnecessary kunmap_atomic in erofs_read_raw_page() ]
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/data.c     | 95 ++++++++++++++++++++++++++++++++++-----------
 fs/erofs/internal.h | 13 +++++++
 2 files changed, 86 insertions(+), 22 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 125698c6f6e2..7b855137be8b 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -32,19 +32,75 @@ static void erofs_readendio(struct bio *bio) bio_put(bio); }
-struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr) +static struct page *erofs_read_meta_page(struct super_block *sb, pgoff_t index) { struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping; struct page *page;
- page = read_cache_page_gfp(mapping, blkaddr, + page = read_cache_page_gfp(mapping, index, mapping_gfp_constraint(mapping, ~__GFP_FS)); + return page; +} + +struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr) +{ + struct page *page = erofs_read_meta_page(sb, blkaddr); + /* should already be PageUptodate */ if (!IS_ERR(page)) lock_page(page); return page; }
+void erofs_unmap_metabuf(struct erofs_buf *buf) +{ + if (buf->kmap_type == EROFS_KMAP) + kunmap(buf->page); + else if (buf->kmap_type == EROFS_KMAP_ATOMIC) + kunmap_atomic(buf->base); + buf->base = NULL; + buf->kmap_type = EROFS_NO_KMAP; +} + +void erofs_put_metabuf(struct erofs_buf *buf) +{ + if (!buf->page) + return; + erofs_unmap_metabuf(buf); + put_page(buf->page); + buf->page = NULL; +} + +void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, + erofs_blk_t blkaddr, enum erofs_kmap_type type) +{ + erofs_off_t offset = blknr_to_addr(blkaddr); + pgoff_t index = offset >> PAGE_SHIFT; + struct page *page = buf->page; + + if (!page || page->index != index) { + erofs_put_metabuf(buf); + page = erofs_read_meta_page(sb, index); + if (IS_ERR(page)) + return page; + /* should already be PageUptodate, no need to lock page */ + buf->page = page; + } + if (buf->kmap_type == EROFS_NO_KMAP) { + if (type == EROFS_KMAP) + buf->base = kmap(page); + else if (type == EROFS_KMAP_ATOMIC) + buf->base = kmap_atomic(page); + buf->kmap_type = type; + } else if (buf->kmap_type != type) { + DBG_BUGON(1); + return ERR_PTR(-EFAULT); + } + if (type == EROFS_NO_KMAP) + return NULL; + return buf->base + (offset & ~PAGE_MASK); +} + static int erofs_map_blocks_flatmode(struct inode *inode, struct erofs_map_blocks *map, int flags) @@ -54,7 +110,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode, struct erofs_inode *vi = EROFS_I(inode); bool tailendpacking = (vi->datalayout == EROFS_INODE_FLAT_INLINE);
- nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE); + nblocks = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ); lastblk = nblocks - tailendpacking;
/* there is no hole in flatmode */ @@ -95,10 +151,11 @@ static int erofs_map_blocks(struct inode *inode, struct super_block *sb = inode->i_sb; struct erofs_inode *vi = EROFS_I(inode); struct erofs_inode_chunk_index *idx; - struct page *page; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; u64 chunknr; unsigned int unit; erofs_off_t pos; + void *kaddr; int err = 0;
trace_erofs_map_blocks_enter(inode, map, flags); @@ -124,9 +181,9 @@ static int erofs_map_blocks(struct inode *inode, pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize + vi->xattr_isize, unit) + unit * chunknr;
- page = erofs_get_meta_page(inode->i_sb, erofs_blknr(pos)); - if (IS_ERR(page)) { - err = PTR_ERR(page); + kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), EROFS_KMAP); + if (IS_ERR(kaddr)) { + err = PTR_ERR(kaddr); goto out; } map->m_la = chunknr << vi->chunkbits; @@ -135,7 +192,7 @@ static int erofs_map_blocks(struct inode *inode,
/* handle block map */ if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES)) { - __le32 *blkaddr = page_address(page) + erofs_blkoff(pos); + __le32 *blkaddr = kaddr + erofs_blkoff(pos);
if (le32_to_cpu(*blkaddr) == EROFS_NULL_ADDR) { map->m_flags = 0; @@ -146,7 +203,7 @@ static int erofs_map_blocks(struct inode *inode, goto out_unlock; } /* parse chunk indexes */ - idx = page_address(page) + erofs_blkoff(pos); + idx = kaddr + erofs_blkoff(pos); switch (le32_to_cpu(idx->blkaddr)) { case EROFS_NULL_ADDR: map->m_flags = 0; @@ -159,8 +216,7 @@ static int erofs_map_blocks(struct inode *inode, break; } out_unlock: - unlock_page(page); - put_page(page); + erofs_put_metabuf(&buf); out: if (!err) map->m_llen = map->m_plen; @@ -274,30 +330,25 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* deal with inline page */ if (map.m_flags & EROFS_MAP_META) { void *vsrc, *vto; - struct page *ipage; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
DBG_BUGON(map.m_plen > PAGE_SIZE);
- ipage = erofs_get_meta_page(sb, blknr); - - if (IS_ERR(ipage)) { - err = PTR_ERR(ipage); + vsrc = erofs_read_metabuf(&buf, inode->i_sb, + blknr, EROFS_KMAP_ATOMIC); + if (IS_ERR(vsrc)) { + err = PTR_ERR(vsrc); goto err_out; } - - vsrc = kmap_atomic(ipage); vto = kmap_atomic(page); memcpy(vto, vsrc + blkoff, map.m_plen); memset(vto + map.m_plen, 0, PAGE_SIZE - map.m_plen); kunmap_atomic(vto); - kunmap_atomic(vsrc); flush_dcache_page(page);
SetPageUptodate(page); - /* TODO: could we unlock the page earlier? */ - unlock_page(ipage); - put_page(ipage);
+ erofs_put_metabuf(&buf); /* imply err = 0, see erofs_map_blocks */ goto has_updated; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 954a1cfb9509..ca6e127e1096 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -200,6 +200,19 @@ static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp) #error erofs cannot be used in this platform #endif
+enum erofs_kmap_type { + EROFS_NO_KMAP, /* don't map the buffer */ + EROFS_KMAP, /* use kmap() to map the buffer */ + EROFS_KMAP_ATOMIC, /* use kmap_atomic() to map the buffer */ +}; + +struct erofs_buf { + struct page *page; + void *base; + enum erofs_kmap_type kmap_type; +}; +#define __EROFS_BUF_INITIALIZER ((struct erofs_buf){ .page = NULL }) + #define ROOT_NID(sb) ((sb)->root_nid)
#define erofs_blknr(addr) ((addr) / EROFS_BLKSIZ)
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/961287873c84a2a80d4a38e67df32b0...
--------------------------------
ANBZ: #1666
commit c521e3ad6cc980df6f3bdd2616808ecb973af880 upstream.
Get rid of the old erofs_get_meta_page() within inode operations by using on-stack meta buffers, in order to prepare for subpage and folio features.
Link: https://lore.kernel.org/r/20220102040017.51352-3-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/inode.c    | 68 +++++++++++++++++++++------------------------
 fs/erofs/internal.h |  3 ++
 2 files changed, 35 insertions(+), 36 deletions(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index 24b2e5da0893..c1abf4c74f37 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -13,8 +13,8 @@ * the inode payload page if it's an extended inode) in order to fill * inline data if possible. */ -static struct page *erofs_read_inode(struct inode *inode, - unsigned int *ofs) +static void *erofs_read_inode(struct erofs_buf *buf, + struct inode *inode, unsigned int *ofs) { struct super_block *sb = inode->i_sb; struct erofs_sb_info *sbi = EROFS_SB(sb); @@ -22,7 +22,7 @@ static struct page *erofs_read_inode(struct inode *inode, const erofs_off_t inode_loc = iloc(sbi, vi->nid);
erofs_blk_t blkaddr, nblks = 0; - struct page *page; + void *kaddr; struct erofs_inode_compact *dic; struct erofs_inode_extended *die, *copied = NULL; unsigned int ifmt; @@ -34,14 +34,14 @@ static struct page *erofs_read_inode(struct inode *inode, erofs_dbg("%s, reading inode nid %llu at %u of blkaddr %u", __func__, vi->nid, *ofs, blkaddr);
- page = erofs_get_meta_page(sb, blkaddr); - if (IS_ERR(page)) { + kaddr = erofs_read_metabuf(buf, sb, blkaddr, EROFS_KMAP); + if (IS_ERR(kaddr)) { erofs_err(sb, "failed to get inode (nid: %llu) page, err %ld", - vi->nid, PTR_ERR(page)); - return page; + vi->nid, PTR_ERR(kaddr)); + return kaddr; }
- dic = page_address(page) + *ofs; + dic = kaddr + *ofs; ifmt = le16_to_cpu(dic->i_format);
if (ifmt & ~EROFS_I_ALL) { @@ -62,12 +62,12 @@ static struct page *erofs_read_inode(struct inode *inode, switch (erofs_inode_version(ifmt)) { case EROFS_INODE_LAYOUT_EXTENDED: vi->inode_isize = sizeof(struct erofs_inode_extended); - /* check if the inode acrosses page boundary */ - if (*ofs + vi->inode_isize <= PAGE_SIZE) { + /* check if the extended inode acrosses block boundary */ + if (*ofs + vi->inode_isize <= EROFS_BLKSIZ) { *ofs += vi->inode_isize; die = (struct erofs_inode_extended *)dic; } else { - const unsigned int gotten = PAGE_SIZE - *ofs; + const unsigned int gotten = EROFS_BLKSIZ - *ofs;
copied = kmalloc(vi->inode_isize, GFP_NOFS); if (!copied) { @@ -75,18 +75,16 @@ static struct page *erofs_read_inode(struct inode *inode, goto err_out; } memcpy(copied, dic, gotten); - unlock_page(page); - put_page(page); - - page = erofs_get_meta_page(sb, blkaddr + 1); - if (IS_ERR(page)) { - erofs_err(sb, "failed to get inode payload page (nid: %llu), err %ld", - vi->nid, PTR_ERR(page)); + kaddr = erofs_read_metabuf(buf, sb, blkaddr + 1, + EROFS_KMAP); + if (IS_ERR(kaddr)) { + erofs_err(sb, "failed to get inode payload block (nid: %llu), err %ld", + vi->nid, PTR_ERR(kaddr)); kfree(copied); - return page; + return kaddr; } *ofs = vi->inode_isize - gotten; - memcpy((u8 *)copied + gotten, page_address(page), *ofs); + memcpy((u8 *)copied + gotten, kaddr, *ofs); die = copied; } vi->xattr_isize = erofs_xattr_ibody_size(die->i_xattr_icount); @@ -196,7 +194,7 @@ static struct page *erofs_read_inode(struct inode *inode, inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9; else inode->i_blocks = nblks << LOG_SECTORS_PER_BLOCK; - return page; + return kaddr;
bogusimode: erofs_err(inode->i_sb, "bogus i_mode (%o) @ nid %llu", @@ -205,12 +203,11 @@ static struct page *erofs_read_inode(struct inode *inode, err_out: DBG_BUGON(1); kfree(copied); - unlock_page(page); - put_page(page); + erofs_put_metabuf(buf); return ERR_PTR(err); }
-static int erofs_fill_symlink(struct inode *inode, void *data, +static int erofs_fill_symlink(struct inode *inode, void *kaddr, unsigned int m_pofs) { struct erofs_inode *vi = EROFS_I(inode); @@ -218,7 +215,7 @@ static int erofs_fill_symlink(struct inode *inode, void *data,
/* if it cannot be handled with fast symlink scheme */ if (vi->datalayout != EROFS_INODE_FLAT_INLINE || - inode->i_size >= PAGE_SIZE) { + inode->i_size >= EROFS_BLKSIZ) { inode->i_op = &erofs_symlink_iops; return 0; } @@ -228,8 +225,8 @@ static int erofs_fill_symlink(struct inode *inode, void *data, return -ENOMEM;
m_pofs += vi->xattr_isize; - /* inline symlink data shouldn't cross page boundary as well */ - if (m_pofs + inode->i_size > PAGE_SIZE) { + /* inline symlink data shouldn't cross block boundary */ + if (m_pofs + inode->i_size > EROFS_BLKSIZ) { kfree(lnk); erofs_err(inode->i_sb, "inline data cross block boundary @ nid %llu", @@ -237,8 +234,7 @@ static int erofs_fill_symlink(struct inode *inode, void *data, DBG_BUGON(1); return -EFSCORRUPTED; } - - memcpy(lnk, data + m_pofs, inode->i_size); + memcpy(lnk, kaddr + m_pofs, inode->i_size); lnk[inode->i_size] = '\0';
inode->i_link = lnk; @@ -249,16 +245,17 @@ static int erofs_fill_symlink(struct inode *inode, void *data, static int erofs_fill_inode(struct inode *inode, int isdir) { struct erofs_inode *vi = EROFS_I(inode); - struct page *page; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; + void *kaddr; unsigned int ofs; int err = 0;
trace_erofs_fill_inode(inode, isdir);
/* read inode base data from disk */ - page = erofs_read_inode(inode, &ofs); - if (IS_ERR(page)) - return PTR_ERR(page); + kaddr = erofs_read_inode(&buf, inode, &ofs); + if (IS_ERR(kaddr)) + return PTR_ERR(kaddr);
/* setup the new inode */ switch (inode->i_mode & S_IFMT) { @@ -271,7 +268,7 @@ static int erofs_fill_inode(struct inode *inode, int isdir) inode->i_fop = &erofs_dir_fops; break; case S_IFLNK: - err = erofs_fill_symlink(inode, page_address(page), ofs); + err = erofs_fill_symlink(inode, kaddr, ofs); if (err) goto out_unlock; inode_nohighmem(inode); @@ -295,8 +292,7 @@ static int erofs_fill_inode(struct inode *inode, int isdir) inode->i_mapping->a_ops = &erofs_raw_access_aops;
out_unlock: - unlock_page(page); - put_page(page); + erofs_put_metabuf(&buf); return err; }
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index ca6e127e1096..8a495bc4df1e 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -385,6 +385,9 @@ struct erofs_map_dev {
/* data.c */ struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr); +void erofs_put_metabuf(struct erofs_buf *buf); +void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, + erofs_blk_t blkaddr, enum erofs_kmap_type type); int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
/* inode.c */
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/59d167b952d09dd96b13422f0245159...
--------------------------------
ANBZ: #1666
commit 2b5379f7860d8e95571a4837ac4c07167b4233bd upstream.
Get rid of the old erofs_get_meta_page() within super operations by using on-stack meta buffers, in order to prepare for subpage and folio features.
Link: https://lore.kernel.org/r/20220102081317.109797-1-hsiangkao@linux.alibaba.co...
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/super.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 8ba942170b65..fd7c73adabca 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -2,6 +2,7 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021, Alibaba Cloud */ #include <linux/module.h> #include <linux/buffer_head.h> @@ -128,7 +129,7 @@ static int erofs_init_devices(struct super_block *sb, struct erofs_sb_info *sbi = EROFS_SB(sb); unsigned int ondisk_extradevs; erofs_off_t pos; - struct page *page = NULL; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; struct erofs_device_info *dif; struct erofs_deviceslot *dis; void *ptr; @@ -152,22 +153,13 @@ static int erofs_init_devices(struct super_block *sb, pos = le16_to_cpu(dsb->devt_slotoff) * EROFS_DEVT_SLOT_SIZE; down_read(&sbi->devs->rwsem); idr_for_each_entry(&sbi->devs->tree, dif, id) { - erofs_blk_t blk = erofs_blknr(pos); struct block_device *bdev;
- if (!page || page->index != blk) { - if (page) { - kunmap(page); - unlock_page(page); - put_page(page); - } - - page = erofs_get_meta_page(sb, blk); - if (IS_ERR(page)) { - up_read(&sbi->devs->rwsem); - return PTR_ERR(page); - } - ptr = kmap(page); + ptr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), + EROFS_KMAP); + if (IS_ERR(ptr)) { + err = PTR_ERR(ptr); + break; } dis = ptr + erofs_blkoff(pos);
@@ -186,11 +178,7 @@ static int erofs_init_devices(struct super_block *sb, } err_out: up_read(&sbi->devs->rwsem); - if (page) { - kunmap(page); - unlock_page(page); - put_page(page); - } + erofs_put_metabuf(&buf); return err; }
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/93f1a84e1ec80ff5d623e3ccab57050...
--------------------------------
ANBZ: #1666
commit bb88e8da00253bea0e7f0f4cdfd7910572d7799f upstream.
Get rid of the old erofs_get_meta_page() within xattr operations by using on-stack meta buffers, in order to prepare for subpage and folio features.
Link: https://lore.kernel.org/r/20220102040017.51352-5-hsiangkao@linux.alibaba.com
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/xattr.c | 135 ++++++++++++++---------------------------------
 fs/erofs/xattr.h |   1 -
 2 files changed, 41 insertions(+), 95 deletions(-)
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c index 59148ffc0c1f..7c55336959cb 100644 --- a/fs/erofs/xattr.c +++ b/fs/erofs/xattr.c @@ -2,39 +2,20 @@ /* * Copyright (C) 2017-2018 HUAWEI, Inc. * https://www.huawei.com/ + * Copyright (C) 2021-2022, Alibaba Cloud */ #include <linux/security.h> #include "xattr.h"
struct xattr_iter { struct super_block *sb; - struct page *page; + struct erofs_buf buf; void *kaddr;
erofs_blk_t blkaddr; unsigned int ofs; };
-static inline void xattr_iter_end(struct xattr_iter *it, bool atomic) -{ - /* the only user of kunmap() is 'init_inode_xattrs' */ - if (!atomic) - kunmap(it->page); - else - kunmap_atomic(it->kaddr); - - unlock_page(it->page); - put_page(it->page); -} - -static inline void xattr_iter_end_final(struct xattr_iter *it) -{ - if (!it->page) - return; - - xattr_iter_end(it, true); -} - static int init_inode_xattrs(struct inode *inode) { struct erofs_inode *const vi = EROFS_I(inode); @@ -43,7 +24,6 @@ static int init_inode_xattrs(struct inode *inode) struct erofs_xattr_ibody_header *ih; struct super_block *sb; struct erofs_sb_info *sbi; - bool atomic_map; int ret = 0;
/* the most case is that xattrs of this inode are initialized. */ @@ -91,26 +71,23 @@ static int init_inode_xattrs(struct inode *inode)
sb = inode->i_sb; sbi = EROFS_SB(sb); + it.buf = __EROFS_BUF_INITIALIZER; it.blkaddr = erofs_blknr(iloc(sbi, vi->nid) + vi->inode_isize); it.ofs = erofs_blkoff(iloc(sbi, vi->nid) + vi->inode_isize);
- it.page = erofs_get_meta_page(sb, it.blkaddr); - if (IS_ERR(it.page)) { - ret = PTR_ERR(it.page); + /* read in shared xattr array (non-atomic, see kmalloc below) */ + it.kaddr = erofs_read_metabuf(&it.buf, sb, it.blkaddr, EROFS_KMAP); + if (IS_ERR(it.kaddr)) { + ret = PTR_ERR(it.kaddr); goto out_unlock; }
- /* read in shared xattr array (non-atomic, see kmalloc below) */ - it.kaddr = kmap(it.page); - atomic_map = false; - ih = (struct erofs_xattr_ibody_header *)(it.kaddr + it.ofs); - vi->xattr_shared_count = ih->h_shared_count; vi->xattr_shared_xattrs = kmalloc_array(vi->xattr_shared_count, sizeof(uint), GFP_KERNEL); if (!vi->xattr_shared_xattrs) { - xattr_iter_end(&it, atomic_map); + erofs_put_metabuf(&it.buf); ret = -ENOMEM; goto out_unlock; } @@ -122,25 +99,22 @@ static int init_inode_xattrs(struct inode *inode) if (it.ofs >= EROFS_BLKSIZ) { /* cannot be unaligned */ DBG_BUGON(it.ofs != EROFS_BLKSIZ); - xattr_iter_end(&it, atomic_map);
- it.page = erofs_get_meta_page(sb, ++it.blkaddr); - if (IS_ERR(it.page)) { + it.kaddr = erofs_read_metabuf(&it.buf, sb, ++it.blkaddr, + EROFS_KMAP); + if (IS_ERR(it.kaddr)) { kfree(vi->xattr_shared_xattrs); vi->xattr_shared_xattrs = NULL; - ret = PTR_ERR(it.page); + ret = PTR_ERR(it.kaddr); goto out_unlock; } - - it.kaddr = kmap_atomic(it.page); - atomic_map = true; it.ofs = 0; } vi->xattr_shared_xattrs[i] = le32_to_cpu(*(__le32 *)(it.kaddr + it.ofs)); it.ofs += sizeof(__le32); } - xattr_iter_end(&it, atomic_map); + erofs_put_metabuf(&it.buf);
/* paired with smp_mb() at the beginning of the function. */ smp_mb(); @@ -172,19 +146,11 @@ static inline int xattr_iter_fixup(struct xattr_iter *it) if (it->ofs < EROFS_BLKSIZ) return 0;
- xattr_iter_end(it, true); - it->blkaddr += erofs_blknr(it->ofs); - - it->page = erofs_get_meta_page(it->sb, it->blkaddr); - if (IS_ERR(it->page)) { - int err = PTR_ERR(it->page); - - it->page = NULL; - return err; - } - - it->kaddr = kmap_atomic(it->page); + it->kaddr = erofs_read_metabuf(&it->buf, it->sb, it->blkaddr, + EROFS_KMAP_ATOMIC); + if (IS_ERR(it->kaddr)) + return PTR_ERR(it->kaddr); it->ofs = erofs_blkoff(it->ofs); return 0; } @@ -207,11 +173,10 @@ static int inline_xattr_iter_begin(struct xattr_iter *it, it->blkaddr = erofs_blknr(iloc(sbi, vi->nid) + inline_xattr_ofs); it->ofs = erofs_blkoff(iloc(sbi, vi->nid) + inline_xattr_ofs);
- it->page = erofs_get_meta_page(inode->i_sb, it->blkaddr); - if (IS_ERR(it->page)) - return PTR_ERR(it->page); - - it->kaddr = kmap_atomic(it->page); + it->kaddr = erofs_read_metabuf(&it->buf, inode->i_sb, it->blkaddr, + EROFS_KMAP_ATOMIC); + if (IS_ERR(it->kaddr)) + return PTR_ERR(it->kaddr); return vi->xattr_isize - xattr_header_sz; }
@@ -272,7 +237,7 @@ static int xattr_foreach(struct xattr_iter *it, it->ofs = 0; }
- slice = min_t(unsigned int, PAGE_SIZE - it->ofs, + slice = min_t(unsigned int, EROFS_BLKSIZ - it->ofs, entry.e_name_len - processed);
/* handle name */ @@ -307,7 +272,7 @@ static int xattr_foreach(struct xattr_iter *it, it->ofs = 0; }
- slice = min_t(unsigned int, PAGE_SIZE - it->ofs, + slice = min_t(unsigned int, EROFS_BLKSIZ - it->ofs, value_sz - processed); op->value(it, processed, it->kaddr + it->ofs, slice); it->ofs += slice; @@ -386,8 +351,6 @@ static int inline_getxattr(struct inode *inode, struct getxattr_iter *it) if (ret != -ENOATTR) break; } - xattr_iter_end_final(&it->it); - return ret ? ret : it->buffer_size; }
@@ -404,26 +367,16 @@ static int shared_getxattr(struct inode *inode, struct getxattr_iter *it) xattrblock_addr(sbi, vi->xattr_shared_xattrs[i]);
it->it.ofs = xattrblock_offset(sbi, vi->xattr_shared_xattrs[i]); - - if (!i || blkaddr != it->it.blkaddr) { - if (i) - xattr_iter_end(&it->it, true); - - it->it.page = erofs_get_meta_page(sb, blkaddr); - if (IS_ERR(it->it.page)) - return PTR_ERR(it->it.page); - - it->it.kaddr = kmap_atomic(it->it.page); - it->it.blkaddr = blkaddr; - } + it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb, blkaddr, + EROFS_KMAP_ATOMIC); + if (IS_ERR(it->it.kaddr)) + return PTR_ERR(it->it.kaddr); + it->it.blkaddr = blkaddr;
ret = xattr_foreach(&it->it, &find_xattr_handlers, NULL); if (ret != -ENOATTR) break; } - if (vi->xattr_shared_count) - xattr_iter_end_final(&it->it); - return ret ? ret : it->buffer_size; }
@@ -452,10 +405,11 @@ int erofs_getxattr(struct inode *inode, int index, return ret;
it.index = index; - it.name.len = strlen(name); if (it.name.len > EROFS_NAME_LEN) return -ERANGE; + + it.it.buf = __EROFS_BUF_INITIALIZER; it.name.name = name;
it.buffer = buffer; @@ -465,6 +419,7 @@ int erofs_getxattr(struct inode *inode, int index, ret = inline_getxattr(inode, &it); if (ret == -ENOATTR) ret = shared_getxattr(inode, &it); + erofs_put_metabuf(&it.it.buf); return ret; }
@@ -607,7 +562,6 @@ static int inline_listxattr(struct listxattr_iter *it) if (ret) break; } - xattr_iter_end_final(&it->it); return ret ? ret : it->buffer_ofs; }
@@ -625,25 +579,16 @@ static int shared_listxattr(struct listxattr_iter *it) xattrblock_addr(sbi, vi->xattr_shared_xattrs[i]);
it->it.ofs = xattrblock_offset(sbi, vi->xattr_shared_xattrs[i]); - if (!i || blkaddr != it->it.blkaddr) { - if (i) - xattr_iter_end(&it->it, true); - - it->it.page = erofs_get_meta_page(sb, blkaddr); - if (IS_ERR(it->it.page)) - return PTR_ERR(it->it.page); - - it->it.kaddr = kmap_atomic(it->it.page); - it->it.blkaddr = blkaddr; - } + it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb, blkaddr, + EROFS_KMAP_ATOMIC); + if (IS_ERR(it->it.kaddr)) + return PTR_ERR(it->it.kaddr); + it->it.blkaddr = blkaddr;
ret = xattr_foreach(&it->it, &list_xattr_handlers, NULL); if (ret) break; } - if (vi->xattr_shared_count) - xattr_iter_end_final(&it->it); - return ret ? ret : it->buffer_ofs; }
@@ -659,6 +604,7 @@ ssize_t erofs_listxattr(struct dentry *dentry, if (ret) return ret;
+ it.it.buf = __EROFS_BUF_INITIALIZER; it.dentry = dentry; it.buffer = buffer; it.buffer_size = buffer_size; @@ -667,9 +613,10 @@ ssize_t erofs_listxattr(struct dentry *dentry, it.it.sb = dentry->d_sb;
ret = inline_listxattr(&it); - if (ret < 0 && ret != -ENOATTR) - return ret; - return shared_listxattr(&it); + if (ret >= 0 || ret == -ENOATTR) + ret = shared_listxattr(&it); + erofs_put_metabuf(&it.it.buf); + return ret; }
#ifdef CONFIG_EROFS_FS_POSIX_ACL diff --git a/fs/erofs/xattr.h b/fs/erofs/xattr.h index 366dcb400525..50e283d0526b 100644 --- a/fs/erofs/xattr.h +++ b/fs/erofs/xattr.h @@ -86,4 +86,3 @@ struct posix_acl *erofs_get_acl(struct inode *inode, int type); #endif
#endif -
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/fc0e0a6efb525928abcad54374c6dbf...
--------------------------------
ANBZ: #1666
commit 09c543798c3cde19aae575a0f76d5fc7c130ff18 upstream.
Get rid of the old erofs_get_meta_page() within zmap operations by using on-stack meta buffers, in order to prepare for subpage and folio features.
Finally, erofs_get_meta_page() is left with no users. Get rid of it!
Link: https://lore.kernel.org/r/20220102040017.51352-6-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[ Gao Xiang: adapt anolis codebase which is mainly based on 5.10. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/data.c     | 10 ----------
 fs/erofs/internal.h |  6 +++---
 fs/erofs/zdata.c    |  9 ++-------
 fs/erofs/zmap.c     | 46 +++++++++++----------------------------
 4 files changed, 16 insertions(+), 55 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 7b855137be8b..ac331d1c2e2f 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -42,16 +42,6 @@ static struct page *erofs_read_meta_page(struct super_block *sb, pgoff_t index) return page; }
-struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr) -{ - struct page *page = erofs_read_meta_page(sb, blkaddr); - - /* should already be PageUptodate */ - if (!IS_ERR(page)) - lock_page(page); - return page; -} - void erofs_unmap_metabuf(struct erofs_buf *buf) { if (buf->kmap_type == EROFS_KMAP) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 8a495bc4df1e..efaa98a84508 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -348,13 +348,13 @@ enum { #define EROFS_MAP_FULL_MAPPED (1 << BH_FullMapped)
struct erofs_map_blocks { + struct erofs_buf buf; + erofs_off_t m_pa, m_la; u64 m_plen, m_llen;
unsigned short m_deviceid; unsigned int m_flags; - - struct page *mpage; };
/* Flags used by erofs_map_blocks_flatmode() */ @@ -384,7 +384,7 @@ struct erofs_map_dev { };
/* data.c */ -struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr); +void erofs_unmap_metabuf(struct erofs_buf *buf); void erofs_put_metabuf(struct erofs_buf *buf); void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, erofs_blk_t blkaddr, enum erofs_kmap_type type); diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index eeb286935f8e..c93a511e21fb 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -1303,9 +1303,7 @@ static int z_erofs_readpage(struct file *file, struct page *page) if (err) erofs_err(inode->i_sb, "failed to read, err [%d]", err);
- if (f.map.mpage) - put_page(f.map.mpage); - + erofs_put_metabuf(&f.map.buf); /* clean up the remaining free pages */ put_pages_list(&pagepool); return err; @@ -1359,10 +1357,7 @@ static void z_erofs_readahead(struct readahead_control *rac) (void)z_erofs_collector_end(&f.clt);
z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync); - - if (f.map.mpage) - put_page(f.map.mpage); - + erofs_put_metabuf(&f.map.buf); /* clean up the remaining free pages */ put_pages_list(&pagepool); } diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index 6b9987205c90..840e795a7c70 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -31,7 +31,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) struct super_block *const sb = inode->i_sb; int err; erofs_off_t pos; - struct page *page; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; void *kaddr; struct z_erofs_map_header *h;
@@ -55,14 +55,13 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize + vi->xattr_isize, 8); - page = erofs_get_meta_page(sb, erofs_blknr(pos)); - if (IS_ERR(page)) { - err = PTR_ERR(page); + kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), + EROFS_KMAP_ATOMIC); + if (IS_ERR(kaddr)) { + err = PTR_ERR(kaddr); goto out_unlock; }
- kaddr = kmap_atomic(page); - h = kaddr + erofs_blkoff(pos); vi->z_advise = le16_to_cpu(h->h_advise); vi->z_algorithmtype[0] = h->h_algorithmtype & 15; @@ -92,9 +91,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) smp_mb(); set_bit(EROFS_I_Z_INITED_BIT, &vi->flags); unmap_done: - kunmap_atomic(kaddr); - unlock_page(page); - put_page(page); + erofs_put_metabuf(&buf); out_unlock: clear_and_wake_up_bit(EROFS_I_BL_Z_BIT, &vi->flags); return err; @@ -117,31 +114,11 @@ static int z_erofs_reload_indexes(struct z_erofs_maprecorder *m, erofs_blk_t eblk) { struct super_block *const sb = m->inode->i_sb; - struct erofs_map_blocks *const map = m->map; - struct page *mpage = map->mpage; - - if (mpage) { - if (mpage->index == eblk) { - if (!m->kaddr) - m->kaddr = kmap_atomic(mpage); - return 0; - }
- if (m->kaddr) { - kunmap_atomic(m->kaddr); - m->kaddr = NULL; - } - put_page(mpage); - } - - mpage = erofs_get_meta_page(sb, eblk); - if (IS_ERR(mpage)) { - map->mpage = NULL; - return PTR_ERR(mpage); - } - m->kaddr = kmap_atomic(mpage); - unlock_page(mpage); - map->mpage = mpage; + m->kaddr = erofs_read_metabuf(&m->map->buf, sb, eblk, + EROFS_KMAP_ATOMIC); + if (IS_ERR(m->kaddr)) + return PTR_ERR(m->kaddr); return 0; }
@@ -461,8 +438,7 @@ int z_erofs_map_blocks_iter(struct inode *inode, map->m_flags |= EROFS_MAP_MAPPED;
unmap_out: - if (m.kaddr) - kunmap_atomic(m.kaddr); + erofs_unmap_metabuf(&m.map->buf);
out: erofs_dbg("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
From: Jeffle Xu <jefflexu@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/5def3ba1b8e34139e2d7bf3b872b16b...
--------------------------------
ANBZ: #1666
commit ed6e0401e68bdfe08de9b44968fb235ff10ccee1 upstream.
Convert erofs_read_superblock() to use on-stack meta buffers as well. The only change is that meta buffers read the cache page without the __GFP_FS flag, which shall not matter.
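For reference, the meta buffer path reads the cache page with __GFP_FS masked off (quoting the erofs_read_meta_page() helper introduced earlier in this series), which also keeps metadata reads from re-entering filesystem reclaim:

	page = read_cache_page_gfp(mapping, index,
				   mapping_gfp_constraint(mapping, ~__GFP_FS));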
Link: https://lore.kernel.org/r/20220209060108.43051-7-jefflexu@linux.alibaba.com
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/super.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index fd7c73adabca..33a223f908a2 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -185,21 +185,19 @@ static int erofs_init_devices(struct super_block *sb, static int erofs_read_superblock(struct super_block *sb) { struct erofs_sb_info *sbi; - struct page *page; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; struct erofs_super_block *dsb; unsigned int blkszbits; void *data; int ret;
- page = read_mapping_page(sb->s_bdev->bd_inode->i_mapping, 0, NULL); - if (IS_ERR(page)) { + data = erofs_read_metabuf(&buf, sb, 0, EROFS_KMAP); + if (IS_ERR(data)) { erofs_err(sb, "cannot read erofs superblock"); - return PTR_ERR(page); + return PTR_ERR(data); }
sbi = EROFS_SB(sb); - - data = kmap(page); dsb = (struct erofs_super_block *)(data + EROFS_SUPER_OFFSET);
ret = -EINVAL; @@ -252,8 +250,7 @@ static int erofs_read_superblock(struct super_block *sb) /* handle multiple devices */ ret = erofs_init_devices(sb, dsb); out: - kunmap(page); - put_page(page); + erofs_put_metabuf(&buf); return ret; }
From: Gao Xiang <hsiangkao@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/22dd900069a3974bae72f8f59d9487a...
--------------------------------
ANBZ: #6079
commit d705117ddd724a9d4877e338e4587010ab6a1c62 upstream.
An unsupported chunk format should be checked with "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)".
Found when testing with a 4k-byte blockmap (although mkfs currently uses the inode chunk indexes format by default).
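To spell out the predicate change (a sketch only; EROFS_CHUNK_FORMAT_ALL is the mask of all supported format bits, i.e. EROFS_CHUNK_FORMAT_BLKBITS_MASK | EROFS_CHUNK_FORMAT_INDEXES):

	/* old: rejects any chunkformat with no known bit set, so the valid
	 * chunkformat == 0 case (plain 4k-byte blockmap) was reported as
	 * unsupported */
	if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_ALL))
		return -EOPNOTSUPP;

	/* new: rejects only chunkformats carrying unknown bits */
	if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)
		return -EOPNOTSUPP;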
Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.co...
Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://gitee.com/anolis/cloud-kernel/pulls/1988
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/erofs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index c1abf4c74f37..b1aca34c47a6 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -174,7 +174,7 @@ static void *erofs_read_inode(struct erofs_buf *buf, }
if (vi->datalayout == EROFS_INODE_CHUNK_BASED) { - if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_ALL)) { + if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL) { erofs_err(inode->i_sb, "unsupported chunk format %x of nid %llu", vi->chunkformat, vi->nid);
From: Jeffle Xu <jefflexu@linux.alibaba.com>
anolis inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/29a07ee10787
--------------------------------
ANBZ: #1666
commit c8383054506c77b814489c09877b5db83fd4abf2 upstream.
Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can benefit scenarios where on-demand read semantics are needed, e.g. container image distribution.
The essential difference between these two modes shows up when a cache miss occurs: in the original mode, the netfs fetches the data from the remote server and then writes it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon.
As the first step, notify the user daemon when a cookie is being looked up. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached.
Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, the netfs itself has no idea of the cache file size, whilst the user daemon should give the hint on it.
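To make the flow concrete, here is a rough sketch of the userspace side, under a few assumptions: the daemon has already opened /dev/cachefiles (devfd) and bound a cache in on-demand mode; fill_cache_and_get_size() is a hypothetical helper that fetches the data, writes it through the anonymous fd and returns the cache file size; error handling is trimmed:

	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <linux/cachefiles.h>

	/* hypothetical helper, not part of this patch */
	static long long fill_cache_and_get_size(struct cachefiles_open *load);

	static void handle_one_request(int devfd)
	{
		char buf[CACHEFILES_MSG_MAX_SIZE];
		struct cachefiles_msg *msg = (struct cachefiles_msg *)buf;
		char rsp[64];

		/* each read() on the device returns one pending request */
		if (read(devfd, buf, sizeof(buf)) < (ssize_t)sizeof(*msg))
			return;

		if (msg->opcode == CACHEFILES_OP_OPEN) {
			struct cachefiles_open *load = (void *)msg->data;

			/*
			 * load->fd is the anonymous fd used to write fetched
			 * data into the cache file; msg->object_id still
			 * identifies the cache file even if that fd is moved
			 * around with dup().  Complete the request with
			 * "copen <msg_id>,<cache size>".
			 */
			snprintf(rsp, sizeof(rsp), "copen %u,%lld",
				 msg->msg_id, fill_cache_and_get_size(load));
			write(devfd, rsp, strlen(rsp));
		}
	}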
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/Kconfig             |  12 +
 fs/cachefiles/Makefile            |   1 +
 fs/cachefiles/daemon.c            |  87 ++++++-
 fs/cachefiles/internal.h          |  51 ++++
 fs/cachefiles/namei.c             |  10 +
 fs/cachefiles/ondemand.c          | 388 ++++++++++++++++++++++++++++++
 include/trace/events/cachefiles.h |   2 +
 include/uapi/linux/cachefiles.h   |  68 ++++++
 lib/radix-tree.c                  |   1 +
 9 files changed, 607 insertions(+), 13 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 include/uapi/linux/cachefiles.h
diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig index ff9ca55a9ae9..12174e2616f8 100644 --- a/fs/cachefiles/Kconfig +++ b/fs/cachefiles/Kconfig @@ -38,3 +38,15 @@ config CACHEFILES_HISTOGRAM
See Documentation/filesystems/caching/cachefiles.rst for more information. + +config CACHEFILES_ONDEMAND + bool "Support for on-demand read" + depends on CACHEFILES + default n + help + This permits userspace to enable the cachefiles on-demand read mode. + In this mode, when a cache miss occurs, responsibility for fetching + the data lies with the cachefiles backend instead of with the netfs + and is delegated to userspace. + + If unsure, say N. diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile index 891dedda5905..c247d8b5e4f9 100644 --- a/fs/cachefiles/Makefile +++ b/fs/cachefiles/Makefile @@ -15,5 +15,6 @@ cachefiles-y := \ xattr.o
cachefiles-$(CONFIG_CACHEFILES_HISTOGRAM) += proc.o +cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o
obj-$(CONFIG_CACHEFILES) := cachefiles.o diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 752c1e43416f..1c85c8dcc0c4 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -73,6 +73,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = { { "inuse", cachefiles_daemon_inuse }, { "secctx", cachefiles_daemon_secctx }, { "tag", cachefiles_daemon_tag }, +#ifdef CONFIG_CACHEFILES_ONDEMAND + { "copen", cachefiles_ondemand_copen }, +#endif { "", NULL } };
@@ -106,6 +109,9 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) rwlock_init(&cache->active_lock); init_waitqueue_head(&cache->daemon_pollwq);
+ INIT_RADIX_TREE(&cache->reqs, GFP_ATOMIC); + idr_init(&cache->ondemand_ids); + /* set default caching limits * - limit at 1% free space and/or free files * - cull below 5% free space and/or free files @@ -123,6 +129,44 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) return 0; }
+static void cachefiles_flush_reqs(struct cachefiles_cache *cache) +{ + void **slot; + struct radix_tree_iter iter; + struct cachefiles_req *req; + + /* + * Make sure the following two operations won't be reordered. + * 1) set CACHEFILES_DEAD bit + * 2) flush requests in the xarray + * Otherwise the request may be enqueued after xarray has been + * flushed, leaving the orphan request never being completed. + * + * CPU 1 CPU 2 + * ===== ===== + * flush requests in the xarray + * test CACHEFILES_DEAD bit + * enqueue the request + * set CACHEFILES_DEAD bit + */ + smp_mb(); + + xa_lock(&cache->reqs); + radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { + req = radix_tree_deref_slot_protected(slot, + &cache->reqs.xa_lock); + BUG_ON(!req); + radix_tree_delete(&cache->reqs, iter.index); + req->error = -EIO; + complete(&req->done); + } + xa_unlock(&cache->reqs); + + xa_lock(&cache->ondemand_ids.idr_rt); + idr_destroy(&cache->ondemand_ids); + xa_unlock(&cache->ondemand_ids.idr_rt); +} + /* * release a cache */ @@ -136,6 +180,8 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
set_bit(CACHEFILES_DEAD, &cache->flags);
+ if (cachefiles_in_ondemand_mode(cache)) + cachefiles_flush_reqs(cache); cachefiles_daemon_unbind(cache);
ASSERT(!cache->active_nodes.rb_node); @@ -151,23 +197,14 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file) return 0; }
-/* - * read the cache state - */ -static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, - size_t buflen, loff_t *pos) +static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen, loff_t *pos) { - struct cachefiles_cache *cache = file->private_data; unsigned long long b_released; unsigned f_released; char buffer[256]; int n;
- //_enter(",,%zu,", buflen); - - if (!test_bit(CACHEFILES_READY, &cache->flags)) - return 0; - /* check how much space the cache has */ cachefiles_has_space(cache, 0, 0);
@@ -205,6 +242,25 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, return n; }
+/* + * read the cache state + */ +static ssize_t cachefiles_daemon_read(struct file *file, + char __user *_buffer, size_t buflen, loff_t *pos) +{ + struct cachefiles_cache *cache = file->private_data; + + //_enter(",,%zu,", buflen); + + if (!test_bit(CACHEFILES_READY, &cache->flags)) + return 0; + + if (cachefiles_in_ondemand_mode(cache)) + return cachefiles_ondemand_daemon_read(cache, _buffer, buflen, pos); + else + return cachefiles_do_daemon_read(cache, _buffer, buflen, pos); +} + /* * command the cache */ @@ -296,8 +352,13 @@ static __poll_t cachefiles_daemon_poll(struct file *file, poll_wait(file, &cache->daemon_pollwq, poll); mask = 0;
- if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) - mask |= EPOLLIN; + if (cachefiles_in_ondemand_mode(cache)) { + if (!radix_tree_empty(&cache->reqs)) + mask |= EPOLLIN; + } else { + if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) + mask |= EPOLLIN; + }
if (test_bit(CACHEFILES_CULLING, &cache->flags)) mask |= EPOLLOUT; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index cf9bd6401c2d..fed71ad90808 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -18,6 +18,8 @@ #include <linux/cred.h> #include <linux/workqueue.h> #include <linux/security.h> +#include <linux/cachefiles.h> +#include <linux/idr.h>
struct cachefiles_cache; struct cachefiles_object; @@ -45,10 +47,15 @@ struct cachefiles_object { uint8_t new; /* T if object new */ spinlock_t work_lock; struct rb_node active_node; /* link in active tree (dentry is key) */ +#ifdef CONFIG_CACHEFILES_ONDEMAND + int ondemand_id; +#endif };
extern struct kmem_cache *cachefiles_object_jar;
+#define CACHEFILES_ONDEMAND_ID_CLOSED -1 + /* * Cache files cache definition */ @@ -84,11 +91,30 @@ struct cachefiles_cache { #define CACHEFILES_DEAD 1 /* T if cache dead */ #define CACHEFILES_CULLING 2 /* T if cull engaged */ #define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */ +#define CACHEFILES_ONDEMAND_MODE 4 /* T if in on-demand read mode */ char *rootdirname; /* name of cache root directory */ char *secctx; /* LSM security context */ char *tag; /* cache binding tag */ + struct radix_tree_root reqs; /* xarray of pending on-demand requests */ + struct idr ondemand_ids; /* xarray for ondemand_id allocation */ + u32 ondemand_id_next; };
+static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache) +{ + return IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) && + test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); +} + +struct cachefiles_req { + struct cachefiles_object *object; + struct completion done; + int error; + struct cachefiles_msg msg; +}; + +#define CACHEFILES_REQ_NEW 0 + /* * backing file read tracking */ @@ -217,6 +243,31 @@ extern int cachefiles_allocate_pages(struct fscache_retrieval *, extern int cachefiles_write_page(struct fscache_storage *, struct page *); extern void cachefiles_uncache_page(struct fscache_object *, struct page *);
+/* + * ondemand.c + */ +#ifdef CONFIG_CACHEFILES_ONDEMAND +extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen, loff_t *pos); + +extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, + char *args); + +extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); + +#else +static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen, loff_t *pos) +{ + return -EOPNOTSUPP; +} + +static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object) +{ + return 0; +} +#endif + /* * security.c */ diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index ecc8ecbbfa5a..22a409669fd0 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -592,6 +592,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, if (ret < 0) goto no_space_error;
+ ret = cachefiles_ondemand_init_object(object); + if (ret < 0) + goto create_error; + path.dentry = dir; ret = security_path_mknod(&path, next, S_IFREG, 0); if (ret < 0) @@ -636,6 +640,12 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, if (!object->new) { _debug("validate '%pd'", next);
+ ret = cachefiles_ondemand_init_object(object); + if (ret < 0) { + object->dentry = NULL; + goto error; + } + ret = cachefiles_check_object_xattr(object, auxdata); if (ret == -ESTALE) { /* delete the object (the deleter drops the directory diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c new file mode 100644 index 000000000000..ff2f00cfa5d4 --- /dev/null +++ b/fs/cachefiles/ondemand.c @@ -0,0 +1,388 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include <linux/fdtable.h> +#include <linux/file.h> +#include <linux/anon_inodes.h> +#include <linux/uio.h> +#include "internal.h" + +static int cachefiles_ondemand_fd_release(struct inode *inode, + struct file *file) +{ + struct cachefiles_object *object = file->private_data; + int object_id = object->ondemand_id; + struct cachefiles_cache *cache; + + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + xa_lock(&cache->ondemand_ids.idr_rt); + idr_remove(&cache->ondemand_ids, object_id); + xa_unlock(&cache->ondemand_ids.idr_rt); + object->fscache.cache->ops->put_object(&object->fscache, + cachefiles_obj_put_ondemand_fd); + return 0; +} + +static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, + struct iov_iter *iter) +{ + struct cachefiles_object *object = kiocb->ki_filp->private_data; + struct cachefiles_cache *cache; + size_t len = iter->count; + loff_t pos = kiocb->ki_pos; + struct path path; + struct file *file; + int ret; + + if (!object->backer) + return -ENOBUFS; + + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + /* write data to the backing filesystem and let it store it in its + * own time */ + path.mnt = cache->mnt; + path.dentry = object->backer; + file = dentry_open(&path, O_RDWR | O_LARGEFILE | O_DIRECT, + cache->cache_cred); + if (IS_ERR(file)) + return -ENOBUFS; + + ret = vfs_iter_write(file, iter, &pos, 0); + fput(file); + if (ret != len) + return -EIO; + return len; +} + +static const struct file_operations cachefiles_ondemand_fd_fops = { + .owner = THIS_MODULE, + .release = cachefiles_ondemand_fd_release, + .write_iter = cachefiles_ondemand_fd_write_iter, +}; + +/* + * OPEN request Completion (copen) + * - command: "copen <id>,<cache_size>" + * <cache_size> indicates the object size if >=0, error code if negative + */ +int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) +{ + struct cachefiles_req *req; + struct fscache_cookie *cookie; + char *pid, *psize; + unsigned long id; + long size; + int ret; + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return -EOPNOTSUPP; + + if (!*args) { + pr_err("Empty id specified\n"); + return -EINVAL; + } + + pid = args; + psize = strchr(args, ','); + if (!psize) { + pr_err("Cache size is not specified\n"); + return -EINVAL; + } + + *psize = 0; + psize++; + + ret = kstrtoul(pid, 0, &id); + if (ret) + return ret; + + xa_lock(&cache->reqs); + req = radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs); + if (!req) + return -EINVAL; + + /* fail OPEN request if copen format is invalid */ + ret = kstrtol(psize, 0, &size); + if (ret) { + req->error = ret; + goto out; + } + + /* fail OPEN request if daemon reports an error */ + if (size < 0) { + if (!IS_ERR_VALUE(size)) + size = -EINVAL; + req->error = size; + goto out; + } + + cookie = req->object->fscache.cookie; + fscache_set_store_limit(&req->object->fscache, size); + if (size) + clear_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags); + else + 
set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags); + +out: + complete(&req->done); + return ret; +} + +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) +{ + struct cachefiles_object *object = req->object; + struct cachefiles_cache *cache; + struct cachefiles_open *load; + struct file *file; + u32 object_id; + int ret, fd; + + ret = object->fscache.cache->ops->grab_object(&object->fscache, + cachefiles_obj_get_ondemand_fd) ? 0 : -EAGAIN; + if (ret) + return ret; + + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + idr_preload(GFP_KERNEL); + xa_lock(&cache->ondemand_ids.idr_rt); + ret = idr_alloc_cyclic(&cache->ondemand_ids, NULL, + 1, INT_MAX, GFP_ATOMIC); + xa_unlock(&cache->ondemand_ids.idr_rt); + idr_preload_end(); + if (ret < 0) + goto err; + object_id = ret; + + fd = get_unused_fd_flags(O_WRONLY); + if (fd < 0) { + ret = fd; + goto err_free_id; + } + + file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops, + object, O_WRONLY); + if (IS_ERR(file)) { + ret = PTR_ERR(file); + goto err_put_fd; + } + + file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; + fd_install(fd, file); + + load = (void *)req->msg.data; + load->fd = fd; + req->msg.object_id = object_id; + object->ondemand_id = object_id; + return 0; + +err_put_fd: + put_unused_fd(fd); +err_free_id: + xa_lock(&cache->ondemand_ids.idr_rt); + idr_remove(&cache->ondemand_ids, object_id); + xa_unlock(&cache->ondemand_ids.idr_rt); +err: + object->fscache.cache->ops->put_object(&object->fscache, + cachefiles_obj_put_ondemand_fd); + return ret; +} + +ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, + char __user *_buffer, size_t buflen, loff_t *pos) +{ + struct cachefiles_req *req; + struct cachefiles_msg *msg; + unsigned long id = 0; + size_t n; + int ret = 0; + struct radix_tree_iter iter; + void **slot; + + /* + * Search for a request that has not ever been processed, to prevent + * requests from being processed repeatedly. 
+ */ + xa_lock(&cache->reqs); + radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, + CACHEFILES_REQ_NEW) { + req = radix_tree_deref_slot_protected(slot, + &cache->reqs.xa_lock); + + msg = &req->msg; + n = msg->len; + + if (n > buflen) { + xa_unlock(&cache->reqs); + return -EMSGSIZE; + } + + radix_tree_iter_tag_clear(&cache->reqs, &iter, + CACHEFILES_REQ_NEW); + xa_unlock(&cache->reqs); + + id = iter.index; + msg->msg_id = id; + + if (msg->opcode == CACHEFILES_OP_OPEN) { + ret = cachefiles_ondemand_get_fd(req); + if (ret) + goto error; + } + + if (copy_to_user(_buffer, msg, n) != 0) { + ret = -EFAULT; + goto err_put_fd; + } + return n; + } + xa_unlock(&cache->reqs); + return 0; + +err_put_fd: + if (msg->opcode == CACHEFILES_OP_OPEN) + __close_fd(current->files, + ((struct cachefiles_open *)msg->data)->fd); +error: + xa_lock(&cache->reqs); + radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs); + req->error = ret; + complete(&req->done); + return ret; +} + +typedef int (*init_req_fn)(struct cachefiles_req *req, void *private); + +static int cachefiles_ondemand_send_req(struct cachefiles_object *object, + enum cachefiles_opcode opcode, + size_t data_len, + init_req_fn init_req, + void *private) +{ + static atomic64_t global_index = ATOMIC64_INIT(0); + struct cachefiles_cache *cache; + struct cachefiles_req *req; + long id; + int ret; + + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return 0; + + if (test_bit(CACHEFILES_DEAD, &cache->flags)) + return -EIO; + + req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->object = object; + init_completion(&req->done); + req->msg.opcode = opcode; + req->msg.len = sizeof(struct cachefiles_msg) + data_len; + + ret = init_req(req, private); + if (ret) + goto out; + + /* + * Stop enqueuing the request when daemon is dying. The + * following two operations need to be atomic as a whole. + * 1) check cache state, and + * 2) enqueue request if cache is alive. + * Otherwise the request may be enqueued after xarray has been + * flushed, leaving the orphan request never being completed. + * + * CPU 1 CPU 2 + * ===== ===== + * test CACHEFILES_DEAD bit + * set CACHEFILES_DEAD bit + * flush requests in the xarray + * enqueue the request + */ + xa_lock(&cache->reqs); + + if (test_bit(CACHEFILES_DEAD, &cache->flags)) { + xa_unlock(&cache->reqs); + ret = -EIO; + goto out; + } + + /* coupled with the barrier in cachefiles_flush_reqs() */ + smp_mb(); + + while (radix_tree_insert(&cache->reqs, + id = atomic64_read(&global_index), req)) + atomic64_inc(&global_index); + + radix_tree_tag_set(&cache->reqs, id, CACHEFILES_REQ_NEW); + xa_unlock(&cache->reqs); + + wake_up_all(&cache->daemon_pollwq); + wait_for_completion(&req->done); + ret = req->error; +out: + kfree(req); + return ret; +} + +static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + struct fscache_cookie *cookie = object->fscache.cookie; + struct fscache_cookie *volume; + struct cachefiles_open *load = (void *)req->msg.data; + size_t volume_key_size, cookie_key_size; + char *cookie_key, *volume_key; + + /* Cookie key is binary data, which is netfs specific. 
*/ + cookie_key_size = cookie->key_len; + if (cookie->key_len <= sizeof(cookie->inline_key)) + cookie_key = cookie->inline_key; + else + cookie_key = cookie->key; + + volume = object->fscache.parent->cookie; + volume_key_size = volume->key_len + 1; + if (volume_key_size <= sizeof(cookie->inline_key)) + volume_key = volume->inline_key; + else + volume_key = volume->key; + + load->volume_key_size = volume_key_size; + load->cookie_key_size = cookie_key_size; + memcpy(load->data, volume_key, volume->key_len); + load->data[volume_key_size - 1] = '\0'; + memcpy(load->data + volume_key_size, cookie_key, cookie_key_size); + return 0; +} + +int cachefiles_ondemand_init_object(struct cachefiles_object *object) +{ + struct fscache_cookie *cookie = object->fscache.cookie; + size_t volume_key_size, cookie_key_size, data_len; + + /* + * CacheFiles will firstly check the cache file under the root cache + * directory. If the coherency check failed, it will fallback to + * creating a new tmpfile as the cache file. Reuse the previously + * allocated object ID if any. + */ + if (object->ondemand_id > 0 || object->type == FSCACHE_COOKIE_TYPE_INDEX) + return 0; + + volume_key_size = object->fscache.parent->cookie->key_len + 1; + cookie_key_size = cookie->key_len; + data_len = sizeof(struct cachefiles_open) + volume_key_size + cookie_key_size; + + return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN, + data_len, cachefiles_ondemand_init_open_req, NULL); +} diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index 5d9de24cb9c0..d09e369e9d1e 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -21,6 +21,8 @@ enum cachefiles_obj_ref_trace { cachefiles_obj_put_wait_retry = fscache_obj_ref__nr_traces, cachefiles_obj_put_wait_timeo, + cachefiles_obj_get_ondemand_fd, + cachefiles_obj_put_ondemand_fd, cachefiles_obj_ref__nr_traces };
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h new file mode 100644 index 000000000000..78caa73e5343 --- /dev/null +++ b/include/uapi/linux/cachefiles.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _LINUX_CACHEFILES_H +#define _LINUX_CACHEFILES_H + +#include <linux/types.h> +#include <linux/ioctl.h> + +/* + * Fscache ensures that the maximum length of cookie key is 255. The volume key + * is controlled by netfs, and generally no bigger than 255. + */ +#define CACHEFILES_MSG_MAX_SIZE 1024 + +enum cachefiles_opcode { + CACHEFILES_OP_OPEN, + CACHEFILES_OP_CLOSE, + CACHEFILES_OP_READ, +}; + +/* + * Message Header + * + * @msg_id a unique ID identifying this message + * @opcode message type, CACHEFILE_OP_* + * @len message length, including message header and following data + * @object_id a unique ID identifying a cache file + * @data message type specific payload + */ +struct cachefiles_msg { + __u32 msg_id; + __u32 opcode; + __u32 len; + __u32 object_id; + __u8 data[]; +}; + +/* + * @data contains the volume_key followed directly by the cookie_key. volume_key + * is a NUL-terminated string; @volume_key_size indicates the size of the volume + * key in bytes. cookie_key is binary data, which is netfs specific; + * @cookie_key_size indicates the size of the cookie key in bytes. + * + * @fd identifies an anon_fd referring to the cache file. + */ +struct cachefiles_open { + __u32 volume_key_size; + __u32 cookie_key_size; + __u32 fd; + __u32 flags; + __u8 data[]; +}; + +/* + * @off indicates the starting offset of the requested file range + * @len indicates the length of the requested file range + */ +struct cachefiles_read { + __u64 off; + __u64 len; +}; + +/* + * Reply for READ request + * @arg for this ioctl is the @id field of READ request. + */ +#define CACHEFILES_IOC_READ_COMPLETE _IOW(0x98, 1, int) + +#endif diff --git a/lib/radix-tree.c b/lib/radix-tree.c index cbc691525236..28145bdf6f3f 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1059,6 +1059,7 @@ void radix_tree_iter_tag_clear(struct radix_tree_root *root, { node_tag_clear(root, iter->node, tag, iter_offset(iter)); } +EXPORT_SYMBOL(radix_tree_iter_tag_clear);
/** * radix_tree_tag_get - get a tag on a radix tree node
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/419c7d84a2d2
--------------------------------
ANBZ: #1666
commit d11b0b043b4008d64abaf1a26eea3dbcd906ee59 upstream.
Add a refcount to avoid the deadlock in on-demand read mode. The on-demand read mode will pin the corresponding cachefiles object for each anonymous fd. The cachefiles object is unpinned when the anonymous fd gets closed. When the user daemon exits and the fd of the "/dev/cachefiles" device node gets closed, it will wait for all cachefiles objects to be withdrawn. Then if any anonymous fd gets closed after the fd of the device node, the user daemon will hang forever, waiting for all objects to be withdrawn.
To fix this, add a refcount indicating if there's any object pinned by anonymous fds. The cachefiles cache gets unbound and withdrawn when the refcount is decreased to 0. It won't change the behaviour of the original mode, in which the cachefiles cache gets unbound and withdrawn as soon as the fd of the device node gets closed.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220509074028.74954-4-jefflexu@linux.alibaba.com Acked-by: David Howells dhowells@redhat.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 23 +++++++++++++++++------ fs/cachefiles/internal.h | 3 +++ fs/cachefiles/ondemand.c | 3 +++ 3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 1c85c8dcc0c4..1a3a9fda60c8 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -108,6 +108,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) cache->active_nodes = RB_ROOT; rwlock_init(&cache->active_lock); init_waitqueue_head(&cache->daemon_pollwq); + refcount_set(&cache->unbind_pincount, 1);
INIT_RADIX_TREE(&cache->reqs, GFP_ATOMIC); idr_init(&cache->ondemand_ids); @@ -167,6 +168,21 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache) xa_unlock(&cache->ondemand_ids.idr_rt); }
+void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache) +{ + if (refcount_dec_and_test(&cache->unbind_pincount)) { + cachefiles_daemon_unbind(cache); + ASSERT(!cache->active_nodes.rb_node); + cachefiles_open = 0; + kfree(cache); + } +} + +void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache) +{ + refcount_inc(&cache->unbind_pincount); +} + /* * release a cache */ @@ -182,17 +198,12 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
if (cachefiles_in_ondemand_mode(cache)) cachefiles_flush_reqs(cache); - cachefiles_daemon_unbind(cache); - - ASSERT(!cache->active_nodes.rb_node);
/* clean up the control file interface */ cache->cachefilesd = NULL; file->private_data = NULL; - cachefiles_open = 0; - - kfree(cache);
+ cachefiles_put_unbind_pincount(cache); _leave(""); return 0; } diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index fed71ad90808..52188b42081f 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -95,6 +95,7 @@ struct cachefiles_cache { char *rootdirname; /* name of cache root directory */ char *secctx; /* LSM security context */ char *tag; /* cache binding tag */ + refcount_t unbind_pincount;/* refcount to do daemon unbind */ struct radix_tree_root reqs; /* xarray of pending on-demand requests */ struct idr ondemand_ids; /* xarray for ondemand_id allocation */ u32 ondemand_id_next; @@ -167,6 +168,8 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache); * daemon.c */ extern const struct file_operations cachefiles_daemon_fops; +extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache); +extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);
extern int cachefiles_has_space(struct cachefiles_cache *cache, unsigned fnr, unsigned bnr); diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index ff2f00cfa5d4..d134f22e3818 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -21,6 +21,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, xa_unlock(&cache->ondemand_ids.idr_rt); object->fscache.cache->ops->put_object(&object->fscache, cachefiles_obj_put_ondemand_fd); + cachefiles_put_unbind_pincount(cache); return 0; }
@@ -178,6 +179,8 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) load->fd = fd; req->msg.object_id = object_id; object->ondemand_id = object_id; + + cachefiles_get_unbind_pincount(cache); return 0;
err_put_fd:
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d665b925541c
--------------------------------
ANBZ: #1666
commit 324b954ac80cff0d11ddb6bde9b6631e45e98620 upstream.
Notify the user daemon that the cookie is going to be withdrawn, providing a hint that the associated anonymous fd can be closed.
Note that this is only a hint. The user daemon may close the associated anonymous fd when receiving the CLOSE request, in which case it will receive another anonymous fd the next time the cookie gets looked up. Or it may ignore the CLOSE request and keep writing data through the anonymous fd; even so, the next time the cookie gets looked up, the user daemon will still receive a new anonymous fd.
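A minimal sketch of how a daemon might honour the hint, assuming lookup_anon_fd() is a hypothetical map from object_id to the fd saved at OPEN time:

#include <unistd.h>
#include <linux/cachefiles.h>

extern int lookup_anon_fd(unsigned int object_id); /* hypothetical */

static void handle_close(struct cachefiles_msg *msg)
{
	/* CLOSE has no reply; closing the anon fd here is optional */
	int fd = lookup_anon_fd(msg->object_id);

	if (fd >= 0)
		close(fd);
}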
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Acked-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20220425122143.56815-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 2 ++ fs/cachefiles/internal.h | 5 +++++ fs/cachefiles/ondemand.c | 40 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 4cea5fbf695e..99f50bc59b3c 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -280,6 +280,8 @@ static void cachefiles_drop_object(struct fscache_object *_object) ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000); #endif
+ cachefiles_ondemand_clean_object(object); + /* We need to tidy the object up if we did in fact manage to open it. * It's possible for us to get here before the object is fully * initialised if the parent goes away or the object gets retired diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 52188b42081f..c09a405ef42b 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -257,6 +257,7 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args);
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); +extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
#else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, @@ -269,6 +270,10 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje { return 0; } + +static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object) +{ +} #endif
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index d134f22e3818..26bdcbba92c5 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -241,6 +241,14 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, ret = -EFAULT; goto err_put_fd; } + + /* CLOSE request has no reply */ + if (msg->opcode == CACHEFILES_OP_CLOSE) { + xa_lock(&cache->reqs); + radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs); + complete(&req->done); + } return n; } xa_unlock(&cache->reqs); @@ -321,6 +329,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, /* coupled with the barrier in cachefiles_flush_reqs() */ smp_mb();
+ if (opcode != CACHEFILES_OP_OPEN && object->ondemand_id <= 0) { + WARN_ON_ONCE(object->ondemand_id == 0); + xa_unlock(&cache->reqs); + ret = -EIO; + goto out; + } + while (radix_tree_insert(&cache->reqs, id = atomic64_read(&global_index), req)) atomic64_inc(&global_index); @@ -368,6 +383,25 @@ static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req, return 0; }
+static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + int object_id = object->ondemand_id; + + /* + * It's possible that object id is still 0 if the cookie looking up + * phase failed before OPEN request has ever been sent. Also avoid + * sending CLOSE request for CACHEFILES_ONDEMAND_ID_CLOSED, which means + * anon_fd has already been closed. + */ + if (object_id <= 0) + return -ENOENT; + + req->msg.object_id = object_id; + return 0; +} + int cachefiles_ondemand_init_object(struct cachefiles_object *object) { struct fscache_cookie *cookie = object->fscache.cookie; @@ -389,3 +423,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object) return cachefiles_ondemand_send_req(object, CACHEFILES_OP_OPEN, data_len, cachefiles_ondemand_init_open_req, NULL); } + +void cachefiles_ondemand_clean_object(struct cachefiles_object *object) +{ + cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, + cachefiles_ondemand_init_close_req, NULL); +}
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/04adba02e0fb
--------------------------------
ANBZ: #1666
commit 9032b6e8589f269743984aac53e82e4835be16dc upstream.
Implement the data plane of on-demand read mode.
The early implementation [1] placed the entry to cachefiles_ondemand_read() in fscache_read(). However, fscache_read() can only detect whether the requested file range is fully a cache miss, whilst we need to notify the user daemon whenever there's a hole inside the requested file range.
Thus the entry is now placed in cachefiles_prepare_read(). When working in on-demand read mode, once a hole is detected, the read routine will send a READ request to the user daemon. The user daemon needs to fetch the data and write it to the cache file. After sending the READ request, the read routine will hang there until the READ request is handled by the user daemon. Then it will retry reading from the same file range. If no progress is made, the read routine fails.
A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that an on-demand read should be done when a cache miss is encountered.
[1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.co... #v8
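For completeness, here is a minimal sketch of the daemon side of a READ request, assuming anon_fd was saved from the earlier OPEN request for this object and that fetch_range() is a hypothetical helper that fills the buffer from the remote source:

#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/cachefiles.h>

extern void fetch_range(char *buf, unsigned long long off,
			unsigned long long len); /* hypothetical */

static void handle_read(int anon_fd, struct cachefiles_msg *msg)
{
	struct cachefiles_read *load = (struct cachefiles_read *)msg->data;
	char *buf = malloc(load->len);

	if (!buf)
		return;
	fetch_range(buf, load->off, load->len);
	/* fill the hole in the cache file through the anonymous fd */
	pwrite(anon_fd, buf, load->len, load->off);
	free(buf);

	/* tell the kernel the range is ready so the read routine retries */
	ioctl(anon_fd, CACHEFILES_IOC_READ_COMPLETE, msg->msg_id);
}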
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Acked-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 8 +++- fs/cachefiles/ondemand.c | 86 +++++++++++++++++++++++++++++++++++ fs/cachefiles/rdwr.c | 21 +++++++-- fs/fscache/page.c | 4 +- include/linux/fscache-cache.h | 1 + include/linux/fscache.h | 18 +++++++- 6 files changed, 129 insertions(+), 9 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index c09a405ef42b..7440c3e4fd14 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -258,7 +258,8 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); - +extern int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len); #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) @@ -274,6 +275,11 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object) { } +static inline int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len) +{ + return -EOPNOTSUPP; +} #endif
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 26bdcbba92c5..adc257b80b8e 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -11,11 +11,32 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct cachefiles_object *object = file->private_data; int object_id = object->ondemand_id; struct cachefiles_cache *cache; + void **slot; + struct radix_tree_iter iter; + struct cachefiles_req *req;
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+ xa_lock(&cache->reqs); object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + /* + * Flush all pending READ requests since their completion depends on + * anon_fd. + */ + radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { + req = radix_tree_deref_slot_protected(slot, + &cache->reqs.xa_lock); + BUG_ON(!req); + + if (req->msg.opcode == CACHEFILES_OP_READ) { + req->error = -EIO; + complete(&req->done); + radix_tree_iter_delete(&cache->reqs, &iter, slot); + } + } + xa_unlock(&cache->reqs); + xa_lock(&cache->ondemand_ids.idr_rt); idr_remove(&cache->ondemand_ids, object_id); xa_unlock(&cache->ondemand_ids.idr_rt); @@ -58,10 +79,39 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, return len; }
+static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl, + unsigned long arg) +{ + struct cachefiles_object *object = filp->private_data; + struct cachefiles_cache *cache; + struct cachefiles_req *req; + unsigned long id; + + if (ioctl != CACHEFILES_IOC_READ_COMPLETE) + return -EINVAL; + + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return -EOPNOTSUPP; + + id = arg; + xa_lock(&cache->reqs); + req = radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs); + if (!req) + return -EINVAL; + + complete(&req->done); + return 0; +} + static const struct file_operations cachefiles_ondemand_fd_fops = { .owner = THIS_MODULE, .release = cachefiles_ondemand_fd_release, .write_iter = cachefiles_ondemand_fd_write_iter, + .unlocked_ioctl = cachefiles_ondemand_fd_ioctl, };
/* @@ -402,6 +452,32 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, return 0; }
+struct cachefiles_read_ctx { + loff_t off; + size_t len; +}; + +static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, + void *private) +{ + struct cachefiles_object *object = req->object; + struct cachefiles_read *load = (void *)req->msg.data; + struct cachefiles_read_ctx *read_ctx = private; + int object_id = object->ondemand_id; + + /* Stop enqueuing requests when daemon has closed anon_fd. */ + if (object_id <= 0) { + WARN_ON_ONCE(object_id == 0); + pr_info_once("READ: anonymous fd closed prematurely.\n"); + return -EIO; + } + + req->msg.object_id = object_id; + load->off = read_ctx->off; + load->len = read_ctx->len; + return 0; +} + int cachefiles_ondemand_init_object(struct cachefiles_object *object) { struct fscache_cookie *cookie = object->fscache.cookie; @@ -429,3 +505,13 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object) cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, cachefiles_ondemand_init_close_req, NULL); } + +int cachefiles_ondemand_read(struct cachefiles_object *object, + loff_t pos, size_t len) +{ + struct cachefiles_read_ctx read_ctx = {pos, len}; + + return cachefiles_ondemand_send_req(object, CACHEFILES_OP_READ, + sizeof(struct cachefiles_read), + cachefiles_ondemand_init_read_req, &read_ctx); +} diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 8ffc40e84a59..7cfbbeee9e87 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -233,12 +233,13 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, struct cachefiles_one_read *monitor; struct address_space *bmapping; struct page *newpage, *backpage; + pgoff_t index = op->offset >> PAGE_SHIFT; int ret;
_enter("");
_debug("read back %p{%lu,%d}", - netpage, netpage->index, page_count(netpage)); + netpage, index, page_count(netpage));
monitor = kzalloc(sizeof(*monitor), cachefiles_gfp); if (!monitor) @@ -254,7 +255,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, newpage = NULL;
for (;;) { - backpage = find_get_page(bmapping, netpage->index); + backpage = find_get_page(bmapping, index); if (backpage) goto backing_page_already_present;
@@ -265,7 +266,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, }
ret = add_to_page_cache_lru(newpage, bmapping, - netpage->index, cachefiles_gfp); + index, cachefiles_gfp); if (ret == 0) goto installed_new_backing_page; if (ret != -EEXIST) @@ -399,6 +400,8 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, sector_t block; unsigned shift; int ret, ret2; + bool again = true; + loff_t pos = op->offset;
object = container_of(op->op.object, struct cachefiles_object, fscache); @@ -426,14 +429,15 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, * enough for this as it doesn't indicate errors, but it's all we've * got for the moment */ - block = page->index; +retry: + block = pos >> PAGE_SHIFT; block <<= shift;
ret2 = bmap(inode, &block); ASSERT(ret2 == 0);
_debug("%llx -> %llx", - (unsigned long long) (page->index << shift), + (unsigned long long) (pos >> PAGE_SHIFT << shift), (unsigned long long) block);
if (block) { @@ -441,6 +445,13 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, * read from disk */ ret = cachefiles_read_backing_file_one(object, op, page); } else if (cachefiles_has_space(cache, 0, 1) == 0) { + if (cachefiles_in_ondemand_mode(cache) && again) { + ret = cachefiles_ondemand_read(object, pos, PAGE_SIZE); + if (!ret) { + again = false; + goto retry; + } + } /* there's space in the cache we can use */ fscache_mark_page_cached(op, page); fscache_retrieval_complete(op, 1); diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 26af6fdf1538..888ace2cc6e1 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -430,7 +430,7 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, struct page *page, fscache_rw_complete_t end_io_func, void *context, - gfp_t gfp) + gfp_t gfp, loff_t pos) { struct fscache_retrieval *op; struct fscache_object *object; @@ -493,6 +493,8 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, if (ret < 0) goto error;
+ op->offset = pos; + /* ask the cache to honour the operation */ if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) { fscache_stat(&fscache_n_cop_allocate_page); diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 3f0b19dcfae7..71ee23f78f1d 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -149,6 +149,7 @@ struct fscache_retrieval { struct list_head to_do; /* list of things to be done by the backend */ unsigned long start_time; /* time at which retrieval started */ atomic_t n_pages; /* number of pages to be retrieved */ + loff_t offset; };
typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op, diff --git a/include/linux/fscache.h b/include/linux/fscache.h index a1c928fe98e7..ce51b915ad43 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -204,7 +204,7 @@ extern int __fscache_read_or_alloc_page(struct fscache_cookie *, struct page *, fscache_rw_complete_t, void *, - gfp_t); + gfp_t, loff_t); extern int __fscache_read_or_alloc_pages(struct fscache_cookie *, struct address_space *, struct list_head *, @@ -545,7 +545,21 @@ int fscache_read_or_alloc_page(struct fscache_cookie *cookie, { if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie)) return __fscache_read_or_alloc_page(cookie, page, end_io_func, - context, gfp); + context, gfp, page_offset(page)); + else + return -ENOBUFS; +} + +static inline +int fscache_read_or_alloc_page2(struct fscache_cookie *cookie, + struct page *page, + fscache_rw_complete_t end_io_func, + void *context, + gfp_t gfp, loff_t pos) +{ + if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie)) + return __fscache_read_or_alloc_page(cookie, page, end_io_func, + context, gfp, pos); else return -ENOBUFS; }
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/301da8953886
--------------------------------
ANBZ: #1666
commit 4e4f1788af0e477bca079e5b1ffc42846b3bafee upstream.
Enable on-demand read mode by adding an optional parameter to the "bind" command.
On-demand mode will be turned on when this parameter is "ondemand", i.e. "bind ondemand". Otherwise cachefiles will work in the original mode.
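As a usage note, the following is a minimal sketch of bringing up a cache in on-demand mode; the directory and tag values are illustrative and error handling is elided:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int cachefiles_setup_ondemand(void)
{
	int devfd = open("/dev/cachefiles", O_RDWR);
	const char *cmds[] = {
		"dir /var/cache/fscache",
		"tag mycache",
		"bind ondemand",	/* plain "bind" keeps the original mode */
	};

	for (unsigned int i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
		write(devfd, cmds[i], strlen(cmds[i]));

	return devfd;	/* poll for EPOLLIN, then read requests from it */
}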
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220509074028.74954-7-jefflexu@linux.alibaba.com Acked-by: David Howells dhowells@redhat.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/bind.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c index 3b39552c2365..c149d5037cc1 100644 --- a/fs/cachefiles/bind.c +++ b/fs/cachefiles/bind.c @@ -46,11 +46,6 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args) cache->bcull_percent < cache->brun_percent && cache->brun_percent < 100);
- if (*args) { - pr_err("'bind' command doesn't take an argument\n"); - return -EINVAL; - } - if (!cache->rootdirname) { pr_err("No cache directory specified\n"); return -EINVAL; @@ -62,6 +57,18 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args) return -EBUSY; }
+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND)) { + if (!strcmp(args, "ondemand")) { + set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); + } else if (*args) { + pr_err("Invalid argument to the 'bind' command\n"); + return -EINVAL; + } + } else if (*args) { + pr_err("'bind' command doesn't take an argument\n"); + return -EINVAL; + } + /* make sure we have copies of the tag and dirname strings */ if (!cache->tag) { /* the tag string is released by the fops->release()
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/a2ad84f77271
--------------------------------
ANBZ: #1666
object->fscache.parent->n_children will be decreased and then object->fscache.parent will be set to NULL in fscache_relinquish_cookie(). Thus it is reasonable that object->fscache.parent->n_children shall not be zero when object->fscache.parent is non-NULL.
While in on-demand mode, each anon_fd also holds one reference to the corresponding object. The anon_fd can be closed asynchronously with fscache_relinquish_cookie(), in which case the following race can be triggered.
user daemon               erofs umount              cookie state machine worker
===========               ============              ===========================
close(anon_fd)            umount
                          fscache_relinquish_cookie
cachefiles_ondemand_fd_release
cachefiles_put_object
                                                    fscache_drop_object
# check object->fscache.parent
                                                    # decrease object->parent->n_children
                                                    # object->parent = NULL
# check object->fscache.parent->n_children
To fix this, simply skip the ASSERT in cachefiles_put_object() in on-demand mode.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 99f50bc59b3c..80a241638452 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -339,8 +339,11 @@ static void cachefiles_put_object(struct fscache_object *_object, ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000); #endif
- ASSERTIFCMP(object->fscache.parent, - object->fscache.parent->n_children, >, 0); + if (!cachefiles_in_ondemand_mode(container_of(object->fscache.cache, + struct cachefiles_cache, cache))) { + ASSERTIFCMP(object->fscache.parent, + object->fscache.parent->n_children, >, 0); + }
u = atomic_dec_return(&object->usage); trace_cachefiles_ref(object, _object->cookie,
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/7bcc5ad5e2ed
--------------------------------
ANBZ: #1722
commit 65aa5f6fd8a12e0a343aaf1815949a79a49e3f35 upstream.
When an anonymous fd is released, only flush the requests associated with it, rather than all requests in the xarray.
Fixes: 9032b6e8589f ("cachefiles: implement on-demand read") Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-June/006937.html Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/596 Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index adc257b80b8e..ad3330f9e3d1 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -29,7 +29,8 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, &cache->reqs.xa_lock); BUG_ON(!req);
- if (req->msg.opcode == CACHEFILES_OP_READ) { + if (req->msg.object_id == object_id && + req->msg.opcode == CACHEFILES_OP_READ) { req->error = -EIO; complete(&req->done); radix_tree_iter_delete(&cache->reqs, &iter, slot);
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/2d8af2af6a59
--------------------------------
ANBZ: #3211
BUG_ON() is extremely user-unfriendly. Keep moving forward if things aren't that bad.
Fixes: 8fc28945e193 ("cachefiles: notify the user daemon when looking up cookie") Fixes: e0f54fb64c0a ("cachefiles: implement on-demand read") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 3 ++- fs/cachefiles/ondemand.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 1a3a9fda60c8..b531373400d7 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -156,7 +156,8 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache) radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); - BUG_ON(!req); + if (WARN_ON(!req)) + continue; radix_tree_delete(&cache->reqs, iter.index); req->error = -EIO; complete(&req->done); diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index ad3330f9e3d1..ee0d283ac863 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -27,8 +27,8 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); - BUG_ON(!req); - + if (WARN_ON(!req)) + continue; if (req->msg.object_id == object_id && req->msg.opcode == CACHEFILES_OP_READ) { req->error = -EIO;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/9342646eeaf7
--------------------------------
ANBZ: #3211
Prior to the fscache refactoring, the volume key in fscache_volume is a string with a trailing NUL; after the refactoring, the volume key in fscache_cookie is actually a string without a trailing NUL.
Thus the current volume key setup for cachefiles_open may cause an oops, since it attempts to access the volume key at a bad address. This can be reproduced by specifying "fsid" with 10 characters, e.g. "-o fsid=abigdomain".
Fix this by determining whether the volume key is stored in volume->key or volume->inline_key by checking volume->key_len, rather than volume_key_size (which is actually volume->key_len plus 1).
Reported-by: Jia Zhu zhujia.zj@bytedance.com Fixes: 8fc28945e193 ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 14 ++++++++++---- include/uapi/linux/cachefiles.h | 5 +++-- 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index ee0d283ac863..0ea22ae2f96a 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -407,21 +407,27 @@ static int cachefiles_ondemand_init_open_req(struct cachefiles_req *req, { struct cachefiles_object *object = req->object; struct fscache_cookie *cookie = object->fscache.cookie; - struct fscache_cookie *volume; + struct fscache_cookie *volume = object->fscache.parent->cookie; struct cachefiles_open *load = (void *)req->msg.data; size_t volume_key_size, cookie_key_size; char *cookie_key, *volume_key;
- /* Cookie key is binary data, which is netfs specific. */ + /* + * cookie_key is a string without trailing '\0', while cachefiles_open + * expects cookie key a string without trailing '\0'. + */ cookie_key_size = cookie->key_len; if (cookie->key_len <= sizeof(cookie->inline_key)) cookie_key = cookie->inline_key; else cookie_key = cookie->key;
- volume = object->fscache.parent->cookie; + /* + * volume_key is a string without trailing '\0', while cachefiles_open + * expects volume key a string with trailing '\0'. + */ volume_key_size = volume->key_len + 1; - if (volume_key_size <= sizeof(cookie->inline_key)) + if (volume->key_len <= sizeof(volume->inline_key)) volume_key = volume->inline_key; else volume_key = volume->key; diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h index 78caa73e5343..b6746a2fe57c 100644 --- a/include/uapi/linux/cachefiles.h +++ b/include/uapi/linux/cachefiles.h @@ -37,8 +37,9 @@ struct cachefiles_msg { /* * @data contains the volume_key followed directly by the cookie_key. volume_key * is a NUL-terminated string; @volume_key_size indicates the size of the volume - * key in bytes. cookie_key is binary data, which is netfs specific; - * @cookie_key_size indicates the size of the cookie key in bytes. + * key in bytes (with trailing NUL). cookie_key is a string without trailing + * NUL; @cookie_key_size indicates the size of the cookie key in bytes (without + * trailing NUL). * * @fd identifies an anon_fd referring to the cache file. */
From: Sun Ke sunke32@huawei.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/bccd7771562b
--------------------------------
ANBZ: #3211
commit c93ccd63b18c8d108c57b2bb0e5f3b058b9d2029 upstream.
The cache_size field of copen is specified by the user daemon. If cache_size < 0, then the OPEN request is expected to fail, while copen itself shall succeed. However, copen currently succeeds and returns 0 even when cache_size is an invalid error code, which is unexpected.
Fix this by returning an error from copen when cache_size is an invalid error code.
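As an illustration of the fixed semantics, a sketch of how a daemon would fail an OPEN request (the errno value is only an example):

#include <errno.h>
#include <stdio.h>

static void fail_open(int devfd, unsigned int msg_id)
{
	/* A valid negative errno: copen itself succeeds, while the OPEN
	 * request fails with -ENOENT. An out-of-range value such as
	 * -100000 would now make the copen write itself fail with -EINVAL. */
	dprintf(devfd, "copen %u,%d", msg_id, -ENOENT);
}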
Changes
=======
v4: update the code suggested by Dan
v3: update the commit log suggested by Jingbo
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Sun Ke sunke32@huawei.com Suggested-by: Jeffle Xu jefflexu@linux.alibaba.com Suggested-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/20220818111935.1683062-1-sunke32@huawei.com/ # v2 Link: https://lore.kernel.org/r/20220818125038.2247720-1-sunke32@huawei.com/ # v3 Link: https://lore.kernel.org/r/20220826023515.3437469-1-sunke32@huawei.com/ # v4 Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 0ea22ae2f96a..3095519841ef 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -166,9 +166,13 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
/* fail OPEN request if daemon reports an error */ if (size < 0) { - if (!IS_ERR_VALUE(size)) - size = -EINVAL; - req->error = size; + if (!IS_ERR_VALUE(size)) { + req->error = -EINVAL; + ret = -EINVAL; + } else { + req->error = size; + ret = 0; + } goto out; }
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/732d3a3bc657
--------------------------------
ANBZ: #3211
Refactor the code arrangement of cachefiles_ondemand_daemon_read() a bit to make it more consistent with that of upstream.
This is in preparation for the following fix, which makes on-demand request distribution fairer.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 68 +++++++++++++++++++++------------------- 1 file changed, 36 insertions(+), 32 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 3095519841ef..68b8c2a4e6a8 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -253,7 +253,7 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) { - struct cachefiles_req *req; + struct cachefiles_req *req = NULL; struct cachefiles_msg *msg; unsigned long id = 0; size_t n; @@ -268,46 +268,50 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, xa_lock(&cache->reqs); radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, CACHEFILES_REQ_NEW) { - req = radix_tree_deref_slot_protected(slot, - &cache->reqs.xa_lock); + req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); + WARN_ON(!req); + break; + }
- msg = &req->msg; - n = msg->len; + /* no request tagged with CACHEFILES_REQ_NEW found */ + if (!req) { + xa_unlock(&cache->reqs); + return 0; + }
- if (n > buflen) { - xa_unlock(&cache->reqs); - return -EMSGSIZE; - } + msg = &req->msg; + n = msg->len;
- radix_tree_iter_tag_clear(&cache->reqs, &iter, - CACHEFILES_REQ_NEW); + if (n > buflen) { xa_unlock(&cache->reqs); + return -EMSGSIZE; + }
- id = iter.index; - msg->msg_id = id; + radix_tree_iter_tag_clear(&cache->reqs, &iter, CACHEFILES_REQ_NEW); + xa_unlock(&cache->reqs);
- if (msg->opcode == CACHEFILES_OP_OPEN) { - ret = cachefiles_ondemand_get_fd(req); - if (ret) - goto error; - } + id = iter.index; + msg->msg_id = id;
- if (copy_to_user(_buffer, msg, n) != 0) { - ret = -EFAULT; - goto err_put_fd; - } + if (msg->opcode == CACHEFILES_OP_OPEN) { + ret = cachefiles_ondemand_get_fd(req); + if (ret) + goto error; + }
- /* CLOSE request has no reply */ - if (msg->opcode == CACHEFILES_OP_CLOSE) { - xa_lock(&cache->reqs); - radix_tree_delete(&cache->reqs, id); - xa_unlock(&cache->reqs); - complete(&req->done); - } - return n; + if (copy_to_user(_buffer, msg, n) != 0) { + ret = -EFAULT; + goto err_put_fd; } - xa_unlock(&cache->reqs); - return 0; + + /* CLOSE request has no reply */ + if (msg->opcode == CACHEFILES_OP_CLOSE) { + xa_lock(&cache->reqs); + radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs); + complete(&req->done); + } + return n;
err_put_fd: if (msg->opcode == CACHEFILES_OP_OPEN)
From: Xin Yin yinxin.x@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/5ef14211857f
--------------------------------
ANBZ: #3211
commit 1122f40072731525c06b1371cfa30112b9b54d27 upstream.
For now, enqueuing and dequeuing on-demand requests both start from idx 0, which makes request distribution unfair. Under heavy concurrent I/O, requests stored at higher idx will starve.
Searching requests cyclically in cachefiles_ondemand_daemon_read() makes distribution fairer.
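In effect, the request-picking logic becomes a two-pass cyclic scan. A simplified restatement follows (hedged: find_first_tagged() is a hypothetical stand-in for the tagged radix-tree lookup, not a real kernel API):

	static struct cachefiles_req *pick_request(struct cachefiles_cache *cache)
	{
		unsigned long start = cache->req_id_next;
		struct cachefiles_req *req;

		/* first pass: scan [req_id_next, ULONG_MAX] */
		req = find_first_tagged(&cache->reqs, start, ULONG_MAX);
		/* second pass: wrap around and scan [0, req_id_next) */
		if (!req && start > 0)
			req = find_first_tagged(&cache->reqs, 0, start - 1);
		return req;
	}

cache->req_id_next is then advanced past the picked request, so the next reader resumes after the most recently handled request instead of always restarting from idx 0.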
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Reported-by: Yongqing Li liyongqing@bytedance.com Signed-off-by: Xin Yin yinxin.x@bytedance.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/ # v1 Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/ # v2 Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 1 + fs/cachefiles/ondemand.c | 19 ++++++++++++++++--- 2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 7440c3e4fd14..cff061f208f6 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -97,6 +97,7 @@ struct cachefiles_cache { char *tag; /* cache binding tag */ refcount_t unbind_pincount;/* refcount to do daemon unbind */ struct radix_tree_root reqs; /* xarray of pending on-demand requests */ + unsigned long req_id_next; struct idr ondemand_ids; /* xarray for ondemand_id allocation */ u32 ondemand_id_next; }; diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 68b8c2a4e6a8..bebc2f8627d8 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -262,17 +262,29 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, void **slot;
/* - * Search for a request that has not ever been processed, to prevent - * requests from being processed repeatedly. + * Cyclically search for a request that has not ever been processed, + * to prevent requests from being processed repeatedly, and make + * request distribution fair. */ xa_lock(&cache->reqs); - radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, + radix_tree_for_each_tagged(slot, &cache->reqs, &iter, cache->req_id_next, CACHEFILES_REQ_NEW) { req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); WARN_ON(!req); break; }
+ if (!req && cache->req_id_next > 0) { + radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, + CACHEFILES_REQ_NEW) { + if (iter.index >= cache->req_id_next) + break; + req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); + WARN_ON(!req); + break; + } + } + /* no request tagged with CACHEFILES_REQ_NEW found */ if (!req) { xa_unlock(&cache->reqs); @@ -288,6 +300,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, }
radix_tree_iter_tag_clear(&cache->reqs, &iter, CACHEFILES_REQ_NEW); + cache->req_id_next = iter.index + 1; xa_unlock(&cache->reqs);
id = iter.index;
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d6d0edd3e332
--------------------------------
ANBZ: #1666
commit 94d78946704f7facd010b9dee5e158921ab37398 upstream.
... so that it can be used by the fscache mode introduced in the following patches.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-10-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/data.c | 4 ++-- fs/erofs/internal.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index ac331d1c2e2f..1cfb2d510a30 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -135,8 +135,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode, return 0; }
-static int erofs_map_blocks(struct inode *inode, - struct erofs_map_blocks *map, int flags) +int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags) { struct super_block *sb = inode->i_sb; struct erofs_inode *vi = EROFS_I(inode); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index efaa98a84508..7582a711e0f8 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -389,6 +389,8 @@ void erofs_put_metabuf(struct erofs_buf *buf); void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, erofs_blk_t blkaddr, enum erofs_kmap_type type); int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev); +int erofs_map_blocks(struct inode *inode, + struct erofs_map_blocks *map, int flags);
/* inode.c */ static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/6ca377e2469d
--------------------------------
ANBZ: #1666
commit 93b856bb5f66ae149ed91876b17c8c3fca576615 upstream.
Until now, erofs has been a purely blockdev-based filesystem.
A new fscache-based mode is going to be introduced for erofs to support scenarios where on-demand read semantics are needed, e.g. container image distribution. In this case, erofs can be mounted from data blobs through fscache.
Add a helper to check which mode erofs is working in, and adjust the code in preparation for the upcoming fscache mode.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-11-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/internal.h | 5 +++++ fs/erofs/super.c | 36 +++++++++++++++++++++++------------- 2 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 7582a711e0f8..713f120ac49d 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -131,6 +131,11 @@ struct erofs_sb_info { #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option) #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)
+static inline bool erofs_is_fscache_mode(struct super_block *sb) +{ + return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev; +} + enum { EROFS_ZIP_CACHE_DISABLED, EROFS_ZIP_CACHE_READAHEAD, diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 33a223f908a2..5a891211c3f8 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -163,14 +163,16 @@ static int erofs_init_devices(struct super_block *sb, } dis = ptr + erofs_blkoff(pos);
- bdev = blkdev_get_by_path(dif->path, - FMODE_READ | FMODE_EXCL, - sb->s_type); - if (IS_ERR(bdev)) { - err = PTR_ERR(bdev); - goto err_out; + if (!erofs_is_fscache_mode(sb)) { + bdev = blkdev_get_by_path(dif->path, + FMODE_READ | FMODE_EXCL, + sb->s_type); + if (IS_ERR(bdev)) { + err = PTR_ERR(bdev); + goto err_out; + } + dif->bdev = bdev; } - dif->bdev = bdev; dif->blocks = le32_to_cpu(dis->blocks); dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr); sbi->total_blocks += dif->blocks; @@ -426,11 +428,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_magic = EROFS_SUPER_MAGIC;
- if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { - erofs_err(sb, "failed to set erofs blksize"); - return -EINVAL; - } - sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) return -ENOMEM; @@ -440,6 +437,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sbi->devs = ctx->devs; ctx->devs = NULL;
+ if (erofs_is_fscache_mode(sb)) { + sb->s_blocksize = EROFS_BLKSIZ; + sb->s_blocksize_bits = LOG_BLOCK_SIZE; + } else { + if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { + erofs_err(sb, "failed to set erofs blksize"); + return -EINVAL; + } + } + err = erofs_read_superblock(sb); if (err) return err; @@ -663,7 +670,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct erofs_sb_info *sbi = EROFS_SB(sb); - u64 id = huge_encode_dev(sb->s_bdev->bd_dev); + u64 id = 0; + + if (!erofs_is_fscache_mode(sb)) + id = huge_encode_dev(sb->s_bdev->bd_dev);
buf->f_type = sb->s_magic; buf->f_bsize = EROFS_BLKSIZ;
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/35a08cf3d012
--------------------------------
ANBZ: #1666
commit c6be2bd0a5dd91f98d6b5d2df2c79bc32993352c upstream.
A new fscache-based mode is going to be introduced for erofs, in which on-demand read semantics are implemented through fscache.
As the first step, register an fscache volume for each erofs filesystem. That means data blobs cannot be shared among erofs filesystems yet. In a following iteration, we are going to introduce domain semantics, in which several erofs filesystems can belong to one domain and data blobs can be shared among the erofs filesystems of that domain.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-12-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/Kconfig | 10 ++++++++ fs/erofs/Makefile | 2 +- fs/erofs/fscache.c | 58 +++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/internal.h | 23 ++++++++++++++++++ fs/erofs/super.c | 12 ++++++++++ 5 files changed, 104 insertions(+), 1 deletion(-) create mode 100644 fs/erofs/fscache.c
diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig index 4dc00a2320e4..f7222cba2a67 100644 --- a/fs/erofs/Kconfig +++ b/fs/erofs/Kconfig @@ -95,3 +95,13 @@ config EROFS_FS_CLUSTER_PAGE_LIMIT into 8k-unit, hard limit should not be configured less than 2. Otherwise, the image will be refused to mount on this kernel. + +config EROFS_FS_ONDEMAND + bool "EROFS fscache-based on-demand read support" + depends on CACHEFILES_ONDEMAND && (EROFS_FS=m && FSCACHE || EROFS_FS=y && FSCACHE=y) + default n + help + This permits EROFS to use fscache-backed data blobs with on-demand + read support. + + If unsure, say N. \ No newline at end of file diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile index 46f2aa4ba46c..b75be64ffc0b 100644 --- a/fs/erofs/Makefile +++ b/fs/erofs/Makefile @@ -8,4 +8,4 @@ obj-$(CONFIG_EROFS_FS) += erofs.o erofs-objs := super.o inode.o data.o namei.o dir.o utils.o erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o - +erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o \ No newline at end of file diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c new file mode 100644 index 000000000000..dbf3f1bb7588 --- /dev/null +++ b/fs/erofs/fscache.c @@ -0,0 +1,58 @@ +/* + * Copyright (C) 2022, Alibaba Cloud + */ +#include <linux/fscache.h> +#include "internal.h" + +struct fscache_netfs erofs_fscache_netfs = { + .name = "erofs", + .version = 0, +}; + +int erofs_fscache_register(void) +{ + return fscache_register_netfs(&erofs_fscache_netfs); +} + +void erofs_fscache_unregister(void) +{ + fscache_unregister_netfs(&erofs_fscache_netfs); +} + +const struct fscache_cookie_def erofs_fscache_super_index_def = { + .name = "EROFS.super", + .type = FSCACHE_COOKIE_TYPE_INDEX, + .check_aux = NULL, +}; + +int erofs_fscache_register_fs(struct super_block *sb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + struct fscache_cookie *volume; + char *name; + int ret = 0; + + name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid); + if (!name) + return -ENOMEM; + + volume = fscache_acquire_cookie(erofs_fscache_netfs.primary_index, + &erofs_fscache_super_index_def, name, strlen(name), + NULL, 0, NULL, 0, true); + if (IS_ERR_OR_NULL(volume)) { + erofs_err(sb, "failed to register volume for %s", name); + ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP; + volume = NULL; + } + sbi->volume = volume; + kfree(name); + return ret; +} + +void erofs_fscache_unregister_fs(struct super_block *sb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + + fscache_relinquish_cookie(sbi->volume, NULL, false); + sbi->volume = NULL; +} diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 713f120ac49d..a70f80f3b79c 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -64,6 +64,7 @@ struct erofs_mount_opts { unsigned int max_sync_decompress_pages; #endif unsigned int mount_opt; + char *fsid; };
struct erofs_dev_context { @@ -118,6 +119,9 @@ struct erofs_sb_info { u8 volume_name[16]; /* volume name */ u32 feature_compat; u32 feature_incompat; + + /* fscache support */ + struct fscache_cookie *volume; };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) @@ -468,6 +472,25 @@ static inline int z_erofs_init_zip_subsystem(void) { return 0; } static inline void z_erofs_exit_zip_subsystem(void) {} #endif /* !CONFIG_EROFS_FS_ZIP */
+/* fscache.c */ +#ifdef CONFIG_EROFS_FS_ONDEMAND +int erofs_fscache_register(void); +void erofs_fscache_unregister(void); +int erofs_fscache_register_fs(struct super_block *sb); +void erofs_fscache_unregister_fs(struct super_block *sb); +#else +static inline int erofs_fscache_register(void) +{ + return 0; +} +static inline void erofs_fscache_unregister(void) {} +static inline int erofs_fscache_register_fs(struct super_block *sb) +{ + return 0; +} +static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} +#endif + #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#endif /* __EROFS_INTERNAL_H */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 5a891211c3f8..2b21c5efaff3 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -440,6 +440,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) if (erofs_is_fscache_mode(sb)) { sb->s_blocksize = EROFS_BLKSIZ; sb->s_blocksize_bits = LOG_BLOCK_SIZE; + + err = erofs_fscache_register_fs(sb); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize"); @@ -588,6 +592,7 @@ static void erofs_kill_sb(struct super_block *sb) if (!sbi) return; erofs_free_dev_context(sbi->devs); + erofs_fscache_unregister_fs(sb); kfree(sbi); sb->s_fs_info = NULL; } @@ -638,6 +643,10 @@ static int __init erofs_module_init(void) if (err) goto zip_err;
+ err = erofs_fscache_register(); + if (err) + goto fscache_err; + err = register_filesystem(&erofs_fs_type); if (err) goto fs_err; @@ -645,6 +654,8 @@ static int __init erofs_module_init(void) return 0;
fs_err: + erofs_fscache_unregister(); +fscache_err: z_erofs_exit_zip_subsystem(); zip_err: erofs_exit_shrinker(); @@ -657,6 +668,7 @@ static int __init erofs_module_init(void) static void __exit erofs_module_exit(void) { unregister_filesystem(&erofs_fs_type); + erofs_fscache_unregister(); z_erofs_exit_zip_subsystem(); erofs_exit_shrinker();
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/63ed7d0d74e9
--------------------------------
ANBZ: #1666
commit b02c602f065f7a09d7678dd1d8bf3d3fd10ed228 upstream.
Introduce a context structure for managing data blobs, and helper functions for initializing and cleaning up this context structure.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-13-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 47 +++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/internal.h | 19 ++++++++++++++++++ 2 files changed, 66 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index dbf3f1bb7588..fbda7da6ced9 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -25,6 +25,53 @@ const struct fscache_cookie_def erofs_fscache_super_index_def = { .check_aux = NULL, };
+const struct fscache_cookie_def erofs_fscache_inode_object_def = { + .name = "CIFS.uniqueid", + .type = FSCACHE_COOKIE_TYPE_DATAFILE, +}; + +int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, char *name) +{ + struct erofs_fscache *ctx; + struct fscache_cookie *cookie; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + cookie = fscache_acquire_cookie(EROFS_SB(sb)->volume, + &erofs_fscache_inode_object_def, + name, strlen(name), + NULL, 0, NULL, 0, true); + if (!cookie) { + erofs_err(sb, "failed to get cookie for %s", name); + kfree(ctx); + return -EINVAL; + } + + //fscache_use_cookie(cookie, false); + ctx->cookie = cookie; + + *fscache = ctx; + return 0; +} + +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +{ + struct erofs_fscache *ctx = *fscache; + + if (!ctx) + return; + + //fscache_unuse_cookie(ctx->cookie, NULL, NULL); + fscache_relinquish_cookie(ctx->cookie, NULL, false); + ctx->cookie = NULL; + + kfree(ctx); + *fscache = NULL; +} + int erofs_fscache_register_fs(struct super_block *sb) { struct erofs_sb_info *sbi = EROFS_SB(sb); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index a70f80f3b79c..cd89fb654a11 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -79,6 +79,10 @@ struct erofs_fs_context { struct erofs_dev_context *devs; };
+struct erofs_fscache { + struct fscache_cookie *cookie; +}; + struct erofs_sb_info { struct erofs_mount_opts opt; /* options */ #ifdef CONFIG_EROFS_FS_ZIP @@ -478,6 +482,10 @@ int erofs_fscache_register(void); void erofs_fscache_unregister(void); int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb); + +int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, char *name); +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); #else static inline int erofs_fscache_register(void) { @@ -489,6 +497,17 @@ static inline int erofs_fscache_register_fs(struct super_block *sb) return 0; } static inline void erofs_fscache_unregister_fs(struct super_block *sb) {} + +static inline int erofs_fscache_register_cookie(struct super_block *sb, + struct erofs_fscache **fscache, + char *name) +{ + return -EOPNOTSUPP; +} + +static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +{ +} #endif
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/30d2fb2e9a8e
--------------------------------
ANBZ: #1666
commit 3c265d7dcefab21a58ca5454c0f778412bde0870 upstream.
Introduce one anonymous inode for data blobs so that erofs can cache metadata directly within such an anonymous inode.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-14-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++--- fs/erofs/internal.h | 6 ++++-- 2 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index fbda7da6ced9..7659f8ae65c3 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -30,11 +30,16 @@ const struct fscache_cookie_def erofs_fscache_inode_object_def = { .type = FSCACHE_COOKIE_TYPE_DATAFILE, };
+static const struct address_space_operations erofs_fscache_meta_aops = { +}; + int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, char *name) + struct erofs_fscache **fscache, + char *name, bool need_inode) { struct erofs_fscache *ctx; struct fscache_cookie *cookie; + int ret;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) @@ -46,15 +51,40 @@ int erofs_fscache_register_cookie(struct super_block *sb, NULL, 0, NULL, 0, true); if (!cookie) { erofs_err(sb, "failed to get cookie for %s", name); - kfree(ctx); - return -EINVAL; + ret = -EINVAL; + goto err; }
//fscache_use_cookie(cookie, false); ctx->cookie = cookie;
+ if (need_inode) { + struct inode *const inode = new_inode(sb); + + if (!inode) { + erofs_err(sb, "failed to get anon inode for %s", name); + ret = -ENOMEM; + goto err_cookie; + } + + set_nlink(inode, 1); + inode->i_size = OFFSET_MAX; + inode->i_mapping->a_ops = &erofs_fscache_meta_aops; + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); + + ctx->inode = inode; + } + *fscache = ctx; return 0; + +err_cookie: +// fscache_unuse_cookie(ctx->cookie, NULL, NULL); + fscache_relinquish_cookie(ctx->cookie, NULL, false); + ctx->cookie = NULL; +err: + kfree(ctx); + return ret; }
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) @@ -68,6 +98,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) fscache_relinquish_cookie(ctx->cookie, NULL, false); ctx->cookie = NULL;
+ iput(ctx->inode); + ctx->inode = NULL; + kfree(ctx); *fscache = NULL; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index cd89fb654a11..22844d047566 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -81,6 +81,7 @@ struct erofs_fs_context {
struct erofs_fscache { struct fscache_cookie *cookie; + struct inode *inode; };
struct erofs_sb_info { @@ -484,7 +485,8 @@ int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb);
int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, char *name); + struct erofs_fscache **fscache, + char *name, bool need_inode); void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); #else static inline int erofs_fscache_register(void) @@ -500,7 +502,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
static inline int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, - char *name) + char *name, bool need_inode) { return -EOPNOTSUPP; }
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/0960115ba31d
--------------------------------
ANBZ: #1666
commit 37c90c5fae701983e21cc80396649e3aca7f4fa1 upstream.
Register the fscache context for the primary data blob. Also move the initialization of s_op and related fields forward, since the anonymous inode will be allocated under the super block when registering the fscache context.
A couple of points worth mentioning about the cleanup routine:
1. The fscache context will instantiate anonymous inodes under the super block. Release these anonymous inodes when .put_super() is called, or we'll get a "VFS: Busy inodes after unmount." warning.
2. The fscache context is initialized prior to the root inode. If .kill_sb() is called after a failed mount, .put_super() won't be called since the root inode has not been initialized yet. Thus .kill_sb() shall also contain the cleanup routine.
Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-16-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/internal.h | 1 + fs/erofs/super.c | 13 +++++++++---- 2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 22844d047566..06aa8dc55cfc 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -127,6 +127,7 @@ struct erofs_sb_info {
/* fscache support */ struct fscache_cookie *volume; + struct erofs_fscache *s_fscache; };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 2b21c5efaff3..7af2afd62c40 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -427,6 +427,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) int err;
sb->s_magic = EROFS_SUPER_MAGIC; + sb->s_flags |= SB_RDONLY | SB_NOATIME; + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_op = &erofs_sops;
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) @@ -444,6 +447,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) err = erofs_fscache_register_fs(sb); if (err) return err; + err = erofs_fscache_register_cookie(sb, &sbi->s_fscache, + sbi->opt.fsid, true); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize"); @@ -455,11 +462,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) if (err) return err;
- sb->s_flags |= SB_RDONLY | SB_NOATIME; - sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_time_gran = 1; - - sb->s_op = &erofs_sops; sb->s_xattr = erofs_xattr_handlers;
if (test_opt(&sbi->opt, POSIX_ACL)) @@ -592,6 +595,7 @@ static void erofs_kill_sb(struct super_block *sb) if (!sbi) return; erofs_free_dev_context(sbi->devs); + erofs_fscache_unregister_cookie(&sbi->s_fscache); erofs_fscache_unregister_fs(sb); kfree(sbi); sb->s_fs_info = NULL; @@ -609,6 +613,7 @@ static void erofs_put_super(struct super_block *sb) iput(sbi->managed_cache); sbi->managed_cache = NULL; #endif + erofs_fscache_unregister_cookie(&sbi->s_fscache); }
static struct file_system_type erofs_fs_type = {
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/a0e45afbebb0
--------------------------------
ANBZ: #1666
commit 955b478e1b4ad5530cd10395d56d45119d3a3ff4 upstream.
Similar to the multi-device mode, erofs could be mounted from one primary data blob (mandatory) and multiple extra data blobs (optional).
Register fscache context for each extra data blob.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-17-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/data.c | 3 +++ fs/erofs/internal.h | 2 ++ fs/erofs/super.c | 8 +++++++- 3 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 1cfb2d510a30..8b4547dc32af 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -222,6 +222,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
/* primary device by default */ map->m_bdev = sb->s_bdev; + map->m_fscache = EROFS_SB(sb)->s_fscache;
if (map->m_deviceid) { down_read(&devs->rwsem); @@ -231,6 +232,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) return -ENODEV; } map->m_bdev = dif->bdev; + map->m_fscache = dif->fscache; up_read(&devs->rwsem); } else if (devs->extra_devices) { down_read(&devs->rwsem); @@ -246,6 +248,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map) map->m_pa < startoff + length) { map->m_pa -= startoff; map->m_bdev = dif->bdev; + map->m_fscache = dif->fscache; break; } } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 06aa8dc55cfc..e1d526ee5be4 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
struct erofs_device_info { char *path; + struct erofs_fscache *fscache; struct block_device *bdev;
u32 blocks; @@ -392,6 +393,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode, #endif /* !CONFIG_EROFS_FS_ZIP */
struct erofs_map_dev { + struct erofs_fscache *m_fscache; struct block_device *m_bdev;
erofs_off_t m_pa; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 7af2afd62c40..037369ff7081 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -163,7 +163,12 @@ static int erofs_init_devices(struct super_block *sb, } dis = ptr + erofs_blkoff(pos);
- if (!erofs_is_fscache_mode(sb)) { + if (erofs_is_fscache_mode(sb)) { + err = erofs_fscache_register_cookie(sb, &dif->fscache, + dif->path, false); + if (err) + break; + } else { bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL, sb->s_type); @@ -530,6 +535,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
if (dif->bdev) blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL); + erofs_fscache_unregister_cookie(&dif->fscache); kfree(dif->path); kfree(dif); return 0;
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/a8d223ce9c5a
--------------------------------
ANBZ: #1666
commit 5375e7c8b0fef11645657384fe1f2cfed1e0baa7 upstream.
Implement the data plane of reading metadata from the primary data blob over fscache.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-18-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/data.c | 6 ++++- fs/erofs/fscache.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 8b4547dc32af..89f76d7a3452 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -34,9 +34,13 @@ static void erofs_readendio(struct bio *bio)
static struct page *erofs_read_meta_page(struct super_block *sb, pgoff_t index) { - struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping; + struct address_space *mapping; struct page *page;
+ if (erofs_is_fscache_mode(sb)) + mapping = EROFS_SB(sb)->s_fscache->inode->i_mapping; + else + mapping = sb->s_bdev->bd_inode->i_mapping; page = read_cache_page_gfp(mapping, index, mapping_gfp_constraint(mapping, ~__GFP_FS)); return page; diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 7659f8ae65c3..c922dda36bda 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -30,7 +30,69 @@ const struct fscache_cookie_def erofs_fscache_inode_object_def = { .type = FSCACHE_COOKIE_TYPE_DATAFILE, };
+static void erofs_readpage_from_fscache_complete(struct page *page, void *ctx, + int error) +{ + if (!error) + SetPageUptodate(page); + unlock_page(page); +} + +static int erofs_fscache_meta_readpage(struct file *data, struct page *page) +{ + int ret; + struct super_block *sb = page->mapping->host->i_sb; + struct erofs_map_dev mdev = { + .m_deviceid = 0, + .m_pa = page_offset(page), + }; + + ret = erofs_map_dev(sb, &mdev); + if (ret) + goto out; + + ret = fscache_read_or_alloc_page(mdev.m_fscache->cookie, page, + erofs_readpage_from_fscache_complete, + NULL, + GFP_KERNEL); + switch (ret) { + case 0: /* page found in fscache, read submitted */ + erofs_dbg("%s: submitted", __func__); + return ret; + case -ENOBUFS: /* page won't be cached */ + case -ENODATA: /* page not in cache */ + erofs_err(sb, "%s: %d", __func__, ret); + ret = -EIO; + goto out; + default: + erofs_err(sb, "unknown error ret = %d", ret); + } + +out: + unlock_page(page); + return ret; +} + +static int erofs_fscache_release_page(struct page *page, gfp_t gfp) +{ + if (WARN_ON(PagePrivate(page))) + return 0; + + ClearPageFsCache(page); + return 1; +} + +static void erofs_fscache_invalidate_page(struct page *page, unsigned int offset, + unsigned int length) +{ + if (offset == 0 && length == PAGE_SIZE) + ClearPageFsCache(page); +} + static const struct address_space_operations erofs_fscache_meta_aops = { + .readpage = erofs_fscache_meta_readpage, + .releasepage = erofs_fscache_release_page, + .invalidatepage = erofs_fscache_invalidate_page, };
int erofs_fscache_register_cookie(struct super_block *sb,
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/2379fb87ba9e
--------------------------------
ANBZ: #1666
commit 1442b02b66ad2c568f9d5178b7c3c1287b37e438 upstream.
Implement the data plane of reading data from data blobs over fscache for the non-inline layout.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-19-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 59 +++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/inode.c | 4 +++ fs/erofs/internal.h | 1 + 3 files changed, 64 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index c922dda36bda..5c48656550f7 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -89,12 +89,71 @@ static void erofs_fscache_invalidate_page(struct page *page, unsigned int offset ClearPageFsCache(page); }
+static int erofs_fscache_readpage(struct file *file, struct page *page) +{ + struct inode *inode = page->mapping->host; + struct super_block *sb = inode->i_sb; + struct erofs_map_blocks map; + struct erofs_map_dev mdev; + erofs_off_t pos = page_offset(page); + loff_t pstart; + int ret; + + map.m_la = pos; + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + if (ret) + goto out_unlock; + + if (!(map.m_flags & EROFS_MAP_MAPPED)) { + zero_user_segment(page, 0, PAGE_SIZE); + SetPageUptodate(page); + goto out_unlock; + } + + mdev = (struct erofs_map_dev) { + .m_deviceid = map.m_deviceid, + .m_pa = map.m_pa, + }; + + ret = erofs_map_dev(sb, &mdev); + if (ret) + goto out_unlock; + + pstart = mdev.m_pa + (pos - map.m_la); + ret = fscache_read_or_alloc_page2(mdev.m_fscache->cookie, page, + erofs_readpage_from_fscache_complete, + NULL, + GFP_KERNEL, pstart); + switch (ret) { + case 0: /* page found in fscache, read submitted */ + erofs_dbg("%s: submitted", __func__); + return ret; + case -ENOBUFS: /* page won't be cached */ + case -ENODATA: /* page not in cache */ + erofs_err(sb, "%s: %d", __func__, ret); + ret = -EIO; + goto out_unlock; + default: + erofs_err(sb, "unknown error ret = %d", ret); + } + +out_unlock: + unlock_page(page); + return ret; +} + static const struct address_space_operations erofs_fscache_meta_aops = { .readpage = erofs_fscache_meta_readpage, .releasepage = erofs_fscache_release_page, .invalidatepage = erofs_fscache_invalidate_page, };
+const struct address_space_operations erofs_fscache_access_aops = { + .readpage = erofs_fscache_readpage, + .releasepage = erofs_fscache_release_page, + .invalidatepage = erofs_fscache_invalidate_page, +}; + int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, char *name, bool need_inode) diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index b1aca34c47a6..bb6b41cd68b0 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -290,6 +290,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir) goto out_unlock; } inode->i_mapping->a_ops = &erofs_raw_access_aops; +#ifdef CONFIG_EROFS_FS_ONDEMAND + if (erofs_is_fscache_mode(inode->i_sb)) + inode->i_mapping->a_ops = &erofs_fscache_access_aops; +#endif
out_unlock: erofs_put_metabuf(&buf); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index e1d526ee5be4..b9ba4627fdf3 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -491,6 +491,7 @@ int erofs_fscache_register_cookie(struct super_block *sb, struct erofs_fscache **fscache, char *name, bool need_inode); void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); +extern const struct address_space_operations erofs_fscache_access_aops; #else static inline int erofs_fscache_register(void) {
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/4f2620305915
--------------------------------
ANBZ: #1666
commit bd735bdaa62fb64980c07f5443f24aefd0081569 upstream.
Implement the data plane of reading data from data blobs over fscache for the inline layout.

For the leading non-inline part, the data plane for the non-inline layout is reused, while only the tail-packing part needs special handling.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-20-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 5c48656550f7..37f8d754a741 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -89,6 +89,34 @@ static void erofs_fscache_invalidate_page(struct page *page, unsigned int offset ClearPageFsCache(page); }
+static int erofs_fscache_readpage_inline(struct page *page, + struct erofs_map_blocks *map) +{ + struct super_block *sb = page->mapping->host->i_sb; + struct erofs_buf buf = __EROFS_BUF_INITIALIZER; + erofs_blk_t blknr; + size_t offset, len; + void *src, *dst; + + /* For tail packing layout, the offset may be non-zero. */ + offset = erofs_blkoff(map->m_pa); + blknr = erofs_blknr(map->m_pa); + len = map->m_llen; + + src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP_ATOMIC); + if (IS_ERR(src)) + return PTR_ERR(src); + + dst = kmap_atomic(page); + memcpy(dst, src + offset, len); + memset(dst + len, 0, PAGE_SIZE - len); + kunmap_atomic(dst); + + erofs_put_metabuf(&buf); + SetPageUptodate(page); + return 0; +} + static int erofs_fscache_readpage(struct file *file, struct page *page) { struct inode *inode = page->mapping->host; @@ -110,6 +138,11 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) goto out_unlock; }
+ if (map.m_flags & EROFS_MAP_META) { + ret = erofs_fscache_readpage_inline(page, &map); + goto out_unlock; + } + mdev = (struct erofs_map_dev) { .m_deviceid = map.m_deviceid, .m_pa = map.m_pa,
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/15a330a92f9f
--------------------------------
ANBZ: #1666
commit 9c0cc9c729657446ed001a99488a9d82f5124af4 upstream.
Introduce the 'fsid' mount option to enable on-demand read semantics, in which case erofs will be mounted from data blobs. Users can specify the name of the primary data blob with this mount option.
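For instance (the blob name here is hypothetical), such an image could then be mounted without any block device via something like "mount -t erofs none -o fsid=myblob /mnt/erofs".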
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220425122143.56815-22-jefflexu@linux.alibaba.com Acked-by: Chao Yu chao@kernel.org Tested-by: Zichen Tian tianzichen@kuaishou.com Tested-by: Jia Zhu zhujia.zj@bytedance.com Tested-by: Yan Song yansong.ys@antgroup.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/super.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 037369ff7081..5529c83d5d83 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -281,6 +281,7 @@ enum { Opt_acl, Opt_cache_strategy, Opt_device, + Opt_fsid, Opt_err };
@@ -297,6 +298,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = { fsparam_enum("cache_strategy", Opt_cache_strategy, erofs_param_cache_strategy), fsparam_string("device", Opt_device), + fsparam_string("fsid", Opt_fsid), {} };
@@ -359,6 +361,17 @@ static int erofs_fc_parse_param(struct fs_context *fc, } ++ctx->devs->extra_devices; break; + case Opt_fsid: +#ifdef CONFIG_EROFS_FS_ONDEMAND + kfree(ctx->opt.fsid); + ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL); + if (!ctx->opt.fsid) + return -ENOMEM; +#else + errorfc(fc, "fsid option not supported"); + return -EINVAL; +#endif + break; default: return -ENOPARAM; } @@ -442,6 +455,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi; sbi->opt = ctx->opt; + ctx->opt.fsid = NULL; sbi->devs = ctx->devs; ctx->devs = NULL;
@@ -507,6 +521,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
static int erofs_fc_get_tree(struct fs_context *fc) { + struct erofs_fs_context *ctx = fc->fs_private; + + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid) + return get_tree_nodev(fc, erofs_fc_fill_super); + return get_tree_bdev(fc, erofs_fc_fill_super); }
@@ -555,6 +574,7 @@ static void erofs_fc_free(struct fs_context *fc) struct erofs_fs_context *ctx = fc->fs_private;
erofs_free_dev_context(ctx->devs); + kfree(ctx->opt.fsid); kfree(ctx); }
@@ -595,7 +615,10 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
- kill_block_super(sb); + if (erofs_is_fscache_mode(sb)) + generic_shutdown_super(sb); + else + kill_block_super(sb);
sbi = EROFS_SB(sb); if (!sbi) @@ -603,6 +626,7 @@ static void erofs_kill_sb(struct super_block *sb) erofs_free_dev_context(sbi->devs); erofs_fscache_unregister_cookie(&sbi->s_fscache); erofs_fscache_unregister_fs(sb); + kfree(sbi->opt.fsid); kfree(sbi); sb->s_fs_info = NULL; } @@ -736,6 +760,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) seq_puts(seq, ",cache_strategy=readahead"); else if (opt->cache_strategy == EROFS_ZIP_CACHE_READAROUND) seq_puts(seq, ",cache_strategy=readaround"); +#endif +#ifdef CONFIG_EROFS_FS_ONDEMAND + if (sbi->opt.fsid) + seq_printf(seq, ",fsid=%s", sbi->opt.fsid); #endif return 0; }
From: Gao Xiang hsiangkao@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/52ffb0b7c3be
--------------------------------
ANBZ: #1666
commit ba73eadd23d1c2dc5c8dc0c0ae2eeca2b9b709a7 upstream.
When "-o device" mount option is not specified, scan the device table and instantiate the devices if there's any in the device table. In this case, the tag field of each device slot uniquely specifies a device.
Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20220512055601.106109-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Huang Jianan jnhuang@linux.alibaba.com Reviewed-by: Jeffle Xu jefflexu@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/erofs_fs.h | 9 ++--- fs/erofs/super.c | 98 +++++++++++++++++++++++++++++++-------------- 2 files changed, 70 insertions(+), 37 deletions(-)
diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index b15a62bf2c46..9e2712976429 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -28,12 +28,9 @@ #define EROFS_SB_EXTSLOT_SIZE 16
struct erofs_deviceslot { - union { - u8 uuid[16]; /* used for device manager later */ - u8 userdata[64]; /* digest(sha256), etc. */ - } u; - __le32 blocks; /* total fs blocks of this device */ - __le32 mapped_blkaddr; /* map starting at mapped_blkaddr */ + u8 tag[64]; /* digest(sha256), etc. */ + __le32 blocks; /* total fs blocks of this device */ + __le32 mapped_blkaddr; /* map starting at mapped_blkaddr */ u8 reserved[56]; }; #define EROFS_DEVT_SLOT_SIZE sizeof(struct erofs_deviceslot) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 5529c83d5d83..dabcab870a2a 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -123,7 +123,51 @@ static bool check_layout_compatibility(struct super_block *sb, return true; }
-static int erofs_init_devices(struct super_block *sb, +static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb, + struct erofs_device_info *dif, erofs_off_t *pos) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + struct erofs_deviceslot *dis; + struct block_device *bdev; + void *ptr; + int ret; + + ptr = erofs_read_metabuf(buf, sb, erofs_blknr(*pos), EROFS_KMAP); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + dis = ptr + erofs_blkoff(*pos); + + if (!dif->path) { + if (!dis->tag[0]) { + erofs_err(sb, "empty device tag @ pos %llu", *pos); + return -EINVAL; + } + dif->path = kmemdup_nul(dis->tag, sizeof(dis->tag), GFP_KERNEL); + if (!dif->path) + return -ENOMEM; + } + + if (erofs_is_fscache_mode(sb)) { + ret = erofs_fscache_register_cookie(sb, &dif->fscache, + dif->path, false); + if (ret) + return ret; + } else { + bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL, + sb->s_type); + if (IS_ERR(bdev)) + return PTR_ERR(bdev); + dif->bdev = bdev; + } + + dif->blocks = le32_to_cpu(dis->blocks); + dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr); + sbi->total_blocks += dif->blocks; + *pos += EROFS_DEVT_SLOT_SIZE; + return 0; +} + +static int erofs_scan_devices(struct super_block *sb, struct erofs_super_block *dsb) { struct erofs_sb_info *sbi = EROFS_SB(sb); @@ -131,8 +175,6 @@ static int erofs_init_devices(struct super_block *sb, erofs_off_t pos; struct erofs_buf buf = __EROFS_BUF_INITIALIZER; struct erofs_device_info *dif; - struct erofs_deviceslot *dis; - void *ptr; int id, err = 0;
sbi->total_blocks = sbi->primarydevice_blocks; @@ -141,7 +183,8 @@ static int erofs_init_devices(struct super_block *sb, else ondisk_extradevs = le16_to_cpu(dsb->extra_devices);
- if (ondisk_extradevs != sbi->devs->extra_devices) { + if (sbi->devs->extra_devices && + ondisk_extradevs != sbi->devs->extra_devices) { erofs_err(sb, "extra devices don't match (ondisk %u, given %u)", ondisk_extradevs, sbi->devs->extra_devices); return -EINVAL; @@ -152,38 +195,31 @@ static int erofs_init_devices(struct super_block *sb, sbi->device_id_mask = roundup_pow_of_two(ondisk_extradevs + 1) - 1; pos = le16_to_cpu(dsb->devt_slotoff) * EROFS_DEVT_SLOT_SIZE; down_read(&sbi->devs->rwsem); - idr_for_each_entry(&sbi->devs->tree, dif, id) { - struct block_device *bdev; - - ptr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), - EROFS_KMAP); - if (IS_ERR(ptr)) { - err = PTR_ERR(ptr); - break; + if (sbi->devs->extra_devices) { + idr_for_each_entry(&sbi->devs->tree, dif, id) { + err = erofs_init_device(&buf, sb, dif, &pos); + if (err) + break; } - dis = ptr + erofs_blkoff(pos); + } else { + for (id = 0; id < ondisk_extradevs; id++) { + dif = kzalloc(sizeof(*dif), GFP_KERNEL); + if (!dif) { + err = -ENOMEM; + break; + } + err = idr_alloc(&sbi->devs->tree, dif, 0, 0, GFP_KERNEL); + if (err < 0) { + kfree(dif); + break; + } + ++sbi->devs->extra_devices;
- if (erofs_is_fscache_mode(sb)) { - err = erofs_fscache_register_cookie(sb, &dif->fscache, - dif->path, false); + err = erofs_init_device(&buf, sb, dif, &pos); if (err) break; - } else { - bdev = blkdev_get_by_path(dif->path, - FMODE_READ | FMODE_EXCL, - sb->s_type); - if (IS_ERR(bdev)) { - err = PTR_ERR(bdev); - goto err_out; - } - dif->bdev = bdev; } - dif->blocks = le32_to_cpu(dis->blocks); - dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr); - sbi->total_blocks += dif->blocks; - pos += EROFS_DEVT_SLOT_SIZE; } -err_out: up_read(&sbi->devs->rwsem); erofs_put_metabuf(&buf); return err; @@ -255,7 +291,7 @@ static int erofs_read_superblock(struct super_block *sb) }
/* handle multiple devices */ - ret = erofs_init_devices(sb, dsb); + ret = erofs_scan_devices(sb, dsb); out: erofs_put_metabuf(&buf); return ret;
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d6c1b4c226b0
--------------------------------
ANBZ: #3211
commit 1015c1016c231b26d4e2c9b3da65b6c043eb97a3 upstream.
Use kill_anon_super() instead of generic_shutdown_super(), since mount() in erofs fscache mode uses get_tree_nodev() and the associated anon bdev needs to be freed.
Fixes: 9c0cc9c729657 ("erofs: add 'fsid' mount option") Suggested-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918043456.147-2-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/884 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index dabcab870a2a..74c9653e6840 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -652,7 +652,7 @@ static void erofs_kill_sb(struct super_block *sb) WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
if (erofs_is_fscache_mode(sb)) - generic_shutdown_super(sb); + kill_anon_super(sb); else kill_block_super(sb);
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/f6f607feca33
--------------------------------
ANBZ: #3234
Give the right name to erofs_fscache_inode_object_def.
Fixes: 63ed7d0d74e9 ("erofs: add fscache context helper functions") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 37f8d754a741..8eff2967cbdd 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -26,7 +26,7 @@ const struct fscache_cookie_def erofs_fscache_super_index_def = { };
const struct fscache_cookie_def erofs_fscache_inode_object_def = { - .name = "CIFS.uniqueid", + .name = "EROFS.uniqueid", .type = FSCACHE_COOKIE_TYPE_DATAFILE, };
From: Jeffle Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/56f44bed66bc
--------------------------------
ANBZ: #4627
commit 0130e4e8e49f9bd0342d3fc14102470ea9e7230e upstream.
erofs over fscache doesn't support the compressed layout yet. It will cause a NULL pointer dereference if compressed inodes are present when working in fscache mode.
So far, in erofs-based container image distribution scenarios (RAFS v6), the compressed RAFS v6 images are downloaded and then decompressed on demand into an uncompressed erofs image. The erofs image is then mounted in fscache mode for containers to use. IOWs, compressed data is currently decompressed on the userspace side, and uncompressed erofs images are what finally get cached.
The fscache support for the compressed layout is still under development and will be used for the runtime decompression feature. Anyway, to avoid the potential crash, let's leave compressed inodes unsupported in fscache mode until support is added later.
Fixes: 1442b02b66ad ("erofs: implement fscache-based data read for non-inline layout") Signed-off-by: Jeffle Xu jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu chao@kernel.org Link: https://lore.kernel.org/r/20220526010344.118493-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/1561 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/inode.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index bb6b41cd68b0..fc05960e228e 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -286,7 +286,10 @@ static int erofs_fill_inode(struct inode *inode, int isdir) }
if (erofs_inode_is_data_compressed(vi->datalayout)) { - err = z_erofs_fill_inode(inode); + if (!erofs_is_fscache_mode(inode->i_sb)) + err = z_erofs_fill_inode(inode); + else + err = -EOPNOTSUPP; goto out_unlock; } inode->i_mapping->a_ops = &erofs_raw_access_aops;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d1a5c14b77b3
--------------------------------
ANBZ: #2056
Allocate and initialize the file descriptor of the backing file at the lookup phase, in preparation for the following readahead support in on-demand mode. This file descriptor is only maintained in on-demand mode.
One thing worth noting is that the file is opened with FMODE_RANDOM, so that a subsequent page_cache_sync_readahead() will fall back to force_page_cache_readahead(). The readahead routine in on-demand mode introduced later will trigger forced readahead on backing files in order to read from them. We'd better make the implementation self-contained so that the related kernel modules can later be distributed and deployed directly without upgrading the kernel; however, force_page_cache_readahead() is not exported. To work around this, set FMODE_RANDOM on the file and call page_cache_sync_readahead() instead.
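As a reference, a hedged and simplified sketch of the mm/readahead.c logic this relies on in kernels of this vintage (paraphrased, not the exact upstream code):

	void page_cache_sync_readahead(struct address_space *mapping,
				       struct file_ra_state *ra,
				       struct file *filp, pgoff_t offset,
				       unsigned long req_size)
	{
		/* no readahead configured for this bdi */
		if (!ra->ra_pages)
			return;

		/* FMODE_RANDOM bypasses the heuristics: read the requested
		 * window as-is, which is the forced readahead wanted here */
		if (filp && (filp->f_mode & FMODE_RANDOM)) {
			force_page_cache_readahead(mapping, filp, offset,
						   req_size);
			return;
		}

		ondemand_readahead(mapping, ra, filp, false, offset, req_size);
	}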
Besides, implement the write routine of the anonymous fd with buffered IO instead, which is also made easier by the pre-allocated file.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/692 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 6 ++++++ fs/cachefiles/internal.h | 1 + fs/cachefiles/namei.c | 18 ++++++++++++++++++ fs/cachefiles/ondemand.c | 20 ++------------------ 4 files changed, 27 insertions(+), 18 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 80a241638452..60b6ca443e8e 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -308,6 +308,12 @@ static void cachefiles_drop_object(struct fscache_object *_object) object->backer = NULL; }
+ /* clean up file descriptor for non-index object */ + if (object->file) { + fput(object->file); + object->file = NULL; + } + /* note that the object is now inactive */ if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)) cachefiles_mark_object_inactive(cache, object, i_blocks); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index cff061f208f6..ab0ca3b1cd08 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -39,6 +39,7 @@ struct cachefiles_object { struct cachefiles_lookup_data *lookup_data; /* cached lookup data */ struct dentry *dentry; /* the file/dir representing this object */ struct dentry *backer; /* backing file */ + struct file *file; /* backing file in on-demand mode */ loff_t i_size; /* object size */ unsigned long flags; #define CACHEFILES_OBJECT_ACTIVE 0 /* T if marked active */ diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 22a409669fd0..3c7168d0beec 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -705,6 +705,24 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, if (object->dentry->d_sb->s_blocksize > PAGE_SIZE) goto check_error;
+ if (cachefiles_in_ondemand_mode(cache)) { + struct path path; + struct file *file; + + path.mnt = cache->mnt; + path.dentry = object->dentry; + file = dentry_open(&path, O_RDWR | O_LARGEFILE, + cache->cache_cred); + if (IS_ERR(file)) + goto check_error; + /* + * so that page_cache_sync_readahead() will fallback + * to force_page_cache_readahead() + */ + file->f_mode |= FMODE_RANDOM; + object->file = file; + } + object->backer = object->dentry; } else { BUG(); // TODO: open file in data-class subdir diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index bebc2f8627d8..250b98e9820c 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -51,30 +51,14 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, struct iov_iter *iter) { struct cachefiles_object *object = kiocb->ki_filp->private_data; - struct cachefiles_cache *cache; size_t len = iter->count; loff_t pos = kiocb->ki_pos; - struct path path; - struct file *file; int ret;
- if (!object->backer) - return -ENOBUFS; - - cache = container_of(object->fscache.cache, - struct cachefiles_cache, cache); - - /* write data to the backing filesystem and let it store it in its - * own time */ - path.mnt = cache->mnt; - path.dentry = object->backer; - file = dentry_open(&path, O_RDWR | O_LARGEFILE | O_DIRECT, - cache->cache_cred); - if (IS_ERR(file)) + if (!object->file) return -ENOBUFS;
- ret = vfs_iter_write(file, iter, &pos, 0); - fput(file); + ret = vfs_iter_write(object->file, iter, &pos, 0); if (ret != len) return -EIO; return len;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d6843be666bf
--------------------------------
ANBZ: #2056
Fscache/CacheFiles offer fscache_read_or_alloc_pages() to implement the readahead routine of filesystems using fscache. Its implementation calls .readpage() on each backpage, so each backpage generates its own IO request. This per-page IO is not a performance bottleneck when fscache is used as the local cache for network filesystems; however, it is one for filesystems using fscache in on-demand mode.
This patch introduces a new helper, fscache_prepare_read(), for this use case. It first checks whether there's any hole inside the requested range and triggers an on-demand read if there is. This step ensures that all the data for the requested range is ready.
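Conceptually, the hole check walks the backing file with lseek semantics. A minimal sketch, using a hypothetical helper (the real implementation below also retries the check after the on-demand read completes):

static bool range_has_hole(struct file *file, loff_t pos, size_t len)
{
	loff_t data = vfs_llseek(file, pos, SEEK_DATA);

	if (data == (loff_t)-ENXIO)	/* nothing but a hole up to EOF */
		return true;
	if (data < 0 || data > pos)	/* seek error, or a hole right at pos */
		return true;
	/* pos sits inside a data extent; see where that extent ends */
	return vfs_llseek(file, pos, SEEK_HOLE) < pos + len;
}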
Then it triggers an asynchronous readahead on the backing file. Since FMODE_RANDOM is set, the following page_cache_sync_readahead() will fall back to force_page_cache_readahead().
Finally, it starts a synchronous buffered read on the backing file. Thanks to the asynchronous readahead, this buffered read will find the page cache up to date most of the time. The buffered read is handled in worker context, so the readahead routine is not blocked by the synchronous read.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/692 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 1 + fs/cachefiles/internal.h | 1 + fs/cachefiles/rdwr.c | 162 ++++++++++++++++++++++++++++++++++ fs/fscache/page.c | 66 ++++++++++++++ include/linux/fscache-cache.h | 5 ++ include/linux/fscache.h | 20 +++++ 6 files changed, 255 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 60b6ca443e8e..634e7041c0f3 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -573,6 +573,7 @@ const struct fscache_cache_ops cachefiles_cache_ops = { .attr_changed = cachefiles_attr_changed, .read_or_alloc_page = cachefiles_read_or_alloc_page, .read_or_alloc_pages = cachefiles_read_or_alloc_pages, + .prepare_read = cachefiles_prepare_read, .allocate_page = cachefiles_allocate_page, .allocate_pages = cachefiles_allocate_pages, .write_page = cachefiles_write_page, diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index ab0ca3b1cd08..b8ef5be59005 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -241,6 +241,7 @@ extern int cachefiles_read_or_alloc_page(struct fscache_retrieval *, extern int cachefiles_read_or_alloc_pages(struct fscache_retrieval *, struct list_head *, unsigned *, gfp_t); +extern int cachefiles_prepare_read(struct fscache_retrieval *op, pgoff_t index); extern int cachefiles_allocate_page(struct fscache_retrieval *, struct page *, gfp_t); extern int cachefiles_allocate_pages(struct fscache_retrieval *, diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 7cfbbeee9e87..0e1992bedf71 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -9,6 +9,8 @@ #include <linux/slab.h> #include <linux/file.h> #include <linux/swap.h> +#include <linux/backing-dev.h> +#include <linux/uio.h> #include "internal.h"
/* @@ -793,6 +795,166 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, return -ENOBUFS; }
+static int cachefiles_ondemand_check(struct cachefiles_object *object, + loff_t start_pos, size_t len) +{ + struct file *file = object->file; + size_t remained; + loff_t pos; + int ret; + + /* make sure there's no hole in the requested range */ + pos = start_pos; + remained = len; + + while (remained) { + bool again = true; + size_t count = remained; + loff_t off, off2, new_pos; +retry: + off = vfs_llseek(file, pos, SEEK_DATA); + if (off < 0) { + if (off == (loff_t)-ENXIO) + goto ondemand_read; + return -ENODATA; + } + + if (off >= pos + remained) + goto ondemand_read; + + if (off > pos) { + count = off - pos; + goto ondemand_read; + } + + off2 = vfs_llseek(file, pos, SEEK_HOLE); + if (off2 < 0) + return -ENODATA; + + new_pos = min_t(loff_t, off2, pos + remained); + remained -= new_pos - pos; + pos = new_pos; + continue; +ondemand_read: + if (again) { + ret = cachefiles_ondemand_read(object, pos, count); + if (!ret) { + /* recheck if the hole has been filled or not */ + again = false; + goto retry; + } + } + return -ENODATA; + } + return 0; +} + +struct cachefiles_kiocb { + struct kiocb iocb; + struct fscache_retrieval *op; + struct iov_iter iter; + struct work_struct work; + struct bio_vec bvs[]; +}; + +void cachefiles_readpages_work_func(struct work_struct *work) +{ + struct cachefiles_kiocb *ki = container_of(work, struct cachefiles_kiocb, work); + int ret; + + ret = vfs_iocb_iter_read(ki->iocb.ki_filp, &ki->iocb, &ki->iter); + /* complete the request if there's any progress or error occurred */ + if (ret != -EIOCBQUEUED) { + struct fscache_retrieval *op = ki->op; + unsigned int nr_pages = atomic_read(&op->n_pages); + unsigned int done_pages = 0; + int i, error; + + if (ret > 0) + done_pages = ret / PAGE_SIZE; + + for (i = 0; i < nr_pages; i++) { + error = i < done_pages ? 0 : -EIO; + fscache_end_io(op, ki->bvs[i].bv_page, error); + } + + fscache_retrieval_complete(op, nr_pages); + fscache_put_retrieval(op); + kfree(ki); + } +} + +int cachefiles_prepare_read(struct fscache_retrieval *op, pgoff_t index) +{ + struct cachefiles_object *object; + struct cachefiles_kiocb *ki; + loff_t start_pos = op->offset; + unsigned int n, nr_pages = atomic_read(&op->n_pages); + size_t len = nr_pages << PAGE_SHIFT; + struct page **pages; + size_t size; + int i, ret; + + object = container_of(op->op.object, struct cachefiles_object, fscache); + if (!object->backer) + goto all_enobufs; + + /* + * 1. Check if there's hole in the requested range, and trigger an + * on-demand read request if there's any. + */ + ASSERT(start_pos % PAGE_SIZE == 0); + ret = cachefiles_ondemand_check(object, start_pos, len); + if (ret) + goto all_enobufs; + + /* + * 2. Trigger readahead on the backing file in advance. Since + * FMODE_RANDOM, the following page_cache_sync_readahead() will fallback + * to force_page_cache_readahead(). 
+ */ + page_cache_sync_readahead(d_inode(object->backer)->i_mapping, + &object->file->f_ra, object->file, + start_pos / PAGE_SIZE, nr_pages); + + size = sizeof(struct cachefiles_kiocb) + nr_pages * sizeof(struct bio_vec); + ki = kzalloc(size, GFP_KERNEL); + if (!ki) + goto all_enobufs; + + /* reuse the tailing part of ki as pages[] */ + pages = (void *)ki + size - nr_pages * sizeof(struct page *); + n = find_get_pages_contig(op->mapping, index, nr_pages, pages); + if (WARN_ON(n != nr_pages)) { + for (i = 0; i < n; i++) + put_page(pages[i]); + kfree(ki); + goto all_enobufs; + } + + for (i = 0; i < n; i++) { + put_page(pages[i]); + ki->bvs[i].bv_page = pages[i]; + ki->bvs[i].bv_offset = 0; + ki->bvs[i].bv_len = PAGE_SIZE; + } + iov_iter_bvec(&ki->iter, READ, ki->bvs, n, n * PAGE_SIZE); + + ki->iocb.ki_filp = object->file; + ki->iocb.ki_pos = start_pos; + ki->iocb.ki_ioprio = get_current_ioprio(); + ki->op = fscache_get_retrieval(op); + + /* 3. Start a buffer read in worker context */ + INIT_WORK(&ki->work, cachefiles_readpages_work_func); + queue_work(system_unbound_wq, &ki->work); + return 0; + +all_enobufs: + fscache_retrieval_complete(op, nr_pages); + return -ENOBUFS; +} + /* * allocate a block in the cache in which to store a page * - cache withdrawal is prevented by the caller diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 888ace2cc6e1..39a05a43284d 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -666,6 +666,72 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie, } EXPORT_SYMBOL(__fscache_read_or_alloc_pages);
+int __fscache_prepare_read(struct fscache_cookie *cookie, + struct address_space *mapping, pgoff_t index, + unsigned int nr_pages, loff_t start_pos, + fscache_rw_complete_t term_func, void *context) +{ + struct fscache_retrieval *op; + struct fscache_object *object; + bool wake_cookie = false; + int ret; + + if (hlist_empty(&cookie->backing_objects)) + return -ENOBUFS; + + if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) { + _leave(" = -ENOBUFS [invalidating]"); + return -ENOBUFS; + } + + ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); + + if (fscache_wait_for_deferred_lookup(cookie) < 0) + return -ERESTARTSYS; + + op = fscache_alloc_retrieval(cookie, mapping, term_func, context); + if (!op) + return -ENOMEM; + atomic_set(&op->n_pages, nr_pages); + op->offset = start_pos; + + spin_lock(&cookie->lock); + + if (!fscache_cookie_enabled(cookie) || + hlist_empty(&cookie->backing_objects)) + goto nobufs_unlock; + + object = hlist_entry(cookie->backing_objects.first, + struct fscache_object, cookie_link); + + __fscache_use_cookie(cookie); + if (fscache_submit_op(object, &op->op) < 0) + goto nobufs_unlock_dec; + spin_unlock(&cookie->lock); + + ret = fscache_wait_for_operation_activation( + object, &op->op, + __fscache_stat(&fscache_n_retrieval_op_waits), + __fscache_stat(&fscache_n_retrievals_object_dead)); + if (ret < 0) + goto out; + + ret = object->cache->ops->prepare_read(op, index); +out: + fscache_put_retrieval(op); + return ret; + +nobufs_unlock_dec: + wake_cookie = __fscache_unuse_cookie(cookie); +nobufs_unlock: + spin_unlock(&cookie->lock); + fscache_put_retrieval(op); + if (wake_cookie) + __fscache_wake_unused_cookie(cookie); + return -ENOBUFS; +} +EXPORT_SYMBOL(__fscache_prepare_read); + /* * allocate a block in the cache on which to store a page * - we return: diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 71ee23f78f1d..31f2f13e2924 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -161,6 +161,9 @@ typedef int (*fscache_pages_retrieval_func_t)(struct fscache_retrieval *op, unsigned *nr_pages, gfp_t gfp);
+typedef int (*fscache_prepare_read_func_t)(struct fscache_retrieval *op, + pgoff_t index); + /** * fscache_get_retrieval - Get an extra reference on a retrieval operation * @op: The retrieval operation to get a reference on @@ -285,6 +288,8 @@ struct fscache_cache_ops { * the cache */ fscache_pages_retrieval_func_t read_or_alloc_pages;
+ fscache_prepare_read_func_t prepare_read; + /* request a backing block for a page be allocated in the cache so that * it can be written directly */ fscache_page_retrieval_func_t allocate_page; diff --git a/include/linux/fscache.h b/include/linux/fscache.h index ce51b915ad43..f262446f3a49 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -212,6 +212,13 @@ extern int __fscache_read_or_alloc_pages(struct fscache_cookie *, fscache_rw_complete_t, void *, gfp_t); +extern int __fscache_prepare_read(struct fscache_cookie *cookie, + struct address_space *mapping, + pgoff_t index, + unsigned int nr_pages, + loff_t start_pos, + fscache_rw_complete_t term_func, + void *context); extern int __fscache_alloc_page(struct fscache_cookie *, struct page *, gfp_t); extern int __fscache_write_page(struct fscache_cookie *, struct page *, loff_t, gfp_t); extern void __fscache_uncache_page(struct fscache_cookie *, struct page *); @@ -616,6 +623,19 @@ int fscache_read_or_alloc_pages(struct fscache_cookie *cookie, return -ENOBUFS; }
+static inline +int fscache_prepare_read(struct fscache_cookie *cookie, + struct address_space *mapping, pgoff_t index, + unsigned int nr_pages, loff_t start_pos, + fscache_rw_complete_t term_func, void *context) +{ + if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie)) + return __fscache_prepare_read(cookie, mapping, index, + nr_pages, start_pos, term_func, context); + else + return -ENOBUFS; +} + /** * fscache_alloc_page - Allocate a block in which to store a page * @cookie: The cookie representing the cache object
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/340957ba03ef
--------------------------------
ANBZ: #2056
Implement the readahead routine with fscache_prepare_read().
Besides, register an individual bdi for each erofs instance to enable readahead in on-demand mode (see the sketch below).
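The bdi matters because the readahead window is sized from it: a nodev superblock without its own bdi has no ra_pages budget, so readahead would effectively be disabled (this rationale is an inference from the generic readahead code, not stated in the original commit message). The registration itself is a single call in erofs_fc_fill_super():

err = super_setup_bdi(sb);	/* private bdi => non-zero ra_pages for f_ra */
if (err)
	return err;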
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/692 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++ fs/erofs/super.c | 4 +++ 2 files changed, 85 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 8eff2967cbdd..95b726357bc3 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -38,6 +38,13 @@ static void erofs_readpage_from_fscache_complete(struct page *page, void *ctx, unlock_page(page); }
+static void erofs_readahead_from_fscache_complete(struct page *page, void *ctx, + int error) +{ + erofs_readpage_from_fscache_complete(page, ctx, error); + put_page(page); +} + static int erofs_fscache_meta_readpage(struct file *data, struct page *page) { int ret; @@ -175,6 +182,79 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) return ret; }
+static void erofs_fscache_readahead(struct readahead_control *rac) +{ + struct inode *inode = rac->mapping->host; + struct super_block *sb = inode->i_sb; + struct page *page; + size_t len, count, done = 0; + erofs_off_t pos; + loff_t start, start_pos; + int ret; + + if (!readahead_count(rac)) + return; + + start = readahead_pos(rac); + len = readahead_length(rac); + + do { + struct erofs_map_blocks map; + struct erofs_map_dev mdev; + + pos = start + done; + + map.m_la = pos; + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + if (ret) + return; + + if (!(map.m_flags & EROFS_MAP_MAPPED)) { + page = readahead_page(rac); + zero_user_segment(page, 0, PAGE_SIZE); + SetPageUptodate(page); + unlock_page(page); + put_page(page); + done += PAGE_SIZE; + continue; + } + + if (map.m_flags & EROFS_MAP_META) { + page = readahead_page(rac); + ret = erofs_fscache_readpage_inline(page, &map); + unlock_page(page); + put_page(page); + done += PAGE_SIZE; + continue; + } + + mdev = (struct erofs_map_dev) { + .m_deviceid = map.m_deviceid, + .m_pa = map.m_pa, + }; + + ret = erofs_map_dev(sb, &mdev); + if (ret) + return; + + start_pos = mdev.m_pa + (pos - map.m_la); + count = min_t(size_t, map.m_llen - (pos - map.m_la), len - done); + ret = fscache_prepare_read(mdev.m_fscache->cookie, rac->mapping, + pos / PAGE_SIZE, count / PAGE_SIZE, start_pos, + erofs_readahead_from_fscache_complete, NULL); + if (ret) { + erofs_err(sb, "%s: prepare_read %d", __func__, ret); + return; + } + + done += count; + while (count) { + page = readahead_page(rac); + count -= PAGE_SIZE; + } + } while (done < len); +} + static const struct address_space_operations erofs_fscache_meta_aops = { .readpage = erofs_fscache_meta_readpage, .releasepage = erofs_fscache_release_page, @@ -183,6 +263,7 @@ static const struct address_space_operations erofs_fscache_meta_aops = {
const struct address_space_operations erofs_fscache_access_aops = { .readpage = erofs_fscache_readpage, + .readahead = erofs_fscache_readahead, .releasepage = erofs_fscache_release_page, .invalidatepage = erofs_fscache_invalidate_page, }; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 74c9653e6840..31ee5e56ac0e 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -506,6 +506,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sbi->opt.fsid, true); if (err) return err; + + err = super_setup_bdi(sb); + if (err) + return err; } else { if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) { erofs_err(sb, "failed to set erofs blksize");
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/7b8f11795342
--------------------------------
ANBZ: #3234
commit e1de2da0b7ac2dc0120c2ba8c7044788611933ea upstream.
Some cleanups. No logic changes.
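The visible change is that erofs_fscache_register_cookie() now returns the context instead of filling an output parameter, so callers switch to the usual ERR_PTR convention (taken from the diff below):

struct erofs_fscache *fscache;

fscache = erofs_fscache_register_cookie(sb, sbi->opt.fsid, true);
if (IS_ERR(fscache))
	return PTR_ERR(fscache);
sbi->s_fscache = fscache;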
Suggested-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918043456.147-3-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com [jingbo: still use erofs_fscache_unregister_cookie in erofs_put_super] Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 39 +++++++++++++++++++-------------------- fs/erofs/internal.h | 19 +++++++++---------- fs/erofs/super.c | 21 +++++++++------------ 3 files changed, 37 insertions(+), 42 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 95b726357bc3..ade0142d035b 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -268,9 +268,8 @@ const struct address_space_operations erofs_fscache_access_aops = { .invalidatepage = erofs_fscache_invalidate_page, };
-int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, - char *name, bool need_inode) +struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, + char *name, bool need_inode) { struct erofs_fscache *ctx; struct fscache_cookie *cookie; @@ -278,7 +277,7 @@ int erofs_fscache_register_cookie(struct super_block *sb,
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) - return -ENOMEM; + return ERR_PTR(-ENOMEM);
cookie = fscache_acquire_cookie(EROFS_SB(sb)->volume, &erofs_fscache_inode_object_def, @@ -310,42 +309,33 @@ int erofs_fscache_register_cookie(struct super_block *sb, ctx->inode = inode; }
- *fscache = ctx; - return 0; + return ctx;
err_cookie: // fscache_unuse_cookie(ctx->cookie, NULL, NULL); fscache_relinquish_cookie(ctx->cookie, NULL, false); - ctx->cookie = NULL; err: kfree(ctx); - return ret; + return ERR_PTR(ret); }
-void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx) { - struct erofs_fscache *ctx = *fscache; - if (!ctx) return;
//fscache_unuse_cookie(ctx->cookie, NULL, NULL); fscache_relinquish_cookie(ctx->cookie, NULL, false); - ctx->cookie = NULL; - iput(ctx->inode); - ctx->inode = NULL; - kfree(ctx); - *fscache = NULL; }
int erofs_fscache_register_fs(struct super_block *sb) { struct erofs_sb_info *sbi = EROFS_SB(sb); struct fscache_cookie *volume; + struct erofs_fscache *fscache; char *name; - int ret = 0;
name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid); if (!name) @@ -356,18 +346,27 @@ int erofs_fscache_register_fs(struct super_block *sb) NULL, 0, NULL, 0, true); if (IS_ERR_OR_NULL(volume)) { erofs_err(sb, "failed to register volume for %s", name); - ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP; - volume = NULL; + kfree(name); + return volume ? PTR_ERR(volume) : -EOPNOTSUPP; } sbi->volume = volume; kfree(name); - return ret; + + fscache = erofs_fscache_register_cookie(sb, sbi->opt.fsid, true); + /* acquired volume will be relinquished in kill_sb() */ + if (IS_ERR(fscache)) + return PTR_ERR(fscache); + + sbi->s_fscache = fscache; + return 0; }
void erofs_fscache_unregister_fs(struct super_block *sb) { struct erofs_sb_info *sbi = EROFS_SB(sb);
+ erofs_fscache_unregister_cookie(sbi->s_fscache); fscache_relinquish_cookie(sbi->volume, NULL, false); + sbi->s_fscache = NULL; sbi->volume = NULL; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index b9ba4627fdf3..d80da5460197 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -487,10 +487,9 @@ void erofs_fscache_unregister(void); int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb);
-int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, - char *name, bool need_inode); -void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache); +struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, + char *name, bool need_inode); +void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache); extern const struct address_space_operations erofs_fscache_access_aops; #else static inline int erofs_fscache_register(void) @@ -500,18 +499,18 @@ static inline int erofs_fscache_register(void) static inline void erofs_fscache_unregister(void) {} static inline int erofs_fscache_register_fs(struct super_block *sb) { - return 0; + return -EOPNOTSUPP; } static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
-static inline int erofs_fscache_register_cookie(struct super_block *sb, - struct erofs_fscache **fscache, - char *name, bool need_inode) +static inline +struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, + char *name, bool need_inode) { - return -EOPNOTSUPP; + return ERR_PTR(-EOPNOTSUPP); }
-static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache) +static inline void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache) { } #endif diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 31ee5e56ac0e..0149e8cb44e6 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -127,10 +127,10 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb, struct erofs_device_info *dif, erofs_off_t *pos) { struct erofs_sb_info *sbi = EROFS_SB(sb); + struct erofs_fscache *fscache; struct erofs_deviceslot *dis; struct block_device *bdev; void *ptr; - int ret;
ptr = erofs_read_metabuf(buf, sb, erofs_blknr(*pos), EROFS_KMAP); if (IS_ERR(ptr)) @@ -148,10 +148,10 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb, }
if (erofs_is_fscache_mode(sb)) { - ret = erofs_fscache_register_cookie(sb, &dif->fscache, - dif->path, false); - if (ret) - return ret; + fscache = erofs_fscache_register_cookie(sb, dif->path, false); + if (IS_ERR(fscache)) + return PTR_ERR(fscache); + dif->fscache = fscache; } else { bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL, sb->s_type); @@ -502,10 +502,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) err = erofs_fscache_register_fs(sb); if (err) return err; - err = erofs_fscache_register_cookie(sb, &sbi->s_fscache, - sbi->opt.fsid, true); - if (err) - return err;
err = super_setup_bdi(sb); if (err) @@ -594,7 +590,8 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
if (dif->bdev) blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL); - erofs_fscache_unregister_cookie(&dif->fscache); + erofs_fscache_unregister_cookie(dif->fscache); + dif->fscache = NULL; kfree(dif->path); kfree(dif); return 0; @@ -664,7 +661,6 @@ static void erofs_kill_sb(struct super_block *sb) if (!sbi) return; erofs_free_dev_context(sbi->devs); - erofs_fscache_unregister_cookie(&sbi->s_fscache); erofs_fscache_unregister_fs(sb); kfree(sbi->opt.fsid); kfree(sbi); @@ -683,7 +679,8 @@ static void erofs_put_super(struct super_block *sb) iput(sbi->managed_cache); sbi->managed_cache = NULL; #endif - erofs_fscache_unregister_cookie(&sbi->s_fscache); + erofs_fscache_unregister_cookie(sbi->s_fscache); + sbi->s_fscache = NULL; }
static struct file_system_type erofs_fs_type = {
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/51df01a7d001
--------------------------------
ANBZ: #3234
commit 8b7adf1dff3d5baf687acda936f193f80b7e0179 upstream.
A new fscache-based shared domain mode is going to be introduced for erofs. In this mode, identical data blobs in the same domain will be shared and reused to reduce on-disk space usage.
The implementation of sharing blobs will be introduced in subsequent patches.
Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918043456.147-4-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 132 ++++++++++++++++++++++++++++++++++++++------ fs/erofs/internal.h | 9 +++ 2 files changed, 123 insertions(+), 18 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index ade0142d035b..e6e68436a493 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -1,9 +1,13 @@ /* * Copyright (C) 2022, Alibaba Cloud + * Copyright (C) 2022, Bytedance Inc. All rights reserved. */ #include <linux/fscache.h> #include "internal.h"
+static DEFINE_MUTEX(erofs_domain_list_lock); +static LIST_HEAD(erofs_domain_list); + struct fscache_netfs erofs_fscache_netfs = { .name = "erofs", .version = 0, @@ -268,6 +272,101 @@ const struct address_space_operations erofs_fscache_access_aops = { .invalidatepage = erofs_fscache_invalidate_page, };
+static void erofs_fscache_domain_put(struct erofs_domain *domain) +{ + if (!domain) + return; + mutex_lock(&erofs_domain_list_lock); + if (refcount_dec_and_test(&domain->ref)) { + list_del(&domain->list); + mutex_unlock(&erofs_domain_list_lock); + fscache_relinquish_cookie(domain->volume, NULL, false); + kfree(domain->domain_id); + kfree(domain); + return; + } + mutex_unlock(&erofs_domain_list_lock); +} + +static int erofs_fscache_register_volume(struct super_block *sb) +{ + struct erofs_sb_info *sbi = EROFS_SB(sb); + char *domain_id = sbi->opt.domain_id; + struct fscache_cookie *volume; + char *name; + int ret = 0; + + name = kasprintf(GFP_KERNEL, "erofs,%s", + domain_id ? domain_id : sbi->opt.fsid); + if (!name) + return -ENOMEM; + + volume = fscache_acquire_cookie(erofs_fscache_netfs.primary_index, + &erofs_fscache_super_index_def, name, strlen(name), + NULL, 0, NULL, 0, true); + if (IS_ERR_OR_NULL(volume)) { + erofs_err(sb, "failed to register volume for %s", name); + ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP; + volume = NULL; + } + + sbi->volume = volume; + kfree(name); + return ret; +} + +static int erofs_fscache_init_domain(struct super_block *sb) +{ + int err; + struct erofs_domain *domain; + struct erofs_sb_info *sbi = EROFS_SB(sb); + + domain = kzalloc(sizeof(struct erofs_domain), GFP_KERNEL); + if (!domain) + return -ENOMEM; + + domain->domain_id = kstrdup(sbi->opt.domain_id, GFP_KERNEL); + if (!domain->domain_id) { + kfree(domain); + return -ENOMEM; + } + + err = erofs_fscache_register_volume(sb); + if (err) + goto out; + + domain->volume = sbi->volume; + refcount_set(&domain->ref, 1); + list_add(&domain->list, &erofs_domain_list); + sbi->domain = domain; + return 0; +out: + kfree(domain->domain_id); + kfree(domain); + return err; +} + +static int erofs_fscache_register_domain(struct super_block *sb) +{ + int err; + struct erofs_domain *domain; + struct erofs_sb_info *sbi = EROFS_SB(sb); + + mutex_lock(&erofs_domain_list_lock); + list_for_each_entry(domain, &erofs_domain_list, list) { + if (!strcmp(domain->domain_id, sbi->opt.domain_id)) { + sbi->domain = domain; + sbi->volume = domain->volume; + refcount_inc(&domain->ref); + mutex_unlock(&erofs_domain_list_lock); + return 0; + } + } + err = erofs_fscache_init_domain(sb); + mutex_unlock(&erofs_domain_list_lock); + return err; +} + struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, char *name, bool need_inode) { @@ -332,28 +431,19 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx)
int erofs_fscache_register_fs(struct super_block *sb) { + int ret; struct erofs_sb_info *sbi = EROFS_SB(sb); - struct fscache_cookie *volume; struct erofs_fscache *fscache; - char *name; - - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid); - if (!name) - return -ENOMEM;
- volume = fscache_acquire_cookie(erofs_fscache_netfs.primary_index, - &erofs_fscache_super_index_def, name, strlen(name), - NULL, 0, NULL, 0, true); - if (IS_ERR_OR_NULL(volume)) { - erofs_err(sb, "failed to register volume for %s", name); - kfree(name); - return volume ? PTR_ERR(volume) : -EOPNOTSUPP; - } - sbi->volume = volume; - kfree(name); + if (sbi->opt.domain_id) + ret = erofs_fscache_register_domain(sb); + else + ret = erofs_fscache_register_volume(sb); + if (ret) + return ret;
+ /* acquired domain/volume will be relinquished in kill_sb() on error */ fscache = erofs_fscache_register_cookie(sb, sbi->opt.fsid, true); - /* acquired volume will be relinquished in kill_sb() */ if (IS_ERR(fscache)) return PTR_ERR(fscache);
@@ -366,7 +456,13 @@ void erofs_fscache_unregister_fs(struct super_block *sb) struct erofs_sb_info *sbi = EROFS_SB(sb);
erofs_fscache_unregister_cookie(sbi->s_fscache); - fscache_relinquish_cookie(sbi->volume, NULL, false); + + if (sbi->domain) + erofs_fscache_domain_put(sbi->domain); + else + fscache_relinquish_cookie(sbi->volume, NULL, false); + sbi->s_fscache = NULL; sbi->volume = NULL; + sbi->domain = NULL; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index d80da5460197..339b21fffe50 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -66,6 +66,7 @@ struct erofs_mount_opts { #endif unsigned int mount_opt; char *fsid; + char *domain_id; };
struct erofs_dev_context { @@ -80,6 +81,13 @@ struct erofs_fs_context { struct erofs_dev_context *devs; };
+struct erofs_domain { + refcount_t ref; + struct list_head list; + struct fscache_cookie *volume; + char *domain_id; +}; + struct erofs_fscache { struct fscache_cookie *cookie; struct inode *inode; @@ -129,6 +137,7 @@ struct erofs_sb_info { /* fscache support */ struct fscache_cookie *volume; struct erofs_fscache *s_fscache; + struct erofs_domain *domain; };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/f232efaaee43
--------------------------------
ANBZ: #3234
commit a9849560c55e9e4ab9c53d073363dd6e19ec06ef upstream.
Use a pseudo mnt to manage shared cookies.
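Condensed from the diff below: the first domain lazily mounts the pseudo instance, and the last domain to disappear unmounts it again.

/* on first domain creation (under erofs_domain_list_lock) */
if (!erofs_pseudo_mnt) {
	erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
	if (IS_ERR(erofs_pseudo_mnt)) {
		err = PTR_ERR(erofs_pseudo_mnt);
		goto out;
	}
}

/* on last domain teardown */
kern_unmount(erofs_pseudo_mnt);
erofs_pseudo_mnt = NULL;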
Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918043456.147-5-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 13 +++++++++++++ fs/erofs/internal.h | 1 + fs/erofs/super.c | 33 +++++++++++++++++++++++++++++++-- 3 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index e6e68436a493..f99e2fd3a146 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -7,6 +7,7 @@
static DEFINE_MUTEX(erofs_domain_list_lock); static LIST_HEAD(erofs_domain_list); +static struct vfsmount *erofs_pseudo_mnt;
struct fscache_netfs erofs_fscache_netfs = { .name = "erofs", @@ -279,6 +280,10 @@ static void erofs_fscache_domain_put(struct erofs_domain *domain) mutex_lock(&erofs_domain_list_lock); if (refcount_dec_and_test(&domain->ref)) { list_del(&domain->list); + if (list_empty(&erofs_domain_list)) { + kern_unmount(erofs_pseudo_mnt); + erofs_pseudo_mnt = NULL; + } mutex_unlock(&erofs_domain_list_lock); fscache_relinquish_cookie(domain->volume, NULL, false); kfree(domain->domain_id); @@ -335,6 +340,14 @@ static int erofs_fscache_init_domain(struct super_block *sb) if (err) goto out;
+ if (!erofs_pseudo_mnt) { + erofs_pseudo_mnt = kern_mount(&erofs_fs_type); + if (IS_ERR(erofs_pseudo_mnt)) { + err = PTR_ERR(erofs_pseudo_mnt); + goto out; + } + } + domain->volume = sbi->volume; refcount_set(&domain->ref, 1); list_add(&domain->list, &erofs_domain_list); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 339b21fffe50..7ec852b7d59d 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -329,6 +329,7 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value) }
extern const struct super_operations erofs_sops; +extern struct file_system_type erofs_fs_type;
extern const struct address_space_operations erofs_raw_access_aops; extern const struct address_space_operations z_erofs_aops; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 0149e8cb44e6..7b24b6952d30 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -473,6 +473,13 @@ static int erofs_init_managed_cache(struct super_block *sb) static int erofs_init_managed_cache(struct super_block *sb) { return 0; } #endif
+static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc) +{ + static const struct tree_descr empty_descr = {""}; + + return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr); +} + static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) { struct inode *inode; @@ -555,6 +562,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) return 0; }
+static int erofs_fc_anon_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, erofs_fc_fill_pseudo_super); +} + static int erofs_fc_get_tree(struct fs_context *fc) { struct erofs_fs_context *ctx = fc->fs_private; @@ -622,10 +634,21 @@ static const struct fs_context_operations erofs_context_ops = { .free = erofs_fc_free, };
+static const struct fs_context_operations erofs_anon_context_ops = { + .get_tree = erofs_fc_anon_get_tree, +}; + static int erofs_init_fs_context(struct fs_context *fc) { - struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + struct erofs_fs_context *ctx; + + /* pseudo mount for anon inodes */ + if (fc->sb_flags & SB_KERNMOUNT) { + fc->ops = &erofs_anon_context_ops; + return 0; + }
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL); @@ -652,6 +675,12 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
+ /* pseudo mount for anon inodes */ + if (sb->s_flags & SB_KERNMOUNT) { + kill_anon_super(sb); + return; + } + if (erofs_is_fscache_mode(sb)) kill_anon_super(sb); else @@ -683,7 +712,7 @@ static void erofs_put_super(struct super_block *sb) sbi->s_fscache = NULL; }
-static struct file_system_type erofs_fs_type = { +struct file_system_type erofs_fs_type = { .owner = THIS_MODULE, .name = "erofs", .init_fs_context = erofs_init_fs_context,
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/8f55d7266411
--------------------------------
ANBZ: #3234
commit 7d41963759feb3cfa4c1164b8b9db5d1f055932d upstream.
Several erofs filesystems can belong to one domain, and data blobs can be shared among the erofs filesystems of the same domain.
Users can specify the domain_id mount option to create or join a domain.
Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918110150.6338-1-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 100 +++++++++++++++++++++++++++++++++++++++++--- fs/erofs/internal.h | 3 ++ 2 files changed, 97 insertions(+), 6 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index f99e2fd3a146..217290ab8a8c 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -3,9 +3,11 @@ * Copyright (C) 2022, Bytedance Inc. All rights reserved. */ #include <linux/fscache.h> +#include <linux/mount.h> #include "internal.h"
static DEFINE_MUTEX(erofs_domain_list_lock); +static DEFINE_MUTEX(erofs_domain_cookies_lock); static LIST_HEAD(erofs_domain_list); static struct vfsmount *erofs_pseudo_mnt;
@@ -380,8 +382,9 @@ static int erofs_fscache_register_domain(struct super_block *sb) return err; }
-struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, - char *name, bool need_inode) +static +struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, + char *name, bool need_inode) { struct erofs_fscache *ctx; struct fscache_cookie *cookie; @@ -431,17 +434,102 @@ struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, return ERR_PTR(ret); }
-void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx) +static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx) { - if (!ctx) - return; - //fscache_unuse_cookie(ctx->cookie, NULL, NULL); fscache_relinquish_cookie(ctx->cookie, NULL, false); iput(ctx->inode); + kfree(ctx->name); kfree(ctx); }
+static +struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb, + char *name, bool need_inode) +{ + int err; + struct inode *inode; + struct erofs_fscache *ctx; + struct erofs_domain *domain = EROFS_SB(sb)->domain; + + ctx = erofs_fscache_acquire_cookie(sb, name, need_inode); + if (IS_ERR(ctx)) + return ctx; + + ctx->name = kstrdup(name, GFP_KERNEL); + if (!ctx->name) { + err = -ENOMEM; + goto out; + } + + inode = new_inode(erofs_pseudo_mnt->mnt_sb); + if (!inode) { + err = -ENOMEM; + goto out; + } + + ctx->domain = domain; + ctx->anon_inode = inode; + inode->i_private = ctx; + refcount_inc(&domain->ref); + return ctx; +out: + erofs_fscache_relinquish_cookie(ctx); + return ERR_PTR(err); +} + +static +struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, + char *name, bool need_inode) +{ + struct inode *inode; + struct erofs_fscache *ctx; + struct erofs_domain *domain = EROFS_SB(sb)->domain; + struct super_block *psb = erofs_pseudo_mnt->mnt_sb; + + mutex_lock(&erofs_domain_cookies_lock); + list_for_each_entry(inode, &psb->s_inodes, i_sb_list) { + ctx = inode->i_private; + if (!ctx || ctx->domain != domain || strcmp(ctx->name, name)) + continue; + igrab(inode); + mutex_unlock(&erofs_domain_cookies_lock); + return ctx; + } + ctx = erofs_fscache_domain_init_cookie(sb, name, need_inode); + mutex_unlock(&erofs_domain_cookies_lock); + return ctx; +} + +struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, + char *name, bool need_inode) +{ + if (EROFS_SB(sb)->opt.domain_id) + return erofs_domain_register_cookie(sb, name, need_inode); + return erofs_fscache_acquire_cookie(sb, name, need_inode); +} + +void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx) +{ + bool drop; + struct erofs_domain *domain; + + if (!ctx) + return; + domain = ctx->domain; + if (domain) { + mutex_lock(&erofs_domain_cookies_lock); + drop = atomic_read(&ctx->anon_inode->i_count) == 1; + iput(ctx->anon_inode); + mutex_unlock(&erofs_domain_cookies_lock); + if (!drop) + return; + } + + erofs_fscache_relinquish_cookie(ctx); + erofs_fscache_domain_put(domain); +} + int erofs_fscache_register_fs(struct super_block *sb) { int ret; diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 7ec852b7d59d..82a47b6ff9ca 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -91,6 +91,9 @@ struct erofs_domain { struct erofs_fscache { struct fscache_cookie *cookie; struct inode *inode; + struct inode *anon_inode; + struct erofs_domain *domain; + char *name; };
struct erofs_sb_info {
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d699a85b7a18
--------------------------------
ANBZ: #3234
commit 2ef164414123fcf574aff7a0be5f71f7e60a3fec upstream.
Introduce the 'domain_id' mount option to enable shared domain semantics: the related cookie is shared if two mountpoints in the same domain have the same data blob. Users can specify the name of the domain with this mount option.
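For example, two fscache-mode mounts joining one domain might look like this (hypothetical fsid/domain names; no block device is passed in fscache mode):

mount -t erofs none -o fsid=imgA,domain_id=mydom /mnt/A
mount -t erofs none -o fsid=imgB,domain_id=mydom /mnt/B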
Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/20220918043456.147-7-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893
Conflicts: fs/erofs/super.c
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/super.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 7b24b6952d30..14902817d4e9 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -318,6 +318,7 @@ enum { Opt_cache_strategy, Opt_device, Opt_fsid, + Opt_domain_id, Opt_err };
@@ -335,6 +336,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = { erofs_param_cache_strategy), fsparam_string("device", Opt_device), fsparam_string("fsid", Opt_fsid), + fsparam_string("domain_id", Opt_domain_id), {} };
@@ -406,6 +408,16 @@ static int erofs_fc_parse_param(struct fs_context *fc, #else errorfc(fc, "fsid option not supported"); return -EINVAL; +#endif + break; + case Opt_domain_id: +#ifdef CONFIG_EROFS_FS_ONDEMAND + kfree(ctx->opt.domain_id); + ctx->opt.domain_id = kstrdup(param->string, GFP_KERNEL); + if (!ctx->opt.domain_id) + return -ENOMEM; +#else + errorfc(fc, "domain_id option not supported"); #endif break; default: @@ -499,6 +511,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_fs_info = sbi; sbi->opt = ctx->opt; ctx->opt.fsid = NULL; + ctx->opt.domain_id = NULL; sbi->devs = ctx->devs; ctx->devs = NULL;
@@ -624,6 +637,7 @@ static void erofs_fc_free(struct fs_context *fc)
erofs_free_dev_context(ctx->devs); kfree(ctx->opt.fsid); + kfree(ctx->opt.domain_id); kfree(ctx); }
@@ -692,6 +706,7 @@ static void erofs_kill_sb(struct super_block *sb) erofs_free_dev_context(sbi->devs); erofs_fscache_unregister_fs(sb); kfree(sbi->opt.fsid); + kfree(sbi->opt.domain_id); kfree(sbi); sb->s_fs_info = NULL; } @@ -830,6 +845,8 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) #ifdef CONFIG_EROFS_FS_ONDEMAND if (sbi->opt.fsid) seq_printf(seq, ",fsid=%s", sbi->opt.fsid); + if (opt->domain_id) + seq_printf(seq, ",domain_id=%s", opt->domain_id); #endif return 0; }
From: Dawei Li set_pte_at@outlook.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/496c8bbd9a1b
--------------------------------
ANBZ: #3234
commit ce4b815686573bef82d5ee53bf6f509bf20904dc upstream.
s_inodes is a superblock-specific resource, which should be protected by the superblock's dedicated lock, s_inode_list_lock.
Link: https://lore.kernel.org/r/TYCP286MB23238380DE3B74874E8D78ABCA299@TYCP286MB23... Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain") Reviewed-by: Yue Hu huyue2@coolpad.com Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Dawei Li set_pte_at@outlook.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893 Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 217290ab8a8c..a6302a06f1e4 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -488,14 +488,17 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
mutex_lock(&erofs_domain_cookies_lock); + spin_lock(&psb->s_inode_list_lock); list_for_each_entry(inode, &psb->s_inodes, i_sb_list) { ctx = inode->i_private; if (!ctx || ctx->domain != domain || strcmp(ctx->name, name)) continue; igrab(inode); + spin_unlock(&psb->s_inode_list_lock); mutex_unlock(&erofs_domain_cookies_lock); return ctx; } + spin_unlock(&psb->s_inode_list_lock); ctx = erofs_fscache_domain_init_cookie(sb, name, need_inode); mutex_unlock(&erofs_domain_cookies_lock); return ctx;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/3dd899f1ae08
--------------------------------
ANBZ: #3234
commit 39bfcb8138f6dc3375f23b1e62ccfc7c0d83295d upstream.
When an erofs instance is remounted with the fsid or domain_id mount option specified, the original fsid and domain_id string pointers in sbi->opt are directly overwritten with the fsid and domain_id strings in the new fs_context, without freeing the original strings. What's worse, when the new fsid and domain_id strings are transferred to sbi, they are not reset to NULL in the fs_context, and thus they are freed when the remount finishes while sbi is still referring to them.
Reconfiguring fsid and domain_id seems unusual, so clarify this restriction explicitly and print a warning when users attempt to do so.
Besides, to fix the use-after-free issue, move fsid and domain_id out of erofs_mount_opts and store them directly in erofs_fs_context and erofs_sb_info (see the snippet below).
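After the move, fill_super transfers ownership and clears the fs_context pointers, so erofs_fc_free() can no longer free strings that sbi still references (taken from the diff below):

sbi->fsid = ctx->fsid;
ctx->fsid = NULL;
sbi->domain_id = ctx->domain_id;
ctx->domain_id = NULL;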
Fixes: c6be2bd0a5dd ("erofs: register fscache volume") Fixes: 8b7adf1dff3d ("erofs: introduce fscache-based domain") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Chao Yu chao@kernel.org Link: https://lore.kernel.org/r/20221021023153.1330-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/893
Conflicts: fs/erofs/super.c
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 14 +++++++------- fs/erofs/internal.h | 6 ++++-- fs/erofs/super.c | 39 ++++++++++++++++++++++----------------- 3 files changed, 33 insertions(+), 26 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index a6302a06f1e4..5c7208418d44 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -298,13 +298,13 @@ static void erofs_fscache_domain_put(struct erofs_domain *domain) static int erofs_fscache_register_volume(struct super_block *sb) { struct erofs_sb_info *sbi = EROFS_SB(sb); - char *domain_id = sbi->opt.domain_id; + char *domain_id = sbi->domain_id; struct fscache_cookie *volume; char *name; int ret = 0;
name = kasprintf(GFP_KERNEL, "erofs,%s", - domain_id ? domain_id : sbi->opt.fsid); + domain_id ? domain_id : sbi->fsid); if (!name) return -ENOMEM;
@@ -332,7 +332,7 @@ static int erofs_fscache_init_domain(struct super_block *sb) if (!domain) return -ENOMEM;
- domain->domain_id = kstrdup(sbi->opt.domain_id, GFP_KERNEL); + domain->domain_id = kstrdup(sbi->domain_id, GFP_KERNEL); if (!domain->domain_id) { kfree(domain); return -ENOMEM; @@ -369,7 +369,7 @@ static int erofs_fscache_register_domain(struct super_block *sb)
mutex_lock(&erofs_domain_list_lock); list_for_each_entry(domain, &erofs_domain_list, list) { - if (!strcmp(domain->domain_id, sbi->opt.domain_id)) { + if (!strcmp(domain->domain_id, sbi->domain_id)) { sbi->domain = domain; sbi->volume = domain->volume; refcount_inc(&domain->ref); @@ -507,7 +507,7 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, char *name, bool need_inode) { - if (EROFS_SB(sb)->opt.domain_id) + if (EROFS_SB(sb)->domain_id) return erofs_domain_register_cookie(sb, name, need_inode); return erofs_fscache_acquire_cookie(sb, name, need_inode); } @@ -539,7 +539,7 @@ int erofs_fscache_register_fs(struct super_block *sb) struct erofs_sb_info *sbi = EROFS_SB(sb); struct erofs_fscache *fscache;
- if (sbi->opt.domain_id) + if (sbi->domain_id) ret = erofs_fscache_register_domain(sb); else ret = erofs_fscache_register_volume(sb); @@ -547,7 +547,7 @@ int erofs_fscache_register_fs(struct super_block *sb) return ret;
/* acquired domain/volume will be relinquished in kill_sb() on error */ - fscache = erofs_fscache_register_cookie(sb, sbi->opt.fsid, true); + fscache = erofs_fscache_register_cookie(sb, sbi->fsid, true); if (IS_ERR(fscache)) return PTR_ERR(fscache);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 82a47b6ff9ca..59de35fec51e 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -65,8 +65,6 @@ struct erofs_mount_opts { unsigned int max_sync_decompress_pages; #endif unsigned int mount_opt; - char *fsid; - char *domain_id; };
struct erofs_dev_context { @@ -79,6 +77,8 @@ struct erofs_dev_context { struct erofs_fs_context { struct erofs_mount_opts opt; struct erofs_dev_context *devs; + char *fsid; + char *domain_id; };
struct erofs_domain { @@ -141,6 +141,8 @@ struct erofs_sb_info { struct fscache_cookie *volume; struct erofs_fscache *s_fscache; struct erofs_domain *domain; + char *fsid; + char *domain_id; };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 14902817d4e9..606d4637a795 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -401,9 +401,9 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; case Opt_fsid: #ifdef CONFIG_EROFS_FS_ONDEMAND - kfree(ctx->opt.fsid); - ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL); - if (!ctx->opt.fsid) + kfree(ctx->fsid); + ctx->fsid = kstrdup(param->string, GFP_KERNEL); + if (!ctx->fsid) return -ENOMEM; #else errorfc(fc, "fsid option not supported"); @@ -412,9 +412,9 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; case Opt_domain_id: #ifdef CONFIG_EROFS_FS_ONDEMAND - kfree(ctx->opt.domain_id); - ctx->opt.domain_id = kstrdup(param->string, GFP_KERNEL); - if (!ctx->opt.domain_id) + kfree(ctx->domain_id); + ctx->domain_id = kstrdup(param->string, GFP_KERNEL); + if (!ctx->domain_id) return -ENOMEM; #else errorfc(fc, "domain_id option not supported"); @@ -510,10 +510,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi; sbi->opt = ctx->opt; - ctx->opt.fsid = NULL; - ctx->opt.domain_id = NULL; sbi->devs = ctx->devs; ctx->devs = NULL; + sbi->fsid = ctx->fsid; + ctx->fsid = NULL; + sbi->domain_id = ctx->domain_id; + ctx->domain_id = NULL;
if (erofs_is_fscache_mode(sb)) { sb->s_blocksize = EROFS_BLKSIZ; @@ -584,7 +586,7 @@ static int erofs_fc_get_tree(struct fs_context *fc) { struct erofs_fs_context *ctx = fc->fs_private;
- if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid) + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->fsid) return get_tree_nodev(fc, erofs_fc_fill_super);
return get_tree_bdev(fc, erofs_fc_fill_super); @@ -598,6 +600,9 @@ static int erofs_fc_reconfigure(struct fs_context *fc)
DBG_BUGON(!sb_rdonly(sb));
+ if (ctx->fsid || ctx->domain_id) + erofs_info(sb, "ignoring reconfiguration for fsid|domain_id."); + if (test_opt(&ctx->opt, POSIX_ACL)) fc->sb_flags |= SB_POSIXACL; else @@ -636,8 +641,8 @@ static void erofs_fc_free(struct fs_context *fc) struct erofs_fs_context *ctx = fc->fs_private;
erofs_free_dev_context(ctx->devs); - kfree(ctx->opt.fsid); - kfree(ctx->opt.domain_id); + kfree(ctx->fsid); + kfree(ctx->domain_id); kfree(ctx); }
@@ -705,8 +710,8 @@ static void erofs_kill_sb(struct super_block *sb) return; erofs_free_dev_context(sbi->devs); erofs_fscache_unregister_fs(sb); - kfree(sbi->opt.fsid); - kfree(sbi->opt.domain_id); + kfree(sbi->fsid); + kfree(sbi->domain_id); kfree(sbi); sb->s_fs_info = NULL; } @@ -843,10 +848,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) seq_puts(seq, ",cache_strategy=readaround"); #endif #ifdef CONFIG_EROFS_FS_ONDEMAND - if (sbi->opt.fsid) - seq_printf(seq, ",fsid=%s", sbi->opt.fsid); - if (opt->domain_id) - seq_printf(seq, ",domain_id=%s", opt->domain_id); + if (sbi->fsid) + seq_printf(seq, ",fsid=%s", sbi->fsid); + if (sbi->domain_id) + seq_printf(seq, ",domain_id=%s", sbi->domain_id); #endif return 0; }
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/4590e9194687
--------------------------------
ANBZ: #4627
commit 8b58f9f02162124c2149779af401c8115c70b649 upstream.
For erofs_map_blocks() and erofs_map_blocks_flatmode(), the flags argument is always EROFS_GET_BLOCKS_RAW. Thus remove the unused flags parameter from these two functions.

Besides, EROFS_GET_BLOCKS_RAW was originally introduced for reading compressed (raw) data of compressed files. However, it's never actually used, so let's remove it now.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Yue Hu huyue2@coolpad.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu chao@kernel.org Link: https://lore.kernel.org/r/20230209024825.17335-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/1561
Conflicts: fs/erofs/data.c fs/erofs/internal.h
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/data.c | 16 +++++++--------- fs/erofs/fscache.c | 4 ++-- fs/erofs/internal.h | 6 +----- include/trace/events/erofs.h | 3 +-- 4 files changed, 11 insertions(+), 18 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 89f76d7a3452..44fea7bf7bb8 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -96,8 +96,7 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, }
static int erofs_map_blocks_flatmode(struct inode *inode, - struct erofs_map_blocks *map, - int flags) + struct erofs_map_blocks *map) { erofs_blk_t nblocks, lastblk; u64 offset = map->m_la; @@ -139,8 +138,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode, return 0; }
-int erofs_map_blocks(struct inode *inode, - struct erofs_map_blocks *map, int flags) +int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) { struct super_block *sb = inode->i_sb; struct erofs_inode *vi = EROFS_I(inode); @@ -152,7 +150,7 @@ int erofs_map_blocks(struct inode *inode, void *kaddr; int err = 0;
- trace_erofs_map_blocks_enter(inode, map, flags); + trace_erofs_map_blocks_enter(inode, map, 0); map->m_deviceid = 0; if (map->m_la >= inode->i_size) { /* leave out-of-bound access unmapped */ @@ -162,7 +160,7 @@ int erofs_map_blocks(struct inode *inode, }
if (vi->datalayout != EROFS_INODE_CHUNK_BASED) { - err = erofs_map_blocks_flatmode(inode, map, flags); + err = erofs_map_blocks_flatmode(inode, map); goto out; }
@@ -214,7 +212,7 @@ int erofs_map_blocks(struct inode *inode, out: if (!err) map->m_llen = map->m_plen; - trace_erofs_map_blocks_exit(inode, map, flags, 0); + trace_erofs_map_blocks_exit(inode, map, 0, err); return err; }
@@ -297,7 +295,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, erofs_blk_t blknr; unsigned int blkoff;
- err = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + err = erofs_map_blocks(inode, &map); if (err) goto err_out;
@@ -467,7 +465,7 @@ static sector_t erofs_bmap(struct address_space *mapping, sector_t block) return 0; }
- if (!erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW)) + if (!erofs_map_blocks(inode, &map)) return erofs_blknr(map.m_pa);
return 0; diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 5c7208418d44..36bb49ecf6a2 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -142,7 +142,7 @@ static int erofs_fscache_readpage(struct file *file, struct page *page) int ret;
map.m_la = pos; - ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + ret = erofs_map_blocks(inode, &map); if (ret) goto out_unlock;
@@ -212,7 +212,7 @@ static void erofs_fscache_readahead(struct readahead_control *rac) pos = start + done;
map.m_la = pos; - ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW); + ret = erofs_map_blocks(inode, &map); if (ret) return;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 59de35fec51e..f7b01a4be183 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -388,9 +388,6 @@ struct erofs_map_blocks { unsigned int m_flags; };
-/* Flags used by erofs_map_blocks_flatmode() */ -#define EROFS_GET_BLOCKS_RAW 0x0001 - /* zmap.c */ #ifdef CONFIG_EROFS_FS_ZIP int z_erofs_fill_inode(struct inode *inode); @@ -421,8 +418,7 @@ void erofs_put_metabuf(struct erofs_buf *buf); void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, erofs_blk_t blkaddr, enum erofs_kmap_type type); int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev); -int erofs_map_blocks(struct inode *inode, - struct erofs_map_blocks *map, int flags); +int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map);
/* inode.c */ static inline unsigned long erofs_inode_hash(erofs_nid_t nid) diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h index 717ddd17acb1..f02427cb664c 100644 --- a/include/trace/events/erofs.h +++ b/include/trace/events/erofs.h @@ -18,8 +18,7 @@ struct erofs_map_blocks; { 0, "FILE" }, \ { 1, "DIR" })
-#define show_map_flags(flags) __print_flags(flags, "|", \ - { EROFS_GET_BLOCKS_RAW, "RAW" }) +#define show_map_flags(flags) __print_flags(flags, "|", {} )
#define show_mflags(flags) __print_flags(flags, "", \ { EROFS_MAP_MAPPED, "M" }, \
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/770910348d2b
--------------------------------
ANBZ: #4201
This optimization is inspired by [1].
When the on-demand routine is triggered, the user daemon fetches data from the remote and then writes the fetched data to the backing file with DIRECT IO to avoid double page cache. When the on-demand routine finishes, the process requesting the IO reads the data from the backing file (also with DIRECT IO). This mechanism adds read latency, since the two DIRECT IOs are serialized.
To optimize this, make the user daemon write the backing file with buffered IO. Make the process requesting the IO read data from the page cache of the backing file first if there is any, and otherwise fall back to DIRECT IO.
Note that this patch only converts the write routine to buffered IO, because the read routine prior to the FsCache/Cachefiles refactoring, i.e. cachefiles_read_backing_file_one() in the normal read path and cachefiles_prepare_read() in the readahead path, already works this way, trying a buffered read first.
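As a rough illustration of that read-side pattern (a minimal sketch with illustrative names; the real logic lives in the functions named above, and the "buffered" decision is the caller's):

/*
 * Minimal sketch of the read-side behaviour described above
 * (illustrative only; not the actual cachefiles code). A buffered
 * read serves data from the backing file's page cache when present;
 * callers retry with IOCB_DIRECT otherwise.
 */
static ssize_t backing_read_sketch(struct file *file, struct iov_iter *iter,
				   loff_t pos, bool buffered)
{
	struct kiocb iocb = {
		.ki_filp   = file,
		.ki_pos    = pos,
		.ki_ioprio = get_current_ioprio(),
	};

	if (!buffered)
		iocb.ki_flags |= IOCB_DIRECT;	/* bypass the page cache */

	return vfs_iocb_iter_read(file, &iocb, iter);
}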
Link: [1] https://github.com/dragonflyoss/image-service/blob/master/contrib/kernel-pat... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/1252 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 250b98e9820c..950dc5a71cb6 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -3,8 +3,12 @@ #include <linux/file.h> #include <linux/anon_inodes.h> #include <linux/uio.h> +#include <linux/module.h> #include "internal.h"
+static bool cachefiles_buffered_ondemand = true; +module_param_named(buffered_ondemand, cachefiles_buffered_ondemand, bool, 0644); + static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file) { @@ -52,13 +56,23 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, { struct cachefiles_object *object = kiocb->ki_filp->private_data; size_t len = iter->count; - loff_t pos = kiocb->ki_pos; + struct kiocb iocb; int ret;
if (!object->file) return -ENOBUFS;
- ret = vfs_iter_write(object->file, iter, &pos, 0); + iocb = (struct kiocb) { + .ki_filp = object->file, + .ki_pos = kiocb->ki_pos, + .ki_flags = IOCB_WRITE, + .ki_ioprio = get_current_ioprio(), + }; + + if (!cachefiles_buffered_ondemand) + iocb.ki_flags |= IOCB_DIRECT; + + ret = vfs_iocb_iter_write(object->file, &iocb, iter); if (ret != len) return -EIO; return len;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/299a5cb8249c
--------------------------------
ANBZ: #3213
... in preparation for the following failover feature for the on-demand mode of Cachefiles.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/main.c | 1 + include/linux/fscache-cache.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/fscache/main.c b/fs/fscache/main.c index 4207f98e405f..a9f059220418 100644 --- a/fs/fscache/main.c +++ b/fs/fscache/main.c @@ -39,6 +39,7 @@ MODULE_PARM_DESC(fscache_debug,
struct kobject *fscache_root; struct workqueue_struct *fscache_object_wq; +EXPORT_SYMBOL(fscache_object_wq); struct workqueue_struct *fscache_op_wq;
DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait); diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 31f2f13e2924..f3ae78d1e5f3 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -74,6 +74,7 @@ struct fscache_cache { };
extern wait_queue_head_t fscache_cache_cleared_wq; +extern struct workqueue_struct *fscache_object_wq;
/* * operation to be applied to a cache object
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/538bf7c68f81
--------------------------------
ANBZ: #3213
cherry-picked from [1]
Previously, the @ondemand_id field was used not only to identify the ondemand state of the object, but also to represent the index of the xarray. This commit introduces a @state field to decouple those roles of @ondemand_id and adds helpers to access it.
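For reference, each CACHEFILES_OBJECT_STATE_FUNCS(_state) invocation in the diff below generates an is/set helper pair; the expansion for "open" looks like this (shown for illustration only):

/* Expansion of CACHEFILES_OBJECT_STATE_FUNCS(open); generated by the
 * macro added in the diff below. */
static inline bool
cachefiles_ondemand_object_is_open(const struct cachefiles_object *object)
{
	return object->state == CACHEFILES_ONDEMAND_OBJSTATE_open;
}

static inline void
cachefiles_ondemand_set_object_open(struct cachefiles_object *object)
{
	object->state = CACHEFILES_ONDEMAND_OBJSTATE_open;
}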
Reviewed-by: Xin Yin yinxin.x@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Link: [1] https://lore.kernel.org/lkml/20221014080559.42108-5-zhujia.zj@bytedance.com/... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 22 ++++++++++++++++++++++ fs/cachefiles/ondemand.c | 24 +++++++++++------------- 2 files changed, 33 insertions(+), 13 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index b8ef5be59005..20d4c41bed2e 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -31,6 +31,11 @@ extern unsigned cachefiles_debug;
#define cachefiles_gfp (__GFP_RECLAIM | __GFP_NORETRY | __GFP_NOMEMALLOC)
+enum cachefiles_object_state { + CACHEFILES_ONDEMAND_OBJSTATE_close, /* Anonymous fd closed by daemon or initial state */ + CACHEFILES_ONDEMAND_OBJSTATE_open, /* Anonymous fd associated with object is available */ +}; + /* * node records */ @@ -50,6 +55,7 @@ struct cachefiles_object { struct rb_node active_node; /* link in active tree (dentry is key) */ #ifdef CONFIG_CACHEFILES_ONDEMAND int ondemand_id; + enum cachefiles_object_state state; #endif };
@@ -263,6 +269,22 @@ extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); extern int cachefiles_ondemand_read(struct cachefiles_object *object, loff_t pos, size_t len); + +#define CACHEFILES_OBJECT_STATE_FUNCS(_state) \ +static inline bool \ +cachefiles_ondemand_object_is_##_state(const struct cachefiles_object *object) \ +{ \ + return object->state == CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ +} \ + \ +static inline void \ +cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \ +{ \ + object->state = CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ +} + +CACHEFILES_OBJECT_STATE_FUNCS(open); +CACHEFILES_OBJECT_STATE_FUNCS(close); #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 950dc5a71cb6..253d7e3a2bf5 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -24,6 +24,8 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
xa_lock(&cache->reqs); object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + cachefiles_ondemand_set_object_close(object); + /* * Flush all pending READ requests since their completion depends on * anon_fd. @@ -181,6 +183,8 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) else set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
+ cachefiles_ondemand_set_object_open(req->object); + out: complete(&req->done); return ret; @@ -398,8 +402,8 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
/* coupled with the barrier in cachefiles_flush_reqs() */ smp_mb(); - - if (opcode != CACHEFILES_OP_OPEN && object->ondemand_id <= 0) { + if (opcode != CACHEFILES_OP_OPEN && + !cachefiles_ondemand_object_is_open(object)) { WARN_ON_ONCE(object->ondemand_id == 0); xa_unlock(&cache->reqs); ret = -EIO; @@ -463,18 +467,11 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, void *private) { struct cachefiles_object *object = req->object; - int object_id = object->ondemand_id;
- /* - * It's possible that object id is still 0 if the cookie looking up - * phase failed before OPEN request has ever been sent. Also avoid - * sending CLOSE request for CACHEFILES_ONDEMAND_ID_CLOSED, which means - * anon_fd has already been closed. - */ - if (object_id <= 0) + if (!cachefiles_ondemand_object_is_open(object)) return -ENOENT;
- req->msg.object_id = object_id; + req->msg.object_id = object->ondemand_id; return 0; }
@@ -492,7 +489,7 @@ static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, int object_id = object->ondemand_id;
/* Stop enqueuing requests when daemon has closed anon_fd. */ - if (object_id <= 0) { + if (!cachefiles_ondemand_object_is_open(object)) { WARN_ON_ONCE(object_id == 0); pr_info_once("READ: anonymous fd closed prematurely.\n"); return -EIO; @@ -515,7 +512,8 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object) * creating a new tmpfile as the cache file. Reuse the previously * allocated object ID if any. */ - if (object->ondemand_id > 0 || object->type == FSCACHE_COOKIE_TYPE_INDEX) + if (cachefiles_ondemand_object_is_open(object) || + object->type == FSCACHE_COOKIE_TYPE_INDEX) return 0;
volume_key_size = object->fscache.parent->cookie->key_len + 1;
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/679445f70359
--------------------------------
ANBZ: #3213
cherry-picked from [1]
We'll introduce a @work_struct field for @object in subsequent patches, which will enlarge the size of @object. Because of that, this commit extracts the ondemand info fields from @object into a separately allocated structure.
Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Link: [1] https://lore.kernel.org/lkml/20221014080559.42108-5-zhujia.zj@bytedance.com/... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 6 ++++++ fs/cachefiles/internal.h | 22 ++++++++++++++++------ fs/cachefiles/ondemand.c | 29 +++++++++++++++++++++++------ 3 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 634e7041c0f3..0a946d046724 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -51,6 +51,9 @@ static struct fscache_object *cachefiles_alloc_object(
fscache_object_init(&object->fscache, cookie, &cache->cache);
+ if (cachefiles_ondemand_init_obj_info(object)) + goto nomem_obj_info; + object->type = cookie->def->type;
/* get hold of the raw key @@ -102,6 +105,8 @@ static struct fscache_object *cachefiles_alloc_object( nomem_key: kfree(buffer); nomem_buffer: + kfree(object->private); +nomem_obj_info: BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)); kmem_cache_free(cachefiles_object_jar, object); fscache_object_destroyed(&cache->cache); @@ -373,6 +378,7 @@ static void cachefiles_put_object(struct fscache_object *_object, }
cache = object->fscache.cache; + kfree(object->private); fscache_object_destroy(&object->fscache); kmem_cache_free(cachefiles_object_jar, object); fscache_object_destroyed(cache); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 20d4c41bed2e..c44782e12ab1 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -36,6 +36,12 @@ enum cachefiles_object_state { CACHEFILES_ONDEMAND_OBJSTATE_open, /* Anonymous fd associated with object is available */ };
+struct cachefiles_ondemand_info { + int ondemand_id; + enum cachefiles_object_state state; + struct cachefiles_object *object; +}; + /* * node records */ @@ -53,10 +59,7 @@ struct cachefiles_object { uint8_t new; /* T if object new */ spinlock_t work_lock; struct rb_node active_node; /* link in active tree (dentry is key) */ -#ifdef CONFIG_CACHEFILES_ONDEMAND - int ondemand_id; - enum cachefiles_object_state state; -#endif + struct cachefiles_ondemand_info *private; };
extern struct kmem_cache *cachefiles_object_jar; @@ -270,17 +273,19 @@ extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); extern int cachefiles_ondemand_read(struct cachefiles_object *object, loff_t pos, size_t len);
+extern int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object); + #define CACHEFILES_OBJECT_STATE_FUNCS(_state) \ static inline bool \ cachefiles_ondemand_object_is_##_state(const struct cachefiles_object *object) \ { \ - return object->state == CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ + return object->private->state == CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ } \ \ static inline void \ cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \ { \ - object->state = CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ + object->private->state = CACHEFILES_ONDEMAND_OBJSTATE_##_state; \ }
CACHEFILES_OBJECT_STATE_FUNCS(open); @@ -305,6 +310,11 @@ static inline int cachefiles_ondemand_read(struct cachefiles_object *object, { return -EOPNOTSUPP; } + +static inline int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object) +{ + return 0; +} #endif
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 253d7e3a2bf5..27da8cc76899 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -13,17 +13,18 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file) { struct cachefiles_object *object = file->private_data; - int object_id = object->ondemand_id; struct cachefiles_cache *cache; void **slot; struct radix_tree_iter iter; + struct cachefiles_ondemand_info *info = object->private; + int object_id = info->ondemand_id; struct cachefiles_req *req;
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
xa_lock(&cache->reqs); - object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; + info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; cachefiles_ondemand_set_object_close(object);
/* @@ -235,7 +236,7 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) load = (void *)req->msg.data; load->fd = fd; req->msg.object_id = object_id; - object->ondemand_id = object_id; + object->private->ondemand_id = object_id;
cachefiles_get_unbind_pincount(cache); return 0; @@ -404,7 +405,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, smp_mb(); if (opcode != CACHEFILES_OP_OPEN && !cachefiles_ondemand_object_is_open(object)) { - WARN_ON_ONCE(object->ondemand_id == 0); + WARN_ON_ONCE(object->private->ondemand_id == 0); xa_unlock(&cache->reqs); ret = -EIO; goto out; @@ -471,7 +472,7 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req, if (!cachefiles_ondemand_object_is_open(object)) return -ENOENT;
- req->msg.object_id = object->ondemand_id; + req->msg.object_id = object->private->ondemand_id; return 0; }
@@ -486,7 +487,7 @@ static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, struct cachefiles_object *object = req->object; struct cachefiles_read *load = (void *)req->msg.data; struct cachefiles_read_ctx *read_ctx = private; - int object_id = object->ondemand_id; + int object_id = object->private->ondemand_id;
/* Stop enqueuing requests when daemon has closed anon_fd. */ if (!cachefiles_ondemand_object_is_open(object)) { @@ -539,3 +540,19 @@ int cachefiles_ondemand_read(struct cachefiles_object *object, sizeof(struct cachefiles_read), cachefiles_ondemand_init_read_req, &read_ctx); } + +int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object) +{ + struct cachefiles_cache *cache; + + cache = container_of(object->fscache.cache, struct cachefiles_cache, cache); + if (!cachefiles_in_ondemand_mode(cache)) + return 0; + + object->private = kzalloc(sizeof(struct cachefiles_ondemand_info), GFP_KERNEL); + if (!object->private) + return -ENOMEM; + + object->private->object = object; + return 0; +}
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/5d1970fc484a
--------------------------------
ANBZ: #3213
cherry-picked from [1]
When an anonymous fd is closed by the user daemon, if a new read request for this file comes up, the anonymous fd should be re-opened to handle that read request rather than failing it directly.
1. Introduce a reopening state for objects that are closed but have inflight/subsequent read requests.
2. No longer flush READ requests but only CLOSE requests when the anonymous fd is closed.
3. Enqueue the reopen work to a workqueue, so that the user daemon can get out of the daemon_read context and handle that request smoothly. Otherwise, the user daemon would send a reopen request and then wait for itself to process that request.
Reviewed-by: Xin Yin yinxin.x@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Link: [1] https://lore.kernel.org/lkml/20221014080559.42108-5-zhujia.zj@bytedance.com/... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 3 + fs/cachefiles/ondemand.c | 125 ++++++++++++++++++++++++--------------- 2 files changed, 79 insertions(+), 49 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index c44782e12ab1..9f460188e1a8 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -34,9 +34,11 @@ extern unsigned cachefiles_debug; enum cachefiles_object_state { CACHEFILES_ONDEMAND_OBJSTATE_close, /* Anonymous fd closed by daemon or initial state */ CACHEFILES_ONDEMAND_OBJSTATE_open, /* Anonymous fd associated with object is available */ + CACHEFILES_ONDEMAND_OBJSTATE_reopening, /* Object that was closed and is being reopened. */ };
struct cachefiles_ondemand_info { + struct work_struct work; int ondemand_id; enum cachefiles_object_state state; struct cachefiles_object *object; @@ -290,6 +292,7 @@ cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \
CACHEFILES_OBJECT_STATE_FUNCS(open); CACHEFILES_OBJECT_STATE_FUNCS(close); +CACHEFILES_OBJECT_STATE_FUNCS(reopening); #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 27da8cc76899..79a8eea88c2e 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -27,18 +27,14 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; cachefiles_ondemand_set_object_close(object);
- /* - * Flush all pending READ requests since their completion depends on - * anon_fd. - */ - radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { + /* Only flush CACHEFILES_REQ_NEW marked req to avoid race with daemon_read */ + radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, CACHEFILES_REQ_NEW) { req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); if (WARN_ON(!req)) continue; if (req->msg.object_id == object_id && - req->msg.opcode == CACHEFILES_OP_READ) { - req->error = -EIO; + req->msg.opcode == CACHEFILES_OP_CLOSE) { complete(&req->done); radix_tree_iter_delete(&cache->reqs, &iter, slot); } @@ -185,6 +181,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
cachefiles_ondemand_set_object_open(req->object); + wake_up_all(&cache->daemon_pollwq);
out: complete(&req->done); @@ -235,7 +232,6 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
load = (void *)req->msg.data; load->fd = fd; - req->msg.object_id = object_id; object->private->ondemand_id = object_id;
cachefiles_get_unbind_pincount(cache); @@ -253,16 +249,58 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) return ret; }
+static void ondemand_object_worker(struct work_struct *work) +{ + struct cachefiles_object *object; + + object = ((struct cachefiles_ondemand_info *)work)->object; + cachefiles_ondemand_init_object(object); +} + +/* + * Find a request to be handled in the range of [start, end]. If there are any + * inflight or subsequent READ requests on the closed object, reopen it. Skip + * read requests whose related object is reopening. + */ +static struct cachefiles_req *cachefiles_ondemand_select_req(struct cachefiles_cache *cache, + struct radix_tree_iter *iter, + unsigned long start, + unsigned long end) +{ + void **slot; + struct cachefiles_req *req; + struct cachefiles_ondemand_info *info; + + radix_tree_for_each_tagged(slot, &cache->reqs, iter, start, CACHEFILES_REQ_NEW) { + req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); + if (WARN_ON(!req)) + return NULL; + if (iter->index > end) + return NULL; + if (req->msg.opcode != CACHEFILES_OP_READ) + return req; + info = req->object->private; + if (cachefiles_ondemand_object_is_close(req->object)) { + cachefiles_ondemand_set_object_reopening(req->object); + queue_work(fscache_object_wq, &info->work); + continue; + } else if (cachefiles_ondemand_object_is_reopening(req->object)) { + continue; + } + return req; + } + return NULL; +} + ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) { - struct cachefiles_req *req = NULL; + struct cachefiles_req *req; struct cachefiles_msg *msg; unsigned long id = 0; size_t n; int ret = 0; struct radix_tree_iter iter; - void **slot;
/* * Cyclically search for a request that has not ever been processed, @@ -270,25 +308,9 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, * request distribution fair. */ xa_lock(&cache->reqs); - radix_tree_for_each_tagged(slot, &cache->reqs, &iter, cache->req_id_next, - CACHEFILES_REQ_NEW) { - req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); - WARN_ON(!req); - break; - } - - if (!req && cache->req_id_next > 0) { - radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, - CACHEFILES_REQ_NEW) { - if (iter.index >= cache->req_id_next) - break; - req = radix_tree_deref_slot_protected(slot, &cache->reqs.xa_lock); - WARN_ON(!req); - break; - } - } - - /* no request tagged with CACHEFILES_REQ_NEW found */ + req = cachefiles_ondemand_select_req(cache, &iter, cache->req_id_next, ULONG_MAX); + if (!req && cache->req_id_next > 0) + req = cachefiles_ondemand_select_req(cache, &iter, 0, cache->req_id_next - 1); if (!req) { xa_unlock(&cache->reqs); return 0; @@ -307,14 +329,18 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, xa_unlock(&cache->reqs);
id = iter.index; - msg->msg_id = id;
if (msg->opcode == CACHEFILES_OP_OPEN) { ret = cachefiles_ondemand_get_fd(req); - if (ret) + if (ret) { + cachefiles_ondemand_set_object_close(req->object); goto error; + } }
+ msg->msg_id = id; + msg->object_id = req->object->private->ondemand_id; + if (copy_to_user(_buffer, msg, n) != 0) { ret = -EFAULT; goto err_put_fd; @@ -352,7 +378,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, { static atomic64_t global_index = ATOMIC64_INIT(0); struct cachefiles_cache *cache; - struct cachefiles_req *req; + struct cachefiles_req *req = NULL; long id; int ret;
@@ -362,12 +388,16 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) return 0;
- if (test_bit(CACHEFILES_DEAD, &cache->flags)) - return -EIO; + if (test_bit(CACHEFILES_DEAD, &cache->flags)) { + ret = -EIO; + goto out; + }
req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL); - if (!req) - return -ENOMEM; + if (!req) { + ret = -ENOMEM; + goto out; + }
req->object = object; init_completion(&req->done); @@ -403,7 +433,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
/* coupled with the barrier in cachefiles_flush_reqs() */ smp_mb(); - if (opcode != CACHEFILES_OP_OPEN && + if (opcode == CACHEFILES_OP_CLOSE && !cachefiles_ondemand_object_is_open(object)) { WARN_ON_ONCE(object->private->ondemand_id == 0); xa_unlock(&cache->reqs); @@ -421,7 +451,15 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, wake_up_all(&cache->daemon_pollwq); wait_for_completion(&req->done); ret = req->error; + kfree(req); + return ret; out: + /* Reset the object to close state in error handling path. + * If error occurs after creating the anonymous fd, + * cachefiles_ondemand_fd_release() will set object to close. + */ + if (opcode == CACHEFILES_OP_OPEN) + cachefiles_ondemand_set_object_close(req->object); kfree(req); return ret; } @@ -471,8 +509,6 @@ static int cachefiles_ondemand_init_close_req(struct cachefiles_req *req,
if (!cachefiles_ondemand_object_is_open(object)) return -ENOENT; - - req->msg.object_id = object->private->ondemand_id; return 0; }
@@ -484,19 +520,9 @@ struct cachefiles_read_ctx { static int cachefiles_ondemand_init_read_req(struct cachefiles_req *req, void *private) { - struct cachefiles_object *object = req->object; struct cachefiles_read *load = (void *)req->msg.data; struct cachefiles_read_ctx *read_ctx = private; - int object_id = object->private->ondemand_id; - - /* Stop enqueuing requests when daemon has closed anon_fd. */ - if (!cachefiles_ondemand_object_is_open(object)) { - WARN_ON_ONCE(object_id == 0); - pr_info_once("READ: anonymous fd closed prematurely.\n"); - return -EIO; - }
- req->msg.object_id = object_id; load->off = read_ctx->off; load->len = read_ctx->len; return 0; @@ -554,5 +580,6 @@ int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object) return -ENOMEM;
object->private->object = object; + INIT_WORK(&object->private->work, ondemand_object_worker); return 0; }
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/f8ecbc39993a
--------------------------------
ANBZ: #3213
cherry-picked from [1]
Don't trigger EPOLLIN when there are only reopening read requests in the xarray.
Suggested-by: Xin Yin yinxin.x@bytedance.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Link: [1] https://lore.kernel.org/lkml/20221014080559.42108-5-zhujia.zj@bytedance.com/... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 16 ++++++++++++++-- fs/cachefiles/internal.h | 12 ++++++++++++ 2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index b531373400d7..075c227a336b 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -359,14 +359,26 @@ static __poll_t cachefiles_daemon_poll(struct file *file, struct poll_table_struct *poll) { struct cachefiles_cache *cache = file->private_data; + struct cachefiles_req *req; + struct radix_tree_iter iter; __poll_t mask; + void **slot;
poll_wait(file, &cache->daemon_pollwq, poll); mask = 0;
if (cachefiles_in_ondemand_mode(cache)) { - if (!radix_tree_empty(&cache->reqs)) - mask |= EPOLLIN; + if (!radix_tree_empty(&cache->reqs)) { + radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, + CACHEFILES_REQ_NEW) { + req = radix_tree_deref_slot_protected(slot, + &cache->reqs.xa_lock); + if (!cachefiles_ondemand_is_reopening_read(req)) { + mask |= EPOLLIN; + break; + } + } + } } else { if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) mask |= EPOLLIN; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 9f460188e1a8..435e8168f26f 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -293,6 +293,13 @@ cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \ CACHEFILES_OBJECT_STATE_FUNCS(open); CACHEFILES_OBJECT_STATE_FUNCS(close); CACHEFILES_OBJECT_STATE_FUNCS(reopening); + +static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req) +{ + return cachefiles_ondemand_object_is_reopening(req->object) && + req->msg.opcode == CACHEFILES_OP_READ; +} + #else static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, char __user *_buffer, size_t buflen, loff_t *pos) @@ -318,6 +325,11 @@ static inline int cachefiles_ondemand_init_obj_info(struct cachefiles_object *ob { return 0; } + +static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req) +{ + return false; +} #endif
/*
From: Jia Zhu zhujia.zj@bytedance.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/71001eabd791
--------------------------------
ANBZ: #3213
cherry-picked from [1]
Previously, in the on-demand read scenario, if the anonymous fd was closed by the user daemon, inflight and subsequent read requests would return -EIO. As long as the device connection is not released, the user daemon can now hold and restore inflight requests by resetting the request flag to CACHEFILES_REQ_NEW.
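On the daemon side, recovery then boils down to writing the "restore" command to the bound device fd, roughly like this (hypothetical userspace helper, not part of this patch; devfd is assumed to be the /dev/cachefiles fd taken over from the crashed daemon):

/*
 * Hypothetical daemon-side sketch: after taking over the device fd,
 * a recovered user daemon re-queues halfway-processed requests by
 * issuing the "restore" command, then resumes reading requests.
 */
#include <string.h>
#include <unistd.h>

static int cachefiles_daemon_restore(int devfd)
{
	static const char cmd[] = "restore";

	if (write(devfd, cmd, strlen(cmd)) < 0)
		return -1;
	return 0;
}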
Suggested-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Jia Zhu zhujia.zj@bytedance.com Signed-off-by: Xin Yin yinxin.x@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com [jingbo: use xas_for_each since radix_tree_iter_tag_set is unavailable] Link: [1] https://lore.kernel.org/lkml/20221014080559.42108-5-zhujia.zj@bytedance.com/... Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Acked-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/894 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 1 + fs/cachefiles/internal.h | 3 +++ fs/cachefiles/ondemand.c | 23 +++++++++++++++++++++++ 3 files changed, 27 insertions(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 075c227a336b..4bb81e003ae1 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -75,6 +75,7 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = { { "tag", cachefiles_daemon_tag }, #ifdef CONFIG_CACHEFILES_ONDEMAND { "copen", cachefiles_ondemand_copen }, + { "restore", cachefiles_ondemand_restore }, #endif { "", NULL } }; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 435e8168f26f..f975042c1658 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -270,6 +270,9 @@ extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args);
+extern int cachefiles_ondemand_restore(struct cachefiles_cache *cache, + char *args); + extern int cachefiles_ondemand_init_object(struct cachefiles_object *object); extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object); extern int cachefiles_ondemand_read(struct cachefiles_object *object, diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 79a8eea88c2e..1fd1efd5a117 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -188,6 +188,29 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) return ret; }
+int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args) +{ + struct cachefiles_req *req; + + XA_STATE(xas, &cache->reqs, 0); + + if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) + return -EOPNOTSUPP; + + /* + * Reset the requests to CACHEFILES_REQ_NEW state, so that the + * requests have been processed halfway before the crash of the + * user daemon could be reprocessed after the recovery. + */ + xas_lock(&xas); + xas_for_each(&xas, req, ULONG_MAX) + xas_set_mark(&xas, CACHEFILES_REQ_NEW); + xas_unlock(&xas); + + wake_up_all(&cache->daemon_pollwq); + return 0; +} + static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) { struct cachefiles_object *object = req->object;
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/d736b9b9be9f
--------------------------------
ANBZ: #3213
cachefiles_object is allocated from the cachefiles_object_jar slab cache without zeroing.

Apart from cachefiles_alloc_object(), cachefiles_daemon_add_cache() also allocates a cachefiles_object directly from the cachefiles_object_jar slab cache, in which case object->private is not initialized, while the allocated cachefiles_object is still freed in cachefiles_put_object(). This is reasonable, since the cachefiles_object allocated in cachefiles_daemon_add_cache() represents a directory rather than a data file, and object->private is only used for data files.

However, if object->private is not reset to NULL when the cachefiles_object is freed, and the cachefiles_object is later allocated again in cachefiles_alloc_object(), a wild pointer is left in object->private, which can cause a double-free or use-after-free.
Fixes: 679445f70359 ("anolis: cachefiles: extract ondemand info field from cachefiles_object") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/915 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 0a946d046724..5eedd6382737 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -106,6 +106,7 @@ static struct fscache_object *cachefiles_alloc_object( kfree(buffer); nomem_buffer: kfree(object->private); + object->private = NULL; nomem_obj_info: BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)); kmem_cache_free(cachefiles_object_jar, object); @@ -379,6 +380,7 @@ static void cachefiles_put_object(struct fscache_object *_object,
cache = object->fscache.cache; kfree(object->private); + object->private = NULL; fscache_object_destroy(&object->fscache); kmem_cache_free(cachefiles_object_jar, object); fscache_object_destroyed(cache);
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/51839830d50d
--------------------------------
ANBZ: #3213
Add missing lock protection in poll routine when iterating xarray, otherwise:
1. The radix_tree API itself doesn't imply the RCU read lock, thus we may encounter a UAF when dereferencing a slot without the radix_tree's lock held.

2. Even with the RCU read lock held, only the slot of the radix tree is ensured to be pinned there, while the data structure (e.g. struct cachefiles_req) stored in the slot has no such guarantee. The poll routine will iterate the radix tree and dereference cachefiles_req accordingly. Thus the RCU read lock is not adequate in this case and a spinlock is needed.

3. Otherwise, radix_tree_deref_slot_protected() will fail the lockdep_is_held() check.
Fixes: f8ecbc39993a ("anolis: cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/1004 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 4bb81e003ae1..e26ebbc89806 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -370,6 +370,7 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
if (cachefiles_in_ondemand_mode(cache)) { if (!radix_tree_empty(&cache->reqs)) { + xa_lock(&cache->reqs); radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, CACHEFILES_REQ_NEW) { req = radix_tree_deref_slot_protected(slot, @@ -379,6 +380,7 @@ static __poll_t cachefiles_daemon_poll(struct file *file, break; } } + xa_unlock(&cache->reqs); } } else { if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
From: Jingbo Xu jefflexu@linux.alibaba.com
anolis inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://gitee.com/anolis/cloud-kernel/commit/bf730b36eb29
--------------------------------
ANBZ: #3213
When entering the error path, the request may not have been allocated or req->object may not have been initialized.
Fixes: 5d1970fc484a ("anolis: cachefiles: resend an open request if the read request's object is closed") Reported-by: Jia Zhu zhujia.zj@bytedance.com Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Link: https://gitee.com/anolis/cloud-kernel/pulls/1022 Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 1fd1efd5a117..7ae08e98e417 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -482,7 +482,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, * cachefiles_ondemand_fd_release() will set object to close. */ if (opcode == CACHEFILES_OP_OPEN) - cachefiles_ondemand_set_object_close(req->object); + cachefiles_ondemand_set_object_close(object); kfree(req); return ret; }
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
----------------------------------------
To support the erofs over fscache mode, the following config options are enabled by default on x86 and arm64:
CONFIG_CACHEFILES_ONDEMAND=y
CONFIG_EROFS_FS=m
CONFIG_EROFS_FS_XATTR=y
CONFIG_EROFS_FS_POSIX_ACL=y
CONFIG_EROFS_FS_SECURITY=y
CONFIG_EROFS_FS_ONDEMAND=y
Signed-off-by: Baokun Li libaokun1@huawei.com --- arch/arm64/configs/openeuler_defconfig | 9 ++++++++- arch/x86/configs/openeuler_defconfig | 9 ++++++++- 2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 4c117f2f55f1..9996b88ba58c 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -6407,6 +6407,7 @@ CONFIG_FSCACHE_STATS=y CONFIG_CACHEFILES=m # CONFIG_CACHEFILES_DEBUG is not set # CONFIG_CACHEFILES_HISTOGRAM is not set +CONFIG_CACHEFILES_ONDEMAND=y # end of Caches
# @@ -6520,7 +6521,13 @@ CONFIG_PSTORE_COMPRESS_DEFAULT="deflate" CONFIG_PSTORE_RAM=m # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set -# CONFIG_EROFS_FS is not set +CONFIG_EROFS_FS=m +# CONFIG_EROFS_FS_DEBUG is not set +CONFIG_EROFS_FS_XATTR=y +CONFIG_EROFS_FS_POSIX_ACL=y +CONFIG_EROFS_FS_SECURITY=y +# CONFIG_EROFS_FS_ZIP is not set +CONFIG_EROFS_FS_ONDEMAND=y CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=m CONFIG_NFS_V2=m diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig index bfaadb4b298f..d5282a39d8cc 100644 --- a/arch/x86/configs/openeuler_defconfig +++ b/arch/x86/configs/openeuler_defconfig @@ -7462,6 +7462,7 @@ CONFIG_FSCACHE_STATS=y CONFIG_CACHEFILES=m # CONFIG_CACHEFILES_DEBUG is not set # CONFIG_CACHEFILES_HISTOGRAM is not set +CONFIG_CACHEFILES_ONDEMAND=y # end of Caches
# @@ -7577,7 +7578,13 @@ CONFIG_PSTORE_COMPRESS_DEFAULT="deflate" CONFIG_PSTORE_RAM=m # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set -# CONFIG_EROFS_FS is not set +CONFIG_EROFS_FS=m +# CONFIG_EROFS_FS_DEBUG is not set +CONFIG_EROFS_FS_XATTR=y +CONFIG_EROFS_FS_POSIX_ACL=y +CONFIG_EROFS_FS_SECURITY=y +# CONFIG_EROFS_FS_ZIP is not set +CONFIG_EROFS_FS_ONDEMAND=y CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=m # CONFIG_NFS_V2 is not set
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When erofs performs readahead, erofs_fscache_readahead() calls fscache_prepare_read() first and then readahead_page(); the latter may hit its BUG_ON() because the page may already have been unlocked. Fix this problem by only updating the readahead_control instead of taking (and checking the lock of) each page, since the erofs readahead path does not rely on the page itself here.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 36bb49ecf6a2..12b0c564b7c3 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -255,10 +255,9 @@ static void erofs_fscache_readahead(struct readahead_control *rac) }
done += count; - while (count) { - page = readahead_page(rac); - count -= PAGE_SIZE; - } + count /= PAGE_SIZE; + rac->_nr_pages -= count; + rac->_index += count; } while (done < len); }
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
To allow user space to control whether erofs can be used, a dynamic switch for erofs is added.
The switch, named erofs_enabled, is off by default and is gated by CONFIG_EROFS_FS. Users can turn it on by echoing 1/on/ON/y/Y to /sys/module/fs_ctl/parameters/erofs_enabled.
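For example, a privileged process could flip the switch like this (hypothetical userspace helper; the parameter path is as documented above):

/*
 * Hypothetical userspace helper: turn on the one-way erofs_enabled
 * switch. Once enabled, writing 0 back fails with -EBUSY, as
 * implemented by param_set_bool_on_only_once() in the patch below.
 */
#include <fcntl.h>
#include <unistd.h>

static int enable_erofs(void)
{
	int fd = open("/sys/module/fs_ctl/parameters/erofs_enabled", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "1", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}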
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/Makefile | 2 ++ fs/erofs/super.c | 3 +++ fs/fs_ctl.c | 36 ++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 4 ++++ 4 files changed, 45 insertions(+) create mode 100644 fs/fs_ctl.c
diff --git a/fs/Makefile b/fs/Makefile index 29cc13ba2c08..06f000e22e06 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -15,6 +15,8 @@ obj-y := open.o read_write.o file_table.o super.o \ stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \ fs_types.o fs_context.o fs_parser.o fsopen.o init.o \ kernel_read_file.o remap_range.o +obj-y += fs_ctl.o + ifdef CONFIG_CC_IS_CLANG CFLAGS_namei.o := $(call cc-disable-warning, bitwise-instead-of-logical) endif diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 606d4637a795..a6cb90c4fcce 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -661,6 +661,9 @@ static int erofs_init_fs_context(struct fs_context *fc) { struct erofs_fs_context *ctx;
+ if (!READ_ONCE(erofs_enabled)) + return -EOPNOTSUPP; + /* pseudo mount for anon inodes */ if (fc->sb_flags & SB_KERNMOUNT) { fc->ops = &erofs_anon_context_ops; diff --git a/fs/fs_ctl.c b/fs/fs_ctl.c new file mode 100644 index 000000000000..ab6b532188d9 --- /dev/null +++ b/fs/fs_ctl.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2024. Huawei Technologies Co., Ltd */ + +#include <linux/moduleparam.h> +#include <linux/kernel.h> +#include <linux/fs.h> + +#if IS_ENABLED(CONFIG_EROFS_FS) +static int param_set_bool_on_only_once(const char *s, const struct kernel_param *kp) +{ + int ret; + bool value, *res = kp->arg; + + if (!s) + s = "1"; + + ret = strtobool(s, &value); + if (ret) + return ret; + + if (!value && *res) + return -EBUSY; + + if (value && !*res) + WRITE_ONCE(*res, true); + + return 0; +} +#endif + +#if IS_ENABLED(CONFIG_EROFS_FS) +bool erofs_enabled; +EXPORT_SYMBOL(erofs_enabled); +module_param_call(erofs_enabled, param_set_bool_on_only_once, param_get_bool, + &erofs_enabled, 0644); +#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index 7e8684e3f05d..911c923c91bf 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3698,4 +3698,8 @@ bool generic_atomic_write_valid(loff_t pos, size_t len, return true; }
+#if IS_ENABLED(CONFIG_EROFS_FS) +extern bool erofs_enabled; +#endif + #endif /* _LINUX_FS_H */
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
To allow user space to better control the on-demand load mode, a dynamic switch for the on-demand mode is added.
The switch, named cachefiles_ondemand_enabled, is off by default and is gated by CONFIG_CACHEFILES_ONDEMAND. Users can turn it on by echoing 1/on/ON/y/Y to /sys/module/fs_ctl/parameters/cachefiles_ondemand_enabled.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/bind.c | 4 ++++ fs/erofs/internal.h | 1 + fs/erofs/super.c | 16 +++++++++++++++- fs/fs_ctl.c | 9 ++++++++- include/linux/fs.h | 13 +++++++++++++ 5 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c index c149d5037cc1..d076651b2931 100644 --- a/fs/cachefiles/bind.c +++ b/fs/cachefiles/bind.c @@ -59,6 +59,10 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND)) { if (!strcmp(args, "ondemand")) { + if (!cachefiles_ondemand_is_enabled()) { + pr_err("ondemand mode is disabled\n"); + return -EINVAL; + } set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); } else if (*args) { pr_err("Invalid argument to the 'bind' command\n"); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index f7b01a4be183..16eaeee4a295 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -79,6 +79,7 @@ struct erofs_fs_context { struct erofs_dev_context *devs; char *fsid; char *domain_id; + bool ondemand_enabled; };
struct erofs_domain { diff --git a/fs/erofs/super.c b/fs/erofs/super.c index a6cb90c4fcce..7c9a29d2d9d4 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -401,6 +401,10 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; case Opt_fsid: #ifdef CONFIG_EROFS_FS_ONDEMAND + if (!ctx->ondemand_enabled) { + errorfc(fc, "fsid option not supported"); + return -EINVAL; + } kfree(ctx->fsid); ctx->fsid = kstrdup(param->string, GFP_KERNEL); if (!ctx->fsid) @@ -412,6 +416,10 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; case Opt_domain_id: #ifdef CONFIG_EROFS_FS_ONDEMAND + if (!ctx->ondemand_enabled) { + errorfc(fc, "domain_id option not supported"); + break; + } kfree(ctx->domain_id); ctx->domain_id = kstrdup(param->string, GFP_KERNEL); if (!ctx->domain_id) @@ -657,11 +665,16 @@ static const struct fs_context_operations erofs_anon_context_ops = { .get_tree = erofs_fc_anon_get_tree, };
+static inline bool erofs_can_init(void) +{ + return READ_ONCE(erofs_enabled) || cachefiles_ondemand_is_enabled(); +} + static int erofs_init_fs_context(struct fs_context *fc) { struct erofs_fs_context *ctx;
- if (!READ_ONCE(erofs_enabled)) + if (!erofs_can_init()) return -EOPNOTSUPP;
/* pseudo mount for anon inodes */ @@ -678,6 +691,7 @@ static int erofs_init_fs_context(struct fs_context *fc) kfree(ctx); return -ENOMEM; } + ctx->ondemand_enabled = cachefiles_ondemand_is_enabled(); fc->fs_private = ctx;
idr_init(&ctx->devs->tree); diff --git a/fs/fs_ctl.c b/fs/fs_ctl.c index ab6b532188d9..6464c9ba5e18 100644 --- a/fs/fs_ctl.c +++ b/fs/fs_ctl.c @@ -5,7 +5,7 @@ #include <linux/kernel.h> #include <linux/fs.h>
-#if IS_ENABLED(CONFIG_EROFS_FS) +#if IS_ENABLED(CONFIG_EROFS_FS) || IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) static int param_set_bool_on_only_once(const char *s, const struct kernel_param *kp) { int ret; @@ -34,3 +34,10 @@ EXPORT_SYMBOL(erofs_enabled); module_param_call(erofs_enabled, param_set_bool_on_only_once, param_get_bool, &erofs_enabled, 0644); #endif + +#if IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) +bool cachefiles_ondemand_enabled; +EXPORT_SYMBOL(cachefiles_ondemand_enabled); +module_param_call(cachefiles_ondemand_enabled, param_set_bool_on_only_once, param_get_bool, + &cachefiles_ondemand_enabled, 0644); +#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index 911c923c91bf..5e6d2b63626b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3702,4 +3702,17 @@ bool generic_atomic_write_valid(loff_t pos, size_t len, extern bool erofs_enabled; #endif
+#if IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) +extern bool cachefiles_ondemand_enabled; +static inline bool cachefiles_ondemand_is_enabled(void) +{ + return READ_ONCE(cachefiles_ondemand_enabled); +} +#else +static inline bool cachefiles_ondemand_is_enabled(void) +{ + return false; +} +#endif + #endif /* _LINUX_FS_H */
From: Jingbo Xu jefflexu@linux.alibaba.com
mainline inclusion from mainline-v6.3-rc1 commit 7032809a44d752b9e2275833787e0aa88a7540af category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Relinquish the fscache volume with the mutex held. Otherwise, if a new domain is registered while the old domain with the same name has been removed from the list but not yet relinquished, fscache may complain about the collision.
Fixes: 8b7adf1dff3d ("erofs: introduce fscache-based domain") Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Link: https://lore.kernel.org/r/20230209063913.46341-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com
Conflicts: fs/erofs/fscache.c --- fs/erofs/fscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 12b0c564b7c3..d803c8716909 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -285,8 +285,8 @@ static void erofs_fscache_domain_put(struct erofs_domain *domain) kern_unmount(erofs_pseudo_mnt); erofs_pseudo_mnt = NULL; } - mutex_unlock(&erofs_domain_list_lock); fscache_relinquish_cookie(domain->volume, NULL, false); + mutex_unlock(&erofs_domain_list_lock); kfree(domain->domain_id); kfree(domain); return;
From: David Howells dhowells@redhat.com
mainline inclusion from mainline-v6.8-rc2 commit c3d6569a43322f371e7ba0ad386112723757ac8f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
cachefiles_ondemand_init_object() as called from cachefiles_open_file() and cachefiles_create_tmpfile() does not check if object->ondemand is set before dereferencing it, leading to an oops something like:
RIP: 0010:cachefiles_ondemand_init_object+0x9/0x41
...
Call Trace:
 <TASK>
 cachefiles_open_file+0xc9/0x187
 cachefiles_lookup_cookie+0x122/0x2be
 fscache_cookie_state_machine+0xbe/0x32b
 fscache_cookie_worker+0x1f/0x2d
 process_one_work+0x136/0x208
 process_scheduled_works+0x3a/0x41
 worker_thread+0x1a2/0x1f6
 kthread+0xca/0xd2
 ret_from_fork+0x21/0x33
Fix this by making cachefiles_ondemand_init_object() return immediately if object->ondemand (object->private in this backport) is NULL.
Fixes: 3c5ecfe16e76 ("cachefiles: extract ondemand info field from cachefiles_object") Reported-by: Marc Dionne marc.dionne@auristor.com Signed-off-by: David Howells dhowells@redhat.com cc: Gao Xiang xiang@kernel.org cc: Chao Yu chao@kernel.org cc: Yue Hu huyue2@coolpad.com cc: Jeffle Xu jefflexu@linux.alibaba.com cc: linux-erofs@lists.ozlabs.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
Conflicts: fs/cachefiles/ondemand.c
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 7ae08e98e417..493b8b8675f7 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -556,6 +556,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object) struct fscache_cookie *cookie = object->fscache.cookie; size_t volume_key_size, cookie_key_size, data_len;
+ if (!object->private) + return 0; + /* * CacheFiles will firstly check the cache file under the root cache * directory. If the coherency check failed, it will fallback to
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
If the parent is not ready when an object is initialised, a reference to the object is taken and the object is added to the parent's dependents list. When the parent becomes ready, it traverses the dependents, removing each object and dropping its reference count.
However, calling fscache_dequeue_object() on ABORT_INIT only removes the object from the parent's dependents without dropping the object's reference count, thus leaking a reference. When /dev/cachefiles is released, the following hung task is triggered because the code waits for the reference count to reach 0.
==================================================================
INFO: task cachefilesd2:635 blocked for more than 122 seconds.
      Not tainted 5.10.0-xfstests-00004-g042dc94280ce-dirty #1321
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:cachefilesd2 state:D stack:0 pid:635 ppid:596 flags:0x00004006
Call Trace:
 __schedule+0x3cc/0x770
 schedule+0x5b/0xd0
 fscache_withdraw_cache+0x298/0x427
 cachefiles_daemon_unbind.cold+0x18/0x69 [cachefiles]
 cachefiles_put_unbind_pincount+0x2a/0x60 [cachefiles]
 cachefiles_daemon_release+0x75/0x1e0 [cachefiles]
 __fput+0xe8/0x260
 task_work_run+0x5f/0xb0
 do_exit+0x381/0xbb0
 [...]
==================================================================
So drop the object's reference when removing it from the parent's dependents list in fscache_dequeue_object() to fix the issue.
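As a toy model of the pairing the fix restores (illustrative names only, not the fscache API): every reference taken when the object is queued on the dependents list must be dropped on every path that removes it:

#include <assert.h>
#include <stdio.h>

/* Toy stand-ins: 'ref' models the fscache_object usage count. */
struct obj {
	int ref;
	int queued;
};

static void get(struct obj *o) { o->ref++; }
static void put(struct obj *o) { o->ref--; }

/* Parent not ready: take a reference and park on the dependents list. */
static void enqueue_dependent(struct obj *o)
{
	get(o);
	o->queued = 1;
}

/*
 * Abort path: the bug was removing the object from the list without
 * the put(), leaking the reference taken in enqueue_dependent().
 */
static void dequeue_dependent(struct obj *o)
{
	if (o->queued) {
		o->queued = 0;
		put(o);		/* the fix: pair the enqueue-time get() */
	}
}

int main(void)
{
	struct obj o = { .ref = 1 };	/* creator's reference */

	enqueue_dependent(&o);
	dequeue_dependent(&o);		/* ABORT_INIT before parent ready */
	put(&o);			/* creator drops its reference */
	assert(o.ref == 0);		/* leaks to 1 without the fix */
	printf("final ref=%d\n", o.ref);
	return 0;
}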
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/object.c | 1 + include/linux/fscache-cache.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/fscache/object.c b/fs/fscache/object.c index cb2146e02cd5..9ce3041a5d71 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -904,6 +904,7 @@ static void fscache_dequeue_object(struct fscache_object *object) if (!list_empty(&object->dep_link)) { spin_lock(&object->parent->lock); list_del_init(&object->dep_link); + fscache_put_object(object, fscache_obj_put_dequeue); spin_unlock(&object->parent->lock); }
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index f3ae78d1e5f3..cb8be2d3143c 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -34,6 +34,7 @@ enum fscache_obj_ref_trace { fscache_obj_put_enq_dep, fscache_obj_put_queue, fscache_obj_put_work, + fscache_obj_put_dequeue, fscache_obj_ref__nr_traces };
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When initialising an object, fscache_initialise_object() will just drop the object on the [no parent]/[bad parent]/[grab failed] paths. However, the current object may already have children added by fscache_attach_object(), so the assertion in fscache_put_object() may fail because n_children != 0.
=========================================================
kernel BUG at fs/cachefiles/interface.c:372!
CPU: 2 PID: 153 Comm: kworker/u8:7 5.10.0-00001-g24c450967e57-dirty
RIP: 0010:cachefiles_put_object.cold+0x3b3/0x434
Call Trace:
 fscache_put_object+0x77/0xc0
 fscache_object_work_func+0x4d/0x60
 process_one_work+0x40e/0x810
 worker_thread+0x96/0x700
 kthread+0x1f4/0x250
 ret_from_fork+0x1f/0x30
=========================================================
As shown below, mounting OBJ1 and then OBJ4 can trigger the above assertion failure:
OBJ0->OBJ3->OBJ2->OBJ1
        |
        --->OBJ5->OBJ4
fscache_look_up_object(OBJ3)
  cachefiles_lookup_object
    cachefiles_walk_to_object
      fscache_object_lookup_negative
        fscache_lookup_failure(OBJ3)
                                __fscache_acquire_cookie
                                  fscache_acquire_non_index_cookie
                                    fscache_alloc_object(OBJ4)
                                    fscache_alloc_object(OBJ5)
                                    fscache_attach_object(OBJ5)
                                      cookie->parent->backing_objects
                                      fscache_object_is_dying(OBJ3)
                                      OBJ3->n_children++
fscache_kill_object(OBJ3)
  fscache_mark_object_dead(OBJ3)
    clear_bit(FSCACHE_OBJECT_IS_LIVE, &OBJ3->flags);
  fscache_kill_dependents(OBJ3)
    list_del_init(&dep->dep_link);  // WAIT_FOR_CLEARANCE
                                    fscache_attach_object(OBJ4)
                                      OBJ4->parent = OBJ5
                                      OBJ5->n_children++
                                    fscache_raise_event(OBJ4, OBJECT_NEW_CHILD)
                                      fscache_initialise_object(OBJ4)
                                    fscache_raise_event(OBJ5, OBJECT_NEW_CHILD)
                                      fscache_initialise_object(OBJ5)
                                        if (fscache_object_is_dying(OBJ3))
                                          fscache_drop_object(OBJ5)
                                            cachefiles_drop_object(OBJ5)
                                            OBJ3->n_children--
                                            fscache_put_object(OBJ5)
                                              cachefiles_put_object(OBJ5)
                                                OBJ5->usage == 0
                                                OBJ5->fscache.n_children != 0  // ASSERTCMP FAIL
To fix this, replace DROP_OBJECT in fscache_initialise_object() with KILL_OBJECT so that it waits for n_children to become 0 when there are child objects.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/object.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 9ce3041a5d71..0375f448afc4 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -382,14 +382,14 @@ static const struct fscache_state *fscache_initialise_object(struct fscache_obje parent = object->parent; if (!parent) { _leave(" [no parent]"); - return transit_to(DROP_OBJECT); + return transit_to(KILL_OBJECT); }
_debug("parent: %s of:%lx", parent->state->name, parent->flags);
if (fscache_object_is_dying(parent)) { _leave(" [bad parent]"); - return transit_to(DROP_OBJECT); + return transit_to(KILL_OBJECT); }
if (fscache_object_is_available(parent)) { @@ -411,7 +411,7 @@ static const struct fscache_state *fscache_initialise_object(struct fscache_obje spin_unlock(&parent->lock); if (!success) { _leave(" [grab failed]"); - return transit_to(DROP_OBJECT); + return transit_to(KILL_OBJECT); }
/* fscache_acquire_non_index_cookie() uses this
From: Hou Tao houtao1@huawei.com
mainline inclusion from mainline-v6.2-rc1 commit 27f2a2dcc6261406b509b5022a1e5c23bf622830 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When shared domain is enabled, mounting twice with the same fsid and domain_id triggers the sysfs warning shown below:
sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin'
CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
 dump_stack_lvl+0x38/0x49
 dump_stack+0x10/0x12
 sysfs_warn_dup.cold+0x17/0x27
 sysfs_create_dir_ns+0xb8/0xd0
 kobject_add_internal+0xb1/0x240
 kobject_init_and_add+0x71/0xa0
 erofs_register_sysfs+0x89/0x110
 erofs_fc_fill_super+0x98c/0xaf0
 vfs_get_super+0x7d/0x100
 get_tree_nodev+0x16/0x20
 erofs_fc_get_tree+0x20/0x30
 vfs_get_tree+0x24/0xb0
 path_mount+0x2fa/0xa90
 do_mount+0x7c/0xa0
 __x64_sys_mount+0x8b/0xe0
 do_syscall_64+0x30/0x60
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
The reason is that erofs_fscache_register_cookie() doesn't guarantee the primary data blob (aka fsid) is unique within the shared domain, so erofs_register_sysfs() invoked by the second mount fails due to the duplicated fsid in the shared domain and reports the warning.
It would be better to check the uniqueness of the fsid before doing erofs_register_sysfs(), so add a new flags parameter to erofs_fscache_register_cookie() and do the uniqueness check when EROFS_REG_COOKIE_NEED_NOEXIST is set.
After the patch, the error in dmesg for the duplicated mount would be:
erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY
Reviewed-by: Jia Zhu zhujia.zj@bytedance.com
Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com
Reviewed-by: Chao Yu chao@kernel.org
Signed-off-by: Hou Tao houtao1@huawei.com
Link: https://lore.kernel.org/r/20221125110822.3812942-1-houtao@huaweicloud.com
Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain")
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Baokun Li libaokun1@huawei.com
Conflicts: fs/erofs/fscache.c fs/erofs/internal.h --- fs/erofs/fscache.c | 47 +++++++++++++++++++++++++++++++++------------ fs/erofs/internal.h | 10 ++++++++-- fs/erofs/super.c | 2 +- 3 files changed, 44 insertions(+), 15 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index d803c8716909..761393a4e151 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -383,7 +383,8 @@ static int erofs_fscache_register_domain(struct super_block *sb)
static struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, - char *name, bool need_inode) + char *name, + unsigned int flags) { struct erofs_fscache *ctx; struct fscache_cookie *cookie; @@ -406,7 +407,7 @@ struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, //fscache_use_cookie(cookie, false); ctx->cookie = cookie;
- if (need_inode) { + if (flags & EROFS_REG_COOKIE_NEED_INODE) { struct inode *const inode = new_inode(sb);
if (!inode) { @@ -444,14 +445,15 @@ static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx)
static struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb, - char *name, bool need_inode) + char *name, + unsigned int flags) { int err; struct inode *inode; struct erofs_fscache *ctx; struct erofs_domain *domain = EROFS_SB(sb)->domain;
- ctx = erofs_fscache_acquire_cookie(sb, name, need_inode); + ctx = erofs_fscache_acquire_cookie(sb, name, flags); if (IS_ERR(ctx)) return ctx;
@@ -479,7 +481,8 @@ struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
static struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, - char *name, bool need_inode) + char *name, + unsigned int flags) { struct inode *inode; struct erofs_fscache *ctx; @@ -492,23 +495,30 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, ctx = inode->i_private; if (!ctx || ctx->domain != domain || strcmp(ctx->name, name)) continue; - igrab(inode); + if (!(flags & EROFS_REG_COOKIE_NEED_NOEXIST)) { + igrab(inode); + } else { + erofs_err(sb, "%s already exists in domain %s", name, + domain->domain_id); + ctx = ERR_PTR(-EEXIST); + } spin_unlock(&psb->s_inode_list_lock); mutex_unlock(&erofs_domain_cookies_lock); return ctx; } spin_unlock(&psb->s_inode_list_lock); - ctx = erofs_fscache_domain_init_cookie(sb, name, need_inode); + ctx = erofs_fscache_domain_init_cookie(sb, name, flags); mutex_unlock(&erofs_domain_cookies_lock); return ctx; }
struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, - char *name, bool need_inode) + char *name, + unsigned int flags) { if (EROFS_SB(sb)->domain_id) - return erofs_domain_register_cookie(sb, name, need_inode); - return erofs_fscache_acquire_cookie(sb, name, need_inode); + return erofs_domain_register_cookie(sb, name, flags); + return erofs_fscache_acquire_cookie(sb, name, flags); }
void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx) @@ -537,6 +547,7 @@ int erofs_fscache_register_fs(struct super_block *sb) int ret; struct erofs_sb_info *sbi = EROFS_SB(sb); struct erofs_fscache *fscache; + unsigned int flags;
if (sbi->domain_id) ret = erofs_fscache_register_domain(sb); @@ -545,8 +556,20 @@ int erofs_fscache_register_fs(struct super_block *sb) if (ret) return ret;
- /* acquired domain/volume will be relinquished in kill_sb() on error */ - fscache = erofs_fscache_register_cookie(sb, sbi->fsid, true); + /* + * When shared domain is enabled, using NEED_NOEXIST to guarantee + * the primary data blob (aka fsid) is unique in the shared domain. + * + * For non-shared-domain case, fscache_acquire_volume() invoked by + * erofs_fscache_register_volume() has already guaranteed + * the uniqueness of primary data blob. + * + * Acquired domain/volume will be relinquished in kill_sb() on error. + */ + flags = EROFS_REG_COOKIE_NEED_INODE; + if (sbi->domain_id) + flags |= EROFS_REG_COOKIE_NEED_NOEXIST; + fscache = erofs_fscache_register_cookie(sb, sbi->fsid, flags); if (IS_ERR(fscache)) return PTR_ERR(fscache);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 16eaeee4a295..12f652e8b967 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -492,6 +492,10 @@ static inline int z_erofs_init_zip_subsystem(void) { return 0; } static inline void z_erofs_exit_zip_subsystem(void) {} #endif /* !CONFIG_EROFS_FS_ZIP */
+/* flags for erofs_fscache_register_cookie() */ +#define EROFS_REG_COOKIE_NEED_INODE 1 +#define EROFS_REG_COOKIE_NEED_NOEXIST 2 + /* fscache.c */ #ifdef CONFIG_EROFS_FS_ONDEMAND int erofs_fscache_register(void); @@ -500,7 +504,8 @@ int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb);
struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, - char *name, bool need_inode); + char *name, + unsigned int flags); void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache); extern const struct address_space_operations erofs_fscache_access_aops; #else @@ -517,7 +522,8 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
static inline struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, - char *name, bool need_inode) + char *name, + unsigned int flags) { return ERR_PTR(-EOPNOTSUPP); } diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 7c9a29d2d9d4..f8d6371003b0 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -148,7 +148,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb, }
if (erofs_is_fscache_mode(sb)) { - fscache = erofs_fscache_register_cookie(sb, dif->path, false); + fscache = erofs_fscache_register_cookie(sb, dif->path, 0); if (IS_ERR(fscache)) return PTR_ERR(fscache); dif->fscache = fscache;
mainline inclusion from mainline-v6.9-rc1 commit 0f28be64d132aaf95d06375c8002ad9ecea69d71 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Lockdep reported the following issue when mounting erofs with a domain_id:
============================================
WARNING: possible recursive locking detected
6.8.0-rc7-xfstests #521 Not tainted
--------------------------------------------
mount/396 is trying to acquire lock:
ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0

but task is already holding lock:
ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&type->s_umount_key#50/1);
  lock(&type->s_umount_key#50/1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by mount/396:
 #0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0
 #1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3}, at: erofs_fscache_register_fs+0x3d/0x270 [erofs]

stack backtrace:
CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521
Call Trace:
 <TASK>
 dump_stack_lvl+0x64/0xb0
 validate_chain+0x5c4/0xa00
 __lock_acquire+0x6a9/0xd50
 lock_acquire+0xcd/0x2b0
 down_write_nested+0x45/0xd0
 alloc_super+0xe3/0x3d0
 sget_fc+0x62/0x2f0
 vfs_get_super+0x21/0x90
 vfs_get_tree+0x2c/0xf0
 fc_mount+0x12/0x40
 vfs_kern_mount.part.0+0x75/0x90
 kern_mount+0x24/0x40
 erofs_fscache_register_fs+0x1ef/0x270 [erofs]
 erofs_fc_fill_super+0x213/0x380 [erofs]
This is because the file_system_type of both erofs and the pseudo-mount point of domain_id is erofs_fs_type, so two successive calls to alloc_super() are considered to be using the same lock and trigger the warning above.
Therefore, add a nodev file_system_type called erofs_anon_fs_type in fscache.c to silence this complaint. Since kern_mount() takes a pointer to struct file_system_type, not its (string) name, we don't need to call register_filesystem(). In addition, call init_pseudo() in erofs_anon_init_fs_context() as suggested by Al Viro, so that erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and erofs_anon_context_ops can be removed.
Suggested-by: Al Viro viro@zeniv.linux.org.uk
Fixes: a9849560c55e ("erofs: introduce a pseudo mnt to manage shared cookies")
Signed-off-by: Baokun Li libaokun1@huawei.com
Reviewed-and-tested-by: Jingbo Xu jefflexu@linux.alibaba.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Link: https://lore.kernel.org/r/20240307101018.2021925-1-libaokun1@huawei.com
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com
Conflicts: fs/erofs/fscache.c fs/erofs/internal.h fs/erofs/super.c [Context differences.] Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 15 ++++++++++++++- fs/erofs/internal.h | 1 - fs/erofs/super.c | 30 +----------------------------- 3 files changed, 15 insertions(+), 31 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 761393a4e151..d42a583ff6b1 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -2,6 +2,7 @@ * Copyright (C) 2022, Alibaba Cloud * Copyright (C) 2022, Bytedance Inc. All rights reserved. */ +#include <linux/pseudo_fs.h> #include <linux/fscache.h> #include <linux/mount.h> #include "internal.h" @@ -11,6 +12,18 @@ static DEFINE_MUTEX(erofs_domain_cookies_lock); static LIST_HEAD(erofs_domain_list); static struct vfsmount *erofs_pseudo_mnt;
+static int erofs_anon_init_fs_context(struct fs_context *fc) +{ + return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM; +} + +static struct file_system_type erofs_anon_fs_type = { + .owner = THIS_MODULE, + .name = "pseudo_erofs", + .init_fs_context = erofs_anon_init_fs_context, + .kill_sb = kill_anon_super, +}; + struct fscache_netfs erofs_fscache_netfs = { .name = "erofs", .version = 0, @@ -342,7 +355,7 @@ static int erofs_fscache_init_domain(struct super_block *sb) goto out;
if (!erofs_pseudo_mnt) { - erofs_pseudo_mnt = kern_mount(&erofs_fs_type); + erofs_pseudo_mnt = kern_mount(&erofs_anon_fs_type); if (IS_ERR(erofs_pseudo_mnt)) { err = PTR_ERR(erofs_pseudo_mnt); goto out; diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 12f652e8b967..905a08516cc6 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -335,7 +335,6 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value) }
extern const struct super_operations erofs_sops; -extern struct file_system_type erofs_fs_type;
extern const struct address_space_operations erofs_raw_access_aops; extern const struct address_space_operations z_erofs_aops; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index f8d6371003b0..769b32cd2cae 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -493,13 +493,6 @@ static int erofs_init_managed_cache(struct super_block *sb) static int erofs_init_managed_cache(struct super_block *sb) { return 0; } #endif
-static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc) -{ - static const struct tree_descr empty_descr = {""}; - - return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr); -} - static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) { struct inode *inode; @@ -585,11 +578,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc) return 0; }
-static int erofs_fc_anon_get_tree(struct fs_context *fc) -{ - return get_tree_nodev(fc, erofs_fc_fill_pseudo_super); -} - static int erofs_fc_get_tree(struct fs_context *fc) { struct erofs_fs_context *ctx = fc->fs_private; @@ -661,10 +649,6 @@ static const struct fs_context_operations erofs_context_ops = { .free = erofs_fc_free, };
-static const struct fs_context_operations erofs_anon_context_ops = { - .get_tree = erofs_fc_anon_get_tree, -}; - static inline bool erofs_can_init(void) { return READ_ONCE(erofs_enabled) || cachefiles_ondemand_is_enabled(); @@ -677,12 +661,6 @@ static int erofs_init_fs_context(struct fs_context *fc) if (!erofs_can_init()) return -EOPNOTSUPP;
- /* pseudo mount for anon inodes */ - if (fc->sb_flags & SB_KERNMOUNT) { - fc->ops = &erofs_anon_context_ops; - return 0; - } - ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; @@ -711,12 +689,6 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
- /* pseudo mount for anon inodes */ - if (sb->s_flags & SB_KERNMOUNT) { - kill_anon_super(sb); - return; - } - if (erofs_is_fscache_mode(sb)) kill_anon_super(sb); else @@ -749,7 +721,7 @@ static void erofs_put_super(struct super_block *sb) sbi->s_fscache = NULL; }
-struct file_system_type erofs_fs_type = { +static struct file_system_type erofs_fs_type = { .owner = THIS_MODULE, .name = "erofs", .init_fs_context = erofs_init_fs_context,
From: Al Viro viro@zeniv.linux.org.uk
mainline inclusion from mainline-v6.8-rc6 commit 2c88c16dc20e88dd54d2f6f4d01ae1dce6cc9654 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
if you have a variable that holds NULL or a pointer to live struct mount, do not shove ERR_PTR() into it - not if you later treat "not NULL" as "holds a pointer to object".
Signed-off-by: Al Viro viro@zeniv.linux.org.uk

Conflicts:
	fs/erofs/fscache.c
[Context differences.]
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Baokun Li libaokun1@huawei.com
---
 fs/erofs/fscache.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index d42a583ff6b1..70b96377fcb6 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -355,11 +355,13 @@ static int erofs_fscache_init_domain(struct super_block *sb) goto out;
if (!erofs_pseudo_mnt) { - erofs_pseudo_mnt = kern_mount(&erofs_anon_fs_type); - if (IS_ERR(erofs_pseudo_mnt)) { - err = PTR_ERR(erofs_pseudo_mnt); + struct vfsmount *mnt = kern_mount(&erofs_anon_fs_type); + + if (IS_ERR(mnt)) { + err = PTR_ERR(mnt); goto out; } + erofs_pseudo_mnt = mnt; }
domain->volume = sbi->volume;
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
----------------------------------------
If FSCACHE_OBJECT_IS_LOOKED_UP is not set in the meta object->flags when __fscache_read_or_alloc_page() is executed, the assertion fails and triggers a kernel BUG().
Under normal circumstances, fscache_wait_for_deferred_lookup() waits for the cookie's lookup flag to be cleared, and the object's lookup flag is set at the same time. But when the LOOK_UP_OBJECT state fails, the object enters the LOOKUP_FAILURE state and executes fscache_lookup_failure(), which clears the cookie's lookup flag without setting the object's lookup flag; the assertion then fails and causes a kernel BUG().
There is no need to trigger a kernel BUG() directly here, so just remove the assertion: fscache_submit_op() will also check the object state later and intercept the error status.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/page.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 39a05a43284d..b08568743370 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -472,8 +472,6 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, object = hlist_entry(cookie->backing_objects.first, struct fscache_object, cookie_link);
- ASSERT(test_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags)); - __fscache_use_cookie(cookie); atomic_inc(&object->n_reads); __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
From: Jingbo Xu jefflexu@linux.alibaba.com
mainline inclusion from mainline-6.3-rc1 commit 2dfb8c3b122fad4504a92c34ba68f2fe4444b3f6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
We'd better not touch sb->s_inodes list and inode->i_count directly. Let's maintain cookies of share domain in a self-contained list in erofs.
Besides, relinquish cookie with the mutex held. Otherwise if a cookie is registered when the old cookie with the same name in the same domain has been removed from the list but not relinquished yet, fscache may complain "Duplicate cookie detected".
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com
Reviewed-by: Jia Zhu zhujia.zj@bytedance.com
Link: https://lore.kernel.org/r/20230209063913.46341-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Baokun Li libaokun1@huawei.com
---
 fs/erofs/fscache.c  | 44 ++++++++++++++++++++------------------------
 fs/erofs/internal.h |  4 ++++
 2 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 70b96377fcb6..e6fa21ce686f 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -10,6 +10,7 @@ static DEFINE_MUTEX(erofs_domain_list_lock); static DEFINE_MUTEX(erofs_domain_cookies_lock); static LIST_HEAD(erofs_domain_list); +static LIST_HEAD(erofs_domain_cookies_list); static struct vfsmount *erofs_pseudo_mnt;
static int erofs_anon_init_fs_context(struct fs_context *fc) @@ -289,8 +290,6 @@ const struct address_space_operations erofs_fscache_access_aops = {
static void erofs_fscache_domain_put(struct erofs_domain *domain) { - if (!domain) - return; mutex_lock(&erofs_domain_list_lock); if (refcount_dec_and_test(&domain->ref)) { list_del(&domain->list); @@ -408,6 +407,8 @@ struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return ERR_PTR(-ENOMEM); + INIT_LIST_HEAD(&ctx->node); + refcount_set(&ctx->ref, 1);
cookie = fscache_acquire_cookie(EROFS_SB(sb)->volume, &erofs_fscache_inode_object_def, @@ -454,6 +455,7 @@ static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx) //fscache_unuse_cookie(ctx->cookie, NULL, NULL); fscache_relinquish_cookie(ctx->cookie, NULL, false); iput(ctx->inode); + iput(ctx->anon_inode); kfree(ctx->name); kfree(ctx); } @@ -486,6 +488,7 @@ struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
ctx->domain = domain; ctx->anon_inode = inode; + list_add(&ctx->node, &erofs_domain_cookies_list); inode->i_private = ctx; refcount_inc(&domain->ref); return ctx; @@ -499,29 +502,23 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, char *name, unsigned int flags) { - struct inode *inode; struct erofs_fscache *ctx; struct erofs_domain *domain = EROFS_SB(sb)->domain; - struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
mutex_lock(&erofs_domain_cookies_lock); - spin_lock(&psb->s_inode_list_lock); - list_for_each_entry(inode, &psb->s_inodes, i_sb_list) { - ctx = inode->i_private; - if (!ctx || ctx->domain != domain || strcmp(ctx->name, name)) + list_for_each_entry(ctx, &erofs_domain_cookies_list, node) { + if (ctx->domain != domain || strcmp(ctx->name, name)) continue; if (!(flags & EROFS_REG_COOKIE_NEED_NOEXIST)) { - igrab(inode); + refcount_inc(&ctx->ref); } else { erofs_err(sb, "%s already exists in domain %s", name, domain->domain_id); ctx = ERR_PTR(-EEXIST); } - spin_unlock(&psb->s_inode_list_lock); mutex_unlock(&erofs_domain_cookies_lock); return ctx; } - spin_unlock(&psb->s_inode_list_lock); ctx = erofs_fscache_domain_init_cookie(sb, name, flags); mutex_unlock(&erofs_domain_cookies_lock); return ctx; @@ -538,23 +535,22 @@ struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx) { - bool drop; - struct erofs_domain *domain; + struct erofs_domain *domain = NULL;
if (!ctx) return; - domain = ctx->domain; - if (domain) { - mutex_lock(&erofs_domain_cookies_lock); - drop = atomic_read(&ctx->anon_inode->i_count) == 1; - iput(ctx->anon_inode); - mutex_unlock(&erofs_domain_cookies_lock); - if (!drop) - return; - } + if (!ctx->domain) + return erofs_fscache_relinquish_cookie(ctx);
- erofs_fscache_relinquish_cookie(ctx); - erofs_fscache_domain_put(domain); + mutex_lock(&erofs_domain_cookies_lock); + if (refcount_dec_and_test(&ctx->ref)) { + domain = ctx->domain; + list_del(&ctx->node); + erofs_fscache_relinquish_cookie(ctx); + } + mutex_unlock(&erofs_domain_cookies_lock); + if (domain) + erofs_fscache_domain_put(domain); }
int erofs_fscache_register_fs(struct super_block *sb) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 905a08516cc6..f53d7042a4a7 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -93,7 +93,11 @@ struct erofs_fscache { struct fscache_cookie *cookie; struct inode *inode; struct inode *anon_inode; + + /* used for share domain mode */ struct erofs_domain *domain; + struct list_head node; + refcount_t ref; char *name; };
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the erofs ondemand loading scenario, repeatedly mounting and umounting an image with an identical fsid or device_id may occasionally result in a "duplicate cookie found!" error. This is because the cookie is still hashed.
The cookie is only unhashed when cookie->ref reaches zero, which makes the stale entry undetectable. The reference count reaching zero depends on three conditions: a) the fscache_object exits the state machine; b) user-space issues the close request; c) the umount process completes.
In other words, if the object has not exited the state machine, or user-space has not issued the close request by the time umount completes, an error will occur during the next mount. Mainline solves this problem with the rewritten fscache and cachefiles, which add a synchronous wait during mount. To address the issue in this version, borrow the mainline idea and add a waiting mechanism so that the next mount operation waits until the duplicate cookie is unhashed. The wait is interruptible to avoid hangs.
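For illustration only, a userspace model of the two-phase wait the patch implements with wait_on_bit_timeout() and wait_on_bit(): wait with a 20-second bound, warn about a potential collision, then keep waiting until the old cookie is gone. The pthread machinery, the flag and all names are stand-ins, not kernel code:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for FSCACHE_COOKIE_ACQUIRE_PENDING on the candidate cookie. */
static int acquire_pending = 1;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Relinquish side: unhash the old cookie and wake the waiter. */
static void *unhash_old_cookie(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	acquire_pending = 0;		/* clear_and_wake_up_bit() analogue */
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Mount side: first wait with a timeout, then warn and keep waiting. */
static int wait_on_cookie_collision(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 20;		/* 20 * HZ analogue */

	pthread_mutex_lock(&lock);
	while (acquire_pending) {
		if (pthread_cond_timedwait(&cond, &lock, &ts) != 0) {
			printf("Potential cookie collision!\n");
			/* fall back to an open-ended wait */
			while (acquire_pending)
				pthread_cond_wait(&cond, &lock);
			break;
		}
	}
	pthread_mutex_unlock(&lock);
	return 0;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, unhash_old_cookie, NULL);
	wait_on_cookie_collision();	/* returns once the old cookie is gone */
	pthread_join(t, NULL);
	return 0;
}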
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/cookie.c | 49 ++++++++++++++++++++++++++++++++++++++--- include/linux/fscache.h | 2 ++ 2 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 6104f627cc71..e6341f85d9f3 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -159,6 +159,7 @@ struct fscache_cookie *fscache_alloc_cookie(
cookie->def = def; cookie->parent = parent; + cookie->collision = NULL; cookie->netfs_data = netfs_data; cookie->flags = (1 << FSCACHE_COOKIE_NO_DATA_YET); cookie->type = def->type; @@ -176,6 +177,27 @@ struct fscache_cookie *fscache_alloc_cookie( return NULL; }
+static bool fscache_is_acquire_pending(struct fscache_cookie *cookie) +{ + return test_bit(FSCACHE_COOKIE_ACQUIRE_PENDING, &cookie->flags); +} + +static int fscache_wait_on_cookie_collision(struct fscache_cookie *candidate) +{ + int ret; + + ret = wait_on_bit_timeout(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE, 20 * HZ); + if (ret == -EINTR) + return ret; + if (fscache_is_acquire_pending(candidate)) { + pr_notice("Potential cookie collision!"); + return wait_on_bit(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE); + } + return 0; +} + /* * Attempt to insert the new cookie into the hash. If there's a collision, we * return the old cookie if it's not in use and an error otherwise. @@ -192,8 +214,13 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
hlist_bl_lock(h); hlist_bl_for_each_entry(cursor, p, h, hash_link) { - if (fscache_compare_cookie(candidate, cursor) == 0) - goto collision; + if (fscache_compare_cookie(candidate, cursor) == 0) { + if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags)) + goto collision; + cursor->collision = candidate; + set_bit(FSCACHE_COOKIE_ACQUIRE_PENDING, &candidate->flags); + break; + } }
__set_bit(FSCACHE_COOKIE_ACQUIRED, &candidate->flags); @@ -201,16 +228,29 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate) atomic_inc(&candidate->parent->n_children); hlist_bl_add_head(&candidate->hash_link, h); hlist_bl_unlock(h); + + if (fscache_is_acquire_pending(candidate) && + fscache_wait_on_cookie_collision(candidate)) { + fscache_cookie_put(candidate->parent, fscache_cookie_put_acquire_nobufs); + atomic_dec(&candidate->parent->n_children); + hlist_bl_lock(h); + hlist_bl_del(&candidate->hash_link); + if (fscache_is_acquire_pending(candidate)) + cursor->collision = NULL; + hlist_bl_unlock(h); + pr_err("Wait duplicate cookie unhashed interrupted\n"); + return NULL; + } return candidate;
collision: if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags)) { trace_fscache_cookie(cursor, fscache_cookie_collision, atomic_read(&cursor->usage)); - pr_err("Duplicate cookie detected\n"); fscache_print_cookie(cursor, 'O'); fscache_print_cookie(candidate, 'N'); hlist_bl_unlock(h); + pr_err("Duplicate cookie detected\n"); return NULL; }
@@ -835,6 +875,9 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie)
hlist_bl_lock(h); hlist_bl_del(&cookie->hash_link); + if (cookie->collision) + clear_and_wake_up_bit(FSCACHE_COOKIE_ACQUIRE_PENDING, + &cookie->collision->flags); hlist_bl_unlock(h); }
diff --git a/include/linux/fscache.h b/include/linux/fscache.h index f262446f3a49..6931c144a2fd 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -139,6 +139,7 @@ struct fscache_cookie { struct hlist_head backing_objects; /* object(s) backing this file/index */ const struct fscache_cookie_def *def; /* definition */ struct fscache_cookie *parent; /* parent of this entry */ + struct fscache_cookie *collision; /* collision cookie */ struct hlist_bl_node hash_link; /* Link in hash table */ void *netfs_data; /* back pointer to netfs */ struct radix_tree_root stores; /* pages to be stored on this cookie */ @@ -156,6 +157,7 @@ struct fscache_cookie { #define FSCACHE_COOKIE_AUX_UPDATED 8 /* T if the auxiliary data was updated */ #define FSCACHE_COOKIE_ACQUIRED 9 /* T if cookie is in use */ #define FSCACHE_COOKIE_RELINQUISHING 10 /* T if cookie is being relinquished */ +#define FSCACHE_COOKIE_ACQUIRE_PENDING 11 /* T if cookie is waiting to complete acquisition */
u8 type; /* Type of object */ u8 key_len; /* Length of index key */
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
A new field 'new_location' is added to struct fscache_cookie_def to determine the behaviour; two new helpers are also added to judge whether 'new_location' is set for a volume or data cookie.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- include/linux/fscache.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 6931c144a2fd..40ede2ff29dc 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -70,6 +70,12 @@ struct fscache_cookie_def { #define FSCACHE_COOKIE_TYPE_INDEX 0 #define FSCACHE_COOKIE_TYPE_DATAFILE 1
+ /* + * Used for index cookie. If set, the location of cachefile will be the + * same as mainline kernel v5.18+. + */ + bool new_location; + /* select the cache into which to insert an entry in this index * - optional * - should return a cache identifier or NULL to cause the cache to be @@ -876,4 +882,18 @@ void fscache_enable_cookie(struct fscache_cookie *cookie, can_enable, data); }
+static inline bool volume_new_location(struct fscache_cookie *cookie) +{ + return cookie->def && cookie->type == FSCACHE_COOKIE_TYPE_INDEX && + cookie->def->new_location; +} + +static inline bool data_new_location(struct fscache_cookie *cookie) +{ + if (cookie->type != FSCACHE_COOKIE_TYPE_DATAFILE) + return false; + + return cookie->parent && volume_new_location(cookie->parent); +} + #endif /* _LINUX_FSCACHE_H */
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently, both index and data cookies generate key_hash in the same way:

fscache_hash(0, key, keylen)
However, this is different from mainline:
- volume: 1) buf = 'keylen' + key
          2) fscache_hash(0, buf, buflen)
- data: fscache_hash(volume key_hash, key, keylen)
Note that the key_hash of a data cookie is used to determine the cachefile location. Hence convert to the same key_hash as mainline when the new location is used.
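For illustration, a standalone sketch of the volume-key buffer layout adopted here: a length byte, then the key, zero-padded to a whole number of 32-bit words, with the resulting volume hash salting the data-key hash. toy_hash() is a placeholder mixer, not fscache_hash(), and all names are local to the example:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder mixer; the kernel hashes __le32 words with fscache_hash(). */
static uint32_t toy_hash(uint32_t salt, const uint32_t *buf, size_t n)
{
	uint32_t h = salt;

	while (n--)
		h = h * 0x61c88647u + *buf++;
	return h;
}

/* Volume cookie: hash 'keylen' byte + key, zero-padded to 32-bit words. */
static uint32_t volume_key_hash(const uint8_t *key, uint8_t keylen)
{
	size_t hlen = (1 + (size_t)keylen + 1 + 3) & ~(size_t)3; /* round_up */
	uint8_t *buf = calloc(1, hlen);
	uint32_t h;

	if (!buf)
		return 0;
	buf[0] = keylen;
	memcpy(buf + 1, key, keylen);
	h = toy_hash(0, (const uint32_t *)buf, hlen / 4);
	free(buf);
	return h;
}

int main(void)
{
	const uint8_t vkey[] = "erofs,test.img";
	uint32_t vh = volume_key_hash(vkey, sizeof(vkey) - 1);
	uint32_t dword;

	/* Data cookie: the volume hash salts the hash of the data key. */
	memcpy(&dword, "blob", 4);
	printf("volume=%08x data=%08x\n", vh, toy_hash(vh, &dword, 1));
	return 0;
}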
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/cookie.c | 47 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index e6341f85d9f3..fd1a05821a6c 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -65,6 +65,38 @@ void fscache_free_cookie(struct fscache_cookie *cookie) } }
+static int fscache_set_volume_key_hash(struct fscache_cookie *cookie, u32 *buf) +{ + u8 *key; + size_t hlen = round_up(1 + cookie->key_len + 1, sizeof(__le32)); + + key = kzalloc(hlen, GFP_KERNEL); + if (!key) + return -ENOMEM; + + key[0] = cookie->key_len; + memcpy(key + 1, buf, cookie->key_len); + cookie->key_hash = fscache_hash(0, (u32 *)key, hlen / sizeof(__le32)); + kfree(key); + + return 0; +} + +static int fscache_set_key_hash(struct fscache_cookie *cookie, u32 *buf, + int bufs) +{ + unsigned int salt = 0; + + if (volume_new_location(cookie)) + return fscache_set_volume_key_hash(cookie, buf); + + if (data_new_location(cookie)) + salt = cookie->parent->key_hash; + + cookie->key_hash = fscache_hash(salt, buf, bufs); + return 0; +} + /* * Set the index key in a cookie. The cookie struct has space for a 16-byte * key plus length and hash, but if that's not big enough, it's instead a @@ -76,6 +108,7 @@ static int fscache_set_key(struct fscache_cookie *cookie, { u32 *buf; int bufs; + int ret;
bufs = DIV_ROUND_UP(index_key_len, sizeof(*buf));
@@ -89,8 +122,12 @@ static int fscache_set_key(struct fscache_cookie *cookie, }
memcpy(buf, index_key, index_key_len); - cookie->key_hash = fscache_hash(0, buf, bufs); - return 0; + ret = fscache_set_key_hash(cookie, buf, bufs); + if (ret && index_key_len > sizeof(cookie->inline_key)) { + kfree(cookie->key); + cookie->key = NULL; + } + return ret; }
static long fscache_compare_cookie(const struct fscache_cookie *a, @@ -137,6 +174,9 @@ struct fscache_cookie *fscache_alloc_cookie(
cookie->key_len = index_key_len; cookie->aux_len = aux_data_len; + cookie->def = def; + cookie->parent = parent; + cookie->type = def->type;
if (fscache_set_key(cookie, index_key, index_key_len) < 0) goto nomem; @@ -157,12 +197,9 @@ struct fscache_cookie *fscache_alloc_cookie( */ atomic_set(&cookie->n_active, 1);
- cookie->def = def; - cookie->parent = parent; cookie->collision = NULL; cookie->netfs_data = netfs_data; cookie->flags = (1 << FSCACHE_COOKIE_NO_DATA_YET); - cookie->type = def->type; spin_lock_init(&cookie->lock); spin_lock_init(&cookie->stores_lock); INIT_HLIST_HEAD(&cookie->backing_objects);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
For each cookie, the cache dir/file name is combined from:
type + acc + key
The 'acc' component doesn't exist in mainline. There are no functional changes for now; this prepares to keep the cachefile location the same as mainline (see the sketch below).
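A standalone sketch of what the factored-out helper does: it emits 12 bits of the 16-bit length prefix as two base-64 characters, high bits first. The 64-character map mirrors the kernel's cachefiles_charmap; everything else is local to the example:

#include <stdio.h>

/* Mirrors the kernel's cachefiles_charmap (base-64 filename alphabet). */
static const char charmap[64] =
	"0123456789"
	"abcdefghijklmnopqrstuvwxyz"
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"_-";

/* Mirror of cachefiles_cook_acc(): two 6-bit chars, high bits first. */
static void cook_acc(char *key, unsigned int acc, int *len)
{
	key[*len + 1] = charmap[acc & 63];
	acc >>= 6;
	key[*len] = charmap[acc & 63];
	*len += 2;
}

int main(void)
{
	char key[8] = { 0 };
	int len = 0;

	cook_acc(key, 0x0123, &len);	/* a sample 16-bit length prefix */
	printf("acc chars: %.2s, len=%d\n", key, len);
	return 0;
}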
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index be96f5fc5cac..c43808c82de0 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -22,6 +22,14 @@ static const char cachefiles_filecharmap[256] = { [48 ... 127] = 1, /* '0' -> '~' */ };
+static void cachefiles_cook_acc(char *key, unsigned int acc, int *len) +{ + key[*len + 1] = cachefiles_charmap[acc & 63]; + acc >>= 6; + key[*len] = cachefiles_charmap[acc & 63]; + *len += 2; +} + /* * turn the raw key into something cooked * - the raw key should include the length in the two bytes at the front @@ -86,14 +94,8 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) mark = len - 1;
if (print) { - acc = *(uint16_t *) raw; + cachefiles_cook_acc(key, *(uint16_t *) raw, &len); raw += 2; - - key[len + 1] = cachefiles_charmap[acc & 63]; - acc >>= 6; - key[len] = cachefiles_charmap[acc & 63]; - len += 2; - seg = 250; for (loop = keylen; loop > 0; loop--) { if (seg <= 0) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The cachefile path is combined from:

hulk-5.10:  index cookie + data cookie
linux-next: volume + cookie
1) for an index cookie, the cache dir is combined from csum + cache dir;
2) for a volume from mainline, the cache dir doesn't contain the csum;
3) for a data cookie, the cache file is combined from csum + cache file;
On the one hand, the way the csum is generated differs from mainline; on the other hand, in mainline the csum only exists for data cookies. There are no functional changes for now; this prepares to keep the cachefile location the same as mainline (see the sketch below).
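A small sketch of the legacy csum prefix being factored out here: an 8-bit byte-wise sum of the raw key, rendered as "@xx" plus a NUL and '+' (5 bytes). Names and the sample key are local to the example:

#include <stdio.h>
#include <string.h>

/*
 * Mirror of the factored-out cachefiles_cook_csum(): an 8-bit sum of
 * the raw key bytes, rendered as "@xx" plus a NUL and '+' (5 bytes).
 */
static int cook_csum(const unsigned char *raw, int keylen, char *key)
{
	unsigned char csum = 0;
	int loop;

	for (loop = 0; loop < keylen; loop++)
		csum += raw[loop];
	sprintf(key, "@%02x%c+", (unsigned int)csum, 0);
	return 5;
}

int main(void)
{
	const unsigned char raw[] = "erofs1.img";	/* sample key bytes */
	char key[8];
	int len = cook_csum(raw, (int)strlen((const char *)raw), key);

	/* the embedded NUL ends the string after "@xx" */
	printf("prefix=%s cooked-len=%d\n", key, len);
	return 0;
}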
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index c43808c82de0..1e2f840dabaa 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -30,6 +30,18 @@ static void cachefiles_cook_acc(char *key, unsigned int acc, int *len) *len += 2; }
+static int cachefiles_cook_csum(const u8 *raw, int keylen, char *key) +{ + unsigned char csum = 0; + int loop; + + for (loop = 0; loop < keylen; loop++) + csum += raw[loop]; + sprintf(key, "@%02x%c+", (unsigned int) csum, 0); + + return 5; +} + /* * turn the raw key into something cooked * - the raw key should include the length in the two bytes at the front @@ -40,7 +52,6 @@ static void cachefiles_cook_acc(char *key, unsigned int acc, int *len) */ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) { - unsigned char csum, ch; unsigned int acc; char *key; int loop, len, max, seg, mark, print; @@ -49,13 +60,9 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type)
BUG_ON(keylen < 2 || keylen > 514);
- csum = raw[0] + raw[1]; print = 1; - for (loop = 2; loop < keylen; loop++) { - ch = raw[loop]; - csum += ch; - print &= cachefiles_filecharmap[ch]; - } + for (loop = 2; loop < keylen; loop++) + print &= cachefiles_filecharmap[raw[loop]];
if (print) { /* if the path is usable ASCII, then we render it directly */ @@ -86,11 +93,7 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) if (!key) return NULL;
- len = 0; - - /* build the cooked key */ - sprintf(key, "@%02x%c+", (unsigned) csum, 0); - len = 5; + len = cachefiles_cook_csum(raw, keylen, key); mark = len - 1;
if (print) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently, the cooked cookie key is combined from:
csum + acc + key
For 'key', if it contains a character such as a space, cachefiles_cook_key() will encode it with the charmap instead of using the ASCII directly. However, in mainline this encoding is only used for data cookies, and a volume always uses the ASCII directly.
Hence, if new_location is set, use ASCII directly for the volume cookie to keep the cachefile location the same as mainline.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 2 +- fs/cachefiles/internal.h | 3 ++- fs/cachefiles/key.c | 11 ++++++++--- 3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 5eedd6382737..a13089d06e16 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -77,7 +77,7 @@ static struct fscache_object *cachefiles_alloc_object( ((char *)buffer)[keylen + 4] = 0;
/* turn the raw key into something that can work with as a filename */ - key = cachefiles_cook_key(buffer, keylen + 2, object->type); + key = cachefiles_cook_key(object, buffer, keylen + 2); if (!key) goto nomem_key;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index f975042c1658..510a895efe7a 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -195,7 +195,8 @@ extern const struct fscache_cache_ops cachefiles_cache_ops; /* * key.c */ -extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type); +extern char *cachefiles_cook_key(struct cachefiles_object *object, + const u8 *raw, int keylen);
/* * namei.c diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 1e2f840dabaa..f94847a44ab5 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -50,19 +50,24 @@ static int cachefiles_cook_csum(const u8 *raw, int keylen, char *key) * cooked * - need to cut the cooked key into 252 char lengths (189 raw bytes) */ -char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) +char *cachefiles_cook_key(struct cachefiles_object *object, + const u8 *raw, int keylen) { unsigned int acc; char *key; int loop, len, max, seg, mark, print; + uint8_t type = object->type; + struct fscache_cookie *cookie = object->fscache.cookie;
_enter(",%d", keylen);
BUG_ON(keylen < 2 || keylen > 514);
print = 1; - for (loop = 2; loop < keylen; loop++) - print &= cachefiles_filecharmap[raw[loop]]; + if (!volume_new_location(cookie)) { + for (loop = 2; loop < keylen; loop++) + print &= cachefiles_filecharmap[raw[loop]]; + }
if (print) { /* if the path is usable ASCII, then we render it directly */
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently, the cooked cookie key is combined from:
csum + acc + key
However, the 'acc' component doesn't exist in mainline anymore; hence, if new_location is set, skip the acc to keep the cachefile location the same as mainline.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index f94847a44ab5..53643600fd1b 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -102,7 +102,8 @@ char *cachefiles_cook_key(struct cachefiles_object *object, mark = len - 1;
if (print) { - cachefiles_cook_acc(key, *(uint16_t *) raw, &len); + if (!volume_new_location(cookie) && !data_new_location(cookie)) + cachefiles_cook_acc(key, *(uint16_t *) raw, &len); raw += 2; seg = 250; for (loop = keylen; loop > 0; loop--) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently, the cooked cookie key is combined from:
csum + acc + key
However, in mainline the csum only exists for data cookies; hence, if new_location is set, skip the csum for volumes to keep the cachefile location the same as mainline.
Note that the csum construction for data cookies also differs from mainline; the following patches will adapt this.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 53643600fd1b..89dac9b90c45 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -30,11 +30,15 @@ static void cachefiles_cook_acc(char *key, unsigned int acc, int *len) *len += 2; }
-static int cachefiles_cook_csum(const u8 *raw, int keylen, char *key) +static int cachefiles_cook_csum(struct fscache_cookie *cookie, const u8 *raw, + int keylen, char *key) { unsigned char csum = 0; int loop;
+ if (volume_new_location(cookie)) + return 1; + for (loop = 0; loop < keylen; loop++) csum += raw[loop]; sprintf(key, "@%02x%c+", (unsigned int) csum, 0); @@ -98,7 +102,7 @@ char *cachefiles_cook_key(struct cachefiles_object *object, if (!key) return NULL;
- len = cachefiles_cook_csum(raw, keylen, key); + len = cachefiles_cook_csum(cookie, raw, keylen, key); mark = len - 1;
if (print) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The cachefile name of a data cookie is combined from:

hulk-5.10: csum + acc + key
mainline:  key_hash + key

Hence convert to use the key_hash as the csum, so that the cachefile location is the same as mainline.
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 89dac9b90c45..2895ab1a38da 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -39,8 +39,12 @@ static int cachefiles_cook_csum(struct fscache_cookie *cookie, const u8 *raw, if (volume_new_location(cookie)) return 1;
- for (loop = 0; loop < keylen; loop++) - csum += raw[loop]; + if (data_new_location(cookie)) { + csum = (u8)cookie->key_hash; + } else { + for (loop = 0; loop < keylen; loop++) + csum += raw[loop]; + } sprintf(key, "@%02x%c+", (unsigned int) csum, 0);
return 5;
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
If the input key name contains unprintable characters, the treatment in mainline is totally different. Hence back-port the mainline treatment for the new_location case, so that the cachefile location is the same as mainline (see the sketch below).
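A condensed, standalone sketch of the mainline naming decision this patch imports: the key is viewed as 32-bit words, the shorter of the big-endian/little-endian hex renderings is compared against the base64 length, and the cheaper encoding wins ('S'/'T' for hex chunks, 'E' for base64, matching the type characters in the diff below). The helpers here are local to the example, not the kernel's:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of hex digits needed for x, rounded up to a nibble boundary. */
static unsigned int how_many_hex_digits(unsigned int x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 4;
	}
	return n;
}

/* Decide between "S<hex>,<hex>..."/"T..." and base64 ("E<pad>..."). */
static char pick_encoding(const uint8_t *key, int keylen)
{
	unsigned int nbe = 0, nle = 0, b64len;
	int i, n = (keylen + 3) & ~3;	/* round up to whole 32-bit words */
	uint8_t padded[64] = { 0 };	/* demo keys up to 60 bytes */

	memcpy(padded, key, keylen);
	for (i = 0; i < n; i += 4) {
		uint32_t be = (uint32_t)padded[i] << 24 | padded[i + 1] << 16 |
			      padded[i + 2] << 8 | padded[i + 3];
		uint32_t le = (uint32_t)padded[i + 3] << 24 | padded[i + 2] << 16 |
			      padded[i + 1] << 8 | padded[i];

		nbe += 1 + how_many_hex_digits(be);	/* +1 for separator */
		nle += 1 + how_many_hex_digits(le);
	}
	b64len = 2 + 4 * ((keylen + 2) / 3);	/* 'E' + pad + 4 chars/3 bytes */

	if (nbe < b64len || nle < b64len)
		return (nbe <= nle) ? 'S' : 'T';	/* hex chunks, BE or LE */
	return 'E';					/* base64 */
}

int main(void)
{
	const uint8_t sparse[8] = { 1, 0, 0, 0, 2, 0, 0, 0 };
	const uint8_t dense[] = "b l o b 1.img";

	printf("sparse -> %c\n", pick_encoding(sparse, 8));	/* hex wins */
	printf("dense  -> %c\n", pick_encoding(dense, 13));	/* base64 */
	return 0;
}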
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 110 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 2895ab1a38da..772ad69c49ba 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -22,6 +22,11 @@ static const char cachefiles_filecharmap[256] = { [48 ... 127] = 1, /* '0' -> '~' */ };
+static inline unsigned int how_many_hex_digits(unsigned int x) +{ + return x ? round_up(ilog2(x) + 1, 4) / 4 : 0; +} + static void cachefiles_cook_acc(char *key, unsigned int acc, int *len) { key[*len + 1] = cachefiles_charmap[acc & 63]; @@ -50,6 +55,85 @@ static int cachefiles_cook_csum(struct fscache_cookie *cookie, const u8 *raw, return 5; }
+static char *cachefiles_cook_data_key(const u8 *key, int keylen) +{ + const u8 *kend; + unsigned int acc, i, n, nle, nbe; + unsigned int b64len, len, pad; + char *name, sep; + + /* See if it makes sense to encode it as "hex,hex,hex" for each 32-bit + * chunk. We rely on the key having been padded out to a whole number + * of 32-bit words. + */ + n = round_up(keylen, 4); + nbe = nle = 0; + for (i = 0; i < n; i += 4) { + u32 be = be32_to_cpu(*(__be32 *)(key + i)); + u32 le = le32_to_cpu(*(__le32 *)(key + i)); + + nbe += 1 + how_many_hex_digits(be); + nle += 1 + how_many_hex_digits(le); + } + + b64len = DIV_ROUND_UP(keylen, 3); + pad = b64len * 3 - keylen; + b64len = 2 + b64len * 4; /* Length if we base64-encode it */ + _debug("len=%u nbe=%u nle=%u b64=%u", keylen, nbe, nle, b64len); + if (nbe < b64len || nle < b64len) { + unsigned int nlen = min(nbe, nle) + 1; + + name = kmalloc(nlen, GFP_KERNEL); + if (!name) + return NULL; + sep = (nbe <= nle) ? 'S' : 'T'; /* Encoding indicator */ + len = 0; + for (i = 0; i < n; i += 4) { + u32 x; + + if (nbe <= nle) + x = be32_to_cpu(*(__be32 *)(key + i)); + else + x = le32_to_cpu(*(__le32 *)(key + i)); + name[len++] = sep; + if (x != 0) + len += snprintf(name + len, nlen - len, "%x", x); + sep = ','; + } + name[len] = 0; + return name; + } + + /* We need to base64-encode it */ + name = kmalloc(b64len + 1, GFP_KERNEL); + if (!name) + return NULL; + + name[0] = 'E'; + name[1] = '0' + pad; + len = 2; + kend = key + keylen; + do { + acc = *key++; + if (key < kend) { + acc |= *key++ << 8; + if (key < kend) + acc |= *key++ << 16; + } + + name[len++] = cachefiles_charmap[acc & 63]; + acc >>= 6; + name[len++] = cachefiles_charmap[acc & 63]; + acc >>= 6; + name[len++] = cachefiles_charmap[acc & 63]; + acc >>= 6; + name[len++] = cachefiles_charmap[acc & 63]; + } while (key < kend); + + name[len] = 0; + return name; +} + /* * turn the raw key into something cooked * - the raw key should include the length in the two bytes at the front @@ -86,6 +170,9 @@ char *cachefiles_cook_key(struct cachefiles_object *object, * is ((514 + 251) / 252) = 3 */ max += 1; /* NUL on end */ + } else if (data_new_location(cookie)) { + max = 5; /* @checksum/M */ + max += 1; /* NUL on end */ } else { /* calculate the maximum length of the cooked key */ keylen = (keylen + 2) / 3; @@ -131,6 +218,29 @@ char *cachefiles_cook_key(struct cachefiles_object *object, case FSCACHE_COOKIE_TYPE_DATAFILE: type = 'D'; break; default: type = 'S'; break; } + } else if (data_new_location(cookie)) { + int nlen; + char *name = cachefiles_cook_data_key(raw + 2, keylen - 2); + char *new_key; + + if (!name) { + kfree(key); + return NULL; + } + + nlen = max + strlen(name) - 1; + new_key = krealloc(key, nlen, GFP_KERNEL); + if (!new_key) { + kfree(key); + kfree(name); + return NULL; + } + + key = new_key; + type = name[0]; + for (loop = 1; loop < strlen(name); loop++) + key[len++] = name[loop]; + kfree(name); } else { seg = 252; for (loop = keylen; loop > 0; loop--) {
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
To avoid the 5.10 upgrade causing existing caches to be re-downloaded due to a different directory structure, first remove the "@24/I05erofs" directory by removing erofs_fscache_netfs.
before:   .../cache/@24/I05erofs/@16/I0gerofs,erofs1.img/@c5/D0aerofs1.img
after:    .../cache/@16/I0gerofs,erofs1.img/@c5/D0aerofs1.img
mainline: .../cache/Ierofs,erofs1.img/@ba/Derofs1.img
Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Yu Kuai yukuai3@huawei.com --- fs/erofs/fscache.c | 18 ++---------------- fs/erofs/internal.h | 7 ------- fs/erofs/super.c | 7 ------- 3 files changed, 2 insertions(+), 30 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index e6fa21ce686f..f0b9815f56f6 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -4,6 +4,7 @@ */ #include <linux/pseudo_fs.h> #include <linux/fscache.h> +#include <linux/fscache-cache.h> #include <linux/mount.h> #include "internal.h"
@@ -25,21 +26,6 @@ static struct file_system_type erofs_anon_fs_type = { .kill_sb = kill_anon_super, };
-struct fscache_netfs erofs_fscache_netfs = { - .name = "erofs", - .version = 0, -}; - -int erofs_fscache_register(void) -{ - return fscache_register_netfs(&erofs_fscache_netfs); -} - -void erofs_fscache_unregister(void) -{ - fscache_unregister_netfs(&erofs_fscache_netfs); -} - const struct fscache_cookie_def erofs_fscache_super_index_def = { .name = "EROFS.super", .type = FSCACHE_COOKIE_TYPE_INDEX, @@ -319,7 +305,7 @@ static int erofs_fscache_register_volume(struct super_block *sb) if (!name) return -ENOMEM;
- volume = fscache_acquire_cookie(erofs_fscache_netfs.primary_index, + volume = fscache_acquire_cookie(&fscache_fsdef_index, &erofs_fscache_super_index_def, name, strlen(name), NULL, 0, NULL, 0, true); if (IS_ERR_OR_NULL(volume)) { diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index f53d7042a4a7..f961256bd469 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -501,8 +501,6 @@ static inline void z_erofs_exit_zip_subsystem(void) {}
/* fscache.c */ #ifdef CONFIG_EROFS_FS_ONDEMAND -int erofs_fscache_register(void); -void erofs_fscache_unregister(void); int erofs_fscache_register_fs(struct super_block *sb); void erofs_fscache_unregister_fs(struct super_block *sb);
@@ -512,11 +510,6 @@ struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb, void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache); extern const struct address_space_operations erofs_fscache_access_aops; #else -static inline int erofs_fscache_register(void) -{ - return 0; -} -static inline void erofs_fscache_unregister(void) {} static inline int erofs_fscache_register_fs(struct super_block *sb) { return -EOPNOTSUPP; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 769b32cd2cae..eb9f7b71ec14 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -753,10 +753,6 @@ static int __init erofs_module_init(void) if (err) goto zip_err;
- err = erofs_fscache_register(); - if (err) - goto fscache_err; - err = register_filesystem(&erofs_fs_type); if (err) goto fs_err; @@ -764,8 +760,6 @@ static int __init erofs_module_init(void) return 0;
fs_err: - erofs_fscache_unregister(); -fscache_err: z_erofs_exit_zip_subsystem(); zip_err: erofs_exit_shrinker(); @@ -778,7 +772,6 @@ static int __init erofs_module_init(void) static void __exit erofs_module_exit(void) { unregister_filesystem(&erofs_fs_type); - erofs_fscache_unregister(); z_erofs_exit_zip_subsystem(); erofs_exit_shrinker();
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
So that the cachefile location is the same as in the mainline kernel, and the cachefile can still be used after a kernel upgrade.
test 1) domain_id="domain", fsid="test.img", device="blob1.img"
- hulk-5.10:
  ./@24/I05erofs/@cf/I0cerofs,domain/@33/D08test.img
  ./@24/I05erofs/@cf/I0cerofs,domain/@44/D09blob1.img
- hulk-5.10 with this set:
  ./Ierofs,domain/@ff/Dtest.img
  ./Ierofs,domain/@d6/Dblob1.img
- mainline:
  ./Ierofs,domain/@ff/Dtest.img
  ./Ierofs,domain/@d6/Dblob1.img
test 2) fsid="test.img", device="blob1.img"
- hulk-5.10:
  ./@24/I05erofs/@84/I0eerofs,test.img/@33/D08test.img
  ./@24/I05erofs/@84/I0eerofs,test.img/@44/D09blob1.img
- hulk-5.10 with this set:
  ./Ierofs,test.img/@f7/Dtest.img
  ./Ierofs,test.img/@cb/Dblob1.img
- mainline:
  ./Ierofs,test.img/@f7/Dtest.img
  ./Ierofs,test.img/@cb/Dblob1.img
test 3) domain_id="d o m a i n", fsid="t e s t.img", device="b l o b 1.img"
- hulk-5.10:
  ./@24/I05erofs/@74/Jh0gpOZCpPN2pwY68J1iowA68K100/@96/Eb00twk68P12tKAmrD100
  ./@24/I05erofs/@74/Jh0gpOZCpPN2pwY68J1iowA68K100/@c8/Ed0wowM68L1yow4zbFRSp
- hulk-5.10 with this set:
  ./Ierofs,d o m a i n/@9a/E1Q1ipwc78QViqJt60
  ./Ierofs,d o m a i n/@e8/E2y12rwY68y1icKAmrD100
- mainline:
  ./Ierofs,d o m a i n/@9a/E1Q1ipwc78QViqJt60
  ./Ierofs,d o m a i n/@e8/E2y12rwY68y1icKAmrD100
test 4) fsid="t e s t.img", device="b l o b 1.img"
- hulk-5.10:
  ./@24/I05erofs/@e7/Jh0gpOZCpPN2twk68P12tKAmrD100/@96/Eb00twk68P12tKAmrD100
  ./@24/I05erofs/@e7/Jh0gpOZCpPN2twk68P12tKAmrD100/@c8/Ed0wowM68L1yow4zbFRSp
- hulk-5.10 with this set
  ./Ierofs,t e s t.img/@18/E1Q1ipwc78QViqJt60
  ./Ierofs,t e s t.img/@86/E2y12rwY68y1icKAmrD100
- mainline
  ./Ierofs,t e s t.img/@18/E1Q1ipwc78QViqJt60
  ./Ierofs,t e s t.img/@86/E2y12rwY68y1icKAmrD100
Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index f0b9815f56f6..2389b50c7906 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -30,6 +30,7 @@ const struct fscache_cookie_def erofs_fscache_super_index_def = { .name = "EROFS.super", .type = FSCACHE_COOKIE_TYPE_INDEX, .check_aux = NULL, + .new_location = true, };
const struct fscache_cookie_def erofs_fscache_inode_object_def = {
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
So that the same helper functions can be used when determining whether the new version of the xattr is needed.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/key.c | 12 ++++++------ fs/erofs/fscache.c | 2 +- fs/fscache/cookie.c | 4 ++-- include/linux/fscache.h | 14 +++++++------- 4 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 772ad69c49ba..d1adeb58f35c 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -41,10 +41,10 @@ static int cachefiles_cook_csum(struct fscache_cookie *cookie, const u8 *raw, unsigned char csum = 0; int loop;
- if (volume_new_location(cookie)) + if (volume_new_version(cookie)) return 1;
- if (data_new_location(cookie)) { + if (data_new_version(cookie)) { csum = (u8)cookie->key_hash; } else { for (loop = 0; loop < keylen; loop++) @@ -156,7 +156,7 @@ char *cachefiles_cook_key(struct cachefiles_object *object, BUG_ON(keylen < 2 || keylen > 514);
print = 1; - if (!volume_new_location(cookie)) { + if (!volume_new_version(cookie)) { for (loop = 2; loop < keylen; loop++) print &= cachefiles_filecharmap[raw[loop]]; } @@ -170,7 +170,7 @@ char *cachefiles_cook_key(struct cachefiles_object *object, * is ((514 + 251) / 252) = 3 */ max += 1; /* NUL on end */ - } else if (data_new_location(cookie)) { + } else if (data_new_version(cookie)) { max = 5; /* @checksum/M */ max += 1; /* NUL on end */ } else { @@ -197,7 +197,7 @@ char *cachefiles_cook_key(struct cachefiles_object *object, mark = len - 1;
if (print) { - if (!volume_new_location(cookie) && !data_new_location(cookie)) + if (!volume_new_version(cookie) && !data_new_version(cookie)) cachefiles_cook_acc(key, *(uint16_t *) raw, &len); raw += 2; seg = 250; @@ -218,7 +218,7 @@ char *cachefiles_cook_key(struct cachefiles_object *object, case FSCACHE_COOKIE_TYPE_DATAFILE: type = 'D'; break; default: type = 'S'; break; } - } else if (data_new_location(cookie)) { + } else if (data_new_version(cookie)) { int nlen; char *name = cachefiles_cook_data_key(raw + 2, keylen - 2); char *new_key; diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 2389b50c7906..17640adf574d 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -30,7 +30,7 @@ const struct fscache_cookie_def erofs_fscache_super_index_def = { .name = "EROFS.super", .type = FSCACHE_COOKIE_TYPE_INDEX, .check_aux = NULL, - .new_location = true, + .new_version = true, };
const struct fscache_cookie_def erofs_fscache_inode_object_def = { diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index fd1a05821a6c..bd4a6734d163 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -87,10 +87,10 @@ static int fscache_set_key_hash(struct fscache_cookie *cookie, u32 *buf, { unsigned int salt = 0;
- if (volume_new_location(cookie)) + if (volume_new_version(cookie)) return fscache_set_volume_key_hash(cookie, buf);
- if (data_new_location(cookie)) + if (data_new_version(cookie)) salt = cookie->parent->key_hash;
cookie->key_hash = fscache_hash(salt, buf, bufs); diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 40ede2ff29dc..d4af91675c3c 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -71,10 +71,10 @@ struct fscache_cookie_def { #define FSCACHE_COOKIE_TYPE_DATAFILE 1
/* - * Used for index cookie. If set, the location of cachefile will be the - * same as mainline kernel v5.18+. + * Used for index cookie. If set, the location/xattr of cachefiles + * will be the same as mainline kernel v5.18+. */ - bool new_location; + bool new_version;
/* select the cache into which to insert an entry in this index * - optional @@ -882,18 +882,18 @@ void fscache_enable_cookie(struct fscache_cookie *cookie, can_enable, data); }
-static inline bool volume_new_location(struct fscache_cookie *cookie) +static inline bool volume_new_version(struct fscache_cookie *cookie) { return cookie->def && cookie->type == FSCACHE_COOKIE_TYPE_INDEX && - cookie->def->new_location; + cookie->def->new_version; }
-static inline bool data_new_location(struct fscache_cookie *cookie) +static inline bool data_new_version(struct fscache_cookie *cookie) { if (cookie->type != FSCACHE_COOKIE_TYPE_DATAFILE) return false;
- return cookie->parent && volume_new_location(cookie->parent); + return cookie->parent && volume_new_version(cookie->parent); }
#endif /* _LINUX_FSCACHE_H */
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When upgrading the kernel from 5.10 to 6.6, data that has already been cached is re-downloaded because the two versions use inconsistent xattr formats. To avoid this problem, make 5.10 also use the mainline xattr format in cachefiles ondemand mode.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/xattr.c | 179 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 147 insertions(+), 32 deletions(-)
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c index 72e42438f3d7..4aadace026d2 100644 --- a/fs/cachefiles/xattr.c +++ b/fs/cachefiles/xattr.c @@ -18,6 +18,29 @@ static const char cachefiles_xattr_cache[] = XATTR_USER_PREFIX "CacheFiles.cache";
+#define CACHEFILES_COOKIE_TYPE_DATA 1 +#define CACHEFILES_CONTENT_NO_DATA 0 /* No content stored */ + +struct cachefiles_obj_xattr { + __be64 object_size; /* Actual size of the object */ + __be64 zero_point; /* always zero */ + __u8 type; /* Type of object */ + __u8 content; /* always zero */ + __u8 data[]; /* netfs coherency data, always NULL */ +} __packed; + +struct cachefiles_vol_xattr { + __be32 reserved; /* Reserved, should be 0 */ + __u8 data[]; /* netfs volume coherency data, NULL */ +} __packed; + +struct cachefiles_vol_xattr new_vol_xattr; + +static int cachefiles_set_new_vol_xattr(struct cachefiles_object *object); +static int cachefiles_check_new_vol_xattr(struct cachefiles_object *object); +static int cachefiles_set_new_obj_xattr(struct cachefiles_object *object); +static int cachefiles_check_new_obj_xattr(struct cachefiles_object *object); + /* * check the type label on an object * - done using xattrs @@ -110,9 +133,14 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object, _debug("SET #%u", auxdata->len);
clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags); - ret = vfs_setxattr(dentry, cachefiles_xattr_cache, - &auxdata->type, auxdata->len, - XATTR_CREATE); + if (data_new_version(object->fscache.cookie)) + ret = cachefiles_set_new_obj_xattr(object); + else if (volume_new_version(object->fscache.cookie)) + ret = cachefiles_set_new_vol_xattr(object); + else + ret = vfs_setxattr(dentry, cachefiles_xattr_cache, + &auxdata->type, auxdata->len, + XATTR_CREATE); if (ret < 0 && ret != -ENOMEM) cachefiles_io_error_obj( object, @@ -190,48 +218,30 @@ int cachefiles_check_auxdata(struct cachefiles_object *object) return ret; }
-/* - * check the state xattr on a cache file - * - return -ESTALE if the object should be deleted - */ -int cachefiles_check_object_xattr(struct cachefiles_object *object, - struct cachefiles_xattr *auxdata) +int cachefiles_check_old_object_xattr(struct cachefiles_object *object, + struct cachefiles_xattr *auxdata) { struct cachefiles_xattr *auxbuf; + unsigned int len = sizeof(struct cachefiles_xattr) + 512; struct dentry *dentry = object->dentry; int ret;
- _enter("%p,#%d", object, auxdata->len); - - ASSERT(dentry); - ASSERT(d_backing_inode(dentry)); - - auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, cachefiles_gfp); - if (!auxbuf) { - _leave(" = -ENOMEM"); + auxbuf = kmalloc(len, cachefiles_gfp); + if (!auxbuf) return -ENOMEM; - }
/* read the current type label */ ret = vfs_getxattr(dentry, cachefiles_xattr_cache, &auxbuf->type, 512 + 1); - if (ret < 0) { - if (ret == -ENODATA) - goto stale; /* no attribute - power went off - * mid-cull? */ - - if (ret == -ERANGE) - goto bad_type_length; - - cachefiles_io_error_obj(object, - "Can't read xattr on %lu (err %d)", - d_backing_inode(dentry)->i_ino, -ret); + if (ret < 0) goto error; - }
/* check the on-disk object */ - if (ret < 1) - goto bad_type_length; + if (ret < 1) { + pr_err("Cache object %lu xattr length incorrect\n", + d_backing_inode(dentry)->i_ino); + goto stale; + }
if (auxbuf->type != auxdata->type) goto stale; @@ -287,6 +297,51 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object,
error: kfree(auxbuf); + return ret; + +stale: + ret = -ESTALE; + goto error; +} + +/* + * check the state xattr on a cache file + * - return -ESTALE if the object should be deleted + */ +int cachefiles_check_object_xattr(struct cachefiles_object *object, + struct cachefiles_xattr *auxdata) +{ + int ret; + struct dentry *dentry = object->dentry; + + _enter("%p,#%d", object, auxdata->len); + + ASSERT(dentry); + ASSERT(d_backing_inode(dentry)); + + if (data_new_version(object->fscache.cookie)) + ret = cachefiles_check_new_obj_xattr(object); + else if (volume_new_version(object->fscache.cookie)) + ret = cachefiles_check_new_vol_xattr(object); + else + ret = cachefiles_check_old_object_xattr(object, auxdata); + + if (ret < 0) { + if (ret == -ENOMEM || ret == -ESTALE) + goto error; + /* no attribute - power went off mid-cull? */ + if (ret == -ENODATA) + goto stale; + if (ret == -ERANGE) + goto bad_type_length; + + cachefiles_io_error_obj(object, + "Can't read xattr on %lu (err %d)", + d_backing_inode(dentry)->i_ino, -ret); + goto error; + } + ret = 0; +error: _leave(" = %d", ret); return ret;
@@ -323,3 +378,63 @@ int cachefiles_remove_object_xattr(struct cachefiles_cache *cache, _leave(" = %d", ret); return ret; } + +static int cachefiles_set_new_vol_xattr(struct cachefiles_object *object) +{ + unsigned int len = sizeof(struct cachefiles_vol_xattr); + struct dentry *dentry = object->dentry; + + return vfs_setxattr(dentry, cachefiles_xattr_cache, &new_vol_xattr, + len, XATTR_CREATE); +} + +static int cachefiles_check_new_vol_xattr(struct cachefiles_object *object) +{ + int ret; + struct cachefiles_vol_xattr buf; + unsigned int len = sizeof(struct cachefiles_vol_xattr); + struct dentry *dentry = object->dentry; + + ret = vfs_getxattr(dentry, cachefiles_xattr_cache, &buf, len); + if (ret < 0) + return ret; + + if (ret != len || memcmp(&buf, &new_vol_xattr, len) != 0) + ret = -ESTALE; + + return ret > 0 ? 0 : ret; +} + +static int cachefiles_set_new_obj_xattr(struct cachefiles_object *object) +{ + unsigned int len = sizeof(struct cachefiles_obj_xattr); + struct dentry *dentry = object->dentry; + struct cachefiles_obj_xattr buf = { + .object_size = cpu_to_be64(object->fscache.store_limit_l), + .type = CACHEFILES_COOKIE_TYPE_DATA, + .content = CACHEFILES_CONTENT_NO_DATA, + }; + + return vfs_setxattr(dentry, cachefiles_xattr_cache, &buf, len, + XATTR_CREATE); +} + +static int cachefiles_check_new_obj_xattr(struct cachefiles_object *object) +{ + int ret; + struct cachefiles_obj_xattr buf; + unsigned int len = sizeof(struct cachefiles_obj_xattr); + struct dentry *dentry = object->dentry; + + ret = vfs_getxattr(dentry, cachefiles_xattr_cache, &buf, len); + if (ret < 0) + return ret; + + if (ret != len || + buf.type != CACHEFILES_COOKIE_TYPE_DATA || + buf.content != CACHEFILES_CONTENT_NO_DATA || + buf.object_size != cpu_to_be64(object->fscache.store_limit_l)) + ret = -ESTALE; + + return ret > 0 ? 0 : ret; +}
From: Gao Xiang hsiangkao@linux.alibaba.com
mainline inclusion from mainline-v6.1-rc1 commit 1dd73601a1cba37a0ed5f89a8662c90191df5873 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
As syzbot reported [1], the root cause is that the i_size field is a signed type, and a negative i_size is also less than EROFS_BLKSIZ. As a consequence, it's unexpectedly handled as a fast symlink.
Let's fall back to the generic path to deal with such unusual i_size.
[1] https://lore.kernel.org/r/000000000000ac8efa05e7feaa1f@google.com
Reported-by: syzbot+f966c13b1b4fc0403b19@syzkaller.appspotmail.com Fixes: 431339ba9042 ("staging: erofs: add inode operations") Reviewed-by: Yue Hu huyue2@coolpad.com Link: https://lore.kernel.org/r/20220909023948.28925-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index fc05960e228e..dbdee1964388 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -215,7 +215,7 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
/* if it cannot be handled with fast symlink scheme */ if (vi->datalayout != EROFS_INODE_FLAT_INLINE || - inode->i_size >= EROFS_BLKSIZ) { + inode->i_size >= EROFS_BLKSIZ || inode->i_size < 0) { inode->i_op = &erofs_symlink_iops; return 0; }
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
At present, the trace point in fscache_cookie_put() can access the cookie after it has been freed (UAF). The following is the process that triggers the issue:

process A                              process B
fscache_cookie_put
  atomic_dec_return(&cookie->usage)
                                       fscache_cookie_put
                                         atomic_dec_return(&cookie->usage)
                                         trace_fscache_cookie(cookie...)
                                         fscache_free_cookie
trace_fscache_cookie(cookie...)

After process B frees the cookie, process A calls the trace point and the cookie UAF occurs. Fix this by calling the trace point before decrementing cookie->usage.
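A minimal sketch of the safe ordering (hypothetical names, same pattern as the diff below): read the counter for tracing while our own reference still pins the object, and only then drop it.

	/* Our reference keeps obj alive across the trace call. */
	trace_obj_put(obj, where, atomic_read(&obj->ref));
	if (atomic_dec_and_test(&obj->ref))
		free_obj(obj);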
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/cookie.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index bd4a6734d163..422248fa55ca 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -930,9 +930,8 @@ void fscache_cookie_put(struct fscache_cookie *cookie, _enter("%p", cookie);
do { + trace_fscache_cookie(cookie, where, atomic_read(&cookie->usage)); usage = atomic_dec_return(&cookie->usage); - trace_fscache_cookie(cookie, where, usage); - if (usage > 0) return; BUG_ON(usage < 0);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The err_put_fd label is only used once, so remove it to make the code more readable.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 493b8b8675f7..968376ea9ae6 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -366,7 +366,10 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
if (copy_to_user(_buffer, msg, n) != 0) { ret = -EFAULT; - goto err_put_fd; + if (msg->opcode == CACHEFILES_OP_OPEN) + __close_fd(current->files, + ((struct cachefiles_open *)msg->data)->fd); + goto error; }
/* CLOSE request has no reply */ @@ -378,10 +381,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, } return n;
-err_put_fd: - if (msg->opcode == CACHEFILES_OP_OPEN) - __close_fd(current->files, - ((struct cachefiles_open *)msg->data)->fd); error: xa_lock(&cache->reqs); radix_tree_delete(&cache->reqs, id);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
We got the following issue in a fuzz test of randomly issuing the restore command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962

CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
 kasan_report+0x94/0xc0
 cachefiles_ondemand_daemon_read+0x609/0xab0
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 626:
 __kmalloc+0x1df/0x4b0
 cachefiles_ondemand_send_req+0x24d/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 626:
 kfree+0xf1/0x2c0
 cachefiles_ondemand_send_req+0x568/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_ondemand_init_object
 cachefiles_ondemand_send_req
  REQ_A = kzalloc(sizeof(*req) + data_len)
  wait_for_completion(&REQ_A->done)

      cachefiles_daemon_read
       cachefiles_ondemand_daemon_read
        REQ_A = cachefiles_ondemand_select_req
        cachefiles_ondemand_get_fd
        copy_to_user(_buffer, msg, n)
      process_open_req(REQ_A)
                   ------ restore ------
                   cachefiles_ondemand_restore
                    xas_for_each(&xas, req, ULONG_MAX)
                     xas_set_mark(&xas, CACHEFILES_REQ_NEW);

                   cachefiles_daemon_read
                    cachefiles_ondemand_daemon_read
                     REQ_A = cachefiles_ondemand_select_req

      write(devfd, ("copen %u,%llu", msg->msg_id, size));
      cachefiles_ondemand_copen
       xa_erase(&cache->reqs, id)
       complete(&REQ_A->done)
       kfree(REQ_A)
                     cachefiles_ondemand_get_fd(REQ_A)
                      fd = get_unused_fd_flags
                      file = anon_inode_getfile
                      fd_install(fd, file)
                      load = (void *)REQ_A->msg.data;
                      load->fd = fd;
                      // load UAF !!!
This issue is caused by issuing a restore command while the daemon is still alive, which results in a request being processed multiple times, thus triggering a UAF. To avoid this problem, add an additional reference count to cachefiles_req, which is held while waiting and reading, and released when the waiting and reading are over.
Note that since there is only one reference count for waiting, we need to avoid the same request being completed multiple times, so we can only complete the request if it is successfully removed from the xarray.
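For reference, a hedged userspace sketch of the trigger: the fuzzer simply writes the restore command to the control fd while the daemon is still alive (the command string follows the restore interface named in the Fixes tag; error handling elided):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int devfd = open("/dev/cachefiles", O_RDWR);

	/* Re-mark all inflight requests as NEW; a live daemon can then
	 * read (and process) the same request twice. */
	write(devfd, "restore", strlen("restore"));
	return 0;
}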
Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests") Suggested-by: Hou Tao houtao1@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 1 + fs/cachefiles/ondemand.c | 36 ++++++++++++++++++++---------------- 2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 510a895efe7a..e84a8a217ee3 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -123,6 +123,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache) struct cachefiles_req { struct cachefiles_object *object; struct completion done; + refcount_t ref; int error; struct cachefiles_msg msg; }; diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 968376ea9ae6..c1f8be89afca 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -9,6 +9,12 @@ static bool cachefiles_buffered_ondemand = true; module_param_named(buffered_ondemand, cachefiles_buffered_ondemand, bool, 0644);
+static inline void cachefiles_req_put(struct cachefiles_req *req) +{ + if (refcount_dec_and_test(&req->ref)) + kfree(req); +} + static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file) { @@ -349,6 +355,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
radix_tree_iter_tag_clear(&cache->reqs, &iter, CACHEFILES_REQ_NEW); cache->req_id_next = iter.index + 1; + refcount_inc(&req->ref); xa_unlock(&cache->reqs);
id = iter.index; @@ -357,7 +364,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, ret = cachefiles_ondemand_get_fd(req); if (ret) { cachefiles_ondemand_set_object_close(req->object); - goto error; + goto out; } }
@@ -369,25 +376,21 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, if (msg->opcode == CACHEFILES_OP_OPEN) __close_fd(current->files, ((struct cachefiles_open *)msg->data)->fd); - goto error; }
- /* CLOSE request has no reply */ - if (msg->opcode == CACHEFILES_OP_CLOSE) { +out: + /* Remove error request and CLOSE request has no reply */ + if (ret || msg->opcode == CACHEFILES_OP_CLOSE) { xa_lock(&cache->reqs); - radix_tree_delete(&cache->reqs, id); + if (radix_tree_lookup(&cache->reqs, id) == req) { + req->error = ret; + complete(&req->done); + radix_tree_delete(&cache->reqs, id); + } xa_unlock(&cache->reqs); - complete(&req->done); } - return n; - -error: - xa_lock(&cache->reqs); - radix_tree_delete(&cache->reqs, id); - xa_unlock(&cache->reqs); - req->error = ret; - complete(&req->done); - return ret; + cachefiles_req_put(req); + return ret ? ret : n; }
typedef int (*init_req_fn)(struct cachefiles_req *req, void *private); @@ -421,6 +424,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, goto out; }
+ refcount_set(&req->ref, 1); req->object = object; init_completion(&req->done); req->msg.opcode = opcode; @@ -473,7 +477,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, wake_up_all(&cache->daemon_pollwq); wait_for_completion(&req->done); ret = req->error; - kfree(req); + cachefiles_req_put(req); return ret; out: /* Reset the object to close state in error handling path.
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
We got the following issue in a fuzz test of randomly issuing the restore command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963

CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
Call Trace:
 kasan_report+0x93/0xc0
 cachefiles_ondemand_daemon_read+0xb41/0xb60
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0

Allocated by task 116:
 kmem_cache_alloc+0x140/0x3a0
 cachefiles_lookup_cookie+0x140/0xcd0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]

Freed by task 792:
 kmem_cache_free+0xfe/0x390
 cachefiles_put_object+0x241/0x480
 fscache_cookie_state_machine+0x5c8/0x1230
 [...]
==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_withdraw_cookie
 cachefiles_ondemand_clean_object(object)
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)

      cachefiles_daemon_read
       cachefiles_ondemand_daemon_read
        REQ_A = cachefiles_ondemand_select_req
        msg->object_id = req->object->ondemand->ondemand_id
                   ------ restore ------
                   cachefiles_ondemand_restore
                    xas_for_each(&xas, req, ULONG_MAX)
                     xas_set_mark(&xas, CACHEFILES_REQ_NEW)

                   cachefiles_daemon_read
                    cachefiles_ondemand_daemon_read
                     REQ_A = cachefiles_ondemand_select_req
        copy_to_user(_buffer, msg, n)
        xa_erase(&cache->reqs, id)
        complete(&REQ_A->done)
        ------ close(fd) ------
        cachefiles_ondemand_fd_release
         cachefiles_put_object
cachefiles_put_object
 kmem_cache_free(cachefiles_object_jar, object)
                     REQ_A->object->ondemand->ondemand_id
                     // object UAF !!!

While the request is visible under xa_lock, req->object cannot have been released yet, so grab a reference on the object before xa_unlock to avoid the above issue.
Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed") Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 10 ++++++---- include/trace/events/cachefiles.h | 6 +++++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index c1f8be89afca..0eb2160add45 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -226,10 +226,8 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) u32 object_id; int ret, fd;
- ret = object->fscache.cache->ops->grab_object(&object->fscache, - cachefiles_obj_get_ondemand_fd) ? 0 : -EAGAIN; - if (ret) - return ret; + object->fscache.cache->ops->grab_object(&object->fscache, + cachefiles_obj_get_ondemand_fd);
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache); @@ -356,6 +354,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, radix_tree_iter_tag_clear(&cache->reqs, &iter, CACHEFILES_REQ_NEW); cache->req_id_next = iter.index + 1; refcount_inc(&req->ref); + req->object->fscache.cache->ops->grab_object(&req->object->fscache, + cachefiles_obj_get_read_req); xa_unlock(&cache->reqs);
id = iter.index; @@ -379,6 +379,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, }
out: + req->object->fscache.cache->ops->put_object(&req->object->fscache, + cachefiles_obj_put_read_req); /* Remove error request and CLOSE request has no reply */ if (ret || msg->opcode == CACHEFILES_OP_CLOSE) { xa_lock(&cache->reqs); diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index d09e369e9d1e..90d87f0cbe15 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -23,6 +23,8 @@ enum cachefiles_obj_ref_trace { cachefiles_obj_put_wait_timeo, cachefiles_obj_get_ondemand_fd, cachefiles_obj_put_ondemand_fd, + cachefiles_obj_get_read_req, + cachefiles_obj_put_read_req, cachefiles_obj_ref__nr_traces };
@@ -47,7 +49,9 @@ enum cachefiles_obj_ref_trace { EM(fscache_obj_put_queue, "PUT queue") \ EM(fscache_obj_put_work, "PUT work") \ EM(cachefiles_obj_put_wait_retry, "PUT wait_retry") \ - E_(cachefiles_obj_put_wait_timeo, "PUT wait_timeo") + EM(cachefiles_obj_put_wait_timeo, "PUT wait_timeo") \ + EM(cachefiles_obj_get_read_req, "GET read_req") \ + E_(cachefiles_obj_put_read_req, "PUT read_req")
/* * Export enum symbols via userspace.
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
This lets us see the correct trace output.
Signed-off-by: Baokun Li libaokun1@huawei.com --- include/trace/events/cachefiles.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index 90d87f0cbe15..05ecaf2245b2 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -50,6 +50,8 @@ enum cachefiles_obj_ref_trace { EM(fscache_obj_put_work, "PUT work") \ EM(cachefiles_obj_put_wait_retry, "PUT wait_retry") \ EM(cachefiles_obj_put_wait_timeo, "PUT wait_timeo") \ + EM(cachefiles_obj_get_ondemand_fd, "GET ondemand_fd") \ + EM(cachefiles_obj_put_ondemand_fd, "PUT ondemand_fd") \ EM(cachefiles_obj_get_read_req, "GET read_req") \ E_(cachefiles_obj_put_read_req, "PUT read_req")
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
We need to make sure that the type of the request being completed matches the function currently being called. In addition, the object corresponding to the cread fd must match the object corresponding to the request id. This prevents malicious processes from completing random copen/cread requests and crashing the system.
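A hedged sketch of the daemon side that these checks constrain; the copen format is taken from the flow diagram earlier in this set, while the cread ioctl macro follows the mainline uapi name and is an assumption for this tree:

#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>

#define CACHEFILES_IOC_READ_COMPLETE _IOW(0x98, 1, int)	/* assumed, per mainline */

/* Reply to an OPEN request: the id and size must belong together. */
static void reply_open(int devfd, unsigned int msg_id, unsigned long long size)
{
	char buf[64];
	int n = snprintf(buf, sizeof(buf), "copen %u,%llu", msg_id, size);

	write(devfd, buf, n);
}

/* Complete a READ request: the ioctl must be issued on the anon fd of
 * the SAME object; a mismatched id or object now gets -EINVAL. */
static void reply_read(int anon_fd, unsigned long msg_id)
{
	ioctl(anon_fd, CACHEFILES_IOC_READ_COMPLETE, msg_id);
}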
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 0eb2160add45..544eba85b776 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -102,10 +102,14 @@ static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
id = arg; xa_lock(&cache->reqs); - req = radix_tree_delete(&cache->reqs, id); - xa_unlock(&cache->reqs); - if (!req) + req = radix_tree_lookup(&cache->reqs, id); + if (!req || req->msg.opcode != CACHEFILES_OP_READ || + req->object != object) { + xa_unlock(&cache->reqs); return -EINVAL; + } + radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs);
complete(&req->done); return 0; @@ -155,10 +159,13 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) return ret;
xa_lock(&cache->reqs); - req = radix_tree_delete(&cache->reqs, id); - xa_unlock(&cache->reqs); - if (!req) + req = radix_tree_lookup(&cache->reqs, id); + if (!req || req->msg.opcode != CACHEFILES_OP_OPEN) { + xa_unlock(&cache->reqs); return -EINVAL; + } + radix_tree_delete(&cache->reqs, id); + xa_unlock(&cache->reqs);
/* fail OPEN request if copen format is invalid */ ret = kstrtol(psize, 0, &size);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Add the CACHEFILES_ONDEMAND_OBJSTATE_dropping state, which indicates that the cachefiles object is being dropped. It is set after the close request for the dropped object completes, and no new requests are allowed to be sent after this state is set.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 2 ++ fs/cachefiles/ondemand.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index e84a8a217ee3..1499548883e3 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -35,6 +35,7 @@ enum cachefiles_object_state { CACHEFILES_ONDEMAND_OBJSTATE_close, /* Anonymous fd closed by daemon or initial state */ CACHEFILES_ONDEMAND_OBJSTATE_open, /* Anonymous fd associated with object is available */ CACHEFILES_ONDEMAND_OBJSTATE_reopening, /* Object that was closed and is being reopened. */ + CACHEFILES_ONDEMAND_OBJSTATE_dropping, /* Object is being dropped. */ };
struct cachefiles_ondemand_info { @@ -298,6 +299,7 @@ cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \ CACHEFILES_OBJECT_STATE_FUNCS(open); CACHEFILES_OBJECT_STATE_FUNCS(close); CACHEFILES_OBJECT_STATE_FUNCS(reopening); +CACHEFILES_OBJECT_STATE_FUNCS(dropping);
static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req) { diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 544eba85b776..ceeff8363502 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -460,7 +460,8 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, */ xa_lock(&cache->reqs);
- if (test_bit(CACHEFILES_DEAD, &cache->flags)) { + if (test_bit(CACHEFILES_DEAD, &cache->flags) || + cachefiles_ondemand_object_is_dropping(object)) { xa_unlock(&cache->reqs); ret = -EIO; goto out; @@ -493,7 +494,8 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, * If error occurs after creating the anonymous fd, * cachefiles_ondemand_fd_release() will set object to close. */ - if (opcode == CACHEFILES_OP_OPEN) + if (opcode == CACHEFILES_OP_OPEN && + !cachefiles_ondemand_object_is_dropping(object)) cachefiles_ondemand_set_object_close(object); kfree(req); return ret; @@ -591,8 +593,12 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
void cachefiles_ondemand_clean_object(struct cachefiles_object *object) { + if (!object->private) + return; + cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, cachefiles_ondemand_init_close_req, NULL); + cachefiles_ondemand_set_object_dropping(object); }
int cachefiles_ondemand_read(struct cachefiles_object *object,
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
After an object has been dropped, requests for that object are useless, so flush them to avoid causing other problems.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index ceeff8363502..e7caafd939c3 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -593,12 +593,37 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
void cachefiles_ondemand_clean_object(struct cachefiles_object *object) { + void **slot; + struct cachefiles_req *req; + struct radix_tree_iter iter; + struct cachefiles_cache *cache; + if (!object->private) return;
cachefiles_ondemand_send_req(object, CACHEFILES_OP_CLOSE, 0, cachefiles_ondemand_init_close_req, NULL); + + if (!object->private->ondemand_id) + return; + + /* Flush all requests for the object that is being dropped. */ + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + xa_lock(&cache->reqs); cachefiles_ondemand_set_object_dropping(object); + radix_tree_for_each_slot(slot, &cache->reqs, &iter, 0) { + req = radix_tree_deref_slot_protected(slot, + &cache->reqs.xa_lock); + if (WARN_ON(!req)) + continue; + if (req->object == object) { + req->error = -EIO; + complete(&req->done); + radix_tree_delete(&cache->reqs, iter.index); + } + } + xa_unlock(&cache->reqs); }
int cachefiles_ondemand_read(struct cachefiles_object *object,
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Disallow setting max_active values below the lower bound through the procfs interface. Otherwise, with object_max_active set to 1, the previous work in the workqueue may wait for the next work to complete, resulting in a dead wait.
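A hedged illustration of the new behaviour (the proc path is an assumption based on the sysctl table in the diff below): once ondemand is enabled, writes below the floor are rejected instead of wedging the workqueue.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/sys/fscache/object_max_active", O_WRONLY);

	if (fd >= 0) {
		/* 1 is below the lower bound of 4; expect EINVAL. */
		if (write(fd, "1", 1) < 0)
			perror("object_max_active");
		close(fd);
	}
	return 0;
}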
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/main.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/fscache/main.c b/fs/fscache/main.c index a9f059220418..b754486e085f 100644 --- a/fs/fscache/main.c +++ b/fs/fscache/main.c @@ -45,8 +45,11 @@ struct workqueue_struct *fscache_op_wq; DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
/* these values serve as lower bounds, will be adjusted in fscache_init() */ -static unsigned fscache_object_max_active = 4; -static unsigned fscache_op_max_active = 2; +#define FSCACHE_MIN_OBJECT_MAX_ACTIVE 4 +static unsigned int fscache_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE; +static unsigned int fscache_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2; +static unsigned int fscache_min_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE; +static unsigned int fscache_min_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;
#ifdef CONFIG_SYSCTL static struct ctl_table_header *fscache_sysctl_header; @@ -55,12 +58,16 @@ static int fscache_max_active_sysctl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct workqueue_struct **wqp = table->extra1; + unsigned int *min_val = table->extra2; unsigned int *datap = table->data; int ret;
ret = proc_dointvec(table, write, buffer, lenp, ppos); - if (ret == 0) + if (ret == 0) { + if (cachefiles_ondemand_is_enabled() && *datap < *min_val) + return -EINVAL; workqueue_set_max_active(*wqp, *datap); + } return ret; }
@@ -72,6 +79,7 @@ static struct ctl_table fscache_sysctls[] = { .mode = 0644, .proc_handler = fscache_max_active_sysctl, .extra1 = &fscache_object_wq, + .extra2 = &fscache_min_object_max_active, }, { .procname = "operation_max_active", @@ -80,6 +88,7 @@ static struct ctl_table fscache_sysctls[] = { .mode = 0644, .proc_handler = fscache_max_active_sysctl, .extra1 = &fscache_op_wq, + .extra2 = &fscache_min_op_max_active, }, {} };
From: Hou Tao houtao1@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When queuing ondemand_object_worker() to re-open the object, cachefiles_object is not pinned. The cachefiles_object may be freed when the pending read request is completed intentionally and the related erofs is umounted. If ondemand_object_worker() runs after the object is freed, it will incur a use-after-free problem, as shown below.

process A    process B    process C    process D

cachefiles_ondemand_send_req()
  // send a read req X
  // wait for its completion

             // close ondemand fd
             cachefiles_ondemand_fd_release()
             // set object as CLOSE

                          cachefiles_ondemand_daemon_read()
                          // set object as REOPENING
                          queue_work(fscache_wq, &info->ondemand_work)

                                       // close /dev/cachefiles
                                       cachefiles_daemon_release
                                       cachefiles_flush_reqs
                                       complete(&req->done)

// read req X is completed
// umount the erofs fs
cachefiles_put_object()
// object will be freed
cachefiles_ondemand_deinit_obj_info()
kmem_cache_free(object)
                          // both info and object are freed
                          ondemand_object_worker()
When dropping an object, it is no longer necessary to reopen the object, so use cancel_work_sync() to cancel or wait for ondemand_object_worker() to complete.
Signed-off-by: Hou Tao houtao1@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index e7caafd939c3..6d36d3e950e8 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -624,6 +624,9 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object) } } xa_unlock(&cache->reqs); + + /* Wait for ondemand_object_worker() to finish. */ + cancel_work_sync(&object->private->work); }
int cachefiles_ondemand_read(struct cachefiles_object *object,
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The following concurrency may cause a read request to fail to be completed and result in a hang:

t1 | t2
---------------------------------------------------------
cachefiles_ondemand_copen
  req = xa_erase(&cache->reqs, id)
// Anon fd is maliciously closed.
            cachefiles_ondemand_fd_release
              xa_lock(&cache->reqs)
              cachefiles_ondemand_set_object_close(object)
              xa_unlock(&cache->reqs)
cachefiles_ondemand_set_object_open
// No one will ever close it again.
cachefiles_ondemand_daemon_read
  cachefiles_ondemand_select_req
// Get a read req but its fd is already closed.
// The daemon can't issue a cread ioctl with a closed fd, so it hangs.

So add a spinlock to cachefiles_ondemand_info to protect ondemand_id and state. This way, cachefiles_ondemand_copen() can use ondemand_id to determine whether the anonymous fd has been released, avoiding the above problem.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 1 + fs/cachefiles/ondemand.c | 13 +++++++++++++ 2 files changed, 14 insertions(+)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 1499548883e3..71c651b305bf 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -43,6 +43,7 @@ struct cachefiles_ondemand_info { int ondemand_id; enum cachefiles_object_state state; struct cachefiles_object *object; + spinlock_t lock; };
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 6d36d3e950e8..c15433196a88 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -30,8 +30,10 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct cachefiles_cache, cache);
xa_lock(&cache->reqs); + spin_lock(&info->lock); info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; cachefiles_ondemand_set_object_close(object); + spin_unlock(&info->lock);
/* Only flush CACHEFILES_REQ_NEW marked req to avoid race with daemon_read */ radix_tree_for_each_tagged(slot, &cache->reqs, &iter, 0, CACHEFILES_REQ_NEW) { @@ -131,6 +133,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) { struct cachefiles_req *req; struct fscache_cookie *cookie; + struct cachefiles_ondemand_info *info; char *pid, *psize; unsigned long id; long size; @@ -186,6 +189,14 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) goto out; }
+ info = req->object->private; + spin_lock(&info->lock); + /* The anonymous fd was closed before copen. */ + if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED) { + spin_unlock(&info->lock); + req->error = -EBADFD; + goto out; + } cookie = req->object->fscache.cookie; fscache_set_store_limit(&req->object->fscache, size); if (size) @@ -194,6 +205,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
cachefiles_ondemand_set_object_open(req->object); + spin_unlock(&info->lock); wake_up_all(&cache->daemon_pollwq);
out: @@ -652,6 +664,7 @@ int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object) return -ENOMEM;
object->private->object = object; + spin_lock_init(&object->private->lock); INIT_WORK(&object->private->work, ondemand_object_worker); return 0; }
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Now, every time the daemon reads an open request, it allocates a new anon fd and ondemand_id. With the introduction of "restore", it is possible to read the same open request more than once, and therefore to have multiple anon fds for the same object. To avoid this, allocate a new anon fd only if no anon fd has been allocated (ondemand_id == 0) or if the previously allocated anon fd has been closed (ondemand_id == -1).
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index c15433196a88..2cbe8a1b953f 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -22,10 +22,15 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct cachefiles_cache *cache; void **slot; struct radix_tree_iter iter; - struct cachefiles_ondemand_info *info = object->private; - int object_id = info->ondemand_id; + struct cachefiles_ondemand_info *info; + int object_id; struct cachefiles_req *req;
+ if (!object) + return 0; + + info = object->private; + object_id = info->ondemand_id; cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
@@ -273,16 +278,27 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) goto err_put_fd; }
+ spin_lock(&object->private->lock); + if (object->private->ondemand_id > 0) { + spin_unlock(&object->private->lock); + ret = -EEXIST; + file->private_data = NULL; + goto err_put_file; + } + file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; fd_install(fd, file);
load = (void *)req->msg.data; load->fd = fd; object->private->ondemand_id = object_id; + spin_unlock(&object->private->lock);
cachefiles_get_unbind_pincount(cache); return 0;
+err_put_file: + fput(file); err_put_fd: put_unused_fd(fd); err_free_id: @@ -290,6 +306,12 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) idr_remove(&cache->ondemand_ids, object_id); xa_unlock(&cache->ondemand_ids.idr_rt); err: + spin_lock(&req->object->private->lock); + /* Avoid marking an opened object as closed. */ + if (ret && object->private->ondemand_id <= 0) + cachefiles_ondemand_set_object_close(req->object); + spin_unlock(&req->object->private->lock); + object->fscache.cache->ops->put_object(&object->fscache, cachefiles_obj_put_ondemand_fd); return ret; @@ -381,10 +403,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
if (msg->opcode == CACHEFILES_OP_OPEN) { ret = cachefiles_ondemand_get_fd(req); - if (ret) { - cachefiles_ondemand_set_object_close(req->object); + if (ret) goto out; - } }
msg->msg_id = id;
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Replace wait_for_completion() with wait_for_completion_killable() in cachefiles_ondemand_send_req(). This allows us to kill processes that would otherwise trigger a hung task when the daemon is abnormal.

But for now only CACHEFILES_OP_READ is killable, because OP_CLOSE and OP_OPEN are initiated from kworker context and signals are prohibited in those kworkers.

Note that when the req in the tree changes, i.e. radix_tree_lookup(&cache->reqs, id) != req, it means another process will complete the current request soon, so wait again for the request to be completed.
Suggested-by: Hou Tao houtao1@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 2cbe8a1b953f..4fb5cdfde51c 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -517,8 +517,24 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, xa_unlock(&cache->reqs);
wake_up_all(&cache->daemon_pollwq); - wait_for_completion(&req->done); - ret = req->error; +wait: + ret = wait_for_completion_killable(&req->done); + if (!ret) { + ret = req->error; + } else { + xa_lock(&cache->reqs); + if (radix_tree_lookup(&cache->reqs, id) == req) { + radix_tree_delete(&cache->reqs, id); + ret = -EINTR; + } + xa_unlock(&cache->reqs); + + /* Someone will complete it soon. */ + if (ret != -EINTR) { + cpu_relax(); + goto wait; + } + } cachefiles_req_put(req); return ret; out:
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
At present, object->file is subject to a NULL pointer dereference. The root cause is that the lifetimes of the allocated anon_fd and of object->file are inconsistent, while user-space invocations on the anon_fd use object->file. The following is the process that triggers the issue:

process A                          process B
cachefiles_ondemand_fd_write_iter
                                   fscache_drop_object
                                     cachefiles_drop_object
                                       fput(object->file)
                                       object->file = NULL
vfs_iocb_iter_write(object->file...)

Fix this issue by taking an additional reference to object->file before the write and dropping it after the write finishes.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 8 +++++--- fs/cachefiles/internal.h | 2 +- fs/cachefiles/namei.c | 2 +- fs/cachefiles/ondemand.c | 14 +++++++++++--- fs/cachefiles/rdwr.c | 9 +++++---- 5 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index a13089d06e16..9c819d0d626f 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -270,6 +270,7 @@ static void cachefiles_drop_object(struct fscache_object *_object) struct cachefiles_cache *cache; const struct cred *saved_cred; struct inode *inode; + struct file *file; blkcnt_t i_blocks = 0;
ASSERT(_object); @@ -315,9 +316,10 @@ static void cachefiles_drop_object(struct fscache_object *_object) }
/* clean up file descriptor for non-index object */ - if (object->file) { - fput(object->file); - object->file = NULL; + file = rcu_dereference_protected(object->file, true); + if (file) { + fput(file); + rcu_assign_pointer(object->file, NULL); }
/* note that the object is now inactive */ diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 71c651b305bf..38c33146f4a8 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -54,7 +54,7 @@ struct cachefiles_object { struct cachefiles_lookup_data *lookup_data; /* cached lookup data */ struct dentry *dentry; /* the file/dir representing this object */ struct dentry *backer; /* backing file */ - struct file *file; /* backing file in on-demand mode */ + struct file __rcu *file; /* backing file in on-demand mode */ loff_t i_size; /* object size */ unsigned long flags; #define CACHEFILES_OBJECT_ACTIVE 0 /* T if marked active */ diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 3c7168d0beec..88afa4a80dfb 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -720,7 +720,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, * to force_page_cache_readahead() */ file->f_mode |= FMODE_RANDOM; - object->file = file; + rcu_assign_pointer(object->file, file); }
object->backer = object->dentry; diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 4fb5cdfde51c..da5afd162f64 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -69,13 +69,20 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, struct cachefiles_object *object = kiocb->ki_filp->private_data; size_t len = iter->count; struct kiocb iocb; + struct file *file; int ret;
- if (!object->file) + rcu_read_lock(); + file = rcu_dereference(object->file); + if (!file || !get_file_rcu(file)) + file = NULL; + rcu_read_unlock(); + + if (!file) return -ENOBUFS;
iocb = (struct kiocb) { - .ki_filp = object->file, + .ki_filp = file, .ki_pos = kiocb->ki_pos, .ki_flags = IOCB_WRITE, .ki_ioprio = get_current_ioprio(), @@ -84,7 +91,8 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, if (!cachefiles_buffered_ondemand) iocb.ki_flags |= IOCB_DIRECT;
- ret = vfs_iocb_iter_write(object->file, &iocb, iter); + ret = vfs_iocb_iter_write(file, &iocb, iter); + fput(file); if (ret != len) return -EIO; return len; diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 0e1992bedf71..453bf7cc88b3 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -798,7 +798,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, static int cachefiles_ondemand_check(struct cachefiles_object *object, loff_t start_pos, size_t len) { - struct file *file = object->file; + struct file *file = rcu_dereference_raw(object->file); size_t remained; loff_t pos; int ret; @@ -892,12 +892,14 @@ int cachefiles_prepare_read(struct fscache_retrieval *op, pgoff_t index) unsigned int n, nr_pages = atomic_read(&op->n_pages); size_t len = nr_pages << PAGE_SHIFT; struct page **pages; + struct file *file; size_t size; int i, ret;
object = container_of(op->op.object, struct cachefiles_object, fscache); if (!object->backer) goto all_enobufs; + file = rcu_dereference_raw(object->file);
/* * 1. Check if there's hole in the requested range, and trigger an @@ -914,8 +916,7 @@ int cachefiles_prepare_read(struct fscache_retrieval *op, pgoff_t index) * to force_page_cache_readahead(). */ page_cache_sync_readahead(d_inode(object->backer)->i_mapping, - &object->file->f_ra, object->file, - start_pos / PAGE_SIZE, nr_pages); + &file->f_ra, file, start_pos / PAGE_SIZE, nr_pages);
size = sizeof(struct cachefiles_kiocb) + nr_pages * sizeof(struct bio_vec); ki = kzalloc(size, GFP_KERNEL); @@ -940,7 +941,7 @@ int cachefiles_prepare_read(struct fscache_retrieval *op, pgoff_t index) } iov_iter_bvec(&ki->iter, READ, ki->bvs, n, n * PAGE_SIZE);
- ki->iocb.ki_filp = object->file; + ki->iocb.ki_filp = file; ki->iocb.ki_pos = start_pos; ki->iocb.ki_ioprio = get_current_ioprio(); ki->op = fscache_get_retrieval(op);
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
This is in line with the mainline default, and allows a smoother experience under high concurrency.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/fscache/main.c b/fs/fscache/main.c index b754486e085f..e8c6c70092e0 100644 --- a/fs/fscache/main.c +++ b/fs/fscache/main.c @@ -46,8 +46,9 @@ DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
/* these values serve as lower bounds, will be adjusted in fscache_init() */ #define FSCACHE_MIN_OBJECT_MAX_ACTIVE 4 -static unsigned int fscache_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE; -static unsigned int fscache_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2; +#define FSCACHE_DEF_OBJECT_MAX_ACTIVE 256 +static unsigned int fscache_object_max_active = FSCACHE_DEF_OBJECT_MAX_ACTIVE; +static unsigned int fscache_op_max_active = FSCACHE_DEF_OBJECT_MAX_ACTIVE / 2; static unsigned int fscache_min_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE; static unsigned int fscache_min_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In ondemand mode, when the daemon is processing an open request, if the kernel flags the cache as CACHEFILES_DEAD due to -EIO, cachefiles_daemon_write() will always return -EIO, so the daemon can't pass the copen to the kernel. The kernel process waiting for the copen then triggers the following hung task:

INFO: task kworker/u8:2:255269 blocked for more than 1212 seconds.
      Not tainted 5.10.0-00001-g24c450967e57-dirty #21
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:2 state:D stack:0 pid:255269 ppid:2 flags:0x00004080
Workqueue: fscache_object fscache_object_work_func
Call Trace:
 __schedule+0x623/0xed0
 schedule+0x84/0x140
 schedule_timeout+0x559/0x610
 wait_for_common+0x156/0x270
 cachefiles_ondemand_send_req+0x2bd/0x390
 cachefiles_ondemand_init_object+0x10a/0x120
 cachefiles_walk_to_object+0x849/0xee0
 cachefiles_lookup_object+0xa1/0x190
 fscache_look_up_object+0x24e/0x320
 fscache_object_sm_dispatcher+0xe0/0x5d0
 fscache_object_work_func+0x30/0x40
 process_one_work+0x40e/0x810
 worker_thread+0x96/0x700
 kthread+0x1f4/0x250

Since the DEAD state is irreversible, it can only be exited by reopening /dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark the cache as CACHEFILES_DEAD, flush all requests in ondemand mode to avoid the above hung task. We may still be able to read some of the cached data before releasing the fd of /dev/cachefiles.

Note that this relies on the earlier patch that adds reference counting to the req; otherwise it would trigger a UAF.
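A hedged daemon-side recovery sketch: since DEAD is irreversible, the daemon's only way out is to drop the control fd and rebind. The cache dir and tag are placeholders; the command strings follow the cachefiles daemon protocol:

	close(devfd);
	devfd = open("/dev/cachefiles", O_RDWR);
	write(devfd, "dir /var/cache/fscache", strlen("dir /var/cache/fscache"));
	write(devfd, "tag mycache", strlen("tag mycache"));
	write(devfd, "bind ondemand", strlen("bind ondemand"));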
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/daemon.c | 2 +- fs/cachefiles/internal.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index e26ebbc89806..50433c6024dd 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -131,7 +131,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) return 0; }
-static void cachefiles_flush_reqs(struct cachefiles_cache *cache) +void cachefiles_flush_reqs(struct cachefiles_cache *cache) { void **slot; struct radix_tree_iter iter; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 38c33146f4a8..355fb9b79741 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -184,6 +184,7 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache); * daemon.c */ extern const struct file_operations cachefiles_daemon_fops; +extern void cachefiles_flush_reqs(struct cachefiles_cache *cache); extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache); extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);
@@ -384,6 +385,8 @@ do { \ pr_err("I/O Error: " FMT"\n", ##__VA_ARGS__); \ fscache_io_error(&(___cache)->cache); \ set_bit(CACHEFILES_DEAD, &(___cache)->flags); \ + if (cachefiles_in_ondemand_mode(___cache)) \ + cachefiles_flush_reqs(___cache); \ } while (0)
#define cachefiles_io_error_obj(object, FMT, ...) \
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs on-demand read scenario, the state of the back_page is checked during the cachefiles read process. If the back_page status is Uptodate, the read request is completed. If the status is Error, an error is returned. If the status is neither of the above, the page has probably been truncated, and cachefiles_read_reissue() is called. If the page has not been truncated, the back-end file system's read_page function is called again.

When the back-end file system is ext4 and a 100% fail_make_request failure is injected, the final call to __read_end_io() clears the page's Uptodate and Error flags. Since the page status is then neither of the above, cachefiles_read_copier() calls the back-end file system's read_page again, resulting in an endless loop. The outer netfs_page waits for the back_page to complete the read request, causing a kernel hang.

Fix this issue by adding a flag to the monitor, which corresponds to the back_page. The flag is set when read_page is called for the first time; before read_page would be called again, the flag is checked to determine whether the process should end instead.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/internal.h | 2 ++ fs/cachefiles/rdwr.c | 17 ++++++++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 355fb9b79741..349b1e6bb7cd 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -141,6 +141,8 @@ struct cachefiles_one_read { struct page *netfs_page; /* netfs page we're going to fill */ struct fscache_retrieval *op; /* retrieval op covering this */ struct list_head op_link; /* link in op's todo list */ + unsigned long flags; +#define CACHEFILES_MONITOR_ENTER_READ 0 /* restrict calls to read_page */ };
/* diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 453bf7cc88b3..d95435ad5542 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -108,6 +108,13 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, * need a second */ put_page(backpage2);
+ /* + * end the process if the page was not truncated + * and we have already read it before + */ + if (test_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags)) + return -EIO; + INIT_LIST_HEAD(&monitor->op_link); add_page_wait_queue(backpage, &monitor->monitor);
@@ -120,6 +127,8 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, goto unlock_discard;
_debug("reissue read"); + if (data_new_version(object->fscache.cookie)) + set_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags); ret = bmapping->a_ops->readpage(NULL, backpage); if (ret < 0) goto discard; @@ -190,7 +199,11 @@ static void cachefiles_read_copier(struct fscache_operation *_op) error = cachefiles_read_reissue(object, monitor); if (error == -EINPROGRESS) goto next; - goto recheck; + if (!data_new_version(object->fscache.cookie) || !error) + goto recheck; + pr_warn("%s, read error: %d, at page %lu, flags: %lx\n", + __func__, error, monitor->back_page->index, + (unsigned long) monitor->back_page->flags); } else { cachefiles_io_error_obj( object, @@ -284,6 +297,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, newpage = NULL;
read_backing_page: + if (data_new_version(object->fscache.cookie)) + set_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags); ret = bmapping->a_ops->readpage(NULL, backpage); if (ret < 0) goto read_error;
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs on-demand read scenario, objects at different levels are created when erofs is mounted.
In cachefiles_alloc_object(), fscache_object_init() is called to initialize the object corresponding to the cookie, taking a reference on cookie->usage. In the caller, if cachefiles_ondemand_init_obj_info() fails, the object is released, but the cookie->usage reference taken above is never dropped. As a result, cookie->usage is leaked and the cookie cannot be released, which also affects the next mount of a cookie with the same id.
Fix this issue by adding fscache_object_destroy() to the error branch; do the same in cachefiles_daemon_add_cache().
Fixes: f29507ce6670 ("fscache: Fix reference overput in fscache_attach_object() error handling") Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/bind.c | 1 + fs/cachefiles/interface.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c index d076651b2931..50ac68b0ad81 100644 --- a/fs/cachefiles/bind.c +++ b/fs/cachefiles/bind.c @@ -246,6 +246,7 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache) error_add_cache: dput(cache->graveyard); cache->graveyard = NULL; + fscache_object_destroy(&fsdef->fscache); error_unsupported: mntput(cache->mnt); cache->mnt = NULL; diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 9c819d0d626f..abbf0033459e 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -109,6 +109,7 @@ static struct fscache_object *cachefiles_alloc_object( object->private = NULL; nomem_obj_info: BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)); + fscache_object_destroy(&object->fscache); kmem_cache_free(cachefiles_object_jar, object); fscache_object_destroyed(&cache->cache); nomem_object:
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
If the daemon performs an action that acquires the parent dir inode lock (e.g., statfs/cull/inuse) while processing an open request, an AA deadlock results, because that lock is already held when the open request is issued.
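A sketch of the deadlock (hypothetical interleaving; statfs stands for any daemon command that takes the same dir inode lock):

mount         | daemon
-----------------------------------------------
cachefiles_walk_to_object
  inode_lock_nested(d_inode(dir))
  cachefiles_ondemand_init_object
    // waits for the daemon to answer OPEN
                reads the OPEN request
                runs statfs/cull/inuse on the cache
                  inode_lock(d_inode(dir))
                  // blocks on the same lock, so the OPEN
                  // request is never answered -> deadlock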
Avoid this by issuing and processing open requests in advance, outside the lock. Although the daemon has already acquired an anonymous fd at that point, the fd cannot be used until object->file has been assigned; before that, -ENOBUFS is returned.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/namei.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 88afa4a80dfb..799b7671e5b1 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -522,6 +522,20 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, key = NULL;
lookup_again: + + /* + * Process the open request before acquiring the dir inode lock to + * avoid AA deadlocks caused by the daemon acquiring the dir inode + * lock while processing the open request. Although the daemon gets + * an anonymous fd, it can't be used until object->file has been + * assigned a value. + */ + if (!key) { + ret = cachefiles_ondemand_init_object(object); + if (ret < 0) + goto error_out2; + } + /* search the current directory for the element name */ _debug("lookup '%s'", name);
@@ -592,10 +606,6 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, if (ret < 0) goto no_space_error;
- ret = cachefiles_ondemand_init_object(object); - if (ret < 0) - goto create_error; - path.dentry = dir; ret = security_path_mknod(&path, next, S_IFREG, 0); if (ret < 0) @@ -640,12 +650,6 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, if (!object->new) { _debug("validate '%pd'", next);
- ret = cachefiles_ondemand_init_object(object); - if (ret < 0) { - object->dentry = NULL; - goto error; - } - ret = cachefiles_check_object_xattr(object, auxdata); if (ret == -ESTALE) { /* delete the object (the deleter drops the directory
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs on-demand read scenario, the fd holds a reference to the object, and the object holds a reference to the cookie. The cookie's reference count is dropped only when the object's reference count reaches 0, and the cookie is unhashed only when its own reference count reaches 0, so that the next mount of a cookie with the same fsid does not report a duplicate cookie error.
However, releasing the fd depends on userspace behaviour. If the user daemon misbehaves, never receives the close message, or handles it incorrectly, the object's reference count is never released, so the cookie can never be unhashed.
Therefore, this patch adds another cookie-unhashing mechanism: unhash cookies in fscache_drop_object() as well. Because the object has already been removed from the cookie's list at this point, and object->file is set to NULL after cachefiles_drop_object() is called, the fd cannot be used for writing even if userspace never closes it. The next time a cookie with the same fsid is mounted, there will be no write concurrency issues.
Of course, the unhash_cookie step in the original fscache_cookie_put() cannot be removed, because the cookie was added to the hash table in fscache_hash_cookie(). Note that fscache_alloc_object() may fail to start the object state machine, in which case the cookie still needs to be unhashed through the original path.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/cookie.c | 12 ++++++++---- fs/fscache/internal.h | 1 + fs/fscache/object.c | 3 +++ 3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 422248fa55ca..c6e6d166cdb1 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -882,7 +882,6 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
/* Clear pointers back to the netfs */ cookie->netfs_data = NULL; - cookie->def = NULL; BUG_ON(!radix_tree_empty(&cookie->stores));
if (cookie->parent) { @@ -902,19 +901,24 @@ EXPORT_SYMBOL(__fscache_relinquish_cookie); /* * Remove a cookie from the hash table. */ -static void fscache_unhash_cookie(struct fscache_cookie *cookie) +void fscache_unhash_cookie(struct fscache_cookie *cookie) { struct hlist_bl_head *h; unsigned int bucket;
+ if (hlist_bl_unhashed(&cookie->hash_link)) + return; + bucket = cookie->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1); h = &fscache_cookie_hash[bucket];
hlist_bl_lock(h); - hlist_bl_del(&cookie->hash_link); - if (cookie->collision) + hlist_bl_del_init(&cookie->hash_link); + if (cookie->collision) { clear_and_wake_up_bit(FSCACHE_COOKIE_ACQUIRE_PENDING, &cookie->collision->flags); + cookie->collision = NULL; + } hlist_bl_unlock(h); }
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index 64aa552b296d..533c4b4586d8 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -55,6 +55,7 @@ extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *, extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *); extern void fscache_cookie_put(struct fscache_cookie *, enum fscache_cookie_trace); +extern void fscache_unhash_cookie(struct fscache_cookie *cookie);
/* * fsdef.c diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 0375f448afc4..375b9b34f005 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -745,6 +745,9 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob cache->ops->drop_object(object); fscache_stat_d(&fscache_n_cop_drop_object);
+ if (volume_new_version(cookie) || data_new_version(cookie)) + fscache_unhash_cookie(cookie); + /* The parent object wants to know when all it dependents have gone */ if (parent) { _debug("release parent OBJ%x {%d}",
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
After installing the anonymous fd, userspace can see it and close it. However, at this point we may not yet have taken the reference count on the cache, while that reference is put when the fd is closed, so this may cause a cache UAF.
To avoid this, make the anonymous fd visible to userspace by calling fd_install() only after copy_to_user() has succeeded; by that point the reference count on the cache has already been taken.
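A minimal sketch of the ordering idiom (generic kernel code with a hypothetical helper name, not the actual cachefiles change):

static int example_send_fd_to_user(struct file *file, void __user *ubuf,
				   const void *msg, size_t n)
{
	int fd = get_unused_fd_flags(O_WRONLY);

	if (fd < 0)
		return fd;
	if (copy_to_user(ubuf, msg, n)) {
		/* fd was never installed, so userspace cannot race with us */
		put_unused_fd(fd);
		fput(file);
		return -EFAULT;
	}
	/* the fd only becomes visible to userspace here */
	fd_install(fd, file);
	return 0;
}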
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 54 ++++++++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 21 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index da5afd162f64..cf996eadd1af 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -6,6 +6,11 @@ #include <linux/module.h> #include "internal.h"
+struct anon_file { + struct file *file; + int fd; +}; + static bool cachefiles_buffered_ondemand = true; module_param_named(buffered_ondemand, cachefiles_buffered_ondemand, bool, 0644);
@@ -249,14 +254,14 @@ int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args) return 0; }
-static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req, + struct anon_file *anon_file) { struct cachefiles_object *object = req->object; struct cachefiles_cache *cache; struct cachefiles_open *load; - struct file *file; u32 object_id; - int ret, fd; + int ret;
object->fscache.cache->ops->grab_object(&object->fscache, cachefiles_obj_get_ondemand_fd); @@ -273,16 +278,16 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) goto err; object_id = ret;
- fd = get_unused_fd_flags(O_WRONLY); - if (fd < 0) { - ret = fd; + anon_file->fd = get_unused_fd_flags(O_WRONLY); + if (anon_file->fd < 0) { + ret = anon_file->fd; goto err_free_id; }
- file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops, - object, O_WRONLY); - if (IS_ERR(file)) { - ret = PTR_ERR(file); + anon_file->file = anon_inode_getfile("[cachefiles]", + &cachefiles_ondemand_fd_fops, object, O_WRONLY); + if (IS_ERR(anon_file->file)) { + ret = PTR_ERR(anon_file->file); goto err_put_fd; }
@@ -290,15 +295,14 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) if (object->private->ondemand_id > 0) { spin_unlock(&object->private->lock); ret = -EEXIST; - file->private_data = NULL; + anon_file->file->private_data = NULL; goto err_put_file; }
- file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; - fd_install(fd, file); + anon_file->file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
load = (void *)req->msg.data; - load->fd = fd; + load->fd = anon_file->fd; object->private->ondemand_id = object_id; spin_unlock(&object->private->lock);
@@ -306,9 +310,11 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) return 0;
err_put_file: - fput(file); + fput(anon_file->file); + anon_file->file = NULL; err_put_fd: - put_unused_fd(fd); + put_unused_fd(anon_file->fd); + anon_file->fd = ret; err_free_id: xa_lock(&cache->ondemand_ids.idr_rt); idr_remove(&cache->ondemand_ids, object_id); @@ -377,6 +383,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, size_t n; int ret = 0; struct radix_tree_iter iter; + struct anon_file anon_file;
/* * Cyclically search for a request that has not ever been processed, @@ -410,7 +417,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, id = iter.index;
if (msg->opcode == CACHEFILES_OP_OPEN) { - ret = cachefiles_ondemand_get_fd(req); + ret = cachefiles_ondemand_get_fd(req, &anon_file); if (ret) goto out; } @@ -418,11 +425,16 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, msg->msg_id = id; msg->object_id = req->object->private->ondemand_id;
- if (copy_to_user(_buffer, msg, n) != 0) { + if (copy_to_user(_buffer, msg, n) != 0) ret = -EFAULT; - if (msg->opcode == CACHEFILES_OP_OPEN) - __close_fd(current->files, - ((struct cachefiles_open *)msg->data)->fd); + + if (msg->opcode == CACHEFILES_OP_OPEN) { + if (ret < 0) { + fput(anon_file.file); + put_unused_fd(anon_file.fd); + goto out; + } + fd_install(anon_file.fd, anon_file.file); }
out:
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The ondemand_id must have been initialised before the open request is copied to userspace. Therefore, if ondemand_id is 0 at copen time, the command was maliciously injected, so -EINVAL is returned.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index cf996eadd1af..9f3e7d994aed 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -181,7 +181,8 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
xa_lock(&cache->reqs); req = radix_tree_lookup(&cache->reqs, id); - if (!req || req->msg.opcode != CACHEFILES_OP_OPEN) { + if (!req || req->msg.opcode != CACHEFILES_OP_OPEN || + !req->object->private->ondemand_id) { xa_unlock(&cache->reqs); return -EINVAL; }
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When an erofs file system is mounted in ondemand mode, if the cache root directory bound to cachefiles is named rootdir, the resulting directory structure is as follows, where different directories correspond to objects at different levels:
            rootdir
           ____|____
          |         |
     cache(obj0)  graveyard
          |
    domain_dir(obj1)
          |
       hash_dir
          |
   back_data.img(obj2)
In the current logic, if cull is executed on a cache directory, it first checks whether the object corresponding to that directory is an active node of the cache; if so, the cull cannot proceed. When the fscache_object lookup succeeds, the object is set as an active node; if the lookup fails or the object state machine goes into drop, it is set inactive.
Currently cachefiles_daemon_cull() can operate on any directory or file for which cachefiles_mark_object_active() has not been called, and we want to reduce the scope of this function. On the one hand, userspace needs to add relevant constraints; on the other hand, the kernel also needs to be modified. This patch adds a filesystem-level isolation restriction.
In addition, the top-level cache dir can currently be culled directly, because obj0 is never added as an active node. This causes the entire cache directory to be renamed into the graveyard even though the underlying objects are still in use, and the situation cannot be recovered by mounting again. Fix it by marking the top-level dir as active in cachefiles_daemon_add_cache().
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/bind.c | 6 ++++++ fs/cachefiles/daemon.c | 6 ++++++ fs/cachefiles/internal.h | 2 ++ fs/cachefiles/namei.c | 4 ++-- 4 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c index 50ac68b0ad81..3a88bef9ed4b 100644 --- a/fs/cachefiles/bind.c +++ b/fs/cachefiles/bind.c @@ -232,6 +232,12 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache) if (ret < 0) goto error_add_cache;
+ /* + * As the cache->daemon_mutex lock hold and the cache is set to + * CACHEFILES_READY, this function must not return an error. + */ + cachefiles_mark_object_active(cache, fsdef); + /* done */ set_bit(CACHEFILES_READY, &cache->flags); dput(root); diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 50433c6024dd..c94b512da4b5 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -662,6 +662,12 @@ static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args) if (!d_can_lookup(path.dentry)) goto notdir;
+ /* limit the scope of cull */ + if (cache->mnt != path.mnt) { + path_put(&path); + return -EOPNOTSUPP; + } + cachefiles_begin_secure(cache, &saved_cred); ret = cachefiles_cull(cache, path.dentry, args); cachefiles_end_secure(cache, saved_cred); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 349b1e6bb7cd..97c4b4c639b4 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -210,6 +210,8 @@ extern char *cachefiles_cook_key(struct cachefiles_object *object, extern void cachefiles_mark_object_inactive(struct cachefiles_cache *cache, struct cachefiles_object *object, blkcnt_t i_blocks); +extern int cachefiles_mark_object_active(struct cachefiles_cache *cache, + struct cachefiles_object *object); extern int cachefiles_delete_object(struct cachefiles_cache *cache, struct cachefiles_object *object); extern int cachefiles_walk_to_object(struct cachefiles_object *parent, diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 799b7671e5b1..281e53d63972 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -133,8 +133,8 @@ static void cachefiles_mark_object_buried(struct cachefiles_cache *cache, /* * record the fact that an object is now active */ -static int cachefiles_mark_object_active(struct cachefiles_cache *cache, - struct cachefiles_object *object) +int cachefiles_mark_object_active(struct cachefiles_cache *cache, + struct cachefiles_object *object) { struct cachefiles_object *xobject; struct rb_node **_p, *_parent = NULL;
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
As shown below, moving cachefiles_ondemand_init_object() outside the parent dir lock also moves it before fscache_object_lookup_negative(). This causes FSCACHE_COOKIE_NO_DATA_YET to be cleared before it is set when creating a new data object. Eventually __fscache_read_or_alloc_page() reads the flag and returns -ENODATA, causing the mount to fail.
mount         | kworker
-----------------------------------------------
__fscache_read_or_alloc_page
                cachefiles_walk_to_object
                  cachefiles_ondemand_init_object
                    cachefiles_ondemand_send_req
                      cachefiles_ondemand_copen
                        clear_bit(FSCACHE_COOKIE_NO_DATA_YET)
                  fscache_object_lookup_negative
                    set_bit(FSCACHE_COOKIE_NO_DATA_YET)
  if (test_bit(FSCACHE_COOKIE_NO_DATA_YET))
    object->cache->ops->allocate_page
      ret = -ENODATA
Hence for newly created data objects whose size is not 0, clear the FSCACHE_COOKIE_NO_DATA_YET bit.
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/namei.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 281e53d63972..93c511d1fff3 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -725,6 +725,11 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent, */ file->f_mode |= FMODE_RANDOM; rcu_assign_pointer(object->file, file); + + /* Now the pages can be read. */ + if (object->new && object->fscache.store_limit_l) + clear_bit_unlock(FSCACHE_COOKIE_NO_DATA_YET, + &object->fscache.cookie->flags); }
object->backer = object->dentry;
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs ondemand loading mode, after erofs is unmounted and object->fd is closed, running the inuse command may still report that the corresponding object is busy. This is because the object only becomes non-busy after cachefiles_mark_object_inactive() is called, which happens after the close request is sent.
Fix this issue by moving cachefiles_mark_object_inactive() before sending the close request in erofs ondemand mode. Since erofs ondemand mode never sets the retired flag, this has no other impact.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/interface.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index abbf0033459e..fa131db764c4 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -288,6 +288,11 @@ static void cachefiles_drop_object(struct fscache_object *_object) ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000); #endif
+ if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags) && + (volume_new_version(object->fscache.cookie) || + data_new_version(object->fscache.cookie))) + cachefiles_mark_object_inactive(cache, object, 0); + cachefiles_ondemand_clean_object(object);
/* We need to tidy the object up if we did in fact manage to open it.
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When mounting an erofs image locally with a chunk layout as below, inconsistent data may be read:
mkfs.erofs -E noinline_data test-plain.img /tmp/stress/
mkfs.erofs --chunksize=1048576 test-chunk-array.img /tmp/stress/
mount -o loop -t erofs test-plain.img mnt-plain
mount -o loop -t erofs test-chunk-array.img mnt-array
diff mnt-array/lib/compress.c mnt-plain/lib/compress.c
// The output is non-empty, so the two files are inconsistent.
This is because erofs_read_raw_page() doesn't take the current_block offset into account when calculating blknr, so it treats the first block of the map as the block currently being read, producing inconsistent data. Correct the blknr calculation logic to fix the problem.
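For illustration, a worked example with hypothetical offsets (assuming 4KiB blocks, so erofs_blknr(addr) is addr >> 12):

/*
 * A chunk maps logical offset m_la = 0x100000 to physical m_pa = 0x400000,
 * and the page being read has index 258, i.e. pos = 0x102000.
 *
 * before: blknr = erofs_blknr(mdev.m_pa)                    = 0x400
 *         // always the first block of the extent, wrong for pos > m_la
 * after:  blknr = erofs_blknr(mdev.m_pa + (pos - map.m_la)) = 0x402
 *         // the block that actually backs the page being read
 */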
Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/data.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 44fea7bf7bb8..e228a2d52aaf 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -269,6 +269,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, struct inode *const inode = mapping->host; struct super_block *const sb = inode->i_sb; erofs_off_t current_block = (erofs_off_t)page->index; + erofs_off_t pos = blknr_to_addr(current_block); int err;
DBG_BUGON(!nblocks); @@ -289,7 +290,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio,
if (!bio) { struct erofs_map_blocks map = { - .m_la = blknr_to_addr(current_block), + .m_la = pos, }; struct erofs_map_dev mdev; erofs_blk_t blknr; @@ -319,8 +320,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* for RAW access mode, m_plen must be equal to m_llen */ DBG_BUGON(map.m_plen != map.m_llen);
- blknr = erofs_blknr(mdev.m_pa); - blkoff = erofs_blkoff(mdev.m_pa); + blknr = erofs_blknr(mdev.m_pa + (pos - map.m_la)); + blkoff = erofs_blkoff(mdev.m_pa + (pos - map.m_la));
/* deal with inline page */ if (map.m_flags & EROFS_MAP_META) {
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Commit 27756c07db81 ("cachefiles: add consistency check for copen/cread") added validation to prevent malicious commands. But there is another case: if copen is maliciously called from userspace, it may delete the request corresponding to a random id, and that request may not have been read yet.
Note that when the object has been set to reopening, the open request in the above case completes while the object is still in the reopening state. As a result, the request corresponding to this object is always skipped in the select_req function, so the read request is never completed, blocking other processes.
Fix this issue by simply setting the object to close if its id < 0 at copen time. In addition, the "ret != 0" check in the get_fd function is unnecessary because ret is always non-zero there; remove it.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 9f3e7d994aed..528da20e1119 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -189,6 +189,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) radix_tree_delete(&cache->reqs, id); xa_unlock(&cache->reqs);
+ info = req->object->private; /* fail OPEN request if copen format is invalid */ ret = kstrtol(psize, 0, &size); if (ret) { @@ -208,7 +209,6 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) goto out; }
- info = req->object->private; spin_lock(&info->lock); /* The anonymous fd was closed before copen. */ if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED) { @@ -228,6 +228,11 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) wake_up_all(&cache->daemon_pollwq);
out: + spin_lock(&info->lock); + /* Need to set object close to avoid reopen status continuing */ + if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED) + cachefiles_ondemand_set_object_close(req->object); + spin_unlock(&info->lock); complete(&req->done); return ret; } @@ -323,7 +328,7 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req, err: spin_lock(&req->object->private->lock); /* Avoid marking an opened object as closed. */ - if (ret && object->private->ondemand_id <= 0) + if (object->private->ondemand_id <= 0) cachefiles_ondemand_set_object_close(req->object); spin_unlock(&req->object->private->lock);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The following concurrency may cause cachefiles to hang on exit due to n_ops leaks:
child         | close devfd       | parent
-------------------------------------------------------------------
fscache_object_init
  oob_table = fscache_osm_init_oob
                                    fscache_object_available
                                      fscache_jumpstart_dependents
                                        fscache_enqueue_dependents
fscache_parent_ready
  parent->n_ops++
  parent->n_obj_ops++
                cachefiles_daemon_release
                  cachefiles_put_unbind_pincount
                    cachefiles_daemon_unbind
                      fscache_withdraw_cache
                        fscache_withdraw_all_objects
                          fscache_raise_event(FSCACHE_OBJECT_EV_KILL)
fscache_object_sm_dispatcher
  // get KILL oob_event_mask
  fscache_abort_initialisation
    fscache_kill_object
      fscache_drop_object
        fscache_object_dead
                                    fscache_kill_object
                                      // n_ops != 0 will not drop
                                      transit_to(WAIT_FOR_CLEARANCE)
                                      // No one will ever touch it
                wait_event(&cache->object_count == 0)
                // object_count is never 0, resulting in a hang
Therefore, take the parent's n_ops/n_obj_ops after updating the oob_table to fscache_osm_lookup_oob in fscache_look_up_object(). This ensures that fscache_done_parent_op() will always be called to release the corresponding counts. Since the parent's n_children has already been incremented, the parent cannot be freed while the child object is being looked up, so moving the n_ops/n_obj_ops updates has no side effects.
Fixes: caaef6900bef ("FS-Cache: Fix object state machine to have separate work and wait states") Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/object.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 375b9b34f005..297b90eca985 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -427,17 +427,9 @@ static const struct fscache_state *fscache_initialise_object(struct fscache_obje static const struct fscache_state *fscache_parent_ready(struct fscache_object *object, int event) { - struct fscache_object *parent = object->parent; - _enter("{OBJ%x},%d", object->debug_id, event);
- ASSERT(parent != NULL); - - spin_lock(&parent->lock); - parent->n_ops++; - parent->n_obj_ops++; object->lookup_jif = jiffies; - spin_unlock(&parent->lock);
_leave(""); return transit_to(LOOK_UP_OBJECT); @@ -460,6 +452,12 @@ static const struct fscache_state *fscache_look_up_object(struct fscache_object object->oob_table = fscache_osm_lookup_oob;
ASSERT(parent != NULL); + + spin_lock(&parent->lock); + parent->n_ops++; + parent->n_obj_ops++; + spin_unlock(&parent->lock); + ASSERTCMP(parent->n_ops, >, 0); ASSERTCMP(parent->n_obj_ops, >, 0);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Reusing the msg_id after a maliciously completed reopen request may cause a read request to remain unprocessed and result in a hang, as shown below:
t1            | t2                | t3
-------------------------------------------------------------------
cachefiles_ondemand_select_req
  cachefiles_ondemand_object_is_close(A)
  cachefiles_ondemand_set_object_reopening(A)
  queue_work(fscache_object_wq, &info->work)
                ondemand_object_worker
                  cachefiles_ondemand_init_object(A)
                    cachefiles_ondemand_send_req(OPEN)
                      // get msg_id 6
                      wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
  // read msg_id 6 req_A
  cachefiles_ondemand_get_fd
  copy_to_user
                                    // Malicious completion msg_id 6
                                    copen 6,-1
                                    // reopen fails, want daemon to close fd,
                                    // then set object to close, retrigger reopen
                cachefiles_ondemand_init_object(B)
                  cachefiles_ondemand_send_req(OPEN)
                    // new open req_B reuses msg_id 6
// daemon successfully copens msg_id 6, so it won't close the fd.
// object is always reopening, so read requests are not processed,
// resulting in a hang
Therefore, allocate msg_ids cyclically so that a msg_id is not reused within a very short period of time, avoiding the above problem.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/cachefiles/ondemand.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 528da20e1119..76c936fc9a68 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -535,9 +535,11 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, goto out; }
- while (radix_tree_insert(&cache->reqs, - id = atomic64_read(&global_index), req)) - atomic64_inc(&global_index); + do { + id = atomic64_inc_return(&global_index); + if (unlikely(id == UINT_MAX)) + atomic64_set(&global_index, 0); + } while (radix_tree_insert(&cache->reqs, id, req));
radix_tree_tag_set(&cache->reqs, id, CACHEFILES_REQ_NEW); xa_unlock(&cache->reqs);
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The reference count for dep_link is put twice when there is contention as follows, which causes an assertion failure in cachefiles_put_object().
t1            | t2
---------------------------------
fscache_dequeue_object
  if (!list_empty(&object->dep_link))
                fscache_enqueue_dependents
                  spin_lock(&object->lock);
                  list_del_init(&dep->dep_link);
                  fscache_put_object(dep, fscache_obj_put_enq_dep);
                  spin_unlock(&object->lock);
  spin_lock(&object->parent->lock);
  list_del_init(&object->dep_link);
  fscache_put_object(object, fscache_obj_put_dequeue);
  spin_unlock(&object->parent->lock);

fscache_put_object
  cachefiles_put_object
    ASSERTCMP(u, !=, -1) // Assertion Failure!
Avoid this problem by re-checking whether object->dep_link is empty under the lock.
Fixes: bfba3a3ac037 ("[Huawei] fscache: fix reference count leakage during abort init") Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/fscache/object.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 297b90eca985..11170f079d33 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -902,13 +902,16 @@ static void fscache_dequeue_object(struct fscache_object *object) { _enter("{OBJ%x}", object->debug_id);
+ if (list_empty(&object->dep_link)) + goto out; + + spin_lock(&object->parent->lock); if (!list_empty(&object->dep_link)) { - spin_lock(&object->parent->lock); list_del_init(&object->dep_link); fscache_put_object(object, fscache_obj_put_dequeue); - spin_unlock(&object->parent->lock); } - + spin_unlock(&object->parent->lock); +out: _leave(""); }
From: Jingbo Xu jefflexu@linux.alibaba.com
mainline inclusion from mainline-v6.3-rc1 commit bdfa90142eb1f1272d2efc00dda6c0f35814e36a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently metadata is always on bootstrap, and thus device mapping is not needed so far. Remove the redundant device mapping in the meta routine.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Chao Yu chao@kernel.org Link: https://lore.kernel.org/r/20230209063913.46341-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com
Conflicts: fs/erofs/fscache.c [ Because it hasn't been switched to folio yet. ] Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 17640adf574d..23b55a67feb5 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -57,16 +57,9 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page) { int ret; struct super_block *sb = page->mapping->host->i_sb; - struct erofs_map_dev mdev = { - .m_deviceid = 0, - .m_pa = page_offset(page), - }; - - ret = erofs_map_dev(sb, &mdev); - if (ret) - goto out; + struct erofs_fscache *ctx = page->mapping->host->i_private;
- ret = fscache_read_or_alloc_page(mdev.m_fscache->cookie, page, + ret = fscache_read_or_alloc_page(ctx->cookie, page, erofs_readpage_from_fscache_complete, NULL, GFP_KERNEL); @@ -423,6 +416,7 @@ struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, inode->i_size = OFFSET_MAX; inode->i_mapping->a_ops = &erofs_fscache_meta_aops; mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); + inode->i_private = ctx;
ctx->inode = inode; }
From: Jingbo Xu jefflexu@linux.alibaba.com
mainline inclusion from mainline-v6.3-rc1 commit 61fef98945d0b2fea522ef958f57a783e2a072a9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Currently there're two anonymous inodes (inode and anon_inode in struct erofs_fscache) for each blob. The former was introduced as the address_space of page cache for bootstrap.
The latter was initially introduced both as the address_space of the page cache and as a sentinel in the shared domain. Now that the management of cookies in the share domain has been decoupled from the anonymous inode, there's no need to maintain an extra anonymous inode. Let's unify these two anonymous inodes.
Besides, in non-share-domain mode only bootstrap will allocate anonymous inode. To simplify the implementation, always allocate anonymous inode for both bootstrap and data blobs. Similarly release anonymous inodes for data blobs when .put_super() is called, or we'll get "VFS: Busy inodes after unmount." warning.
Also remove the redundant set_nlink() when initializing the anonymous inode, since i_nlink has already been initialized to 1 when the inode gets allocated.
Signed-off-by: Jingbo Xu jefflexu@linux.alibaba.com Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Link: https://lore.kernel.org/r/20230209063913.46341-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com
Conflicts: fs/erofs/fscache.c fs/erofs/internal.h fs/erofs/super.c [ Because it hasn't switched to folio yet and doesn't have the volume introduced by the cachefile rewrite patch. ] Signed-off-by: Baokun Li libaokun1@huawei.com --- fs/erofs/fscache.c | 84 ++++++++++++++++++--------------------------- fs/erofs/internal.h | 7 ++-- fs/erofs/super.c | 2 ++ 3 files changed, 38 insertions(+), 55 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 23b55a67feb5..cfa000c6175b 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -375,13 +375,13 @@ static int erofs_fscache_register_domain(struct super_block *sb) return err; }
-static -struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, - char *name, - unsigned int flags) +static struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, + char *name, unsigned int flags) { struct erofs_fscache *ctx; struct fscache_cookie *cookie; + struct super_block *isb; + struct inode *inode; int ret;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); @@ -401,31 +401,31 @@ struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb, }
//fscache_use_cookie(cookie, false); - ctx->cookie = cookie;
- if (flags & EROFS_REG_COOKIE_NEED_INODE) { - struct inode *const inode = new_inode(sb); - - if (!inode) { - erofs_err(sb, "failed to get anon inode for %s", name); - ret = -ENOMEM; - goto err_cookie; - } - - set_nlink(inode, 1); - inode->i_size = OFFSET_MAX; - inode->i_mapping->a_ops = &erofs_fscache_meta_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); - inode->i_private = ctx; - - ctx->inode = inode; + /* + * Allocate anonymous inode in global pseudo mount for shareable blobs, + * so that they are accessible among erofs fs instances. + */ + isb = flags & EROFS_REG_COOKIE_SHARE ? erofs_pseudo_mnt->mnt_sb : sb; + inode = new_inode(isb); + if (!inode) { + erofs_err(sb, "failed to get anon inode for %s", name); + ret = -ENOMEM; + goto err_cookie; }
+ inode->i_size = OFFSET_MAX; + inode->i_mapping->a_ops = &erofs_fscache_meta_aops; + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); + inode->i_private = ctx; + + ctx->cookie = cookie; + ctx->inode = inode; return ctx;
err_cookie: -// fscache_unuse_cookie(ctx->cookie, NULL, NULL); - fscache_relinquish_cookie(ctx->cookie, NULL, false); +// fscache_unuse_cookie(cookie, NULL, NULL); + fscache_relinquish_cookie(cookie, NULL, false); err: kfree(ctx); return ERR_PTR(ret); @@ -436,18 +436,13 @@ static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx) //fscache_unuse_cookie(ctx->cookie, NULL, NULL); fscache_relinquish_cookie(ctx->cookie, NULL, false); iput(ctx->inode); - iput(ctx->anon_inode); kfree(ctx->name); kfree(ctx); }
-static -struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb, - char *name, - unsigned int flags) +static struct erofs_fscache *erofs_domain_init_cookie(struct super_block *sb, + char *name, unsigned int flags) { - int err; - struct inode *inode; struct erofs_fscache *ctx; struct erofs_domain *domain = EROFS_SB(sb)->domain;
@@ -457,35 +452,23 @@ struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
ctx->name = kstrdup(name, GFP_KERNEL); if (!ctx->name) { - err = -ENOMEM; - goto out; - } - - inode = new_inode(erofs_pseudo_mnt->mnt_sb); - if (!inode) { - err = -ENOMEM; - goto out; + erofs_fscache_relinquish_cookie(ctx); + return ERR_PTR(-ENOMEM); }
+ refcount_inc(&domain->ref); ctx->domain = domain; - ctx->anon_inode = inode; list_add(&ctx->node, &erofs_domain_cookies_list); - inode->i_private = ctx; - refcount_inc(&domain->ref); return ctx; -out: - erofs_fscache_relinquish_cookie(ctx); - return ERR_PTR(err); }
-static -struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, - char *name, - unsigned int flags) +static struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, + char *name, unsigned int flags) { struct erofs_fscache *ctx; struct erofs_domain *domain = EROFS_SB(sb)->domain;
+ flags |= EROFS_REG_COOKIE_SHARE; mutex_lock(&erofs_domain_cookies_lock); list_for_each_entry(ctx, &erofs_domain_cookies_list, node) { if (ctx->domain != domain || strcmp(ctx->name, name)) @@ -500,7 +483,7 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb, mutex_unlock(&erofs_domain_cookies_lock); return ctx; } - ctx = erofs_fscache_domain_init_cookie(sb, name, flags); + ctx = erofs_domain_init_cookie(sb, name, flags); mutex_unlock(&erofs_domain_cookies_lock); return ctx; } @@ -539,7 +522,7 @@ int erofs_fscache_register_fs(struct super_block *sb) int ret; struct erofs_sb_info *sbi = EROFS_SB(sb); struct erofs_fscache *fscache; - unsigned int flags; + unsigned int flags = 0;
if (sbi->domain_id) ret = erofs_fscache_register_domain(sb); @@ -558,7 +541,6 @@ int erofs_fscache_register_fs(struct super_block *sb) * * Acquired domain/volume will be relinquished in kill_sb() on error. */ - flags = EROFS_REG_COOKIE_NEED_INODE; if (sbi->domain_id) flags |= EROFS_REG_COOKIE_NEED_NOEXIST; fscache = erofs_fscache_register_cookie(sb, sbi->fsid, flags); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index f961256bd469..283ae94abbf4 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -91,8 +91,7 @@ struct erofs_domain {
struct erofs_fscache { struct fscache_cookie *cookie; - struct inode *inode; - struct inode *anon_inode; + struct inode *inode; /* anonymous inode for the blob */
/* used for share domain mode */ struct erofs_domain *domain; @@ -496,8 +495,8 @@ static inline void z_erofs_exit_zip_subsystem(void) {} #endif /* !CONFIG_EROFS_FS_ZIP */
/* flags for erofs_fscache_register_cookie() */ -#define EROFS_REG_COOKIE_NEED_INODE 1 -#define EROFS_REG_COOKIE_NEED_NOEXIST 2 +#define EROFS_REG_COOKIE_SHARE 0x0001 +#define EROFS_REG_COOKIE_NEED_NOEXIST 0x0002
/* fscache.c */ #ifdef CONFIG_EROFS_FS_ONDEMAND diff --git a/fs/erofs/super.c b/fs/erofs/super.c index eb9f7b71ec14..8ed69d0fcce0 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -717,6 +717,8 @@ static void erofs_put_super(struct super_block *sb) iput(sbi->managed_cache); sbi->managed_cache = NULL; #endif + erofs_free_dev_context(sbi->devs); + sbi->devs = NULL; erofs_fscache_unregister_cookie(sbi->s_fscache); sbi->s_fscache = NULL; }
From: Mikulas Patocka mpatocka@redhat.com
mainline inclusion from mainline-v6.0-rc3 commit 8238b4579866b7c1bb99883cfe102a43db5506ff category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
There are several places in the kernel where wait_on_bit is not followed by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).
On architectures with weak memory ordering, it may happen that memory accesses that follow wait_on_bit are reordered before wait_on_bit and they may return invalid data.
Fix this class of bugs by introducing a new function "test_bit_acquire" that works like test_bit, but has acquire memory ordering semantics.
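A minimal sketch of the bug class (hypothetical structure, modelled on the dm-bufio case mentioned above):

struct buf {
	unsigned long	state;
#define B_READING	0	/* set while the read I/O is in flight */
	void		*data;
};

static void *buf_get_data(struct buf *b)
{
	/*
	 * Waits for B_READING to clear.  If the bit test inside
	 * wait_on_bit() lacks acquire semantics, the load of b->data
	 * below may be reordered before the bit test on weakly ordered
	 * CPUs and observe stale data.
	 */
	wait_on_bit(&b->state, B_READING, TASK_UNINTERRUPTIBLE);
	return b->data;
}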
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Acked-by: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Conflicts: arch/x86/include/asm/bitops.h include/asm-generic/bitops/non-atomic.h include/asm-generic/bitops/instrumented-non-atomic.h include/asm-generic/bitops/non-instrumented-non-atomic.h include/linux/bitops.h include/linux/buffer_head.h include/asm-generic/bitops/generic-non-atomic.h Documentation/atomic_bitops.txt [Due to the large number of patch modifications and conflicts, patch adaptation is performed in the original contexts] Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- Documentation/atomic_bitops.txt | 10 ++++----- arch/x86/include/asm/bitops.h | 21 +++++++++++++++++++ .../bitops/instrumented-non-atomic.h | 12 +++++++++++ include/asm-generic/bitops/non-atomic.h | 14 +++++++++++++ include/linux/wait_bit.h | 8 +++---- kernel/sched/wait_bit.c | 2 +- 6 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/Documentation/atomic_bitops.txt b/Documentation/atomic_bitops.txt index 093cdaefdb37..edea4656c5c0 100644 --- a/Documentation/atomic_bitops.txt +++ b/Documentation/atomic_bitops.txt @@ -58,13 +58,11 @@ Like with atomic_t, the rule of thumb is:
- RMW operations that have a return value are fully ordered.
- - RMW operations that are conditional are unordered on FAILURE, - otherwise the above rules apply. In the case of test_and_{}_bit() operations, - if the bit in memory is unchanged by the operation then it is deemed to have - failed. + - RMW operations that are conditional are fully ordered.
-Except for a successful test_and_set_bit_lock() which has ACQUIRE semantics and -clear_bit_unlock() which has RELEASE semantics. +Except for a successful test_and_set_bit_lock() which has ACQUIRE semantics, +clear_bit_unlock() which has RELEASE semantics and test_bit_acquire which has +ACQUIRE semantics.
Since a platform only has a single means of achieving atomic operations the same barriers as for atomic_t are used, see atomic_t.txt. diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 0367efdc5b7a..c14e861f9956 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -207,6 +207,20 @@ static __always_inline bool constant_test_bit(long nr, const volatile unsigned l (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; }
+static __always_inline bool constant_test_bit_acquire(long nr, const volatile unsigned long *addr) +{ + bool oldbit; + + asm volatile("testb %2,%1" + CC_SET(nz) + : CC_OUT(nz) (oldbit) + : "m" (((unsigned char *)addr)[nr >> 3]), + "i" (1 << (nr & 7)) + : "memory"); + + return oldbit; +} + static __always_inline bool variable_test_bit(long nr, volatile const unsigned long *addr) { bool oldbit; @@ -224,6 +238,13 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l ? constant_test_bit((nr), (addr)) \ : variable_test_bit((nr), (addr)))
+static __always_inline bool +arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) +{ + return __builtin_constant_p(nr) ? constant_test_bit_acquire(nr, addr) : + variable_test_bit(nr, addr); +} + /** * __ffs - find first set bit in word * @word: The word to search diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h index 37363d570b9b..da7f0d0a707c 100644 --- a/include/asm-generic/bitops/instrumented-non-atomic.h +++ b/include/asm-generic/bitops/instrumented-non-atomic.h @@ -135,4 +135,16 @@ static inline bool test_bit(long nr, const volatile unsigned long *addr) return arch_test_bit(nr, addr); }
+/** + * test_bit_acquire - Determine, with acquire semantics, whether a bit is set + * @nr: bit number to test + * @addr: Address to start counting from + */ +static __always_inline bool +test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) +{ + instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long)); + return arch_test_bit_acquire(nr, addr); +} + #endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */ diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h index 7e10c4b50c5d..46437282d6d0 100644 --- a/include/asm-generic/bitops/non-atomic.h +++ b/include/asm-generic/bitops/non-atomic.h @@ -3,6 +3,7 @@ #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#include <asm/types.h> +#include <asm/barrier.h>
/** * __set_bit - Set a bit in memory @@ -106,4 +107,17 @@ static inline int test_bit(int nr, const volatile unsigned long *addr) return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); }
+/** + * arch_test_bit_acquire - Determine, with acquire semantics, whether a bit is set + * @nr: bit number to test + * @addr: Address to start counting from + */ +static __always_inline bool +arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) +{ + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); + return 1UL & (smp_load_acquire(p) >> (nr & (BITS_PER_LONG-1))); +} +#define test_bit_acquire arch_test_bit_acquire + #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */ diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h index 7dec36aecbd9..7725b7579b78 100644 --- a/include/linux/wait_bit.h +++ b/include/linux/wait_bit.h @@ -71,7 +71,7 @@ static inline int wait_on_bit(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait, @@ -96,7 +96,7 @@ static inline int wait_on_bit_io(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait_io, @@ -123,7 +123,7 @@ wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode, unsigned long timeout) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit_timeout(word, bit, bit_wait_timeout, @@ -151,7 +151,7 @@ wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, action, mode); } diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c index 02ce292b9bc0..99be72732ee0 100644 --- a/kernel/sched/wait_bit.c +++ b/kernel/sched/wait_bit.c @@ -47,7 +47,7 @@ __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_ prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode); if (test_bit(wbq_entry->key.bit_nr, wbq_entry->key.flags)) ret = (*action)(&wbq_entry->key, mode); - } while (test_bit(wbq_entry->key.bit_nr, wbq_entry->key.flags) && !ret); + } while (test_bit_acquire(wbq_entry->key.bit_nr, wbq_entry->key.flags) && !ret);
finish_wait(wq_head, &wbq_entry->wq_entry);
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
The s390 architecture does not define an arch_test_bit_acquire() implementation, so compilation fails.
This patch carries the function implementation to fix the problem. The mainline fix is d6ffe6067a54 ("provide arch_test_bit_acquire for architectures that define test_bit"), but due to large adaptation conflicts we came up with a patch of our own.
Fixes: 8238b4579866 ("wait_on_bit: add an acquire memory barrier") Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- arch/s390/include/asm/bitops.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h index 431e208a5ea4..caf8099bd955 100644 --- a/arch/s390/include/asm/bitops.h +++ b/arch/s390/include/asm/bitops.h @@ -241,6 +241,13 @@ static inline void arch___clear_bit_unlock(unsigned long nr, arch___clear_bit(nr, ptr); }
+static __always_inline bool +arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) +{ + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr); + return 1UL & (smp_load_acquire(p) >> (nr & (BITS_PER_LONG-1))); +} + #include <asm-generic/bitops/instrumented-atomic.h> #include <asm-generic/bitops/instrumented-non-atomic.h> #include <asm-generic/bitops/instrumented-lock.h>
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion category: other bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
For the previously merged mainline patch 8238b4579866 ("wait_on_bit: add an acquire memory barrier"), in order to avoid additional side effects, revert the test_bit_acquire() calls in the wait_on_bit_xx functions to the original test_bit().
In addition, define wait_on_bit_acquire(), which uses test_bit_acquire() to provide the memory barrier, so that the acquire semantics are limited to the callers that need them.
Signed-off-by: Zizhi Wo wozizhi@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com --- include/linux/wait_bit.h | 33 +++++++++++++++++++++++++++---- kernel/sched/wait_bit.c | 42 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h index 7725b7579b78..d31d256ce885 100644 --- a/include/linux/wait_bit.h +++ b/include/linux/wait_bit.h @@ -28,7 +28,9 @@ int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry * int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode); void wake_up_bit(void *word, int bit); int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode); +int out_of_line_wait_on_bit_acquire(void *word, int, wait_bit_action_f *action, unsigned int mode); int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout); +int out_of_line_wait_on_bit_timeout_acquire(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout); int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode); struct wait_queue_head *bit_waitqueue(void *word, int bit); extern void __init wait_bit_init(void); @@ -71,7 +73,7 @@ static inline int wait_on_bit(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit_acquire(bit, word)) + if (!test_bit(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait, @@ -96,7 +98,7 @@ static inline int wait_on_bit_io(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit_acquire(bit, word)) + if (!test_bit(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait_io, @@ -123,7 +125,7 @@ wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode, unsigned long timeout) { might_sleep(); - if (!test_bit_acquire(bit, word)) + if (!test_bit(bit, word)) return 0; return out_of_line_wait_on_bit_timeout(word, bit, bit_wait_timeout, @@ -151,7 +153,7 @@ wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action, unsigned mode) { might_sleep(); - if (!test_bit_acquire(bit, word)) + if (!test_bit(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, action, mode); } @@ -235,6 +237,29 @@ wait_on_bit_lock_action(unsigned long *word, int bit, wait_bit_action_f *action, return out_of_line_wait_on_bit_lock(word, bit, action, mode); }
+static inline int +wait_on_bit_acquire(unsigned long *word, int bit, unsigned mode) +{ + might_sleep(); + if (!test_bit_acquire(bit, word)) + return 0; + return out_of_line_wait_on_bit_acquire(word, bit, + bit_wait, + mode); +} + +static inline int +wait_on_bit_timeout_acquire(unsigned long *word, int bit, unsigned mode, + unsigned long timeout) +{ + might_sleep(); + if (!test_bit_acquire(bit, word)) + return 0; + return out_of_line_wait_on_bit_timeout_acquire(word, bit, + bit_wait_timeout, + mode, timeout); +} + extern void init_wait_var_entry(struct wait_bit_queue_entry *wbq_entry, void *var, int flags); extern void wake_up_var(void *var); extern wait_queue_head_t *__var_waitqueue(void *p); diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c index 99be72732ee0..b795085b0b84 100644 --- a/kernel/sched/wait_bit.c +++ b/kernel/sched/wait_bit.c @@ -47,7 +47,7 @@ __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_ prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode); if (test_bit(wbq_entry->key.bit_nr, wbq_entry->key.flags)) ret = (*action)(&wbq_entry->key, mode); - } while (test_bit_acquire(wbq_entry->key.bit_nr, wbq_entry->key.flags) && !ret); + } while (test_bit(wbq_entry->key.bit_nr, wbq_entry->key.flags) && !ret);
finish_wait(wq_head, &wbq_entry->wq_entry);
@@ -55,6 +55,23 @@ __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_ } EXPORT_SYMBOL(__wait_on_bit);
+static int __sched +__wait_on_bit_acquire(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, + wait_bit_action_f *action, unsigned mode) +{ + int ret = 0; + + do { + prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode); + if (test_bit(wbq_entry->key.bit_nr, wbq_entry->key.flags)) + ret = (*action)(&wbq_entry->key, mode); + } while (test_bit_acquire(wbq_entry->key.bit_nr, wbq_entry->key.flags) && !ret); + + finish_wait(wq_head, &wbq_entry->wq_entry); + + return ret; +} + int __sched out_of_line_wait_on_bit(void *word, int bit, wait_bit_action_f *action, unsigned mode) { @@ -65,6 +82,29 @@ int __sched out_of_line_wait_on_bit(void *word, int bit, } EXPORT_SYMBOL(out_of_line_wait_on_bit);
+int __sched out_of_line_wait_on_bit_acquire(void *word, int bit, + wait_bit_action_f *action, unsigned mode) +{ + struct wait_queue_head *wq_head = bit_waitqueue(word, bit); + DEFINE_WAIT_BIT(wq_entry, word, bit); + + return __wait_on_bit_acquire(wq_head, &wq_entry, action, mode); +} +EXPORT_SYMBOL(out_of_line_wait_on_bit_acquire); + +int __sched out_of_line_wait_on_bit_timeout_acquire( + void *word, int bit, wait_bit_action_f *action, + unsigned mode, unsigned long timeout) +{ + struct wait_queue_head *wq_head = bit_waitqueue(word, bit); + DEFINE_WAIT_BIT(wq_entry, word, bit); + + wq_entry.key.timeout = jiffies + timeout; + + return __wait_on_bit_acquire(wq_head, &wq_entry, action, mode); +} +EXPORT_SYMBOL_GPL(out_of_line_wait_on_bit_timeout_acquire); + int __sched out_of_line_wait_on_bit_timeout( void *word, int bit, wait_bit_action_f *action, unsigned mode, unsigned long timeout)
From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Both fscache_object_lookup_negative() and fscache_obtained_object() call clear_bit_unlock() followed by wake_up_bit(). The barrier in clear_bit_unlock() only orders the memory accesses that precede it; it gives no guarantee about the accesses that follow, and wake_up_bit() provides no memory ordering at all.
This can trigger the following problem: the wakeup from wake_up_bit() may become visible before the clearing of the cookie flag. The waiting thread wakes up, sees that the flag is still set, and goes back to sleep expecting to be woken again. The waking thread then clears the bit, but no further wakeup follows, leaving the mount waiting forever and blocking other processes.
Fix this by using clear_and_wake_up_bit(), which adds a memory barrier between clearing the flag and waking the waiter. The wait side needs ordering as well, so use wait_on_bit_acquire() there.
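For illustration, a minimal sketch of the intended pairing between the wake side and the wait side (simplified; the flag and cookie here stand in for the real call sites):

	/* Wake side: clear_and_wake_up_bit() clears the bit with release
	 * semantics and orders the clear before the wakeup, so a woken
	 * waiter is guaranteed to observe the cleared flag.
	 */
	clear_and_wake_up_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);

	/* Wait side: wait_on_bit_acquire() re-checks the bit with acquire
	 * semantics, so loads issued after the wait cannot be reordered
	 * before the load that observed the bit as clear.
	 */
	wait_on_bit_acquire(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
			    TASK_UNINTERRUPTIBLE);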
Fixes: caaef6900bef ("FS-Cache: Fix object state machine to have separate work and wait states")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/fscache/cookie.c | 4 ++--
 fs/fscache/object.c | 6 ++----
 fs/fscache/page.c   | 4 ++--
 3 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index c6e6d166cdb1..c2b1637eaa03 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -518,8 +518,8 @@ static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie, /* we may be required to wait for lookup to complete at this point */ if (!fscache_defer_lookup) { _debug("non-deferred lookup %p", &cookie->flags); - wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP, - TASK_UNINTERRUPTIBLE); + wait_on_bit_acquire(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP, + TASK_UNINTERRUPTIBLE); _debug("complete"); if (test_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags)) goto unavailable; diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 11170f079d33..f05003bb743c 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -521,8 +521,7 @@ void fscache_object_lookup_negative(struct fscache_object *object) clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
_debug("wake up lookup %p", &cookie->flags); - clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags); - wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP); + clear_and_wake_up_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags); } _leave(""); } @@ -556,8 +555,7 @@ void fscache_obtained_object(struct fscache_object *object) /* Allow write requests to begin stacking up and read requests * to begin shovelling data. */ - clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags); - wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP); + clear_and_wake_up_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags); } else { fscache_stat(&fscache_n_object_created); } diff --git a/fs/fscache/page.c b/fs/fscache/page.c index b08568743370..3ff3799e42ef 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -352,8 +352,8 @@ int fscache_wait_for_deferred_lookup(struct fscache_cookie *cookie) fscache_stat(&fscache_n_retrievals_wait);
jif = jiffies; - if (wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP, - TASK_INTERRUPTIBLE) != 0) { + if (wait_on_bit_acquire(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP, + TASK_INTERRUPTIBLE) != 0) { fscache_stat(&fscache_n_retrievals_intr); _leave(" = -ERESTARTSYS"); return -ERESTARTSYS;
From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
This patch modifies several functions, adding memory barriers on both the wake and the wait sides in fscache/cachefiles, so that paths which could previously misbehave under memory reordering are now properly ordered.
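As a sketch, the converted wait-with-timeout path pairs with the release on the wake side like this (simplified from the diff below):

	/* Wait side: re-check the bit with acquire semantics even on the
	 * timeout path, so data published before the corresponding
	 * clear_and_wake_up_bit() is visible once the bit reads clear.
	 */
	ret = wait_on_bit_timeout_acquire(&candidate->flags,
					  FSCACHE_COOKIE_ACQUIRE_PENDING,
					  TASK_INTERRUPTIBLE, 20 * HZ);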
Fixes: ef778e7ae67c ("FS-Cache: Provide proper invalidation")
Fixes: da9803bc8812 ("FS-Cache: Add interface to check consistency of a cached object")
Fixes: 94d30ae90a00 ("FS-Cache: Provide the ability to enable/disable cookies")
Fixes: b73608d715eb ("fscache: add a waiting mechanism when duplicate cookies are detected")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/namei.c |  4 +---
 fs/fscache/cookie.c   | 16 +++++++---------
 fs/fscache/page.c     |  8 ++++----
 3 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 93c511d1fff3..6eeef666c609 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -264,11 +264,9 @@ void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
write_lock(&cache->active_lock); rb_erase(&object->active_node, &cache->active_nodes); - clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags); + clear_and_wake_up_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags); write_unlock(&cache->active_lock);
- wake_up_bit(&object->flags, CACHEFILES_OBJECT_ACTIVE); - /* This object can now be culled, so we need to let the daemon know * that there is something it can remove if it needs to. */ diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index c2b1637eaa03..ea1beed4df11 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -223,14 +223,14 @@ static int fscache_wait_on_cookie_collision(struct fscache_cookie *candidate) { int ret;
- ret = wait_on_bit_timeout(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, - TASK_INTERRUPTIBLE, 20 * HZ); + ret = wait_on_bit_timeout_acquire(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE, 20 * HZ); if (ret == -EINTR) return ret; if (fscache_is_acquire_pending(candidate)) { pr_notice("Potential cookie collision!"); - return wait_on_bit(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, - TASK_INTERRUPTIBLE); + return wait_on_bit_acquire(&candidate->flags, FSCACHE_COOKIE_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE); } return 0; } @@ -445,8 +445,7 @@ void __fscache_enable_cookie(struct fscache_cookie *cookie, }
out_unlock: - clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags); - wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK); + clear_and_wake_up_bit(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags); } EXPORT_SYMBOL(__fscache_enable_cookie);
@@ -725,7 +724,7 @@ void __fscache_wait_on_invalidate(struct fscache_cookie *cookie) { _enter("%p", cookie);
- wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING, + wait_on_bit_acquire(&cookie->flags, FSCACHE_COOKIE_INVALIDATING, TASK_UNINTERRUPTIBLE);
_leave(""); @@ -842,8 +841,7 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie, }
out_unlock_enable: - clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags); - wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK); + clear_and_wake_up_bit(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags); _leave(""); } EXPORT_SYMBOL(__fscache_disable_cookie); diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 3ff3799e42ef..95490d809b5e 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -383,8 +383,8 @@ int fscache_wait_for_operation_activation(struct fscache_object *object, _debug(">>> WT"); if (stat_op_waits) fscache_stat(stat_op_waits); - if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING, - TASK_INTERRUPTIBLE) != 0) { + if (wait_on_bit_acquire(&op->flags, FSCACHE_OP_WAITING, + TASK_INTERRUPTIBLE) != 0) { trace_fscache_op(object->cookie, op, fscache_op_signal); ret = fscache_cancel_op(op, false); if (ret == 0) @@ -392,8 +392,8 @@ int fscache_wait_for_operation_activation(struct fscache_object *object,
/* it's been removed from the pending queue by another party, * so we should get to run shortly */ - wait_on_bit(&op->flags, FSCACHE_OP_WAITING, - TASK_UNINTERRUPTIBLE); + wait_on_bit_acquire(&op->flags, FSCACHE_OP_WAITING, + TASK_UNINTERRUPTIBLE); } _debug("<<< GO");
From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In some functions that call wake_up_bit(&cookie->flags, 0), the wakeup depends on the cookie having already been removed from the radix tree, but there is no memory barrier between the removal and the wakeup, so a woken waiter may still observe the stale tree.
Fix this issue by adding a memory barrier between the two.
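Schematically, the wake side now publishes the radix tree deletion before the wakeup; the snippet below is a simplified sketch of the pattern, not the exact call sites:

	/* Wake side: make the radix tree deletion visible before any
	 * waiter can be woken and re-check the tree.
	 */
	radix_tree_delete(&cookie->stores, page->index);
	smp_mb();
	wake_up_bit(&cookie->flags, 0);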
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/fscache/page.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 95490d809b5e..0b5e477cf62b 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -113,6 +113,8 @@ bool __fscache_maybe_release_page(struct fscache_cookie *cookie, fscache_stat(&fscache_n_store_vmscan_gone); }
+ /* Make sure the delete operation is performed before waking. */ + smp_mb(); wake_up_bit(&cookie->flags, 0); trace_fscache_wake_cookie(cookie); if (xpage) @@ -171,6 +173,8 @@ static void fscache_end_page_write(struct fscache_object *object, trace_fscache_page(cookie, page, fscache_page_write_end_pend); } spin_unlock(&cookie->stores_lock); + /* Make sure the delete operation is performed before waking. */ + smp_mb(); wake_up_bit(&cookie->flags, 0); trace_fscache_wake_cookie(cookie); } else { @@ -988,6 +992,8 @@ void fscache_invalidate_writes(struct fscache_cookie *cookie) put_page(results[i]); }
+ /* Make sure the delete operation is performed before waking. */ + smp_mb(); wake_up_bit(&cookie->flags, 0); trace_fscache_wake_cookie(cookie);
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
An ABBA deadlock may be triggered by the following concurrency:

write(2)                                 | page fault                                   | mmap(2)
-----------------------------------------+----------------------------------------------+-------------------------------
                                         | asm_exc_page_fault                           |
                                         |  exc_page_fault                              |
                                         |   handle_page_fault                          |
                                         |    do_user_addr_fault                        |
                                         |     mmap_read_lock(mm) ---> read lock B      |
vfs_write                                |                                              | vm_mmap_pgoff()
 new_sync_write                          |                                              |  mmap_write_lock_killable(mm)
  cachefiles_ondemand_fd_write_iter      |                                              |   ---> try write lock B
   vfs_iocb_iter_write                   |                                              |
    ext4_file_write_iter                 |                                              |
     ext4_buffered_write_iter            |                                              |
      inode_lock(inode)                  |      handle_mm_fault                         |
       ---> write lock A                 |       __handle_mm_fault                      |
                                         |        handle_pte_fault                      |
                                         |         do_fault                             |
                                         |          do_read_fault                       |
                                         |           __do_fault                         |
                                         |            filemap_fault                     |
                                         |             erofs_fscache_readpage           |
                                         |              __fscache_read_or_alloc_page    |
                                         |               cachefiles_read_or_alloc_page  |
                                         |                bmap                          |
                                         |                 ext4_bmap                    |
                                         |                  inode_lock_shared(inode)    |
                                         |                   ---> try read lock A       |
      generic_perform_write              |                                              |
       iov_iter_fault_in_readable        |                                              |
        __get_user_nocheck_1             |                                              |
         asm_exc_page_fault              |                                              |
          exc_page_fault                 |                                              |
           handle_page_fault             |              ABBA deadlock                   |
            do_user_addr_fault           |                                              |
             mmap_read_lock(mm)          |                                              |
              down_read(&mm->mmap_lock)  |                                              |
               ---> try read lock B      |                                              |
This happens because the inode lock must not be taken anywhere in the page fault path, yet in ondemand mode erofs calls bmap() while handling a page fault, and bmap() may try to take the inode lock, triggering the deadlock above.
There is no good way to fix this completely, so circumvent the problem by faulting in the user pages up front in cachefiles_ondemand_fd_write_iter() and performing the write with page faults disabled, which avoids triggering a page fault inside generic_perform_write(); a sketch of the pattern follows.
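A minimal sketch of the prefault-and-retry pattern, distilled from the diff below (error handling trimmed):

retry:
	bytes = iov_iter_count(iter);
	if (!bytes)
		goto out;

	/* Touch the user pages now, while no filesystem locks are held,
	 * so that the copy below cannot recurse into the filesystem.
	 */
	if (iov_iter_fault_in_readable(iter, bytes))
		goto out;

	/* Disable page faults so the copy fails fast with -EFAULT
	 * instead of faulting while the inode lock is held.
	 */
	pagefault_disable();
	ret = vfs_iocb_iter_write(file, &iocb, iter);
	pagefault_enable();
	if (ret > 0 || ret == -EFAULT)
		goto retry;	/* more to write, or prefault again */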
Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/ondemand.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 76c936fc9a68..ed3b49a4fd4e 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -72,10 +72,11 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, struct iov_iter *iter) { struct cachefiles_object *object = kiocb->ki_filp->private_data; - size_t len = iter->count; struct kiocb iocb; struct file *file; - int ret; + ssize_t ret = 0; + ssize_t written = 0; + size_t bytes;
rcu_read_lock(); file = rcu_dereference(object->file); @@ -95,12 +96,29 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
if (!cachefiles_buffered_ondemand) iocb.ki_flags |= IOCB_DIRECT; +retry: + bytes = iov_iter_count(iter); + if (unlikely(!bytes)) + goto out; + + ret = iov_iter_fault_in_readable(iter, bytes); + if (unlikely(ret)) + goto out;
+ pagefault_disable(); ret = vfs_iocb_iter_write(file, &iocb, iter); + pagefault_enable(); + if (ret > 0) { + written += ret; + goto retry; + } else if (ret == -EFAULT) { + goto retry; + } +out: fput(file); - if (ret != len) + if (!ret && iov_iter_count(iter)) return -EIO; - return len; + return ret < 0 ? ret : written; }
static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In on-demand loading mode, fscache_hash_cookie() has triggered a softlockup. The stack is as follows:
fscache_hash_cookie
  hlist_bl_lock
  fscache_print_cookie
    pr_err
      printk
      ...
      console_unlock   --continuously printing data
  hlist_bl_unlock      --not called
Because the physical serial port transmits slowly and a large amount of data is printed, printk() keeps looping in console_unlock(), so the process holds the hash bucket lock and blocks everyone else.
To avoid similar problems, simply delete fscache_print_cookie(), as there is no need to print the internal state of the cookie, and replace pr_err() with pr_err_ratelimited() to limit the print rate.
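As a sketch, the ratelimited form keeps the message but bounds how often it reaches the console; the bound comes from the kernel's default ratelimit (bursts of DEFAULT_RATELIMIT_BURST messages per DEFAULT_RATELIMIT_INTERVAL, i.e. 10 per 5 seconds), not from anything added by this patch:

	/* Drops messages beyond the default ratelimit instead of
	 * streaming them all through console_unlock() under the lock.
	 */
	pr_err_ratelimited("Duplicate cookie detected\n");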
Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/fscache/cookie.c | 30 +-----------------------------
 1 file changed, 1 insertion(+), 29 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index ea1beed4df11..66a4db1eee36 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -27,32 +27,6 @@ static int fscache_alloc_object(struct fscache_cache *cache, static int fscache_attach_object(struct fscache_cookie *cookie, struct fscache_object *object);
-static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix) -{ - struct hlist_node *object; - const u8 *k; - unsigned loop; - - pr_err("%c-cookie c=%p [p=%p fl=%lx nc=%u na=%u]\n", - prefix, cookie, cookie->parent, cookie->flags, - atomic_read(&cookie->n_children), - atomic_read(&cookie->n_active)); - pr_err("%c-cookie d=%p n=%p\n", - prefix, cookie->def, cookie->netfs_data); - - object = READ_ONCE(cookie->backing_objects.first); - if (object) - pr_err("%c-cookie o=%p\n", - prefix, hlist_entry(object, struct fscache_object, cookie_link)); - - pr_err("%c-key=[%u] '", prefix, cookie->key_len); - k = (cookie->key_len <= sizeof(cookie->inline_key)) ? - cookie->inline_key : cookie->key; - for (loop = 0; loop < cookie->key_len; loop++) - pr_cont("%02x", k[loop]); - pr_cont("'\n"); -} - void fscache_free_cookie(struct fscache_cookie *cookie) { if (cookie) { @@ -284,10 +258,8 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate) if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags)) { trace_fscache_cookie(cursor, fscache_cookie_collision, atomic_read(&cursor->usage)); - fscache_print_cookie(cursor, 'O'); - fscache_print_cookie(candidate, 'N'); hlist_bl_unlock(h); - pr_err("Duplicate cookie detected\n"); + pr_err_ratelimited("Duplicate cookie detected\n"); return NULL; }
From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In cachefiles_daemon_secctx(), if a security context has already been written to the cache, the returned error code is -EINVAL, which is inappropriate because it does not distinguish this case from a genuinely invalid argument.
As cachefiles_daemon_dir() already does, fix this by returning -EEXIST to the user when the security context has already been defined once.
Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/daemon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index c94b512da4b5..3128cbc733ea 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -594,7 +594,7 @@ static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args)
if (cache->secctx) { pr_err("Second security context specified\n"); - return -EINVAL; + return -EEXIST; }
secctx = kstrdup(args, GFP_KERNEL);
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/14289 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/A...