Changes since V1:
 - Modify the bugzilla link to synchronize with 5.10.
 - Fix comment syntax errors in patch 9.

Changes since V2:
 - Merge the introduced problem patch with the subsequent bugfix patches.

Changes since V3:
 - Fix a missing merge patch.
Baokun Li (1):
  cachefiles: add support for buffer I/O in ondemand mode

Yang Erkun (1):
  erofs: remove EXPERIMENTAL feature warning for fscache-based

Zizhi Wo (13):
  fscache: add a memory barrier for FSCACHE_VOLUME_CREATING
  cachefiles: modify inappropriate error return value in cachefiles_daemon_secctx
  fscache: modify the waiting mechanism with duplicate volumes
  erofs: add erofs switch to better control it
  erofs: add erofs_ondemand switch
  cachefiles: Add restrictions to cachefiles_daemon_cull()
  cachefiles: Introduce "dir_has_put" in cachefiles_volume
  fscache: Add the synchronous waiting mechanism for the volume unhash in erofs ondemand mode
  fscache: clean up for fscache_clear_volume_priv
  cachefiles: Fix NULL pointer dereference in object->file
  cachefiles: Clean up in cachefiles_commit_tmpfile()
  cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter()
  cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
 fs/Makefile                   |  1 +
 fs/cachefiles/cache.c         |  2 +
 fs/cachefiles/daemon.c        | 12 +++++-
 fs/cachefiles/interface.c     | 26 ++++++++++---
 fs/cachefiles/internal.h      |  5 +++
 fs/cachefiles/io.c            | 12 ++++++
 fs/cachefiles/namei.c         |  5 ---
 fs/cachefiles/ondemand.c      | 39 ++++++++++++++-----
 fs/cachefiles/volume.c        | 51 ++++++++++++++++++++++---
 fs/erofs/fscache.c            |  4 ++
 fs/erofs/internal.h           |  1 +
 fs/erofs/super.c              | 20 ++++++++--
 fs/fs_ctl.c                   | 43 +++++++++++++++++++++
 fs/fscache/cache.c            |  1 -
 fs/fscache/cookie.c           | 28 ++++++++++++++
 fs/fscache/volume.c           | 70 +++++++++++++++++++++++++----------
 include/linux/fs.h            | 17 +++++++++
 include/linux/fscache-cache.h | 28 +++++++++++++-
 include/linux/fscache.h       |  2 +-
 19 files changed, 315 insertions(+), 52 deletions(-)
 create mode 100644 fs/fs_ctl.c
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In fscache_create_volume(), there is a missing memory barrier between the bit-clearing operation and the wake-up operation. This can lead to a situation where the woken waiter does not yet observe the cleared bit, goes back to sleep, and waits indefinitely. The triggering process is as follows:
cookie1                     cookie2                     volume_work
fscache_perform_lookup
  fscache_create_volume
                            fscache_perform_lookup
                              fscache_create_volume
                                                        cachefiles_acquire_volume
                                                        clear_and_wake_up_bit
  test_and_set_bit
                              test_and_set_bit
                              goto maybe_wait
  goto no_wait
In the above process, cookie1 and cookie2 share the same volume. When cookie1 enters the no_wait path, it clears the bit and wakes up the waiter. Because the barrier is missing, cookie2 may not observe the cleared bit and thus remain in the wait path indefinitely.
In commit 3288666c7256 ("fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work()"), barriers were added to similar operations in fscache_create_volume_work(), but fscache_create_volume() was missed.
Fix this issue by combining the clear and wake-up operations into clear_and_wake_up_bit(), which provides the required barrier.
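For reference, clear_and_wake_up_bit() (include/linux/wait_bit.h) already pairs the unlock-clear with the barrier that the open-coded sequence above is missing; it is roughly:

  static inline void clear_and_wake_up_bit(int bit, void *word)
  {
          clear_bit_unlock(bit, word);
          /* barrier between clearing the bit and waking the waiter */
          smp_mb__after_atomic();
          wake_up_bit(word, bit);
  }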
Fixes: bfa22da3ed65 ("fscache: Provide and use cache methods to lookup/create/free a volume")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/fscache/volume.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index cb75c07b5281..ced14ac78cc1 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -322,8 +322,7 @@ void fscache_create_volume(struct fscache_volume *volume, bool wait)
 	}
 	return;
 no_wait:
-	clear_bit_unlock(FSCACHE_VOLUME_CREATING, &volume->flags);
-	wake_up_bit(&volume->flags, FSCACHE_VOLUME_CREATING);
+	clear_and_wake_up_bit(FSCACHE_VOLUME_CREATING, &volume->flags);
 }
/*
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In cachefiles_daemon_secctx(), if a security context has already been set for the cache, -EINVAL is returned. This is inappropriate and does not distinguish that case from an actually invalid argument.

As cachefiles_daemon_dir() does, fix this by returning -EEXIST to the user if the security context has already been defined once.
Fixes: 8667d434b2a9 ("cachefiles: Register a miscdev and parse commands over it")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/daemon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 89b11336a836..59e576951c68 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -587,7 +587,7 @@ static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args)
 
 	if (cache->secctx) {
 		pr_err("Second security context specified\n");
-		return -EINVAL;
+		return -EEXIST;
 	}
 
 	secctx = kstrdup(args, GFP_KERNEL);
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs ondemand scenario, the fd holds a reference to the object, the object holds a reference to the cookie, and the cookie holds a reference to the volume. cookie->ref is dropped when object->ref reaches 0, and volume->ref is decremented only when cookie->ref reaches 0. This forms a dependency chain: fd -> object -> cookie -> volume.
The unhashing of the cookie does not depend on its reference count being zero; it only requires its state to be FSCACHE_COOKIE_STATE_RELINQUISHING.
For the volume, however, the situation is different: the volume is unhashed only when its reference count reaches zero. That means it depends on user space closing the fd of the cachefiles_object. If the relinquish process has finished but user space fails to close the fd properly, the kernel hangs in fscache_wait_on_volume_collision(), which is inappropriate behavior.
Modify the waiting mechanism to be interruptible to fix this problem.
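The reworked wait helper boils down to the following (a simplified rendering of the change in the diff below, with the collision statistics accounting omitted):

  static int fscache_wait_on_volume_collision(struct fscache_volume *candidate,
                                              unsigned int collidee_debug_id)
  {
          int ret;

          /* Timed wait, now interruptible instead of TASK_UNINTERRUPTIBLE. */
          ret = wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
                                    TASK_INTERRUPTIBLE, 20 * HZ);
          if (ret == -EINTR)
                  return ret;

          if (test_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags)) {
                  pr_notice("Potential volume collision new=%08x old=%08x",
                            candidate->debug_id, collidee_debug_id);
                  /* Keep waiting, but let a signal break the wait. */
                  return wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
                                     TASK_INTERRUPTIBLE);
          }
          return 0;
  }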
Fixes: 62ab63352350 ("fscache: Implement volume registration")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/fscache/volume.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c index ced14ac78cc1..89efb8071a75 100644 --- a/fs/fscache/volume.c +++ b/fs/fscache/volume.c @@ -151,18 +151,23 @@ static bool fscache_is_acquire_pending(struct fscache_volume *volume) return test_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &volume->flags); }
-static void fscache_wait_on_volume_collision(struct fscache_volume *candidate, +static int fscache_wait_on_volume_collision(struct fscache_volume *candidate, unsigned int collidee_debug_id) { - wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, - TASK_UNINTERRUPTIBLE, 20 * HZ); + int ret; + + ret = wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE, 20 * HZ); + if (ret == -EINTR) + return ret; if (fscache_is_acquire_pending(candidate)) { pr_notice("Potential volume collision new=%08x old=%08x", candidate->debug_id, collidee_debug_id); fscache_stat(&fscache_n_volumes_collision); - wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, - TASK_UNINTERRUPTIBLE); + return wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, + TASK_INTERRUPTIBLE); } + return 0; }
/* @@ -183,8 +188,10 @@ static bool fscache_hash_volume(struct fscache_volume *candidate) hlist_bl_lock(h); hlist_bl_for_each_entry(cursor, p, h, hash_link) { if (fscache_volume_same(candidate, cursor)) { - if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags)) + if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags)) { + fscache_see_volume(cursor, fscache_volume_collision); goto collision; + } fscache_see_volume(cursor, fscache_volume_get_hash_collision); set_bit(FSCACHE_VOLUME_COLLIDED_WITH, &cursor->flags); set_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags); @@ -196,12 +203,16 @@ static bool fscache_hash_volume(struct fscache_volume *candidate) hlist_bl_add_head(&candidate->hash_link, h); hlist_bl_unlock(h);
- if (fscache_is_acquire_pending(candidate)) - fscache_wait_on_volume_collision(candidate, collidee_debug_id); + if (fscache_is_acquire_pending(candidate) && + fscache_wait_on_volume_collision(candidate, collidee_debug_id)) { + hlist_bl_lock(h); + hlist_bl_del_init(&candidate->hash_link); + pr_err("Wait duplicate volume unhashed interrupted\n"); + goto collision; + } return true;
collision: - fscache_see_volume(cursor, fscache_volume_collision); hlist_bl_unlock(h); return false; }
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
To let user space control whether erofs can be used, add a dynamic switch for erofs.

The switch, named erofs_enabled, is gated by CONFIG_EROFS_FS. Users can turn it on by echoing 1/on/ON/y/Y to /sys/module/fs_ctl/parameters/erofs_enabled.
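The heart of the switch is a set-once module parameter; condensed from the fs_ctl.c hunk in the diff below:

  /* Accepts 1/on/y once; refuses to turn the switch back off. */
  static int param_set_bool_on_only_once(const char *s, const struct kernel_param *kp)
  {
          bool value, *res = kp->arg;
          int ret = strtobool(s ? s : "1", &value);

          if (ret)
                  return ret;
          if (!value && *res)
                  return -EBUSY;
          if (value && !*res)
                  WRITE_ONCE(*res, true);
          return 0;
  }

  bool erofs_enabled = true;
  module_param_call(erofs_enabled, param_set_bool_on_only_once, param_get_bool,
                    &erofs_enabled, 0644);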
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/Makefile        |  1 +
 fs/erofs/super.c   |  3 +++
 fs/fs_ctl.c        | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  4 ++++
 4 files changed, 44 insertions(+)
 create mode 100644 fs/fs_ctl.c
diff --git a/fs/Makefile b/fs/Makefile index f1b15345e6a1..45e54c34c309 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -16,6 +16,7 @@ obj-y := open.o read_write.o file_table.o super.o \ stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \ fs_types.o fs_context.o fs_parser.o fsopen.o init.o \ kernel_read_file.o mnt_idmapping.o remap_range.o +obj-y += fs_ctl.o
obj-$(CONFIG_BUFFER_HEAD) += buffer.o mpage.o obj-$(CONFIG_PROC_FS) += proc_namespace.o diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 113414e6f35b..43fc432337c5 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -766,6 +766,9 @@ static int erofs_init_fs_context(struct fs_context *fc) { struct erofs_sb_info *sbi;
+ if (!READ_ONCE(erofs_enabled)) + return -EOPNOTSUPP; + sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) return -ENOMEM; diff --git a/fs/fs_ctl.c b/fs/fs_ctl.c new file mode 100644 index 000000000000..14ddf0b5607f --- /dev/null +++ b/fs/fs_ctl.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2024. Huawei Technologies Co., Ltd */ + +#include <linux/moduleparam.h> +#include <linux/kernel.h> +#include <linux/fs.h> + +#if IS_ENABLED(CONFIG_EROFS_FS) +static int param_set_bool_on_only_once(const char *s, const struct kernel_param *kp) +{ + int ret; + bool value, *res = kp->arg; + + if (!s) + s = "1"; + + ret = strtobool(s, &value); + if (ret) + return ret; + + if (!value && *res) + return -EBUSY; + + if (value && !*res) + WRITE_ONCE(*res, true); + + return 0; +} +#endif + +#if IS_ENABLED(CONFIG_EROFS_FS) +bool erofs_enabled = true; +EXPORT_SYMBOL(erofs_enabled); +module_param_call(erofs_enabled, param_set_bool_on_only_once, param_get_bool, + &erofs_enabled, 0644); +#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index db0ad52d3b48..8e84757981d3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3582,4 +3582,8 @@ static inline void fs_file_read_update_args_by_trace(struct kiocb *iocb) {} static inline void fs_file_read_do_trace(struct kiocb *iocb) {} #endif
+#if IS_ENABLED(CONFIG_EROFS_FS) +extern bool erofs_enabled; +#endif + #endif /* _LINUX_FS_H */
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
To give user space better control over the on-demand loading mode, add a dynamic switch for it.

The switch, named cachefiles_ondemand_enabled, is off by default and gated by CONFIG_CACHEFILES_ONDEMAND. Users can turn it on by echoing 1/on/ON/y/Y to /sys/module/fs_ctl/parameters/cachefiles_ondemand_enabled.
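Condensed from the diff below, the cachefiles side simply refuses to enter on-demand mode at bind time unless the parameter has been switched on:

  if (!strcmp(args, "ondemand")) {
          if (!cachefiles_ondemand_is_enabled()) {
                  pr_err("ondemand mode is disabled\n");
                  return -EINVAL;
          }
          set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
  }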
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/daemon.c |  4 ++++
 fs/erofs/internal.h    |  1 +
 fs/erofs/super.c       | 16 +++++++++++++++-
 fs/fs_ctl.c            |  9 ++++++++-
 include/linux/fs.h     | 13 +++++++++++++
 5 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 59e576951c68..afec4032878c 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -780,6 +780,10 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND)) { if (!strcmp(args, "ondemand")) { + if (!cachefiles_ondemand_is_enabled()) { + pr_err("ondemand mode is disabled\n"); + return -EINVAL; + } set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags); } else if (*args) { pr_err("Invalid argument to the 'bind' command\n"); diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 787cc9ff9029..a53152693ae2 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -176,6 +176,7 @@ struct erofs_sb_info { struct erofs_domain *domain; char *fsid; char *domain_id; + bool ondemand_enabled; };
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 43fc432337c5..db9f4bcb8c78 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -514,12 +514,20 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; #ifdef CONFIG_EROFS_FS_ONDEMAND case Opt_fsid: + if (!sbi->ondemand_enabled) { + errorfc(fc, "fsid option not supported"); + break; + } kfree(sbi->fsid); sbi->fsid = kstrdup(param->string, GFP_KERNEL); if (!sbi->fsid) return -ENOMEM; break; case Opt_domain_id: + if (!sbi->ondemand_enabled) { + errorfc(fc, "domain_id option not supported"); + break; + } kfree(sbi->domain_id); sbi->domain_id = kstrdup(param->string, GFP_KERNEL); if (!sbi->domain_id) @@ -762,11 +770,16 @@ static const struct fs_context_operations erofs_context_ops = { .free = erofs_fc_free, };
+static inline bool erofs_can_init(void) +{ + return READ_ONCE(erofs_enabled) || cachefiles_ondemand_is_enabled(); +} + static int erofs_init_fs_context(struct fs_context *fc) { struct erofs_sb_info *sbi;
- if (!READ_ONCE(erofs_enabled)) + if (!erofs_can_init()) return -EOPNOTSUPP;
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); @@ -778,6 +791,7 @@ static int erofs_init_fs_context(struct fs_context *fc) kfree(sbi); return -ENOMEM; } + sbi->ondemand_enabled = cachefiles_ondemand_is_enabled(); fc->s_fs_info = sbi;
idr_init(&sbi->devs->tree); diff --git a/fs/fs_ctl.c b/fs/fs_ctl.c index 14ddf0b5607f..37f8a7567b96 100644 --- a/fs/fs_ctl.c +++ b/fs/fs_ctl.c @@ -5,7 +5,7 @@ #include <linux/kernel.h> #include <linux/fs.h>
-#if IS_ENABLED(CONFIG_EROFS_FS) +#if IS_ENABLED(CONFIG_EROFS_FS) || IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) static int param_set_bool_on_only_once(const char *s, const struct kernel_param *kp) { int ret; @@ -34,3 +34,10 @@ EXPORT_SYMBOL(erofs_enabled); module_param_call(erofs_enabled, param_set_bool_on_only_once, param_get_bool, &erofs_enabled, 0644); #endif + +#if IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) +bool cachefiles_ondemand_enabled; +EXPORT_SYMBOL(cachefiles_ondemand_enabled); +module_param_call(cachefiles_ondemand_enabled, param_set_bool_on_only_once, param_get_bool, + &cachefiles_ondemand_enabled, 0644); +#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index 8e84757981d3..0efc8bea182d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3586,4 +3586,17 @@ static inline void fs_file_read_do_trace(struct kiocb *iocb) {} extern bool erofs_enabled; #endif
+#if IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) +extern bool cachefiles_ondemand_enabled; +static inline bool cachefiles_ondemand_is_enabled(void) +{ + return READ_ONCE(cachefiles_ondemand_enabled); +} +#else +static inline bool cachefiles_ondemand_is_enabled(void) +{ + return false; +} +#endif + #endif /* _LINUX_FS_H */
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
When an erofs filesystem is mounted in ondemand mode, with the cache root directory bound to cachefiles named "rootdir", the resulting directory structure is as follows, where the different directories correspond to objects at different levels:
         rootdir
        ____|____
       |         |
     cache   graveyard
       |
   domain_dir
       |
  back_data.img
In the current logic, when cull is executed on a cache directory, it first checks whether the inode corresponding to that directory has the S_KERNEL_FILE flag. In on-demand loading mode, when a cache directory or file is created, this flag is set on its inode, indicating that it is "in use". When the cache directory is put or the corresponding object is closed, the flag is cleared, indicating that it is no longer in use and can be culled.

Currently cachefiles_daemon_cull() can operate on any directory or file whose inode does not have the S_KERNEL_FILE flag. We want to reduce the scope of this function: on the one hand, user space needs to add the relevant constraints; on the other hand, the kernel can be tightened as well. This patch adds a filesystem-level restriction by rejecting cull targets that are not on the cache's own mount.
Fixes: 8667d434b2a9 ("cachefiles: Register a miscdev and parse commands over it")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/daemon.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index afec4032878c..81d12106ce7a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -654,6 +654,12 @@ static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args)
 	if (!d_can_lookup(path.dentry))
 		goto notdir;
 
+	/* limit the scope of cull */
+	if (cache->mnt != path.mnt) {
+		path_put(&path);
+		return -EOPNOTSUPP;
+	}
+
 	cachefiles_begin_secure(cache, &saved_cred);
 	ret = cachefiles_cull(cache, path.dentry, args);
 	cachefiles_end_secure(cache, saved_cred);
From: Yang Erkun <yangerkun@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
----------------------------------------
Fscache-based on-demand read is a formal feature that we will support later; remove the EXPERIMENTAL warning.
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/erofs/super.c | 3 ---
 1 file changed, 3 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index db9f4bcb8c78..43e3a8322a6c 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -359,9 +359,6 @@ static int erofs_read_superblock(struct super_block *sb)
 
 	/* handle multiple devices */
 	ret = erofs_scan_devices(sb, dsb);
-
-	if (erofs_is_fscache_mode(sb))
-		erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!");
 out:
 	erofs_put_metabuf(&buf);
 	return ret;
hulk inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Introduce "dir_has_put" in "struct cachefiles_volume" to indicate whether the cache directory has been put. It is used to replace the previous method of determining whether __cachefiles_free_volume() has been called by checking if "vcookie->cache_priv" is NULL.
The __cachefiles_free_volume() function primarily performs cleanup related to "cachefiles_volume". It is therefore more appropriate to set "vcookie->cache_priv" to NULL in cachefiles_free_volume(), which interacts more directly with "fscache_volume".
It also prepares for subsequent patches.
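With the flag in place, the teardown helper becomes idempotent; a condensed view of the new __cachefiles_free_volume() from the diff below:

  static void __cachefiles_free_volume(struct cachefiles_volume *volume)
  {
          int i;

          if (volume->dir_has_put)        /* directories already put */
                  return;
          volume->dir_has_put = true;

          for (i = 0; i < 256; i++)
                  cachefiles_put_directory(volume->fanout[i]);
          cachefiles_put_directory(volume->dentry);
          kfree(volume);
  }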
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/internal.h |  1 +
 fs/cachefiles/volume.c   | 16 +++++++++-------
 2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 111ad6ecd4ba..cb9607d643c7 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -40,6 +40,7 @@ struct cachefiles_volume { struct cachefiles_cache *cache; struct list_head cache_link; /* Link in cache->volumes */ struct fscache_volume *vcookie; /* The netfs's representation */ + bool dir_has_put; /* Indicates cache dir has been put */ struct dentry *dentry; /* The volume dentry */ struct dentry *fanout[256]; /* Fanout subdirs */ }; diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c index 781aac4ef274..aa5467335d85 100644 --- a/fs/cachefiles/volume.c +++ b/fs/cachefiles/volume.c @@ -111,7 +111,10 @@ static void __cachefiles_free_volume(struct cachefiles_volume *volume)
_enter("");
- volume->vcookie->cache_priv = NULL; + if (volume->dir_has_put) + return; + + volume->dir_has_put = true;
for (i = 0; i < 256; i++) cachefiles_put_directory(volume->fanout[i]); @@ -123,12 +126,11 @@ void cachefiles_free_volume(struct fscache_volume *vcookie) { struct cachefiles_volume *volume = vcookie->cache_priv;
- if (volume) { - spin_lock(&volume->cache->object_list_lock); - list_del_init(&volume->cache_link); - spin_unlock(&volume->cache->object_list_lock); - __cachefiles_free_volume(volume); - } + spin_lock(&volume->cache->object_list_lock); + list_del_init(&volume->cache_link); + spin_unlock(&volume->cache->object_list_lock); + vcookie->cache_priv = NULL; + __cachefiles_free_volume(volume); }
void cachefiles_withdraw_volume(struct cachefiles_volume *volume)
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
This patch adds the synchronous waiting mechanism for the volume unhash in erofs ondemand mode. There are 2 main reasons:
1) Currently, a volume is unhashed only when its reference count reaches 0, which partly depends on user space releasing the anonymous fd. That is, if user space fails to close the fd properly, the volume cannot be unhashed. The next time a volume with the same name is mounted, the mount waits for the previous volume to be released while holding the write semaphore of the corresponding superblock. Any task traversing the super_blocks list then hits a hung task because it cannot take the read semaphore:
[mount]                                       [sync]
vfs_get_super                                 ksys_sync
  sget_fc                                       iterate_supers
    alloc_super
      down_write_nested ----pin sb->s_umount
    list_add_tail(&s->s_list, &super_blocks)
  erofs_fc_fill_super                             super_lock
    ...                                             down_read ----hungtask
    fscache_hash_volume ----wait for volume
2) During umount, the object generates the cache directory entry (cachefiles_commit_tmpfile), but this is not synchronized with umount completion. After umount returns, the user may find that the cache directory has not been generated yet, so when inuse() is called from user space the cache directory cannot be found.
The solution:

1) For the erofs on-demand loading scenario, introduce "n_hash_cookies" in "struct fscache_volume". It is increased whenever a child cookie is hashed and decreased when that cookie is unhashed. When the count drops to zero, the waiter is woken up and the volume is unhashed (a sketch of this pairing follows the list of benefits below). The FSCACHE_CACHE_SYNC_VOLUME_UNHASH flag is introduced in "struct fscache_cache" to indicate whether this feature is enabled.

2) cachefiles_free_volume() needs to be called to make the next mount succeed; otherwise -ENODATA would be returned, because cache_priv would not be created for the new volume and the object might not be initialized. To prevent use-after-free issues caused by kfree(volume)/kfree(cache), introduce "ref" and "lock" in "struct cachefiles_volume".
There are three benefits to this:

1) The unhash of the volume no longer depends on whether user space handles the fd correctly. If the fd is not closed after umount, it does not matter if it continues to be used, because object->file is already NULL: cachefiles_clean_up_object() is called before fscache_unhash_cookie().

2) The cache directory is guaranteed to be generated before the umount completes, unless the process is interrupted, because cachefiles_commit_tmpfile() is called before fscache_unhash_cookie().

3) Before this patch, calling inuse() on a cache entry after umount could still return -EBUSY, because the order of umount and cachefiles_put_directory() was indeterminate. Now, thanks to volume->lock, calling cull/inuse after umount has finished will not return an error.
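The wait/wake pairing behind solution point 1) can be sketched as follows (simplified; the helper name fscache_wait_cookies_gone() is made up for illustration, and the real patch gates both sides on FSCACHE_CACHE_SYNC_VOLUME_UNHASH and open-codes the wait in __fscache_relinquish_volume()):

  /* Cookie side: drop the per-volume count when a cookie is unhashed. */
  static void fscache_decrease_volume_hash_cookies(struct fscache_volume *volume)
  {
          if (atomic_dec_and_test(&volume->n_hash_cookies))
                  wake_up_var(&volume->n_hash_cookies);
  }

  /* Volume side: wait (killably) for all hashed cookies to go away. */
  static int fscache_wait_cookies_gone(struct fscache_volume *volume)
  {
          return wait_var_event_killable(&volume->n_hash_cookies,
                          atomic_read(&volume->n_hash_cookies) == 0);
  }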
Fixes: 62ab63352350 ("fscache: Implement volume registration")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/cache.c         |  2 ++
 fs/cachefiles/interface.c     | 12 ++++++++-
 fs/cachefiles/internal.h      |  4 +++
 fs/cachefiles/volume.c        | 51 ++++++++++++++++++++++++++++++-----
 fs/erofs/fscache.c            |  4 +++
 fs/fscache/cache.c            |  1 -
 fs/fscache/cookie.c           | 28 +++++++++++++++++++
 fs/fscache/volume.c           | 38 +++++++++++++++++++++-----
 include/linux/fscache-cache.h | 28 ++++++++++++++++++-
 include/linux/fscache.h       |  2 +-
 10 files changed, 154 insertions(+), 16 deletions(-)
diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c index 9fb06dc16520..1bbe4393d289 100644 --- a/fs/cachefiles/cache.c +++ b/fs/cachefiles/cache.c @@ -367,6 +367,7 @@ static void cachefiles_withdraw_volumes(struct cachefiles_cache *cache) continue; } list_del_init(&volume->cache_link); + cachefiles_get_volume(volume); } spin_unlock(&cache->object_list_lock); if (!volume) @@ -374,6 +375,7 @@ static void cachefiles_withdraw_volumes(struct cachefiles_cache *cache)
cachefiles_withdraw_volume(volume); fscache_put_volume(vcookie, fscache_volume_put_withdraw); + cachefiles_put_volume(volume); }
_leave(""); diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 35ba2117a6f6..c1bb52588d65 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -22,7 +22,7 @@ static struct cachefiles_object *cachefiles_alloc_object(struct fscache_cookie *cookie) { struct fscache_volume *vcookie = cookie->volume; - struct cachefiles_volume *volume = vcookie->cache_priv; + struct cachefiles_volume *volume = READ_ONCE(vcookie->cache_priv); struct cachefiles_object *object;
_enter("{%s},%x,", vcookie->key, cookie->debug_id); @@ -38,6 +38,15 @@ struct cachefiles_object *cachefiles_alloc_object(struct fscache_cookie *cookie)
refcount_set(&object->ref, 1);
+ /* + * After enabling sync_volume_unhash, fscache_relinquish_volume() may + * release cachefiles_volume while fscache_volume->ref is non-zero. + * However, cachefiles_object may still access cachefiles_cache through + * object->volume->cache (e.g., in cachefiles_ondemand_fd_release). + * Therefore, we need to pin cachefiles_volume through the object. + */ + cachefiles_get_volume(volume); + spin_lock_init(&object->lock); INIT_LIST_HEAD(&object->cache_link); object->volume = volume; @@ -96,6 +105,7 @@ void cachefiles_put_object(struct cachefiles_object *object, cachefiles_ondemand_deinit_obj_info(object); cache = object->volume->cache->cache; fscache_put_cookie(object->cookie, fscache_cookie_put_object); + cachefiles_put_volume(object->volume); object->cookie = NULL; kmem_cache_free(cachefiles_object_jar, object); fscache_uncount_object(cache); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index cb9607d643c7..7215e96fc2cf 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -37,9 +37,11 @@ enum cachefiles_content { * Cached volume representation. */ struct cachefiles_volume { + refcount_t ref; struct cachefiles_cache *cache; struct list_head cache_link; /* Link in cache->volumes */ struct fscache_volume *vcookie; /* The netfs's representation */ + struct mutex lock; /* Lock for cachefiles_volume */ bool dir_has_put; /* Indicates cache dir has been put */ struct dentry *dentry; /* The volume dentry */ struct dentry *fanout[256]; /* Fanout subdirs */ @@ -406,6 +408,8 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache, /* * volume.c */ +void cachefiles_get_volume(struct cachefiles_volume *volume); +void cachefiles_put_volume(struct cachefiles_volume *volume); void cachefiles_acquire_volume(struct fscache_volume *volume); void cachefiles_free_volume(struct fscache_volume *volume); void cachefiles_withdraw_volume(struct cachefiles_volume *volume); diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c index aa5467335d85..582dd42192b1 100644 --- a/fs/cachefiles/volume.c +++ b/fs/cachefiles/volume.c @@ -10,6 +10,19 @@ #include "internal.h" #include <trace/events/fscache.h>
+void cachefiles_get_volume(struct cachefiles_volume *volume) +{ + refcount_inc(&volume->ref); +} + +void cachefiles_put_volume(struct cachefiles_volume *volume) +{ + if (refcount_dec_and_test(&volume->ref)) { + mutex_destroy(&volume->lock); + kfree(volume); + } +} + /* * Allocate and set up a volume representation. We make sure all the fanout * directories are created and pinned. @@ -33,6 +46,7 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie) volume->vcookie = vcookie; volume->cache = cache; INIT_LIST_HEAD(&volume->cache_link); + mutex_init(&volume->lock);
cachefiles_begin_secure(cache, &saved_cred);
@@ -77,7 +91,17 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
cachefiles_end_secure(cache, saved_cred);
- vcookie->cache_priv = volume; + /* + * The purpose of introducing volume->ref is twofold: + * 1) To allow cachefiles_object to pin cachefiles_volume. + * 2) To handle the concurrency between cachefiles_free_volume() and + * cachefiles_withdraw_volume() introduced by enabling sync_unhash + * volume, preventing the former from releasing cachefiles_volume and + * causing a use-after-free in the latter. + */ + refcount_set(&volume->ref, 1); + /* Prevent writing vcookie->cache_priv before writing volume->ref. */ + smp_store_release(&vcookie->cache_priv, volume); n_accesses = atomic_inc_return(&vcookie->n_accesses); /* Stop wakeups on dec-to-0 */ trace_fscache_access_volume(vcookie->debug_id, 0, refcount_read(&vcookie->ref), @@ -98,6 +122,7 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie) error_name: kfree(name); error_vol: + mutex_destroy(&volume->lock); kfree(volume); cachefiles_end_secure(cache, saved_cred); } @@ -111,26 +136,40 @@ static void __cachefiles_free_volume(struct cachefiles_volume *volume)
_enter("");
- if (volume->dir_has_put) + mutex_lock(&volume->lock); + if (volume->dir_has_put) { + mutex_unlock(&volume->lock); return; + }
volume->dir_has_put = true;
for (i = 0; i < 256; i++) cachefiles_put_directory(volume->fanout[i]); cachefiles_put_directory(volume->dentry); - kfree(volume); + mutex_unlock(&volume->lock); }
void cachefiles_free_volume(struct fscache_volume *vcookie) { struct cachefiles_volume *volume = vcookie->cache_priv;
- spin_lock(&volume->cache->object_list_lock); - list_del_init(&volume->cache_link); - spin_unlock(&volume->cache->object_list_lock); + /* + * Prevents access to the cachefiles_cache that has been freed caused + * by the concurrency between cachefiles_free_volume() and + * cachefiles_daemon_release(), the later may kree(cache). + */ + mutex_lock(&volume->lock); + if (!volume->dir_has_put) { + spin_lock(&volume->cache->object_list_lock); + list_del_init(&volume->cache_link); + spin_unlock(&volume->cache->object_list_lock); + } + mutex_unlock(&volume->lock); + vcookie->cache_priv = NULL; __cachefiles_free_volume(volume); + cachefiles_put_volume(volume); }
void cachefiles_withdraw_volume(struct cachefiles_volume *volume) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index afc37c9029ce..7bcd1d261e7d 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -5,6 +5,7 @@ */ #include <linux/pseudo_fs.h> #include <linux/fscache.h> +#include <linux/fscache-cache.h> #include "internal.h"
static DEFINE_MUTEX(erofs_domain_list_lock); @@ -366,6 +367,9 @@ static int erofs_fscache_register_volume(struct super_block *sb) erofs_err(sb, "failed to register volume for %s", name); ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP; volume = NULL; + } else { + /* enable synchronous unhashing for the associated volumes */ + fscache_set_sync_volume_unhash(volume->cache); }
sbi->volume = volume; diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c index 9397ed39b0b4..2ff7ea90f13f 100644 --- a/fs/fscache/cache.c +++ b/fs/fscache/cache.c @@ -213,7 +213,6 @@ void fscache_relinquish_cache(struct fscache_cache *cache) fscache_cache_put_prep_failed : fscache_cache_put_relinquish;
- cache->ops = NULL; cache->cache_priv = NULL; fscache_set_cache_state(cache, FSCACHE_CACHE_IS_NOT_PRESENT); fscache_put_cache(cache, where); diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index d4d4b3a8b106..a42985d5033c 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -319,6 +319,26 @@ static bool fscache_cookie_same(const struct fscache_cookie *a,
static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
+/* + * Increase the volume->n_hash_cookies count. Must be called when a cookie + * associated with the volume is added on the hash link. + */ +static void fscache_increase_volume_hash_cookies(struct fscache_volume *volume) +{ + atomic_inc(&volume->n_hash_cookies); +} + +/* + * Decrease the volume->n_hash_cookies count. Must be called when a cookie + * associated with the volume is unhashed. When the count reaches 0, the volume + * waiting to be unhash needs to be awakened. + */ +static void fscache_decrease_volume_hash_cookies(struct fscache_volume *volume) +{ + if (atomic_dec_and_test(&volume->n_hash_cookies)) + wake_up_var(&volume->n_hash_cookies); +} + /* * Allocate a cookie. */ @@ -425,6 +445,9 @@ static bool fscache_hash_cookie(struct fscache_cookie *candidate) set_bit(FSCACHE_COOKIE_IS_HASHED, &candidate->flags); hlist_bl_unlock(h);
+ if (fscache_test_sync_volume_unhash(candidate->volume->cache)) + fscache_increase_volume_hash_cookies(candidate->volume); + if (wait_for) { fscache_wait_on_collision(candidate, wait_for); fscache_put_cookie(wait_for, fscache_cookie_put_hash_collision); @@ -933,6 +956,7 @@ static void fscache_cookie_drop_from_lru(struct fscache_cookie *cookie) static void fscache_unhash_cookie(struct fscache_cookie *cookie) { struct hlist_bl_head *h; + struct fscache_volume *volume = cookie->volume; unsigned int bucket;
bucket = cookie->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1); @@ -942,6 +966,10 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie) hlist_bl_del(&cookie->hash_link); clear_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags); hlist_bl_unlock(h); + + if (fscache_test_sync_volume_unhash(volume->cache)) + fscache_decrease_volume_hash_cookies(volume); + fscache_stat(&fscache_n_relinquishes_dropped); }
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c index 89efb8071a75..4b371d8fafed 100644 --- a/fs/fscache/volume.c +++ b/fs/fscache/volume.c @@ -389,27 +389,35 @@ static void fscache_unhash_volume(struct fscache_volume *volume) h = &fscache_volume_hash[bucket];
hlist_bl_lock(h); - hlist_bl_del(&volume->hash_link); + hlist_bl_del_init(&volume->hash_link); if (test_bit(FSCACHE_VOLUME_COLLIDED_WITH, &volume->flags)) fscache_wake_pending_volume(volume, h); hlist_bl_unlock(h); }
/* - * Drop a cache's volume attachments. + * Clear the volume->cache_priv. */ -static void fscache_free_volume(struct fscache_volume *volume) +static void fscache_clear_volume_priv(struct fscache_volume *volume) { - struct fscache_cache *cache = volume->cache; - if (volume->cache_priv) { __fscache_begin_volume_access(volume, NULL, fscache_access_relinquish_volume); if (volume->cache_priv) - cache->ops->free_volume(volume); + volume->cache->ops->free_volume(volume); fscache_end_volume_access(volume, NULL, fscache_access_relinquish_volume_end); } +} + +/* + * Drop a cache's volume attachments. + */ +static void fscache_free_volume(struct fscache_volume *volume) +{ + struct fscache_cache *cache = volume->cache; + + fscache_clear_volume_priv(volume);
down_write(&fscache_addremove_sem); list_del_init(&volume->proc_link); @@ -461,6 +469,24 @@ void __fscache_relinquish_volume(struct fscache_volume *volume, memcpy(volume->coherency, coherency_data, volume->coherency_len); }
+ if (fscache_test_sync_volume_unhash(volume->cache)) { + int ret = wait_var_event_killable(&volume->n_hash_cookies, + (atomic_read_acquire(&volume->n_hash_cookies) == 0)); + if (ret == -ERESTARTSYS) { + pr_info("Waiting for unhashing has been interrupted!" \ + "(n_hash_cookies %d)\n", + atomic_read(&volume->n_hash_cookies)); + } else if (!hlist_bl_unhashed(&volume->hash_link)) { + /* + * To ensure that the corresponding cache entries can + * be created on the next mount, thereby completing + * the mount successfully. + */ + fscache_clear_volume_priv(volume); + fscache_unhash_volume(volume); + } + } + fscache_put_volume(volume, fscache_volume_put_relinquish); } EXPORT_SYMBOL(__fscache_relinquish_volume); diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 151043af41a1..642fc1276c63 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -44,8 +44,9 @@ struct fscache_cache { unsigned int debug_id; enum fscache_cache_state state; char *name; + KABI_USE(1, unsigned long flags) +#define FSCACHE_CACHE_SYNC_VOLUME_UNHASH 0 /* Mark to enable volume sync unhash */
- KABI_RESERVE(1) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) @@ -199,6 +200,31 @@ static inline void fscache_wait_for_objects(struct fscache_cache *cache) atomic_read(&cache->object_count) == 0); }
+/** + * fscache_set_sync_volume_unhash - Enable the synchronization unhash mechanism + * @cache: The cache to set + * + * Enable the volume synchronization wait-unhash mechanism. The volume will be + * unhashed in advance, without relying on the reference count reaching zero. + */ +static inline void fscache_set_sync_volume_unhash(struct fscache_cache *cache) +{ + set_bit(FSCACHE_CACHE_SYNC_VOLUME_UNHASH, &cache->flags); +} + +/** + * fscache_test_sync_volume_unhash - Check whether the synchronization + * unhash mechanism is enabled + * @cache: The cache to query + * + * Indicate whether the volume synchronization wait-unhash mechanism is + * currently enabled. + */ +static inline bool fscache_test_sync_volume_unhash(struct fscache_cache *cache) +{ + return test_bit(FSCACHE_CACHE_SYNC_VOLUME_UNHASH, &cache->flags); +} + #ifdef CONFIG_FSCACHE_STATS extern atomic_t fscache_n_read; extern atomic_t fscache_n_write; diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 3afe007834d1..99cb06ee3f65 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -90,7 +90,7 @@ struct fscache_volume { #define FSCACHE_VOLUME_CREATING 4 /* Volume is being created on disk */ u8 coherency_len; /* Length of the coherency data */
- KABI_RESERVE(1) + KABI_USE(1, atomic_t n_hash_cookies) /* Number of hashed data cookies in volume */ KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4)
hulk inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the previous patch, we introduced the volume->lock mutex to prevent concurrency between cachefiles_free_volume() and cachefiles_withdraw_volume().

With that in place, fscache_clear_volume_priv() no longer needs to take fscache_volume->n_accesses to guard against concurrency. Remove the related code.
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/fscache/volume.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c index 4b371d8fafed..2f8e7bd8ab1c 100644 --- a/fs/fscache/volume.c +++ b/fs/fscache/volume.c @@ -400,14 +400,8 @@ static void fscache_unhash_volume(struct fscache_volume *volume) */ static void fscache_clear_volume_priv(struct fscache_volume *volume) { - if (volume->cache_priv) { - __fscache_begin_volume_access(volume, NULL, - fscache_access_relinquish_volume); - if (volume->cache_priv) - volume->cache->ops->free_volume(volume); - fscache_end_volume_access(volume, NULL, - fscache_access_relinquish_volume_end); - } + if (volume->cache_priv) + volume->cache->ops->free_volume(volume); }
/*
From: Baokun Li <libaokun1@huawei.com>
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
Make cachefiles support buffered I/O when reading from and writing to backing files, which reduces the latency of concurrent reads when there is a local cache. This can be useful in container batch-recovery scenarios.
arm64 nvme:
ext4 BIO no cache 1399MiB/s, 8t 11.5GiB/s, with cache 1t 1880MiB/s, 8t 12.7GiB/s.
thread|type | cache miss | local cache | erofs pagecache |
------+-----+------------+-------------+-----------------|
  1t  | DIO | 243 MiB/s  | 898 MiB/s   | 1907 MiB/s      |
  1t  | BIO | 243 MiB/s  | 1147 MiB/s  | 1894 MiB/s      | +27%
------+-----+------------+-------------+-----------------|
  8t  | DIO | 1950 MiB/s | 5155 MiB/s  | 12.8 GiB/s      |
  8t  | BIO | 1951 MiB/s | 11.2 GiB/s  | 12.6 GiB/s      | +122%
x86 HDD:
ext4 BIO no cache 494MiB/s, 8t 3949MiB/s, with cache 1t 1066MiB/s, 8t 8669MiB/s.

thread|type | cache miss | local cache | erofs pagecache |
------+-----+------------+-------------+-----------------|
  1t  | DIO | 228 MiB/s  | 493 MiB/s   | 1133 MiB/s      |
  1t  | BIO | 224 MiB/s  | 858 MiB/s   | 1133 MiB/s      | +74%
------+-----+------------+-------------+-----------------|
  8t  | DIO | 1885 MiB/s | 3893 MiB/s  | 8357 MiB/s      |
  8t  | BIO | 1898 MiB/s | 7948 MiB/s  | 8357 MiB/s      | +104%
The controls are the same as in earlier versions and affect only the on-demand mode.
Enable:
  modprobe cachefiles buffered_ondemand=1
  echo 1 > /sys/module/cachefiles/parameters/buffered_ondemand

Disable:
  modprobe cachefiles buffered_ondemand=0
  echo 0 > /sys/module/cachefiles/parameters/buffered_ondemand
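Internally the knob just clears IOCB_DIRECT on the backing-file kiocb when the cache is in on-demand mode, as the diff below does on both the read and write paths:

  if (cachefiles_in_ondemand_mode(cache) && cachefiles_buffered_ondemand)
          ki->iocb.ki_flags &= ~IOCB_DIRECT;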
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/cachefiles/io.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index 009d23cd435b..e55c7cc6a997 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -12,8 +12,12 @@ #include <linux/falloc.h> #include <linux/sched/mm.h> #include <trace/events/fscache.h> +#include <linux/module.h> #include "internal.h"
+static bool cachefiles_buffered_ondemand = true; +module_param_named(buffered_ondemand, cachefiles_buffered_ondemand, bool, 0644); + struct cachefiles_kiocb { struct kiocb iocb; refcount_t ki_refcnt; @@ -78,6 +82,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres, void *term_func_priv) { struct cachefiles_object *object; + struct cachefiles_cache *cache; struct cachefiles_kiocb *ki; struct file *file; unsigned int old_nofs; @@ -89,6 +94,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
fscache_count_read(); object = cachefiles_cres_object(cres); + cache = object->volume->cache; file = cachefiles_cres_file(cres);
_enter("%pD,%li,%llx,%zx/%llx", @@ -146,6 +152,9 @@ static int cachefiles_read(struct netfs_cache_resources *cres, ki->term_func_priv = term_func_priv; ki->was_async = true;
+ if (cachefiles_in_ondemand_mode(cache) && cachefiles_buffered_ondemand) + ki->iocb.ki_flags &= ~IOCB_DIRECT; + if (ki->term_func) ki->iocb.ki_complete = cachefiles_read_complete;
@@ -315,6 +324,9 @@ int __cachefiles_write(struct cachefiles_object *object, ki->was_async = true; ki->b_writing = (len + (1 << cache->bshift) - 1) >> cache->bshift;
+ if (cachefiles_in_ondemand_mode(cache) && cachefiles_buffered_ondemand) + ki->iocb.ki_flags &= ~IOCB_DIRECT; + if (ki->term_func) ki->iocb.ki_complete = cachefiles_write_complete; atomic_long_add(ki->b_writing, &cache->b_writing);
mainline inclusion
from mainline-v6.13-rc1
commit 31ad74b20227ce6b40910ff78b1c604e42975cf1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
At present, object->file is subject to a NULL pointer dereference in ondemand mode. The root cause is that the lifetimes of the allocated anonymous fd and of object->file are inconsistent, and the user-space operations on the anonymous fd use object->file. The following is the process that triggers the issue:
[write fd]                          [umount]
cachefiles_ondemand_fd_write_iter
                                    fscache_cookie_state_machine
                                      cachefiles_withdraw_cookie
  if (!file) return -ENOBUFS
                                        cachefiles_clean_up_object
                                          cachefiles_unmark_inode_in_use
                                          fput(object->file)
                                          object->file = NULL
  // file NULL pointer dereference!
  __cachefiles_write(..., file, ...)
Fix this issue by taking an additional reference to object->file before the write/llseek, and dropping it after the operation finishes.
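The pattern applied to both the write and llseek paths is condensed below: look up object->file under object->lock, pin it with get_file(), and drop the reference once the operation completes:

  spin_lock(&object->lock);
  file = object->file;
  if (!file) {
          spin_unlock(&object->lock);
          return -ENOBUFS;
  }
  get_file(file);         /* pin the file against concurrent withdrawal */
  spin_unlock(&object->lock);

  /* ... __cachefiles_write() or vfs_llseek() on "file" ... */

  fput(file);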
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Conflicts:
	fs/cachefiles/ondemand.c
[Context conflict for __cachefiles_prepare_write parameters, not related to this patch.]
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-5-wozizhi@huawei.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/cachefiles/interface.c | 14 ++++++++++----
 fs/cachefiles/ondemand.c  | 31 ++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 11 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index c1bb52588d65..4d44e74c22fd 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -337,6 +337,8 @@ static void cachefiles_commit_object(struct cachefiles_object *object, static void cachefiles_clean_up_object(struct cachefiles_object *object, struct cachefiles_cache *cache) { + struct file *file; + if (test_bit(FSCACHE_COOKIE_RETIRED, &object->cookie->flags)) { if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) { cachefiles_see_object(object, cachefiles_obj_see_clean_delete); @@ -352,10 +354,14 @@ static void cachefiles_clean_up_object(struct cachefiles_object *object, }
cachefiles_unmark_inode_in_use(object, object->file); - if (object->file) { - fput(object->file); - object->file = NULL; - } + + spin_lock(&object->lock); + file = object->file; + object->file = NULL; + spin_unlock(&object->lock); + + if (file) + fput(file); }
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 2185e2908dba..29220a72be9d 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -61,26 +61,33 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb, { struct cachefiles_object *object = kiocb->ki_filp->private_data; struct cachefiles_cache *cache = object->volume->cache; - struct file *file = object->file; + struct file *file; size_t len = iter->count; loff_t pos = kiocb->ki_pos; const struct cred *saved_cred; int ret;
- if (!file) + spin_lock(&object->lock); + file = object->file; + if (!file) { + spin_unlock(&object->lock); return -ENOBUFS; + } + get_file(file); + spin_unlock(&object->lock);
cachefiles_begin_secure(cache, &saved_cred); ret = __cachefiles_prepare_write(object, file, &pos, &len, true); cachefiles_end_secure(cache, saved_cred); if (ret < 0) - return ret; + goto out;
trace_cachefiles_ondemand_fd_write(object, file_inode(file), pos, len); ret = __cachefiles_write(object, file, pos, iter, NULL, NULL); if (!ret) ret = len; - +out: + fput(file); return ret; }
@@ -88,12 +95,22 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, int whence) { struct cachefiles_object *object = filp->private_data; - struct file *file = object->file; + struct file *file; + loff_t ret;
- if (!file) + spin_lock(&object->lock); + file = object->file; + if (!file) { + spin_unlock(&object->lock); return -ENOBUFS; + } + get_file(file); + spin_unlock(&object->lock);
- return vfs_llseek(file, pos, whence); + ret = vfs_llseek(file, pos, whence); + fput(file); + + return ret; }
static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
mainline inclusion
from mainline-v6.13-rc1
commit 09ecf8f5505465b5527a39dff4b159af62306eee
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently, cachefiles_commit_tmpfile() is only called if CACHEFILES_OBJECT_USING_TMPFILE is set in object->flags. Only cachefiles_create_file() and cachefiles_invalidate_cookie() set this flag; both replace object->file with the new tmpfile, and both are called from fscache_cookie_state_machine(), so there are no concurrency issues.

Under these conditions, the check "d_backing_inode(dentry) == file_inode(object->file)" in cachefiles_commit_tmpfile() can never be true. This patch removes the redundant code and involves no other logical changes.
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-4-wozizhi@huawei.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/cachefiles/namei.c | 5 -----
 1 file changed, 5 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 594e41582ae9..6de170514831 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -690,11 +690,6 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
 	}
 
 	if (!d_is_negative(dentry)) {
-		if (d_backing_inode(dentry) == file_inode(object->file)) {
-			success = true;
-			goto out_dput;
-		}
-
 		ret = cachefiles_unlink(volume->cache, object, fan, dentry,
 					FSCACHE_OBJECT_IS_STALE);
 		if (ret < 0)
mainline inclusion
from mainline-v6.13-rc1
commit 10c35abd35aa62c9aac56898ae0c63b4d7d115e5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The cachefiles_ondemand_fd_write_iter() function first aligns "pos" and "len" to block boundaries. When calling __cachefiles_write(), the aligned "pos" is passed in, but "len" is the original unaligned value (iter->count). In addition, the returned length of the write operation is the "len" that was modified to the block-aligned size, which is unreasonable.

The alignment of "pos" and "len" is intended only to check whether the cache has enough space. The modified "len" should not be used as the return value of cachefiles_ondemand_fd_write_iter(), because the length passed to __cachefiles_write() is the original "len"; doing so results in a mismatch in the data written on demand. For example, if the length passed in from user space is not aligned to the block size (the preread scenario, DIO writes that only require 512-byte alignment, or fault injection), the length actually written will differ from the length returned.

To solve this issue, since __cachefiles_prepare_write() modifies the size of "len", pass a separate "aligned_len" to __cachefiles_prepare_write() to calculate the free blocks, and use the original "len" as the return value of cachefiles_ondemand_fd_write_iter().
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Conflicts:
	fs/cachefiles/ondemand.c
[The number of parameters in function __cachefiles_prepare_write is different, therefore, the patch has been adapted.]
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-2-wozizhi@huawei.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/cachefiles/ondemand.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 29220a72be9d..086a056cf7ef 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -62,7 +62,7 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
 	struct cachefiles_object *object = kiocb->ki_filp->private_data;
 	struct cachefiles_cache *cache = object->volume->cache;
 	struct file *file;
-	size_t len = iter->count;
+	size_t len = iter->count, aligned_len = len;
 	loff_t pos = kiocb->ki_pos;
 	const struct cred *saved_cred;
 	int ret;
@@ -77,7 +77,7 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
 	spin_unlock(&object->lock);
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+	ret = __cachefiles_prepare_write(object, file, &pos, &aligned_len, true);
 	cachefiles_end_secure(cache, saved_cred);
 	if (ret < 0)
 		goto out;
mainline inclusion
from mainline-v6.13-rc1
commit 56f4856b425a30e1d8b3e41e6cde8bfba90ba5f8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In the erofs on-demand loading scenario, read and write operations are usually issued from user space using the "off" and "len" carried in the read request, so pwrite() is naturally used to write at a specific offset.

However, if the write() syscall (not pwrite()) is called multiple times in the read-ahead scenario, ki_pos must be updated manually after each write so that file->f_pos advances.

This step is currently missing from cachefiles_ondemand_fd_write_iter(); add it to address the issue.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Conflicts:
	fs/cachefiles/ondemand.c
[Context conflict for object->file fix, not related to this patch.]
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://lore.kernel.org/r/20241107110649.3980193-3-wozizhi@huawei.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/cachefiles/ondemand.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 086a056cf7ef..6a0921f31a8a 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -84,8 +84,10 @@ static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
 
 	trace_cachefiles_ondemand_fd_write(object, file_inode(file), pos, len);
 	ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
-	if (!ret)
+	if (!ret) {
 		ret = len;
+		kiocb->ki_pos += ret;
+	}
 out:
 	fput(file);
 	return ret;
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/14206
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/X...