[PATCH OLK-5.10 0/7] kernfs: Fix UAF issue
Fix UAF issue. Imran Khan (1): kernfs: remove redundant kernfs_rwsem declaration. Minchan Kim (3): kernfs: switch global kernfs_rwsem lock to per-fs lock kernfs: prevent early freeing of root node kernfs: fix NULL dereferencing in kernfs_remove Sebastian Andrzej Siewior (1): kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir(). Yushan Zhou (1): kernfs: fix potential NULL dereference in __kernfs_remove Zizhi Wo (1): kernfs: Fix kabi broken in struct kernfs_root fs/kernfs/dir.c | 132 +++++++++++++++++++++++------------- fs/kernfs/file.c | 6 +- fs/kernfs/inode.c | 22 +++--- fs/kernfs/kernfs-internal.h | 1 - fs/kernfs/mount.c | 15 ++-- fs/kernfs/symlink.c | 5 +- include/linux/kernfs.h | 2 + 7 files changed, 115 insertions(+), 68 deletions(-) -- 2.39.2
From: Minchan Kim <minchan@kernel.org> mainline inclusion from mainline-v5.17-rc1 commit 393c3714081a53795bbff0e985d24146def6f57f category: perf bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The kernfs implementation has big lock granularity(kernfs_rwsem) so every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock. It makes trouble for some cases to wait the global lock for a long time even though they are totally independent contexts each other. A general example is process A goes under direct reclaim with holding the lock when it accessed the file in sysfs and process B is waiting the lock with exclusive mode and then process C is waiting the lock until process B could finish the job after it gets the lock from process A. This patch switches the global kernfs_rwsem to per-fs lock, which put the rwsem into kernfs_root. Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: fs/kernfs/dir.c fs/kernfs/inode.c [1. d3380369dc0c ("kernfs: Separate kernfs_pr_cont_buf and rename_lock.") has merged, which leads to some conflicts and not related to this patch. 2. 9c14d1f43ef1 ("kernfs: fix use-after-free in __kernfs_remove") has merged, which leads to some conflicts in kernfs_remove_by_name_ns(), not related to this patch. 3. e65ce2a50cf6 ("acl: handle idmapped mounts") not merged. This has led to parameter conflicts in some functions related to "user_namespace", which is not related to this patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/dir.c | 110 ++++++++++++++++++++++++----------------- fs/kernfs/file.c | 6 ++- fs/kernfs/inode.c | 22 ++++++--- fs/kernfs/mount.c | 15 +++--- fs/kernfs/symlink.c | 5 +- include/linux/kernfs.h | 2 + 6 files changed, 97 insertions(+), 63 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 1ea4763824c7..08cd8caa2754 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -17,7 +17,6 @@ #include "kernfs-internal.h" -DECLARE_RWSEM(kernfs_rwsem); static DEFINE_SPINLOCK(kernfs_rename_lock); /* kn->parent and ->name */ /* * Don't use rename_lock to piggy back on pr_cont_buf. We don't want to @@ -34,7 +33,7 @@ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */ static bool kernfs_active(struct kernfs_node *kn) { - lockdep_assert_held(&kernfs_rwsem); + lockdep_assert_held(&kernfs_root(kn)->kernfs_rwsem); return atomic_read(&kn->active) >= 0; } @@ -465,14 +464,15 @@ void kernfs_put_active(struct kernfs_node *kn) * return after draining is complete. */ static void kernfs_drain(struct kernfs_node *kn) - __releases(&kernfs_rwsem) __acquires(&kernfs_rwsem) + __releases(&kernfs_root(kn)->kernfs_rwsem) + __acquires(&kernfs_root(kn)->kernfs_rwsem) { struct kernfs_root *root = kernfs_root(kn); - lockdep_assert_held_write(&kernfs_rwsem); + lockdep_assert_held_write(&root->kernfs_rwsem); WARN_ON_ONCE(kernfs_active(kn)); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); if (kernfs_lockdep(kn)) { rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_); @@ -491,7 +491,7 @@ static void kernfs_drain(struct kernfs_node *kn) kernfs_drain_open_files(kn); - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); } /** @@ -561,6 +561,7 @@ EXPORT_SYMBOL_GPL(kernfs_put); static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) { struct kernfs_node *kn; + struct kernfs_root *root; if (flags & LOOKUP_RCU) return -ECHILD; @@ -572,18 +573,19 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) /* If the kernfs parent node has changed discard and * proceed to ->lookup. */ - down_read(&kernfs_rwsem); spin_lock(&dentry->d_lock); parent = kernfs_dentry_node(dentry->d_parent); if (parent) { + spin_unlock(&dentry->d_lock); + root = kernfs_root(parent); + down_read(&root->kernfs_rwsem); if (kernfs_dir_changed(parent, dentry)) { - spin_unlock(&dentry->d_lock); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return 0; } - } - spin_unlock(&dentry->d_lock); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); + } else + spin_unlock(&dentry->d_lock); /* The kernfs parent node hasn't changed, leave the * dentry negative and return success. @@ -592,7 +594,8 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) } kn = kernfs_dentry_node(dentry); - down_read(&kernfs_rwsem); + root = kernfs_root(kn); + down_read(&root->kernfs_rwsem); /* The kernfs node has been deactivated */ if (!kernfs_active(kn)) @@ -611,10 +614,10 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) kernfs_info(dentry->d_sb)->ns != kn->ns) goto out_bad; - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return 1; out_bad: - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return 0; } @@ -805,11 +808,12 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root, int kernfs_add_one(struct kernfs_node *kn) { struct kernfs_node *parent = kn->parent; + struct kernfs_root *root = kernfs_root(parent); struct kernfs_iattrs *ps_iattr; bool has_ns; int ret; - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); ret = -EINVAL; has_ns = kernfs_ns_enabled(parent); @@ -840,7 +844,7 @@ int kernfs_add_one(struct kernfs_node *kn) ps_iattr->ia_mtime = ps_iattr->ia_ctime; } - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); /* * Activate the new node unless CREATE_DEACTIVATED is requested. @@ -854,7 +858,7 @@ int kernfs_add_one(struct kernfs_node *kn) return 0; out_unlock: - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); return ret; } @@ -875,7 +879,7 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent, bool has_ns = kernfs_ns_enabled(parent); unsigned int hash; - lockdep_assert_held(&kernfs_rwsem); + lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem); if (has_ns != (bool)ns) { WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n", @@ -907,7 +911,7 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent, size_t len; char *p, *name; - lockdep_assert_held_read(&kernfs_rwsem); + lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem); spin_lock_irq(&kernfs_pr_cont_lock); @@ -945,11 +949,12 @@ struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent, const char *name, const void *ns) { struct kernfs_node *kn; + struct kernfs_root *root = kernfs_root(parent); - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); kn = kernfs_find_ns(parent, name, ns); kernfs_get(kn); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return kn; } @@ -969,11 +974,12 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent, const char *path, const void *ns) { struct kernfs_node *kn; + struct kernfs_root *root = kernfs_root(parent); - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); kn = kernfs_walk_ns(parent, path, ns); kernfs_get(kn); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return kn; } @@ -998,6 +1004,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops, return ERR_PTR(-ENOMEM); idr_init(&root->ino_idr); + init_rwsem(&root->kernfs_rwsem); INIT_LIST_HEAD(&root->supers); /* @@ -1124,10 +1131,12 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, { struct kernfs_node *parent = dir->i_private; struct kernfs_node *kn; + struct kernfs_root *root; struct inode *inode = NULL; const void *ns = NULL; - down_read(&kernfs_rwsem); + root = kernfs_root(parent); + down_read(&root->kernfs_rwsem); if (kernfs_ns_enabled(parent)) ns = kernfs_info(dir->i_sb)->ns; @@ -1138,7 +1147,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, * create a negative. */ if (!kernfs_active(kn)) { - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return NULL; } inode = kernfs_get_inode(dir->i_sb, kn); @@ -1153,7 +1162,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, */ if (!IS_ERR(inode)) kernfs_set_rev(parent, dentry); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); /* instantiate and hash (possibly negative) dentry */ return d_splice_alias(inode, dentry); @@ -1274,7 +1283,7 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos, { struct rb_node *rbn; - lockdep_assert_held_write(&kernfs_rwsem); + lockdep_assert_held_write(&kernfs_root(root)->kernfs_rwsem); /* if first iteration, visit leftmost descendant which may be root */ if (!pos) @@ -1309,8 +1318,9 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos, void kernfs_activate(struct kernfs_node *kn) { struct kernfs_node *pos; + struct kernfs_root *root = kernfs_root(kn); - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); pos = NULL; while ((pos = kernfs_next_descendant_post(pos, kn))) { @@ -1324,14 +1334,14 @@ void kernfs_activate(struct kernfs_node *kn) pos->flags |= KERNFS_ACTIVATED; } - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); } static void __kernfs_remove(struct kernfs_node *kn) { struct kernfs_node *pos; - lockdep_assert_held_write(&kernfs_rwsem); + lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem); /* * Short-circuit if non-root @kn has already finished removal. @@ -1401,9 +1411,11 @@ static void __kernfs_remove(struct kernfs_node *kn) */ void kernfs_remove(struct kernfs_node *kn) { - down_write(&kernfs_rwsem); + struct kernfs_root *root = kernfs_root(kn); + + down_write(&root->kernfs_rwsem); __kernfs_remove(kn); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); } /** @@ -1489,8 +1501,9 @@ void kernfs_unbreak_active_protection(struct kernfs_node *kn) bool kernfs_remove_self(struct kernfs_node *kn) { bool ret; + struct kernfs_root *root = kernfs_root(kn); - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); kernfs_break_active_protection(kn); /* @@ -1518,9 +1531,9 @@ bool kernfs_remove_self(struct kernfs_node *kn) atomic_read(&kn->active) == KN_DEACTIVATED_BIAS) break; - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); schedule(); - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); } finish_wait(waitq, &wait); WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb)); @@ -1533,7 +1546,7 @@ bool kernfs_remove_self(struct kernfs_node *kn) */ kernfs_unbreak_active_protection(kn); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); return ret; } @@ -1550,6 +1563,7 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name, const void *ns) { struct kernfs_node *kn; + struct kernfs_root *root; if (!parent) { WARN(1, KERN_WARNING "kernfs: can not remove '%s', no directory\n", @@ -1557,7 +1571,8 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name, return -ENOENT; } - down_write(&kernfs_rwsem); + root = kernfs_root(parent); + down_write(&root->kernfs_rwsem); kn = kernfs_find_ns(parent, name, ns); if (kn) { @@ -1566,7 +1581,7 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name, kernfs_put(kn); } - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); if (kn) return 0; @@ -1585,6 +1600,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name, const void *new_ns) { struct kernfs_node *old_parent; + struct kernfs_root *root; const char *old_name = NULL; int error; @@ -1592,7 +1608,8 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent, if (!kn->parent) return -EINVAL; - down_write(&kernfs_rwsem); + root = kernfs_root(kn); + down_write(&root->kernfs_rwsem); error = -ENOENT; if (!kernfs_active(kn) || !kernfs_active(new_parent) || @@ -1646,7 +1663,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent, error = 0; out: - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); return error; } @@ -1717,11 +1734,14 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx) struct dentry *dentry = file->f_path.dentry; struct kernfs_node *parent = kernfs_dentry_node(dentry); struct kernfs_node *pos = file->private_data; + struct kernfs_root *root; const void *ns = NULL; if (!dir_emit_dots(file, ctx)) return 0; - down_read(&kernfs_rwsem); + + root = kernfs_root(parent); + down_read(&root->kernfs_rwsem); if (kernfs_ns_enabled(parent)) ns = kernfs_info(dentry->d_sb)->ns; @@ -1738,12 +1758,12 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx) file->private_data = pos; kernfs_get(pos); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); if (!dir_emit(ctx, name, len, ino, type)) return 0; - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); } - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); file->private_data = NULL; ctx->pos = INT_MAX; return 0; diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 60e2a86c535e..9414a7a60a9f 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -847,6 +847,7 @@ static void kernfs_notify_workfn(struct work_struct *work) { struct kernfs_node *kn; struct kernfs_super_info *info; + struct kernfs_root *root; repeat: /* pop one off the notify_list */ spin_lock_irq(&kernfs_notify_lock); @@ -859,8 +860,9 @@ static void kernfs_notify_workfn(struct work_struct *work) kn->attr.notify_next = NULL; spin_unlock_irq(&kernfs_notify_lock); + root = kernfs_root(kn); /* kick fsnotify */ - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); list_for_each_entry(info, &kernfs_root(kn)->supers, node) { struct kernfs_node *parent; @@ -898,7 +900,7 @@ static void kernfs_notify_workfn(struct work_struct *work) iput(inode); } - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); kernfs_put(kn); goto repeat; } diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c index 702385efddea..fb0ea9c247e9 100644 --- a/fs/kernfs/inode.c +++ b/fs/kernfs/inode.c @@ -105,10 +105,11 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr) int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr) { int ret; + struct kernfs_root *root = kernfs_root(kn); - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); ret = __kernfs_setattr(kn, iattr); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); return ret; } @@ -116,12 +117,14 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr) { struct inode *inode = d_inode(dentry); struct kernfs_node *kn = inode->i_private; + struct kernfs_root *root; int error; if (!kn) return -EINVAL; - down_write(&kernfs_rwsem); + root = kernfs_root(kn); + down_write(&root->kernfs_rwsem); error = setattr_prepare(dentry, iattr); if (error) goto out; @@ -134,7 +137,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr) setattr_copy(inode, iattr); out: - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); return error; } @@ -188,14 +191,15 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat, { struct inode *inode = d_inode(path->dentry); struct kernfs_node *kn = inode->i_private; + struct kernfs_root *root = kernfs_root(kn); - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); generic_fillattr(inode, stat); spin_unlock(&inode->i_lock); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return 0; } @@ -278,19 +282,21 @@ void kernfs_evict_inode(struct inode *inode) int kernfs_iop_permission(struct inode *inode, int mask) { struct kernfs_node *kn; + struct kernfs_root *root; int ret; if (mask & MAY_NOT_BLOCK) return -ECHILD; kn = inode->i_private; + root = kernfs_root(kn); - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); ret = generic_permission(inode, mask); spin_unlock(&inode->i_lock); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return ret; } diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index f2f909d09f52..cfa79715fc1a 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -236,6 +236,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn, static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *kfc) { struct kernfs_super_info *info = kernfs_info(sb); + struct kernfs_root *kf_root = kfc->root; struct inode *inode; struct dentry *root; @@ -255,9 +256,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k sb->s_shrink.seeks = 0; /* get root inode, initialize and unlock it */ - down_read(&kernfs_rwsem); + down_read(&kf_root->kernfs_rwsem); inode = kernfs_get_inode(sb, info->root->kn); - up_read(&kernfs_rwsem); + up_read(&kf_root->kernfs_rwsem); if (!inode) { pr_debug("kernfs: could not get root inode\n"); return -ENOMEM; @@ -334,6 +335,7 @@ int kernfs_get_tree(struct fs_context *fc) if (!sb->s_root) { struct kernfs_super_info *info = kernfs_info(sb); + struct kernfs_root *root = kfc->root; kfc->new_sb_created = true; @@ -344,9 +346,9 @@ int kernfs_get_tree(struct fs_context *fc) } sb->s_flags |= SB_ACTIVE; - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); list_add(&info->node, &info->root->supers); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); } fc->root = dget(sb->s_root); @@ -371,10 +373,11 @@ void kernfs_free_fs_context(struct fs_context *fc) void kernfs_kill_sb(struct super_block *sb) { struct kernfs_super_info *info = kernfs_info(sb); + struct kernfs_root *root = info->root; - down_write(&kernfs_rwsem); + down_write(&root->kernfs_rwsem); list_del(&info->node); - up_write(&kernfs_rwsem); + up_write(&root->kernfs_rwsem); /* * Remove the superblock from fs_supers/s_instances diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c index c8f8e41b8411..efb0b9ca9057 100644 --- a/fs/kernfs/symlink.c +++ b/fs/kernfs/symlink.c @@ -114,11 +114,12 @@ static int kernfs_getlink(struct inode *inode, char *path) struct kernfs_node *kn = inode->i_private; struct kernfs_node *parent = kn->parent; struct kernfs_node *target = kn->symlink.target_kn; + struct kernfs_root *root = kernfs_root(parent); int error; - down_read(&kernfs_rwsem); + down_read(&root->kernfs_rwsem); error = kernfs_get_target_path(parent, target, path); - up_read(&kernfs_rwsem); + up_read(&root->kernfs_rwsem); return error; } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 27399569b8f1..d1077ddfda04 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -17,6 +17,7 @@ #include <linux/atomic.h> #include <linux/uidgid.h> #include <linux/wait.h> +#include <linux/rwsem.h> struct file; struct dentry; @@ -203,6 +204,7 @@ struct kernfs_root { struct list_head supers; wait_queue_head_t deactivate_waitq; + struct rw_semaphore kernfs_rwsem; }; struct kernfs_open_file { -- 2.39.2
From: Minchan Kim <minchan@kernel.org> mainline inclusion from mainline-v5.17-rc1 commit 555a0ce4558d87d5b97c4321f34b19e051c7b0c1 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Marek reported the warning below. ========================= WARNING: held lock freed! 5.16.0-rc2+ #10984 Not tainted ------------------------- kworker/1:0/18 is freeing memory ffff00004034e200-ffff00004034e3ff, with a lock still held there! ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at: __kernfs_remove+0x310/0x37c 3 locks held by kworker/1:0/18: #0: ffff000040107938 ((wq_completion)cgroup_destroy){+.+.}-{0:0}, at: process_one_work+0x1f0/0x6f0 #1: ffff80000b55bdc0 ((work_completion)(&(&css->destroy_rwork)->work)){+.+.}-{0:0}, at: process_one_work+0x1f0/0x6f0 #2: ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at: __kernfs_remove+0x310/0x37c stack backtrace: CPU: 1 PID: 18 Comm: kworker/1:0 Not tainted 5.16.0-rc2+ #10984 Hardware name: Raspberry Pi 4 Model B (DT) Workqueue: cgroup_destroy css_free_rwork_fn Call trace: dump_backtrace+0x0/0x1ac show_stack+0x18/0x24 dump_stack_lvl+0x8c/0xb8 dump_stack+0x18/0x34 debug_check_no_locks_freed+0x124/0x140 kfree+0xf0/0x3a4 kernfs_put+0x1f8/0x224 __kernfs_remove+0x1b8/0x37c kernfs_destroy_root+0x38/0x50 css_free_rwork_fn+0x288/0x3d4 process_one_work+0x288/0x6f0 worker_thread+0x74/0x470 kthread+0x188/0x194 ret_from_fork+0x10/0x20 Since kernfs moves the kernfs_rwsem lock into root, it couldn't hold the lock when the root node is tearing down. Thus, get the refcount of root node. Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211201231648.1027165-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/dir.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 08cd8caa2754..688aa66e1174 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1050,7 +1050,13 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops, */ void kernfs_destroy_root(struct kernfs_root *root) { - kernfs_remove(root->kn); /* will also free @root */ + /* + * kernfs_remove holds kernfs_rwsem from the root so the root + * shouldn't be freed during the operation. + */ + kernfs_get(root->kn); + kernfs_remove(root->kn); + kernfs_put(root->kn); /* will also free @root */ } /** -- 2.39.2
From: Imran Khan <imran.f.khan@oracle.com> mainline inclusion from mainline-v5.18-rc1 commit f3a690227f07390b0cfccb96ebbd87bf4ab0a5b2 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Since 'commit 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock")' per-fs kernfs_rwsem has replaced global kernfs_rwsem. Remove redundant declaration of global kernfs_rwsem. Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220218010205.717582-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/kernfs-internal.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index c933d9bd8a78..1106102159d3 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -119,7 +119,6 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr); /* * dir.c */ -extern struct rw_semaphore kernfs_rwsem; extern const struct dentry_operations kernfs_dops; extern const struct file_operations kernfs_dir_fops; extern const struct inode_operations kernfs_dir_iops; -- 2.39.2
From: Minchan Kim <minchan@kernel.org> mainline inclusion from mainline-v5.18-rc5 commit ad8d869343ae4a07a2038a4ca923f699308c8323 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- kernfs_remove supported NULL kernfs_node param to bail out but revent per-fs lock change introduced regression that dereferencing the param without NULL check so kernel goes crash. This patch checks the NULL kernfs_node in kernfs_remove and if so, just return. Quote from bug report by Jirka ``` The bug is triggered by running NAS Parallel benchmark suite on SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error log: [ 247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 247.036009] #PF: supervisor read access in kernel mode [ 247.036009] #PF: error_code(0x0000) - not-present page [ 247.036009] PGD 0 P4D 0 [ 247.036009] Oops: 0000 [#1] PREEMPT SMP PTI [ 247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16 [ 247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.0b 03/07/2018 [ 247.058060] RIP: 0010:kernfs_remove+0x8/0x50 [ 247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4 ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83 c4 60 [ 247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246 [ 247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018 [ 247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000 [ 247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018 [ 247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000 [ 247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100 [ 247.122048] FS: 00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000) knlGS:0000000000000000 [ 247.122048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0 [ 247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 247.122048] PKRU: 55555554 [ 247.122048] Call Trace: [ 247.122048] <TASK> [ 247.122048] rdt_kill_sb+0x29d/0x350 [ 247.122048] deactivate_locked_super+0x36/0xa0 [ 247.122048] cleanup_mnt+0x131/0x190 [ 247.122048] task_work_run+0x5c/0x90 [ 247.122048] exit_to_user_mode_prepare+0x229/0x230 [ 247.122048] syscall_exit_to_user_mode+0x18/0x40 [ 247.122048] do_syscall_64+0x48/0x90 [ 247.122048] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 247.122048] RIP: 0033:0x7f01be2d735b ``` Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696 Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BE... Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock) Cc: stable@vger.kernel.org Reported-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Jirka Hladky <jhladky@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/dir.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 688aa66e1174..37dbd6a7b34c 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1417,7 +1417,12 @@ static void __kernfs_remove(struct kernfs_node *kn) */ void kernfs_remove(struct kernfs_node *kn) { - struct kernfs_root *root = kernfs_root(kn); + struct kernfs_root *root; + + if (!kn) + return; + + root = kernfs_root(kn); down_write(&root->kernfs_rwsem); __kernfs_remove(kn); -- 2.39.2
From: Yushan Zhou <katrinzhou@tencent.com> mainline inclusion from mainline-v6.0-rc1 commit 72b5d5aef246a0387cefa23121dd90901c7a691a category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When lockdep is enabled, lockdep_assert_held_write would cause potential NULL pointer dereference. Fix the following smatch warnings: fs/kernfs/dir.c:1353 __kernfs_remove() warn: variable dereferenced before check 'kn' (see line 1346) Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Yushan Zhou <katrinzhou@tencent.com> Link: https://lore.kernel.org/r/20220630082512.3482581-1-zys.zljxml@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/dir.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 37dbd6a7b34c..328f88cf653f 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1347,14 +1347,17 @@ static void __kernfs_remove(struct kernfs_node *kn) { struct kernfs_node *pos; + /* Short-circuit if non-root @kn has already finished removal. */ + if (!kn) + return; + lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem); /* - * Short-circuit if non-root @kn has already finished removal. * This is for kernfs_remove_self() which plays with active ref * after removal. */ - if (!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb))) + if (kn->parent && RB_EMPTY_NODE(&kn->rb)) return; pr_debug("kernfs %s: removing\n", kn->name); -- 2.39.2
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> mainline inclusion from mainline-v6.15-rc1 commit 9aab10a0249eab4ec77c6a5e4f66442610c12a09 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The readdir operation iterates over all entries and invokes dir_emit() for every entry passing kernfs_node::name as argument. Since the name argument can change, and become invalid, the kernfs_root::kernfs_rwsem lock should not be dropped to prevent renames during the operation. The lock drop around dir_emit() has been initially introduced in commit 1e5289c97bba2 ("sysfs: Cache the last sysfs_dirent to improve readdir scalability v2") to avoid holding a global lock during a page fault. The lock drop is wrong since the support of renames and not a big burden since the lock is no longer global. Don't re-acquire kernfs_root::kernfs_rwsem while copying the name to the userpace buffer. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250213145023.2820193-5-bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- fs/kernfs/dir.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 328f88cf653f..38a544ee2c06 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1772,10 +1772,10 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx) file->private_data = pos; kernfs_get(pos); - up_read(&root->kernfs_rwsem); - if (!dir_emit(ctx, name, len, ino, type)) + if (!dir_emit(ctx, name, len, ino, type)) { + up_read(&root->kernfs_rwsem); return 0; - down_read(&root->kernfs_rwsem); + } } up_read(&root->kernfs_rwsem); file->private_data = NULL; -- 2.39.2
hulk inclusion category: other bugzilla: https://atomgit.com/openeuler/kernel/issues/8346 -------------------------------- Struct kernfs_root has add a new member named "kernfs_rwsem", but it does not affect the actual module because the influence of "kernfs_root" is limited within the kernel. Therefore, KABI_EXTEND is used to prevent KABI broken. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- include/linux/kernfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index d1077ddfda04..9f286f2aa5c8 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -204,7 +204,7 @@ struct kernfs_root { struct list_head supers; wait_queue_head_t deactivate_waitq; - struct rw_semaphore kernfs_rwsem; + KABI_EXTEND(struct rw_semaphore kernfs_rwsem) }; struct kernfs_open_file { -- 2.39.2
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/20016 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/XCU... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/20016 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/XCU...
participants (2)
-
patchwork bot -
Zizhi Wo