From: Dave Hansen dave.hansen@linux.intel.com
stable inclusion from stable-v5.10.170 commit 3b6ce54cfa2c04f0636fd0c985913af8703b408d category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I71N8L CVE: CVE-2023-0459
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream.
The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively:
if (access_ok(from, size)) // Right here
even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated.
But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down.
"copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this:
if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar);
If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values.
Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy.
Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code.
Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away.
Reported-by: Jordy Zomer jordyzomer@google.com Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Acked-by: Daniel Borkmann daniel@iogearbox.net # BPF bits Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: lib/usercopy.c Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/linux/nospec.h | 4 ++++ kernel/bpf/core.c | 2 -- lib/usercopy.c | 7 +++++++ 3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/nospec.h b/include/linux/nospec.h index c1e79f72cd89..9f0af4f116d9 100644 --- a/include/linux/nospec.h +++ b/include/linux/nospec.h @@ -11,6 +11,10 @@
struct task_struct;
+#ifndef barrier_nospec +# define barrier_nospec() do { } while (0) +#endif + /** * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise * @index: array element index diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index fd2aa6b9909e..c18aed60ce40 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1642,9 +1642,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) * reuse preexisting logic from Spectre v1 mitigation that * happens to produce the required code on x86 for v4 as well. */ -#ifdef CONFIG_X86 barrier_nospec(); -#endif CONT; #define LDST(SIZEOP, SIZE) \ STX_MEM_##SIZEOP: \ diff --git a/lib/usercopy.c b/lib/usercopy.c index 7413dd300516..7ee63df042d7 100644 --- a/lib/usercopy.c +++ b/lib/usercopy.c @@ -3,6 +3,7 @@ #include <linux/fault-inject-usercopy.h> #include <linux/instrumented.h> #include <linux/uaccess.h> +#include <linux/nospec.h>
/* out-of-line parts */
@@ -12,6 +13,12 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n unsigned long res = n; might_fault(); if (!should_fail_usercopy() && likely(access_ok(from, n))) { + /* + * Ensure that bad access_ok() speculation will not + * lead to nasty side effects *after* the copy is + * finished: + */ + barrier_nospec(); instrument_copy_from_user(to, from, n); res = raw_copy_from_user(to, from, n); }
From: Linus Torvalds torvalds@linux-foundation.org
stable inclusion from stable-v5.10.170 commit 12e3119a87627741bd3871c895ce198f21529eb3 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I71N8L
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc upstream.
Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") built fine on x86-64 and arm64, and that's the extent of my local build testing.
It turns out those got the <linux/nospec.h> include incidentally through other header files (<linux/kvm_host.h> in particular), but that was not true of other architectures, resulting in build errors
kernel/bpf/core.c: In function ‘___bpf_prog_run’: kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’
so just make sure to explicitly include the proper <linux/nospec.h> header file to make everybody see it.
Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") Reported-by: kernel test robot lkp@intel.com Reported-by: Viresh Kumar viresh.kumar@linaro.org Reported-by: Huacai Chen chenhuacai@loongson.cn Tested-by: Geert Uytterhoeven geert@linux-m68k.org Tested-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Alexei Starovoitov alexei.starovoitov@gmail.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: kernel/bpf/core.c Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/bpf/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index c18aed60ce40..73d4b1e32fbd 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -32,6 +32,7 @@ #include <linux/perf_event.h> #include <linux/extable.h> #include <linux/log2.h> +#include <linux/nospec.h>
#include <asm/barrier.h> #include <asm/unaligned.h>
From: Miklos Szeredi mszeredi@redhat.com
mainline inclusion from mainline-v5.15-rc1 commit 0cad6246621b5887d5b33fea84219d2a71f2f99a category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/I6ZCW0 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Szeredi mszeredi@redhat.com [chengzhihao: rename get_acl to get_acl2 to prevent KABI changes, and only backport(realize) overlayfs] Conflicts: fs/overlayfs/dir.c fs/overlayfs/inode.c fs/overlayfs/overlayfs.h fs/posix_acl.c include/linux/fs.h Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/overlayfs/dir.c | 2 +- fs/overlayfs/inode.c | 9 ++++++--- fs/overlayfs/overlayfs.h | 2 +- fs/posix_acl.c | 7 +++++-- include/linux/fs.h | 1 + 5 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index d0e5cde27702..9601acd2dc7b 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -1303,6 +1303,6 @@ const struct inode_operations ovl_dir_inode_operations = { .permission = ovl_permission, .getattr = ovl_getattr, .listxattr = ovl_listxattr, - .get_acl = ovl_get_acl, + .get_acl2 = ovl_get_acl, .update_time = ovl_update_time, }; diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index d7a410d83743..0148d819910e 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -441,12 +441,15 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size) return res; }
-struct posix_acl *ovl_get_acl(struct inode *inode, int type) +struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu) { struct inode *realinode = ovl_inode_real(inode); const struct cred *old_cred; struct posix_acl *acl;
+ if (rcu) + return ERR_PTR(-ECHILD); + if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode)) return NULL;
@@ -496,7 +499,7 @@ static const struct inode_operations ovl_file_inode_operations = { .permission = ovl_permission, .getattr = ovl_getattr, .listxattr = ovl_listxattr, - .get_acl = ovl_get_acl, + .get_acl2 = ovl_get_acl, .update_time = ovl_update_time, .fiemap = ovl_fiemap, }; @@ -514,7 +517,7 @@ static const struct inode_operations ovl_special_inode_operations = { .permission = ovl_permission, .getattr = ovl_getattr, .listxattr = ovl_listxattr, - .get_acl = ovl_get_acl, + .get_acl2 = ovl_get_acl, .update_time = ovl_update_time, };
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 898de3bf884e..a262a60b82c2 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -466,7 +466,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name, int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name, void *value, size_t size); ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size); -struct posix_acl *ovl_get_acl(struct inode *inode, int type); +struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu); int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags); bool ovl_is_private_xattr(struct super_block *sb, const char *name);
diff --git a/fs/posix_acl.c b/fs/posix_acl.c index 95882b3f5f62..c7eeb841c2ad 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -134,11 +134,14 @@ struct posix_acl *get_acl(struct inode *inode, int type) * If the filesystem doesn't have a get_acl() function at all, we'll * just create the negative cache entry. */ - if (!inode->i_op->get_acl) { + if (!inode->i_op->get_acl && !inode->i_op->get_acl2) { set_cached_acl(inode, type, NULL); return NULL; } - acl = inode->i_op->get_acl(inode, type); + if (inode->i_op->get_acl) + acl = inode->i_op->get_acl(inode, type); + else + acl = inode->i_op->get_acl2(inode, type, false);
if (IS_ERR(acl)) { /* diff --git a/include/linux/fs.h b/include/linux/fs.h index 98236a86cca0..d7e2e95209b5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1933,6 +1933,7 @@ struct inode_operations { umode_t create_mode); int (*tmpfile) (struct inode *, struct dentry *, umode_t); int (*set_acl)(struct inode *, struct posix_acl *, int); + struct posix_acl * (*get_acl2)(struct inode *, int, bool);
KABI_RESERVE(1) KABI_RESERVE(2)
From: Miklos Szeredi mszeredi@redhat.com
mainline inclusion from mainline-v5.15-rc1 commit 332f606b32b6291a944c8cf23b91f53a6e676525 category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/I6ZCW0 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Overlayfs does not cache ACL's (to avoid double caching). Instead it just calls the underlying filesystem's i_op->get_acl(), which will return the cached value, if possible.
In rcu path walk, however, get_cached_acl_rcu() is employed to get the value from the cache, which will fail on overlayfs resulting in dropping out of rcu walk mode. This can result in a big performance hit in certain situations.
Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which indicates pass-through)
Reported-by: garyhuang zjh.20052005@163.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Conflicts: fs/posix_acl.c Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/overlayfs/inode.c | 7 ++++--- fs/posix_acl.c | 13 ++++++++++++- include/linux/fs.h | 5 +++++ include/linux/posix_acl.h | 3 ++- 4 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index 0148d819910e..f319505bc9f0 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -11,6 +11,7 @@ #include <linux/posix_acl.h> #include <linux/ratelimit.h> #include <linux/fiemap.h> +#include <linux/namei.h> #include "overlayfs.h"
@@ -447,12 +448,12 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu) const struct cred *old_cred; struct posix_acl *acl;
- if (rcu) - return ERR_PTR(-ECHILD); - if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode)) return NULL;
+ if (rcu) + return get_cached_acl_rcu(realinode, type); + old_cred = ovl_override_creds(inode->i_sb); acl = get_acl(realinode, type); revert_creds(old_cred); diff --git a/fs/posix_acl.c b/fs/posix_acl.c index c7eeb841c2ad..5e908c3cec46 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -22,6 +22,7 @@ #include <linux/xattr.h> #include <linux/export.h> #include <linux/user_namespace.h> +#include <linux/namei.h>
static struct posix_acl **acl_by_type(struct inode *inode, int type) { @@ -56,7 +57,17 @@ EXPORT_SYMBOL(get_cached_acl);
struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type) { - return rcu_dereference(*acl_by_type(inode, type)); + struct posix_acl *acl = rcu_dereference(*acl_by_type(inode, type)); + + if (acl == ACL_DONT_CACHE && inode->i_op->get_acl2) { + struct posix_acl *ret; + + ret = inode->i_op->get_acl2(inode, type, LOOKUP_RCU); + if (!IS_ERR(ret)) + acl = ret; + } + + return acl; } EXPORT_SYMBOL(get_cached_acl_rcu);
diff --git a/include/linux/fs.h b/include/linux/fs.h index d7e2e95209b5..065d1e672126 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -600,6 +600,11 @@ static inline void mapping_allow_writable(struct address_space *mapping)
struct posix_acl; #define ACL_NOT_CACHED ((void *)(-1)) +/* + * ACL_DONT_CACHE is for stacked filesystems, that rely on underlying fs to + * cache the ACL. This also means that ->get_acl2() can be called in RCU mode + * with the LOOKUP_RCU flag. + */ #define ACL_DONT_CACHE ((void *)(-3))
static inline struct posix_acl * diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index 90797f1b421d..c23654c2adfe 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -71,6 +71,8 @@ extern int __posix_acl_chmod(struct posix_acl **, gfp_t, umode_t); extern struct posix_acl *get_posix_acl(struct inode *, int); extern int set_posix_acl(struct inode *, int, struct posix_acl *);
+struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type); + #ifdef CONFIG_FS_POSIX_ACL extern int posix_acl_chmod(struct inode *, umode_t); extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **, @@ -81,7 +83,6 @@ extern int simple_set_acl(struct inode *, struct posix_acl *, int); extern int simple_acl_create(struct inode *, struct inode *);
struct posix_acl *get_cached_acl(struct inode *inode, int type); -struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type); void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl); void forget_cached_acl(struct inode *inode, int type); void forget_all_cached_acls(struct inode *inode);
From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/I6ZCW0 CVE: NA
--------------------------------
Fix kabi broken for importing new inode operation get_inode_acl.
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 065d1e672126..1e789a30c4c0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1938,9 +1938,8 @@ struct inode_operations { umode_t create_mode); int (*tmpfile) (struct inode *, struct dentry *, umode_t); int (*set_acl)(struct inode *, struct posix_acl *, int); - struct posix_acl * (*get_acl2)(struct inode *, int, bool);
- KABI_RESERVE(1) + KABI_USE(1, struct posix_acl * (*get_acl2)(struct inode *, int, bool)) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4)
From: Zhihao Cheng chengzhihao1@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I70MZX CVE: NA
--------------------------------
Following process: P1 P2 path_openat link_path_walk may_lookup inode_permission(rcu) ovl_permission acl_permission_check check_acl get_cached_acl_rcu ovl_get_inode_acl realinode = ovl_inode_real(ovl_inode) drop_cache __dentry_kill(ovl_dentry) iput(ovl_inode) ovl_destroy_inode(ovl_inode) dput(oi->__upperdentry) dentry_kill(upperdentry) dentry_unlink_inode upperdentry->d_inode = NULL ovl_inode_upper upperdentry = ovl_i_dentry_upper(ovl_inode) d_inode(upperdentry) // returns NULL IS_POSIXACL(realinode) // NULL pointer dereference , will trigger an null pointer dereference at realinode: [ 205.472797] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 205.476701] CPU: 2 PID: 2713 Comm: ls Not tainted 6.3.0-12064-g2edfa098e750-dirty #1216 [ 205.478754] RIP: 0010:do_ovl_get_acl+0x5d/0x300 [ 205.489584] Call Trace: [ 205.489812] <TASK> [ 205.490014] ovl_get_inode_acl+0x26/0x30 [ 205.490466] get_cached_acl_rcu+0x61/0xa0 [ 205.490908] generic_permission+0x1bf/0x4e0 [ 205.491447] ovl_permission+0x79/0x1b0 [ 205.491917] inode_permission+0x15e/0x2c0 [ 205.492425] link_path_walk+0x115/0x550 [ 205.493311] path_lookupat.isra.0+0xb2/0x200 [ 205.493803] filename_lookup+0xda/0x240 [ 205.495747] vfs_fstatat+0x7b/0xb0
Fetch a reproducer in [Link].
Fix it by checking realinode whether to be NULL before accessing it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217404 Fixes: 332f606b32b6 ("ovl: enable RCU'd ->get_acl()") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/overlayfs/inode.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index f319505bc9f0..4103b487ae63 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -448,7 +448,15 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu) const struct cred *old_cred; struct posix_acl *acl;
- if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode)) + if (!IS_ENABLED(CONFIG_FS_POSIX_ACL)) + return NULL; + + if (!realinode) { + WARN_ON(!rcu); + return ERR_PTR(-ECHILD); + } + + if (!IS_POSIXACL(realinode)) return NULL;
if (rcu)
From: Dave Chinner dchinner@redhat.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6ZXOM CVE: NA
Reference: https://patchwork.kernel.org/project/xfs/patch/20230420033550.339934-1-yange...
--------------------------------
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count.
This results in a use after free that can only be triggered in active filesystem shutdown situations.
To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete.
Reported-by: yangerkun yangerkun@huawei.com Signed-off-by: Dave Chinner dchinner@redhat.com Signed-off-by: yangerkun yangerkun@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index b89e8fcb49c9..90b2efc555a5 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -370,10 +370,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. + * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -388,13 +396,30 @@ xfs_buf_item_pin(
trace_xfs_buf_item_pin(bip);
+ xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); }
/* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other that journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -411,24 +436,18 @@ xfs_buf_item_unpin(
trace_xfs_buf_item_unpin(bip);
- /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters);
- /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active + */ + if (!freed) { + xfs_buf_rele(bp); return; + }
if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -440,6 +459,15 @@ xfs_buf_item_unpin(
trace_xfs_buf_item_unpin_stale(bip);
+ /* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete. + */ + xfs_buf_rele(bp); + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will @@ -456,16 +484,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); }
STATIC uint
From: Edward Lo edward.lo@ambergroup.io
mainline inclusion from mainline-v6.2-rc1 commit 6db620863f8528ed9a9aa5ad323b26554a17881d category: bugfix bugzilla: 188526, https://gitee.com/src-openeuler/kernel/issues/I71SYO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This adds sanity checks for data run offset. We should make sure data run offset is legit before trying to unpack them, otherwise we may encounter use-after-free or some unexpected memory access behaviors.
[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570 [ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240 [ 82.941670] [ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15 [ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 82.943720] Call Trace: [ 82.944204] <TASK> [ 82.944471] dump_stack_lvl+0x49/0x63 [ 82.944908] print_report.cold+0xf5/0x67b [ 82.945141] ? __wait_on_bit+0x106/0x120 [ 82.945750] ? run_unpack+0x2e3/0x570 [ 82.946626] kasan_report+0xa7/0x120 [ 82.947046] ? run_unpack+0x2e3/0x570 [ 82.947280] __asan_load1+0x51/0x60 [ 82.947483] run_unpack+0x2e3/0x570 [ 82.947709] ? memcpy+0x4e/0x70 [ 82.947927] ? run_pack+0x7a0/0x7a0 [ 82.948158] run_unpack_ex+0xad/0x3f0 [ 82.948399] ? mi_enum_attr+0x14a/0x200 [ 82.948717] ? run_unpack+0x570/0x570 [ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0 [ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0 [ 82.949611] ? mi_read+0x262/0x2c0 [ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180 [ 82.950249] ntfs_iget5+0x632/0x1870 [ 82.950621] ? ntfs_get_block_bmap+0x70/0x70 [ 82.951192] ? evict+0x223/0x280 [ 82.951525] ? iput.part.0+0x286/0x320 [ 82.951969] ntfs_fill_super+0x1321/0x1e20 [ 82.952436] ? put_ntfs+0x1d0/0x1d0 [ 82.952822] ? vsprintf+0x20/0x20 [ 82.953188] ? mutex_unlock+0x81/0xd0 [ 82.953379] ? set_blocksize+0x95/0x150 [ 82.954001] get_tree_bdev+0x232/0x370 [ 82.954438] ? put_ntfs+0x1d0/0x1d0 [ 82.954700] ntfs_fs_get_tree+0x15/0x20 [ 82.955049] vfs_get_tree+0x4c/0x130 [ 82.955292] path_mount+0x645/0xfd0 [ 82.955615] ? putname+0x80/0xa0 [ 82.955955] ? finish_automount+0x2e0/0x2e0 [ 82.956310] ? kmem_cache_free+0x110/0x390 [ 82.956723] ? putname+0x80/0xa0 [ 82.957023] do_mount+0xd6/0xf0 [ 82.957411] ? path_mount+0xfd0/0xfd0 [ 82.957638] ? __kasan_check_write+0x14/0x20 [ 82.957948] __x64_sys_mount+0xca/0x110 [ 82.958310] do_syscall_64+0x3b/0x90 [ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 82.959341] RIP: 0033:0x7fd0d1ce948a [ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a [ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0 [ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020 [ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0 [ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/attrib.c | 13 +++++++++++++ fs/ntfs3/attrlist.c | 5 +++++ fs/ntfs3/frecord.c | 14 ++++++++++++++ fs/ntfs3/fslog.c | 9 +++++++++ fs/ntfs3/inode.c | 5 +++++ 5 files changed, 46 insertions(+)
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c index 4e74bc8f01ed..5d4ad0f4feb3 100644 --- a/fs/ntfs3/attrib.c +++ b/fs/ntfs3/attrib.c @@ -101,6 +101,10 @@ int attr_load_runs(struct ATTRIB *attr, struct ntfs_inode *ni,
asize = le32_to_cpu(attr->size); run_off = le16_to_cpu(attr->nres.run_off); + + if (run_off > asize) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, vcn ? *vcn : svcn, Add2Ptr(attr, run_off), asize - run_off); @@ -1157,6 +1161,10 @@ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, }
ro = le16_to_cpu(attr->nres.run_off); + + if (ro > le32_to_cpu(attr->size)) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, ro), le32_to_cpu(attr->size) - ro); if (err < 0) @@ -1832,6 +1840,11 @@ int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes) u16 le_sz; u16 roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn1 - 1, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); diff --git a/fs/ntfs3/attrlist.c b/fs/ntfs3/attrlist.c index bad6d8a849a2..c0c6bcbc8c05 100644 --- a/fs/ntfs3/attrlist.c +++ b/fs/ntfs3/attrlist.c @@ -68,6 +68,11 @@ int ntfs_load_attr_list(struct ntfs_inode *ni, struct ATTRIB *attr)
run_init(&ni->attr_list.run);
+ if (run_off > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(&ni->attr_list.run, ni->mi.sbi, ni->mi.rno, 0, le64_to_cpu(attr->nres.evcn), 0, Add2Ptr(attr, run_off), diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c index 6f47a9c17f89..c831153f0d70 100644 --- a/fs/ntfs3/frecord.c +++ b/fs/ntfs3/frecord.c @@ -567,6 +567,12 @@ static int ni_repack(struct ntfs_inode *ni) }
roff = le16_to_cpu(attr->nres.run_off); + + if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + break; + } + err = run_unpack(&run, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); @@ -1541,6 +1547,9 @@ int ni_delete_all(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) + return -EINVAL; + /* run==1 means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); @@ -2238,6 +2247,11 @@ int ni_decompress_file(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + /*run==1 Means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c index fc36c53b865a..6489953d4617 100644 --- a/fs/ntfs3/fslog.c +++ b/fs/ntfs3/fslog.c @@ -2727,6 +2727,9 @@ static inline bool check_attr(const struct MFT_REC *rec, return false; }
+ if (run_off > asize) + return false; + if (run_unpack(NULL, sbi, 0, svcn, evcn, svcn, Add2Ptr(attr, run_off), asize - run_off) < 0) { return false; @@ -4767,6 +4770,12 @@ int log_replay(struct ntfs_inode *ni, bool *initialized) u16 roff = le16_to_cpu(attr->nres.run_off); CLST svcn = le64_to_cpu(attr->nres.svcn);
+ if (roff > t32) { + kfree(oa->attr); + oa->attr = NULL; + goto fake_attr; + } + err = run_unpack(&oa->run0, sbi, inode->i_ino, svcn, le64_to_cpu(attr->nres.evcn), svcn, Add2Ptr(attr, roff), t32 - roff); diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index a4ee2954346f..db2a0bce67f4 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -373,6 +373,11 @@ static struct inode *ntfs_read_mft(struct inode *inode, attr_unpack_run: roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + t64 = le64_to_cpu(attr->nres.svcn); err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff);
From: Hawkins Jiawei yin31149@gmail.com
mainline inclusion from mainline-v6.2-rc1 commit 887bfc546097fbe8071dac13b2fef73b77920899 category: bugfix bugzilla: 188526, https://gitee.com/src-openeuler/kernel/issues/I71SYO CVE: CVE-2023-26544
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Syzkaller reports slab-out-of-bounds bug as follows: ================================================================== BUG: KASAN: slab-out-of-bounds in run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 Read of size 1 at addr ffff88801bbdff02 by task syz-executor131/3611
[...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 run_unpack_ex+0xb0/0x7c0 fs/ntfs3/run.c:1057 ntfs_read_mft fs/ntfs3/inode.c:368 [inline] ntfs_iget5+0xc20/0x3280 fs/ntfs3/inode.c:501 ntfs_loadlog_and_replay+0x124/0x5d0 fs/ntfs3/fsntfs.c:272 ntfs_fill_super+0x1eff/0x37f0 fs/ntfs3/super.c:1018 get_tree_bdev+0x440/0x760 fs/super.c:1323 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK>
The buggy address belongs to the physical page: page:ffffea00006ef600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1bbd8 head:ffffea00006ef600 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88801bbdfe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbdfe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88801bbdff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88801bbdff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbe0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Kernel will tries to read record and parse MFT from disk in ntfs_read_mft().
Yet the problem is that during enumerating attributes in record, kernel doesn't check whether run_off field loading from the disk is a valid value.
To be more specific, if attr->nres.run_off is larger than attr->size, kernel will passes an invalid argument run_buf_size in run_unpack_ex(), which having an integer overflow. Then this invalid argument will triggers the slab-out-of-bounds Read bug as above.
This patch solves it by adding the sanity check between the offset to packed runs and attribute size.
link: https://lore.kernel.org/all/0000000000009145fc05e94bd5c3@google.com/#t Reported-and-tested-by: syzbot+8d6fbb27a6aded64b25b@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index db2a0bce67f4..e8e20685acf8 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -379,6 +379,13 @@ static struct inode *ntfs_read_mft(struct inode *inode, }
t64 = le64_to_cpu(attr->nres.svcn); + + /* offset to packed runs is out-of-bounds */ + if (roff > asize) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Dan Carpenter dan.carpenter@oracle.com
mainline inclusion from mainline-v6.2-rc1 commit 658015167a8432b88f5d032e9d85d8fd50e5bf2c category: bugfix bugzilla: 188526, https://gitee.com/src-openeuler/kernel/issues/I71SYO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There were two patches which addressed the same bug and added the same condition:
commit 6db620863f85 ("fs/ntfs3: Validate data run offset") commit 887bfc546097 ("fs/ntfs3: Fix slab-out-of-bounds read in run_unpack")
Delete one condition.
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/inode.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index e8e20685acf8..870f235bf226 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -380,12 +380,6 @@ static struct inode *ntfs_read_mft(struct inode *inode,
t64 = le64_to_cpu(attr->nres.svcn);
- /* offset to packed runs is out-of-bounds */ - if (roff > asize) { - err = -EINVAL; - goto out; - } - err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Ruihan Li lrh2000@pku.edu.cn
mainline inclusion from mainline-v6.4-rc1 commit 25c150ac103a4ebeed0319994c742a90634ddf18 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6WHKQ CVE: CVE-2023-2002
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id...
----------------------------------------
Previously, capability was checked using capable(), which verified that the caller of the ioctl system call had the required capability. In addition, the result of the check would be stored in the HCI_SOCK_TRUSTED flag, making it persistent for the socket.
However, malicious programs can abuse this approach by deliberately sharing an HCI socket with a privileged task. The HCI socket will be marked as trusted when the privileged task occasionally makes an ioctl call.
This problem can be solved by using sk_capable() to check capability, which ensures that not only the current task but also the socket opener has the specified capability, thus reducing the risk of privilege escalation through the previously identified vulnerability.
Cc: stable@vger.kernel.org Fixes: f81f5b2db869 ("Bluetooth: Send control open and close messages for HCI raw sockets") Signed-off-by: Ruihan Li lrh2000@pku.edu.cn Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/bluetooth/hci_sock.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 71d18d3295f5..d28e263acb62 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -1000,7 +1000,14 @@ static int hci_sock_ioctl(struct socket *sock, unsigned int cmd, if (hci_sock_gen_cookie(sk)) { struct sk_buff *skb;
- if (capable(CAP_NET_ADMIN)) + /* Perform careful checks before setting the HCI_SOCK_TRUSTED + * flag. Make sure that not only the current task but also + * the socket opener has the required capability, since + * privileged programs can be tricked into making ioctl calls + * on HCI sockets, and the socket should not be marked as + * trusted simply because the ioctl caller is privileged. + */ + if (sk_capable(sk, CAP_NET_ADMIN)) hci_sock_set_flag(sk, HCI_SOCK_TRUSTED);
/* Send event to monitor */
From: Hyunwoo Kim v4bel@theori.io
stable inclusion from stable-v5.10.168 commit dd6991251a1382a9b4984962a0c7a467e9d71812 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I70OFF CVE: CVE-2023-32269
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 611792920925fb088ddccbe2783c7f92fdfb6b64 ]
If you call listen() and accept() on an already connect()ed AF_NETROM socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and nr_accept() dequeues the skb waiting in the sk->sk_receive_queue.
As a result, nr_accept() allocates and returns a sock with the sk of the parent AF_NETROM socket.
And here use-after-free can happen through complex race conditions: ``` cpu0 cpu1 1. socket_2 = socket(AF_NETROM) . . listen(socket_2) accepted_socket = accept(socket_2) 2. socket_1 = socket(AF_NETROM) nr_create() // sk refcount : 1 connect(socket_1) 3. write(accepted_socket) nr_sendmsg() nr_output() nr_kick() nr_send_iframe() nr_transmit_buffer() nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() nr_process_rx_frame(sk, skb); // sk : socket_1's sk nr_state3_machine() nr_queue_rx_frame() sock_queue_rcv_skb() sock_queue_rcv_skb_reason() __sock_queue_rcv_skb() __skb_queue_tail(list, skb); // list : socket_1's sk->sk_receive_queue 4. listen(socket_1) nr_listen() uaf_socket = accept(socket_1) nr_accept() skb_dequeue(&sk->sk_receive_queue); 5. close(accepted_socket) nr_release() nr_write_internal(sk, NR_DISCREQ) nr_transmit_buffer() // NR_DISCREQ nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() // sk : socket_1's sk nr_process_rx_frame() // NR_STATE_3 nr_state3_machine() // NR_DISCREQ nr_disconnect() nr_sk(sk)->state = NR_STATE_0; 6. close(socket_1) // sk refcount : 3 nr_release() // NR_STATE_0 sock_put(sk); // sk refcount : 0 sk_free(sk); close(uaf_socket) nr_release() sock_hold(sk); // UAF ```
KASAN report by syzbot: ``` BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520 Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128
Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x461 mm/kasan/report.c:417 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:102 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline] __refcount_add include/linux/refcount.h:193 [inline] __refcount_inc include/linux/refcount.h:250 [inline] refcount_inc include/linux/refcount.h:267 [inline] sock_hold include/net/sock.h:775 [inline] nr_release+0x66/0x460 net/netrom/af_netrom.c:520 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6c19e3c9b9 Code: Unable to access opcode bytes at 0x7f6c19e3c98f. RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9 RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006 RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0 R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000 </TASK>
Allocated by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] ____kasan_kmalloc mm/kasan/common.c:330 [inline] __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0xd0 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x140/0x290 net/core/sock.c:2038 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433 __sock_create+0x359/0x790 net/socket.c:1515 sock_create net/socket.c:1566 [inline] __sys_socket_create net/socket.c:1603 [inline] __sys_socket_create net/socket.c:1588 [inline] __sys_socket+0x133/0x250 net/socket.c:1636 __do_sys_socket net/socket.c:1649 [inline] __se_sys_socket net/socket.c:1647 [inline] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] __cache_free mm/slab.c:3394 [inline] __do_kmem_cache_free mm/slab.c:3580 [inline] __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587 sk_prot_free net/core/sock.c:2074 [inline] __sk_destruct+0x5df/0x750 net/core/sock.c:2166 sk_destruct net/core/sock.c:2181 [inline] __sk_free+0x175/0x460 net/core/sock.c:2192 sk_free+0x7c/0xa0 net/core/sock.c:2203 sock_put include/net/sock.h:1991 [inline] nr_release+0x39e/0x460 net/netrom/af_netrom.c:554 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd ```
To fix this issue, nr_listen() returns -EINVAL for sockets that successfully nr_connect().
Reported-by: syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim v4bel@theori.io Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/netrom/af_netrom.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c index e5c8a295e640..5c04da4cfbad 100644 --- a/net/netrom/af_netrom.c +++ b/net/netrom/af_netrom.c @@ -400,6 +400,11 @@ static int nr_listen(struct socket *sock, int backlog) struct sock *sk = sock->sk;
lock_sock(sk); + if (sock->state != SS_UNCONNECTED) { + release_sock(sk); + return -EINVAL; + } + if (sk->sk_state != TCP_LISTEN) { memset(&nr_sk(sk)->user_addr, 0, AX25_ADDR_LEN); sk->sk_max_ack_backlog = backlog;
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6YDHU CVE: NA
--------------------------------
This reverts commit 939325bbad27dc99b8f8cea2791192e5f81b76de.
Because this commit make a mistake to judge if the page is the same.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/filemap.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c index f8e1ee68ecac..173c2962b16a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2530,13 +2530,6 @@ static int generic_file_buffered_read_get_pages(struct kiocb *iocb, goto find_page; }
-static inline bool pos_same_page(loff_t pos1, loff_t pos2, struct page *page) -{ - unsigned int shift = page_shift(page); - - return (pos1 >> shift == pos2 >> shift); -} - /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read @@ -2627,10 +2620,11 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, writably_mapped = mapping_writably_mapped(mapping);
/* - * When a read accesses a page several times, only + * When a sequential read accesses a page several times, only * mark it as accessed the first time. */ - if (pos_same_page(iocb->ki_pos, ra->prev_pos -1, pages[0])) + if (iocb->ki_pos >> PAGE_SHIFT != + ra->prev_pos >> PAGE_SHIFT) mark_page_accessed(pages[0]); for (i = 1; i < pg_nr; i++) mark_page_accessed(pages[i]);
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion from mainline-5.19-rc4 commit 5ccc944dce3df5fd2fd683a7df4fd49d1068eba2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6YDHU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
We had an off-by-one error which meant that we never marked the first page in a read as accessed. This was visible as a slowdown when re-reading a file as pages were being evicted from cache too soon. In reviewing this code, we noticed a second bug where a multi-page folio would be marked as accessed multiple times when doing reads that were less than the size of the folio.
Abstract the comparison of whether two file positions are in the same folio into a new function, fixing both of these bugs.
Reported-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Kent Overstreet kent.overstreet@gmail.com Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org
Conflict: folios is not supported yet Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/filemap.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c index 173c2962b16a..5ac55c3f19f7 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2530,6 +2530,13 @@ static int generic_file_buffered_read_get_pages(struct kiocb *iocb, goto find_page; }
+static inline bool pos_same_page(loff_t pos1, loff_t pos2, struct page *page) +{ + unsigned int shift = page_shift(page); + + return (pos1 >> shift == pos2 >> shift); +} + /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read @@ -2620,11 +2627,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, writably_mapped = mapping_writably_mapped(mapping);
/* - * When a sequential read accesses a page several times, only + * When a read accesses a page several times, only * mark it as accessed the first time. */ - if (iocb->ki_pos >> PAGE_SHIFT != - ra->prev_pos >> PAGE_SHIFT) + if (!pos_same_page(iocb->ki_pos, ra->prev_pos -1, pages[0])) mark_page_accessed(pages[0]); for (i = 1; i < pg_nr; i++) mark_page_accessed(pages[i]);