From: Liu Jian liujian56@huawei.com
mainline inclusion from mainline-v6.3-rc2 commit d900f3d20cc3169ce42ec72acc850e662a4d4db2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
When the buffer length of the recvmsg system call is 0, we got the flollowing soft lockup problem:
watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149] CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:remove_wait_queue+0xb/0xc0 Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20 RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768 RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040 RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7 R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800 R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0 FS: 00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_msg_wait_data+0x279/0x2f0 tcp_bpf_recvmsg_parser+0x3c6/0x490 inet_recvmsg+0x280/0x290 sock_recvmsg+0xfc/0x120 ____sys_recvmsg+0x160/0x3d0 ___sys_recvmsg+0xf0/0x180 __sys_recvmsg+0xea/0x1a0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc
The logic in tcp_bpf_recvmsg_parser is as follows:
msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { wait data; goto msg_bytes_ready; }
In this case, "copied" always is 0, the infinite loop occurs.
According to the Linux system call man page, 0 should be returned in this case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, directly return. Also modify several other functions with the same problem.
Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap") Fixes: 9825d866ce0d ("af_unix: Implement unix_dgram_bpf_recvmsg()") Fixes: c5d2177a72a1 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian liujian56@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Cc: Jakub Sitnicki jakub@cloudflare.com Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com (cherry picked from commit d900f3d20cc3169ce42ec72acc850e662a4d4db2) Signed-off-by: Liu Jian liujian56@huawei.com
Conflicts: net/ipv4/udp_bpf.c net/unix/unix_bpf.c net/ipv4/tcp_bpf.c Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/ipv4/tcp_bpf.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index eaf2308c355a..ddb1730cdf9b 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -273,6 +273,9 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, if (unlikely(flags & MSG_ERRQUEUE)) return inet_recv_error(sk, msg, len, addr_len);
+ if (!len) + return 0; + psock = sk_psock_get(sk); if (unlikely(!psock)) return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
From: Hao Sun sunhao.th@gmail.com
stable inclusion from stable-v5.10.167 commit a1c0263f1eb4deee132e11e52ee6982435460d81 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a3d81bc1eaef48e34dd0b9b48eefed9e02a06451 ]
The following kernel panic can be triggered when a task with pid=1 attaches a prog that attempts to send killing signal to itself, also see [1] for more details:
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106 panic+0x2c4/0x60f kernel/panic.c:275 do_exit.cold+0x63/0xe4 kernel/exit.c:789 do_group_exit+0xd4/0x2a0 kernel/exit.c:950 get_signal+0x2460/0x2600 kernel/signal.c:2858 arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd
So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.
[1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com
Signed-off-by: Hao Sun sunhao.th@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Stanislav Fomichev sdf@google.com Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/trace/bpf_trace.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 81a7b622c691..b1e255e4f8e7 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1072,6 +1072,9 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type) return -EPERM; if (unlikely(!nmi_uaccess_okay())) return -EPERM; + /* Task should not be pid=1 to avoid kernel panic. */ + if (unlikely(is_global_init(current))) + return -EPERM;
if (irqs_disabled()) { /* Do an early check on signal validity. Otherwise,
From: Stanislav Fomichev sdf@google.com
stable inclusion from stable-v5.10.163 commit 329a76635548ee8fceb3b78c7d54d96524e80925 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit f17472d4599697d701aa239b4c475a506bccfd19 ]
Syzkaller managed to hit another decl_tag issue:
btf_func_proto_check kernel/bpf/btf.c:4506 [inline] btf_check_all_types kernel/bpf/btf.c:4734 [inline] btf_parse_type_sec+0x1175/0x1980 kernel/bpf/btf.c:4763 btf_parse kernel/bpf/btf.c:5042 [inline] btf_new_fd+0x65a/0xb00 kernel/bpf/btf.c:6709 bpf_btf_load+0x6f/0x90 kernel/bpf/syscall.c:4342 __sys_bpf+0x50a/0x6c0 kernel/bpf/syscall.c:5034 __do_sys_bpf kernel/bpf/syscall.c:5093 [inline] __se_sys_bpf kernel/bpf/syscall.c:5091 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5091 do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48
This seems similar to commit ea68376c8bed ("bpf: prevent decl_tag from being referenced in func_proto") but for the argument.
Reported-by: syzbot+8dd0551dda6020944c5d@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev sdf@google.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Yonghong Song yhs@fb.com Link: https://lore.kernel.org/bpf/20221123035422.872531-2-sdf@google.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/bpf/btf.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index fba28f17e61a..a8326316992d 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -3675,6 +3675,11 @@ static int btf_func_proto_check(struct btf_verifier_env *env, break; }
+ if (btf_type_is_resolve_source_only(arg_type)) { + btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1); + return -EINVAL; + } + if (args[i].name_off && (!btf_name_offset_valid(btf, args[i].name_off) || !btf_name_valid_identifier(btf, args[i].name_off))) {
From: "Paul E. McKenney" paulmck@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit 1c0c4bc1ceb580851b2d76fdef9712b3bdae134b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
If there is heavy softirq activity, the softirq system will attempt to awaken ksoftirqd and will stop the traditional back-of-interrupt softirq processing. This is all well and good, but only if the ksoftirqd kthreads already exist, which is not the case during early boot, in which case the system hangs.
One reproducer is as follows:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make
This commit therefore adds a couple of existence checks for ksoftirqd and forces back-of-interrupt softirq processing when ksoftirqd does not yet exist. With this change, the above test passes.
Reported-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reported-by: Uladzislau Rezki urezki@gmail.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de [ paulmck: Remove unneeded check per Sebastian Siewior feedback. ] Reviewed-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/softirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c index 09229ad82209..19668d614f47 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -376,7 +376,7 @@ static inline void invoke_softirq(void) if (ksoftirqd_running(local_softirq_pending())) return;
- if (!force_irqthreads) { + if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) { #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK /* * We can safely execute softirq on the current stack if
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
mainline inclusion from mainline-v5.13-rc1 commit c5e3a41187ac01425f5ad1abce927905e4ac44e4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized.
[ 148.133411][ T5509] ===================================================== [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable ----new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] =====================================================
Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var().
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Link: https://lore.kernel.org/r/20210401055823.3929-1-penguin-kernel@I-love.SAKURA... Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/irq/proc.c | 4 ++-- kernel/profile.c | 2 +- kernel/trace/trace.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index 72513ed2a5fc..0df62a3a1f37 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -144,7 +144,7 @@ static ssize_t write_irq_affinity(int type, struct file *file, if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EIO;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
if (type) @@ -238,7 +238,7 @@ static ssize_t default_affinity_write(struct file *file, cpumask_var_t new_value; int err;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/profile.c b/kernel/profile.c index 737b1c704aa8..0db1122855c0 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -438,7 +438,7 @@ static ssize_t prof_cpu_mask_proc_write(struct file *file, cpumask_var_t new_value; int err;
- if (!alloc_cpumask_var(&new_value, GFP_KERNEL)) + if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(buffer, count, new_value); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 4e130e2bb566..951181d5b9fb 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4792,7 +4792,7 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, cpumask_var_t tracing_cpumask_new; int err;
- if (!alloc_cpumask_var(&tracing_cpumask_new, GFP_KERNEL)) + if (!zalloc_cpumask_var(&tracing_cpumask_new, GFP_KERNEL)) return -ENOMEM;
err = cpumask_parse_user(ubuf, count, tracing_cpumask_new);
From: Nick Hawkins nick.hawkins@hpe.com
mainline inclusion from mainline-v5.19-rc1 commit 8294fec1cab7ae6153525eb68401ed5905921371 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
Enable the workaround for the 764319 Cortex A-9 erratum. CP14 read accesses to the DBGPRSR and DBGOSLSR registers generate an unexpected Undefined Instruction exception when the DBGSWENABLE external pin is set to 0, even when the CP14 accesses are performed from a privileged mode. The work around catches the exception in a way the kernel does not stop execution with the use of undef_hook. This has been found to effect the HPE GXP SoC.
Signed-off-by: Nick Hawkins nick.hawkins@hpe.com Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/hw_breakpoint.c | 26 ++++++++++++++++++++++++++ 2 files changed, 37 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index ba92a15196cf..0d42eaf7acb4 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1022,6 +1022,17 @@ config ARM_ERRATA_764369 relevant cache maintenance functions and sets a specific bit in the diagnostic control register of the SCU.
+config ARM_ERRATA_764319 + bool "ARM errata: Read to DBGPRSR and DBGOSLSR may generate Undefined instruction" + depends on CPU_V7 + help + This option enables the workaround for the 764319 Cortex A-9 erratum. + CP14 read accesses to the DBGPRSR and DBGOSLSR registers generate an + unexpected Undefined Instruction exception when the DBGSWENABLE + external pin is set to 0, even when the CP14 accesses are performed + from a privileged mode. This work around catches the exception in a + way the kernel does not stop execution. + config ARM_ERRATA_775420 bool "ARM errata: A data cache maintenance operation which aborts, might lead to deadlock" depends on CPU_V7 diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index b1423fb130ea..054e9199f30d 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -941,6 +941,23 @@ static int hw_breakpoint_pending(unsigned long addr, unsigned int fsr, return ret; }
+#ifdef CONFIG_ARM_ERRATA_764319 +static int oslsr_fault; + +static int debug_oslsr_trap(struct pt_regs *regs, unsigned int instr) +{ + oslsr_fault = 1; + instruction_pointer(regs) += 4; + return 0; +} + +static struct undef_hook debug_oslsr_hook = { + .instr_mask = 0xffffffff, + .instr_val = 0xee115e91, + .fn = debug_oslsr_trap, +}; +#endif + /* * One-time initialisation. */ @@ -974,7 +991,16 @@ static bool core_has_os_save_restore(void) case ARM_DEBUG_ARCH_V7_1: return true; case ARM_DEBUG_ARCH_V7_ECP14: +#ifdef CONFIG_ARM_ERRATA_764319 + oslsr_fault = 0; + register_undef_hook(&debug_oslsr_hook); ARM_DBG_READ(c1, c1, 4, oslsr); + unregister_undef_hook(&debug_oslsr_hook); + if (oslsr_fault) + return false; +#else + ARM_DBG_READ(c1, c1, 4, oslsr); +#endif if (oslsr & ARM_OSLSR_OSLM0) return true; fallthrough;
From: James Morse james.morse@arm.com
stable inclusion from stable-v5.10.152 commit 51b96ecaedc0a12f6827f189a94f59012dde8208 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 44b3834b2eed595af07021b1c64e6f9bc396398b upstream.
Cortex-A57 and Cortex-A72 have an erratum where an interrupt that occurs between a pair of AES instructions in aarch32 mode may corrupt the ELR. The task will subsequently produce the wrong AES result.
The AES instructions are part of the cryptographic extensions, which are optional. User-space software will detect the support for these instructions from the hwcaps. If the platform doesn't support these instructions a software implementation should be used.
Remove the hwcap bits on affected parts to indicate user-space should not use the AES instructions.
Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: James Morse james.morse@arm.com Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon will@kernel.org [florian: removed arch/arm64/tools/cpucaps and fixup cpufeature.c] Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
conflicts: arch/arm64/Kconfig rch/arm64/include/asm/cpucaps.h rch/arm64/kernel/cpu_errata.c rch/arm64/kernel/cpufeature.c
Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- Documentation/arm64/silicon-errata.rst | 4 ++++ arch/arm64/Kconfig | 16 ++++++++++++++++ arch/arm64/include/asm/cpucaps.h | 1 + arch/arm64/kernel/cpu_errata.c | 16 ++++++++++++++++ arch/arm64/kernel/cpufeature.c | 13 ++++++++++++- 5 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index 427091fcfcf1..b72fbbfe3fcb 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -76,10 +76,14 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #853709 | N/A | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ddbdbc7ecdd9..3427154e218d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -492,6 +492,22 @@ config ARM64_ERRATUM_834220
If unsure, say Y.
+config ARM64_ERRATUM_1742098 + bool "Cortex-A57/A72: 1742098: ELR recorded incorrectly on interrupt taken between cryptographic instructions in a sequence" + depends on COMPAT + default y + help + This option removes the AES hwcap for aarch32 user-space to + workaround erratum 1742098 on Cortex-A57 and Cortex-A72. + + Affected parts may corrupt the AES state if an interrupt is + taken between a pair of AES instructions. These instructions + are only present if the cryptography extensions are present. + All software should have a fallback implementation for CPUs + that don't implement the cryptography extensions. + + If unsure, say Y. + config ARM64_ERRATUM_845719 bool "Cortex-A53: 845719: a load might read incorrect data" depends on AARCH32_EL0 diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index 3ffa6108c96d..2f4e8c06674e 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -72,6 +72,7 @@ #define ARM64_HAS_ECV 64 #define ARM64_HAS_EPAN 65 #define ARM64_SPECTRE_BHB 66 +#define ARM64_WORKAROUND_1742098 67
#define ARM64_NCAPS 80
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index cc30f82cf994..a975374e140e 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -427,6 +427,14 @@ static const struct midr_range erratum_1463225[] = { }; #endif
+#ifdef CONFIG_ARM64_ERRATUM_1742098 +static struct midr_range broken_aarch32_aes[] = { + MIDR_RANGE(MIDR_CORTEX_A57, 0, 1, 0xf, 0xf), + MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), + {}, +}; +#endif + const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { @@ -626,6 +634,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = { 1, 0), }, #endif +#ifdef CONFIG_ARM64_ERRATUM_1742098 + { + .desc = "ARM erratum 1742098", + .capability = ARM64_WORKAROUND_1742098, + CAP_MIDR_RANGE_LIST(broken_aarch32_aes), + .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, + }, +#endif #ifdef CONFIG_HISILICON_ERRATUM_HIP08_RU_PREFETCH { .desc = "HiSilicon HIP08 Cache Readunique Prefetch Disable", diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 8401e1e0fe04..d096942b57d6 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -76,6 +76,7 @@ #include <asm/cpufeature.h> #include <asm/cpu_ops.h> #include <asm/fpsimd.h> +#include <asm/hwcap.h> #include <asm/mmu_context.h> #include <asm/mte.h> #include <asm/processor.h> @@ -1771,6 +1772,14 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap) } #endif /* CONFIG_ARM64_MTE */
+static void elf_hwcap_fixup(void) +{ +#ifdef CONFIG_ARM64_ERRATUM_1742098 + if (cpus_have_const_cap(ARM64_WORKAROUND_1742098)) + a32_elf_hwcap2 &= ~COMPAT_HWCAP2_AES; +#endif /* ARM64_ERRATUM_1742098 */ +} + /* Internal helper functions to match cpu capability type */ static bool cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap) @@ -2837,8 +2846,10 @@ void __init setup_cpu_features(void) setup_system_capabilities(); setup_elf_hwcaps(arm64_elf_hwcaps);
- if (system_supports_32bit_el0()) + if (system_supports_32bit_el0()) { setup_elf_hwcaps(a32_elf_hwcaps); + elf_hwcap_fixup(); + }
if (system_uses_ttbr0_pan()) pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
From: Jisoo Jang jisoo.jang@yonsei.ac.kr
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6NCVX CVE: CVE-2023-1380
Reference: https://patchwork.kernel.org/project/linux-wireless/patch/20230309104457.226...
--------------------------------
Fix a slab-out-of-bounds read that occurs in kmemdup() called from brcmf_get_assoc_ies(). The bug could occur when assoc_info->req_len, data from a URB provided by a USB device, is bigger than the size of buffer which is defined as WL_EXTRA_BUF_MAX.
Add the size check for req_len/resp_len of assoc_info.
Found by a modified version of syzkaller.
[ 46.592467][ T7] ================================================================== [ 46.594687][ T7] BUG: KASAN: slab-out-of-bounds in kmemdup+0x3e/0x50 [ 46.596572][ T7] Read of size 3014656 at addr ffff888019442000 by task kworker/0:1/7 [ 46.598575][ T7] [ 46.599157][ T7] CPU: 0 PID: 7 Comm: kworker/0:1 Tainted: G O 5.14.0+ #145 [ 46.601333][ T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 46.604360][ T7] Workqueue: events brcmf_fweh_event_worker [ 46.605943][ T7] Call Trace: [ 46.606584][ T7] dump_stack_lvl+0x8e/0xd1 [ 46.607446][ T7] print_address_description.constprop.0.cold+0x93/0x334 [ 46.608610][ T7] ? kmemdup+0x3e/0x50 [ 46.609341][ T7] kasan_report.cold+0x79/0xd5 [ 46.610151][ T7] ? kmemdup+0x3e/0x50 [ 46.610796][ T7] kasan_check_range+0x14e/0x1b0 [ 46.611691][ T7] memcpy+0x20/0x60 [ 46.612323][ T7] kmemdup+0x3e/0x50 [ 46.612987][ T7] brcmf_get_assoc_ies+0x967/0xf60 [ 46.613904][ T7] ? brcmf_notify_vif_event+0x3d0/0x3d0 [ 46.614831][ T7] ? lock_chain_count+0x20/0x20 [ 46.615683][ T7] ? mark_lock.part.0+0xfc/0x2770 [ 46.616552][ T7] ? lock_chain_count+0x20/0x20 [ 46.617409][ T7] ? mark_lock.part.0+0xfc/0x2770 [ 46.618244][ T7] ? lock_chain_count+0x20/0x20 [ 46.619024][ T7] brcmf_bss_connect_done.constprop.0+0x241/0x2e0 [ 46.620019][ T7] ? brcmf_parse_configure_security.isra.0+0x2a0/0x2a0 [ 46.620818][ T7] ? __lock_acquire+0x181f/0x5790 [ 46.621462][ T7] brcmf_notify_connect_status+0x448/0x1950 [ 46.622134][ T7] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 46.622736][ T7] ? brcmf_cfg80211_join_ibss+0x7b0/0x7b0 [ 46.623390][ T7] ? find_held_lock+0x2d/0x110 [ 46.623962][ T7] ? brcmf_fweh_event_worker+0x19f/0xc60 [ 46.624603][ T7] ? mark_held_locks+0x9f/0xe0 [ 46.625145][ T7] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 [ 46.625871][ T7] ? brcmf_cfg80211_join_ibss+0x7b0/0x7b0 [ 46.626545][ T7] brcmf_fweh_call_event_handler.isra.0+0x90/0x100 [ 46.627338][ T7] brcmf_fweh_event_worker+0x557/0xc60 [ 46.627962][ T7] ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100 [ 46.628736][ T7] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 46.629396][ T7] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 46.629970][ T7] ? lockdep_hardirqs_on_prepare+0x273/0x3e0 [ 46.630649][ T7] process_one_work+0x92b/0x1460 [ 46.631205][ T7] ? pwq_dec_nr_in_flight+0x330/0x330 [ 46.631821][ T7] ? rwlock_bug.part.0+0x90/0x90 [ 46.632347][ T7] worker_thread+0x95/0xe00 [ 46.632832][ T7] ? __kthread_parkme+0x115/0x1e0 [ 46.633393][ T7] ? process_one_work+0x1460/0x1460 [ 46.633957][ T7] kthread+0x3a1/0x480 [ 46.634369][ T7] ? set_kthread_struct+0x120/0x120 [ 46.634933][ T7] ret_from_fork+0x1f/0x30 [ 46.635431][ T7] [ 46.635687][ T7] Allocated by task 7: [ 46.636151][ T7] kasan_save_stack+0x1b/0x40 [ 46.636628][ T7] __kasan_kmalloc+0x7c/0x90 [ 46.637108][ T7] kmem_cache_alloc_trace+0x19e/0x330 [ 46.637696][ T7] brcmf_cfg80211_attach+0x4a0/0x4040 [ 46.638275][ T7] brcmf_attach+0x389/0xd40 [ 46.638739][ T7] brcmf_usb_probe+0x12de/0x1690 [ 46.639279][ T7] usb_probe_interface+0x2aa/0x760 [ 46.639820][ T7] really_probe+0x205/0xb70 [ 46.640342][ T7] __driver_probe_device+0x311/0x4b0 [ 46.640876][ T7] driver_probe_device+0x4e/0x150 [ 46.641445][ T7] __device_attach_driver+0x1cc/0x2a0 [ 46.642000][ T7] bus_for_each_drv+0x156/0x1d0 [ 46.642543][ T7] __device_attach+0x23f/0x3a0 [ 46.643065][ T7] bus_probe_device+0x1da/0x290 [ 46.643644][ T7] device_add+0xb7b/0x1eb0 [ 46.644130][ T7] usb_set_configuration+0xf59/0x16f0 [ 46.644720][ T7] usb_generic_driver_probe+0x82/0xa0 [ 46.645295][ T7] usb_probe_device+0xbb/0x250 [ 46.645786][ T7] really_probe+0x205/0xb70 [ 46.646258][ T7] __driver_probe_device+0x311/0x4b0 [ 46.646804][ T7] driver_probe_device+0x4e/0x150 [ 46.647387][ T7] __device_attach_driver+0x1cc/0x2a0 [ 46.647926][ T7] bus_for_each_drv+0x156/0x1d0 [ 46.648454][ T7] __device_attach+0x23f/0x3a0 [ 46.648939][ T7] bus_probe_device+0x1da/0x290 [ 46.649478][ T7] device_add+0xb7b/0x1eb0 [ 46.649936][ T7] usb_new_device.cold+0x49c/0x1029 [ 46.650526][ T7] hub_event+0x1c98/0x3950 [ 46.650975][ T7] process_one_work+0x92b/0x1460 [ 46.651535][ T7] worker_thread+0x95/0xe00 [ 46.651991][ T7] kthread+0x3a1/0x480 [ 46.652413][ T7] ret_from_fork+0x1f/0x30 [ 46.652885][ T7] [ 46.653131][ T7] The buggy address belongs to the object at ffff888019442000 [ 46.653131][ T7] which belongs to the cache kmalloc-2k of size 2048 [ 46.654669][ T7] The buggy address is located 0 bytes inside of [ 46.654669][ T7] 2048-byte region [ffff888019442000, ffff888019442800) [ 46.656137][ T7] The buggy address belongs to the page: [ 46.656720][ T7] page:ffffea0000651000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x19440 [ 46.657792][ T7] head:ffffea0000651000 order:3 compound_mapcount:0 compound_pincount:0 [ 46.658673][ T7] flags: 0x100000000010200(slab|head|node=0|zone=1) [ 46.659422][ T7] raw: 0100000000010200 0000000000000000 dead000000000122 ffff888100042000 [ 46.660363][ T7] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 46.661236][ T7] page dumped because: kasan: bad access detected [ 46.661956][ T7] page_owner tracks the page as allocated [ 46.662588][ T7] page last allocated via order 3, migratetype Unmovable, gfp_mask 0x52a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 7, ts 31136961085, free_ts 0 [ 46.664271][ T7] prep_new_page+0x1aa/0x240 [ 46.664763][ T7] get_page_from_freelist+0x159a/0x27c0 [ 46.665340][ T7] __alloc_pages+0x2da/0x6a0 [ 46.665847][ T7] alloc_pages+0xec/0x1e0 [ 46.666308][ T7] allocate_slab+0x380/0x4e0 [ 46.666770][ T7] ___slab_alloc+0x5bc/0x940 [ 46.667264][ T7] __slab_alloc+0x6d/0x80 [ 46.667712][ T7] kmem_cache_alloc_trace+0x30a/0x330 [ 46.668299][ T7] brcmf_usbdev_qinit.constprop.0+0x50/0x470 [ 46.668885][ T7] brcmf_usb_probe+0xc97/0x1690 [ 46.669438][ T7] usb_probe_interface+0x2aa/0x760 [ 46.669988][ T7] really_probe+0x205/0xb70 [ 46.670487][ T7] __driver_probe_device+0x311/0x4b0 [ 46.671031][ T7] driver_probe_device+0x4e/0x150 [ 46.671604][ T7] __device_attach_driver+0x1cc/0x2a0 [ 46.672192][ T7] bus_for_each_drv+0x156/0x1d0 [ 46.672739][ T7] page_owner free stack trace missing [ 46.673335][ T7] [ 46.673620][ T7] Memory state around the buggy address: [ 46.674213][ T7] ffff888019442700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 46.675083][ T7] ffff888019442780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 46.675994][ T7] >ffff888019442800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 46.676875][ T7] ^ [ 46.677323][ T7] ffff888019442880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 46.678190][ T7] ffff888019442900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 46.679052][ T7] ================================================================== [ 46.679945][ T7] Disabling lock debugging due to kernel taint [ 46.680725][ T7] Kernel panic - not syncing:
Reviewed-by: Arend van Spriel arend.vanspriel@broadcom.com Signed-off-by: Jisoo Jang jisoo.jang@yonsei.ac.kr Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20230309104457.22628-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index c2b6e5c966d0..300c63d75df2 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -5831,6 +5831,11 @@ static s32 brcmf_get_assoc_ies(struct brcmf_cfg80211_info *cfg, (struct brcmf_cfg80211_assoc_ielen_le *)cfg->extra_buf; req_len = le32_to_cpu(assoc_info->req_len); resp_len = le32_to_cpu(assoc_info->resp_len); + if (req_len > WL_EXTRA_BUF_MAX || resp_len > WL_EXTRA_BUF_MAX) { + bphy_err(drvr, "invalid lengths in assoc info: req %u resp %u\n", + req_len, resp_len); + return -EINVAL; + } if (req_len) { err = brcmf_fil_iovar_data_get(ifp, "assoc_req_ies", cfg->extra_buf,
From: Hangyu Hua hbh25y@gmail.com
mainline inclusion from mainline-v6.3-rc2 commit 49c47cc21b5b7a3d8deb18fc57b0aa2ab1286962 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6NIUR CVE: CVE-2023-28466
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
ctx->crypto_send.info is not protected by lock_sock in do_tls_getsockopt_conf(). A race condition between do_tls_getsockopt_conf() and error paths of do_tls_setsockopt_conf() may lead to a use-after-free or null-deref.
More discussion: https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Hangyu Hua hbh25y@gmail.com Link: https://lore.kernel.org/r/20230228023344.9623-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Liu Jian liujian56@huawei.com
Conflicts: net/tls/tls_main.c Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/tls/tls_main.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index e537085b184f..54863e68f304 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -386,13 +386,11 @@ static int do_tls_getsockopt_conf(struct sock *sk, char __user *optval, rc = -EINVAL; goto out; } - lock_sock(sk); memcpy(crypto_info_aes_gcm_128->iv, cctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, TLS_CIPHER_AES_GCM_128_IV_SIZE); memcpy(crypto_info_aes_gcm_128->rec_seq, cctx->rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); - release_sock(sk); if (copy_to_user(optval, crypto_info_aes_gcm_128, sizeof(*crypto_info_aes_gcm_128))) @@ -410,13 +408,11 @@ static int do_tls_getsockopt_conf(struct sock *sk, char __user *optval, rc = -EINVAL; goto out; } - lock_sock(sk); memcpy(crypto_info_aes_gcm_256->iv, cctx->iv + TLS_CIPHER_AES_GCM_256_SALT_SIZE, TLS_CIPHER_AES_GCM_256_IV_SIZE); memcpy(crypto_info_aes_gcm_256->rec_seq, cctx->rec_seq, TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE); - release_sock(sk); if (copy_to_user(optval, crypto_info_aes_gcm_256, sizeof(*crypto_info_aes_gcm_256))) @@ -436,6 +432,8 @@ static int do_tls_getsockopt(struct sock *sk, int optname, { int rc = 0;
+ lock_sock(sk); + switch (optname) { case TLS_TX: case TLS_RX: @@ -446,6 +444,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname, rc = -ENOPROTOOPT; break; } + + release_sock(sk); + return rc; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.158 commit 575a6266f63dbb3b8eb1da03671451f0d81b8034 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6NCP2 CVE: CVE-2023-28327
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
[ Upstream commit b3abe42e94900bdd045c472f9c9be620ba5ce553 ]
Wei Chen reported a NULL deref in sk_user_ns() [0][1], and Paolo diagnosed the root cause: in unix_diag_get_exact(), the newly allocated skb does not have sk. [2]
We must get the user_ns from the NETLINK_CB(in_skb).sk and pass it to sk_diag_fill().
[0]: BUG: kernel NULL pointer dereference, address: 0000000000000270 PGD 12bbce067 P4D 12bbce067 PUD 12bc40067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 27942 Comm: syz-executor.0 Not tainted 6.1.0-rc5-next-20221118 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_user_ns include/net/sock.h:920 [inline] RIP: 0010:sk_diag_dump_uid net/unix/diag.c:119 [inline] RIP: 0010:sk_diag_fill+0x77d/0x890 net/unix/diag.c:170 Code: 89 ef e8 66 d4 2d fd c7 44 24 40 00 00 00 00 49 8d 7c 24 18 e8 54 d7 2d fd 49 8b 5c 24 18 48 8d bb 70 02 00 00 e8 43 d7 2d fd <48> 8b 9b 70 02 00 00 48 8d 7b 10 e8 33 d7 2d fd 48 8b 5b 10 48 8d RSP: 0018:ffffc90000d67968 EFLAGS: 00010246 RAX: ffff88812badaa48 RBX: 0000000000000000 RCX: ffffffff840d481d RDX: 0000000000000465 RSI: 0000000000000000 RDI: 0000000000000270 RBP: ffffc90000d679a8 R08: 0000000000000277 R09: 0000000000000000 R10: 0001ffffffffffff R11: 0001c90000d679a8 R12: ffff88812ac03800 R13: ffff88812c87c400 R14: ffff88812ae42210 R15: ffff888103026940 FS: 00007f08b4e6f700(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000270 CR3: 000000012c58b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> unix_diag_get_exact net/unix/diag.c:285 [inline] unix_diag_handler_dump+0x3f9/0x500 net/unix/diag.c:317 __sock_diag_cmd net/core/sock_diag.c:235 [inline] sock_diag_rcv_msg+0x237/0x250 net/core/sock_diag.c:266 netlink_rcv_skb+0x13e/0x250 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x24/0x40 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2476 ___sys_sendmsg net/socket.c:2530 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2559 __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x4697f9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f08b4e6ec48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000077bf80 RCX: 00000000004697f9 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00000000004d29e9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000077bf80 R13: 0000000000000000 R14: 000000000077bf80 R15: 00007ffdb36bc6c0 </TASK> Modules linked in: CR2: 0000000000000270
[1]: https://lore.kernel.org/netdev/CAO4mrfdvyjFpokhNsiwZiP-wpdSD0AStcJwfKcKQdAAL... [2]: https://lore.kernel.org/netdev/e04315e7c90d9a75613f3993c2baf2d344eef7eb.came...
Fixes: cae9910e7344 ("net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.") Reported-by: syzbot syzkaller@googlegroups.com Reported-by: Wei Chen harperchen1110@gmail.com Diagnosed-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/unix/diag.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index 9ff64f9df1f3..951b33fa8f5c 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -113,14 +113,16 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql); }
-static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb) +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb, + struct user_namespace *user_ns) { - uid_t uid = from_kuid_munged(sk_user_ns(nlskb->sk), sock_i_uid(sk)); + uid_t uid = from_kuid_munged(user_ns, sock_i_uid(sk)); return nla_put(nlskb, UNIX_DIAG_UID, sizeof(uid_t), &uid); }
static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags, int sk_ino) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags, int sk_ino) { struct nlmsghdr *nlh; struct unix_diag_msg *rep; @@ -166,7 +168,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r goto out_nlmsg_trim;
if ((req->udiag_show & UDIAG_SHOW_UID) && - sk_diag_dump_uid(sk, skb)) + sk_diag_dump_uid(sk, skb, user_ns)) goto out_nlmsg_trim;
nlmsg_end(skb, nlh); @@ -178,7 +180,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r }
static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req, - u32 portid, u32 seq, u32 flags) + struct user_namespace *user_ns, + u32 portid, u32 seq, u32 flags) { int sk_ino;
@@ -189,7 +192,7 @@ static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_r if (!sk_ino) return 0;
- return sk_diag_fill(sk, skb, req, portid, seq, flags, sk_ino); + return sk_diag_fill(sk, skb, req, user_ns, portid, seq, flags, sk_ino); }
static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) @@ -217,7 +220,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) goto next; if (!(req->udiag_states & (1 << sk->sk_state))) goto next; - if (sk_diag_dump(sk, skb, req, + if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk), NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI) < 0) @@ -285,7 +288,8 @@ static int unix_diag_get_exact(struct sk_buff *in_skb, if (!rep) goto out;
- err = sk_diag_fill(sk, rep, req, NETLINK_CB(in_skb).portid, + err = sk_diag_fill(sk, rep, req, sk_user_ns(NETLINK_CB(in_skb).sk), + NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0, req->udiag_ino); if (err < 0) { nlmsg_free(rep);
From: Ye Bin yebin10@huawei.com
mainline inclusion from mainline-v6.3-rc2 commit eee00237fa5ec8f704f7323b54e48cc34e2d9168 category: bugfix bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now, 'es->s_state' maybe covered by recover journal. And journal errno maybe not recorded in journal sb as IO error. ext4_update_super() only update error information when 'sbi->s_add_error_count' large than zero. Then 'EXT4_ERROR_FS' flag maybe lost. To solve above issue just recover 'es->s_state' error flag after journal replay like error info.
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ext4/super.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index da8bd8031119..96792a06fd8d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5539,6 +5539,7 @@ static int ext4_load_journal(struct super_block *sb, err = jbd2_journal_wipe(journal, !really_read_only); if (!err) { char *save = kmalloc(EXT4_S_ERR_LEN, GFP_KERNEL); + if (save) memcpy(save, ((char *) es) + EXT4_S_ERR_START, EXT4_S_ERR_LEN); @@ -5547,6 +5548,14 @@ static int ext4_load_journal(struct super_block *sb, memcpy(((char *) es) + EXT4_S_ERR_START, save, EXT4_S_ERR_LEN); kfree(save); + es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state & + EXT4_ERROR_FS); + /* Write out restored error information to the superblock */ + if (!bdev_read_only(sb->s_bdev)) { + int err2; + err2 = ext4_commit_super(sb); + err = err ? : err2; + } }
if (err) {
From: Ye Bin yebin10@huawei.com
mainline inclusion from mainline-v6.3-rc2 commit f57886ca1606ba74cc4ec4eb5cbf073934ffa559 category: bugfix bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now, jounral error number maybe cleared even though ext4_commit_super() failed. This may lead to error flag miss, then fsck will miss to check file system deeply.
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ext4/super.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 96792a06fd8d..ffac35e66bd9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5778,11 +5778,13 @@ static int ext4_clear_journal_err(struct super_block *sb, errstr = ext4_decode_error(sb, j_errno, nbuf); ext4_warning(sb, "Filesystem error recorded " "from previous mount: %s", errstr); - ext4_warning(sb, "Marking fs in need of filesystem check.");
EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_commit_super(sb); + j_errno = ext4_commit_super(sb); + if (j_errno) + return j_errno; + ext4_warning(sb, "Marked fs in need of filesystem check.");
jbd2_journal_clear_err(journal); jbd2_journal_update_sb_errno(journal);
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.16-rc1 commit f941eadd8d6d4ee2f8c9aeab8e1da5e647533a7d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
---------------------------
__bpf_prog_run() can run from non IRQ contexts, meaning it could be re entered if interrupted.
This calls for the irq safe variant of u64_stats_update_{begin|end}, or risk a deadlock.
This patch is a nop on 64bit arches, fortunately.
syzbot report:
WARNING: inconsistent lock state 5.12.0-rc3-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. udevd/4013 [HC0[0]:SC0[0]:HE1:SE1] takes: ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: sk_filter include/linux/filter.h:867 [inline] ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: do_one_broadcast net/netlink/af_netlink.c:1468 [inline] ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520 {IN-SOFTIRQ-W} state was registered at: lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510 lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483 do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline] do_write_seqcount_begin include/linux/seqlock.h:545 [inline] u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline] bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline] bpf_prog_run_clear_cb+0x1bc/0x270 include/linux/filter.h:755 run_filter+0xa0/0x17c net/packet/af_packet.c:2031 packet_rcv+0xc0/0x3e0 net/packet/af_packet.c:2104 dev_queue_xmit_nit+0x2bc/0x39c net/core/dev.c:2387 xmit_one net/core/dev.c:3588 [inline] dev_hard_start_xmit+0x94/0x518 net/core/dev.c:3609 sch_direct_xmit+0x11c/0x1f0 net/sched/sch_generic.c:313 qdisc_restart net/sched/sch_generic.c:376 [inline] __qdisc_run+0x194/0x7f8 net/sched/sch_generic.c:384 qdisc_run include/net/pkt_sched.h:136 [inline] qdisc_run include/net/pkt_sched.h:128 [inline] __dev_xmit_skb net/core/dev.c:3795 [inline] __dev_queue_xmit+0x65c/0xf84 net/core/dev.c:4150 dev_queue_xmit+0x14/0x18 net/core/dev.c:4215 neigh_resolve_output net/core/neighbour.c:1491 [inline] neigh_resolve_output+0x170/0x228 net/core/neighbour.c:1471 neigh_output include/net/neighbour.h:510 [inline] ip6_finish_output2+0x2e4/0x9fc net/ipv6/ip6_output.c:117 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline] __ip6_finish_output+0x164/0x3f8 net/ipv6/ip6_output.c:161 ip6_finish_output+0x2c/0xb0 net/ipv6/ip6_output.c:192 NF_HOOK_COND include/linux/netfilter.h:290 [inline] ip6_output+0x74/0x294 net/ipv6/ip6_output.c:215 dst_output include/net/dst.h:448 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] mld_sendpack+0x2a8/0x7e4 net/ipv6/mcast.c:1679 mld_send_cr net/ipv6/mcast.c:1975 [inline] mld_ifc_timer_expire+0x1e8/0x494 net/ipv6/mcast.c:2474 call_timer_fn+0xd0/0x570 kernel/time/timer.c:1431 expire_timers kernel/time/timer.c:1476 [inline] __run_timers kernel/time/timer.c:1745 [inline] run_timer_softirq+0x2e4/0x384 kernel/time/timer.c:1758 __do_softirq+0x204/0x7ac kernel/softirq.c:345 do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline] invoke_softirq kernel/softirq.c:228 [inline] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422 irq_exit+0x10/0x3c kernel/softirq.c:446 __handle_domain_irq+0xb4/0x120 kernel/irq/irqdesc.c:692 handle_domain_irq include/linux/irqdesc.h:176 [inline] gic_handle_irq+0x84/0xac drivers/irqchip/irq-gic.c:370 __irq_svc+0x5c/0x94 arch/arm/kernel/entry-armv.S:205 debug_smp_processor_id+0x0/0x24 lib/smp_processor_id.c:53 rcu_read_lock_held_common kernel/rcu/update.c:108 [inline] rcu_read_lock_sched_held+0x24/0x7c kernel/rcu/update.c:123 trace_lock_acquire+0x24c/0x278 include/trace/events/lock.h:13 lock_acquire+0x3c/0x74 kernel/locking/lockdep.c:5481 rcu_lock_acquire include/linux/rcupdate.h:267 [inline] rcu_read_lock include/linux/rcupdate.h:656 [inline] avc_has_perm_noaudit+0x6c/0x260 security/selinux/avc.c:1150 selinux_inode_permission+0x140/0x220 security/selinux/hooks.c:3141 security_inode_permission+0x44/0x60 security/security.c:1268 inode_permission.part.0+0x5c/0x13c fs/namei.c:521 inode_permission fs/namei.c:494 [inline] may_lookup fs/namei.c:1652 [inline] link_path_walk.part.0+0xd4/0x38c fs/namei.c:2208 link_path_walk fs/namei.c:2189 [inline] path_lookupat+0x3c/0x1b8 fs/namei.c:2419 filename_lookup+0xa8/0x1a4 fs/namei.c:2453 user_path_at_empty+0x74/0x90 fs/namei.c:2733 do_readlinkat+0x5c/0x12c fs/stat.c:417 __do_sys_readlink fs/stat.c:450 [inline] sys_readlink+0x24/0x28 fs/stat.c:447 ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64 0x7eaa4974 irq event stamp: 298277 hardirqs last enabled at (298277): [<802000d0>] no_work_pending+0x4/0x34 hardirqs last disabled at (298276): [<8020c9b8>] do_work_pending+0x9c/0x648 arch/arm/kernel/signal.c:676 softirqs last enabled at (298216): [<8020167c>] __do_softirq+0x584/0x7ac kernel/softirq.c:372 softirqs last disabled at (298201): [<8024dff4>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline] softirqs last disabled at (298201): [<8024dff4>] invoke_softirq kernel/softirq.c:228 [inline] softirqs last disabled at (298201): [<8024dff4>] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&(&pstats->syncp)->seq); <Interrupt> lock(&(&pstats->syncp)->seq);
*** DEADLOCK ***
1 lock held by udevd/4013: #0: 82b09c5c (rcu_read_lock){....}-{1:2}, at: sk_filter_trim_cap+0x54/0x434 net/core/filter.c:139
stack backtrace: CPU: 1 PID: 4013 Comm: udevd Not tainted 5.12.0-rc3-syzkaller #0 Hardware name: ARM-Versatile Express Backtrace: [<81802550>] (dump_backtrace) from [<818027c4>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252) r7:00000080 r6:600d0093 r5:00000000 r4:82b58344 [<818027ac>] (show_stack) from [<81809e98>] (__dump_stack lib/dump_stack.c:79 [inline]) [<818027ac>] (show_stack) from [<81809e98>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120) [<81809de0>] (dump_stack) from [<81804a00>] (print_usage_bug.part.0+0x228/0x230 kernel/locking/lockdep.c:3806) r7:86bcb768 r6:81a0326c r5:830f96a8 r4:86bcb0c0 [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (print_usage_bug kernel/locking/lockdep.c:3776 [inline]) [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (valid_state kernel/locking/lockdep.c:3818 [inline]) [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock_irq kernel/locking/lockdep.c:4021 [inline]) [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock.part.0+0xc34/0x136c kernel/locking/lockdep.c:4478) r10:83278fe8 r9:82c6d748 r8:00000000 r7:82c6d2d4 r6:00000004 r5:86bcb768 r4:00000006 [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_lock kernel/locking/lockdep.c:4442 [inline]) [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_usage kernel/locking/lockdep.c:4391 [inline]) [<802ba584>] (mark_lock.part.0) from [<802bc644>] (__lock_acquire+0x9bc/0x3318 kernel/locking/lockdep.c:4854) r10:86bcb768 r9:86bcb0c0 r8:00000001 r7:00040000 r6:0000075a r5:830f96a8 r4:00000000 [<802bbc88>] (__lock_acquire) from [<802bfb90>] (lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510) r10:00000000 r9:600d0013 r8:00000000 r7:00000000 r6:828a2680 r5:828a2680 r4:861e5bc8 [<802bfaa0>] (lock_acquire.part.0) from [<802bff28>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483) r10:8146137c r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:00000000 r4:ff7c9dec [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline]) [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin include/linux/seqlock.h:545 [inline]) [<802bfebc>] (lock_acquire) from [<81381eb4>] (u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline]) [<802bfebc>] (lock_acquire) from [<81381eb4>] (__bpf_prog_run_save_cb include/linux/filter.h:727 [inline]) [<802bfebc>] (lock_acquire) from [<81381eb4>] (bpf_prog_run_save_cb include/linux/filter.h:741 [inline]) [<802bfebc>] (lock_acquire) from [<81381eb4>] (sk_filter_trim_cap+0x26c/0x434 net/core/filter.c:149) r10:a4095dd0 r9:ff7c9dd0 r8:e44be000 r7:8146137c r6:00000001 r5:8611ba80 r4:00000000 [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (sk_filter include/linux/filter.h:867 [inline]) [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (do_one_broadcast net/netlink/af_netlink.c:1468 [inline]) [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520) r10:00000001 r9:833d6b1c r8:00000000 r7:8572f864 r6:8611ba80 r5:8698d800 r4:8572f800 [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_broadcast net/netlink/af_netlink.c:1544 [inline]) [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_sendmsg+0x3d0/0x478 net/netlink/af_netlink.c:1925) r10:00000000 r9:00000002 r8:8698d800 r7:000000b7 r6:8611b900 r5:861e5f50 r4:86aa3000 [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg_nosec net/socket.c:654 [inline]) [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg+0x3c/0x4c net/socket.c:674) r10:00000000 r9:861e5dd4 r8:00000000 r7:86570000 r6:00000000 r5:86570000 r4:861e5f50 [<81321f18>] (sock_sendmsg) from [<813234d0>] (____sys_sendmsg+0x230/0x29c net/socket.c:2350) r5:00000040 r4:861e5f50 [<813232a0>] (____sys_sendmsg) from [<8132549c>] (___sys_sendmsg+0xac/0xe4 net/socket.c:2404) r10:00000128 r9:861e4000 r8:00000000 r7:00000000 r6:86570000 r5:861e5f50 r4:00000000 [<813253f0>] (___sys_sendmsg) from [<81325684>] (__sys_sendmsg net/socket.c:2433 [inline]) [<813253f0>] (___sys_sendmsg) from [<81325684>] (__do_sys_sendmsg net/socket.c:2442 [inline]) [<813253f0>] (___sys_sendmsg) from [<81325684>] (sys_sendmsg+0x58/0xa0 net/socket.c:2440) r8:80200224 r7:00000128 r6:00000000 r5:7eaa541c r4:86570000 [<8132562c>] (sys_sendmsg) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64) Exception stack(0x861e5fa8 to 0x861e5ff0) 5fa0: 00000000 00000000 0000000c 7eaa541c 00000000 00000000 5fc0: 00000000 00000000 76fbf840 00000128 00000000 0000008f 7eaa541c 000563f8 5fe0: 00056110 7eaa53e0 00036cec 76c9bf44 r6:76fbf840 r5:00000000 r4:00000000
Fixes: 492ecee892c2 ("bpf: enable program stats") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20211026214133.3114279-2-eric.dumazet@gmail.com Conflicts: include/linux/filter.h Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/linux/filter.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h index 45ba2a61f9e5..0418d88aa0f3 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -578,12 +578,13 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ struct bpf_prog_stats *__stats; \ u64 __start = sched_clock(); \ + unsigned long flags; \ __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ __stats = this_cpu_ptr(prog->aux->stats); \ - u64_stats_update_begin(&__stats->syncp); \ + flags = u64_stats_update_begin_irqsave(&__stats->syncp);\ __stats->cnt++; \ __stats->nsecs += sched_clock() - __start; \ - u64_stats_update_end(&__stats->syncp); \ + u64_stats_update_end_irqrestore(&__stats->syncp, flags);\ } else { \ __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ } \
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.16-rc1 commit d979617aa84d96acca44c2f5778892b4565e322f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6O293
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
It seems update_prog_stats() suffers from same issue fixed in the prior patch:
As it can run while interrupts are enabled, it could be re-entered and the u64_stats syncp could be mangled.
Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com Conflicts: kernel/bpf/trampoline.c Signed-off-by: Pu Lehui pulehui@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/bpf/trampoline.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 87becf77cc75..02637c6cedde 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -526,11 +526,13 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start) * Hence check that 'start' is not zero. */ start) { + unsigned long flags; + stats = this_cpu_ptr(prog->aux->stats); - u64_stats_update_begin(&stats->syncp); + flags = u64_stats_update_begin_irqsave(&stats->syncp); stats->cnt++; stats->nsecs += sched_clock() - start; - u64_stats_update_end(&stats->syncp); + u64_stats_update_end_irqrestore(&stats->syncp, flags); } migrate_enable(); rcu_read_unlock();
From: Edward Lo edward.lo@ambergroup.io
maillist inclusion category: bugfix bugzilla: 188564, https://gitee.com/src-openeuler/kernel/issues/I6OD4P CVE: CVE-2022-48424
Reference: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4...
----------------------------------------
Although the attribute name length is checked before comparing it to some common names (e.g., $I30), the offset isn't. This adds a sanity check for the attribute name offset, guarantee the validity and prevent possible out-of-bound memory accesses.
[ 191.720056] BUG: unable to handle page fault for address: ffffebde00000008 [ 191.721060] #PF: supervisor read access in kernel mode [ 191.721586] #PF: error_code(0x0000) - not-present page [ 191.722079] PGD 0 P4D 0 [ 191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28 [ 191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 191.724832] RIP: 0010:kfree+0x56/0x3b0 [ 191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.730645] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.731328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 [ 191.732568] Call Trace: [ 191.733231] <TASK> [ 191.733860] kvfree+0x2c/0x40 [ 191.734632] ni_clear+0x180/0x290 [ 191.735085] ntfs_evict_inode+0x45/0x70 [ 191.735495] evict+0x199/0x280 [ 191.735996] iput.part.0+0x286/0x320 [ 191.736438] iput+0x32/0x50 [ 191.736811] iget_failed+0x23/0x30 [ 191.737270] ntfs_iget5+0x337/0x1890 [ 191.737629] ? ntfs_clear_mft_tail+0x20/0x260 [ 191.738201] ? ntfs_get_block_bmap+0x70/0x70 [ 191.738482] ? ntfs_objid_init+0xf6/0x140 [ 191.738779] ? ntfs_reparse_init+0x140/0x140 [ 191.739266] ntfs_fill_super+0x121b/0x1b50 [ 191.739623] ? put_ntfs+0x1d0/0x1d0 [ 191.739984] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 191.740466] ? put_ntfs+0x1d0/0x1d0 [ 191.740787] ? sb_set_blocksize+0x6a/0x80 [ 191.741272] get_tree_bdev+0x232/0x370 [ 191.741829] ? put_ntfs+0x1d0/0x1d0 [ 191.742669] ntfs_fs_get_tree+0x15/0x20 [ 191.743132] vfs_get_tree+0x4c/0x130 [ 191.743457] path_mount+0x654/0xfe0 [ 191.743938] ? putname+0x80/0xa0 [ 191.744271] ? finish_automount+0x2e0/0x2e0 [ 191.744582] ? putname+0x80/0xa0 [ 191.745053] ? kmem_cache_free+0x1c4/0x440 [ 191.745403] ? putname+0x80/0xa0 [ 191.745616] do_mount+0xd6/0xf0 [ 191.745887] ? path_mount+0xfe0/0xfe0 [ 191.746287] ? __kasan_check_write+0x14/0x20 [ 191.746582] __x64_sys_mount+0xca/0x110 [ 191.746850] do_syscall_64+0x3b/0x90 [ 191.747122] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 191.747517] RIP: 0033:0x7f351fee948a [ 191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a [ 191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0 [ 191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020 [ 191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0 [ 191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff [ 191.752519] </TASK> [ 191.752782] Modules linked in: [ 191.753785] CR2: ffffebde00000008 [ 191.754937] ---[ end trace 0000000000000000 ]--- [ 191.755429] RIP: 0010:kfree+0x56/0x3b0 [ 191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.759317] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.759711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/inode.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index eecf78e5d754..5a62bd642217 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -129,6 +129,9 @@ static struct inode *ntfs_read_mft(struct inode *inode, rsize = attr->non_res ? 0 : le32_to_cpu(attr->res.data_size); asize = le32_to_cpu(attr->size);
+ if (le16_to_cpu(attr->name_off) + attr->name_len > asize) + goto out; + switch (attr->type) { case ATTR_STD: if (attr->non_res ||
From: Edward Lo edward.lo@ambergroup.io
maillist inclusion category: bugfix bugzilla: 188564, https://gitee.com/src-openeuler/kernel/issues/I6OD5R CVE: CVE-2022-48425
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/f...
----------------------------------------
Log load and replay is part of the metadata handle flow during mount operation. The $MFT record will be loaded and used while replaying logs. However, a malformed $MFT record, say, has RECORD_FLAG_DIR flag set and contains an ATTR_ROOT attribute will misguide kernel to treat it as a directory, and try to free the allocated resources when the corresponding inode is freed, which will cause an invalid kfree because the memory hasn't actually been allocated.
[ 101.368647] BUG: KASAN: invalid-free in kvfree+0x2c/0x40 [ 101.369457] [ 101.369986] CPU: 0 PID: 198 Comm: mount Not tainted 6.0.0-rc7+ #5 [ 101.370529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 101.371362] Call Trace: [ 101.371795] <TASK> [ 101.372157] dump_stack_lvl+0x49/0x63 [ 101.372658] print_report.cold+0xf5/0x689 [ 101.373022] ? ni_write_inode+0x754/0xd90 [ 101.373378] ? kvfree+0x2c/0x40 [ 101.373698] kasan_report_invalid_free+0x77/0xf0 [ 101.374058] ? kvfree+0x2c/0x40 [ 101.374352] ? kvfree+0x2c/0x40 [ 101.374668] __kasan_slab_free+0x189/0x1b0 [ 101.374992] ? kvfree+0x2c/0x40 [ 101.375271] kfree+0x168/0x3b0 [ 101.375717] kvfree+0x2c/0x40 [ 101.376002] indx_clear+0x26/0x60 [ 101.376316] ni_clear+0xc5/0x290 [ 101.376661] ntfs_evict_inode+0x45/0x70 [ 101.377001] evict+0x199/0x280 [ 101.377432] iput.part.0+0x286/0x320 [ 101.377819] iput+0x32/0x50 [ 101.378166] ntfs_loadlog_and_replay+0x143/0x320 [ 101.378656] ? ntfs_bio_fill_1+0x510/0x510 [ 101.378968] ? iput.part.0+0x286/0x320 [ 101.379367] ntfs_fill_super+0xecb/0x1ba0 [ 101.379729] ? put_ntfs+0x1d0/0x1d0 [ 101.380046] ? vsprintf+0x20/0x20 [ 101.380542] ? mutex_unlock+0x81/0xd0 [ 101.380914] ? set_blocksize+0x95/0x150 [ 101.381597] get_tree_bdev+0x232/0x370 [ 101.382254] ? put_ntfs+0x1d0/0x1d0 [ 101.382699] ntfs_fs_get_tree+0x15/0x20 [ 101.383094] vfs_get_tree+0x4c/0x130 [ 101.383675] path_mount+0x654/0xfe0 [ 101.384203] ? putname+0x80/0xa0 [ 101.384540] ? finish_automount+0x2e0/0x2e0 [ 101.384943] ? putname+0x80/0xa0 [ 101.385362] ? kmem_cache_free+0x1c4/0x440 [ 101.385968] ? putname+0x80/0xa0 [ 101.386666] do_mount+0xd6/0xf0 [ 101.387228] ? path_mount+0xfe0/0xfe0 [ 101.387585] ? __kasan_check_write+0x14/0x20 [ 101.387979] __x64_sys_mount+0xca/0x110 [ 101.388436] do_syscall_64+0x3b/0x90 [ 101.388757] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 101.389289] RIP: 0033:0x7fa0f70e948a [ 101.390048] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 101.391297] RSP: 002b:00007ffc24fdecc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 101.391988] RAX: ffffffffffffffda RBX: 000055932c183060 RCX: 00007fa0f70e948a [ 101.392494] RDX: 000055932c183260 RSI: 000055932c1832e0 RDI: 000055932c18bce0 [ 101.393053] RBP: 0000000000000000 R08: 000055932c183280 R09: 0000000000000020 [ 101.393577] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055932c18bce0 [ 101.394044] R13: 000055932c183260 R14: 0000000000000000 R15: 00000000ffffffff [ 101.394747] </TASK> [ 101.395402] [ 101.396047] Allocated by task 198: [ 101.396724] kasan_save_stack+0x26/0x50 [ 101.397400] __kasan_slab_alloc+0x6d/0x90 [ 101.397974] kmem_cache_alloc_lru+0x192/0x5a0 [ 101.398524] ntfs_alloc_inode+0x23/0x70 [ 101.399137] alloc_inode+0x3b/0xf0 [ 101.399534] iget5_locked+0x54/0xa0 [ 101.400026] ntfs_iget5+0xaf/0x1780 [ 101.400414] ntfs_loadlog_and_replay+0xe5/0x320 [ 101.400883] ntfs_fill_super+0xecb/0x1ba0 [ 101.401313] get_tree_bdev+0x232/0x370 [ 101.401774] ntfs_fs_get_tree+0x15/0x20 [ 101.402224] vfs_get_tree+0x4c/0x130 [ 101.402673] path_mount+0x654/0xfe0 [ 101.403160] do_mount+0xd6/0xf0 [ 101.403537] __x64_sys_mount+0xca/0x110 [ 101.404058] do_syscall_64+0x3b/0x90 [ 101.404333] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 101.404816] [ 101.405067] The buggy address belongs to the object at ffff888008cc9ea0 [ 101.405067] which belongs to the cache ntfs_inode_cache of size 992 [ 101.406171] The buggy address is located 232 bytes inside of [ 101.406171] 992-byte region [ffff888008cc9ea0, ffff888008cca280) [ 101.406995] [ 101.408559] The buggy address belongs to the physical page: [ 101.409320] page:00000000dccf19dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cc8 [ 101.410654] head:00000000dccf19dd order:2 compound_mapcount:0 compound_pincount:0 [ 101.411533] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 101.412665] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888003695140 [ 101.413209] raw: 0000000000000000 00000000800e000e 00000001ffffffff 0000000000000000 [ 101.413799] page dumped because: kasan: bad access detected [ 101.414213] [ 101.414427] Memory state around the buggy address: [ 101.414991] ffff888008cc9e80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.415785] ffff888008cc9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.416933] >ffff888008cc9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.417857] ^ [ 101.418566] ffff888008cca000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 101.419704] ffff888008cca080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/inode.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 5a62bd642217..a4ee2954346f 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -98,6 +98,12 @@ static struct inode *ntfs_read_mft(struct inode *inode, /* Record should contain $I30 root. */ is_dir = rec->flags & RECORD_FLAG_DIR;
+ /* MFT_REC_MFT is not a dir */ + if (is_dir && ino == MFT_REC_MFT) { + err = -EINVAL; + goto out; + } + inode->i_generation = le16_to_cpu(rec->seq);
/* Enumerate all struct Attributes MFT. */
From: ZhangPeng zhangpeng362@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PKGM
Reference: https://lore.kernel.org/linux-mm/20220824071909.192535-1-wangkefeng.wang@hua...
--------------------------------
wapd_run/stop() will set pgdat->kswapd to NULL, which could race with kswapd_is_running() in kcompactd(),
kswapd_run/stop() kcompactd() kswapd_is_running() if (pgdat->kswapd) // load non-NULL pgdat->kswapd pgdat->kswapd = NULL task_is_running(pgdat->kswapd) // Null pointer derefence
The KASAN report the null-ptr-deref shown below,
vmscan: Failed to start kswapd on node 0 ... BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504 Read of size 8 at addr 0000000000000024 by task kcompactd0/37
CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G OE 5.10.60 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x394 show_stack+0x34/0x4c dump_stack+0x158/0x1e4 __kasan_report+0x138/0x140 kasan_report+0x44/0xdc __asan_load8+0x94/0xd0 kcompactd+0x440/0x504 kthread+0x1a4/0x1f0 ret_from_fork+0x10/0x18
For race between kswapd_run() and kcompactd(), adding a temporary value when create a kthread, and only set it to pgdat->kswapd if kthread_run() return successful task_struct to fix the issue.
For race between kswapd_stop() and kcompactd(), let's call kcompactd_stop() before kswapd_stop() to fix the issue.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com
Conflicts: mm/vmscan.c
Signed-off-by: ZhangPeng zhangpeng362@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/memory_hotplug.c | 2 +- mm/vmscan.c | 10 ++++++---- 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7456d825414d..d2dd2bfcaac3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1483,8 +1483,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
node_states_clear_node(node, &arg); if (arg.status_change_nid >= 0) { - kswapd_stop(node); kcompactd_stop(node); + kswapd_stop(node); }
writeback_set_ratelimit(); diff --git a/mm/vmscan.c b/mm/vmscan.c index 1b333a4d247a..ace42e822aab 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4329,17 +4329,19 @@ int kswapd_run(int nid) { pg_data_t *pgdat = NODE_DATA(nid); int ret = 0; + struct task_struct *t;
if (pgdat->kswapd) return 0;
- pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid); - if (IS_ERR(pgdat->kswapd)) { + t = kthread_run(kswapd, pgdat, "kswapd%d", nid); + if (IS_ERR(t)) { /* failure at boot is fatal */ BUG_ON(system_state < SYSTEM_RUNNING); pr_err("Failed to start kswapd on node %d\n", nid); - ret = PTR_ERR(pgdat->kswapd); - pgdat->kswapd = NULL; + ret = PTR_ERR(t); + } else { + pgdat->kswapd = t; } return ret; }
From: ZhangPeng zhangpeng362@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PKGM
Reference: https://lore.kernel.org/linux-mm/20220824071909.192535-1-wangkefeng.wang@hua...
--------------------------------
The pgdat->kswapd could be accessed concurrently by kswapd_run() and kcompactd(), it don't be protected by any lock, which could leads to data races, adding READ/WRITE_ONCE() to slince it.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com
Conflicts: mm/compaction.c mm/vmscan.c
Signed-off-by: ZhangPeng zhangpeng362@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/compaction.c | 4 +++- mm/vmscan.c | 8 ++++---- 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 80116210c2e6..509dee7c2f38 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1926,7 +1926,9 @@ static inline bool is_via_compact_memory(int order)
static bool kswapd_is_running(pg_data_t *pgdat) { - return pgdat->kswapd && (pgdat->kswapd->state == TASK_RUNNING); + struct task_struct *t = READ_ONCE(pgdat->kswapd); + + return t && (t->state == TASK_RUNNING); }
/* diff --git a/mm/vmscan.c b/mm/vmscan.c index ace42e822aab..a177de05a1a4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4331,7 +4331,7 @@ int kswapd_run(int nid) int ret = 0; struct task_struct *t;
- if (pgdat->kswapd) + if (READ_ONCE(pgdat->kswapd)) return 0;
t = kthread_run(kswapd, pgdat, "kswapd%d", nid); @@ -4341,7 +4341,7 @@ int kswapd_run(int nid) pr_err("Failed to start kswapd on node %d\n", nid); ret = PTR_ERR(t); } else { - pgdat->kswapd = t; + WRITE_ONCE(pgdat->kswapd, t); } return ret; } @@ -4352,11 +4352,11 @@ int kswapd_run(int nid) */ void kswapd_stop(int nid) { - struct task_struct *kswapd = NODE_DATA(nid)->kswapd; + struct task_struct *kswapd = READ_ONCE(NODE_DATA(nid)->kswapd);
if (kswapd) { kthread_stop(kswapd); - NODE_DATA(nid)->kswapd = NULL; + WRITE_ONCE(NODE_DATA(nid)->kswapd, NULL); } }
From: Longlong Xia xialonglong1@huawei.com
mainline inclusion from mainline-v6.2-rc7 commit 7717fc1a12f88701573f9ed897cc4f6699c661e3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently.
The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup.
Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com Signed-off-by: Longlong Xia xialonglong1@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Chen Wandun chenwandun@huawei.com Cc: Huang Ying ying.huang@intel.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Nanyong Sun sunnanyong@huawei.com Cc: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/swapfile.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c index eaf483c7c83e..9fe12f0b9205 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1104,6 +1104,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size) goto check_out; pr_debug("scan_swap_map of si %d failed to find offset\n", si->type); + cond_resched();
spin_lock(&swap_avail_lock); nextsi:
From: "Eric W. Biederman" ebiederm@xmission.com
stable inclusion from stable-v5.10.110 commit 936c8be4d1447f36ac4d2a464bd03a5cd659c42f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C CVE: CVE-2023-1249
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 95c5436a4883841588dae86fb0b9325f47ba5ad3 upstream.
Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the individual coredump routines into do_coredump itself. This makes the code less error prone and easier to maintain.
Make the vma snapshot available to the coredump routines in struct coredump_params. This makes it easier to change and update what is captures in the vma snapshot and will be needed for fixing fill_file_notes.
Reviewed-by: Jann Horn jannh@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/binfmt_elf.c | 20 +++++++------------- fs/binfmt_elf_fdpic.c | 18 ++++++------------ fs/coredump.c | 41 ++++++++++++++++++++++------------------ include/linux/binfmts.h | 3 +++ include/linux/coredump.h | 3 --- 5 files changed, 39 insertions(+), 46 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index ed507d27034b..547b3caef72b 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2166,8 +2166,7 @@ static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum, static int elf_core_dump(struct coredump_params *cprm) { int has_dumped = 0; - int vma_count, segs, i; - size_t vma_data_size; + int segs, i; struct elfhdr elf; loff_t offset = 0, dataoff; struct elf_note_info info = { }; @@ -2175,16 +2174,12 @@ static int elf_core_dump(struct coredump_params *cprm) struct elf_shdr *shdr4extnum = NULL; Elf_Half e_phnum; elf_addr_t e_shoff; - struct core_vma_metadata *vma_meta; - - if (dump_vma_snapshot(cprm, &vma_count, &vma_meta, &vma_data_size)) - return 0;
/* * The number of segs are recored into ELF header as 16bit value. * Please check DEFAULT_MAX_MAP_COUNT definition when you modify here. */ - segs = vma_count + elf_core_extra_phdrs(); + segs = cprm->vma_count + elf_core_extra_phdrs();
/* for notes section */ segs++; @@ -2222,7 +2217,7 @@ static int elf_core_dump(struct coredump_params *cprm)
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
- offset += vma_data_size; + offset += cprm->vma_data_size; offset += elf_core_extra_data_size(); e_shoff = offset;
@@ -2242,8 +2237,8 @@ static int elf_core_dump(struct coredump_params *cprm) goto end_coredump;
/* Write program headers for segments dump */ - for (i = 0; i < vma_count; i++) { - struct core_vma_metadata *meta = vma_meta + i; + for (i = 0; i < cprm->vma_count; i++) { + struct core_vma_metadata *meta = cprm->vma_meta + i; struct elf_phdr phdr;
phdr.p_type = PT_LOAD; @@ -2280,8 +2275,8 @@ static int elf_core_dump(struct coredump_params *cprm) if (!dump_skip(cprm, dataoff - cprm->pos)) goto end_coredump;
- for (i = 0; i < vma_count; i++) { - struct core_vma_metadata *meta = vma_meta + i; + for (i = 0; i < cprm->vma_count; i++) { + struct core_vma_metadata *meta = cprm->vma_meta + i;
if (!dump_user_range(cprm, meta->start, meta->dump_size)) goto end_coredump; @@ -2299,7 +2294,6 @@ static int elf_core_dump(struct coredump_params *cprm) end_coredump: free_note_info(&info); kfree(shdr4extnum); - kvfree(vma_meta); kfree(phdr4note); return has_dumped; } diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index be4062b8ba75..5764295a3f0f 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1479,7 +1479,7 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm, static int elf_fdpic_core_dump(struct coredump_params *cprm) { int has_dumped = 0; - int vma_count, segs; + int segs; int i; struct elfhdr *elf = NULL; loff_t offset = 0, dataoff; @@ -1494,8 +1494,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) elf_addr_t e_shoff; struct core_thread *ct; struct elf_thread_status *tmp; - struct core_vma_metadata *vma_meta = NULL; - size_t vma_data_size;
/* alloc memory for large data structures: too large to be on stack */ elf = kmalloc(sizeof(*elf), GFP_KERNEL); @@ -1505,9 +1503,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) if (!psinfo) goto end_coredump;
- if (dump_vma_snapshot(cprm, &vma_count, &vma_meta, &vma_data_size)) - goto end_coredump; - for (ct = current->mm->core_state->dumper.next; ct; ct = ct->next) { tmp = elf_dump_thread_status(cprm->siginfo->si_signo, @@ -1527,7 +1522,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) tmp->next = thread_list; thread_list = tmp;
- segs = vma_count + elf_core_extra_phdrs(); + segs = cprm->vma_count + elf_core_extra_phdrs();
/* for notes section */ segs++; @@ -1572,7 +1567,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) /* Page-align dumped data */ dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
- offset += vma_data_size; + offset += cprm->vma_data_size; offset += elf_core_extra_data_size(); e_shoff = offset;
@@ -1592,8 +1587,8 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) goto end_coredump;
/* write program headers for segments dump */ - for (i = 0; i < vma_count; i++) { - struct core_vma_metadata *meta = vma_meta + i; + for (i = 0; i < cprm->vma_count; i++) { + struct core_vma_metadata *meta = cprm->vma_meta + i; struct elf_phdr phdr; size_t sz;
@@ -1643,7 +1638,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) if (!dump_skip(cprm, dataoff - cprm->pos)) goto end_coredump;
- if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count)) + if (!elf_fdpic_dump_segments(cprm, cprm->vma_meta, cprm->vma_count)) goto end_coredump;
if (!elf_core_write_extra_data(cprm)) @@ -1667,7 +1662,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) thread_list = thread_list->next; kfree(tmp); } - kvfree(vma_meta); kfree(phdr4note); kfree(elf); kfree(psinfo); diff --git a/fs/coredump.c b/fs/coredump.c index a85987fe868d..2dc3dd4b59a7 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -53,6 +53,8 @@
#include <trace/events/sched.h>
+static bool dump_vma_snapshot(struct coredump_params *cprm); + int core_uses_pid; unsigned int core_pipe_limit; char core_pattern[CORENAME_MAX_SIZE] = "core"; @@ -602,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) * by any locks. */ .mm_flags = mm->flags, + .vma_meta = NULL, };
audit_core_dumps(siginfo->si_signo); @@ -807,9 +810,13 @@ void do_coredump(const kernel_siginfo_t *siginfo) pr_info("Core dump to |%s disabled\n", cn.corename); goto close_fail; } + if (!dump_vma_snapshot(&cprm)) + goto close_fail; + file_start_write(cprm.file); core_dumped = binfmt->core_dump(&cprm); file_end_write(cprm.file); + kvfree(cprm.vma_meta); } if (ispipe && core_pipe_limit) wait_for_dump_helpers(cprm.file); @@ -1085,14 +1092,11 @@ static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma, * Under the mmap_lock, take a snapshot of relevant information about the task's * VMAs. */ -int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count, - struct core_vma_metadata **vma_meta, - size_t *vma_data_size_ptr) +static bool dump_vma_snapshot(struct coredump_params *cprm) { struct vm_area_struct *vma, *gate_vma; struct mm_struct *mm = current->mm; int i; - size_t vma_data_size = 0;
/* * Once the stack expansion code is fixed to not change VMA bounds @@ -1100,20 +1104,21 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count, * mmap_lock in read mode. */ if (mmap_write_lock_killable(mm)) - return -EINTR; + return false;
+ cprm->vma_data_size = 0; gate_vma = get_gate_vma(mm); - *vma_count = mm->map_count + (gate_vma ? 1 : 0); + cprm->vma_count = mm->map_count + (gate_vma ? 1 : 0);
- *vma_meta = kvmalloc_array(*vma_count, sizeof(**vma_meta), GFP_KERNEL); - if (!*vma_meta) { + cprm->vma_meta = kvmalloc_array(cprm->vma_count, sizeof(*cprm->vma_meta), GFP_KERNEL); + if (!cprm->vma_meta) { mmap_write_unlock(mm); - return -ENOMEM; + return false; }
for (i = 0, vma = first_vma(current, gate_vma); vma != NULL; vma = next_vma(vma, gate_vma), i++) { - struct core_vma_metadata *m = (*vma_meta) + i; + struct core_vma_metadata *m = cprm->vma_meta + i;
m->start = vma->vm_start; m->end = vma->vm_end; @@ -1123,13 +1128,14 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
mmap_write_unlock(mm);
- if (WARN_ON(i != *vma_count)) { - kvfree(*vma_meta); - return -EFAULT; + if (WARN_ON(i != cprm->vma_count)) { + kvfree(cprm->vma_meta); + return false; }
- for (i = 0; i < *vma_count; i++) { - struct core_vma_metadata *m = (*vma_meta) + i; + + for (i = 0; i < cprm->vma_count; i++) { + struct core_vma_metadata *m = cprm->vma_meta + i;
if (m->dump_size == DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER) { char elfmag[SELFMAG]; @@ -1142,9 +1148,8 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count, } }
- vma_data_size += m->dump_size; + cprm->vma_data_size += m->dump_size; }
- *vma_data_size_ptr = vma_data_size; - return 0; + return true; } diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 0571701ab1c5..5a9786e6b554 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -82,6 +82,9 @@ struct coredump_params { unsigned long mm_flags; loff_t written; loff_t pos; + int vma_count; + size_t vma_data_size; + struct core_vma_metadata *vma_meta; };
/* diff --git a/include/linux/coredump.h b/include/linux/coredump.h index e58e8c207782..9c1fd6241307 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -24,9 +24,6 @@ extern int dump_align(struct coredump_params *cprm, int align); extern void dump_truncate(struct coredump_params *cprm); int dump_user_range(struct coredump_params *cprm, unsigned long start, unsigned long len); -int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count, - struct core_vma_metadata **vma_meta, - size_t *vma_data_size_ptr); #ifdef CONFIG_COREDUMP extern void do_coredump(const kernel_siginfo_t *siginfo); #else
From: "Eric W. Biederman" ebiederm@xmission.com
stable inclusion from stable-v5.10.110 commit b043ae637a83585b2a497c2eb7ee49446fc68e98 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C CVE: CVE-2023-1249
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 49c1866348f364478a0c4d3dd13fd08bb82d3a5b upstream.
The condition is impossible and to the best of my knowledge has never triggered.
We are in deep trouble if that conditions happens and we walk past the end of our allocated array.
So delete the WARN_ON and the code that makes it look like the kernel can handle the case of walking past the end of it's vma_meta array.
Reviewed-by: Jann Horn jannh@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/coredump.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c index 2dc3dd4b59a7..3d9509b65f0e 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1128,12 +1128,6 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
mmap_write_unlock(mm);
- if (WARN_ON(i != cprm->vma_count)) { - kvfree(cprm->vma_meta); - return false; - } - - for (i = 0; i < cprm->vma_count; i++) { struct core_vma_metadata *m = cprm->vma_meta + i;
From: "Eric W. Biederman" ebiederm@xmission.com
stable inclusion from stable-v5.10.110 commit b7933f145ad32bb5e084af55176ab6dcaa15a036 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C CVE: CVE-2023-1249
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 9ec7d3230717b4fe9b6c7afeb4811909c23fa1d7 upstream.
Instead of individually passing cprm->siginfo and cprm->regs into fill_note_info pass all of struct coredump_params.
This is preparation to allow fill_files_note to use the existing vma snapshot.
Reviewed-by: Jann Horn jannh@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/binfmt_elf.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 547b3caef72b..ab41b6b147ae 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1797,7 +1797,7 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
static int fill_note_info(struct elfhdr *elf, int phdrs, struct elf_note_info *info, - const kernel_siginfo_t *siginfo, struct pt_regs *regs) + struct coredump_params *cprm) { struct task_struct *dump_task = current; const struct user_regset_view *view = task_user_regset_view(dump_task); @@ -1869,7 +1869,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, * Now fill in each thread's information. */ for (t = info->thread; t != NULL; t = t->next) - if (!fill_thread_core_info(t, view, siginfo->si_signo, &info->size)) + if (!fill_thread_core_info(t, view, cprm->siginfo->si_signo, &info->size)) return 0;
/* @@ -1878,7 +1878,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_psinfo(psinfo, dump_task->group_leader, dump_task->mm); info->size += notesize(&info->psinfo);
- fill_siginfo_note(&info->signote, &info->csigdata, siginfo); + fill_siginfo_note(&info->signote, &info->csigdata, cprm->siginfo); info->size += notesize(&info->signote);
fill_auxv_note(&info->auxv, current->mm); @@ -2026,7 +2026,7 @@ static int elf_note_info_init(struct elf_note_info *info)
static int fill_note_info(struct elfhdr *elf, int phdrs, struct elf_note_info *info, - const kernel_siginfo_t *siginfo, struct pt_regs *regs) + struct coredump_params *cprm) { struct core_thread *ct; struct elf_thread_status *ets; @@ -2047,13 +2047,13 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, list_for_each_entry(ets, &info->thread_list, list) { int sz;
- sz = elf_dump_thread_status(siginfo->si_signo, ets); + sz = elf_dump_thread_status(cprm->siginfo->si_signo, ets); info->thread_status_size += sz; } /* now collect the dump for the current */ memset(info->prstatus, 0, sizeof(*info->prstatus)); - fill_prstatus(info->prstatus, current, siginfo->si_signo); - elf_core_copy_regs(&info->prstatus->pr_reg, regs); + fill_prstatus(info->prstatus, current, cprm->siginfo->si_signo); + elf_core_copy_regs(&info->prstatus->pr_reg, cprm->regs);
/* Set up header */ fill_elf_header(elf, phdrs, ELF_ARCH, ELF_CORE_EFLAGS); @@ -2069,7 +2069,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_note(info->notes + 1, "CORE", NT_PRPSINFO, sizeof(*info->psinfo), info->psinfo);
- fill_siginfo_note(info->notes + 2, &info->csigdata, siginfo); + fill_siginfo_note(info->notes + 2, &info->csigdata, cprm->siginfo); fill_auxv_note(info->notes + 3, current->mm); info->numnote = 4;
@@ -2079,8 +2079,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, }
/* Try to dump the FPU. */ - info->prstatus->pr_fpvalid = elf_core_copy_task_fpregs(current, regs, - info->fpu); + info->prstatus->pr_fpvalid = + elf_core_copy_task_fpregs(current, cprm->regs, info->fpu); if (info->prstatus->pr_fpvalid) fill_note(info->notes + info->numnote++, "CORE", NT_PRFPREG, sizeof(*info->fpu), info->fpu); @@ -2193,7 +2193,7 @@ static int elf_core_dump(struct coredump_params *cprm) * Collect all the non-memory information about the process for the * notes. This also sets up the file header. */ - if (!fill_note_info(&elf, e_phnum, &info, cprm->siginfo, cprm->regs)) + if (!fill_note_info(&elf, e_phnum, &info, cprm)) goto end_coredump;
has_dumped = 1;
From: "Eric W. Biederman" ebiederm@xmission.com
stable inclusion from stable-v5.10.110 commit 558564db44755dfb3e48b0d64de327d20981e950 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C CVE: CVE-2023-1249
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 390031c942116d4733310f0684beb8db19885fe6 upstream.
Matthew Wilcox reported that there is a missing mmap_lock in file_files_note that could possibly lead to a user after free.
Solve this by using the existing vma snapshot for consistency and to avoid the need to take the mmap_lock anywhere in the coredump code except for dump_vma_snapshot.
Update the dump_vma_snapshot to capture vm_pgoff and vm_file that are neeeded by fill_files_note.
Add free_vma_snapshot to free the captured values of vm_file.
Reported-by: Matthew Wilcox willy@infradead.org Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org Cc: stable@vger.kernel.org Fixes: a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files") Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/binfmt_elf.c | 24 ++++++++++++------------ fs/coredump.c | 22 +++++++++++++++++++++- include/linux/coredump.h | 2 ++ 3 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index ab41b6b147ae..213864bc7e8c 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1613,17 +1613,16 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata, * long file_ofs * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL... */ -static int fill_files_note(struct memelfnote *note) +static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm) { - struct mm_struct *mm = current->mm; - struct vm_area_struct *vma; unsigned count, size, names_ofs, remaining, n; user_long_t *data; user_long_t *start_end_ofs; char *name_base, *name_curpos; + int i;
/* *Estimated* file count and total data size needed */ - count = mm->map_count; + count = cprm->vma_count; if (count > UINT_MAX / 64) return -EINVAL; size = count * 64; @@ -1645,11 +1644,12 @@ static int fill_files_note(struct memelfnote *note) name_base = name_curpos = ((char *)data) + names_ofs; remaining = size - names_ofs; count = 0; - for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) { + for (i = 0; i < cprm->vma_count; i++) { + struct core_vma_metadata *m = &cprm->vma_meta[i]; struct file *file; const char *filename;
- file = vma->vm_file; + file = m->file; if (!file) continue; filename = file_path(file, name_curpos, remaining); @@ -1669,9 +1669,9 @@ static int fill_files_note(struct memelfnote *note) memmove(name_curpos, filename, n); name_curpos += n;
- *start_end_ofs++ = vma->vm_start; - *start_end_ofs++ = vma->vm_end; - *start_end_ofs++ = vma->vm_pgoff; + *start_end_ofs++ = m->start; + *start_end_ofs++ = m->end; + *start_end_ofs++ = m->pgoff; count++; }
@@ -1682,7 +1682,7 @@ static int fill_files_note(struct memelfnote *note) * Count usually is less than mm->map_count, * we need to move filenames down. */ - n = mm->map_count - count; + n = cprm->vma_count - count; if (n != 0) { unsigned shift_bytes = n * 3 * sizeof(data[0]); memmove(name_base - shift_bytes, name_base, @@ -1884,7 +1884,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_auxv_note(&info->auxv, current->mm); info->size += notesize(&info->auxv);
- if (fill_files_note(&info->files) == 0) + if (fill_files_note(&info->files, cprm) == 0) info->size += notesize(&info->files);
return 1; @@ -2073,7 +2073,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_auxv_note(info->notes + 3, current->mm); info->numnote = 4;
- if (fill_files_note(info->notes + info->numnote) == 0) { + if (fill_files_note(info->notes + info->numnote, cprm) == 0) { info->notes_files = info->notes + info->numnote; info->numnote++; } diff --git a/fs/coredump.c b/fs/coredump.c index 3d9509b65f0e..21cda234685f 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -54,6 +54,7 @@ #include <trace/events/sched.h>
static bool dump_vma_snapshot(struct coredump_params *cprm); +static void free_vma_snapshot(struct coredump_params *cprm);
int core_uses_pid; unsigned int core_pipe_limit; @@ -816,7 +817,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) file_start_write(cprm.file); core_dumped = binfmt->core_dump(&cprm); file_end_write(cprm.file); - kvfree(cprm.vma_meta); + free_vma_snapshot(&cprm); } if (ispipe && core_pipe_limit) wait_for_dump_helpers(cprm.file); @@ -1088,6 +1089,20 @@ static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma, return gate_vma; }
+static void free_vma_snapshot(struct coredump_params *cprm) +{ + if (cprm->vma_meta) { + int i; + for (i = 0; i < cprm->vma_count; i++) { + struct file *file = cprm->vma_meta[i].file; + if (file) + fput(file); + } + kvfree(cprm->vma_meta); + cprm->vma_meta = NULL; + } +} + /* * Under the mmap_lock, take a snapshot of relevant information about the task's * VMAs. @@ -1124,6 +1139,11 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) m->end = vma->vm_end; m->flags = vma->vm_flags; m->dump_size = vma_dump_size(vma, cprm->mm_flags); + m->pgoff = vma->vm_pgoff; + + m->file = vma->vm_file; + if (m->file) + get_file(m->file); }
mmap_write_unlock(mm); diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 9c1fd6241307..5300d0cb70f7 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -11,6 +11,8 @@ struct core_vma_metadata { unsigned long start, end; unsigned long flags; unsigned long dump_size; + unsigned long pgoff; + struct file *file; };
/*
From: Li Huafei lihuafei1@huawei.com
Offering: HULK hulk inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C CVE: CVE-2023-1249
--------------------------------
The coredump_params structure is only used as parameters for function pointer members of some structures, such as linux_binfmt, spufs_calls, etc., and the parameters are of pointer type, so adding members of coredump_params will not affect the memory layout.
Also coredump_params is used to hold coredump parameters to be passed to coredump functions of different types of binfmt, the driver will not use the structure.
Signed-off-by: Li Huafei lihuafei1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/linux/binfmts.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 5a9786e6b554..1716a07a9809 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -82,9 +82,9 @@ struct coredump_params { unsigned long mm_flags; loff_t written; loff_t pos; - int vma_count; - size_t vma_data_size; - struct core_vma_metadata *vma_meta; + KABI_EXTEND(int vma_count) + KABI_EXTEND(size_t vma_data_size) + KABI_EXTEND(struct core_vma_metadata *vma_meta) };
/*
From: ZhangPeng zhangpeng362@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6LD0S CVE: NA
----------------------------------------
This reverts commit 49ed1f1e68851efc04d2796eb8d6b6cdcba4dbb7.
It will have a great impact on the product if we can't use vmalloc to alloc high-order physical pages.
Signed-off-by: ZhangPeng zhangpeng362@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/vmalloc.c | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a97d3011aa9a..dadbea29241d 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2637,17 +2637,14 @@ static void __vunmap(const void *addr, int deallocate_pages) vm_remove_mappings(area, deallocate_pages);
if (deallocate_pages) { + unsigned int page_order = vm_area_page_order(area); int i;
- for (i = 0; i < area->nr_pages; i++) { + for (i = 0; i < area->nr_pages; i += 1U << page_order) { struct page *page = area->pages[i];
BUG_ON(!page); - /* - * High-order allocs for huge vmallocs are split, so - * can be freed as an array of order-0 allocations - */ - __free_pages(page, 0); + __free_pages(page, page_order); } atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
@@ -2927,7 +2924,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page *page; int p;
- page = alloc_pages_node(node, gfp_mask, page_order); + /* Compound pages required for remap_vmalloc_page */ + page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order); if (unlikely(!page)) { /* Successfully allocated i pages, free them in __vfree() */ area->nr_pages = i; @@ -2939,16 +2937,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; }
- /* - * Higher order allocations must be able to be treated as - * indepdenent small pages by callers (as they can with - * small-page vmallocs). Some drivers do their own refcounting - * on vmalloc_to_page() pages, some use page->mapping, - * page->lru, etc. - */ - if (page_order) - split_page(page, page_order); - for (p = 0; p < (1U << page_order); p++) area->pages[i + p] = page + p;
From: GUO Zihua guozihua@huawei.com
Offering: HULK hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6P3K4 CVE: NA
--------------------------------
There is a memory leakage in ima_store_template when ima_add_template_entry returns a non-zero value and duplicated_entry was successfully generated. Fix it by freeing duplicated_entry in that case.
Fixes: 31604143977f ("ima: Add support for measurement with digest lists") Signed-off-by: GUO Zihua guozihua@huawei.com Reviewed-by: yiyang yiyang13@huawei.com Reviewed-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- security/integrity/ima/ima_api.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c index d9f4599dee40..6ecaf6834844 100644 --- a/security/integrity/ima/ima_api.c +++ b/security/integrity/ima/ima_api.c @@ -133,7 +133,9 @@ int ima_store_template(struct ima_template_entry *entry,
entry->pcr = pcr; result = ima_add_template_entry(entry, violation, op, inode, filename); - if (!result && duplicated_entry) { + if (result) { + kfree(duplicated_entry); + } else if (duplicated_entry) { result = ima_add_template_entry(duplicated_entry, violation, op, inode, filename); if (result < 0)
From: Edward Lo edward.lo@ambergroup.io
mainline inclusion from mainline-v6.2-rc1 commit 54e45702b648b7c0000e90b3e9b890e367e16ea8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6OD4U CVE: CVE-2022-48423
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Though we already have some sanity checks while enumerating attributes, resident attribute names aren't included. This patch checks the resident attribute names are in the valid ranges.
[ 259.209031] BUG: KASAN: slab-out-of-bounds in ni_create_attr_list+0x1e1/0x850 [ 259.210770] Write of size 426 at addr ffff88800632f2b2 by task exp/255 [ 259.211551] [ 259.212035] CPU: 0 PID: 255 Comm: exp Not tainted 6.0.0-rc6 #37 [ 259.212955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 259.214387] Call Trace: [ 259.214640] <TASK> [ 259.214895] dump_stack_lvl+0x49/0x63 [ 259.215284] print_report.cold+0xf5/0x689 [ 259.215565] ? kasan_poison+0x3c/0x50 [ 259.215778] ? kasan_unpoison+0x28/0x60 [ 259.215991] ? ni_create_attr_list+0x1e1/0x850 [ 259.216270] kasan_report+0xa7/0x130 [ 259.216481] ? ni_create_attr_list+0x1e1/0x850 [ 259.216719] kasan_check_range+0x15a/0x1d0 [ 259.216939] memcpy+0x3c/0x70 [ 259.217136] ni_create_attr_list+0x1e1/0x850 [ 259.217945] ? __rcu_read_unlock+0x5b/0x280 [ 259.218384] ? ni_remove_attr+0x2e0/0x2e0 [ 259.218712] ? kernel_text_address+0xcf/0xe0 [ 259.219064] ? __kernel_text_address+0x12/0x40 [ 259.219434] ? arch_stack_walk+0x9e/0xf0 [ 259.219668] ? __this_cpu_preempt_check+0x13/0x20 [ 259.219904] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 259.220140] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 259.220561] ni_ins_attr_ext+0x52c/0x5c0 [ 259.220984] ? ni_create_attr_list+0x850/0x850 [ 259.221532] ? run_deallocate+0x120/0x120 [ 259.221972] ? vfs_setxattr+0x128/0x300 [ 259.222688] ? setxattr+0x126/0x140 [ 259.222921] ? path_setxattr+0x164/0x180 [ 259.223431] ? __x64_sys_setxattr+0x6d/0x80 [ 259.223828] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.224417] ? mi_find_attr+0x3c/0xf0 [ 259.224772] ni_insert_attr+0x1ba/0x420 [ 259.225216] ? ni_ins_attr_ext+0x5c0/0x5c0 [ 259.225504] ? ntfs_read_ea+0x119/0x450 [ 259.225775] ni_insert_resident+0xc0/0x1c0 [ 259.226316] ? ni_insert_nonresident+0x400/0x400 [ 259.227001] ? __kasan_kmalloc+0x88/0xb0 [ 259.227468] ? __kmalloc+0x192/0x320 [ 259.227773] ntfs_set_ea+0x6bf/0xb30 [ 259.228216] ? ftrace_graph_ret_addr+0x2a/0xb0 [ 259.228494] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.228838] ? ntfs_read_ea+0x450/0x450 [ 259.229098] ? is_bpf_text_address+0x24/0x40 [ 259.229418] ? kernel_text_address+0xcf/0xe0 [ 259.229681] ? __kernel_text_address+0x12/0x40 [ 259.229948] ? unwind_get_return_address+0x3a/0x60 [ 259.230271] ? write_profile+0x270/0x270 [ 259.230537] ? arch_stack_walk+0x9e/0xf0 [ 259.230836] ntfs_setxattr+0x114/0x5c0 [ 259.231099] ? ntfs_set_acl_ex+0x2e0/0x2e0 [ 259.231529] ? evm_protected_xattr_common+0x6d/0x100 [ 259.231817] ? posix_xattr_acl+0x13/0x80 [ 259.232073] ? evm_protect_xattr+0x1f7/0x440 [ 259.232351] __vfs_setxattr+0xda/0x120 [ 259.232635] ? xattr_resolve_name+0x180/0x180 [ 259.232912] __vfs_setxattr_noperm+0x93/0x300 [ 259.233219] __vfs_setxattr_locked+0x141/0x160 [ 259.233492] ? kasan_poison+0x3c/0x50 [ 259.233744] vfs_setxattr+0x128/0x300 [ 259.234002] ? __vfs_setxattr_locked+0x160/0x160 [ 259.234837] do_setxattr+0xb8/0x170 [ 259.235567] ? vmemdup_user+0x53/0x90 [ 259.236212] setxattr+0x126/0x140 [ 259.236491] ? do_setxattr+0x170/0x170 [ 259.236791] ? debug_smp_processor_id+0x17/0x20 [ 259.237232] ? kasan_quarantine_put+0x57/0x180 [ 259.237605] ? putname+0x80/0xa0 [ 259.237870] ? __kasan_slab_free+0x11c/0x1b0 [ 259.238234] ? putname+0x80/0xa0 [ 259.238500] ? preempt_count_sub+0x18/0xc0 [ 259.238775] ? __mnt_want_write+0xaa/0x100 [ 259.238990] ? mnt_want_write+0x8b/0x150 [ 259.239290] path_setxattr+0x164/0x180 [ 259.239605] ? setxattr+0x140/0x140 [ 259.239849] ? debug_smp_processor_id+0x17/0x20 [ 259.240174] ? fpregs_assert_state_consistent+0x67/0x80 [ 259.240411] __x64_sys_setxattr+0x6d/0x80 [ 259.240715] do_syscall_64+0x3b/0x90 [ 259.240934] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.241697] RIP: 0033:0x7fc6b26e4469 [ 259.242647] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 259.244512] RSP: 002b:00007ffc3c7841f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000bc [ 259.245086] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6b26e4469 [ 259.246025] RDX: 00007ffc3c784380 RSI: 00007ffc3c7842e0 RDI: 00007ffc3c784238 [ 259.246961] RBP: 00007ffc3c788410 R08: 0000000000000001 R09: 00007ffc3c7884f8 [ 259.247775] R10: 000000000000007f R11: 0000000000000217 R12: 00000000004004e0 [ 259.248534] R13: 00007ffc3c7884f0 R14: 0000000000000000 R15: 0000000000000000 [ 259.249368] </TASK> [ 259.249644] [ 259.249888] Allocated by task 255: [ 259.250283] kasan_save_stack+0x26/0x50 [ 259.250957] __kasan_kmalloc+0x88/0xb0 [ 259.251826] __kmalloc+0x192/0x320 [ 259.252745] ni_create_attr_list+0x11e/0x850 [ 259.253298] ni_ins_attr_ext+0x52c/0x5c0 [ 259.253685] ni_insert_attr+0x1ba/0x420 [ 259.253974] ni_insert_resident+0xc0/0x1c0 [ 259.254311] ntfs_set_ea+0x6bf/0xb30 [ 259.254629] ntfs_setxattr+0x114/0x5c0 [ 259.254859] __vfs_setxattr+0xda/0x120 [ 259.255155] __vfs_setxattr_noperm+0x93/0x300 [ 259.255445] __vfs_setxattr_locked+0x141/0x160 [ 259.255862] vfs_setxattr+0x128/0x300 [ 259.256251] do_setxattr+0xb8/0x170 [ 259.256522] setxattr+0x126/0x140 [ 259.256911] path_setxattr+0x164/0x180 [ 259.257308] __x64_sys_setxattr+0x6d/0x80 [ 259.257637] do_syscall_64+0x3b/0x90 [ 259.257970] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.258550] [ 259.258772] The buggy address belongs to the object at ffff88800632f000 [ 259.258772] which belongs to the cache kmalloc-1k of size 1024 [ 259.260190] The buggy address is located 690 bytes inside of [ 259.260190] 1024-byte region [ffff88800632f000, ffff88800632f400) [ 259.261412] [ 259.261743] The buggy address belongs to the physical page: [ 259.262354] page:0000000081e8cac9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x632c [ 259.263722] head:0000000081e8cac9 order:2 compound_mapcount:0 compound_pincount:0 [ 259.264284] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 259.265312] raw: 000fffffc0010200 ffffea0000060d00 dead000000000004 ffff888001041dc0 [ 259.265772] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000 [ 259.266305] page dumped because: kasan: bad access detected [ 259.266588] [ 259.266728] Memory state around the buggy address: [ 259.267225] ffff88800632f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.267841] ffff88800632f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.269111] >ffff88800632f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.269626] ^ [ 259.270162] ffff88800632f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.270810] ffff88800632f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs3/record.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index 861e35791506..b06af26c00a0 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -260,6 +260,11 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) if (t16 + t32 > asize) return NULL;
+ if (attr->name_len && + le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len > t16) { + return NULL; + } + return attr; }
From: Sven Schnelle svens@linux.ibm.com
stable inclusion from stable-v5.10.173 commit 84ea44dc3e4ecb2632586238014bf6722aa5843b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6Q4F0 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit db4df8e9d79e7d37732c1a1b560958e8dadfefa1 ]
When specifying an invalid console= device like console=tty3270, tty_driver_lookup_tty() returns the tty struct without checking whether index is a valid number.
To reproduce:
qemu-system-x86_64 -enable-kvm -nographic -serial mon:stdio \ -kernel ../linux-build-x86/arch/x86/boot/bzImage \ -append "console=ttyS0 console=tty3270"
This crashes with:
[ 0.770599] BUG: kernel NULL pointer dereference, address: 00000000000000ef [ 0.771265] #PF: supervisor read access in kernel mode [ 0.771773] #PF: error_code(0x0000) - not-present page [ 0.772609] Oops: 0000 [#1] PREEMPT SMP PTI [ 0.774878] RIP: 0010:tty_open+0x268/0x6f0 [ 0.784013] chrdev_open+0xbd/0x230 [ 0.784444] ? cdev_device_add+0x80/0x80 [ 0.784920] do_dentry_open+0x1e0/0x410 [ 0.785389] path_openat+0xca9/0x1050 [ 0.785813] do_filp_open+0xaa/0x150 [ 0.786240] file_open_name+0x133/0x1b0 [ 0.786746] filp_open+0x27/0x50 [ 0.787244] console_on_rootfs+0x14/0x4d [ 0.787800] kernel_init_freeable+0x1e4/0x20d [ 0.788383] ? rest_init+0xc0/0xc0 [ 0.788881] kernel_init+0x11/0x120 [ 0.789356] ret_from_fork+0x22/0x30
Signed-off-by: Sven Schnelle svens@linux.ibm.com Reviewed-by: Jiri Slaby jirislaby@kernel.org Link: https://lore.kernel.org/r/20221209112737.3222509-2-svens@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/tty/tty_io.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 669aef77a0bd..c37d2657308c 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1237,14 +1237,16 @@ static struct tty_struct *tty_driver_lookup_tty(struct tty_driver *driver, { struct tty_struct *tty;
- if (driver->ops->lookup) + if (driver->ops->lookup) { if (!file) tty = ERR_PTR(-EIO); else tty = driver->ops->lookup(driver, file, idx); - else + } else { + if (idx >= driver->num) + return ERR_PTR(-EINVAL); tty = driver->ttys[idx]; - + } if (!IS_ERR(tty)) tty_kref_get(tty); return tty;
From: Pedro Tammela pctammela@mojatatu.com
stable inclusion from stable-v5.10.168 commit eb8e9d8572d1d9df17272783ad8a84843ce559d4 category: bugfix bugzilla: 188576, https://gitee.com/src-openeuler/kernel/issues/I6OP9S CVE: CVE-2023-1281
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit ee059170b1f7e94e55fa6cadee544e176a6e59c2 upstream.
The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext.
CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis sec@valis.email Suggested-by: valis sec@valis.email Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: Pedro Tammela pctammela@mojatatu.com Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/sched/cls_tcindex.c | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index e9a8a2c86bbd..dc87feaa3cb3 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -12,6 +12,7 @@ #include <linux/errno.h> #include <linux/slab.h> #include <linux/refcount.h> +#include <linux/rcupdate.h> #include <net/act_api.h> #include <net/netlink.h> #include <net/pkt_cls.h> @@ -338,6 +339,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, struct tcf_result cr = {}; int err, balloc = 0; struct tcf_exts e; + bool update_h = false;
err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); if (err < 0) @@ -455,10 +457,13 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, } }
- if (cp->perfect) + if (cp->perfect) { r = cp->perfect + handle; - else - r = tcindex_lookup(cp, handle) ? : &new_filter_result; + } else { + /* imperfect area is updated in-place using rcu */ + update_h = !!tcindex_lookup(cp, handle); + r = &new_filter_result; + }
if (r == &new_filter_result) { f = kzalloc(sizeof(*f), GFP_KERNEL); @@ -492,7 +497,28 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
rcu_assign_pointer(tp->root, cp);
- if (r == &new_filter_result) { + if (update_h) { + struct tcindex_filter __rcu **fp; + struct tcindex_filter *cf; + + f->result.res = r->res; + tcf_exts_change(&f->result.exts, &r->exts); + + /* imperfect area bucket */ + fp = cp->h + (handle % cp->hash); + + /* lookup the filter, guaranteed to exist */ + for (cf = rcu_dereference_bh_rtnl(*fp); cf; + fp = &cf->next, cf = rcu_dereference_bh_rtnl(*fp)) + if (cf->key == handle) + break; + + f->next = cf->next; + + cf = rcu_replace_pointer(*fp, f, 1); + tcf_exts_get_net(&cf->result.exts); + tcf_queue_work(&cf->rwork, tcindex_destroy_fexts_work); + } else if (r == &new_filter_result) { struct tcindex_filter *nfp; struct tcindex_filter __rcu **fp;
From: Pedro Tammela pctammela@mojatatu.com
stable inclusion from stable-v5.10.168 commit 4fe9950815e19051b7b8268b4d4c3ac286a741bf category: bugfix bugzilla: 188576, https://gitee.com/src-openeuler/kernel/issues/I6OP9S CVE: CVE-2023-1281
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
[ Upstream commit 42018a322bd453e38b3ffee294982243e50a484f ]
Syzkaller found an issue where a handle greater than 16 bits would trigger a null-ptr-deref in the imperfect hash area update.
general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted 6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509 Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00 RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8 RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8 R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00 FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572 tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155 rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x334/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmmsg+0x18f/0x460 net/socket.c:2616 __do_sys_sendmmsg net/socket.c:2645 [inline] __se_sys_sendmmsg net/socket.c:2642 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
Fixes: ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu") Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: Pedro Tammela pctammela@mojatatu.com Reported-by: syzbot syzkaller@googlegroups.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/sched/cls_tcindex.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index dc87feaa3cb3..56612caf7833 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -510,7 +510,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, /* lookup the filter, guaranteed to exist */ for (cf = rcu_dereference_bh_rtnl(*fp); cf; fp = &cf->next, cf = rcu_dereference_bh_rtnl(*fp)) - if (cf->key == handle) + if (cf->key == (u16)handle) break;
f->next = cf->next;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v5.17-rc3 commit ebb7fb1557b1d03b906b668aa2164b51e6b7d19a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Trond Myklebust reported soft lockups in XFS IO completion such as this:
watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106] CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1 Workqueue: xfs-conv/md127 xfs_end_io [xfs] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20 Call Trace: wake_up_page_bit+0x8a/0x110 iomap_finish_ioend+0xd7/0x1c0 iomap_finish_ioends+0x7f/0xb0 xfs_end_ioend+0x6b/0x100 [xfs] xfs_end_io+0xb9/0xe0 [xfs] process_one_work+0x1a7/0x360 worker_thread+0x1fa/0x390 kthread+0x116/0x130 ret_from_fork+0x35/0x40
Ioends are processed as an atomic completion unit when all the chained bios in the ioend have completed their IO. Logically contiguous ioends can also be merged and completed as a single, larger unit. Both of these things can be problematic as both the bio chains per ioend and the size of the merged ioends processed as a single completion are both unbound.
If we have a large sequential dirty region in the page cache, write_cache_pages() will keep feeding us sequential pages and we will keep mapping them into ioends and bios until we get a dirty page at a non-sequential file offset. These large sequential runs can will result in bio and ioend chaining to optimise the io patterns. The pages iunder writeback are pinned within these chains until the submission chaining is broken, allowing the entire chain to be completed. This can result in huge chains being processed in IO completion context.
We get deep bio chaining if we have large contiguous physical extents. We will keep adding pages to the current bio until it is full, then we'll chain a new bio to keep adding pages for writeback. Hence we can build bio chains that map millions of pages and tens of gigabytes of RAM if the page cache contains big enough contiguous dirty file regions. This long bio chain pins those pages until the final bio in the chain completes and the ioend can iterate all the chained bios and complete them.
OTOH, if we have a physically fragmented file, we end up submitting one ioend per physical fragment that each have a small bio or bio chain attached to them. We do not chain these at IO submission time, but instead we chain them at completion time based on file offset via iomap_ioend_try_merge(). Hence we can end up with unbound ioend chains being built via completion merging.
XFS can then do COW remapping or unwritten extent conversion on that merged chain, which involves walking an extent fragment at a time and running a transaction to modify the physical extent information. IOWs, we merge all the discontiguous ioends together into a contiguous file range, only to then process them individually as discontiguous extents.
This extent manipulation is computationally expensive and can run in a tight loop, so merging logically contiguous but physically discontigous ioends gains us nothing except for hiding the fact the fact we broke the ioends up into individual physical extents at submission and then need to loop over those individual physical extents at completion.
Hence we need to have mechanisms to limit ioend sizes and to break up completion processing of large merged ioend chains:
1. bio chains per ioend need to be bound in length. Pure overwrites go straight to iomap_finish_ioend() in softirq context with the exact bio chain attached to the ioend by submission. Hence the only way to prevent long holdoffs here is to bound ioend submission sizes because we can't reschedule in softirq context.
2. iomap_finish_ioends() has to handle unbound merged ioend chains correctly. This relies on any one call to iomap_finish_ioend() being bound in runtime so that cond_resched() can be issued regularly as the long ioend chain is processed. i.e. this relies on mechanism #1 to limit individual ioend sizes to work correctly.
3. filesystems have to loop over the merged ioends to process physical extent manipulations. This means they can loop internally, and so we break merging at physical extent boundaries so the filesystem can easily insert reschedule points between individual extent manipulations.
Signed-off-by: Dave Chinner dchinner@redhat.com Reported-and-tested-by: Trond Myklebust trondmy@hammerspace.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Conflicts: include/linux/iomap.h fs/iomap/buffered-io.c fs/xfs/xfs_aops.c
[ 6e552494fb90 ("iomap: remove unused private field from ioend") is not applied. 95c4cd053a1d ("iomap: Convert to_iomap_page to take a folio") is not applied. 8ffd74e9a816 ("iomap: Convert bio completions to use folios") is not applied. 044c6449f18f ("xfs: drop unused ioend private merge and setfilesize code") is not applied. ] Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_aops.c | 16 ++++++++++++- include/linux/iomap.h | 2 ++ 3 files changed, 65 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 986020dcf026..33ddce5c5bbc 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -21,6 +21,8 @@
#include "../internal.h"
+#define IOEND_BATCH_SIZE 4096 + /* * Structure allocated for each page or THP when block size < page size * to track sub-page uptodate status and I/O completions. @@ -1061,7 +1063,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page, * state, release holds on bios, and finally free up memory. Do not use the * ioend after this. */ -static void +static u32 iomap_finish_ioend(struct iomap_ioend *ioend, int error) { struct inode *inode = ioend->io_inode; @@ -1070,6 +1072,7 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) u64 start = bio->bi_iter.bi_sector; loff_t offset = ioend->io_offset; bool quiet = bio_flagged(bio, BIO_QUIET); + u32 folio_count = 0;
for (bio = &ioend->io_inline_bio; bio; bio = next) { struct bio_vec *bv; @@ -1085,9 +1088,11 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) next = bio->bi_private;
/* walk each page on bio, ending page IO on them */ - bio_for_each_segment_all(bv, bio, iter_all) + bio_for_each_segment_all(bv, bio, iter_all) { iomap_finish_page_writeback(inode, bv->bv_page, error, bv->bv_len); + folio_count++; + } bio_put(bio); } /* The ioend has been freed by bio_put() */ @@ -1097,20 +1102,36 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) "%s: writeback error on inode %lu, offset %lld, sector %llu", inode->i_sb->s_id, inode->i_ino, offset, start); } + return folio_count; }
+/* + * Ioend completion routine for merged bios. This can only be called from task + * contexts as merged ioends can be of unbound length. Hence we have to break up + * the writeback completions into manageable chunks to avoid long scheduler + * holdoffs. We aim to keep scheduler holdoffs down below 10ms so that we get + * good batch processing throughput without creating adverse scheduler latency + * conditions. + */ void iomap_finish_ioends(struct iomap_ioend *ioend, int error) { struct list_head tmp; + u32 completions; + + might_sleep();
list_replace_init(&ioend->io_list, &tmp); - iomap_finish_ioend(ioend, error); + completions = iomap_finish_ioend(ioend, error);
while (!list_empty(&tmp)) { + if (completions > IOEND_BATCH_SIZE * 8) { + cond_resched(); + completions = 0; + } ioend = list_first_entry(&tmp, struct iomap_ioend, io_list); list_del_init(&ioend->io_list); - iomap_finish_ioend(ioend, error); + completions += iomap_finish_ioend(ioend, error); } } EXPORT_SYMBOL_GPL(iomap_finish_ioends); @@ -1131,6 +1152,18 @@ iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next) return false; if (ioend->io_offset + ioend->io_size != next->io_offset) return false; + /* + * Do not merge physically discontiguous ioends. The filesystem + * completion functions will have to iterate the physical + * discontiguities even if we merge the ioends at a logical level, so + * we don't gain anything by merging physical discontiguities here. + * + * We cannot use bio->bi_iter.bi_sector here as it is modified during + * submission so does not point to the start sector of the bio at + * completion. + */ + if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector) + return false; return true; }
@@ -1236,9 +1269,11 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc, ioend->io_flags = wpc->iomap.flags; ioend->io_inode = inode; ioend->io_size = 0; + ioend->io_folios = 0; ioend->io_offset = offset; ioend->io_private = NULL; ioend->io_bio = bio; + ioend->io_sector = sector; return ioend; }
@@ -1279,6 +1314,13 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset, return false; if (sector != bio_end_sector(wpc->ioend->io_bio)) return false; + /* + * Limit ioend bio chain lengths to minimise IO completion latency. This + * also prevents long tight loops ending page writeback on all the + * folios in the ioend. + */ + if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE) + return false; return true; }
@@ -1372,6 +1414,8 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, &submit_list); count++; } + if (count) + wpc->ioend->io_folios++;
WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list)); WARN_ON_ONCE(!PageLocked(page)); diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index e341d6531e68..d4d531c78ae4 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -191,7 +191,20 @@ xfs_ioend_merge_private( } }
-/* Finish all pending io completions. */ +/* + * Finish all pending IO completions that require transactional modifications. + * + * We try to merge physical and logically contiguous ioends before completion to + * minimise the number of transactions we need to perform during IO completion. + * Both unwritten extent conversion and COW remapping need to iterate and modify + * one physical extent at a time, so we gain nothing by merging physically + * discontiguous extents here. + * + * The ioend chain length that we can be processing here is largely unbound in + * length and we may have to perform significant amounts of work on each ioend + * to complete it. Hence we have to be careful about holding the CPU for too + * long in this loop. + */ void xfs_end_io( struct work_struct *work) @@ -212,6 +225,7 @@ xfs_end_io( list_del_init(&ioend->io_list); iomap_ioend_try_merge(ioend, &tmp, xfs_ioend_merge_private); xfs_end_ioend(ioend); + cond_resched(); } }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h index d84fc7e55e21..0c95321f42fd 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -199,10 +199,12 @@ struct iomap_ioend { struct list_head io_list; /* next ioend in chain */ u16 io_type; u16 io_flags; /* IOMAP_F_* */ + u32 io_folios; /* folios added to ioend */ struct inode *io_inode; /* file being written to */ size_t io_size; /* size of the extent */ loff_t io_offset; /* offset in the file */ void *io_private; /* file system private data */ + sector_t io_sector; /* start sector of ioend */ struct bio *io_bio; /* bio being built */ struct bio io_inline_bio; /* MUST BE LAST! */ };
From: Zhong Jinghua zhongjinghua@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6OSXU
----------------------------------------
commit "md/raid6: refactor raid5_read_one_chunk" incorrectly merged the code. Repeatedly applying for memory leads to memory leaks.
Fix it by removing redundant allocating memory code.
Fixes: c13c2cd20195 ("md/raid6: refactor raid5_read_one_chunk") Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/raid5.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 454d90b785b9..27f961aa2714 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5429,7 +5429,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio) atomic_inc(&rdev->nr_pending); rcu_read_unlock();
- align_bio = bio_clone_fast(raid_bio, GFP_NOIO, &mddev->bio_set); align_bio = bio_clone_fast(raid_bio, GFP_NOIO, &mddev->io_acct_set); md_io_acct = container_of(align_bio, struct md_io_acct, bio_clone); raid_bio->bi_next = (void *)rdev;
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v6.3-rc1 commit 3109de308987ceae413ee015038d51e2a86c7806 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6POXN CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It's possible that kcompactd_run could fail to run kcompactd for a hot added node and leave pgdat->kcompactd as NULL. So pgdat->kcompactd should be checked here to avoid possible NULL pointer dereference.
Link: https://lkml.kernel.org/r/20220418141253.24298-10-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Cc: Charan Teja Kalla charante@codeaurora.org Cc: David Hildenbrand david@redhat.com Cc: Pintu Kumar pintu@codeaurora.org Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Ze Zuo zuoze1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/compaction.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c index 509dee7c2f38..8e8017d909b7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2949,7 +2949,8 @@ static int kcompactd_cpu_online(unsigned int cpu)
if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids) /* One of our CPUs online: restore mask */ - set_cpus_allowed_ptr(pgdat->kcompactd, mask); + if (pgdat->kcompactd) + set_cpus_allowed_ptr(pgdat->kcompactd, mask); } return 0; }
From: Zhong Jinghua zhongjinghua@huawei.com
hulk inclusion category: bugfix bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162
----------------------------------------
This reverts commit 51e35e67694e347cf5cff3aa6ac20daa9b0b8458.
There is a new fix for this problem in the mainline patch, so the patch should return to the mainline solution.
mainline patch: d36a9ea5e776 ("block: fix use-after-free of q->q_usage_counter")
Fixes: 51e35e67694e("block: fix null-deref in percpu_ref_put") Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- block/blk-core.c | 4 +--- block/blk-mq.c | 7 ------- include/linux/blk-mq.h | 1 - include/linux/blkdev.h | 2 -- 4 files changed, 1 insertion(+), 13 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c index f0e28624ef9c..2746db5f54d7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -400,8 +400,7 @@ void blk_cleanup_queue(struct request_queue *q) * prevent that blk_mq_run_hw_queues() accesses the hardware queues * after draining finished. */ - blk_freeze_queue_start(q); - blk_mq_freeze_queue_wait_sync(q); + blk_freeze_queue(q);
rq_qos_exit(q);
@@ -518,7 +517,6 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref) struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter);
- blk_queue_flag_set(QUEUE_FLAG_USAGE_COUNT_SYNC, q); wake_up_all(&q->mq_freeze_wq); }
diff --git a/block/blk-mq.c b/block/blk-mq.c index e0ed97465b0f..5f8660ef8c37 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -195,7 +195,6 @@ void blk_freeze_queue_start(struct request_queue *q) { mutex_lock(&q->mq_freeze_lock); if (++q->mq_freeze_depth == 1) { - blk_queue_flag_clear(QUEUE_FLAG_USAGE_COUNT_SYNC, q); percpu_ref_kill(&q->q_usage_counter); mutex_unlock(&q->mq_freeze_lock); if (queue_is_mq(q)) @@ -206,12 +205,6 @@ void blk_freeze_queue_start(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-void blk_mq_freeze_queue_wait_sync(struct request_queue *q) -{ - wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter) && - test_bit(QUEUE_FLAG_USAGE_COUNT_SYNC, &q->queue_flags)); -} - void blk_mq_freeze_queue_wait(struct request_queue *q) { wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f3d78b939e01..2eca0b76b3e9 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -566,7 +566,6 @@ void blk_mq_freeze_queue(struct request_queue *q); void blk_mq_unfreeze_queue(struct request_queue *q); void blk_freeze_queue_start(struct request_queue *q); void blk_mq_freeze_queue_wait(struct request_queue *q); -void blk_mq_freeze_queue_wait_sync(struct request_queue *q); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b04613bc3ed5..9ede0a81e466 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -643,8 +643,6 @@ struct request_queue { #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ /*at least one blk-mq hctx can't get driver tag */ #define QUEUE_FLAG_HCTX_WAIT 30 -/* sync for q_usage_counter */ -#define QUEUE_FLAG_USAGE_COUNT_SYNC 31
#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.18-rc1 commit ba3e845665fbbb0252336f27200cd5cf288a3573 category: bugfix bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
After blk_cleanup_queue() returns, disk may not be released yet, so probably bio may still be submitted and ->q_usage_counter may be touched, so far this way seems safe, but not good from API's viewpoint.
Move the release q_usage_counter into blk_queue_release().
Signed-off-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220308055200.735835-12-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: block/blk-core.c block/blk-sysfs.c
Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- block/blk-core.c | 2 -- block/blk-sysfs.c | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c index 2746db5f54d7..06fb25bd24df 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -431,8 +431,6 @@ void blk_cleanup_queue(struct request_queue *q) blk_mq_sched_free_rqs(q); mutex_unlock(&q->sysfs_lock);
- percpu_ref_exit(&q->q_usage_counter); - /* @q is and will stay empty, shutdown and put */ blk_put_queue(q); } diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 780f02cbda84..35cd1df55680 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -784,6 +784,8 @@ static void blk_release_queue(struct kobject *kobj)
might_sleep();
+ percpu_ref_exit(&q->q_usage_counter); + if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); blk_stat_free_callback(q->poll_cb);
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v6.2-rc1 commit d36a9ea5e7766961e753ee38d4c331bbe6ef659b category: bugfix bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
For blk-mq, queue release handler is usually called after blk_mq_freeze_queue_wait() returns. However, the q_usage_counter->release() handler may not be run yet at that time, so this can cause a use-after-free.
Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu(). Since ->release() is called with rcu read lock held, it is agreed that the race should be covered in caller per discussion from the two links.
Reported-by: Zhang Wensheng zhangwensheng@huaweicloud.com Reported-by: Zhong Jinghua zhongjinghua@huawei.com Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/ Cc: Hillf Danton hdanton@sina.com Cc: Yu Kuai yukuai3@huawei.com Cc: Dennis Zhou dennis@kernel.org Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Signed-off-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: block/blk-sysfs.c block/blk-core.c
Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- block/blk-sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 35cd1df55680..169e63ee05cb 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -726,6 +726,8 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) { struct request_queue *q = container_of(rcu_head, struct request_queue, rcu_head); + + percpu_ref_exit(&q->q_usage_counter); kmem_cache_free(blk_requestq_cachep, q); }
@@ -784,8 +786,6 @@ static void blk_release_queue(struct kobject *kobj)
might_sleep();
- percpu_ref_exit(&q->q_usage_counter); - if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); blk_stat_free_callback(q->poll_cb);