From: Kees Cook keescook@chromium.org
mainline inclusion from mainline-v5.13-rc1 commit 39218ff4c625dbf2e68224024fe0acaa60bcd51a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This provides the ability for architectures to enable kernel stack base address offset randomization. This feature is controlled by the boot param "randomize_kstack_offset=on/off", with its default value set by CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
This feature is based on the original idea from the last public release of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt All the credit for the original idea goes to the PaX team. Note that the design and implementation of this upstream randomize_kstack_offset feature differs greatly from the RANDKSTACK feature (see below).
Reasoning for the feature:
This feature aims to make harder the various stack-based attacks that rely on deterministic stack structure. We have had many such attacks in past (just to name few):
https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux...
As Linux kernel stack protections have been constantly improving (vmap-based stack allocation with guard pages, removal of thread_info, STACKLEAK), attackers have had to find new ways for their exploits to work. They have done so, continuing to rely on the kernel's stack determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT were not relevant. For example, the following recent attacks would have been hampered if the stack offset was non-deterministic between syscalls:
https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf (page 70: targeting the pt_regs copy with linear stack overflow)
https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html (leaked stack address from one syscall as a target during next syscall)
The main idea is that since the stack offset is randomized on each system call, it is harder for an attack to reliably land in any particular place on the thread stack, even with address exposures, as the stack base will change on the next syscall. Also, since randomization is performed after placing pt_regs, the ptrace-based approach[1] to discover the randomized offset during a long-running syscall should not be possible.
Design description:
During most of the kernel's execution, it runs on the "thread stack", which is pretty deterministic in its structure: it is fixed in size, and on every entry from userspace to kernel on a syscall the thread stack starts construction from an address fetched from the per-cpu cpu_current_top_of_stack variable. The first element to be pushed to the thread stack is the pt_regs struct that stores all required CPU registers and syscall parameters. Finally the specific syscall function is called, with the stack being used as the kernel executes the resulting request.
The goal of randomize_kstack_offset feature is to add a random offset after the pt_regs has been pushed to the stack and before the rest of the thread stack is used during the syscall processing, and to change it every time a process issues a syscall. The source of randomness is currently architecture-defined (but x86 is using the low byte of rdtsc()). Future improvements for different entropy sources is possible, but out of scope for this patch. Further more, to add more unpredictability, new offsets are chosen at the end of syscalls (the timing of which should be less easy to measure from userspace than at syscall entry time), and stored in a per-CPU variable, so that the life of the value does not stay explicitly tied to a single task.
As suggested by Andy Lutomirski, the offset is added using alloca() and an empty asm() statement with an output constraint, since it avoids changes to assembly syscall entry code, to the unwinder, and provides correct stack alignment as defined by the compiler.
In order to make this available by default with zero performance impact for those that don't want it, it is boot-time selectable with static branches. This way, if the overhead is not wanted, it can just be left turned off with no performance impact.
The generated assembly for x86_64 with GCC looks like this:
... ffffffff81003977: 65 8b 05 02 ea 00 7f mov %gs:0x7f00ea02(%rip),%eax # 12380 <kstack_offset> ffffffff8100397e: 25 ff 03 00 00 and $0x3ff,%eax ffffffff81003983: 48 83 c0 0f add $0xf,%rax ffffffff81003987: 25 f8 07 00 00 and $0x7f8,%eax ffffffff8100398c: 48 29 c4 sub %rax,%rsp ffffffff8100398f: 48 8d 44 24 0f lea 0xf(%rsp),%rax ffffffff81003994: 48 83 e0 f0 and $0xfffffffffffffff0,%rax ...
As a result of the above stack alignment, this patch introduces about 5 bits of randomness after pt_regs is spilled to the thread stack on x86_64, and 6 bits on x86_32 (since its has 1 fewer bit required for stack alignment). The amount of entropy could be adjusted based on how much of the stack space we wish to trade for security.
My measure of syscall performance overhead (on x86_64):
lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null randomize_kstack_offset=y Simple syscall: 0.7082 microseconds randomize_kstack_offset=n Simple syscall: 0.7016 microseconds
So, roughly 0.9% overhead growth for a no-op syscall, which is very manageable. And for people that don't want this, it's off by default.
There are two gotchas with using the alloca() trick. First, compilers that have Stack Clash protection (-fstack-clash-protection) enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to any dynamic stack allocations. While the randomization offset is always less than a page, the resulting assembly would still contain (unreachable!) probing routines, bloating the resulting assembly. To avoid this, -fno-stack-clash-protection is unconditionally added to the kernel Makefile since this is the only dynamic stack allocation in the kernel (now that VLAs have been removed) and it is provably safe from Stack Clash style attacks.
The second gotcha with alloca() is a negative interaction with -fstack-protector*, in that it sees the alloca() as an array allocation, which triggers the unconditional addition of the stack canary function pre/post-amble which slows down syscalls regardless of the static branch. In order to avoid adding this unneeded check and its associated performance impact, architectures need to carefully remove uses of -fstack-protector-strong (or -fstack-protector) in the compilation units that use the add_random_kstack() macro and to audit the resulting stack mitigation coverage (to make sure no desired coverage disappears). No change is visible for this on x86 because the stack protector is already unconditionally disabled for the compilation unit, but the change is required on arm64. There is, unfortunately, no attribute that can be used to disable stack protector for specific functions.
Comparison to PaX RANDKSTACK feature:
The RANDKSTACK feature randomizes the location of the stack start (cpu_current_top_of_stack), i.e. including the location of pt_regs structure itself on the stack. Initially this patch followed the same approach, but during the recent discussions[2], it has been determined to be of a little value since, if ptrace functionality is available for an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write different offsets in the pt_regs struct, observe the cache behavior of the pt_regs accesses, and figure out the random stack offset. Another difference is that the random offset is stored in a per-cpu variable, rather than having it be per-thread. As a result, these implementations differ a fair bit in their implementation details and results, though obviously the intent is similar.
[1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4B... [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshet... [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html
Co-developed-by: Elena Reshetova elena.reshetova@intel.com Signed-off-by: Elena Reshetova elena.reshetova@intel.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
conflict: Documentation/admin-guide/kernel-parameters.txt arch/Kconfig
Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- .../admin-guide/kernel-parameters.txt | 11 ++++ Makefile | 4 ++ arch/Kconfig | 23 ++++++++ include/linux/randomize_kstack.h | 54 +++++++++++++++++++ init/main.c | 23 ++++++++ 5 files changed, 115 insertions(+) create mode 100644 include/linux/randomize_kstack.h
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0ec4c66752af..dca5a96e49b2 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4258,6 +4258,17 @@ fully seed the kernel's CRNG. Default is controlled by CONFIG_RANDOM_TRUST_BOOTLOADER.
+ randomize_kstack_offset= + [KNL] Enable or disable kernel stack offset + randomization, which provides roughly 5 bits of + entropy, frustrating memory corruption attacks + that depend on stack address determinism or + cross-syscall address exposures. This is only + available on architectures that have defined + CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET. + Format: <bool> (1/Y/y=enable, 0/N/n=disable) + Default is CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. + ras=option[,option,...] [KNL] RAS-specific options
cec_disable [X86] diff --git a/Makefile b/Makefile index 6fa75aa5836d..df7e3d33a9e1 100644 --- a/Makefile +++ b/Makefile @@ -825,6 +825,10 @@ KBUILD_CFLAGS += -ftrivial-auto-var-init=zero KBUILD_CFLAGS += -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang endif
+# While VLAs have been removed, GCC produces unreachable stack probes +# for the randomize_kstack_offset feature. Disable it for all compilers. +KBUILD_CFLAGS += $(call cc-option, -fno-stack-clash-protection) + DEBUG_CFLAGS :=
# Workaround for GCC versions < 5.0 diff --git a/arch/Kconfig b/arch/Kconfig index 7800502d9b6e..97fc1bdcdc8d 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -989,6 +989,29 @@ config VMAP_STACK virtual mappings with real shadow memory, and KASAN_VMALLOC must be enabled.
+config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET + def_bool n + help + An arch should select this symbol if it can support kernel stack + offset randomization with calls to add_random_kstack_offset() + during syscall entry and choose_random_kstack_offset() during + syscall exit. Careful removal of -fstack-protector-strong and + -fstack-protector should also be applied to the entry code and + closely examined, as the artificial stack bump looks like an array + to the compiler, so it will attempt to add canary checks regardless + of the static branch state. + +config RANDOMIZE_KSTACK_OFFSET_DEFAULT + bool "Randomize kernel stack offset on syscall entry" + depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET + help + The kernel stack offset can be randomized (after pt_regs) by + roughly 5 bits of entropy, frustrating memory corruption + attacks that depend on stack address determinism or + cross-syscall address exposures. This feature is controlled + by kernel boot param "randomize_kstack_offset=on/off", and this + config chooses the default boot state. + config ARCH_OPTIONAL_KERNEL_RWX def_bool n
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h new file mode 100644 index 000000000000..fd80fab663a9 --- /dev/null +++ b/include/linux/randomize_kstack.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _LINUX_RANDOMIZE_KSTACK_H +#define _LINUX_RANDOMIZE_KSTACK_H + +#include <linux/kernel.h> +#include <linux/jump_label.h> +#include <linux/percpu-defs.h> + +DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, + randomize_kstack_offset); +DECLARE_PER_CPU(u32, kstack_offset); + +/* + * Do not use this anywhere else in the kernel. This is used here because + * it provides an arch-agnostic way to grow the stack with correct + * alignment. Also, since this use is being explicitly masked to a max of + * 10 bits, stack-clash style attacks are unlikely. For more details see + * "VLAs" in Documentation/process/deprecated.rst + */ +void *__builtin_alloca(size_t size); +/* + * Use, at most, 10 bits of entropy. We explicitly cap this to keep the + * "VLA" from being unbounded (see above). 10 bits leaves enough room for + * per-arch offset masks to reduce entropy (by removing higher bits, since + * high entropy may overly constrain usable stack space), and for + * compiler/arch-specific stack alignment to remove the lower bits. + */ +#define KSTACK_OFFSET_MAX(x) ((x) & 0x3FF) + +/* + * These macros must be used during syscall entry when interrupts and + * preempt are disabled, and after user registers have been stored to + * the stack. + */ +#define add_random_kstack_offset() do { \ + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ + &randomize_kstack_offset)) { \ + u32 offset = raw_cpu_read(kstack_offset); \ + u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset)); \ + /* Keep allocation even after "ptr" loses scope. */ \ + asm volatile("" : "=o"(*ptr) :: "memory"); \ + } \ +} while (0) + +#define choose_random_kstack_offset(rand) do { \ + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ + &randomize_kstack_offset)) { \ + u32 offset = raw_cpu_read(kstack_offset); \ + offset ^= (rand); \ + raw_cpu_write(kstack_offset, offset); \ + } \ +} while (0) + +#endif diff --git a/init/main.c b/init/main.c index 7f4e8a8964b1..e1d179fa1f4a 100644 --- a/init/main.c +++ b/init/main.c @@ -846,6 +846,29 @@ static void __init mm_init(void) pti_init(); }
+#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET +DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, + randomize_kstack_offset); +DEFINE_PER_CPU(u32, kstack_offset); + +static int __init early_randomize_kstack_offset(char *buf) +{ + int ret; + bool bool_result; + + ret = kstrtobool(buf, &bool_result); + if (ret) + return ret; + + if (bool_result) + static_branch_enable(&randomize_kstack_offset); + else + static_branch_disable(&randomize_kstack_offset); + return 0; +} +early_param("randomize_kstack_offset", early_randomize_kstack_offset); +#endif + void __init __weak arch_call_rest_init(void) { rest_init();
From: Kees Cook keescook@chromium.org
mainline inclusion from mainline-v5.13-rc1 commit fe950f6020338c8ac668ef823bb692d36b7542a2 category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Allow for a randomized stack offset on a per-syscall basis, with roughly 5-6 bits of entropy, depending on compiler and word size. Since the method of offsetting uses macros, this cannot live in the common entry code (the stack offset needs to be retained for the life of the syscall, which means it needs to happen at the actual entry point).
Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20210401232347.2791257-5-keescook@chromium.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/Kconfig | 1 + arch/x86/entry/common.c | 3 +++ arch/x86/include/asm/entry-common.h | 16 ++++++++++++++++ 3 files changed, 20 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a787f309830c..d25c983a3b64 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -164,6 +164,7 @@ config X86 select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64 select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD select HAVE_ARCH_VMAP_STACK if X86_64 + select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET select HAVE_ARCH_WITHIN_STACK_FRAMES select HAVE_ASM_MODVERSIONS select HAVE_CMPXCHG_DOUBLE diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 93a3122cd15f..1f179e317f0c 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -38,6 +38,7 @@ #ifdef CONFIG_X86_64 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs) { + add_random_kstack_offset(); nr = syscall_enter_from_user_mode(regs, nr);
instrumentation_begin(); @@ -83,6 +84,7 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs) { unsigned int nr = syscall_32_enter(regs);
+ add_random_kstack_offset(); /* * Subtlety here: if ptrace pokes something larger than 2^32-1 into * orig_ax, the unsigned int return value truncates it. This may @@ -102,6 +104,7 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs) unsigned int nr = syscall_32_enter(regs); int res;
+ add_random_kstack_offset(); /* * This cannot use syscall_enter_from_user_mode() as it has to * fetch EBP before invoking any of the syscall entry work diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index 4a382fb6a9ef..50ded30ae54e 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -2,6 +2,7 @@ #ifndef _ASM_X86_ENTRY_COMMON_H #define _ASM_X86_ENTRY_COMMON_H
+#include <linux/randomize_kstack.h> #include <linux/user-return-notifier.h>
#include <asm/nospec-branch.h> @@ -72,6 +73,21 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, */ current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED); #endif + + /* + * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(), + * but not enough for x86 stack utilization comfort. To keep + * reasonable stack head room, reduce the maximum offset to 8 bits. + * + * The actual entropy will be further reduced by the compiler when + * applying stack alignment constraints (see cc_stack_align4/8 in + * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32) + * low bits from any entropy chosen here. + * + * Therefore, final stack offset entropy will be 5 (x86_64) or + * 6 (ia32) bits. + */ + choose_random_kstack_offset(rdtsc() & 0xFF); } #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
From: Kees Cook keescook@chromium.org
mainline inclusion from mainline-v5.13-rc1 commit 70918779aec9bd01d16f4e6e800ffe423d196021 category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Allow for a randomized stack offset on a per-syscall basis, with roughly 5 bits of entropy. (And include AAPCS rationale AAPCS thanks to Mark Rutland.)
In order to avoid unconditional stack canaries on syscall entry (due to the use of alloca()), also disable stack protector to avoid triggering needless checks and slowing down the entry path. As there is no general way to control stack protector coverage with a function attribute[1], this must be disabled at the compilation unit level. This isn't a problem here, though, since stack protector was not triggered before: examining the resulting syscall.o, there are no changes in canary coverage (none before, none now).
[1] a working __attribute__((no_stack_protector)) has been added to GCC and Clang but has not been released in any version yet: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=346b302d09c1e6db56d9fe69048ac... https://reviews.llvm.org/rG4fbf84c1732fca596ad1d6e96015e19760eb8a9b
Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20210401232347.2791257-6-keescook@chromium.org
conflict: arch/arm64/Kconfig arch/arm64/kernel/syscall.c
Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/Kconfig | 1 + arch/arm64/kernel/Makefile | 5 +++++ arch/arm64/kernel/syscall.c | 16 ++++++++++++++++ 3 files changed, 22 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index df28741e49c9..969f0889fbea 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -146,6 +146,7 @@ config ARM64 select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT select HAVE_ARCH_PREL32_RELOCATIONS + select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_STACKLEAK select HAVE_ARCH_THREAD_STRUCT_WHITELIST diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 4cf75b247461..312c164db2ed 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -9,6 +9,11 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
+# Remove stack protector to avoid triggering unneeded stack canary +# checks due to randomize_kstack_offset. +CFLAGS_REMOVE_syscall.o = -fstack-protector -fstack-protector-strong +CFLAGS_syscall.o += -fno-stack-protector + # Object file lists. obj-y := debug-monitors.o entry.o irq.o fpsimd.o \ entry-common.o entry-fpsimd.o process.o ptrace.o \ diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 66ca9534bd69..2a106e67a4cb 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -5,6 +5,7 @@ #include <linux/errno.h> #include <linux/nospec.h> #include <linux/ptrace.h> +#include <linux/randomize_kstack.h> #include <linux/syscalls.h>
#include <asm/daifflags.h> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno, { long ret;
+ add_random_kstack_offset(); + if (scno < sc_nr) { syscall_fn_t syscall_fn; syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)]; @@ -51,6 +54,19 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno, }
syscall_set_return_value(current, regs, 0, ret); + + /* + * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(), + * but not enough for arm64 stack utilization comfort. To keep + * reasonable stack head room, reduce the maximum offset to 9 bits. + * + * The actual entropy will be further reduced by the compiler when + * applying stack alignment constraints: the AAPCS mandates a + * 16-byte (i.e. 4-bit) aligned SP at function boundaries. + * + * The resulting 5 bits of entropy is seen in SP[8:4]. + */ + choose_random_kstack_offset(get_random_int() & 0x1FF); }
static inline bool has_syscall_work(unsigned long flags)
From: Kees Cook keescook@chromium.org
mainline inclusion from mainline-v5.13-rc1 commit 68ef8735d253f3d840082b78f996bf2d89ee6e5f category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
For validating the stack offset behavior, report the offset from a given process's first seen stack address. Add s script to calculate the results to the LKDTM kselftests.
Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20210401232347.2791257-7-keescook@chromium.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/misc/lkdtm/bugs.c | 17 +++++++++ drivers/misc/lkdtm/core.c | 1 + drivers/misc/lkdtm/lkdtm.h | 1 + tools/testing/selftests/lkdtm/.gitignore | 1 + tools/testing/selftests/lkdtm/Makefile | 1 + .../testing/selftests/lkdtm/stack-entropy.sh | 36 +++++++++++++++++++ 6 files changed, 57 insertions(+) create mode 100755 tools/testing/selftests/lkdtm/stack-entropy.sh
diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c index d39b8139b096..5094835d2567 100644 --- a/drivers/misc/lkdtm/bugs.c +++ b/drivers/misc/lkdtm/bugs.c @@ -134,6 +134,23 @@ noinline void lkdtm_CORRUPT_STACK_STRONG(void) __lkdtm_CORRUPT_STACK((void *)&data); }
+static pid_t stack_pid; +static unsigned long stack_addr; + +void lkdtm_REPORT_STACK(void) +{ + volatile uintptr_t magic; + pid_t pid = task_pid_nr(current); + + if (pid != stack_pid) { + pr_info("Starting stack offset tracking for pid %d\n", pid); + stack_pid = pid; + stack_addr = (uintptr_t)&magic; + } + + pr_info("Stack offset: %d\n", (int)(stack_addr - (uintptr_t)&magic)); +} + void lkdtm_UNALIGNED_LOAD_STORE_WRITE(void) { static u8 data[5] __attribute__((aligned(4))) = {1, 2, 3, 4, 5}; diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 32b3d77368e3..2c0bb716a119 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -110,6 +110,7 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(EXHAUST_STACK), CRASHTYPE(CORRUPT_STACK), CRASHTYPE(CORRUPT_STACK_STRONG), + CRASHTYPE(REPORT_STACK), CRASHTYPE(CORRUPT_LIST_ADD), CRASHTYPE(CORRUPT_LIST_DEL), CRASHTYPE(STACK_GUARD_PAGE_LEADING), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 6dec4c9b442f..6e35ab171ea3 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -17,6 +17,7 @@ void lkdtm_LOOP(void); void lkdtm_EXHAUST_STACK(void); void lkdtm_CORRUPT_STACK(void); void lkdtm_CORRUPT_STACK_STRONG(void); +void lkdtm_REPORT_STACK(void); void lkdtm_UNALIGNED_LOAD_STORE_WRITE(void); void lkdtm_SOFTLOCKUP(void); void lkdtm_HARDLOCKUP(void); diff --git a/tools/testing/selftests/lkdtm/.gitignore b/tools/testing/selftests/lkdtm/.gitignore index f26212605b6b..d4b0be857deb 100644 --- a/tools/testing/selftests/lkdtm/.gitignore +++ b/tools/testing/selftests/lkdtm/.gitignore @@ -1,2 +1,3 @@ *.sh !run.sh +!stack-entropy.sh diff --git a/tools/testing/selftests/lkdtm/Makefile b/tools/testing/selftests/lkdtm/Makefile index 1bcc9ee990eb..c71109ceeb2d 100644 --- a/tools/testing/selftests/lkdtm/Makefile +++ b/tools/testing/selftests/lkdtm/Makefile @@ -5,6 +5,7 @@ include ../lib.mk
# NOTE: $(OUTPUT) won't get default value if used before lib.mk TEST_FILES := tests.txt +TEST_PROGS := stack-entropy.sh TEST_GEN_PROGS = $(patsubst %,$(OUTPUT)/%.sh,$(shell awk '{print $$1}' tests.txt | sed -e 's/#//')) all: $(TEST_GEN_PROGS)
diff --git a/tools/testing/selftests/lkdtm/stack-entropy.sh b/tools/testing/selftests/lkdtm/stack-entropy.sh new file mode 100755 index 000000000000..b1b8a5097cbb --- /dev/null +++ b/tools/testing/selftests/lkdtm/stack-entropy.sh @@ -0,0 +1,36 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Measure kernel stack entropy by sampling via LKDTM's REPORT_STACK test. +set -e +samples="${1:-1000}" + +# Capture dmesg continuously since it may fill up depending on sample size. +log=$(mktemp -t stack-entropy-XXXXXX) +dmesg --follow >"$log" & pid=$! +report=-1 +for i in $(seq 1 $samples); do + echo "REPORT_STACK" >/sys/kernel/debug/provoke-crash/DIRECT + if [ -t 1 ]; then + percent=$(( 100 * $i / $samples )) + if [ "$percent" -ne "$report" ]; then + /bin/echo -en "$percent%\r" + report="$percent" + fi + fi +done +kill "$pid" + +# Count unique offsets since last run. +seen=$(tac "$log" | grep -m1 -B"$samples"0 'Starting stack offset' | \ + grep 'Stack offset' | awk '{print $NF}' | sort | uniq -c | wc -l) +bits=$(echo "obase=2; $seen" | bc | wc -L) +echo "Bits of stack entropy: $bits" +rm -f "$log" + +# We would expect any functional stack randomization to be at least 5 bits. +if [ "$bits" -lt 5 ]; then + exit 1 +else + exit 0 +fi
From: Nick Desaulniers ndesaulniers@google.com
mainline inclusion from mainline-v5.13-rc1 commit 2515dd6ce8e545b0b2eece84920048ef9ed846c4 category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
"o" isn't a common asm() constraint to use; it triggers an assertion in assert-enabled builds of LLVM that it's not recognized when targeting aarch64 (though it appears to fall back to "m"). It's fixed in LLVM 13 now, but there isn't really a good reason to use "o" in particular here. To avoid causing build issues for those using assert-enabled builds of earlier LLVM versions, the constraint needs changing.
Instead, if the point is to retain the __builtin_alloca(), make ptr appear to "escape" via being an input to an empty inline asm block. This is preferable anyways, since otherwise this looks like a dead store.
While the use of "r" was considered in
https://lore.kernel.org/lkml/202104011447.2E7F543@keescook/
it was only tested as an output (which looks like a dead store, and wasn't sufficient).
Use "r" as an input constraint instead, which behaves correctly across compilers and architectures.
Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Kees Cook keescook@chromium.org Tested-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Nathan Chancellor nathan@kernel.org Link: https://reviews.llvm.org/D100412 Link: https://bugs.llvm.org/show_bug.cgi?id=49956 Link: https://lore.kernel.org/r/20210419231741.4084415-1-keescook@chromium.org Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/randomize_kstack.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index fd80fab663a9..bebc911161b6 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -38,7 +38,7 @@ void *__builtin_alloca(size_t size); u32 offset = raw_cpu_read(kstack_offset); \ u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset)); \ /* Keep allocation even after "ptr" loses scope. */ \ - asm volatile("" : "=o"(*ptr) :: "memory"); \ + asm volatile("" :: "r"(ptr) : "memory"); \ } \ } while (0)
From: Marco Elver elver@google.com
mainline inclusion from mainline-v5.18-rc1 commit 8cb37a5974a48569aab8a1736d21399fddbdbdb2 category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The randomize_kstack_offset feature is unconditionally compiled in when the architecture supports it.
To add constraints on compiler versions, we require a dedicated Kconfig variable. Therefore, introduce RANDOMIZE_KSTACK_OFFSET.
Furthermore, this option is now also configurable by EXPERT kernels: while the feature is supposed to have zero performance overhead when disabled, due to its use of static branches, there are few cases where giving a distribution the option to disable the feature entirely makes sense. For example, in very resource constrained environments, which would never enable the feature to begin with, in which case the additional kernel code size increase would be redundant.
Signed-off-by: Marco Elver elver@google.com Reviewed-by: Nathan Chancellor nathan@kernel.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Kees Cook keescook@chromium.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20220131090521.1947110-1-elver@google.com Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/Kconfig | 23 ++++++++++++++++++----- include/linux/randomize_kstack.h | 5 +++++ init/main.c | 2 +- 3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig index 97fc1bdcdc8d..28d570a63f43 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1001,16 +1001,29 @@ config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET to the compiler, so it will attempt to add canary checks regardless of the static branch state.
-config RANDOMIZE_KSTACK_OFFSET_DEFAULT - bool "Randomize kernel stack offset on syscall entry" +config RANDOMIZE_KSTACK_OFFSET + bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT + default y depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET help The kernel stack offset can be randomized (after pt_regs) by roughly 5 bits of entropy, frustrating memory corruption attacks that depend on stack address determinism or - cross-syscall address exposures. This feature is controlled - by kernel boot param "randomize_kstack_offset=on/off", and this - config chooses the default boot state. + cross-syscall address exposures. + + The feature is controlled via the "randomize_kstack_offset=on/off" + kernel boot param, and if turned off has zero overhead due to its use + of static branches (see JUMP_LABEL). + + If unsure, say Y. + +config RANDOMIZE_KSTACK_OFFSET_DEFAULT + bool "Default state of kernel stack offset randomization" + depends on RANDOMIZE_KSTACK_OFFSET + help + Kernel stack offset randomization is controlled by kernel boot param + "randomize_kstack_offset=on/off", and this config chooses the default + boot state.
config ARCH_OPTIONAL_KERNEL_RWX def_bool n diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index bebc911161b6..91f1b990a3c3 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -2,6 +2,7 @@ #ifndef _LINUX_RANDOMIZE_KSTACK_H #define _LINUX_RANDOMIZE_KSTACK_H
+#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET #include <linux/kernel.h> #include <linux/jump_label.h> #include <linux/percpu-defs.h> @@ -50,5 +51,9 @@ void *__builtin_alloca(size_t size); raw_cpu_write(kstack_offset, offset); \ } \ } while (0) +#else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */ +#define add_random_kstack_offset() do { } while (0) +#define choose_random_kstack_offset(rand) do { } while (0) +#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
#endif diff --git a/init/main.c b/init/main.c index e1d179fa1f4a..3660c43291ce 100644 --- a/init/main.c +++ b/init/main.c @@ -846,7 +846,7 @@ static void __init mm_init(void) pti_init(); }
-#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, randomize_kstack_offset); DEFINE_PER_CPU(u32, kstack_offset);
From: "GONG, Ruiqi" gongruiqi1@huawei.com
mainline inclusion from mainline-v6.0-rc1 commit 375561bd6195a31bf4c109732bd538cb97a941f4 category: featrue bugzilla: https://gitee.com/openeuler/kernel/issues/I5YQ6Z CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Fix the following Sparse warnings that got noticed when the PPC-dev patchwork was checking another patch (see the link below):
init/main.c:862:1: warning: symbol 'randomize_kstack_offset' was not declared. Should it be static? init/main.c:864:1: warning: symbol 'kstack_offset' was not declared. Should it be static?
Which in fact are triggered on all architectures that have HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET support (for instances x86, arm64 etc).
Link: https://lore.kernel.org/lkml/e7b0d68b-914d-7283-827c-101988923929@huawei.com... Cc: Christophe Leroy christophe.leroy@csgroup.eu Cc: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: GONG, Ruiqi gongruiqi1@huawei.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20220629060423.2515693-1-gongruiqi1@huawei.com
conflict: init/main.c
Signed-off-by: Yi Yang yiyang13@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- init/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/init/main.c b/init/main.c index 3660c43291ce..21b65f18ba83 100644 --- a/init/main.c +++ b/init/main.c @@ -99,6 +99,7 @@ #include <linux/mem_encrypt.h> #include <linux/kcsan.h> #include <linux/init_syscalls.h> +#include <linux/randomize_kstack.h>
#include <asm/io.h> #include <asm/bugs.h>
From: Tejun Heo tj@kernel.org
mainline inclusion from mainline-v5.18-rc1 commit 6b2b04590b51aa4cf395fcd185ce439cab5961dc category: bugfix bugzilla: 187443, https://gitee.com/openeuler/kernel/issues/I5Z7O2 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs...
---------------------------
blk-iocost and iolatency are cgroup aware rq-qos policies but they didn't disable merges across different cgroups. This obviously can lead to accounting and control errors but more importantly to priority inversions - e.g. an IO which belongs to a higher priority cgroup or IO class may end up getting throttled incorrectly because it gets merged to an IO issued from a low priority cgroup.
Fix it by adding blk_cgroup_mergeable() which is called from merge paths and rejects cross-cgroup and cross-issue_as_root merges.
Signed-off-by: Tejun Heo tj@kernel.org Fixes: d70675121546 ("block: introduce blk-iolatency io controller") Cc: stable@vger.kernel.org # v4.19+ Cc: Josef Bacik jbacik@fb.com Link: https://lore.kernel.org/r/Yi/eE/6zFNyWJ+qd@slm.duckdns.org Signed-off-by: Jens Axboe axboe@kernel.dk
conflicts: block/blk-merge.c include/linux/blk-cgroup.h
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-merge.c | 11 +++++++++++ include/linux/blk-cgroup.h | 17 +++++++++++++++++ 2 files changed, 28 insertions(+)
diff --git a/block/blk-merge.c b/block/blk-merge.c index cfb88d2fbf38..f3fca8bb326d 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -7,6 +7,7 @@ #include <linux/bio.h> #include <linux/blkdev.h> #include <linux/scatterlist.h> +#include <linux/blk-cgroup.h>
#include <trace/events/block.h>
@@ -554,6 +555,9 @@ static inline unsigned int blk_rq_get_max_segments(struct request *rq) static inline int ll_new_hw_segment(struct request *req, struct bio *bio, unsigned int nr_phys_segs) { + if (!blk_cgroup_mergeable(req, bio)) + goto no_merge; + if (blk_integrity_merge_bio(req->q, req, bio) == false) goto no_merge;
@@ -650,6 +654,9 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, if (total_phys_segments > blk_rq_get_max_segments(req)) return 0;
+ if (!blk_cgroup_mergeable(req, next->bio)) + return 0; + if (blk_integrity_merge_rq(q, req, next) == false) return 0;
@@ -860,6 +867,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (rq->rq_disk != bio->bi_disk) return false;
+ /* don't merge across cgroup boundaries */ + if (!blk_cgroup_mergeable(rq, bio)) + return false; + /* only merge integrity protected bio into ditto rq */ if (blk_integrity_merge_bio(rq->q, rq, bio) == false) return false; diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index b44db9835489..dac9804907df 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -25,6 +25,7 @@ #include <linux/atomic.h> #include <linux/kthread.h> #include <linux/fs.h> +#include <linux/blk-mq.h>
/* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) @@ -610,6 +611,21 @@ static inline void blkcg_clear_delay(struct blkcg_gq *blkg) atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); }
+/** + * blk_cgroup_mergeable - Determine whether to allow or disallow merges + * @rq: request to merge into + * @bio: bio to merge + * + * @bio and @rq should belong to the same cgroup and their issue_as_root should + * match. The latter is necessary as we don't want to throttle e.g. a metadata + * update because it happens to be next to a regular IO. + */ +static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) +{ + return rq->bio->bi_blkg == bio->bi_blkg && + bio_issue_as_root_blkg(rq->bio) == bio_issue_as_root_blkg(bio); +} + void blk_cgroup_bio_start(struct bio *bio); void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); @@ -665,6 +681,7 @@ static inline void blkg_put(struct blkcg_gq *blkg) { } static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } static inline void blkcg_bio_issue_init(struct bio *bio) { } static inline void blk_cgroup_bio_start(struct bio *bio) { } +static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) { return true; }
#define blk_queue_for_each_rl(rl, q) \ for ((rl) = &(q)->root_rl; (rl); (rl) = NULL)
From: Li Nan linan122@huawei.com
hulk inclusion category: bugfix bugzilla: 187443, https://gitee.com/openeuler/kernel/issues/I5Z7O2 CVE: NA
--------------------------------
Include additional files and add new function will cause kabi broken. So move changes to blk-mq.h. bio_issue_as_root_blkg() is needed by blk_cgroup_mergeable(), move it together. It is used by iocost, too, so add blk-mq.h to blk-iocost.c.
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-iocost.c | 1 + block/blk-merge.c | 1 - block/blk-mq.h | 34 ++++++++++++++++++++++++++++++++++ include/linux/blk-cgroup.h | 33 --------------------------------- 4 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 08e4ba856e3b..462dbb766ed1 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -184,6 +184,7 @@ #include "blk-rq-qos.h" #include "blk-stat.h" #include "blk-wbt.h" +#include "blk-mq.h"
#ifdef CONFIG_TRACEPOINTS
diff --git a/block/blk-merge.c b/block/blk-merge.c index f3fca8bb326d..1ed1b5146db2 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -7,7 +7,6 @@ #include <linux/bio.h> #include <linux/blkdev.h> #include <linux/scatterlist.h> -#include <linux/blk-cgroup.h>
#include <trace/events/block.h>
diff --git a/block/blk-mq.h b/block/blk-mq.h index 5572277cf9a3..1c86f7d56e72 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -338,5 +338,39 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx, return __blk_mq_active_requests(hctx) < depth; }
+/** + * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg + * @return: true if this bio needs to be submitted with the root blkg context. + * + * In order to avoid priority inversions we sometimes need to issue a bio as if + * it were attached to the root blkg, and then backcharge to the actual owning + * blkg. The idea is we do bio_blkcg() to look up the actual context for the + * bio and attach the appropriate blkg to the bio. Then we call this helper and + * if it is true run with the root blkg for that queue and then do any + * backcharging to the originating cgroup once the io is complete. + */ +static inline bool bio_issue_as_root_blkg(struct bio *bio) +{ + return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; +} + +#ifdef CONFIG_BLK_CGROUP +/** + * blk_cgroup_mergeable - Determine whether to allow or disallow merges + * @rq: request to merge into + * @bio: bio to merge + * + * @bio and @rq should belong to the same cgroup and their issue_as_root should + * match. The latter is necessary as we don't want to throttle e.g. a metadata + * update because it happens to be next to a regular IO. + */ +static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) +{ + return rq->bio->bi_blkg == bio->bi_blkg && + bio_issue_as_root_blkg(rq->bio) == bio_issue_as_root_blkg(bio); +} +#else /* CONFIG_BLK_CGROUP */ +static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) { return true; } +#endif /* CONFIG_BLK_CGROUP */
#endif diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index dac9804907df..994ff06de40f 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -25,7 +25,6 @@ #include <linux/atomic.h> #include <linux/kthread.h> #include <linux/fs.h> -#include <linux/blk-mq.h>
/* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) @@ -297,22 +296,6 @@ static inline bool blk_cgroup_congested(void) return ret; }
-/** - * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg - * @return: true if this bio needs to be submitted with the root blkg context. - * - * In order to avoid priority inversions we sometimes need to issue a bio as if - * it were attached to the root blkg, and then backcharge to the actual owning - * blkg. The idea is we do bio_blkcg() to look up the actual context for the - * bio and attach the appropriate blkg to the bio. Then we call this helper and - * if it is true run with the root blkg for that queue and then do any - * backcharging to the originating cgroup once the io is complete. - */ -static inline bool bio_issue_as_root_blkg(struct bio *bio) -{ - return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; -} - /** * blkcg_parent - get the parent of a blkcg * @blkcg: blkcg of interest @@ -611,21 +594,6 @@ static inline void blkcg_clear_delay(struct blkcg_gq *blkg) atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); }
-/** - * blk_cgroup_mergeable - Determine whether to allow or disallow merges - * @rq: request to merge into - * @bio: bio to merge - * - * @bio and @rq should belong to the same cgroup and their issue_as_root should - * match. The latter is necessary as we don't want to throttle e.g. a metadata - * update because it happens to be next to a regular IO. - */ -static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) -{ - return rq->bio->bi_blkg == bio->bi_blkg && - bio_issue_as_root_blkg(rq->bio) == bio_issue_as_root_blkg(bio); -} - void blk_cgroup_bio_start(struct bio *bio); void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); @@ -681,7 +649,6 @@ static inline void blkg_put(struct blkcg_gq *blkg) { } static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } static inline void blkcg_bio_issue_init(struct bio *bio) { } static inline void blk_cgroup_bio_start(struct bio *bio) { } -static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) { return true; }
#define blk_queue_for_each_rl(rl, q) \ for ((rl) = &(q)->root_rl; (rl); (rl) = NULL)
From: Qi Liu liuqi115@huawei.com
mainline inclusion from mainline-v5.17-rc1 commit 20c634932ae8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
A user may issue a control phy command from sysfs at any time, even if the controller is resetting.
If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect.
To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running.
Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huaw... Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index a1c6a67da132..482be0a461f8 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1168,6 +1168,7 @@ static int hisi_sas_control_phy(struct asd_sas_phy *sas_phy, enum phy_func func, u8 sts = phy->phy_attached; int ret = 0;
+ down(&hisi_hba->sem); phy->reset_completion = &completion;
switch (func) { @@ -1211,6 +1212,7 @@ static int hisi_sas_control_phy(struct asd_sas_phy *sas_phy, enum phy_func func, out: phy->reset_completion = NULL;
+ up(&hisi_hba->sem); return ret; }
From: Qi Liu liuqi115@huawei.com
mainline inclusion from mainline-v5.17-rc1 commit 16775db613c2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
If we issue a controller reset command during executing a FLR a hung task may be found:
Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18
This is a race condition which occurs when the FLR completes first.
Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem .
Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huaw... Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++++--- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 + 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 482be0a461f8..09809c3bd317 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1600,7 +1600,6 @@ void hisi_sas_controller_reset_prepare(struct hisi_hba *hisi_hba) { struct Scsi_Host *shost = hisi_hba->shost;
- down(&hisi_hba->sem); hisi_hba->phy_state = hisi_hba->hw->get_phys_state(hisi_hba);
scsi_block_requests(shost); @@ -1626,9 +1625,9 @@ void hisi_sas_controller_reset_done(struct hisi_hba *hisi_hba) if (hisi_hba->reject_stp_links_msk) hisi_sas_terminate_stp_reject(hisi_hba); hisi_sas_reset_init_all_devices(hisi_hba); - up(&hisi_hba->sem); scsi_unblock_requests(shost); clear_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags); + up(&hisi_hba->sem);
hisi_sas_rescan_topology(hisi_hba, hisi_hba->phy_state); } @@ -1639,8 +1638,11 @@ static int hisi_sas_controller_prereset(struct hisi_hba *hisi_hba) if (!hisi_hba->hw->soft_reset) return -1;
- if (test_and_set_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) + down(&hisi_hba->sem); + if (test_and_set_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) { + up(&hisi_hba->sem); return -1; + }
if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct[0].itct) hisi_hba->hw->debugfs_snapshot_regs(hisi_hba); diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index dcbda8edd03f..bd7d21a13145 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -4915,6 +4915,7 @@ static void hisi_sas_reset_prepare_v3_hw(struct pci_dev *pdev) int rc;
dev_info(dev, "FLR prepare\n"); + down(&hisi_hba->sem); set_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags); hisi_sas_controller_reset_prepare(hisi_hba);
From: Qi Liu liuqi115@huawei.com
mainline inclusion from mainline-v5.17-rc1 commit 37310bad7fa6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger.
To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes.
Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huaw... Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 18 +++++++++++++----- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 10 ++++++++-- 2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 09809c3bd317..03501a2d6512 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -929,10 +929,14 @@ void hisi_sas_phy_oob_ready(struct hisi_hba *hisi_hba, int phy_no) { struct hisi_sas_phy *phy = &hisi_hba->phy[phy_no]; struct device *dev = hisi_hba->dev; + unsigned long flags;
dev_dbg(dev, "phy%d OOB ready\n", phy_no); - if (phy->phy_attached) + spin_lock_irqsave(&phy->lock, flags); + if (phy->phy_attached) { + spin_unlock_irqrestore(&phy->lock, flags); return; + }
if (!timer_pending(&phy->timer)) { if (phy->wait_phyup_cnt < HISI_SAS_WAIT_PHYUP_RETRIES) { @@ -940,13 +944,17 @@ void hisi_sas_phy_oob_ready(struct hisi_hba *hisi_hba, int phy_no) phy->timer.expires = jiffies + HISI_SAS_WAIT_PHYUP_TIMEOUT; add_timer(&phy->timer); - } else { - dev_warn(dev, "phy%d failed to come up %d times, giving up\n", - phy_no, phy->wait_phyup_cnt); - phy->wait_phyup_cnt = 0; + spin_unlock_irqrestore(&phy->lock, flags); + return; } + + dev_warn(dev, "phy%d failed to come up %d times, giving up\n", + phy_no, phy->wait_phyup_cnt); + phy->wait_phyup_cnt = 0; } + spin_unlock_irqrestore(&phy->lock, flags); } + EXPORT_SYMBOL_GPL(hisi_sas_phy_oob_ready);
static void hisi_sas_phy_init(struct hisi_hba *hisi_hba, int phy_no) diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index bd7d21a13145..ef8d7b82fd01 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -1481,7 +1481,6 @@ static irqreturn_t phy_up_v3_hw(int phy_no, struct hisi_hba *hisi_hba) struct asd_sas_phy *sas_phy = &phy->sas_phy; struct device *dev = hisi_hba->dev;
- del_timer(&phy->timer); hisi_sas_phy_write32(hisi_hba, phy_no, PHYCTRL_PHY_ENA_MSK, 1);
port_id = hisi_sas_read32(hisi_hba, PHY_PORT_NUM_MA); @@ -1558,11 +1557,18 @@ static irqreturn_t phy_up_v3_hw(int phy_no, struct hisi_hba *hisi_hba) }
phy->port_id = port_id; - phy->phy_attached = 1; + /* Call pm_runtime_put_sync() with pairs in hisi_sas_phyup_pm_work() */ pm_runtime_get_noresume(dev); hisi_sas_notify_phy_event(phy, HISI_PHYE_PHY_UP_PM); + res = IRQ_HANDLED; + + spin_lock(&phy->lock); + /* Delete timer and set phy_attached atomically */ + del_timer(&phy->timer); + phy->phy_attached = 1; + spin_unlock(&phy->lock); end: if (phy->reset_completion) complete(phy->reset_completion);
From: Xiang Chen chenxiang66@hisilicon.com
mainline inclusion from mainline-v5.18-rc1 commit 512623de5239 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
The time of phyup not only depends on the controller but also the type of disk connected. As an example, from experience, for some SATA disks the amount of time from reset/power-on to receive the D2H FIS for phyup can take upto and more than 10s sometimes. According to the specification of some SATA disks such as ST14000NM0018, the max time from power-on to ready is 30s.
Based on this the current timeout of phyup at 2s which is not enough. So set the value as HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in hisi_sas_control_phy().
For v3 hw there is a pre-existing workaround for a HW bug, being that we issue a link reset when the OOB occurs but the phyup does not. The current phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from when issuing a phy enable or similar via hisi_sas_control_phy(), the subsequent HW workaround linkreset processing calls hisi_sas_control_phy(), but this will pend the original phy reset timing out, so it is safe.
Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawe... Signed-off-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas.h | 2 +- drivers/scsi/hisi_sas/hisi_sas_main.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h index 679e38be439a..660da5b10193 100644 --- a/drivers/scsi/hisi_sas/hisi_sas.h +++ b/drivers/scsi/hisi_sas/hisi_sas.h @@ -92,7 +92,7 @@
#define HISI_SAS_PROT_MASK (HISI_SAS_DIF_PROT_MASK | HISI_SAS_DIX_PROT_MASK)
-#define HISI_SAS_WAIT_PHYUP_TIMEOUT (20 * HZ) +#define HISI_SAS_WAIT_PHYUP_TIMEOUT (30 * HZ) #define HISI_SAS_CLEAR_ITCT_TIMEOUT (20 * HZ)
struct hisi_hba; diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 03501a2d6512..d1766011cb22 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1210,7 +1210,8 @@ static int hisi_sas_control_phy(struct asd_sas_phy *sas_phy, enum phy_func func, goto out; }
- if (sts && !wait_for_completion_timeout(&completion, 2 * HZ)) { + if (sts && !wait_for_completion_timeout(&completion, + HISI_SAS_WAIT_PHYUP_TIMEOUT)) { dev_warn(dev, "phy%d wait phyup timed out for func %d\n", phy_no, func); if (phy->in_reset)
From: Xingui Yang yangxingui@huawei.com
mainline inclusion from mainline-v5.18-rc1 commit 62413199cd6d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
In case of SSP underflow allow the response frame IU to be examined for setting the response stat value rather than always setting SAS_DATA_UNDERRUN.
This will mean that we call sas_ssp_task_response() in those scenarios and may send sense data to upper layer.
Such a condition would be for bad blocks were we just reporting an underflow error to upper layer, but now the sense data will tell immediately that the media is faulty.
Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawe... Signed-off-by: Xingui Yang yangxingui@huawei.com Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 39 +++++++++++++++++--------- 1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index ef8d7b82fd01..68877da1c32b 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -405,6 +405,8 @@ #define CMPLT_HDR_ERROR_PHASE_MSK (0xff << CMPLT_HDR_ERROR_PHASE_OFF) #define CMPLT_HDR_RSPNS_XFRD_OFF 10 #define CMPLT_HDR_RSPNS_XFRD_MSK (0x1 << CMPLT_HDR_RSPNS_XFRD_OFF) +#define CMPLT_HDR_RSPNS_GOOD_OFF 11 +#define CMPLT_HDR_RSPNS_GOOD_MSK (0x1 << CMPLT_HDR_RSPNS_GOOD_OFF) #define CMPLT_HDR_ERX_OFF 12 #define CMPLT_HDR_ERX_MSK (0x1 << CMPLT_HDR_ERX_OFF) #define CMPLT_HDR_ABORT_STAT_OFF 13 @@ -2137,7 +2139,7 @@ static irqreturn_t fatal_axi_int_v3_hw(int irq_no, void *p) return IRQ_HANDLED; }
-static void +static bool slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, struct hisi_sas_slot *slot) { @@ -2155,6 +2157,15 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, switch (task->task_proto) { case SAS_PROTOCOL_SSP: if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) { + /* + * If returned response frame is incorrect because of data underflow, + * but I/O information has been written to the host memory, we examine + * response IU. + */ + if (!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_GOOD_MSK) && + (complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)) + return false; + ts->residual = trans_tx_fail_type; ts->stat = SAS_DATA_UNDERRUN; } else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) { @@ -2186,6 +2197,7 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, default: break; } + return true; }
static void slot_complete_v3_hw(struct hisi_hba *hisi_hba, @@ -2260,19 +2272,20 @@ static void slot_complete_v3_hw(struct hisi_hba *hisi_hba, if ((dw0 & CMPLT_HDR_CMPLT_MSK) == 0x3) { u32 *error_info = hisi_sas_status_buf_addr_mem(slot);
- slot_err_v3_hw(hisi_hba, task, slot); - if (ts->stat != SAS_DATA_UNDERRUN) - dev_info(dev, "erroneous completion iptt=%d task=%pK dev id=%d addr=%016llx CQ hdr: 0x%x 0x%x 0x%x 0x%x Error info: 0x%x 0x%x 0x%x 0x%x\n", - slot->idx, task, sas_dev->device_id, - SAS_ADDR(device->sas_addr), - dw0, dw1, complete_hdr->act, dw3, - error_info[0], error_info[1], - error_info[2], error_info[3]); - if (unlikely(slot->abort)) { - sas_task_abort(task); - return; + if (slot_err_v3_hw(hisi_hba, task, slot)) { + if (ts->stat != SAS_DATA_UNDERRUN) + dev_info(dev, "erroneous completion iptt=%d task=%pK dev id=%d addr=%016llx CQ hdr: 0x%x 0x%x 0x%x 0x%x Error info: 0x%x 0x%x 0x%x 0x%x\n", + slot->idx, task, sas_dev->device_id, + SAS_ADDR(device->sas_addr), + dw0, dw1, complete_hdr->act, dw3, + error_info[0], error_info[1], + error_info[2], error_info[3]); + if (unlikely(slot->abort)) { + sas_task_abort(task); + return; + } + goto out; } - goto out; }
switch (task->task_proto) {
From: Xiang Chen chenxiang66@hisilicon.com
mainline inclusion from mainline-v5.19-rc1 commit 9b5387fe5af3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
If we fail to notify the phy up event then undo the RPM resume, as the phy up notify event handling pairs with that RPM resume.
Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huaw... Reported-by: Yihang Li liyihang6@hisilicon.com Tested-by: Yihang Li liyihang6@hisilicon.com Signed-off-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 68877da1c32b..713afad247e8 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -1560,9 +1560,15 @@ static irqreturn_t phy_up_v3_hw(int phy_no, struct hisi_hba *hisi_hba)
phy->port_id = port_id;
- /* Call pm_runtime_put_sync() with pairs in hisi_sas_phyup_pm_work() */ + /* + * Call pm_runtime_get_noresume() which pairs with + * hisi_sas_phyup_pm_work() -> pm_runtime_put_sync(). + * For failure call pm_runtime_put() as we are in a hardirq context. + */ pm_runtime_get_noresume(dev); - hisi_sas_notify_phy_event(phy, HISI_PHYE_PHY_UP_PM); + res = hisi_sas_notify_phy_event(phy, HISI_PHYE_PHY_UP_PM); + if (!res) + pm_runtime_put(dev);
res = IRQ_HANDLED;
From: John Garry john.garry@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit 057e5fc03369 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Create function sas_ata_wait_after_reset() from sas_ata_hard_reset() as some LLDDs may want to check for a remote ATA phy is up after reset.
Link: https://lore.kernel.org/r/1652354134-171343-2-git-send-email-john.garry@huaw... Tested-by: Yihang Li liyihang6@hisilicon.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/libsas/sas_ata.c | 41 +++++++++++++++++++++++------------ include/scsi/sas_ata.h | 7 ++++++ 2 files changed, 34 insertions(+), 14 deletions(-)
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index f92b889369c3..d54f91636ad1 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -372,22 +372,14 @@ static int sas_ata_printk(const char *level, const struct domain_device *ddev, return r; }
-static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class, - unsigned long deadline) +int sas_ata_wait_after_reset(struct domain_device *dev, unsigned long deadline) { - int ret = 0, res; - struct sas_phy *phy; - struct ata_port *ap = link->ap; + struct sata_device *sata_dev = &dev->sata_dev; int (*check_ready)(struct ata_link *link); - struct domain_device *dev = ap->private_data; - struct sas_internal *i = dev_to_sas_internal(dev); - - res = i->dft->lldd_I_T_nexus_reset(dev); - if (res == -ENODEV) - return res; - - if (res != TMF_RESP_FUNC_COMPLETE) - sas_ata_printk(KERN_DEBUG, dev, "Unable to reset ata device?\n"); + struct ata_port *ap = sata_dev->ap; + struct ata_link *link = &ap->link; + struct sas_phy *phy; + int ret;
phy = sas_get_local_phy(dev); if (scsi_is_sas_phy_local(phy)) @@ -400,6 +392,27 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class, if (ret && ret != -EAGAIN) sas_ata_printk(KERN_ERR, dev, "reset failed (errno=%d)\n", ret);
+ return ret; +} +EXPORT_SYMBOL_GPL(sas_ata_wait_after_reset); + +static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class, + unsigned long deadline) +{ + struct ata_port *ap = link->ap; + struct domain_device *dev = ap->private_data; + struct sas_internal *i = dev_to_sas_internal(dev); + int ret; + + ret = i->dft->lldd_I_T_nexus_reset(dev); + if (ret == -ENODEV) + return ret; + + if (ret != TMF_RESP_FUNC_COMPLETE) + sas_ata_printk(KERN_DEBUG, dev, "Unable to reset ata device?\n"); + + ret = sas_ata_wait_after_reset(dev, deadline); + *class = dev->sata_dev.class;
ap->cbl = ATA_CBL_SATA; diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h index 416c9c47d0e7..e0d2d4915257 100644 --- a/include/scsi/sas_ata.h +++ b/include/scsi/sas_ata.h @@ -33,6 +33,7 @@ void sas_probe_sata(struct asd_sas_port *port); void sas_suspend_sata(struct asd_sas_port *port); void sas_resume_sata(struct asd_sas_port *port); void sas_ata_end_eh(struct ata_port *ap); +int sas_ata_wait_after_reset(struct domain_device *dev, unsigned long deadline); #else
@@ -85,6 +86,12 @@ static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy static inline void sas_ata_end_eh(struct ata_port *ap) { } + +static inline int sas_ata_wait_after_reset(struct domain_device *dev, + unsigned long deadline) +{ + return -ETIMEDOUT; +} #endif
#endif /* _SAS_ATA_H_ */
From: John Garry john.garry@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit 71453bd9d1bf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
We have seen errors like this when a SATA device is probed:
[524.566298] hisi_sas_v3_hw 0000L74:02.0: erroneous completion iptt=4096 ... [524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00
Since commit 21c7e972475e ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure"), we issue an ATA softreset to disks after a phy reset to ensure that they are in sound working order. If the softreset is issued before the remote phy has come back up then the softreset will fail (errors as above). Remedy this by waiting for the phy to come back up after the reset.
Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huaw... Tested-by: Yihang Li liyihang6@hisilicon.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index d1766011cb22..d3f66cab4359 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1875,13 +1875,18 @@ static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device) /* report PHY down if timed out */ if (rc == -ETIMEDOUT) hisi_sas_phy_down(hisi_hba, sas_phy->id, 0, GFP_KERNEL); - } else if (sas_dev->dev_status != HISI_SAS_DEV_INIT) { - /* - * If in init state, we rely on caller to wait for link to be - * ready; otherwise, except phy reset is fail, delay. - */ - if (!rc) - msleep(2000); + return rc; + } + + if (rc) + return rc; + + /* Remote phy */ + if (dev_is_sata(device)) { + rc = sas_ata_wait_after_reset(device, + HISI_SAS_WAIT_PHYUP_TIMEOUT); + } else { + msleep(2000); }
return rc;
From: John Garry john.garry@huawei.com
mainline inclusion from mainline-v5.19-rc1 commit e9dedc13bb11 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Removing an ATA device via sysfs means that the device may not be found through re-scanning:
root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john#
The problem is that the rescan of the device may conflict with the device in being re-initialized, as follows:
- In the rescan we call hisi_sas_slave_alloc() in store_scan() -> sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lunc() -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device() In hisi_sas_init_device() we issue an IT nexus reset for ATA devices
- That IT nexus causes the remote PHY to go down and this triggers a bcast event
- In parallel libsas processes the bcast event, finds that the phy is down and marks the device as gone
The hard reset issued in hisi_sas_init_device() is unncessary - as described in the code comment - so remove it. Also set dev status as HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call.
Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huaw... Fixes: 36c6b7613ef1 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback") Tested-by: Yihang Li liyihang6@hisilicon.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_main.c | 47 ++++++++++----------------- 1 file changed, 18 insertions(+), 29 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index d3f66cab4359..b8249a055fbb 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -691,8 +691,6 @@ static int hisi_sas_init_device(struct domain_device *device) struct hisi_sas_tmf_task tmf_task; int retry = HISI_SAS_DISK_RECOVER_CNT; struct hisi_hba *hisi_hba = dev_to_hisi_hba(device); - struct device *dev = hisi_hba->dev; - struct sas_phy *local_phy;
switch (device->dev_type) { case SAS_END_DEVICE: @@ -713,30 +711,18 @@ static int hisi_sas_init_device(struct domain_device *device) case SAS_SATA_PM_PORT: case SAS_SATA_PENDING: /* - * send HARD RESET to clear previous affiliation of - * STP target port + * If an expander is swapped when a SATA disk is attached then + * we should issue a hard reset to clear previous affiliation + * of STP target port, see SPL (chapter 6.19.4). + * + * However we don't need to issue a hard reset here for these + * reasons: + * a. When probing the device, libsas/libata already issues a + * hard reset in sas_probe_sata() -> ata_sas_async_probe(). + * Note that in hisi_sas_debug_I_T_nexus_reset() we take care + * to issue a hard reset by checking the dev status (== INIT). + * b. When resetting the controller, this is simply unnecessary. */ - local_phy = sas_get_local_phy(device); - if (!scsi_is_sas_phy_local(local_phy) && - !test_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) { - unsigned long deadline = ata_deadline(jiffies, 20000); - struct sata_device *sata_dev = &device->sata_dev; - struct ata_host *ata_host = sata_dev->ata_host; - struct ata_port_operations *ops = ata_host->ops; - struct ata_port *ap = sata_dev->ap; - struct ata_link *link; - unsigned int classes; - - ata_for_each_link(link, ap, EDGE) - rc = ops->hardreset(link, &classes, - deadline); - } - sas_put_local_phy(local_phy); - if (rc) { - dev_warn(dev, "SATA disk hardreset fail: %d\n", rc); - return rc; - } - while (retry-- > 0) { rc = hisi_sas_softreset_ata_disk(device); if (!rc) @@ -752,15 +738,19 @@ static int hisi_sas_init_device(struct domain_device *device)
int hisi_sas_slave_alloc(struct scsi_device *sdev) { - struct domain_device *ddev; + struct domain_device *ddev = sdev_to_domain_dev(sdev); + struct hisi_sas_device *sas_dev = ddev->lldd_dev; int rc;
rc = sas_slave_alloc(sdev); if (rc) return rc; - ddev = sdev_to_domain_dev(sdev);
- return hisi_sas_init_device(ddev); + rc = hisi_sas_init_device(ddev); + if (rc) + return rc; + sas_dev->dev_status = HISI_SAS_DEV_NORMAL; + return 0; } EXPORT_SYMBOL_GPL(hisi_sas_slave_alloc);
@@ -810,7 +800,6 @@ static int hisi_sas_dev_found(struct domain_device *device) dev_info(dev, "dev[%d:%x] found\n", sas_dev->device_id, sas_dev->dev_type);
- sas_dev->dev_status = HISI_SAS_DEV_NORMAL; return 0;
err_out:
From: Xingui Yang yangxingui@huawei.com
mainline inclusion from mainline-v6.0-rc1 commit 7e15334f5d25 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
If the I/O completion response frame returned by the target device has been written to the host memory and the err bit in the status field of the received fis is 1, ts->stat should set to SAS_PROTO_RESPONSE, and this will let EH analyze and further determine cause of failure.
Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huaw... Signed-off-by: Xingui Yang yangxingui@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: xiabing xiabing12@h-partners.com Reviewed-by: Xiang Chen chenxiang66@hisilicon.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 713afad247e8..311772b23a7d 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -480,6 +480,9 @@ struct hisi_sas_err_record_v3 { #define RX_DATA_LEN_UNDERFLOW_OFF 6 #define RX_DATA_LEN_UNDERFLOW_MSK (1 << RX_DATA_LEN_UNDERFLOW_OFF)
+#define RX_FIS_STATUS_ERR_OFF 0 +#define RX_FIS_STATUS_ERR_MSK (1 << RX_FIS_STATUS_ERR_OFF) + #define HISI_SAS_COMMAND_ENTRIES_V3_HW 4096 #define HISI_SAS_MSI_COUNT_V3_HW 32
@@ -2158,6 +2161,7 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, hisi_sas_status_buf_addr_mem(slot); u32 dma_rx_err_type = le32_to_cpu(record->dma_rx_err_type); u32 trans_tx_fail_type = le32_to_cpu(record->trans_tx_fail_type); + u16 sipc_rx_err_type = le16_to_cpu(record->sipc_rx_err_type); u32 dw3 = le32_to_cpu(complete_hdr->dw3);
switch (task->task_proto) { @@ -2185,7 +2189,10 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task, case SAS_PROTOCOL_SATA: case SAS_PROTOCOL_STP: case SAS_PROTOCOL_SATA | SAS_PROTOCOL_STP: - if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) { + if ((complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) && + (sipc_rx_err_type & RX_FIS_STATUS_ERR_MSK)) { + ts->stat = SAS_PROTO_RESPONSE; + } else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) { ts->residual = trans_tx_fail_type; ts->stat = SAS_DATA_UNDERRUN; } else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
From: Wei Yongjun weiyongjun1@huawei.com
mainline inclusion from mainline-v5.12-rc1 commit f6692213b5045dc461ce0858fb18cf46f328c202 category: bugfix bugzilla: 187303, https://gitee.com/openeuler/kernel/issues/I5ZXPC CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
The sparse tool complains as follows:
security/integrity/digsig.c:146:12: warning: symbol 'integrity_add_key' was not declared. Should it be static?
This function is not used outside of digsig.c, so this commit marks it static.
Reported-by: Hulk Robot hulkci@huawei.com Fixes: 60740accf784 ("integrity: Load certs to the platform keyring") Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Nayna Jain nayna@linux.ibm.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: GUO Zihua guozihua@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- security/integrity/digsig.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c index 0f518dcfde05..250fb0836156 100644 --- a/security/integrity/digsig.c +++ b/security/integrity/digsig.c @@ -143,8 +143,8 @@ int __init integrity_init_keyring(const unsigned int id) return __integrity_init_keyring(id, perm, restriction); }
-int __init integrity_add_key(const unsigned int id, const void *data, - off_t size, key_perm_t perm) +static int __init integrity_add_key(const unsigned int id, const void *data, + off_t size, key_perm_t perm) { key_ref_t key; int rc = 0;
From: Jing-Ting Wu Jing-Ting.Wu@mediatek.com
mainline inclusion from mainline-v6.0-rc3 commit 763f4fb76e24959c370cdaa889b2492ba6175580 category: bugfix bugzilla: 187794, https://gitee.com/openeuler/kernel/issues/I5ZXVK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------
Root cause: The rebind_subsystems() is no lock held when move css object from A list to B list,then let B's head be treated as css node at list_for_each_entry_rcu().
Solution: Add grace period before invalidating the removed rstat_css_node.
Reported-by: Jing-Ting Wu jing-ting.wu@mediatek.com Suggested-by: Michal Koutný mkoutny@suse.com Signed-off-by: Jing-Ting Wu jing-ting.wu@mediatek.com Tested-by: Jing-Ting Wu jing-ting.wu@mediatek.com Link: https://lore.kernel.org/linux-arm-kernel/d8f0bc5e2fb6ed259f9334c83279b4c0112... Acked-by: Mukesh Ojha quic_mojha@quicinc.com Fixes: a7df69b81aac ("cgroup: rstat: support cgroup1") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/cgroup/cgroup.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 57f4e19df8c6..46d5c120c626 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1781,6 +1781,7 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
if (ss->css_rstat_flush) { list_del_rcu(&css->rstat_css_node); + synchronize_rcu(); list_add_rcu(&css->rstat_css_node, &dcgrp->rstat_css_list); }
From: Lee Jones lee.jones@linaro.org
mainline inclusion from mainline-v5.11-rc1 commit 27122bf57a62f5f816abb48b3a52226ee273a9da category: bugfix bugzilla: 187303, https://gitee.com/openeuler/kernel/issues/I5ZXT2 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
Fixes the following W=1 kernel build warning(s):
drivers/tty/hvc/hvc_opal.c:106:6: warning: no previous prototype for ‘hvc_opal_hvsi_hangup’ [-Wmissing-prototypes]
Cc: Michael Ellerman mpe@ellerman.id.au Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Lee Jones lee.jones@linaro.org Link: https://lore.kernel.org/r/20201104193549.4026187-34-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/tty/hvc/hvc_opal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c index c66412566efc..056ae21a5121 100644 --- a/drivers/tty/hvc/hvc_opal.c +++ b/drivers/tty/hvc/hvc_opal.c @@ -103,7 +103,7 @@ static void hvc_opal_hvsi_close(struct hvc_struct *hp, int data) notifier_del_irq(hp, data); }
-void hvc_opal_hvsi_hangup(struct hvc_struct *hp, int data) +static void hvc_opal_hvsi_hangup(struct hvc_struct *hp, int data) { struct hvc_opal_priv *pv = hvc_opal_privs[hp->vtermno];
From: Lee Jones lee.jones@linaro.org
mainline inclusion from mainline-v5.11-rc1 commit 109af2a82a36e0cedb54988406e6b40de67272c7 category: bugfix bugzilla: 187303, https://gitee.com/openeuler/kernel/issues/I5ZXUG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
Fixes the following W=1 kernel build warning(s):
drivers/tty/hvc/hvc_vio.c:181:6: warning: no previous prototype for ‘hvterm_hvsi_hangup’ [-Wmissing-prototypes] 181 | void hvterm_hvsi_hangup(struct hvc_struct *hp, int data) | ^~~~~~~~~~~~~~~~~~
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Lee Jones lee.jones@linaro.org Link: https://lore.kernel.org/r/20201104193549.4026187-33-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/tty/hvc/hvc_vio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/hvc/hvc_vio.c b/drivers/tty/hvc/hvc_vio.c index 7af54d6ed5b8..798f27f40cc2 100644 --- a/drivers/tty/hvc/hvc_vio.c +++ b/drivers/tty/hvc/hvc_vio.c @@ -178,7 +178,7 @@ static void hvterm_hvsi_close(struct hvc_struct *hp, int data) notifier_del_irq(hp, data); }
-void hvterm_hvsi_hangup(struct hvc_struct *hp, int data) +static void hvterm_hvsi_hangup(struct hvc_struct *hp, int data) { struct hvterm_priv *pv = hvterm_privs[hp->vtermno];
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-v5.13-rc1 commit 63bbdb4ea02b17f929fa4f5c536357183eba9639 category: bugfix bugzilla: 187303, https://gitee.com/openeuler/kernel/issues/I5ZXTT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
The sparse tool complains as follows:
drivers/tty/hvc/hvc_udbg.c:20:19: warning: symbol 'hvc_udbg_dev' was not declared. Should it be static?
This symbol is not used outside of hvc_udbg.c, so this commit marks it static.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20210407125826.4139130-1-yukuai3@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/tty/hvc/hvc_udbg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/hvc/hvc_udbg.c b/drivers/tty/hvc/hvc_udbg.c index a4c9913f76a0..ff0dcc56413c 100644 --- a/drivers/tty/hvc/hvc_udbg.c +++ b/drivers/tty/hvc/hvc_udbg.c @@ -17,7 +17,7 @@
#include "hvc_console.h"
-struct hvc_struct *hvc_udbg_dev; +static struct hvc_struct *hvc_udbg_dev;
static int hvc_udbg_put(uint32_t vtermno, const char *buf, int count) {
From: Shubhrajyoti Datta shubhrajyoti.datta@xilinx.com
mainline inclusion from mainline-v5.15-rc1 commit ed623dffdeebcc0acac7be6af4a301ee7169cd21 category: bugfix bugzilla: 187303, https://gitee.com/openeuler/kernel/issues/I5ZXV2 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------
In case the uart registration fails the clocks are left enabled. Disable the clock in case of errors.
Signed-off-by: Shubhrajyoti Datta shubhrajyoti.datta@xilinx.com Link: https://lore.kernel.org/r/20210713064835.27978-2-shubhrajyoti.datta@xilinx.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/tty/serial/uartlite.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/uartlite.c b/drivers/tty/serial/uartlite.c index 48923cd8c07d..dadac5cc3d5d 100644 --- a/drivers/tty/serial/uartlite.c +++ b/drivers/tty/serial/uartlite.c @@ -784,6 +784,7 @@ static int ulite_probe(struct platform_device *pdev) ret = uart_register_driver(&ulite_uart_driver); if (ret < 0) { dev_err(&pdev->dev, "Failed to register driver\n"); + clk_disable_unprepare(pdata->clk); return ret; } }
From: Chen Jiahao chenjiahao16@huawei.com
hulk inclusion category: feature bugzilla: 187058, https://gitee.com/openeuler/kernel/issues/I5ZUT5
--------------------------------
In case of debugging need when undefined instructin exception occurs, we should dump regs and the undefined instruction before BUG in kernel mode. Which is usually used to compare with objdump instructions.
Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/traps.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 7c05bcf29013..f8ec533eb590 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -405,7 +405,9 @@ void do_undefinstr(struct pt_regs *regs) if (call_undef_hook(regs) == 0) return;
- BUG_ON(!user_mode(regs)); + if (!user_mode(regs)) + die("Oops - undefined instruction", regs, 0); + force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0); } NOKPROBE_SYMBOL(do_undefinstr);
From: Chen Jiahao chenjiahao16@huawei.com
hulk inclusion category: bugfix bugzilla: 187431, https://gitee.com/openeuler/kernel/issues/I5ZUTK
--------------------------------
f86d165bfe5f ("arm64: Add non nmi ipi backtrace support") introduced the IPI backtrace support on arm64 with NMI unsupported.
However a warning message comes when triggering the non-NMI IPI backtrace:
WARNING: CPU: 6 PID: 1121 at kernel/smp.c:680 smp_call_function_many_cond+0x78/0x3dc Modules linked in: soft_lockup_test(OE) [last unloaded: soft_lockup_test] CPU: 6 PID: 1121 Comm: loop_thread Tainted: G OEL 5.10.0+ #9 Hardware name: linux,dummy-virt (DT) pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--) pc : smp_call_function_many_cond+0x78/0x3dc lr : smp_call_function_many+0x40/0x50 sp : ffffffc010033c10 x29: ffffffc010033c10 x28: 0000000000000000 x27: ffffffd5dfe2a9a8 x26: ffffffa9bfdb4430 x25: 0000000000000000 x24: 0000000000000000 x23: ffffffd5dfe29000 x22: 0000000000000000 x21: 0000000000000006 x20: ffffffd5df216418 x19: ffffffd5dfe2a9a8 x18: 0000000000000000 x17: 0000000000000000 x16: ffffffd5df293d78 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0025001680000006 x8 : 4e4d492066726f6d x7 : 53656e64696e6720 x6 : 0000000000000001 x5 : ffffffa9bfdb0750 x4 : 0000000000000000 x3 : 0000000000000000 x2 : ffffffd3e01ab000 x1 : ffffffd5dfc03000 x0 : 0000000000010000 Call trace: smp_call_function_many_cond+0x78/0x3dc smp_call_function_many+0x40/0x50 arm64_send_ipi+0x30/0x3c nmi_trigger_cpumask_backtrace+0xfc/0x154 arch_trigger_cpumask_backtrace+0x3c/0x58 watchdog_timer_fn+0x1d4/0x220 __hrtimer_run_queues+0x1bc/0x2c8 hrtimer_run_queues+0xe4/0x110 run_local_timers+0x24/0x50 update_process_times+0x5c/0x88 tick_periodic+0xd0/0xec tick_handle_periodic+0x38/0x8c arch_timer_handler_virt+0x38/0x50 handle_percpu_devid_irq+0xe0/0x1e0 generic_handle_irq+0x34/0x4c __handle_domain_irq+0xb0/0xb8 gic_handle_irq+0x98/0xb8 el1_irq+0xa8/0x140 loop_func+0x14/0x28 [soft_lockup_test] kthread+0x120/0x130 ret_from_fork+0x10/0x18
The cause is calling smp_call_function_many to send IPI to other CPUs in a softirq handling context, which is unsafe and may lead to deadlock.
Use smp_call_function_single_async() instead to avoid the warning above.
Fixes: f86d165bfe5f ("arm64: Add non nmi ipi backtrace support") Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/ipi_nmi.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c index 2cf28e511b23..9a8f7c256117 100644 --- a/arch/arm64/kernel/ipi_nmi.c +++ b/arch/arm64/kernel/ipi_nmi.c @@ -40,9 +40,25 @@ static void ipi_cpu_backtrace(void *info) printk_safe_exit(); }
+static DEFINE_PER_CPU(call_single_data_t, cpu_backtrace_csd) = + CSD_INIT(ipi_cpu_backtrace, NULL); + static void arm64_send_ipi(cpumask_t *mask) { - smp_call_function_many(mask, ipi_cpu_backtrace, NULL, false); + call_single_data_t *csd; + int this_cpu = raw_smp_processor_id(); + int cpu; + int ret; + + for_each_online_cpu(cpu) { + if (cpu == this_cpu) + continue; + + csd = &per_cpu(cpu_backtrace_csd, cpu); + ret = smp_call_function_single_async(cpu, csd); + if (ret) + pr_info("Sending IPI failed to CPU %d\n", cpu); + } }
bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
From: Chen Jiahao chenjiahao16@huawei.com
hulk inclusion category: feature bugzilla: 186682, https://gitee.com/openeuler/kernel/issues/I5ZUU3
--------------------------------
openEuler 5.10 enables spectre-bhb mitigation by default for ARM64, which may cause performance regression. implement a cmdline parameter 'nospectre_bhb' to provide an option to disable spectre-bhb mitigation.
Signed-off-by: Chen Jiahao chenjiahao16@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/proton-pack.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c index e807f77737e0..a010a28d9b68 100644 --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -81,6 +81,8 @@ ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, static enum mitigation_state spectre_v2_state;
static bool __read_mostly __nospectre_v2; +static bool __read_mostly __nospectre_bhb; + static int __init parse_spectre_v2_param(char *str) { __nospectre_v2 = true; @@ -88,6 +90,13 @@ static int __init parse_spectre_v2_param(char *str) } early_param("nospectre_v2", parse_spectre_v2_param);
+static int __init parse_spectre_bhb_param(char *str) +{ + __nospectre_bhb = true; + return 0; +} +early_param("nospectre_bhb", parse_spectre_bhb_param); + static bool spectre_v2_mitigations_off(void) { bool ret = __nospectre_v2 || cpu_mitigations_off(); @@ -1060,7 +1069,7 @@ void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *entry) /* No point mitigating Spectre-BHB alone. */ } else if (!IS_ENABLED(CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY)) { pr_info_once("spectre-bhb mitigation disabled by compile time option\n"); - } else if (cpu_mitigations_off()) { + } else if (__nospectre_bhb || cpu_mitigations_off()) { pr_info_once("spectre-bhb mitigation disabled by command line option\n"); } else if (supports_ecbhb(SCOPE_LOCAL_CPU)) { state = SPECTRE_MITIGATED;
From: Hongbo Yao yaohongbo@huawei.com
hulk inclusion category: bugfix bugzilla: 20890, https://gitee.com/openeuler/kernel/issues/I5ZV1W
--------------------------------
The default interrupt trigger mode of Pci driver is Level, however, some pci drivers want to change irq trigger mode to edge by writing hardware register directly. In current kernel, this kind of driver needs to sync entry data which has already been reservd in memory.
This patch provides an interface for the driver to sync entry data in memory.
Signed-off-by: Hongbo Yao yaohongbo@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/include/asm/io_apic.h | 1 + arch/x86/kernel/apic/io_apic.c | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index a1a26f6d3aa4..294b99f1a84a 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -195,6 +195,7 @@ extern void clear_IO_APIC(void); extern void restore_boot_irq_mode(void); extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin); extern void print_IO_APICs(void); +extern void ioapic_sync_hardware_data(int irq); #else /* !CONFIG_X86_IO_APIC */
#define IO_APIC_IRQ(x) 0 diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 25b1d5c6af96..34bc1e3325a8 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1201,6 +1201,31 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin) } EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
+/* + * The function is only provided for Drivers to keep the data value in + * memory consistent with the value of the register, and these drivers + * must use it carefully. + * + * The correct step should be: + * Change register ---> ioapic_sync_hardware_data ---> request_irq + */ +void ioapic_sync_hardware_data(int irq) +{ + struct IO_APIC_route_entry entry; + struct mp_chip_data *data; + struct irq_pin_list *pin_list; + + data = irq_get_chip_data(irq); + for_each_irq_pin(pin_list, data->irq_2_pin) { + entry = ioapic_read_entry(pin_list->apic, pin_list->pin); + data->entry.trigger = entry.trigger; + data->entry.polarity = entry.polarity; + pr_debug("irq[%d] update trigger[%d] polarity[%d]\n", + irq, entry.trigger, entry.polarity); + } +} +EXPORT_SYMBOL(ioapic_sync_hardware_data); + static struct irq_chip ioapic_chip, ioapic_ir_chip;
static void __init setup_IO_APIC_irqs(void)
From: Ard Biesheuvel ardb@kernel.org
mainline inclusion from mainline-v5.11 commit fe0af09074bfeb46a35357e67635eefe33cdfc49 category: bugfix bugzilla: 187402, https://gitee.com/openeuler/kernel/issues/I61CL6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This reverts commit 32cf1a12cad43358e47dac8014379c2f33dfbed4.
The 'exisitng buffer' in this case is the firmware provided table, and we should not modify that in place. This fixes a crash on arm64 with initrd table overrides, in which case the DSDT is not mapped with read/write permissions.
Reported-by: Shawn Guo shawn.guo@linaro.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Shawn Guo shawn.guo@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Conflicts: drivers/acpi/acpica/nsrepair2.c Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/acpi/acpica/nsrepair2.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/acpica/nsrepair2.c b/drivers/acpi/acpica/nsrepair2.c index fb4f8f6209bd..125143c41bb8 100644 --- a/drivers/acpi/acpica/nsrepair2.c +++ b/drivers/acpi/acpica/nsrepair2.c @@ -491,8 +491,9 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, union acpi_operand_object **return_object_ptr) { union acpi_operand_object *return_object = *return_object_ptr; - char *dest; + union acpi_operand_object *new_string; char *source; + char *dest;
ACPI_FUNCTION_NAME(ns_repair_HID);
@@ -513,6 +514,13 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, return (AE_OK); }
+ /* It is simplest to always create a new string object */ + + new_string = acpi_ut_create_string_object(return_object->string.length); + if (!new_string) { + return (AE_NO_MEMORY); + } + /* * Remove a leading asterisk if present. For some unknown reason, there * are many machines in the field that contains IDs like this. @@ -522,7 +530,7 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, source = return_object->string.pointer; if (*source == '*') { source++; - return_object->string.length--; + new_string->string.length--;
ACPI_DEBUG_PRINT((ACPI_DB_REPAIR, "%s: Removed invalid leading asterisk\n", @@ -537,11 +545,12 @@ acpi_ns_repair_HID(struct acpi_evaluate_info *info, * "NNNN####" where N is an uppercase letter or decimal digit, and * # is a hex digit. */ - for (dest = return_object->string.pointer; *source; dest++, source++) { + for (dest = new_string->string.pointer; *source; dest++, source++) { *dest = (char)toupper((int)*source); } - return_object->string.pointer[return_object->string.length] = 0;
+ acpi_ut_remove_reference(return_object); + *return_object_ptr = new_string; return (AE_OK); }
From: Jiasheng Jiang jiasheng@iscas.ac.cn
mainline inclusion from mainline-v5.17-rc1 commit 2cea3ec5b0099d0e9dd6752aa86e08bce38d6b32 category: bugfix bugzilla: 187402, https://gitee.com/openeuler/kernel/issues/I61CL6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Because devres_alloc() may fail, devm_ioremap() may return NULL.
Then, 'clk_data->base' will be assigned to clkdev->data->base in platform_device_register_data().
The PTR_ERR_OR_ZERO() check on clk_data does not cover 'base', so it is better to add an explicit check against NULL after updating it.
Fixes: 3f4ba94e3615 ("ACPI: APD: Add AMD misc clock handler support") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn [ rjw: Changelog rewrite ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/acpi/acpi_apd.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/acpi/acpi_apd.c b/drivers/acpi/acpi_apd.c index f1841d1714be..c1f68fdb31bf 100644 --- a/drivers/acpi/acpi_apd.c +++ b/drivers/acpi/acpi_apd.c @@ -96,6 +96,8 @@ static int fch_misc_setup(struct apd_private_data *pdata) resource_size(rentry->res)); break; } + if (!clk_data->base) + return -ENOMEM;
acpi_dev_free_resource_list(&resource_list);
From: Yu Kuai yukuai3@huawei.com
stable inclusion from stable-v5.10.147 commit cce5dc03338e25e910fb5a2c4f2ce8a79644370f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60JC5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.
1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") tries to fix that iova allocation can fail while there are still free space available. This is not backported to 5.10 stable. 2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3 HW") fix the performance regression introduced by 1), however, this is just a temporary solution and will cause io performance regression because it limit max io size to PAGE_SIZE * 32(128k for 4k page_size). 3) John Garry posted a patchset to fix the problem. 4) The temporary solution is reverted.
It's weird that the patch in 2) is backported to 5.10 stable alone, while the right thing to do is to backport them all together.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 311772b23a7d..4a2ed1a74ef9 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -2787,7 +2787,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev) struct hisi_hba *hisi_hba = shost_priv(shost); struct device *dev = hisi_hba->dev; int ret = sas_slave_configure(sdev); - unsigned int max_sectors;
if (ret) return ret; @@ -2805,12 +2804,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev) } }
- /* Set according to IOMMU IOVA caching limit */ - max_sectors = min_t(size_t, queue_max_hw_sectors(sdev->request_queue), - (PAGE_SIZE * 32) >> SECTOR_SHIFT); - - blk_queue_max_hw_sectors(sdev->request_queue, max_sectors); - return 0; }
From: Yanan Wang wangyanan55@huawei.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5WHHV CVE: NA
----------------------------------------------------
The "ncsnp" is an implementation specific CPU virtualization feature on Hisi 1620 series CPUs. This feature works just like ARM standard S2FWB to reduce some cache management operations in virtualization.
Given that it's Hisi specific feature, let's restrict the detection only to Hisi CPUs. To realize this: 1) Add a sub-directory `hisilicon/` within arch/arm64/kvm to hold code for Hisi specific virtualization features. 2) Add a new kconfig option `CONFIG_KVM_HISI_VIRT` for users to select the whole Hisi specific virtualization features. 3) Add a generic global KVM variable `kvm_ncsnp_support` which is `false` by default. Only re-initialize it when we have `CONFIG_KVM_HISI_VIRT` enabled.
Signed-off-by: Yanan Wang wangyanan55@huawei.com Reviewed-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/configs/openeuler_defconfig | 1 + arch/arm64/include/asm/hisi_cpu_model.h | 21 ---- arch/arm64/include/asm/kvm_host.h | 3 +- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/arm.c | 13 ++- arch/arm64/kvm/hisi_cpu_model.c | 117 ---------------------- arch/arm64/kvm/hisilicon/Kconfig | 7 ++ arch/arm64/kvm/hisilicon/Makefile | 2 + arch/arm64/kvm/hisilicon/hisi_virt.c | 124 ++++++++++++++++++++++++ arch/arm64/kvm/hisilicon/hisi_virt.h | 19 ++++ 11 files changed, 166 insertions(+), 144 deletions(-) delete mode 100644 arch/arm64/include/asm/hisi_cpu_model.h delete mode 100644 arch/arm64/kvm/hisi_cpu_model.c create mode 100644 arch/arm64/kvm/hisilicon/Kconfig create mode 100644 arch/arm64/kvm/hisilicon/Makefile create mode 100644 arch/arm64/kvm/hisilicon/hisi_virt.c create mode 100644 arch/arm64/kvm/hisilicon/hisi_virt.h
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index c6c3df7a0459..c52e55e26c86 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -744,6 +744,7 @@ CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL=y CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y CONFIG_HAVE_KVM_IRQ_BYPASS=y CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE=y +CONFIG_KVM_HISI_VIRT=y CONFIG_KVM_ARM_PMU=y CONFIG_ARM64_CRYPTO=y CONFIG_CRYPTO_SHA256_ARM64=m diff --git a/arch/arm64/include/asm/hisi_cpu_model.h b/arch/arm64/include/asm/hisi_cpu_model.h deleted file mode 100644 index e0da0ef61613..000000000000 --- a/arch/arm64/include/asm/hisi_cpu_model.h +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * Copyright(c) 2019 Huawei Technologies Co., Ltd - */ - -#ifndef __HISI_CPU_MODEL_H__ -#define __HISI_CPU_MODEL_H__ - -enum hisi_cpu_type { - HI_1612, - HI_1616, - HI_1620, - UNKNOWN_HI_TYPE -}; - -extern enum hisi_cpu_type hi_cpu_type; -extern bool kvm_ncsnp_support; - -void probe_hisi_cpu_type(void); -void probe_hisi_ncsnp_support(void); -#endif /* __HISI_CPU_MODEL_H__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 1a80df133d9d..71a3ba24b287 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -26,7 +26,6 @@ #include <asm/kvm.h> #include <asm/kvm_asm.h> #include <asm/thread_info.h> -#include <asm/hisi_cpu_model.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -715,4 +714,6 @@ extern unsigned int twedel; #define use_twed() (false) #endif
+extern bool kvm_ncsnp_support; + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bc6b692128c9..d984a6041860 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -49,6 +49,7 @@ menuconfig KVM if KVM
source "virt/kvm/Kconfig" +source "arch/arm64/kvm/hisilicon/Kconfig"
config KVM_ARM_PMU bool "Virtual Performance Monitoring Unit (PMU) support" diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 02612bfbbde7..6dc8c914c99b 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -17,7 +17,6 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ guest.o debug.o reset.o sys_regs.o \ vgic-sys-reg-v3.o fpsimd.o pmu.o \ aarch32.o arch_timer.o trng.o\ - hisi_cpu_model.o \ vgic/vgic.o vgic/vgic-init.o \ vgic/vgic-irqfd.o vgic/vgic-v2.o \ vgic/vgic-v3.o vgic/vgic-v4.o \ @@ -26,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_KVM_ARM_PMU) += pmu-emul.o +obj-$(CONFIG_KVM_HISI_VIRT) += hisilicon/ diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 384cc56a6549..469f324ce536 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -47,6 +47,10 @@ __asm__(".arch_extension virt"); #endif
+#ifdef CONFIG_KVM_HISI_VIRT +#include "hisilicon/hisi_virt.h" +#endif + DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page); @@ -59,8 +63,7 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
static bool vgic_present;
-/* Hisi cpu type enum */ -enum hisi_cpu_type hi_cpu_type = UNKNOWN_HI_TYPE; +/* Capability of non-cacheable snooping */ bool kvm_ncsnp_support;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled); @@ -1859,9 +1862,11 @@ int kvm_arch_init(void *opaque) return -ENODEV; }
- /* Probe the Hisi CPU type */ +#ifdef CONFIG_KVM_HISI_VIRT probe_hisi_cpu_type(); - probe_hisi_ncsnp_support(); + kvm_ncsnp_support = hisi_ncsnp_supported(); +#endif + kvm_info("KVM ncsnp %s\n", kvm_ncsnp_support ? "enabled" : "disabled");
in_hyp_mode = is_kernel_in_hyp_mode();
diff --git a/arch/arm64/kvm/hisi_cpu_model.c b/arch/arm64/kvm/hisi_cpu_model.c deleted file mode 100644 index 52eecf1ba1cf..000000000000 --- a/arch/arm64/kvm/hisi_cpu_model.c +++ /dev/null @@ -1,117 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * Copyright(c) 2019 Huawei Technologies Co., Ltd - */ - -#include <linux/acpi.h> -#include <linux/of.h> -#include <linux/init.h> -#include <linux/kvm_host.h> - -#ifdef CONFIG_ACPI - -/* ACPI Hisi oem table id str */ -const char *oem_str[] = { - "HIP06", /* Hisi 1612 */ - "HIP07", /* Hisi 1616 */ - "HIP08" /* Hisi 1620 */ -}; - -/* - * Get Hisi oem table id. - */ -static void acpi_get_hw_cpu_type(void) -{ - struct acpi_table_header *table; - acpi_status status; - int i, str_size = ARRAY_SIZE(oem_str); - - /* Get oem table id from ACPI table header */ - status = acpi_get_table(ACPI_SIG_DSDT, 0, &table); - if (ACPI_FAILURE(status)) { - pr_err("Failed to get ACPI table: %s\n", - acpi_format_exception(status)); - return; - } - - for (i = 0; i < str_size; ++i) { - if (!strncmp(oem_str[i], table->oem_table_id, 5)) { - hi_cpu_type = i; - return; - } - } -} - -#else -static void acpi_get_hw_cpu_type(void) {} -#endif - -/* of Hisi cpu model str */ -const char *of_model_str[] = { - "Hi1612", - "Hi1616" -}; - -static void of_get_hw_cpu_type(void) -{ - const char *cpu_type; - int ret, i, str_size = ARRAY_SIZE(of_model_str); - - ret = of_property_read_string(of_root, "model", &cpu_type); - if (ret < 0) { - pr_err("Failed to get Hisi cpu model by OF.\n"); - return; - } - - for (i = 0; i < str_size; ++i) { - if (strstr(cpu_type, of_model_str[i])) { - hi_cpu_type = i; - return; - } - } -} - -void probe_hisi_cpu_type(void) -{ - if (!acpi_disabled) - acpi_get_hw_cpu_type(); - else - of_get_hw_cpu_type(); - - if (hi_cpu_type == UNKNOWN_HI_TYPE) - pr_warn("UNKNOWN Hisi cpu type.\n"); -} - -#define NCSNP_MMIO_BASE 0x20107E238 - -/* - * We have the fantastic HHA ncsnp capability on Kunpeng 920, - * with which hypervisor doesn't need to perform a lot of cache - * maintenance like before (in case the guest has non-cacheable - * Stage-1 mappings). - */ -void probe_hisi_ncsnp_support(void) -{ - void __iomem *base; - unsigned int high; - - kvm_ncsnp_support = false; - - if (hi_cpu_type != HI_1620) - goto out; - - base = ioremap(NCSNP_MMIO_BASE, 4); - if (!base) { - pr_err("Unable to map MMIO region when probing ncsnp!\n"); - goto out; - } - - high = readl_relaxed(base) >> 28; - iounmap(base); - if (high != 0x1) - kvm_ncsnp_support = true; - -out: - kvm_info("Hisi ncsnp: %s\n", kvm_ncsnp_support ? "enabled" : - "disabled"); -} diff --git a/arch/arm64/kvm/hisilicon/Kconfig b/arch/arm64/kvm/hisilicon/Kconfig new file mode 100644 index 000000000000..6536f897a32e --- /dev/null +++ b/arch/arm64/kvm/hisilicon/Kconfig @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only +config KVM_HISI_VIRT + bool "HiSilicon SoC specific virtualization features" + depends on ARCH_HISI + help + Support for HiSilicon SoC specific virtualization features. + On non-HiSilicon platforms, say N here. diff --git a/arch/arm64/kvm/hisilicon/Makefile b/arch/arm64/kvm/hisilicon/Makefile new file mode 100644 index 000000000000..849f99d1526d --- /dev/null +++ b/arch/arm64/kvm/hisilicon/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_KVM_HISI_VIRT) += hisi_virt.o diff --git a/arch/arm64/kvm/hisilicon/hisi_virt.c b/arch/arm64/kvm/hisilicon/hisi_virt.c new file mode 100644 index 000000000000..9587f9508a79 --- /dev/null +++ b/arch/arm64/kvm/hisilicon/hisi_virt.c @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2022 Huawei Technologies Co., Ltd + */ + +#include <linux/acpi.h> +#include <linux/of.h> +#include <linux/init.h> +#include <linux/kvm_host.h> +#include "hisi_virt.h" + +static enum hisi_cpu_type cpu_type = UNKNOWN_HI_TYPE; + +static const char * const hisi_cpu_type_str[] = { + "Hisi1612", + "Hisi1616", + "Hisi1620", + "Unknown" +}; + +/* ACPI Hisi oem table id str */ +static const char * const oem_str[] = { + "HIP06", /* Hisi 1612 */ + "HIP07", /* Hisi 1616 */ + "HIP08" /* Hisi 1620 */ +}; + +/* + * Probe Hisi CPU type form ACPI. + */ +static enum hisi_cpu_type acpi_get_hisi_cpu_type(void) +{ + struct acpi_table_header *table; + acpi_status status; + int i, str_size = ARRAY_SIZE(oem_str); + + /* Get oem table id from ACPI table header */ + status = acpi_get_table(ACPI_SIG_DSDT, 0, &table); + if (ACPI_FAILURE(status)) { + pr_warn("Failed to get ACPI table: %s\n", + acpi_format_exception(status)); + return UNKNOWN_HI_TYPE; + } + + for (i = 0; i < str_size; ++i) { + if (!strncmp(oem_str[i], table->oem_table_id, 5)) + return i; + } + + return UNKNOWN_HI_TYPE; +} + +/* of Hisi cpu model str */ +static const char * const of_model_str[] = { + "Hi1612", + "Hi1616" +}; + +/* + * Probe Hisi CPU type from DT. + */ +static enum hisi_cpu_type of_get_hisi_cpu_type(void) +{ + const char *model; + int ret, i, str_size = ARRAY_SIZE(of_model_str); + + /* + * Note: There may not be a "model" node in FDT, which + * is provided by the vendor. In this case, we are not + * able to get CPU type information through this way. + */ + ret = of_property_read_string(of_root, "model", &model); + if (ret < 0) { + pr_warn("Failed to get Hisi cpu model by OF.\n"); + return UNKNOWN_HI_TYPE; + } + + for (i = 0; i < str_size; ++i) { + if (strstr(model, of_model_str[i])) + return i; + } + + return UNKNOWN_HI_TYPE; +} + +void probe_hisi_cpu_type(void) +{ + if (!acpi_disabled) + cpu_type = acpi_get_hisi_cpu_type(); + else + cpu_type = of_get_hisi_cpu_type(); + + kvm_info("detected: Hisi CPU type '%s'\n", hisi_cpu_type_str[cpu_type]); +} + +/* + * We have the fantastic HHA ncsnp capability on Kunpeng 920, + * with which hypervisor doesn't need to perform a lot of cache + * maintenance like before (in case the guest has non-cacheable + * Stage-1 mappings). + */ +#define NCSNP_MMIO_BASE 0x20107E238 +bool hisi_ncsnp_supported(void) +{ + void __iomem *base; + unsigned int high; + bool supported = false; + + if (cpu_type != HI_1620) + return supported; + + base = ioremap(NCSNP_MMIO_BASE, 4); + if (!base) { + pr_warn("Unable to map MMIO region when probing ncsnp!\n"); + return supported; + } + + high = readl_relaxed(base) >> 28; + iounmap(base); + if (high != 0x1) + supported = true; + + return supported; +} diff --git a/arch/arm64/kvm/hisilicon/hisi_virt.h b/arch/arm64/kvm/hisilicon/hisi_virt.h new file mode 100644 index 000000000000..ef8de6a2101e --- /dev/null +++ b/arch/arm64/kvm/hisilicon/hisi_virt.h @@ -0,0 +1,19 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2022 Huawei Technologies Co., Ltd + */ + +#ifndef __HISI_VIRT_H__ +#define __HISI_VIRT_H__ + +enum hisi_cpu_type { + HI_1612, + HI_1616, + HI_1620, + UNKNOWN_HI_TYPE +}; + +void probe_hisi_cpu_type(void); +bool hisi_ncsnp_supported(void); + +#endif /* __HISI_VIRT_H__ */
From: Yang Jihong yangjihong1@huawei.com
mainline inclusion from mainline-v6.0 commit 6b959ba22d34ca793ffdb15b5715457c78e38b1a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60M4W CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
perf_output_read_group may respond to IPI request of other cores and invoke __perf_install_in_context function. As a result, hwc configuration is modified. causing inconsistency and unexpected consequences.
Interrupts are not disabled when perf_output_read_group reads PMU counter. In this case, IPI request may be received from other cores. As a result, PMU configuration is modified and an error occurs when reading PMU counter:
CPU0 CPU1 __se_sys_perf_event_open perf_install_in_context perf_output_read_group smp_call_function_single for_each_sibling_event(sub, leader) { generic_exec_single if ((sub != event) && remote_function (sub->state == PERF_EVENT_STATE_ACTIVE)) | <enter IPI handler: __perf_install_in_context> <----RAISE IPI-----+ __perf_install_in_context ctx_resched event_sched_out armpmu_del ... hwc->idx = -1; // event->hwc.idx is set to -1 ... <exit IPI> sub->pmu->read(sub); armpmu_read armv8pmu_read_counter armv8pmu_read_hw_counter int idx = event->hw.idx; // idx = -1 u64 val = armv8pmu_read_evcntr(idx); u32 counter = ARMV8_IDX_TO_COUNTER(idx); // invalid counter = 30 read_pmevcntrn(counter) // undefined instruction
Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220902082918.179248-1-yangjihong1@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/events/core.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c index c72e1e9c7828..05d7b37e3020 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6780,6 +6780,13 @@ static void perf_output_read_group(struct perf_output_handle *handle, u64 read_format = event->attr.read_format; u64 values[5]; int n = 0; + unsigned long flags; + + /* + * Disabling interrupts avoids all counter scheduling + * (context switches, timer based rotation and IPIs). + */ + local_irq_save(flags);
values[n++] = 1 + leader->nr_siblings;
@@ -6812,6 +6819,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
__output_copy(handle, values, n * sizeof(u64)); } + + local_irq_restore(flags); }
#define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
From: Lu Wei luwei32@huawei.com
mainline inclusion from mainline-v6.0-rc6 commit 81225b2ea161af48e093f58e8dfee6d705b16af4 category: bugfix bugzilla: 187550, https://gitee.com/openeuler/kernel/issues/I60MR4 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=81...
--------------------------------
If an AF_PACKET socket is used to send packets through ipvlan and the default xmit function of the AF_PACKET socket is changed from dev_queue_xmit() to packet_direct_xmit() via setsockopt() with the option name of PACKET_QDISC_BYPASS, the skb->mac_header may not be reset and remains as the initial value of 65535, this may trigger slab-out-of-bounds bugs as following:
================================================================= UG: KASAN: slab-out-of-bounds in ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan] PU: 2 PID: 1768 Comm: raw_send Kdump: loaded Not tainted 6.0.0-rc4+ #6 ardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 all Trace: print_address_description.constprop.0+0x1d/0x160 print_report.cold+0x4f/0x112 kasan_report+0xa3/0x130 ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan] ipvlan_start_xmit+0x29/0xa0 [ipvlan] __dev_direct_xmit+0x2e2/0x380 packet_direct_xmit+0x22/0x60 packet_snd+0x7c9/0xc40 sock_sendmsg+0x9a/0xa0 __sys_sendto+0x18a/0x230 __x64_sys_sendto+0x74/0x90 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is: 1. packet_snd() only reset skb->mac_header when sock->type is SOCK_RAW and skb->protocol is not specified as in packet_parse_headers()
2. packet_direct_xmit() doesn't reset skb->mac_header as dev_queue_xmit()
In this case, skb->mac_header is 65535 when ipvlan_xmit_mode_l2() is called. So when ipvlan_xmit_mode_l2() gets mac header with eth_hdr() which use "skb->head + skb->mac_header", out-of-bound access occurs.
This patch replaces eth_hdr() with skb_eth_hdr() in ipvlan_xmit_mode_l2() and reset mac header in multicast to solve this out-of-bound bug.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/net/ipvlan/ipvlan_core.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 56ae0814982e..1c2d92b43639 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -496,7 +496,6 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb)
static int ipvlan_process_outbound(struct sk_buff *skb) { - struct ethhdr *ethh = eth_hdr(skb); int ret = NET_XMIT_DROP;
/* The ipvlan is a pseudo-L2 device, so the packets that we receive @@ -506,6 +505,8 @@ static int ipvlan_process_outbound(struct sk_buff *skb) if (skb_mac_header_was_set(skb)) { /* In this mode we dont care about * multicast and broadcast traffic */ + struct ethhdr *ethh = eth_hdr(skb); + if (is_multicast_ether_addr(ethh->h_dest)) { pr_debug_ratelimited( "Dropped {multi|broad}cast of type=[%x]\n", @@ -614,7 +615,7 @@ static int ipvlan_process_v6_forward(struct sk_buff *skb)
static int ipvlan_process_forward(struct sk_buff *skb) { - struct ethhdr *ethh = eth_hdr(skb); + struct ethhdr *ethh = skb_eth_hdr(skb); int ret = NET_XMIT_DROP;
/* In this mode we dont care about multicast and broadcast traffic */ @@ -706,7 +707,7 @@ static int ipvlan_xmit_mode_l3(struct sk_buff *skb, struct net_device *dev) static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev) { const struct ipvl_dev *ipvlan = netdev_priv(dev); - struct ethhdr *eth = eth_hdr(skb); + struct ethhdr *eth = skb_eth_hdr(skb); struct ipvl_addr *addr; void *lyr3h; int addr_type; @@ -736,6 +737,7 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev) return dev_forward_skb(ipvlan->phy_dev, skb);
} else if (is_multicast_ether_addr(eth->h_dest)) { + skb_reset_mac_header(skb); ipvlan_skb_crossing_ns(skb, NULL); ipvlan_multicast_enqueue(ipvlan->port, skb, true); return NET_XMIT_SUCCESS; @@ -776,7 +778,7 @@ static int ipvlan_l2e_local_xmit_event(struct ipvl_dev *ipvlan, static int ipvlan_xmit_mode_l2e(struct sk_buff *skb, struct net_device *dev) { struct ipvl_dev *ipvlan = netdev_priv(dev); - struct ethhdr *eth = eth_hdr(skb); + struct ethhdr *eth = skb_eth_hdr(skb); struct ipvl_addr *addr; void *lyr3h; int addr_type; @@ -809,6 +811,7 @@ static int ipvlan_xmit_mode_l2e(struct sk_buff *skb, struct net_device *dev) ipvlan_skb_crossing_ns(skb, ipvlan->phy_dev); return ipvlan_process_forward(skb); } else if (is_multicast_ether_addr(eth->h_dest)) { + skb_reset_mac_header(skb); ipvlan_skb_crossing_ns(skb, NULL); ipvlan_multicast_enqueue(ipvlan->port, skb, true); return NET_XMIT_SUCCESS;
From: Lu Wei luwei32@huawei.com
hulk inclusion category: bugfix bugzilla: 186745, https://gitee.com/openeuler/kernel/issues/I60M9A CVE: NA
----------------------------------------
When tcp compression is uses, if server receives a large packet, which is larger than PAGE_SIZES * MAX_SKB_FRAGS bytes, it needs more than one skb to store the data since one skb can only store PAGE_SIZES * MAX_SKB_FRAGS bytes data. In function tcp_comp_decompress(), it breaks the while-loop which copy decompressed data to new skb when more than MAX_SKB_FRAGS pages is needed, causing part of data is copied into the new skb and received by user but the compressed_len is not added. In this case, the data will be decompressed infinitely. The patch limits the decompressed length so it can be copied into one skb. Moreover, once a skb is full, quit decompress process directly.
Fixes: c31c696f9300 ("tcp_comp: Del compressed_data and remaining_data from tcp_comp_context_rx") Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/ipv4/tcp_comp.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_comp.c b/net/ipv4/tcp_comp.c index ffddbd6d3a6b..add060c8432d 100644 --- a/net/ipv4/tcp_comp.c +++ b/net/ipv4/tcp_comp.c @@ -9,9 +9,10 @@ #include <linux/zstd.h>
#define TCP_COMP_MAX_PADDING 64 -#define TCP_COMP_SCRATCH_SIZE 65535 +#define TCP_COMP_DATA_SIZE 65536 +#define TCP_COMP_SCRATCH_SIZE (TCP_COMP_DATA_SIZE - 1) #define TCP_COMP_MAX_CSIZE (TCP_COMP_SCRATCH_SIZE + TCP_COMP_MAX_PADDING) -#define TCP_COMP_ALLOC_ORDER get_order(65536) +#define TCP_COMP_ALLOC_ORDER get_order(TCP_COMP_DATA_SIZE) #define TCP_COMP_MAX_WINDOWLOG 17 #define TCP_COMP_MAX_INPUT (1 << TCP_COMP_MAX_WINDOWLOG)
@@ -589,6 +590,9 @@ static int tcp_comp_decompress(struct sock *sk, struct sk_buff *skb, int flags) return -ENOMEM;
while (compressed_len < (skb->len - rxm->offset)) { + if (skb_shinfo(nskb)->nr_frags >= MAX_SKB_FRAGS) + break; + len = 0; plen = skb->len - rxm->offset - compressed_len; if (plen > TCP_COMP_MAX_CSIZE) @@ -600,7 +604,8 @@ static int tcp_comp_decompress(struct sock *sk, struct sk_buff *skb, int flags)
outbuf.dst = ctx->rx.plaintext_data; outbuf.pos = 0; - outbuf.size = TCP_COMP_MAX_CSIZE * 32; + outbuf.size = MAX_SKB_FRAGS * TCP_COMP_DATA_SIZE; + outbuf.size -= skb_shinfo(nskb)->nr_frags * TCP_COMP_DATA_SIZE;
ret = ZSTD_decompressStream(ctx->rx.dstream, &outbuf, &inbuf); if (ZSTD_isError(ret)) {
From: Luo Meng luomeng12@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60Q98 CVE: NA
--------------------------------
A crash as follows:
BUG: unable to handle page fault for address: 000000011241cec7 sd 5:0:0:1: [sdl] Synchronizing SCSI cache #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2465367 Comm: multipath Kdump: loaded Tainted: G W O 5.10.0-60.18.0.50.h478.eulerosv2r11.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220525_182517-szxrtosci10000 04/01/2014 RIP: 0010:kernfs_new_node+0x22/0x60 Code: cc cc 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 41 89 cb 0f b7 ca 48 89 f2 53 48 8b 47 08 48 89 fb 48 89 de 48 85 c0 48 0f 44 c7 <48> 8b 78 50 41 51 45 89 c1 45 89 d8 e8 4d ee ff ff 5a 49 89 c4 48 RSP: 0018:ffffa178419539e8 EFLAGS: 00010206 RAX: 000000011241ce77 RBX: ffff9596828395a0 RCX: 000000000000a1ff RDX: ffff9595ada828b0 RSI: ffff9596828395a0 RDI: ffff9596828395a0 RBP: ffff95959a9a2a80 R08: 0000000000000000 R09: 0000000000000004 R10: ffff9595ca0bf930 R11: 0000000000000000 R12: ffff9595ada828b0 R13: ffff9596828395a0 R14: 0000000000000001 R15: ffff9595948c5c80 FS: 00007f64baa10200(0000) GS:ffff9596bad80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000011241cec7 CR3: 000000011923e003 CR4: 0000000000170ee0 Call Trace: kernfs_create_link+0x31/0xa0 sysfs_do_create_link_sd+0x61/0xc0 bd_link_disk_holder+0x10a/0x180 dm_get_table_device+0x10b/0x1f0 [dm_mod] __dm_get_device+0x1e2/0x280 [dm_mod] ? kmem_cache_alloc_trace+0x2fb/0x410 parse_path+0xca/0x200 [dm_multipath] parse_priority_group+0x19d/0x1f0 [dm_multipath] multipath_ctr+0x27a/0x491 [dm_multipath] dm_table_add_target+0x177/0x360 [dm_mod] table_load+0x12b/0x380 [dm_mod] ctl_ioctl+0x199/0x290 [dm_mod] ? dev_suspend+0xd0/0xd0 [dm_mod] dm_ctl_ioctl+0xa/0x20 [dm_mod] __se_sys_ioctl+0x85/0xc0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6
This can be easy reproduce: Add delay before ret = add_symlink(bdev->bd_part->holder_dir...) in bd_link_disk_holder() dmsetup create xxx --tabel "0 1000 linear /dev/sda 0" echo 1 > /sys/block/sda/device/delete
Delete /dev/sda will release holder_dir, but add_symlink() will use holder_dir. Therefore UAF will occur in this case.
Fix this problem by adding reference count to holder_dir.
Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/block_dev.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c index c55181b4a665..33eed9087b3b 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1630,6 +1630,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, void *holder, } } bdev->bd_openers++; + kobject_get(bdev->bd_part->holder_dir); if (for_part) bdev->bd_part_count++;
@@ -1899,6 +1900,7 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part) } #endif
+ kobject_put(bdev->bd_part->holder_dir); if (!--bdev->bd_openers) { WARN_ON_ONCE(bdev->bd_holders); sync_blockdev(bdev);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60Q98 CVE: NA
--------------------------------
Official solution will be applied to mainline, and this solution can't fix uaf for 'bd_holder_dir' thoroughly. hence revert this temporary solution. Officail solution will be backported in the next patch.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/block_dev.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c index 33eed9087b3b..c55181b4a665 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1630,7 +1630,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, void *holder, } } bdev->bd_openers++; - kobject_get(bdev->bd_part->holder_dir); if (for_part) bdev->bd_part_count++;
@@ -1900,7 +1899,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part) } #endif
- kobject_put(bdev->bd_part->holder_dir); if (!--bdev->bd_openers) { WARN_ON_ONCE(bdev->bd_holders); sync_blockdev(bdev);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60Q98 CVE: NA
--------------------------------
Currently, the caller of bd_link_disk_holer() get 'bdev' by blkdev_get_by_dev(), which will look up 'bdev' by inode number 'dev'. Howerver, it's possible that del_gendisk() can be called currently, and 'bd_holder_dir' can be freed before bd_link_disk_holer() access it, thus use after free is triggered.
t1: t2: bdev = blkdev_get_by_dev del_gendisk kobject_put(bd_holder_dir) kobject_free() bd_link_disk_holder
Fix the problem by checking disk is still live and grabbing a reference to 'bd_holder_dir' first in bd_link_disk_holder().
Link: https://lore.kernel.org/all/20221103025541.1875809-3-yukuai1@huaweicloud.com... Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/block_dev.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c index c55181b4a665..c8aa41edc9bd 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1264,16 +1264,31 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) struct bd_holder_disk *holder; int ret = 0;
- mutex_lock(&bdev->bd_mutex); + /* + * bdev could be deleted beneath us which would implicitly destroy + * the holder directory. Hold on to it. + */ + down_read(&bdev->bd_disk->lookup_sem); + if (!(disk->flags & GENHD_FL_UP)) { + up_read(&bdev->bd_disk->lookup_sem); + return -ENODEV; + }
+ kobject_get(bdev->bd_part->holder_dir); + up_read(&bdev->bd_disk->lookup_sem); + + mutex_lock(&bdev->bd_mutex); WARN_ON_ONCE(!bdev->bd_holder);
/* FIXME: remove the following once add_disk() handles errors */ - if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir)) + if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir)) { + kobject_put(bdev->bd_part->holder_dir); goto out_unlock; + }
holder = bd_find_holder_disk(bdev, disk); if (holder) { + kobject_put(bdev->bd_part->holder_dir); holder->refcnt++; goto out_unlock; } @@ -1295,11 +1310,6 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj); if (ret) goto out_del; - /* - * bdev could be deleted beneath us which would implicitly destroy - * the holder directory. Hold on to it. - */ - kobject_get(bdev->bd_part->holder_dir);
list_add(&holder->list, &bdev->bd_holder_disks); goto out_unlock; @@ -1310,6 +1320,8 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) kfree(holder); out_unlock: mutex_unlock(&bdev->bd_mutex); + if (ret) + kobject_put(bdev->bd_part->holder_dir); return ret; } EXPORT_SYMBOL_GPL(bd_link_disk_holder);
From: Baisong Zhong zhongbaisong@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60P4J CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=d3fd2...
--------------------------------
We got a syzkaller problem because of aarch64 alignment fault if KFENCE enabled. When the size from user bpf program is an odd number, like 399, 407, etc, it will cause the struct skb_shared_info's unaligned access. As seen below:
BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
Use-after-free read at 0xffff6254fffac077 (in kfence-#213): __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline] __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 skb_clone+0xf4/0x214 net/core/skbuff.c:1481 ____bpf_clone_redirect net/core/filter.c:2433 [inline] bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420 bpf_prog_d3839dd9068ceb51+0x80/0x330 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline] bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
allocated by task 15074 on cpu 0 at 1342.585390s: kmalloc include/linux/slab.h:568 [inline] kzalloc include/linux/slab.h:675 [inline] bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
To fix the problem, we adjust @size so that (@size + @hearoom) is a multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info is aligned to a cache line.
Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command") Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/bpf/test_run.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index f266a9453c8e..141c39274066 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -188,6 +188,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size, if (user_size > size) return ERR_PTR(-EMSGSIZE);
+ size = SKB_DATA_ALIGN(size); data = kzalloc(size + headroom + tailroom, GFP_USER); if (!data) return ERR_PTR(-ENOMEM);
From: Wang Yufen wangyufen@huawei.com
mainline inclusion from mainline-v5.19-rc2 commit f93431c86b631bbca5614c66f966bf3ddb3c2803 category: bugfix bugzilla: 186067, https://gitee.com/openeuler/kernel/issues/I61COA CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/id...
--------------------------------
Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t.
UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170
Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wang Yufen wangyufen@huawei.com Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org
Conflicts: include/net/ipv6.h net/ipv6/ip6_output.c
Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/net/ipv6.h | 4 ++-- net/ipv6/ip6_output.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 9392a81a3ae4..292bc81e7515 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -995,7 +995,7 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr); int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, + void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags);
@@ -1011,7 +1011,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, struct sk_buff_head *queue, struct sk_buff *ip6_make_skb(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, + void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags, struct inet_cork_full *cork); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 05e19e5d6514..02b156dfd3d5 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1455,7 +1455,7 @@ static int __ip6_append_data(struct sock *sk, struct page_frag *pfrag, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, + void *from, size_t length, int transhdrlen, unsigned int flags, struct ipcm6_cookie *ipc6) { struct sk_buff *skb, *skb_prev = NULL; @@ -1801,7 +1801,7 @@ static int __ip6_append_data(struct sock *sk, int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, + void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags) { @@ -1988,7 +1988,7 @@ EXPORT_SYMBOL_GPL(ip6_flush_pending_frames); struct sk_buff *ip6_make_skb(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, + void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags, struct inet_cork_full *cork)
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.18-rc1 commit 763087dab97547230a6807c865a6a5ae53a59247 category: bugfix bugzilla: 186761, https://gitee.com/openeuler/kernel/issues/I60MPY CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We have multiple places where this helper is convenient, and plan using it in the following patch.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/skbuff.h | 10 ++++++++++ net/core/skbuff.c | 19 +++++-------------- 2 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4739ce5f0913..bb6e2eacb660 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1424,6 +1424,11 @@ static inline unsigned int skb_end_offset(const struct sk_buff *skb) { return skb->end; } + +static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) +{ + skb->end = offset; +} #else static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { @@ -1434,6 +1439,11 @@ static inline unsigned int skb_end_offset(const struct sk_buff *skb) { return skb->end - skb->head; } + +static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) +{ + skb->end = skb->head + offset; +} #endif
/* Internal */ diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 379c426f8d65..9140c8b639a9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -231,7 +231,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, skb->head = data; skb->data = data; skb_reset_tail_pointer(skb); - skb->end = skb->tail + size; + skb_set_end_offset(skb, size); skb->mac_header = (typeof(skb->mac_header))~0U; skb->transport_header = (typeof(skb->transport_header))~0U;
@@ -1691,11 +1691,10 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, skb->head = data; skb->head_frag = 0; skb->data += off; + + skb_set_end_offset(skb, size); #ifdef NET_SKBUFF_DATA_USES_OFFSET - skb->end = size; off = nhead; -#else - skb->end = skb->head + size; #endif skb->tail += off; skb_headers_offset_update(skb, nhead); @@ -6043,11 +6042,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, skb->head = data; skb->data = data; skb->head_frag = 0; -#ifdef NET_SKBUFF_DATA_USES_OFFSET - skb->end = size; -#else - skb->end = skb->head + size; -#endif + skb_set_end_offset(skb, size); skb_set_tail_pointer(skb, skb_headlen(skb)); skb_headers_offset_update(skb, 0); skb->cloned = 0; @@ -6185,11 +6180,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, skb->head = data; skb->head_frag = 0; skb->data = data; -#ifdef NET_SKBUFF_DATA_USES_OFFSET - skb->end = size; -#else - skb->end = skb->head + size; -#endif + skb_set_end_offset(skb, size); skb_reset_tail_pointer(skb); skb_headers_offset_update(skb, 0); skb->cloned = 0;
From: Eric Dumazet edumazet@google.com
mainline inclusion from mainline-v5.18-rc1 commit 2b88cba55883eaafbc9b7cbff0b2c7cdba71ed01 category: bugfix bugzilla: 186761, https://gitee.com/openeuler/kernel/issues/I60MPY CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len) in skb_try_coalesce() [1]
I was able to root cause the issue to kfence.
When kfence is in action, the following assertion is no longer true:
int size = xxxx; void *ptr1 = kmalloc(size, gfp); void *ptr2 = kmalloc(size, gfp);
if (ptr1 && ptr2) ASSERT(ksize(ptr1) == ksize(ptr2));
We attempted to fix these issues in the blamed commits, but forgot that TCP was possibly shifting data after skb_unclone_keeptruesize() has been used, notably from tcp_retrans_try_collapse().
So we not only need to keep same skb->truesize value, we also need to make sure TCP wont fill new tailroom that pskb_expand_head() was able to get from a addr = kmalloc(...) followed by ksize(addr)
Split skb_unclone_keeptruesize() into two parts:
1) Inline skb_unclone_keeptruesize() for the common case, when skb is not cloned.
2) Out of line __skb_unclone_keeptruesize() for the 'slow path'.
WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Modules linked in: CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa RSP: 0018:ffffc900063af268 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000 RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003 RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000 R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8 R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155 FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline] tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630 tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914 tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025 tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947 tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719 sk_backlog_rcv include/net/sock.h:1037 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2779 release_sock+0x54/0x1b0 net/core/sock.c:3311 sk_wait_data+0x177/0x450 net/core/sock.c:2821 tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457 tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572 inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c4777efa751d ("net: add and use skb_unclone_keeptruesize() helper") Fixes: 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Marco Elver elver@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org
conflict: net/core/skbuff.c
Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/skbuff.h | 18 +++++++++--------- net/core/skbuff.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index bb6e2eacb660..26c431883c69 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1663,19 +1663,19 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t pri) return 0; }
-/* This variant of skb_unclone() makes sure skb->truesize is not changed */ +/* This variant of skb_unclone() makes sure skb->truesize + * and skb_end_offset() are not changed, whenever a new skb->head is needed. + * + * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X)) + * when various debugging features are in place. + */ +int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri); static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri) { might_sleep_if(gfpflags_allow_blocking(pri));
- if (skb_cloned(skb)) { - unsigned int save = skb->truesize; - int res; - - res = pskb_expand_head(skb, 0, 0, pri); - skb->truesize = save; - return res; - } + if (skb_cloned(skb)) + return __skb_unclone_keeptruesize(skb, pri); return 0; }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9140c8b639a9..1aa1be87238b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1742,6 +1742,38 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom) } EXPORT_SYMBOL(skb_realloc_headroom);
+int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri) +{ + unsigned int saved_end_offset, saved_truesize; + struct skb_shared_info *shinfo; + int res; + + saved_end_offset = skb_end_offset(skb); + saved_truesize = skb->truesize; + + res = pskb_expand_head(skb, 0, 0, pri); + if (res) + return res; + + skb->truesize = saved_truesize; + + if (likely(skb_end_offset(skb) == saved_end_offset)) + return 0; + + shinfo = skb_shinfo(skb); + + /* We are about to change back skb->end, + * we need to move skb_shinfo() to its new location. + */ + memmove(skb->head + saved_end_offset, + shinfo, + offsetof(struct skb_shared_info, frags[shinfo->nr_frags])); + + skb_set_end_offset(skb, saved_end_offset); + + return 0; +} + /** * skb_copy_expand - copy and expand sk_buff * @skb: buffer to copy
From: Yu Liao liaoyu15@huawei.com
hulk inclusion category: feature bugzilla: 186841, https://gitee.com/openeuler/kernel/issues/I61188 CVE: NA
--------------------------------
Enable CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE by default.
Signed-off-by: Yu Liao liaoyu15@huawei.com Reviewed-by: Wei Li liwei391@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/configs/openeuler_defconfig | 2 +- arch/x86/configs/openeuler_defconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index c52e55e26c86..df7082b448b4 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -645,7 +645,7 @@ CONFIG_FW_CFG_SYSFS=y # CONFIG_EFI_ESRT=y CONFIG_EFI_VARS_PSTORE=y -# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set +CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y CONFIG_EFI_FAKE_MEMMAP=y CONFIG_EFI_MAX_FAKE_MEM=8 CONFIG_EFI_PARAMS_FROM_FDT=y diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig index 5e731188ab57..d13b457dd01d 100644 --- a/arch/x86/configs/openeuler_defconfig +++ b/arch/x86/configs/openeuler_defconfig @@ -682,7 +682,7 @@ CONFIG_FW_CFG_SYSFS=y # CONFIG_EFI_VARS is not set CONFIG_EFI_ESRT=y CONFIG_EFI_VARS_PSTORE=y -# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set +CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y CONFIG_EFI_RUNTIME_MAP=y # CONFIG_EFI_FAKE_MEMMAP is not set CONFIG_EFI_SOFT_RESERVE=y
From: Baokun Li libaokun1@huawei.com
hulk inclusion category: bugfix bugzilla: 187327,https://gitee.com/openeuler/kernel/issues/I6111I CVE: NA
--------------------------------
Print error when hc and md do not match, which is convenient for locating the cause of the problem
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/md/dm-ioctl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index b839705654d4..b012a2748af8 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -2033,6 +2033,8 @@ int dm_copy_name_and_uuid(struct mapped_device *md, char *name, char *uuid) mutex_lock(&dm_hash_cells_mutex); hc = dm_get_mdptr(md); if (!hc || hc->md != md) { + if (hc) + DMERR("hash cell and mapped device do not match!"); r = -ENXIO; goto out; }
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion from mainline-v5.16-rc1 commit d417b49fff3e2f21043c834841e8623a6098741d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6110W cve: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
It is not safe to check page->index without holding the page lock. It can be changed if the page is moved between the swap cache and the page cache for a shmem file, for example. There is a VM_BUG_ON below which checks page->index is correct after taking the page lock.
Link: https://lkml.kernel.org/r/20210818144932.940640-1-willy@infradead.org Fixes: 5c211ba29deb ("mm: add and use find_lock_entries") Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Reported-by: syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com Cc: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/filemap.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c index 6480600cf0ea..1b502f2e5253 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1961,7 +1961,6 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t start, next_idx = page->index + thp_nr_pages(page); if (page->index < start) goto put; - VM_BUG_ON_PAGE(page->index != xas.xa_index, page); if (page->index + thp_nr_pages(page) - 1 > end) goto put; if (!trylock_page(page))
From: Hugh Dickins hughd@google.com
mainline inclusion from mainline-v5.18-rc1 commit 56a8c8eb1eaf21261be8cdc4e3715239ac087342 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Mikulas asked in "Do we still need commit a0ee5ec520ed ('tmpfs: allocate on read when stacked')?" in [1]
Lukas noticed this unusual behavior of loop device backed by tmpfs in [2].
Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes; but if it looks like it might be a read for "a stacking filesystem", it allocates actual pages to the page cache, and even marks them as dirty. And reads from the loop device do satisfy the test that is used.
This oddity was added for an old version of unionfs, to help to limit its usage to the limited size of the tmpfs mount involved; but about the same time as the tmpfs mod went in (2.6.25), unionfs was reworked to proceed differently; and the mod kept just in case others needed it.
Do we still need it? I cannot answer with more certainty than "Probably not". It's nasty enough that we really should try to delete it; but if a regression is reported somewhere, then we might have to revert later.
It's not quite as simple as just removing the test (as Mikulas did): xfstests generic/013 hung because splice from tmpfs failed on page not up-to-date and page mapping unset. That can be fixed just by marking the ZERO_PAGE as Uptodate, which of course it is: do so in pagecache_init() - it might be useful to others than tmpfs.
My intention, though, was to stop using the ZERO_PAGE here altogether: surely iov_iter_zero() is better for this case? Sadly not: it relies on clear_user(), and the x86 clear_user() is slower than its copy_user() [3].
But while we are still using the ZERO_PAGE, let's stop dirtying its struct page cacheline with unnecessary get_page() and put_page().
Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.i... [1] Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2] Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com... [3] Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.com Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Mikulas Patocka mpatocka@redhat.com Reported-by: Lukas Czerner lczerner@redhat.com Acked-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Cc: Zdenek Kabelac zkabelac@redhat.com Cc: "Darrick J. Wong" djwong@kernel.org Cc: Miklos Szeredi miklos@szeredi.hu Cc: Borislav Petkov bp@suse.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/filemap.c | 6 ++++++ mm/shmem.c | 20 ++++++-------------- 2 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c index 1b502f2e5253..8e87267e1a78 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1001,6 +1001,12 @@ void __init pagecache_init(void) init_waitqueue_head(&page_wait_table[i]);
page_writeback_init(); + + /* + * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date, + * and splice's page_cache_pipe_buf_confirm() needs to see that. + */ + SetPageUptodate(ZERO_PAGE(0)); }
/* diff --git a/mm/shmem.c b/mm/shmem.c index e85ac8c2150f..46e7dea65db1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2586,19 +2586,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) struct address_space *mapping = inode->i_mapping; pgoff_t index; unsigned long offset; - enum sgp_type sgp = SGP_READ; int error = 0; ssize_t retval = 0; loff_t *ppos = &iocb->ki_pos;
- /* - * Might this read be for a stacking filesystem? Then when reading - * holes of a sparse file, we actually need to allocate those pages, - * and even mark them dirty, so it cannot exceed the max_blocks limit. - */ - if (!iter_is_iovec(to)) - sgp = SGP_CACHE; - index = *ppos >> PAGE_SHIFT; offset = *ppos & ~PAGE_MASK;
@@ -2607,6 +2598,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) pgoff_t end_index; unsigned long nr, ret; loff_t i_size = i_size_read(inode); + bool got_page;
end_index = i_size >> PAGE_SHIFT; if (index > end_index) @@ -2617,15 +2609,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) break; }
- error = shmem_getpage(inode, index, &page, sgp); + error = shmem_getpage(inode, index, &page, SGP_READ); if (error) { if (error == -EINVAL) error = 0; break; } if (page) { - if (sgp == SGP_CACHE) - set_page_dirty(page); unlock_page(page); }
@@ -2659,9 +2649,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) */ if (!offset) mark_page_accessed(page); + got_page = true; } else { page = ZERO_PAGE(0); - get_page(page); + got_page = false; }
/* @@ -2674,7 +2665,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) index += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK;
- put_page(page); + if (got_page) + put_page(page); if (!iov_iter_count(to)) break; if (ret < nr) {
From: Hugh Dickins hughd@google.com
mainline inclusion from mainline-v5.18-rc3 commit 1bdec44b1eee32e311b44b5b06144bb7d9b33938 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change.
Whilst nfsd_splice_action() does contain some questionable handling of repeated pages, and Chuck was able to work around there, history from Mark Hemment makes clear that there might be similar dangers elsewhere: it was not a good idea for me to pass ZERO_PAGE down to unknown actors.
Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when iter_is_iovec(); in other cases, use the more natural iov_iter_zero() instead of copy_page_to_iter().
We would use iov_iter_zero() throughout, but the x86 clear_user() is not nearly so well optimized as copy to user (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).
And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)): which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards
Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com Fixes: 56a8c8eb1eaf ("tmpfs: do not allocate pages on read") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Patrice CHOTARD patrice.chotard@foss.st.com Reported-by: Chuck Lever III chuck.lever@oracle.com Tested-by: Chuck Lever III chuck.lever@oracle.com Cc: Mark Hemment markhemm@googlemail.com Cc: Patrice CHOTARD patrice.chotard@foss.st.com Cc: Mikulas Patocka mpatocka@redhat.com Cc: Lukas Czerner lczerner@redhat.com Cc: Christoph Hellwig hch@lst.de Cc: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/filemap.c | 6 ------ mm/shmem.c | 31 ++++++++++++++++++++----------- 2 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c index 8e87267e1a78..1b502f2e5253 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1001,12 +1001,6 @@ void __init pagecache_init(void) init_waitqueue_head(&page_wait_table[i]);
page_writeback_init(); - - /* - * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date, - * and splice's page_cache_pipe_buf_confirm() needs to see that. - */ - SetPageUptodate(ZERO_PAGE(0)); }
/* diff --git a/mm/shmem.c b/mm/shmem.c index 46e7dea65db1..258179b89a9b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2598,7 +2598,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) pgoff_t end_index; unsigned long nr, ret; loff_t i_size = i_size_read(inode); - bool got_page;
end_index = i_size >> PAGE_SHIFT; if (index > end_index) @@ -2649,24 +2648,34 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) */ if (!offset) mark_page_accessed(page); - got_page = true; + /* + * Ok, we have the page, and it's up-to-date, so + * now we can copy it to user space... + */ + ret = copy_page_to_iter(page, offset, nr, to); + put_page(page); + + } else if (iter_is_iovec(to)) { + /* + * Copy to user tends to be so well optimized, but + * clear_user() not so much, that it is noticeably + * faster to copy the zero page instead of clearing. + */ + ret = copy_page_to_iter(ZERO_PAGE(0), offset, nr, to); } else { - page = ZERO_PAGE(0); - got_page = false; + /* + * But submitting the same page twice in a row to + * splice() - or others? - can result in confusion: + * so don't attempt that optimization on pipes etc. + */ + ret = iov_iter_zero(nr, to); }
- /* - * Ok, we have the page, and it's up-to-date, so - * now we can copy it to user space... - */ - ret = copy_page_to_iter(page, offset, nr, to); retval += ret; offset += ret; index += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK;
- if (got_page) - put_page(page); if (!iov_iter_count(to)) break; if (ret < nr) {
From: Selvin Xavier selvin.xavier@broadcom.com
mainline inclusion from mainline-v5.16-rc1 commit 5ec0a6fcb60ea430f8ee7e0bec22db9b22f856d3 category: bugfix bugzilla: 187447,https://gitee.com/openeuler/kernel/issues/I612GP
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Host crashes when pci_enable_atomic_ops_to_root() is called for VFs with virtual buses. The virtual buses added to SR-IOV have bus->self set to NULL and host crashes due to this.
PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash" ... #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6 #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417 #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14 #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace [exception RIP: pcie_capability_read_dword+28] RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000 RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000 RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8 R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6 #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re] #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit in Device Control 2 is reserved for VFs. The PF value applies to all associated VFs.
Return -EINVAL if pci_enable_atomic_ops_to_root() is called for a VF.
Link: https://lore.kernel.org/r/1631354585-16597-1-git-send-email-selvin.xavier@br... Fixes: 35f5ace5dea4 ("RDMA/bnxt_re: Enable global atomic ops if platform supports") Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()") Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Andy Gospodarek gospo@broadcom.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/pci/pci.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 1c3336910419..77c1ff3e9a7c 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3674,6 +3674,14 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask) struct pci_dev *bridge; u32 cap, ctl2;
+ /* + * Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit + * in Device Control 2 is reserved in VFs and the PF value applies + * to all associated VFs. + */ + if (dev->is_virtfn) + return -EINVAL; + if (!pci_is_pcie(dev)) return -EINVAL;
From: Hyunwoo Kim imv4bel@gmail.com
mainline inclusion from mainline-v5.19-rc4 commit a09d2d00af53b43c6f11e6ab3cb58443c2cac8a7 category: bugfix bugzilla: 187590, https://gitee.com/src-openeuler/kernel/issues/I5PRMO CVE: CVE-2022-39842
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------
In pxa3xx_gcu_write, a count parameter of type size_t is passed to words of type int. Then, copy_from_user() may cause a heap overflow because it is used as the third argument of copy_from_user().
Signed-off-by: Hyunwoo Kim imv4bel@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Zhao Wenhui zhaowenhui8@huawei.com Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/video/fbdev/pxa3xx-gcu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/pxa3xx-gcu.c b/drivers/video/fbdev/pxa3xx-gcu.c index 9421d14d0eb0..9e9888e40c57 100644 --- a/drivers/video/fbdev/pxa3xx-gcu.c +++ b/drivers/video/fbdev/pxa3xx-gcu.c @@ -381,7 +381,7 @@ pxa3xx_gcu_write(struct file *file, const char *buff, struct pxa3xx_gcu_batch *buffer; struct pxa3xx_gcu_priv *priv = to_pxa3xx_gcu_priv(file);
- int words = count / 4; + size_t words = count / 4;
/* Does not need to be atomic. There's a lock in user space, * but anyhow, this is just for statistics. */
From: Qi Zheng zhengqi.arch@bytedance.com
mainline inclusion from mainline-v5.19-rc1 commit 3f913fc5f9745613088d3c569778c9813ab9c129 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I610B5 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
We expect no warnings to be issued when we specify __GFP_NOWARN, but currently in paths like alloc_pages() and kmalloc(), there are still some warnings printed, fix it.
But for some warnings that report usage problems, we don't deal with them. If such warnings are printed, then we should fix the usage problems. Such as the following case:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
[zhengqi.arch@bytedance.com: v2] Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng zhengqi.arch@bytedance.com Cc: Akinobu Mita akinobu.mita@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
Conflict: mm/internal.h mm/page_alloc.c
Signed-off-by: Ye Weihua yeweihua4@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/fault-inject.h | 2 ++ lib/fault-inject.c | 3 +++ mm/failslab.c | 3 +++ mm/internal.h | 15 +++++++++++++++ mm/page_alloc.c | 16 +++++++++------- 5 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h index e525f6957c49..d506ee960ffd 100644 --- a/include/linux/fault-inject.h +++ b/include/linux/fault-inject.h @@ -20,6 +20,7 @@ struct fault_attr { atomic_t space; unsigned long verbose; bool task_filter; + bool no_warn; unsigned long stacktrace_depth; unsigned long require_start; unsigned long require_end; @@ -39,6 +40,7 @@ struct fault_attr { .ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \ .verbose = 2, \ .dname = NULL, \ + .no_warn = false, \ }
#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER diff --git a/lib/fault-inject.c b/lib/fault-inject.c index ce12621b4275..423784d9c058 100644 --- a/lib/fault-inject.c +++ b/lib/fault-inject.c @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(setup_fault_attr);
static void fail_dump(struct fault_attr *attr) { + if (attr->no_warn) + return; + if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) { printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n" "name %pd, interval %lu, probability %lu, " diff --git a/mm/failslab.c b/mm/failslab.c index f92fed91ac23..58df9789f1d2 100644 --- a/mm/failslab.c +++ b/mm/failslab.c @@ -30,6 +30,9 @@ bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags) if (failslab.cache_filter && !(s->flags & SLAB_FAILSLAB)) return false;
+ if (gfpflags & __GFP_NOWARN) + failslab.attr.no_warn = true; + return should_fail(&failslab.attr, s->object_size); }
diff --git a/mm/internal.h b/mm/internal.h index 8ef8cdd929fa..13f78c5cd961 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -32,6 +32,21 @@ /* Do not use these with a slab allocator */ #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
+/* + * Different from WARN_ON_ONCE(), no warning will be issued + * when we specify __GFP_NOWARN. + */ +#define WARN_ON_ONCE_GFP(cond, gfp) ({ \ + static bool __section(".data.once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(!(gfp & __GFP_NOWARN) && __ret_warn_once && !__warned)) { \ + __warned = true; \ + WARN_ON(1); \ + } \ + unlikely(__ret_warn_once); \ +}) + void page_writeback_init(void);
vm_fault_t do_swap_page(struct vm_fault *vmf); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2feb99a0b98f..81840e28d78f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3555,6 +3555,9 @@ static bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) (gfp_mask & __GFP_DIRECT_RECLAIM)) return false;
+ if (gfp_mask & __GFP_NOWARN) + fail_page_alloc.attr.no_warn = true; + return should_fail(&fail_page_alloc.attr, 1 << order); }
@@ -4130,7 +4133,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, */
/* Exhausted what can be done so it's blame time */ - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { + if (out_of_memory(&oc) || + WARN_ON_ONCE_GFP(gfp_mask & __GFP_NOFAIL, gfp_mask)) { *did_some_progress = 1;
/* @@ -4914,7 +4918,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * All existing users of the __GFP_NOFAIL are blockable, so warn * of any new users that actually require GFP_NOWAIT */ - if (WARN_ON_ONCE(!can_direct_reclaim)) + if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) goto fail;
/* @@ -4922,7 +4926,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * because we cannot reclaim anything and only can loop waiting * for somebody to do a work for us */ - WARN_ON_ONCE(current->flags & PF_MEMALLOC); + WARN_ON_ONCE_GFP(current->flags & PF_MEMALLOC, gfp_mask);
/* * non failing costly orders are a hard requirement which we @@ -4930,7 +4934,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * so that we can identify them and convert them to something * else. */ - WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER); + WARN_ON_ONCE_GFP(order > PAGE_ALLOC_COSTLY_ORDER, gfp_mask);
/* * Help non-failing allocations by giving them access to memory @@ -5295,10 +5299,8 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid, * There are several places where we assume that the order value is sane * so bail out early if the request is out of bound. */ - if (unlikely(order >= MAX_ORDER)) { - WARN_ON_ONCE(!(gfp & __GFP_NOWARN)); + if (WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp)) return NULL; - }
gfp &= gfp_allowed_mask;
From: Ye Weihua yeweihua4@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I610B5 CVE: NA
--------------------------------
__dend_signal_locked() invokes __sigqueue_alloc() which may invoke a normal printk() to print failure message. This can cause a deadlock in the scenario reported by syz-bot below (test in 5.10):
CPU0 CPU1 ---- ---- lock(&sighand->siglock); lock(&tty->read_wait); lock(&sighand->siglock); lock(console_owner);
This patch specities __GFP_NOWARN to __sigqueue_alloc(), so that printk will not be called, and this deadlock problem can be avoided.
Syzbot reported the following lockdep error:
====================================================== WARNING: possible circular locking dependency detected 5.10.0-04424-ga472e3c833d3 #1 Not tainted ------------------------------------------------------ syz-executor.2/31970 is trying to acquire lock: ffffa00014066a60 (console_owner){-.-.}-{0:0}, at: console_trylock_spinning+0xf0/0x2e0 kernel/printk/printk.c:1854
but task is already holding lock: ffff0000ddb38a98 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x60/0x260 kernel/signal.c:1322
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&sighand->siglock){-.-.}-{2:2}: validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728 __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954 lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159 __lock_task_sighand+0xf0/0x370 kernel/signal.c:1396 lock_task_sighand include/linux/sched/signal.h:699 [inline] task_work_add+0x1f8/0x2a0 kernel/task_work.c:58 io_req_task_work_add+0x98/0x10c fs/io_uring.c:2115 __io_async_wake+0x338/0x780 fs/io_uring.c:4984 io_poll_wake+0x40/0x50 fs/io_uring.c:5461 __wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123 __wake_up+0x1c/0x24 kernel/sched/wait.c:142 pty_set_termios+0x1ac/0x2d0 drivers/tty/pty.c:286 tty_set_termios+0x310/0x46c drivers/tty/tty_ioctl.c:334 set_termios.part.0+0x2dc/0xa50 drivers/tty/tty_ioctl.c:414 set_termios drivers/tty/tty_ioctl.c:368 [inline] tty_mode_ioctl+0x4f4/0xbec drivers/tty/tty_ioctl.c:736 n_tty_ioctl_helper+0x74/0x260 drivers/tty/tty_ioctl.c:883 n_tty_ioctl+0x80/0x3d0 drivers/tty/n_tty.c:2516 tty_ioctl+0x508/0x1100 drivers/tty/tty_io.c:2751 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0x12c/0x18c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155 do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683
-> #3 (&tty->read_wait){....}-{2:2}: validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728 __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954 lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_poll_double_wake+0x158/0x30c fs/io_uring.c:5093 __wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123 __wake_up+0x1c/0x24 kernel/sched/wait.c:142 pty_close+0x1bc/0x330 drivers/tty/pty.c:68 tty_release+0x1e0/0x88c drivers/tty/tty_io.c:1761 __fput+0x1dc/0x500 fs/file_table.c:281 ____fput+0x24/0x30 fs/file_table.c:314 task_work_run+0xf4/0x1ec kernel/task_work.c:151 tracehook_notify_resume include/linux/tracehook.h:188 [inline] do_notify_resume+0x378/0x410 arch/arm64/kernel/signal.c:718 work_pending+0xc/0x198
-> #2 (&tty->write_wait){....}-{2:2}: validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728 __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954 lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159 __wake_up_common_lock+0xb0/0x130 kernel/sched/wait.c:122 __wake_up+0x1c/0x24 kernel/sched/wait.c:142 tty_wakeup+0x54/0xbc drivers/tty/tty_io.c:539 tty_port_default_wakeup+0x38/0x50 drivers/tty/tty_port.c:50 tty_port_tty_wakeup+0x3c/0x50 drivers/tty/tty_port.c:388 uart_write_wakeup+0x38/0x60 drivers/tty/serial/serial_core.c:106 pl011_tx_chars+0x530/0x5c0 drivers/tty/serial/amba-pl011.c:1418 pl011_start_tx_pio drivers/tty/serial/amba-pl011.c:1303 [inline] pl011_start_tx+0x1b4/0x430 drivers/tty/serial/amba-pl011.c:1315 __uart_start.isra.0+0xb4/0xcc drivers/tty/serial/serial_core.c:127 uart_write+0x21c/0x460 drivers/tty/serial/serial_core.c:613 process_output_block+0x120/0x3ac drivers/tty/n_tty.c:590 n_tty_write+0x2c8/0x650 drivers/tty/n_tty.c:2383 do_tty_write drivers/tty/tty_io.c:1028 [inline] file_tty_write.constprop.0+0x2d0/0x520 drivers/tty/tty_io.c:1118 tty_write drivers/tty/tty_io.c:1125 [inline] redirected_tty_write+0xe4/0x104 drivers/tty/tty_io.c:1147 call_write_iter include/linux/fs.h:1960 [inline] new_sync_write+0x264/0x37c fs/read_write.c:515 vfs_write+0x694/0x9d0 fs/read_write.c:602 ksys_write+0xfc/0x200 fs/read_write.c:655 __do_sys_write fs/read_write.c:667 [inline] __se_sys_write fs/read_write.c:664 [inline] __arm64_sys_write+0x50/0x60 fs/read_write.c:664 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155 do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683
-> #1 (&port_lock_key){-.-.}-{2:2}: validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728 __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954 lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] pl011_console_write+0x2f0/0x410 drivers/tty/serial/amba-pl011.c:2263 call_console_drivers.constprop.0+0x1f8/0x3b0 kernel/printk/printk.c:1932 console_unlock+0x36c/0x9ec kernel/printk/printk.c:2553 vprintk_emit+0x40c/0x4b0 kernel/printk/printk.c:2075 vprintk_default+0x48/0x54 kernel/printk/printk.c:2092 vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404 printk+0xbc/0xf0 kernel/printk/printk.c:2123 register_console+0x580/0x790 kernel/printk/printk.c:2905 uart_configure_port.constprop.0+0x4a0/0x4e0 drivers/tty/serial/serial_core.c:2431 uart_add_one_port+0x378/0x550 drivers/tty/serial/serial_core.c:2944 pl011_register_port+0xb4/0x210 drivers/tty/serial/amba-pl011.c:2686 pl011_probe+0x334/0x3ec drivers/tty/serial/amba-pl011.c:2736 amba_probe+0x14c/0x2f0 drivers/amba/bus.c:283 really_probe+0x210/0xa5c drivers/base/dd.c:562 driver_probe_device+0x1c8/0x280 drivers/base/dd.c:747 __device_attach_driver+0x18c/0x260 drivers/base/dd.c:853 bus_for_each_drv+0x120/0x1a0 drivers/base/bus.c:431 __device_attach+0x16c/0x3b4 drivers/base/dd.c:922 device_initial_probe+0x28/0x34 drivers/base/dd.c:971 bus_probe_device+0x124/0x13c drivers/base/bus.c:491 fw_devlink_resume+0x164/0x270 drivers/base/core.c:1601 of_platform_default_populate_init+0xf4/0x114 drivers/of/platform.c:543 do_one_initcall+0x11c/0x770 init/main.c:1217 do_initcall_level+0x364/0x388 init/main.c:1290 do_initcalls+0x90/0xc0 init/main.c:1306 do_basic_setup init/main.c:1326 [inline] kernel_init_freeable+0x57c/0x63c init/main.c:1529 kernel_init+0x1c/0x20c init/main.c:1417 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034
-> #0 (console_owner){-.-.}-{0:0}: check_prev_add+0xe0/0x105c kernel/locking/lockdep.c:2988 check_prevs_add+0x1c8/0x3d4 kernel/locking/lockdep.c:3113 validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728 __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954 lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564 console_trylock_spinning+0x130/0x2e0 kernel/printk/printk.c:1875 vprintk_emit+0x268/0x4b0 kernel/printk/printk.c:2074 vprintk_default+0x48/0x54 kernel/printk/printk.c:2092 vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404 printk+0xbc/0xf0 kernel/printk/printk.c:2123 fail_dump lib/fault-inject.c:45 [inline] should_fail+0x2a0/0x370 lib/fault-inject.c:146 __should_failslab+0x8c/0xe0 mm/failslab.c:33 should_failslab+0x14/0x2c mm/slab_common.c:1181 slab_pre_alloc_hook mm/slab.h:495 [inline] slab_alloc_node mm/slub.c:2842 [inline] slab_alloc mm/slub.c:2931 [inline] kmem_cache_alloc+0x8c/0xe64 mm/slub.c:2936 __sigqueue_alloc+0x224/0x5a4 kernel/signal.c:437 __send_signal+0x700/0xeac kernel/signal.c:1121 send_signal+0x348/0x6a0 kernel/signal.c:1247 force_sig_info_to_task+0x184/0x260 kernel/signal.c:1339 force_sig_fault_to_task kernel/signal.c:1678 [inline] force_sig_fault+0xb0/0xf0 kernel/signal.c:1685 arm64_force_sig_fault arch/arm64/kernel/traps.c:182 [inline] arm64_notify_die arch/arm64/kernel/traps.c:208 [inline] arm64_notify_die+0xdc/0x160 arch/arm64/kernel/traps.c:199 do_sp_pc_abort+0x4c/0x60 arch/arm64/mm/fault.c:794 el0_pc+0xd8/0x19c arch/arm64/kernel/entry-common.c:309 el0_sync_handler+0x12c/0x1e0 arch/arm64/kernel/entry-common.c:394 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683
other info that might help us debug this:
Chain exists of: console_owner --> &tty->read_wait --> &sighand->siglock
Signed-off-by: Ye Weihua yeweihua4@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/signal.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/signal.c b/kernel/signal.c index 6d374d02a2cb..9aec5fdcc8d0 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1118,7 +1118,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc else override_rlimit = 0;
- q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit); + q = __sigqueue_alloc(sig, t, GFP_ATOMIC | __GFP_NOWARN, + override_rlimit); if (q) { list_add_tail(&q->list, &pending->list); switch ((unsigned long) info) {
From: Joe Thornber ejt@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit f73e2e70ec48c9a9d45494c4866230a5059062ad category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5JCAH
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Remove this extra BUG_ON() that calls node_check() -- which avoids extra crc checking.
Signed-off-by: Joe Thornber ejt@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Reviewed-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/md/persistent-data/dm-btree-spine.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/md/persistent-data/dm-btree-spine.c b/drivers/md/persistent-data/dm-btree-spine.c index e03cb9e48773..c4b386b2be97 100644 --- a/drivers/md/persistent-data/dm-btree-spine.c +++ b/drivers/md/persistent-data/dm-btree-spine.c @@ -30,8 +30,6 @@ static void node_prepare_for_write(struct dm_block_validator *v, h->csum = cpu_to_le32(dm_bm_checksum(&h->flags, block_size - sizeof(__le32), BTREE_CSUM_XOR)); - - BUG_ON(node_check(v, b, 4096)); }
static int node_check(struct dm_block_validator *v,
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5JCAH
--------------------------------
The BUG_ON is unneed Since f73e2e70ec48 ("dm btree spine: remove paranoid node_check call in node_prep_for_write()") merged in v5.13.
For debug reason, we also want to know the data on disk is corrupted by write or disk fault. So also add check and print some info when data corrupted.
Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Reviewed-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/md/persistent-data/dm-btree-spine.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/md/persistent-data/dm-btree-spine.c b/drivers/md/persistent-data/dm-btree-spine.c index c4b386b2be97..859aae0a8683 100644 --- a/drivers/md/persistent-data/dm-btree-spine.c +++ b/drivers/md/persistent-data/dm-btree-spine.c @@ -30,6 +30,8 @@ static void node_prepare_for_write(struct dm_block_validator *v, h->csum = cpu_to_le32(dm_bm_checksum(&h->flags, block_size - sizeof(__le32), BTREE_CSUM_XOR)); + if (node_check(v, b, 4096)) + DMWARN_LIMIT("%s node_check failed", __func__); }
static int node_check(struct dm_block_validator *v,
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
Some interface files in debugfs support the read method dfs_file_read(), but their rwx permissions is shown as unreadable.
For example:
# ls -l /sys/kernel/debug/ubi/ubi0/ total 0 --w------- 1 root root 0 Oct 22 16:26 chk_fastmap --w------- 1 root root 0 Oct 22 16:26 chk_gen --w------- 1 root root 0 Oct 22 16:26 chk_io -r-------- 1 root root 0 Oct 22 16:26 detailed_erase_block_info --w------- 1 root root 0 Oct 22 16:26 tst_disable_bgt --w------- 1 root root 0 Oct 22 16:26 tst_emulate_bitflips --w------- 1 root root 0 Oct 22 16:26 tst_emulate_io_failures --w------- 1 root root 0 Oct 22 16:26 tst_emulate_power_cut --w------- 1 root root 0 Oct 22 16:26 tst_emulate_power_cut_max --w------- 1 root root 0 Oct 22 16:26 tst_emulate_power_cut_min
It shows that these files do not have read permission, but we can actually read their contents.
# echo 1 > /sys/kernel/debug/ubi/ubi0/chk_io # cat /sys/kernel/debug/ubi/ubi0/chk_io 1
This patch adds the read permission display for file that support the read method.
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/ubi/debug.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c index ac2bdba8bb1a..5a704d598181 100644 --- a/drivers/mtd/ubi/debug.c +++ b/drivers/mtd/ubi/debug.c @@ -504,6 +504,7 @@ int ubi_debugfs_init_dev(struct ubi_device *ubi) { unsigned long ubi_num = ubi->ubi_num; struct ubi_debug_info *d = &ubi->dbg; + umode_t mode = S_IRUSR | S_IWUSR; int n;
if (!IS_ENABLED(CONFIG_DEBUG_FS)) @@ -518,41 +519,41 @@ int ubi_debugfs_init_dev(struct ubi_device *ubi)
d->dfs_dir = debugfs_create_dir(d->dfs_dir_name, dfs_rootdir);
- d->dfs_chk_gen = debugfs_create_file("chk_gen", S_IWUSR, d->dfs_dir, + d->dfs_chk_gen = debugfs_create_file("chk_gen", mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
- d->dfs_chk_io = debugfs_create_file("chk_io", S_IWUSR, d->dfs_dir, + d->dfs_chk_io = debugfs_create_file("chk_io", mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
- d->dfs_chk_fastmap = debugfs_create_file("chk_fastmap", S_IWUSR, + d->dfs_chk_fastmap = debugfs_create_file("chk_fastmap", mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
- d->dfs_disable_bgt = debugfs_create_file("tst_disable_bgt", S_IWUSR, + d->dfs_disable_bgt = debugfs_create_file("tst_disable_bgt", mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
d->dfs_emulate_bitflips = debugfs_create_file("tst_emulate_bitflips", - S_IWUSR, d->dfs_dir, + mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
d->dfs_emulate_io_failures = debugfs_create_file("tst_emulate_io_failures", - S_IWUSR, d->dfs_dir, + mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
d->dfs_emulate_power_cut = debugfs_create_file("tst_emulate_power_cut", - S_IWUSR, d->dfs_dir, + mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
d->dfs_power_cut_min = debugfs_create_file("tst_emulate_power_cut_min", - S_IWUSR, d->dfs_dir, + mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
d->dfs_power_cut_max = debugfs_create_file("tst_emulate_power_cut_max", - S_IWUSR, d->dfs_dir, + mode, d->dfs_dir, (void *)ubi_num, &dfs_fops);
debugfs_create_file("detailed_erase_block_info", S_IRUSR, d->dfs_dir,
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
To make debug parameters configurable at run time, use the fault injection framework to reconstruct the debugfs interface.
Now, the file emulate_failures and fault_attr files control whether to enable fault emmulation.
The file emulate_failures receives a mask that controls type and process of fault injection. Generally, for ease of use, you can directly enter a mask with all 1s.
echo 0xffff > /sys/kernel/debug/ubi/ubi0/emulate_failures
And you need to configure other fault-injection capabilities for testing purpose:
echo 100 > /sys/kernel/debug/ubi/fault_inject/emulate_power_cut/probability echo 15 > /sys/kernel/debug/ubi/fault_inject/emulate_power_cut/space echo 2 > /sys/kernel/debug/ubi/fault_inject/emulate_power_cut/verbose printf %#x 1 > /sys/kernel/debug/ubi/fault_inject/emulate_power_cut/times
The CONFIG_MTD_UBI_FAULT_INJECTION to enable the Fault Injection is added to kconfig.
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/ubi/Kconfig | 7 ++ drivers/mtd/ubi/debug.c | 148 ++++++++++++++++------------------------ drivers/mtd/ubi/debug.h | 91 +++++++++++++++++++----- drivers/mtd/ubi/io.c | 10 ++- drivers/mtd/ubi/ubi.h | 37 ++-------- 5 files changed, 150 insertions(+), 143 deletions(-)
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig index 2ed77b7b3fcb..a739c533b66a 100644 --- a/drivers/mtd/ubi/Kconfig +++ b/drivers/mtd/ubi/Kconfig @@ -103,5 +103,12 @@ config MTD_UBI_BLOCK When selected, this feature will be built in the UBI driver.
If in doubt, say "N". +config MTD_UBI_FAULT_INJECTION + bool "Fault injection capability of UBI device" + default n + depends on FAULT_INJECTION_DEBUG_FS + help + this option enable fault-injection support for UBI devices for + testing purposes.
endif # MTD_UBI diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c index 5a704d598181..24b7775241a8 100644 --- a/drivers/mtd/ubi/debug.c +++ b/drivers/mtd/ubi/debug.c @@ -10,6 +10,23 @@ #include <linux/uaccess.h> #include <linux/module.h> #include <linux/seq_file.h> +#include <linux/fault-inject.h> + +#ifdef CONFIG_MTD_UBI_FAULT_INJECTION +static DECLARE_FAULT_ATTR(fault_bitflips_attr); +static DECLARE_FAULT_ATTR(fault_io_failures_attr); +static DECLARE_FAULT_ATTR(fault_power_cut_attr); + +#define FAIL_ACTION(name, fault_attr) \ +bool should_fail_##name(void) \ +{ \ + return should_fail(&fault_attr, 1); \ +} + +FAIL_ACTION(bitflips, fault_bitflips_attr) +FAIL_ACTION(io_failures, fault_io_failures_attr) +FAIL_ACTION(power_cut, fault_power_cut_attr) +#endif
/** @@ -212,6 +229,32 @@ void ubi_dump_mkvol_req(const struct ubi_mkvol_req *req) */ static struct dentry *dfs_rootdir;
+#ifdef CONFIG_MTD_UBI_FAULT_INJECTION +static void dfs_create_fault_entry(struct dentry *parent) +{ + struct dentry *dir; + + dir = debugfs_create_dir("fault_inject", parent); + + if (IS_ERR_OR_NULL(dir)) { + int err = dir ? PTR_ERR(dir) : -ENODEV; + + pr_warn("UBI error: cannot create "fault_inject" debugfs directory, error %d\n", + err); + return; + } + + fault_create_debugfs_attr("emulate_bitflips", dir, + &fault_bitflips_attr); + + fault_create_debugfs_attr("emulate_io_failures", dir, + &fault_io_failures_attr); + + fault_create_debugfs_attr("emulate_power_cut", dir, + &fault_power_cut_attr); +} +#endif + /** * ubi_debugfs_init - create UBI debugfs directory. * @@ -232,6 +275,10 @@ int ubi_debugfs_init(void) return err; }
+#ifdef CONFIG_MTD_UBI_FAULT_INJECTION + dfs_create_fault_entry(dfs_rootdir); +#endif + return 0; }
@@ -268,27 +315,12 @@ static ssize_t dfs_file_read(struct file *file, char __user *user_buf, val = d->chk_fastmap; else if (dent == d->dfs_disable_bgt) val = d->disable_bgt; - else if (dent == d->dfs_emulate_bitflips) - val = d->emulate_bitflips; - else if (dent == d->dfs_emulate_io_failures) - val = d->emulate_io_failures; - else if (dent == d->dfs_emulate_power_cut) { - snprintf(buf, sizeof(buf), "%u\n", d->emulate_power_cut); + else if (dent == d->dfs_emulate_failures) { + snprintf(buf, sizeof(buf), "%u\n", d->emulate_failures); count = simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf)); goto out; - } else if (dent == d->dfs_power_cut_min) { - snprintf(buf, sizeof(buf), "%u\n", d->power_cut_min); - count = simple_read_from_buffer(user_buf, count, ppos, - buf, strlen(buf)); - goto out; - } else if (dent == d->dfs_power_cut_max) { - snprintf(buf, sizeof(buf), "%u\n", d->power_cut_max); - count = simple_read_from_buffer(user_buf, count, ppos, - buf, strlen(buf)); - goto out; - } - else { + } else { count = -EINVAL; goto out; } @@ -330,20 +362,10 @@ static ssize_t dfs_file_write(struct file *file, const char __user *user_buf, goto out; }
- if (dent == d->dfs_power_cut_min) { - if (kstrtouint(buf, 0, &d->power_cut_min) != 0) - count = -EINVAL; - goto out; - } else if (dent == d->dfs_power_cut_max) { - if (kstrtouint(buf, 0, &d->power_cut_max) != 0) + if (dent == d->dfs_emulate_failures) { + if (kstrtouint(buf, 0, &d->emulate_failures) != 0) count = -EINVAL; goto out; - } else if (dent == d->dfs_emulate_power_cut) { - if (kstrtoint(buf, 0, &val) != 0) - count = -EINVAL; - else - d->emulate_power_cut = val; - goto out; }
if (buf[0] == '1') @@ -363,10 +385,6 @@ static ssize_t dfs_file_write(struct file *file, const char __user *user_buf, d->chk_fastmap = val; else if (dent == d->dfs_disable_bgt) d->disable_bgt = val; - else if (dent == d->dfs_emulate_bitflips) - d->emulate_bitflips = val; - else if (dent == d->dfs_emulate_io_failures) - d->emulate_io_failures = val; else count = -EINVAL;
@@ -386,6 +404,7 @@ static const struct file_operations dfs_fops = { .owner = THIS_MODULE, };
+ /* As long as the position is less then that total number of erase blocks, * we still have more to print. */ @@ -533,32 +552,14 @@ int ubi_debugfs_init_dev(struct ubi_device *ubi) d->dfs_dir, (void *)ubi_num, &dfs_fops);
- d->dfs_emulate_bitflips = debugfs_create_file("tst_emulate_bitflips", - mode, d->dfs_dir, - (void *)ubi_num, - &dfs_fops); - - d->dfs_emulate_io_failures = debugfs_create_file("tst_emulate_io_failures", - mode, d->dfs_dir, - (void *)ubi_num, - &dfs_fops); - - d->dfs_emulate_power_cut = debugfs_create_file("tst_emulate_power_cut", - mode, d->dfs_dir, - (void *)ubi_num, - &dfs_fops); - - d->dfs_power_cut_min = debugfs_create_file("tst_emulate_power_cut_min", - mode, d->dfs_dir, - (void *)ubi_num, &dfs_fops); - - d->dfs_power_cut_max = debugfs_create_file("tst_emulate_power_cut_max", - mode, d->dfs_dir, - (void *)ubi_num, &dfs_fops); - debugfs_create_file("detailed_erase_block_info", S_IRUSR, d->dfs_dir, (void *)ubi_num, &eraseblk_count_fops);
+#ifdef CONFIG_MTD_UBI_FAULT_INJECTION + d->dfs_emulate_failures = debugfs_create_file("emulate_failures", mode, + d->dfs_dir, (void *)ubi_num, + &dfs_fops); +#endif return 0; }
@@ -571,36 +572,3 @@ void ubi_debugfs_exit_dev(struct ubi_device *ubi) if (IS_ENABLED(CONFIG_DEBUG_FS)) debugfs_remove_recursive(ubi->dbg.dfs_dir); } - -/** - * ubi_dbg_power_cut - emulate a power cut if it is time to do so - * @ubi: UBI device description object - * @caller: Flags set to indicate from where the function is being called - * - * Returns non-zero if a power cut was emulated, zero if not. - */ -int ubi_dbg_power_cut(struct ubi_device *ubi, int caller) -{ - unsigned int range; - - if ((ubi->dbg.emulate_power_cut & caller) == 0) - return 0; - - if (ubi->dbg.power_cut_counter == 0) { - ubi->dbg.power_cut_counter = ubi->dbg.power_cut_min; - - if (ubi->dbg.power_cut_max > ubi->dbg.power_cut_min) { - range = ubi->dbg.power_cut_max - ubi->dbg.power_cut_min; - ubi->dbg.power_cut_counter += prandom_u32() % range; - } - return 0; - } - - ubi->dbg.power_cut_counter--; - if (ubi->dbg.power_cut_counter) - return 0; - - ubi_msg(ubi, "XXXXXXXXXXXXXXX emulating a power cut XXXXXXXXXXXXXXXX"); - ubi_ro_mode(ubi); - return 1; -} diff --git a/drivers/mtd/ubi/debug.h b/drivers/mtd/ubi/debug.h index 118248a5d7d4..bb0f9b26b5c9 100644 --- a/drivers/mtd/ubi/debug.h +++ b/drivers/mtd/ubi/debug.h @@ -8,6 +8,18 @@ #ifndef __UBI_DEBUG_H__ #define __UBI_DEBUG_H__
+/** + * MASK_XXX: Mask for emulate_failures in ubi_debug_info.The mask is used to + * precisely control the type and process of fault injection. + */ +/* Emulate bit-flips */ +#define MASK_BITFLIPS (1 << 0) +/* Emulates -EIO during write/erase */ +#define MASK_IO_FAILURE (1 << 1) +/* Emulate a power cut when writing EC/VID header */ +#define MASK_POWER_CUT_EC (1 << 2) +#define MASK_POWER_CUT_VID (1 << 3) + void ubi_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len); void ubi_dump_ec_hdr(const struct ubi_ec_hdr *ec_hdr); void ubi_dump_vid_hdr(const struct ubi_vid_hdr *vid_hdr); @@ -64,47 +76,91 @@ static inline int ubi_dbg_is_bgt_disabled(const struct ubi_device *ubi) return ubi->dbg.disable_bgt; }
+#ifdef CONFIG_MTD_UBI_FAULT_INJECTION + +extern bool should_fail_bitflips(void); +extern bool should_fail_io_failures(void); +extern bool should_fail_power_cut(void); + /** * ubi_dbg_is_bitflip - if it is time to emulate a bit-flip. * @ubi: UBI device description object * - * Returns non-zero if a bit-flip should be emulated, otherwise returns zero. + * Returns true if a bit-flip should be emulated, otherwise returns false. */ -static inline int ubi_dbg_is_bitflip(const struct ubi_device *ubi) +static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) { - if (ubi->dbg.emulate_bitflips) - return !(prandom_u32() % 200); - return 0; + if (ubi->dbg.emulate_failures & MASK_BITFLIPS) + return should_fail_bitflips(); + return false; }
/** * ubi_dbg_is_write_failure - if it is time to emulate a write failure. * @ubi: UBI device description object * - * Returns non-zero if a write failure should be emulated, otherwise returns - * zero. + * Returns true if a write failure should be emulated, otherwise returns + * false. */ -static inline int ubi_dbg_is_write_failure(const struct ubi_device *ubi) +static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) { - if (ubi->dbg.emulate_io_failures) - return !(prandom_u32() % 500); - return 0; + if (ubi->dbg.emulate_failures & MASK_IO_FAILURE) + return should_fail_io_failures(); + return false; }
/** * ubi_dbg_is_erase_failure - if its time to emulate an erase failure. * @ubi: UBI device description object * - * Returns non-zero if an erase failure should be emulated, otherwise returns - * zero. + * Returns true if an erase failure should be emulated, otherwise returns + * false. */ -static inline int ubi_dbg_is_erase_failure(const struct ubi_device *ubi) +static inline bool ubi_dbg_is_erase_failure(const struct ubi_device *ubi) +{ + if (ubi->dbg.emulate_failures & MASK_IO_FAILURE) + return should_fail_io_failures(); + return false; +} + +/** + * ubi_dbg_power_cut - if it is time to emulate power cut. + * @ubi: UBI device description object + * + * Returns true if power cut should be emulated, otherwise returns false. + */ +static inline bool ubi_dbg_power_cut(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_power_cut(); + return false; +} + +#else /* CONFIG_MTD_UBI_FAULT_INJECTION */ + +static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) +{ + return false; +} + +static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) +{ + return false; +} + +static inline bool ubi_dbg_is_erase_failure(const struct ubi_device *ubi) +{ + return false; +} + +static inline bool ubi_dbg_power_cut(const struct ubi_device *ubi, + unsigned int caller) { - if (ubi->dbg.emulate_io_failures) - return !(prandom_u32() % 400); - return 0; + return false; }
+#endif static inline int ubi_dbg_chk_io(const struct ubi_device *ubi) { return ubi->dbg.chk_io; @@ -125,5 +181,4 @@ static inline void ubi_enable_dbg_chk_fastmap(struct ubi_device *ubi) ubi->dbg.chk_fastmap = 1; }
-int ubi_dbg_power_cut(struct ubi_device *ubi, int caller); #endif /* !__UBI_DEBUG_H__ */ diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c index 14d890b00d2c..51bc4a73eaaf 100644 --- a/drivers/mtd/ubi/io.c +++ b/drivers/mtd/ubi/io.c @@ -814,8 +814,11 @@ int ubi_io_write_ec_hdr(struct ubi_device *ubi, int pnum, if (err) return err;
- if (ubi_dbg_power_cut(ubi, POWER_CUT_EC_WRITE)) + if (ubi_dbg_power_cut(ubi, MASK_POWER_CUT_EC)) { + ubi_warn(ubi, "XXXXX emulating a power cut when writing EC header XXXXX"); + ubi_ro_mode(ubi); return -EROFS; + }
err = ubi_io_write(ubi, ec_hdr, pnum, 0, ubi->ec_hdr_alsize); return err; @@ -1069,8 +1072,11 @@ int ubi_io_write_vid_hdr(struct ubi_device *ubi, int pnum, if (err) return err;
- if (ubi_dbg_power_cut(ubi, POWER_CUT_VID_WRITE)) + if (ubi_dbg_power_cut(ubi, MASK_POWER_CUT_VID)) { + ubi_warn(ubi, "XXXXX emulating a power cut when writing VID header XXXXX"); + ubi_ro_mode(ubi); return -EROFS; + }
err = ubi_io_write(ubi, p, pnum, ubi->vid_hdr_aloffset, ubi->vid_hdr_alsize); diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h index 4f84607dafb4..c8df4ad91534 100644 --- a/drivers/mtd/ubi/ubi.h +++ b/drivers/mtd/ubi/ubi.h @@ -142,17 +142,6 @@ enum { UBI_BAD_FASTMAP, };
-/* - * Flags for emulate_power_cut in ubi_debug_info - * - * POWER_CUT_EC_WRITE: Emulate a power cut when writing an EC header - * POWER_CUT_VID_WRITE: Emulate a power cut when writing a VID header - */ -enum { - POWER_CUT_EC_WRITE = 0x01, - POWER_CUT_VID_WRITE = 0x02, -}; - /** * struct ubi_vid_io_buf - VID buffer used to read/write VID info to/from the * flash. @@ -397,46 +386,28 @@ struct ubi_wl_entry; * @chk_io: if UBI I/O extra checks are enabled * @chk_fastmap: if UBI fastmap extra checks are enabled * @disable_bgt: disable the background task for testing purposes - * @emulate_bitflips: emulate bit-flips for testing purposes - * @emulate_io_failures: emulate write/erase failures for testing purposes - * @emulate_power_cut: emulate power cut for testing purposes - * @power_cut_counter: count down for writes left until emulated power cut - * @power_cut_min: minimum number of writes before emulating a power cut - * @power_cut_max: maximum number of writes until emulating a power cut + * @emulate_failures: emulate failures for testing purposes * @dfs_dir_name: name of debugfs directory containing files of this UBI device * @dfs_dir: direntry object of the UBI device debugfs directory * @dfs_chk_gen: debugfs knob to enable UBI general extra checks * @dfs_chk_io: debugfs knob to enable UBI I/O extra checks * @dfs_chk_fastmap: debugfs knob to enable UBI fastmap extra checks * @dfs_disable_bgt: debugfs knob to disable the background task - * @dfs_emulate_bitflips: debugfs knob to emulate bit-flips - * @dfs_emulate_io_failures: debugfs knob to emulate write/erase failures - * @dfs_emulate_power_cut: debugfs knob to emulate power cuts - * @dfs_power_cut_min: debugfs knob for minimum writes before power cut - * @dfs_power_cut_max: debugfs knob for maximum writes until power cut + * @dfs_emulate_failures: debugfs entry to control the fault injection type */ struct ubi_debug_info { unsigned int chk_gen:1; unsigned int chk_io:1; unsigned int chk_fastmap:1; unsigned int disable_bgt:1; - unsigned int emulate_bitflips:1; - unsigned int emulate_io_failures:1; - unsigned int emulate_power_cut:2; - unsigned int power_cut_counter; - unsigned int power_cut_min; - unsigned int power_cut_max; + unsigned int emulate_failures; char dfs_dir_name[UBI_DFS_DIR_LEN + 1]; struct dentry *dfs_dir; struct dentry *dfs_chk_gen; struct dentry *dfs_chk_io; struct dentry *dfs_chk_fastmap; struct dentry *dfs_disable_bgt; - struct dentry *dfs_emulate_bitflips; - struct dentry *dfs_emulate_io_failures; - struct dentry *dfs_emulate_power_cut; - struct dentry *dfs_power_cut_min; - struct dentry *dfs_power_cut_max; + struct dentry *dfs_emulate_failures; };
/**
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
The emulate_io_failures debugfs entry controls both write failure and erase failure. This patch split io_failures to write_failure and erase_failure.
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/ubi/debug.c | 14 +++++++++----- drivers/mtd/ubi/debug.h | 18 ++++++++++-------- 2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c index 24b7775241a8..f88dce06b5d1 100644 --- a/drivers/mtd/ubi/debug.c +++ b/drivers/mtd/ubi/debug.c @@ -14,7 +14,8 @@
#ifdef CONFIG_MTD_UBI_FAULT_INJECTION static DECLARE_FAULT_ATTR(fault_bitflips_attr); -static DECLARE_FAULT_ATTR(fault_io_failures_attr); +static DECLARE_FAULT_ATTR(fault_write_failure_attr); +static DECLARE_FAULT_ATTR(fault_erase_failure_attr); static DECLARE_FAULT_ATTR(fault_power_cut_attr);
#define FAIL_ACTION(name, fault_attr) \ @@ -24,7 +25,8 @@ bool should_fail_##name(void) \ }
FAIL_ACTION(bitflips, fault_bitflips_attr) -FAIL_ACTION(io_failures, fault_io_failures_attr) +FAIL_ACTION(write_failure, fault_write_failure_attr) +FAIL_ACTION(erase_failure, fault_erase_failure_attr) FAIL_ACTION(power_cut, fault_power_cut_attr) #endif
@@ -247,8 +249,11 @@ static void dfs_create_fault_entry(struct dentry *parent) fault_create_debugfs_attr("emulate_bitflips", dir, &fault_bitflips_attr);
- fault_create_debugfs_attr("emulate_io_failures", dir, - &fault_io_failures_attr); + fault_create_debugfs_attr("emulate_write_failure", dir, + &fault_write_failure_attr); + + fault_create_debugfs_attr("emulate_erase_failure", dir, + &fault_erase_failure_attr);
fault_create_debugfs_attr("emulate_power_cut", dir, &fault_power_cut_attr); @@ -278,7 +283,6 @@ int ubi_debugfs_init(void) #ifdef CONFIG_MTD_UBI_FAULT_INJECTION dfs_create_fault_entry(dfs_rootdir); #endif - return 0; }
diff --git a/drivers/mtd/ubi/debug.h b/drivers/mtd/ubi/debug.h index bb0f9b26b5c9..0623f419899d 100644 --- a/drivers/mtd/ubi/debug.h +++ b/drivers/mtd/ubi/debug.h @@ -15,10 +15,11 @@ /* Emulate bit-flips */ #define MASK_BITFLIPS (1 << 0) /* Emulates -EIO during write/erase */ -#define MASK_IO_FAILURE (1 << 1) +#define MASK_WRITE_FAILURE (1 << 1) +#define MASK_ERASE_FAILURE (1 << 2) /* Emulate a power cut when writing EC/VID header */ -#define MASK_POWER_CUT_EC (1 << 2) -#define MASK_POWER_CUT_VID (1 << 3) +#define MASK_POWER_CUT_EC (1 << 3) +#define MASK_POWER_CUT_VID (1 << 4)
void ubi_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len); void ubi_dump_ec_hdr(const struct ubi_ec_hdr *ec_hdr); @@ -79,7 +80,8 @@ static inline int ubi_dbg_is_bgt_disabled(const struct ubi_device *ubi) #ifdef CONFIG_MTD_UBI_FAULT_INJECTION
extern bool should_fail_bitflips(void); -extern bool should_fail_io_failures(void); +extern bool should_fail_write_failure(void); +extern bool should_fail_erase_failure(void); extern bool should_fail_power_cut(void);
/** @@ -104,8 +106,8 @@ static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) */ static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) { - if (ubi->dbg.emulate_failures & MASK_IO_FAILURE) - return should_fail_io_failures(); + if (ubi->dbg.emulate_failures & MASK_WRITE_FAILURE) + return should_fail_write_failure(); return false; }
@@ -118,8 +120,8 @@ static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) */ static inline bool ubi_dbg_is_erase_failure(const struct ubi_device *ubi) { - if (ubi->dbg.emulate_failures & MASK_IO_FAILURE) - return should_fail_io_failures(); + if (ubi->dbg.emulate_failures & MASK_ERASE_FAILURE) + return should_fail_erase_failure(); return false; }
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
This commit adds six fault injection type for testing to cover the abnormal path of the UBI driver.
Inject the following faults when the UBI reads the LEB: +----------------------------+-----------------------------------+ | Interface name | emulate behavior | +----------------------------+-----------------------------------+ | emulate_eccerr | ECC error | +----------------------------+-----------------------------------+ | emulate_read_failure | read failure | |----------------------------+-----------------------------------+ | emulate_io_ff | read content as all FF | |----------------------------+-----------------------------------+ | emulate_io_ff_bitflips | content FF with MTD err reported | +----------------------------+-----------------------------------+ | emulate_bad_hdr | bad leb header | |----------------------------+-----------------------------------+ | emulate_bad_hdr_ebadmsg | bad header with ECC err | +----------------------------+-----------------------------------+
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/ubi/debug.c | 30 ++++++++ drivers/mtd/ubi/debug.h | 166 ++++++++++++++++++++++++++++++++++++++-- drivers/mtd/ubi/io.c | 74 +++++++++++++++++- drivers/mtd/ubi/ubi.h | 31 +++++--- 4 files changed, 281 insertions(+), 20 deletions(-)
diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c index f88dce06b5d1..7c5c962142c0 100644 --- a/drivers/mtd/ubi/debug.c +++ b/drivers/mtd/ubi/debug.c @@ -13,10 +13,16 @@ #include <linux/fault-inject.h>
#ifdef CONFIG_MTD_UBI_FAULT_INJECTION +static DECLARE_FAULT_ATTR(fault_eccerr_attr); static DECLARE_FAULT_ATTR(fault_bitflips_attr); +static DECLARE_FAULT_ATTR(fault_read_failure_attr); static DECLARE_FAULT_ATTR(fault_write_failure_attr); static DECLARE_FAULT_ATTR(fault_erase_failure_attr); static DECLARE_FAULT_ATTR(fault_power_cut_attr); +static DECLARE_FAULT_ATTR(fault_io_ff_attr); +static DECLARE_FAULT_ATTR(fault_io_ff_bitflips_attr); +static DECLARE_FAULT_ATTR(fault_bad_hdr_attr); +static DECLARE_FAULT_ATTR(fault_bad_hdr_ebadmsg_attr);
#define FAIL_ACTION(name, fault_attr) \ bool should_fail_##name(void) \ @@ -24,10 +30,16 @@ bool should_fail_##name(void) \ return should_fail(&fault_attr, 1); \ }
+FAIL_ACTION(eccerr, fault_eccerr_attr) FAIL_ACTION(bitflips, fault_bitflips_attr) +FAIL_ACTION(read_failure, fault_read_failure_attr) FAIL_ACTION(write_failure, fault_write_failure_attr) FAIL_ACTION(erase_failure, fault_erase_failure_attr) FAIL_ACTION(power_cut, fault_power_cut_attr) +FAIL_ACTION(io_ff, fault_io_ff_attr) +FAIL_ACTION(io_ff_bitflips, fault_io_ff_bitflips_attr) +FAIL_ACTION(bad_hdr, fault_bad_hdr_attr) +FAIL_ACTION(bad_hdr_ebadmsg, fault_bad_hdr_ebadmsg_attr) #endif
@@ -246,9 +258,15 @@ static void dfs_create_fault_entry(struct dentry *parent) return; }
+ fault_create_debugfs_attr("emulate_eccerr", dir, + &fault_eccerr_attr); + fault_create_debugfs_attr("emulate_bitflips", dir, &fault_bitflips_attr);
+ fault_create_debugfs_attr("emulate_read_failure", dir, + &fault_read_failure_attr); + fault_create_debugfs_attr("emulate_write_failure", dir, &fault_write_failure_attr);
@@ -257,6 +275,18 @@ static void dfs_create_fault_entry(struct dentry *parent)
fault_create_debugfs_attr("emulate_power_cut", dir, &fault_power_cut_attr); + + fault_create_debugfs_attr("emulate_io_ff", dir, + &fault_io_ff_attr); + + fault_create_debugfs_attr("emulate_io_ff_bitflips", dir, + &fault_io_ff_bitflips_attr); + + fault_create_debugfs_attr("emulate_bad_hdr", dir, + &fault_bad_hdr_attr); + + fault_create_debugfs_attr("emulate_bad_hdr_ebadmsg", dir, + &fault_bad_hdr_ebadmsg_attr); } #endif
diff --git a/drivers/mtd/ubi/debug.h b/drivers/mtd/ubi/debug.h index 0623f419899d..5434b5f599d3 100644 --- a/drivers/mtd/ubi/debug.h +++ b/drivers/mtd/ubi/debug.h @@ -13,13 +13,34 @@ * precisely control the type and process of fault injection. */ /* Emulate bit-flips */ -#define MASK_BITFLIPS (1 << 0) -/* Emulates -EIO during write/erase */ -#define MASK_WRITE_FAILURE (1 << 1) -#define MASK_ERASE_FAILURE (1 << 2) +#define MASK_BITFLIPS (1 << 0) +/* Emulate ecc error */ +#define MASK_ECCERR (1 << 1) +/* Emulates -EIO during data read */ +#define MASK_READ_FAILURE (1 << 2) +#define MASK_READ_FAILURE_EC (1 << 3) +#define MASK_READ_FAILURE_VID (1 << 4) +/* Emulates -EIO during data write */ +#define MASK_WRITE_FAILURE (1 << 5) +/* Emulates -EIO during erase a PEB*/ +#define MASK_ERASE_FAILURE (1 << 6) /* Emulate a power cut when writing EC/VID header */ -#define MASK_POWER_CUT_EC (1 << 3) -#define MASK_POWER_CUT_VID (1 << 4) +#define MASK_POWER_CUT_EC (1 << 7) +#define MASK_POWER_CUT_VID (1 << 8) +/* Emulate a power cut when writing data*/ +#define MASK_POWER_CUT_DATA (1 << 9) +/* Return UBI_IO_FF when reading EC/VID header */ +#define MASK_IO_FF_EC (1 << 10) +#define MASK_IO_FF_VID (1 << 11) +/* Return UBI_IO_FF_BITFLIPS when reading EC/VID header */ +#define MASK_IO_FF_BITFLIPS_EC (1 << 12) +#define MASK_IO_FF_BITFLIPS_VID (1 << 13) +/* Return UBI_IO_BAD_HDR when reading EC/VID header */ +#define MASK_BAD_HDR_EC (1 << 14) +#define MASK_BAD_HDR_VID (1 << 15) +/* Return UBI_IO_BAD_HDR_EBADMSG when reading EC/VID header */ +#define MASK_BAD_HDR_EBADMSG_EC (1 << 16) +#define MASK_BAD_HDR_EBADMSG_VID (1 << 17)
void ubi_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len); void ubi_dump_ec_hdr(const struct ubi_ec_hdr *ec_hdr); @@ -79,10 +100,16 @@ static inline int ubi_dbg_is_bgt_disabled(const struct ubi_device *ubi)
#ifdef CONFIG_MTD_UBI_FAULT_INJECTION
+extern bool should_fail_eccerr(void); extern bool should_fail_bitflips(void); +extern bool should_fail_read_failure(void); extern bool should_fail_write_failure(void); extern bool should_fail_erase_failure(void); extern bool should_fail_power_cut(void); +extern bool should_fail_io_ff(void); +extern bool should_fail_io_ff_bitflips(void); +extern bool should_fail_bad_hdr(void); +extern bool should_fail_bad_hdr_ebadmsg(void);
/** * ubi_dbg_is_bitflip - if it is time to emulate a bit-flip. @@ -97,6 +124,34 @@ static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) return false; }
+/** + * ubi_dbg_is_eccerr - if it is time to emulate ECC error. + * @ubi: UBI device description object + * + * Returns true if a ECC error should be emulated, otherwise returns false. + */ +static inline bool ubi_dbg_is_eccerr(const struct ubi_device *ubi) +{ + if (ubi->dbg.emulate_failures & MASK_ECCERR) + return should_fail_eccerr(); + return false; +} + +/** + * ubi_dbg_is_read_failure - if it is time to emulate a read failure. + * @ubi: UBI device description object + * + * Returns true if a read failure should be emulated, otherwise returns + * false. + */ +static inline bool ubi_dbg_is_read_failure(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_read_failure(); + return false; +} + /** * ubi_dbg_is_write_failure - if it is time to emulate a write failure. * @ubi: UBI device description object @@ -139,6 +194,70 @@ static inline bool ubi_dbg_power_cut(const struct ubi_device *ubi, return false; }
+/** + * ubi_dbg_is_ff - if it is time to emulate that read region is only 0xFF. + * @ubi: UBI device description object + * + * Returns true if read region should be emulated 0xFF, otherwise + * returns false. + */ +static inline bool ubi_dbg_is_ff(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_io_ff(); + return false; +} + +/** + * ubi_dbg_is_ff_bitflips - if it is time to emulate that read region is only 0xFF + * with error reported by the MTD driver + * + * @ubi: UBI device description object + * + * Returns true if read region should be emulated 0xFF and error + * reported by the MTD driver, otherwise returns false. + */ +static inline bool ubi_dbg_is_ff_bitflips(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_io_ff_bitflips(); + return false; +} + +/** + * ubi_dbg_is_bad_hdr - if it is time to emulate a bad header + * @ubi: UBI device description object + * + * Returns true if a bad header error should be emulated, otherwise + * returns false. + */ +static inline bool ubi_dbg_is_bad_hdr(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_bad_hdr(); + return false; +} + +/** + * ubi_dbg_is_bad_hdr_ebadmsg - if it is time to emulate a bad header with + * ECC error. + * + * @ubi: UBI device description object + * + * Returns true if a bad header with ECC error should be emulated, otherwise + * returns false. + */ +static inline bool ubi_dbg_is_bad_hdr_ebadmsg(const struct ubi_device *ubi, + unsigned int caller) +{ + if (ubi->dbg.emulate_failures & caller) + return should_fail_bad_hdr_ebadmsg(); + return false; +} + #else /* CONFIG_MTD_UBI_FAULT_INJECTION */
static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) @@ -146,6 +265,17 @@ static inline bool ubi_dbg_is_bitflip(const struct ubi_device *ubi) return false; }
+static inline bool ubi_dbg_is_eccerr(const struct ubi_device *ubi) +{ + return false; +} + +static inline bool ubi_dbg_is_read_failure(const struct ubi_device *ubi, + unsigned int caller) +{ + return false; +} + static inline bool ubi_dbg_is_write_failure(const struct ubi_device *ubi) { return false; @@ -162,6 +292,30 @@ static inline bool ubi_dbg_power_cut(const struct ubi_device *ubi, return false; }
+static inline bool ubi_dbg_is_ff(const struct ubi_device *ubi, + unsigned int caller) +{ + return false; +} + +static inline bool ubi_dbg_is_ff_bitflips(const struct ubi_device *ubi, + unsigned int caller) +{ + return false; +} + +static inline bool ubi_dbg_is_bad_hdr(const struct ubi_device *ubi, + unsigned int caller) +{ + return false; +} + +static inline bool ubi_dbg_is_bad_hdr_ebadmsg(const struct ubi_device *ubi, + unsigned int caller) +{ + return false; +} + #endif static inline int ubi_dbg_chk_io(const struct ubi_device *ubi) { diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c index 51bc4a73eaaf..a10b0e0c77e3 100644 --- a/drivers/mtd/ubi/io.c +++ b/drivers/mtd/ubi/io.c @@ -193,6 +193,18 @@ int ubi_io_read(const struct ubi_device *ubi, void *buf, int pnum, int offset, } else { ubi_assert(len == read);
+ if (ubi_dbg_is_read_failure(ubi, MASK_READ_FAILURE)) { + ubi_warn(ubi, "cannot read %d bytes from PEB %d:%d (emulated)", + len, pnum, offset); + return -EIO; + } + + if (ubi_dbg_is_eccerr(ubi)) { + ubi_warn(ubi, "ECC error (emulated) while reading %d bytes from PEB %d:%d, read %zd bytes", + len, pnum, offset, read); + return -EBADMSG; + } + if (ubi_dbg_is_bitflip(ubi)) { dbg_gen("bit-flip (emulated)"); err = UBI_IO_BITFLIPS; @@ -775,7 +787,36 @@ int ubi_io_read_ec_hdr(struct ubi_device *ubi, int pnum, * If there was %-EBADMSG, but the header CRC is still OK, report about * a bit-flip to force scrubbing on this PEB. */ - return read_err ? UBI_IO_BITFLIPS : 0; + if (read_err) + return UBI_IO_BITFLIPS; + + if (ubi_dbg_is_read_failure(ubi, MASK_READ_FAILURE_EC)) { + ubi_warn(ubi, "cannot read EC header from PEB %d(emulated)", + pnum); + return -EIO; + } + + if (ubi_dbg_is_ff(ubi, MASK_IO_FF_EC)) { + ubi_warn(ubi, "bit-all-ff (emulated)"); + return UBI_IO_FF; + } + + if (ubi_dbg_is_ff_bitflips(ubi, MASK_IO_FF_BITFLIPS_EC)) { + ubi_warn(ubi, "bit-all-ff with error reported by MTD driver (emulated)"); + return UBI_IO_FF_BITFLIPS; + } + + if (ubi_dbg_is_bad_hdr(ubi, MASK_BAD_HDR_EC)) { + ubi_warn(ubi, "bad_hdr (emulated)"); + return UBI_IO_BAD_HDR; + } + + if (ubi_dbg_is_bad_hdr_ebadmsg(ubi, MASK_BAD_HDR_EBADMSG_EC)) { + ubi_warn(ubi, "bad_hdr with ECC error (emulated)"); + return UBI_IO_BAD_HDR_EBADMSG; + } + + return 0; }
/** @@ -1030,7 +1071,36 @@ int ubi_io_read_vid_hdr(struct ubi_device *ubi, int pnum, return -EINVAL; }
- return read_err ? UBI_IO_BITFLIPS : 0; + if (read_err) + return UBI_IO_BITFLIPS; + + if (ubi_dbg_is_read_failure(ubi, MASK_READ_FAILURE_VID)) { + ubi_warn(ubi, "cannot read VID header from PEB %d(emulated)", + pnum); + return -EIO; + } + + if (ubi_dbg_is_ff(ubi, MASK_IO_FF_VID)) { + ubi_warn(ubi, "bit-all-ff (emulated)\n"); + return UBI_IO_FF; + } + + if (ubi_dbg_is_ff_bitflips(ubi, MASK_IO_FF_BITFLIPS_VID)) { + ubi_warn(ubi, "bit-all-ff with error reported by MTD driver (emulated)\n"); + return UBI_IO_FF_BITFLIPS; + } + + if (ubi_dbg_is_bad_hdr(ubi, MASK_BAD_HDR_VID)) { + ubi_warn(ubi, "bad_hdr (emulated)\n"); + return UBI_IO_BAD_HDR; + } + + if (ubi_dbg_is_bad_hdr_ebadmsg(ubi, MASK_BAD_HDR_EBADMSG_VID)) { + ubi_warn(ubi, "bad_hdr with ECC error (emulated)\n"); + return UBI_IO_BAD_HDR_EBADMSG; + } + + return 0; }
/** diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h index c8df4ad91534..2a576891fe73 100644 --- a/drivers/mtd/ubi/ubi.h +++ b/drivers/mtd/ubi/ubi.h @@ -1109,18 +1109,6 @@ static inline int ubi_io_read_data(const struct ubi_device *ubi, void *buf, return ubi_io_read(ubi, buf, pnum, offset + ubi->leb_start, len); }
-/* - * This function is equivalent to 'ubi_io_write()', but @offset is relative to - * the beginning of the logical eraseblock, not to the beginning of the - * physical eraseblock. - */ -static inline int ubi_io_write_data(struct ubi_device *ubi, const void *buf, - int pnum, int offset, int len) -{ - ubi_assert(offset >= 0); - return ubi_io_write(ubi, buf, pnum, offset + ubi->leb_start, len); -} - /** * ubi_ro_mode - switch to read-only mode. * @ubi: UBI device description object @@ -1134,6 +1122,25 @@ static inline void ubi_ro_mode(struct ubi_device *ubi) } }
+/* + * This function is equivalent to 'ubi_io_write()', but @offset is relative to + * the beginning of the logical eraseblock, not to the beginning of the + * physical eraseblock. + */ +static inline int ubi_io_write_data(struct ubi_device *ubi, const void *buf, + int pnum, int offset, int len) +{ + ubi_assert(offset >= 0); + + if (ubi_dbg_power_cut(ubi, MASK_POWER_CUT_DATA)) { + ubi_warn(ubi, "XXXXX emulating a power cut when writing data XXXXX"); + ubi_ro_mode(ubi); + return -EROFS; + } + + return ubi_io_write(ubi, buf, pnum, offset + ubi->leb_start, len); +} + /** * vol_id2idx - get table index by volume ID. * @ubi: UBI device description object
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
Because the mask received by the emulate_failures interface is a 32-bit unsigned integer, ensure that there is sufficient buffer length to receive and display this value.
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/ubi/debug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c index 7c5c962142c0..c24fd1dae458 100644 --- a/drivers/mtd/ubi/debug.c +++ b/drivers/mtd/ubi/debug.c @@ -333,7 +333,7 @@ static ssize_t dfs_file_read(struct file *file, char __user *user_buf, struct dentry *dent = file->f_path.dentry; struct ubi_device *ubi; struct ubi_debug_info *d; - char buf[8]; + char buf[12]; int val;
ubi = ubi_get_device(ubi_num); @@ -382,7 +382,7 @@ static ssize_t dfs_file_write(struct file *file, const char __user *user_buf, struct ubi_device *ubi; struct ubi_debug_info *d; size_t buf_size; - char buf[8] = {0}; + char buf[14] = {0}; int val;
ubi = ubi_get_device(ubi_num);
From: ZhaoLong Wang wangzhaolong1@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I60L5U CVE: NA
-----------------------------------------
add mtd_read() mtd_write() mtd_erase() to fail_function list for testing purpose
Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/mtd/mtdcore.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c index f71808d5abc3..60ba69e11416 100644 --- a/drivers/mtd/mtdcore.c +++ b/drivers/mtd/mtdcore.c @@ -28,6 +28,7 @@ #include <linux/leds.h> #include <linux/debugfs.h> #include <linux/nvmem-provider.h> +#include <linux/error-injection.h>
#include <linux/mtd/mtd.h> #include <linux/mtd/partitions.h> @@ -1160,6 +1161,7 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info *instr) return ret; } EXPORT_SYMBOL_GPL(mtd_erase); +ALLOW_ERROR_INJECTION(mtd_erase, ERRNO);
/* * This stuff for eXecute-In-Place. phys is optional and may be set to NULL. @@ -1257,6 +1259,7 @@ int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen, return ret; } EXPORT_SYMBOL_GPL(mtd_read); +ALLOW_ERROR_INJECTION(mtd_read, ERRNO);
int mtd_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen, const u_char *buf) @@ -1273,6 +1276,7 @@ int mtd_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen, return ret; } EXPORT_SYMBOL_GPL(mtd_write); +ALLOW_ERROR_INJECTION(mtd_write, ERRNO);
/* * In blackbox flight recorder like scenarios we want to make successful writes