Backport bugfix patches for mm/fs from upstream.
Ard Biesheuvel (1):
  arm64: mm: account for hotplug memory when randomizing the linear region

Dan Carpenter (1):
  crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()

Florian Fainelli (1):
  ARM: Qualify enabling of swiotlb_init()

Guo Xuenan (3):
  Revert "compiler: remove CONFIG_OPTIMIZE_INLINING entirely"
  disable OPTIMIZE_INLINING by default
  make OPTIMIZE_INLINING config editable

Johannes Weiner (1):
  mm: vmscan: fix missing psi annotation for node_reclaim()

John Garry (1):
  blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling

Lu Baolu (1):
  iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries

Piotr Krysiuk (1):
  bpf, mips: Validate conditional branch offsets

Rafael Aquini (1):
  ipc: replace costly bailout check in sysvipc_find_ipc()

Sanjay Kumar (1):
  iommu/vt-d: Global devTLB flush when present context entry changed

Vlastimil Babka (1):
  mm: slub: fix slub_debug disabling for list of slabs

Xu Kuohai (1):
  bpf: Fix integer overflow in prealloc_elems_and_freelist()

yangerkun (1):
  ext4: flush s_error_work before journal destroy in ext4_fill_super
 arch/arm/mm/init.c                           |  6 +-
 arch/arm64/kvm/sys_regs.h                    |  5 ++
 arch/arm64/mm/init.c                         | 13 +++--
 arch/mips/net/bpf_jit.c                      | 57 ++++++++++++++-----
 arch/x86/configs/i386_defconfig              |  1 +
 arch/x86/configs/x86_64_defconfig            |  1 +
 block/blk-mq-sched.c                         | 19 ++-----
 drivers/crypto/ccp/ccp-ops.c                 | 14 +++--
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c   |  5 ++
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h   |  5 +-
 drivers/iommu/intel/iommu.c                  | 34 +++++++----
 .../pci/hive_isp_css_include/print_support.h |  4 ++
 fs/ext4/super.c                              |  5 +-
 include/linux/compiler_types.h               |  8 +++
 ipc/util.c                                   | 16 ++----
 kernel/bpf/stackmap.c                        |  3 +-
 kernel/configs/tiny.config                   |  1 +
 lib/Kconfig.debug                            | 13 +++++
 mm/slub.c                                    | 13 +++--
 mm/vmscan.c                                  |  3 +
 20 files changed, 154 insertions(+), 72 deletions(-)
From: Guo Xuenan <guoxuenan@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 182617 https://gitee.com/openeuler/kernel/issues/I4DDEL
--------------------------------
This reverts commit 889b3c1245de48ed0cacf7aebb25c489d3e4a3e9.
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 arch/x86/configs/i386_defconfig   |  1 +
 arch/x86/configs/x86_64_defconfig |  1 +
 include/linux/compiler_types.h    |  8 ++++++++
 kernel/configs/tiny.config        |  1 +
 lib/Kconfig.debug                 | 12 ++++++++++++
 5 files changed, 23 insertions(+)
diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 78210793d357..ce8a6a514afd 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -264,3 +264,4 @@ CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_BOOT_PARAMS=y
+CONFIG_OPTIMIZE_INLINING=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 9936528e1939..8ee371591a99 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -260,3 +260,4 @@ CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_BOOT_PARAMS=y
+CONFIG_OPTIMIZE_INLINING=y
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 2a1c202baa1f..273609a314c7 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -135,13 +135,21 @@ struct ftrace_likely_data {
 #define __compiler_offsetof(a, b)	__builtin_offsetof(a, b)

 /*
+ * Force always-inline if the user requests it so via the .config.
  * Prefer gnu_inline, so that extern inline functions do not emit an
  * externally visible function. This makes extern inline behave as per gnu89
  * semantics rather than c99. This prevents multiple symbol definition errors
  * of extern inline functions at link time.
  * A lot of inline functions can cause havoc with function tracing.
+ * Do not use __always_inline here, since currently it expands to inline again
+ * (which would break users of __always_inline).
  */
+#if !defined(CONFIG_OPTIMIZE_INLINING)
+#define inline inline __attribute__((__always_inline__)) __gnu_inline \
+	__inline_maybe_unused notrace
+#else
 #define inline inline __gnu_inline __inline_maybe_unused notrace
+#endif

 /*
  * gcc provides both __inline__ and __inline as alternate spellings of
diff --git a/kernel/configs/tiny.config b/kernel/configs/tiny.config
index 8a44b93da0f3..7fa0c4ae6394 100644
--- a/kernel/configs/tiny.config
+++ b/kernel/configs/tiny.config
@@ -6,6 +6,7 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
 CONFIG_KERNEL_XZ=y
 # CONFIG_KERNEL_LZO is not set
 # CONFIG_KERNEL_LZ4 is not set
+CONFIG_OPTIMIZE_INLINING=y
 # CONFIG_SLAB is not set
 # CONFIG_SLUB is not set
 CONFIG_SLOB=y
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index bc98ac1cbb0f..ebd46c8d5fb8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -333,6 +333,18 @@ config HEADERS_INSTALL
 	  user-space program samples. It is also needed by some features
 	  such as uapi header sanity checks.

+config OPTIMIZE_INLINING
+	def_bool y
+	help
+	  This option determines if the kernel forces gcc to inline the functions
+	  developers have marked 'inline'. Doing so takes away freedom from gcc to
+	  do what it thinks is best, which is desirable for the gcc 3.x series of
+	  compilers. The gcc 4.x series have a rewritten inlining algorithm and
+	  enabling this option will generate a smaller kernel there. Hopefully
+	  this algorithm is so good that allowing gcc 4.x and above to make the
+	  decision will become the default in the future. Until then this option
+	  is there to test gcc for this.
+
 config DEBUG_SECTION_MISMATCH
 	bool "Enable full Section mismatch analysis"
 	help
From: Guo Xuenan <guoxuenan@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 182617 https://gitee.com/openeuler/kernel/issues/I4DDEL
--------------------------------
For performance reasons, hulk 5.10 does not use OPTIMIZE_INLINING. With it enabled, the gnu_inline attribute leaves functions marked 'inline' not actually inlined, which introduces performance issues, so we disable it by default and adapt the few functions that would otherwise conflict at link time, using the pattern sketched below.
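The adaptation pattern, illustrated here with a hypothetical helper (the real instances are in the diffs below): keep the 'inline' hint when OPTIMIZE_INLINING is enabled, and compile the function as a plain one when it is disabled, where 'inline' expands to __always_inline and cannot be honored for these particular functions:

  #if defined(CONFIG_OPTIMIZE_INLINING)
  static inline unsigned int example_byte_sum(const unsigned char *buf, int len)
  #else
  static unsigned int example_byte_sum(const unsigned char *buf, int len)
  #endif
  {
  	unsigned int sum = 0;

  	/* trivial body, standing in for __calc_recs_byte_sum() and friends */
  	while (len-- > 0)
  		sum += *buf++;
  	return sum;
  }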
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 arch/arm64/kvm/sys_regs.h                                  | 5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c             | 5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h             | 5 ++++-
 .../media/atomisp/pci/hive_isp_css_include/print_support.h | 4 ++++
 lib/Kconfig.debug                                          | 2 +-
 5 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 0f95964339b1..5f3d92b2568d 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -63,8 +63,13 @@ struct sys_reg_desc {
 #define REG_RAZ		(1 << 1) /* RAZ from userspace and guest */

 static __printf(2, 3)
+#if defined(CONFIG_OPTIMIZE_INLINING)
 inline void print_sys_reg_msg(const struct sys_reg_params *p,
			      char *fmt, ...)
+#else
+void print_sys_reg_msg(const struct sys_reg_params *p,
+		       char *fmt, ...)
+#endif
 {
 	va_list va;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0e64c39a2372..a0b2e4f2d43f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -168,8 +168,13 @@ static uint32_t __calc_hdr_byte_sum(struct amdgpu_ras_eeprom_control *control)
 	return tbl_sum;
 }

+#if defined(CONFIG_OPTIMIZE_INLINING)
+static inline uint32_t __calc_recs_byte_sum(struct eeprom_table_record *records,
+					    int num)
+#else
 static uint32_t __calc_recs_byte_sum(struct eeprom_table_record *records,
				     int num)
+#endif
 {
 	int i, j;
 	uint32_t tbl_sum = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
index c7a5e5c7c61e..d02019a2cbbb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
@@ -88,8 +88,11 @@ int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
				     struct eeprom_table_record *records,
				     bool write,
				     int num);
-
+#if defined(CONFIG_OPTIMIZE_INLINING)
 inline uint32_t amdgpu_ras_eeprom_get_record_max_length(void);
+#else
+uint32_t amdgpu_ras_eeprom_get_record_max_length(void);
+#endif

 void amdgpu_ras_eeprom_test(struct amdgpu_ras_eeprom_control *control);
diff --git a/drivers/staging/media/atomisp/pci/hive_isp_css_include/print_support.h b/drivers/staging/media/atomisp/pci/hive_isp_css_include/print_support.h
index 540b405cc0f7..6518eee7af83 100644
--- a/drivers/staging/media/atomisp/pci/hive_isp_css_include/print_support.h
+++ b/drivers/staging/media/atomisp/pci/hive_isp_css_include/print_support.h
@@ -20,7 +20,11 @@

 extern int (*sh_css_printf)(const char *fmt, va_list args);
 /* depends on host supplied print function in ia_css_init() */
+#if defined(CONFIG_OPTIMIZE_INLINING)
 static inline __printf(1, 2) void ia_css_print(const char *fmt, ...)
+#else
+static __printf(1, 2) void ia_css_print(const char *fmt, ...)
+#endif
 {
 	va_list ap;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ebd46c8d5fb8..a1a835c5d6cc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -334,7 +334,7 @@ config HEADERS_INSTALL
	  as uapi header sanity checks.

 config OPTIMIZE_INLINING
-	def_bool y
+	def_bool n
	help
	  This option determines if the kernel forces gcc to inline the functions
	  developers have marked 'inline'. Doing so takes away freedom from gcc to
From: John Garry <john.garry@huawei.com>

mainline inclusion
from mainline-v5.14-rc1
commit b93af3055d6f32d3b0361cfdb110c9399c1241ba
category: bugfix
bugzilla: 177012 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
If the blk_mq_sched_alloc_tags() -> blk_mq_alloc_rqs() call fails, then we call blk_mq_sched_free_tags() -> blk_mq_free_rqs().
It is incorrect to do so, as any rqs would have already been freed in the blk_mq_alloc_rqs() call.
Fix by calling blk_mq_free_rq_map() directly instead; the call chain below spells out the double free.
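The failing path, as a sketch of the call chain described above:

  blk_mq_sched_alloc_tags()
      blk_mq_alloc_rqs()          <- fails; frees any rqs it allocated itself
      blk_mq_sched_free_tags()
          blk_mq_free_rqs()       <- frees the same rqs a second time
          blk_mq_free_rq_map()

With the fix, the error path calls blk_mq_free_rq_map() alone, releasing the tag map without touching the already-freed rqs.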
Fixes: 6917ff0b5bd41 ("blk-mq-sched: refactor scheduler initialization")
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1627378373-148090-1-git-send-email-john.garry@huaw...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 block/blk-mq-sched.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 581be65a53c1..d495c5fe5e6d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -505,19 +505,6 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 	percpu_ref_put(&q->q_usage_counter);
 }

-static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
-				   struct blk_mq_hw_ctx *hctx,
-				   unsigned int hctx_idx)
-{
-	unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
-
-	if (hctx->sched_tags) {
-		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
-		blk_mq_free_rq_map(hctx->sched_tags, flags);
-		hctx->sched_tags = NULL;
-	}
-}
-
 static int blk_mq_sched_alloc_tags(struct request_queue *q,
				   struct blk_mq_hw_ctx *hctx,
				   unsigned int hctx_idx)
@@ -533,8 +520,10 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
		return -ENOMEM;

	ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
-	if (ret)
-		blk_mq_sched_free_tags(set, hctx, hctx_idx);
+	if (ret) {
+		blk_mq_free_rq_map(hctx->sched_tags, set->flags);
+		hctx->sched_tags = NULL;
+	}

	return ret;
 }
From: Ard Biesheuvel <ardb@kernel.org>

mainline inclusion
from mainline-v5.11-rc1
commit 97d6786e0669daa5c2f2d07a057f574e849dfd3e
category: bugfix
bugzilla: 175103 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
As a hardening measure, we currently randomize the placement of physical memory inside the linear region when KASLR is in effect. Since the random offset at which to place the available physical memory inside the linear region is chosen early at boot, it is based on the memblock description of memory, which does not cover hotplug memory. The consequence of this is that the randomization offset may be chosen such that any hotplugged memory located above memblock_end_of_DRAM() that appears later is pushed off the end of the linear region, where it cannot be accessed.
So let's limit this randomization of the linear region to ensure that this can no longer happen, by using the CPU's addressable PA range instead. As it is guaranteed that no hotpluggable memory will appear that falls outside of that range, we can safely put this PA range sized window anywhere in the linear region.
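A worked example of why the computed range must now be signed (numbers illustrative): with 48-bit virtual addresses the linear region spans 2^47 bytes, while a CPU whose ID_AA64MMFR0_EL1.PARange field reports 48 bits can address 2^48 bytes of PA space, so range = 2^47 - 2^48 is negative. Treating that value as u64 would make it look enormous and wrongly enable randomization, hence the switch to s64 and the (s64) cast in the comparison below.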
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20201014081857.3288-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 arch/arm64/mm/init.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3b9401ee9c58..2da4cee7f3a3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -575,15 +575,18 @@ void __init arm64_memblock_init(void)

	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
		extern u16 memstart_offset_seed;
-		u64 range = linear_region_size -
-			    (memblock_end_of_DRAM() - memblock_start_of_DRAM());
+		u64 mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
+		int parange = cpuid_feature_extract_unsigned_field(
+					mmfr0, ID_AA64MMFR0_PARANGE_SHIFT);
+		s64 range = linear_region_size -
+			    BIT(id_aa64mmfr0_parange_to_phys_shift(parange));

		/*
		 * If the size of the linear region exceeds, by a sufficient
-		 * margin, the size of the region that the available physical
-		 * memory spans, randomize the linear region as well.
+		 * margin, the size of the region that the physical memory can
+		 * span, randomize the linear region as well.
		 */
-		if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
+		if (memstart_offset_seed > 0 && range >= (s64)ARM64_MEMSTART_ALIGN) {
			range /= ARM64_MEMSTART_ALIGN;
			memstart_addr -= ARM64_MEMSTART_ALIGN *
					 ((range * memstart_offset_seed) >> 16);
From: Florian Fainelli <f.fainelli@gmail.com>

mainline inclusion
from mainline-v5.13-rc1
commit fcf044891c84e38fc90eb736b818781bccf94e38
category: bugfix
bugzilla: 106749 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-----------------------------------------------
We do not need a SWIOTLB unless we have DRAM that is addressable beyond the arm_dma_limit. Compare max_pfn with arm_dma_pfn_limit to determine whether we do need a SWIOTLB to be initialized.
Fixes: ad3c7b18c5b3 ("arm: use swiotlb for bounce buffering on LPAE configs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 arch/arm/mm/init.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index d54d69cf1732..75f3ab531bdf 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -378,7 +378,11 @@ static void __init free_highpages(void)
 void __init mem_init(void)
 {
 #ifdef CONFIG_ARM_LPAE
-	swiotlb_init(1);
+	if (swiotlb_force == SWIOTLB_FORCE ||
+	    max_pfn > arm_dma_pfn_limit)
+		swiotlb_init(1);
+	else
+		swiotlb_force = SWIOTLB_NO_FORCE;
 #endif

	set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
From: Piotr Krysiuk <piotras@gmail.com>

maillist inclusion
category: bugfix
bugzilla: 182756 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-38300
Reference: https://lore.kernel.org/bpf/20210915160437.4080-1-piotras@gmail.com/
-------------------------------------------------
The conditional branch instructions on MIPS use 18-bit signed offsets allowing for a branch range of 128 KBytes (backward and forward). However, this limit is not observed by the cBPF JIT compiler, and so the JIT compiler emits out-of-range branches when translating certain cBPF programs. A specific example of such a cBPF program is included in the "BPF_MAXINSNS: exec all MSH" test from lib/test_bpf.c that executes anomalous machine code containing incorrect branch offsets under JIT.
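(For reference: the 18-bit signed range comes from a 16-bit signed instruction-word immediate scaled by 4, giving byte offsets in [-0x20000, 0x1fffc]; the is_bad_offset() helper added below rejects anything outside [-0x20000, 0x1ffff], which conservatively bounds every encodable branch.)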
Furthermore, this issue can be abused to craft undesirable machine code, where the control flow is hijacked to execute arbitrary Kernel code.
The following steps can be used to reproduce the issue:
# echo 1 > /proc/sys/net/core/bpf_jit_enable
# modprobe test_bpf test_name="BPF_MAXINSNS: exec all MSH"
This should produce multiple warnings from build_bimm() similar to:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 209 at arch/mips/mm/uasm-mips.c:210 build_insn+0x558/0x590
Micro-assembler field overflow
Modules linked in: test_bpf(+)
CPU: 0 PID: 209 Comm: modprobe Not tainted 5.14.3 #1
Stack : 00000000 807bb824 82b33c9c 801843c0 00000000 00000004 00000000 63c9b5ee
        82b33af4 80999898 80910000 80900000 82fd6030 00000001 82b33a98 82087180
        00000000 00000000 80873b28 00000000 000000fc 82b3394c 00000000 2e34312e
        6d6d6f43 809a180f 809a1836 6f6d203a 80900000 00000001 82b33bac 80900000
        00027f80 00000000 00000000 807bb824 00000000 804ed790 001cc317 00000001
[...]
Call Trace:
[<80108f44>] show_stack+0x38/0x118
[<807a7aac>] dump_stack_lvl+0x5c/0x7c
[<807a4b3c>] __warn+0xcc/0x140
[<807a4c3c>] warn_slowpath_fmt+0x8c/0xb8
[<8011e198>] build_insn+0x558/0x590
[<8011e358>] uasm_i_bne+0x20/0x2c
[<80127b48>] build_body+0xa58/0x2a94
[<80129c98>] bpf_jit_compile+0x114/0x1e4
[<80613fc4>] bpf_prepare_filter+0x2ec/0x4e4
[<8061423c>] bpf_prog_create+0x80/0xc4
[<c0a006e4>] test_bpf_init+0x300/0xba8 [test_bpf]
[<8010051c>] do_one_initcall+0x50/0x1d4
[<801c5e54>] do_init_module+0x60/0x220
[<801c8b20>] sys_finit_module+0xc4/0xfc
[<801144d0>] syscall_common+0x34/0x58
[...]
---[ end trace a287d9742503c645 ]---
Then the anomalous machine code executes:
=> 0xc0a18000:  addiu   sp,sp,-16
   0xc0a18004:  sw      s3,0(sp)
   0xc0a18008:  sw      s4,4(sp)
   0xc0a1800c:  sw      s5,8(sp)
   0xc0a18010:  sw      ra,12(sp)
   0xc0a18014:  move    s5,a0
   0xc0a18018:  move    s4,zero
   0xc0a1801c:  move    s3,zero

   # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0)
   0xc0a18020:  lui     t6,0x8012
   0xc0a18024:  ori     t4,t6,0x9e14
   0xc0a18028:  li      a1,0
   0xc0a1802c:  jalr    t4
   0xc0a18030:  move    a0,s5
   0xc0a18034:  bnez    v0,0xc0a1ffb8   # incorrect branch offset
   0xc0a18038:  move    v0,zero
   0xc0a1803c:  andi    s4,s3,0xf
   0xc0a18040:  b       0xc0a18048
   0xc0a18044:  sll     s4,s4,0x2
   [...]

   # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0)
   0xc0a1ffa0:  lui     t6,0x8012
   0xc0a1ffa4:  ori     t4,t6,0x9e14
   0xc0a1ffa8:  li      a1,0
   0xc0a1ffac:  jalr    t4
   0xc0a1ffb0:  move    a0,s5
   0xc0a1ffb4:  bnez    v0,0xc0a1ffb8   # incorrect branch offset
   0xc0a1ffb8:  move    v0,zero
   0xc0a1ffbc:  andi    s4,s3,0xf
   0xc0a1ffc0:  b       0xc0a1ffc8
   0xc0a1ffc4:  sll     s4,s4,0x2

   # __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0)
   0xc0a1ffc8:  lui     t6,0x8012
   0xc0a1ffcc:  ori     t4,t6,0x9e14
   0xc0a1ffd0:  li      a1,0
   0xc0a1ffd4:  jalr    t4
   0xc0a1ffd8:  move    a0,s5
   0xc0a1ffdc:  bnez    v0,0xc0a3ffb8   # correct branch offset
   0xc0a1ffe0:  move    v0,zero
   0xc0a1ffe4:  andi    s4,s3,0xf
   0xc0a1ffe8:  b       0xc0a1fff0
   0xc0a1ffec:  sll     s4,s4,0x2
   [...]

   # epilogue
   0xc0a3ffb8:  lw      s3,0(sp)
   0xc0a3ffbc:  lw      s4,4(sp)
   0xc0a3ffc0:  lw      s5,8(sp)
   0xc0a3ffc4:  lw      ra,12(sp)
   0xc0a3ffc8:  addiu   sp,sp,16
   0xc0a3ffcc:  jr      ra
   0xc0a3ffd0:  nop
To mitigate this issue, we assert the branch ranges for each emit call that could generate an out-of-range branch.
Fixes: 36366e367ee9 ("MIPS: BPF: Restore MIPS32 cBPF JIT")
Fixes: c6610de353da ("MIPS: net: Add BPF JIT")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lore.kernel.org/bpf/20210915160437.4080-1-piotras@gmail.com
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 arch/mips/net/bpf_jit.c | 57 +++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 14 deletions(-)
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 0af88622c619..cb6d22439f71 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -662,6 +662,11 @@ static void build_epilogue(struct jit_ctx *ctx)
	((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative : func) : \
	 func##_positive)

+static bool is_bad_offset(int b_off)
+{
+	return b_off > 0x1ffff || b_off < -0x20000;
+}
+
 static int build_body(struct jit_ctx *ctx)
 {
	const struct bpf_prog *prog = ctx->skf;
@@ -728,7 +733,10 @@ static int build_body(struct jit_ctx *ctx)
			/* Load return register on DS for failures */
			emit_reg_move(r_ret, r_zero, ctx);
			/* Return with error */
-			emit_b(b_imm(prog->len, ctx), ctx);
+			b_off = b_imm(prog->len, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_b(b_off, ctx);
			emit_nop(ctx);
			break;
		case BPF_LD | BPF_W | BPF_IND:
@@ -775,8 +783,10 @@ static int build_body(struct jit_ctx *ctx)
			emit_jalr(MIPS_R_RA, r_s0, ctx);
			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
			/* Check the error value */
-			emit_bcond(MIPS_COND_NE, r_ret, 0,
-				   b_imm(prog->len, ctx), ctx);
+			b_off = b_imm(prog->len, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_bcond(MIPS_COND_NE, r_ret, 0, b_off, ctx);
			emit_reg_move(r_ret, r_zero, ctx);
			/* We are good */
			/* X <- P[1:K] & 0xf */
@@ -855,8 +865,10 @@ static int build_body(struct jit_ctx *ctx)
			/* A /= X */
			ctx->flags |= SEEN_X | SEEN_A;
			/* Check if r_X is zero */
-			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
-				   b_imm(prog->len, ctx), ctx);
+			b_off = b_imm(prog->len, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero, b_off, ctx);
			emit_load_imm(r_ret, 0, ctx); /* delay slot */
			emit_div(r_A, r_X, ctx);
			break;
@@ -864,8 +876,10 @@ static int build_body(struct jit_ctx *ctx)
			/* A %= X */
			ctx->flags |= SEEN_X | SEEN_A;
			/* Check if r_X is zero */
-			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
-				   b_imm(prog->len, ctx), ctx);
+			b_off = b_imm(prog->len, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero, b_off, ctx);
			emit_load_imm(r_ret, 0, ctx); /* delay slot */
			emit_mod(r_A, r_X, ctx);
			break;
@@ -926,7 +940,10 @@ static int build_body(struct jit_ctx *ctx)
			break;
		case BPF_JMP | BPF_JA:
			/* pc += K */
-			emit_b(b_imm(i + k + 1, ctx), ctx);
+			b_off = b_imm(i + k + 1, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_b(b_off, ctx);
			emit_nop(ctx);
			break;
		case BPF_JMP | BPF_JEQ | BPF_K:
@@ -1056,12 +1073,16 @@ static int build_body(struct jit_ctx *ctx)
			break;
		case BPF_RET | BPF_A:
			ctx->flags |= SEEN_A;
-			if (i != prog->len - 1)
+			if (i != prog->len - 1) {
				/*
				 * If this is not the last instruction
				 * then jump to the epilogue
				 */
-				emit_b(b_imm(prog->len, ctx), ctx);
+				b_off = b_imm(prog->len, ctx);
+				if (is_bad_offset(b_off))
+					return -E2BIG;
+				emit_b(b_off, ctx);
+			}
			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
			break;
		case BPF_RET | BPF_K:
@@ -1075,7 +1096,10 @@ static int build_body(struct jit_ctx *ctx)
				 * If this is not the last instruction
				 * then jump to the epilogue
				 */
-				emit_b(b_imm(prog->len, ctx), ctx);
+				b_off = b_imm(prog->len, ctx);
+				if (is_bad_offset(b_off))
+					return -E2BIG;
+				emit_b(b_off, ctx);
				emit_nop(ctx);
			}
			break;
@@ -1133,8 +1157,10 @@ static int build_body(struct jit_ctx *ctx)
			/* Load *dev pointer */
			emit_load_ptr(r_s0, r_skb, off, ctx);
			/* error (0) in the delay slot */
-			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
-				   b_imm(prog->len, ctx), ctx);
+			b_off = b_imm(prog->len, ctx);
+			if (is_bad_offset(b_off))
+				return -E2BIG;
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero, b_off, ctx);
			emit_reg_move(r_ret, r_zero, ctx);
			if (code == (BPF_ANC | SKF_AD_IFINDEX)) {
				BUILD_BUG_ON(sizeof_field(struct net_device, ifindex) != 4);
@@ -1244,7 +1270,10 @@ void bpf_jit_compile(struct bpf_prog *fp)

	/* Generate the actual JIT code */
	build_prologue(&ctx);
-	build_body(&ctx);
+	if (build_body(&ctx)) {
+		module_memfree(ctx.target);
+		goto out;
+	}
	build_epilogue(&ctx);

	/* Update the icache */
From: Rafael Aquini <aquini@redhat.com>

mainline inclusion
from mainline-5.15-rc1
commit 20401d1058f3f841f35a594ac2fc1293710e55b9
bugzilla: 182755 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-3669
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
sysvipc_find_ipc() was left with a costly way to check if the offset position fed to it is bigger than the total number of IPC IDs in use. So much so that the time it takes to iterate over /proc/sysvipc/* files grows quadratically for a custom benchmark that creates "N" SYSV shm segments and then times the read of /proc/sysvipc/shm (milliseconds):
  12 msecs to read 1024 segs from /proc/sysvipc/shm
  18 msecs to read 2048 segs from /proc/sysvipc/shm
  65 msecs to read 4096 segs from /proc/sysvipc/shm
 325 msecs to read 8192 segs from /proc/sysvipc/shm
1303 msecs to read 16384 segs from /proc/sysvipc/shm
5182 msecs to read 32768 segs from /proc/sysvipc/shm
The root problem lies with the loop that computes the total number of ids in use, run to check whether the "pos" fed to sysvipc_find_ipc() grew bigger than "ids->in_use". That is a quite inefficient way to get to the maximum index in the id lookup table, especially when that value is already provided by struct ipc_ids.max_idx.

This patch follows up on the optimization introduced via commit 15df03c879836 ("sysvipc: make get_maxid O(1) again") and gets rid of the aforementioned costly loop, replacing it with a simpler check based on the value returned by ipc_get_maxidx(), which allows for a smooth linear increase in time complexity for the same custom benchmark:
  2 msecs to read 1024 segs from /proc/sysvipc/shm
  2 msecs to read 2048 segs from /proc/sysvipc/shm
  4 msecs to read 4096 segs from /proc/sysvipc/shm
  9 msecs to read 8192 segs from /proc/sysvipc/shm
 19 msecs to read 16384 segs from /proc/sysvipc/shm
 39 msecs to read 32768 segs from /proc/sysvipc/shm
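The complexity difference, in brief: sysvipc_find_ipc() is invoked once per record returned, and the old bailout re-counted the in-use IDs from index 0 on every invocation, so reading N segments costs O(N^2) idr lookups overall. Bounding the scan with ipc_get_maxidx() makes each call do only the work needed to reach the next used slot, and the whole read becomes linear, matching the numbers above.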
Link: https://lkml.kernel.org/r/20210809203554.1562989-1-aquini@redhat.com
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <llong@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 ipc/util.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/ipc/util.c b/ipc/util.c
index cfa0045e748d..cc46cfa06e04 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -754,21 +754,13 @@ struct pid_namespace *ipc_seq_pid_ns(struct seq_file *s)
 static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
					      loff_t *new_pos)
 {
-	struct kern_ipc_perm *ipc;
-	int total, id;
-
-	total = 0;
-	for (id = 0; id < pos && total < ids->in_use; id++) {
-		ipc = idr_find(&ids->ipcs_idr, id);
-		if (ipc != NULL)
-			total++;
-	}
+	struct kern_ipc_perm *ipc = NULL;
+	int max_idx = ipc_get_maxidx(ids);

-	ipc = NULL;
-	if (total >= ids->in_use)
+	if (max_idx == -1 || pos > max_idx)
		goto out;

-	for (; pos < ipc_mni; pos++) {
+	for (; pos <= max_idx; pos++) {
		ipc = idr_find(&ids->ipcs_idr, pos);
		if (ipc != NULL) {
			rcu_read_lock();
From: Johannes Weiner <hannes@cmpxchg.org>

mainline inclusion
from mainline-5.14-rc7
commit 57f29762cdd4687a02f245d1b1e78de046388eac
category: bugfix
bugzilla: 176927 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
------------------------------------------------
In a debugging session the other day, Rik noticed that node_reclaim() was missing memstall annotations. This means we'll miss pressure and lost productivity resulting from reclaim on an overloaded local NUMA node when vm.zone_reclaim_mode is enabled.
There haven't been any reports, but that's likely because vm.zone_reclaim_mode hasn't been a commonly used feature recently, and the intersection between such setups and psi users is probably nil.
But secondary memory such as CXL-connected DIMMs, persistent memory, etc., and the page demotion patches that handle them (https://lore.kernel.org/lkml/20210401183216.443C4443@viggo.jf.intel.com/) could soon make this a more common codepath again.
Link: https://lkml.kernel.org/r/20210818152457.35846-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 mm/vmscan.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f5d3f920b462..2ff583edd2f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4393,11 +4393,13 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
		.may_swap = 1,
		.reclaim_idx = gfp_zone(gfp_mask),
	};
+	unsigned long pflags;

	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
					   sc.gfp_mask);

	cond_resched();
+	psi_memstall_enter(&pflags);
	fs_reclaim_acquire(sc.gfp_mask);
	/*
	 * We need to be able to allocate from the reserves for RECLAIM_UNMAP
@@ -4422,6 +4424,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
	current->flags &= ~PF_SWAPWRITE;
	memalloc_noreclaim_restore(noreclaim_flag);
	fs_reclaim_release(sc.gfp_mask);
+	psi_memstall_leave(&pflags);

	trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);
From: Vlastimil Babka <vbabka@suse.cz>

mainline inclusion
from mainline-5.14-rc6
commit a7f1d48585b34730765dcda09ead6edc4ac16a5c
category: bugfix
bugzilla: 176508 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
------------------------------------------------
Vijayanand Jitta reports:
Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we want to disable slub_debug for a few slabs. Using the boot parameter with the slub_debug=-,slab_name syntax doesn't work as expected, i.e., it does not merely disable debugging for the specified list of slabs. Instead it disables debugging for all slabs, which is wrong.
This patch fixes it by delaying the moment when the global slub_debug flags variable is updated. In case a "slub_debug=-,slab_name" has been passed, the global flags remain as initialized (depending on CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.
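An illustrative boot parameter exercising the fixed path (the cache name is just an example):

	slub_debug=-,dentry

With CONFIG_SLUB_DEBUG_ON=y, this now disables debugging only for the "dentry" cache and leaves the default debug flags in effect for every other cache, instead of zeroing the global flags as before.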
Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/slub.c
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 mm/slub.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 865b5b967619..b7f2316a513a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1370,12 +1370,13 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init)
 static int __init setup_slub_debug(char *str)
 {
	slab_flags_t flags;
+	slab_flags_t global_flags;
	char *saved_str;
	char *slab_list;
	bool global_slub_debug_changed = false;
	bool slab_list_specified = false;

-	slub_debug = DEBUG_DEFAULT_FLAGS;
+	global_flags = DEBUG_DEFAULT_FLAGS;
	if (*str++ != '=' || !*str)
		/*
		 * No options specified. Switch on full debugging.
@@ -1387,7 +1388,7 @@ static int __init setup_slub_debug(char *str)
		str = parse_slub_debug_flags(str, &flags, &slab_list, true);

		if (!slab_list) {
-			slub_debug = flags;
+			global_flags = flags;
			global_slub_debug_changed = true;
		} else {
			slab_list_specified = true;
@@ -1396,16 +1397,18 @@ static int __init setup_slub_debug(char *str)

	/*
	 * For backwards compatibility, a single list of flags with list of
-	 * slabs means debugging is only enabled for those slabs, so the global
-	 * slub_debug should be 0. We can extended that to multiple lists as
+	 * slabs means debugging is only changed for those slabs, so the global
+	 * slub_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending
+	 * on CONFIG_SLUB_DEBUG_ON). We can extended that to multiple lists as
	 * long as there is no option specifying flags without a slab list.
	 */
	if (slab_list_specified) {
		if (!global_slub_debug_changed)
-			slub_debug = 0;
+			global_flags = slub_debug;
		slub_debug_string = saved_str;
	}
 out:
+	slub_debug = global_flags;
	if (slub_debug != 0 || slub_debug_string)
		static_branch_enable(&slub_debug_enabled);
	if ((static_branch_unlikely(&init_on_alloc) ||
From: Sanjay Kumar <sanjay.k.kumar@intel.com>

mainline inclusion
from mainline-5.14-rc2
commit 37764b952e1b39053defc7ebe5dcd8c4e3e78de9
category: bugfix
bugzilla: 174360 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
------------------------------------------------
This fixes a bug in context cache clear operation. The code was not following the correct invalidation flow. A global device TLB invalidation should be added after the IOTLB invalidation. At the same time, it uses the domain ID from the context entry. But in scalable mode, the domain ID is in PASID table entry, not context entry.
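In short, the corrected flow implemented below: in scalable mode take the domain ID from the PASID side (info->domain->iommu_did[], or FLPT_DEFAULT_DID for hardware pass-through domains) rather than from the context entry, clear and flush the context entry, invalidate the context cache and IOTLB with that domain ID, and finish with a global device-TLB invalidation via __iommu_flush_dev_iotlb().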
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 drivers/iommu/intel/iommu.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index bee0cff54030..eae58dc14bb5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2527,10 +2527,11 @@ static inline int domain_pfn_mapping(struct dmar_domain *domain, unsigned long i
	return domain_mapping(domain, iov_pfn, NULL, phys_pfn, nr_pages, prot);
 }

-static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
+static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8 devfn)
 {
-	unsigned long flags;
+	struct intel_iommu *iommu = info->iommu;
	struct context_entry *context;
+	unsigned long flags;
	u16 did_old;

	if (!iommu)
@@ -2542,7 +2543,16 @@ static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn
		spin_unlock_irqrestore(&iommu->lock, flags);
		return;
	}
-	did_old = context_domain_id(context);
+
+	if (sm_supported(iommu)) {
+		if (hw_pass_through && domain_type_is_si(info->domain))
+			did_old = FLPT_DEFAULT_DID;
+		else
+			did_old = info->domain->iommu_did[iommu->seq_id];
+	} else {
+		did_old = context_domain_id(context);
+	}
+
	context_clear_entry(context);
	__iommu_flush_cache(iommu, context, sizeof(*context));
	spin_unlock_irqrestore(&iommu->lock, flags);
@@ -2560,6 +2570,8 @@ static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn
				   0,
				   0,
				   DMA_TLB_DSI_FLUSH);
+
+	__iommu_flush_dev_iotlb(info, 0, MAX_AGAW_PFN_WIDTH);
 }

 static inline void unlink_domain_info(struct device_domain_info *info)
@@ -5151,9 +5163,9 @@ int __init intel_iommu_init(void)

 static int domain_context_clear_one_cb(struct pci_dev *pdev, u16 alias, void *opaque)
 {
-	struct intel_iommu *iommu = opaque;
+	struct device_domain_info *info = opaque;

-	domain_context_clear_one(iommu, PCI_BUS_NUM(alias), alias & 0xff);
+	domain_context_clear_one(info, PCI_BUS_NUM(alias), alias & 0xff);
	return 0;
 }

@@ -5163,12 +5175,13 @@ static int domain_context_clear_one_cb(struct pci_dev *pdev, u16 alias, void *op
  * devices, unbinding the driver from any one of them will possibly leave
  * the others unable to operate.
  */
-static void domain_context_clear(struct intel_iommu *iommu, struct device *dev)
+static void domain_context_clear(struct device_domain_info *info)
 {
-	if (!iommu || !dev || !dev_is_pci(dev))
+	if (!info->iommu || !info->dev || !dev_is_pci(info->dev))
		return;

-	pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb, iommu);
+	pci_for_each_dma_alias(to_pci_dev(info->dev),
+			       &domain_context_clear_one_cb, info);
 }

 static void __dmar_remove_one_dev_info(struct device_domain_info *info)
@@ -5192,7 +5205,7 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)

	iommu_disable_dev_iotlb(info);
	if (!dev_is_real_dma_subdevice(info->dev))
-		domain_context_clear(iommu, info->dev);
+		domain_context_clear(info);
	intel_pasid_free_table(info->dev);
 }
From: Lu Baolu <baolu.lu@linux.intel.com>

mainline inclusion
from mainline-5.14-rc2
commit 474dd1c6506411752a9b2f2233eec11f1733a099
category: bugfix
bugzilla: 174363 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
------------------------------------------------
The commit 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") fixes an issue of "sub-device is removed where the context entry is cleared for all aliases". But this commit didn't consider the PASID entry and PASID table in VT-d scalable mode. This fix increases the coverage of scalable mode.
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Fixes: 8038bdb855331 ("iommu/vt-d: Only clear real DMA device's context entries")
Fixes: 2b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Cc: stable@vger.kernel.org # v5.6+
Cc: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 drivers/iommu/intel/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index eae58dc14bb5..8b6d10c2add5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5198,14 +5198,13 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
	iommu = info->iommu;
	domain = info->domain;

-	if (info->dev) {
+	if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
		if (dev_is_pci(info->dev) && sm_supported(iommu))
			intel_pasid_tear_down_entry(iommu, info->dev,
						    PASID_RID2PASID, false);

		iommu_disable_dev_iotlb(info);
-		if (!dev_is_real_dma_subdevice(info->dev))
-			domain_context_clear(info);
+		domain_context_clear(info);
		intel_pasid_free_table(info->dev);
	}
From: Xu Kuohai <xukuohai@huawei.com>

mainline inclusion
from mainline-5.15-rc4
commit 30e29a9a2bc6a4888335a6ede968b75cd329657a
bugzilla: 182857 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-41864
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
In prealloc_elems_and_freelist(), the multiplication to calculate the size passed to bpf_map_area_alloc() could lead to an integer overflow. As a result, out-of-bounds write could occur in pcpu_freelist_populate() as reported by KASAN:
[...]
[   16.968613] BUG: KASAN: slab-out-of-bounds in pcpu_freelist_populate+0xd9/0x100
[   16.969408] Write of size 8 at addr ffff888104fc6ea0 by task crash/78
[   16.970038]
[   16.970195] CPU: 0 PID: 78 Comm: crash Not tainted 5.15.0-rc2+ #1
[   16.970878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   16.972026] Call Trace:
[   16.972306]  dump_stack_lvl+0x34/0x44
[   16.972687]  print_address_description.constprop.0+0x21/0x140
[   16.973297]  ? pcpu_freelist_populate+0xd9/0x100
[   16.973777]  ? pcpu_freelist_populate+0xd9/0x100
[   16.974257]  kasan_report.cold+0x7f/0x11b
[   16.974681]  ? pcpu_freelist_populate+0xd9/0x100
[   16.975190]  pcpu_freelist_populate+0xd9/0x100
[   16.975669]  stack_map_alloc+0x209/0x2a0
[   16.976106]  __sys_bpf+0xd83/0x2ce0
[...]
The possibility of this overflow was originally discussed in [0], but was overlooked.
Fix the integer overflow by changing elem_size to u64 from u32.
[0] https://lore.kernel.org/bpf/728b238e-a481-eb50-98e9-b0f430ab01e7@gmail.com/
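A minimal userspace demonstration of the overflow class being fixed (the values are illustrative, not taken from the report):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
  	uint32_t elem_size   = 1U << 20;	/* 1 MiB per element */
  	uint32_t max_entries = 1U << 13;	/* 8192 entries */

  	/* the 32-bit multiply wraps to 0 before being widened */
  	uint64_t wrapped = elem_size * max_entries;
  	/* widening elem_size first keeps the full 8 GiB product */
  	uint64_t correct = (uint64_t)elem_size * max_entries;

  	printf("wrapped=%llu correct=%llu\n",
  	       (unsigned long long)wrapped, (unsigned long long)correct);
  	return 0;
  }

With elem_size kept as u64 in the kernel code, the multiplication passed to bpf_map_area_alloc() is performed in 64 bits, as in the "correct" line above.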
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Signed-off-by: Tatsuhiko Yasumatsu <th.yasumatsu@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210930135545.173698-1-th.yasumatsu@gmail.com
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 kernel/bpf/stackmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index ebf60848d5eb..4477873ac3a0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -64,7 +64,8 @@ static inline int stack_map_data_size(struct bpf_map *map)

 static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
 {
-	u32 elem_size = sizeof(struct stack_map_bucket) + smap->map.value_size;
+	u64 elem_size = sizeof(struct stack_map_bucket) +
+			(u64)smap->map.value_size;
	int err;

	smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
From: Guo Xuenan <guoxuenan@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 182617 https://gitee.com/openeuler/kernel/issues/I4DDEL
--------------------------------
Commit e2b3eb3c5 disabled OPTIMIZE_INLINING by default; make the config option editable when using menuconfig.
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 lib/Kconfig.debug | 1 +
 1 file changed, 1 insertion(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a1a835c5d6cc..6ce7350d6b69 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -334,6 +334,7 @@ config HEADERS_INSTALL
	  as uapi header sanity checks.

 config OPTIMIZE_INLINING
+	bool "Optimize inlining"
	def_bool n
	help
	  This option determines if the kernel forces gcc to inline the functions
From: Dan Carpenter <dan.carpenter@oracle.com>

mainline inclusion
from mainline-5.15-rc4
commit 505d9dcb0f7ddf9d075e729523a33d38642ae680
bugzilla: 182864 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-3744
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There are three bugs in this code:
1) If ccp_init_data() fails for &src then we need to free aad.
   Use goto e_aad instead of goto e_ctx.
2) The label to free the &final_wa was named incorrectly as "e_tag"
   but it should have been "e_final_wa". One error path leaked
   &final_wa.
3) The &tag was leaked on one error path. In that case, I added a
   free before the goto because the resource was local to that block.
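For reference, the unwind convention the fix restores (labels as in this function): error labels release resources in reverse order of acquisition so that each one is freed exactly once, e.g.

	ret = ccp_init_data(&src, ...);	/* arguments elided */
	if (ret)
		goto e_aad;	/* src is not set up yet, so unwind from aad */

while a resource local to a single block (&tag here) is freed inline before jumping to the shared label, as the second hunk below does.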
Fixes: 36cf515b9bbe ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Reported-by: "minihanshen(沈明航)" <minihanshen@tencent.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: John Allen <john.allen@amd.com>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 drivers/crypto/ccp/ccp-ops.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/crypto/ccp/ccp-ops.c b/drivers/crypto/ccp/ccp-ops.c
index d6a8f4e4b14a..c15625e8ff66 100644
--- a/drivers/crypto/ccp/ccp-ops.c
+++ b/drivers/crypto/ccp/ccp-ops.c
@@ -778,7 +778,7 @@ ccp_run_aes_gcm_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
			    in_place ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
	if (ret)
-		goto e_ctx;
+		goto e_aad;

	if (in_place) {
		dst = src;
@@ -863,7 +863,7 @@ ccp_run_aes_gcm_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
	op.u.aes.size = 0;
	ret = cmd_q->ccp->vdata->perform->aes(&op);
	if (ret)
-		goto e_dst;
+		goto e_final_wa;

	if (aes->action == CCP_AES_ACTION_ENCRYPT) {
		/* Put the ciphered tag after the ciphertext. */
@@ -873,17 +873,19 @@ ccp_run_aes_gcm_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
		ret = ccp_init_dm_workarea(&tag, cmd_q, authsize,
					   DMA_BIDIRECTIONAL);
		if (ret)
-			goto e_tag;
+			goto e_final_wa;
		ret = ccp_set_dm_area(&tag, 0, p_tag, 0, authsize);
-		if (ret)
-			goto e_tag;
+		if (ret) {
+			ccp_dm_free(&tag);
+			goto e_final_wa;
+		}

		ret = crypto_memneq(tag.address, final_wa.address,
				    authsize) ? -EBADMSG : 0;
		ccp_dm_free(&tag);
	}

-e_tag:
+e_final_wa:
	ccp_dm_free(&final_wa);

 e_dst:
From: yangerkun <yangerkun@huawei.com>

mainline inclusion
from mainline-v5.15-rc4
commit bb9464e08309f6befe80866f5be51778ca355ee9
category: bugfix
bugzilla: 176737 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
The error path in ext4_fill_super forgets to flush s_error_work before journal destroy, and it may trigger the following BUG since flush_stashed_error_work can run concurrently with journal destroy without any protection for sbi->s_journal.
[32031.740193] EXT4-fs (loop66): get root inode failed
[32031.740484] EXT4-fs (loop66): mount failed
[32031.759805] ------------[ cut here ]------------
[32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
[32031.760075] invalid opcode: 0000 [#1] SMP PTI
[32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded 4.18.0
[32031.765112] Call Trace:
[32031.765375]  ? __switch_to_asm+0x35/0x70
[32031.765635]  ? __switch_to_asm+0x41/0x70
[32031.765893]  ? __switch_to_asm+0x35/0x70
[32031.766148]  ? __switch_to_asm+0x41/0x70
[32031.766405]  ? _cond_resched+0x15/0x40
[32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
[32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
[32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
[32031.767487]  process_one_work+0x195/0x390
[32031.767747]  worker_thread+0x30/0x390
[32031.768007]  ? process_one_work+0x390/0x390
[32031.768265]  kthread+0x10d/0x130
[32031.768521]  ? kthread_flush_work_fn+0x10/0x10
[32031.768778]  ret_from_fork+0x35/0x40
static int start_this_handle(...)
	BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this
Besides, after we enable fast commit, ext4_fc_replay can add work to s_error_work but return success, so the latter journal destroy in ext4_load_journal can trigger this problem too.
Fix this problem with two steps:
1. Call ext4_commit_super directly in ext4_handle_error for the case
   where it is called from ext4_fc_replay.
2. Since it's hard to pair the init and flush for s_error_work, we'd
   better add an extra flush_work before journal destroy in
   ext4_fill_super.
Besides, this patch will call ext4_commit_super in ext4_handle_error for any nojournal case too. But it seems safe, since the reason we called schedule_work was to save the error info to the superblock through the journal if available. Conversely, for the nojournal case, it seems pointless to defer the superblock commit to s_error_work.
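The race being closed, schematically:

  mount error path                        s_error_work worker
  ----------------                        -------------------
  jbd2_journal_destroy(sbi->s_journal)    flush_stashed_error_work()
  sbi->s_journal = NULL                     jbd2_journal_start()
                                              BUG_ON(journal->j_flags & JBD2_UNMOUNT)

Calling flush_work() first guarantees the worker has completed (or never ran) before the journal is destroyed.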
Fixes: c92dc856848f ("ext4: defer saving error info from atomic context")
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
Cc: stable@kernel.org
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
 fs/ext4/super.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ef8adada4649..c0f31bf40785 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -661,7 +661,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
	 * constraints, it may not be safe to do it right here so we
	 * defer superblock flushing to a workqueue.
	 */
-	if (continue_fs)
+	if (continue_fs && journal)
		schedule_work(&EXT4_SB(sb)->s_error_work);
	else
		ext4_commit_super(sb);
@@ -5137,12 +5137,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
	sbi->s_ea_block_cache = NULL;

	if (sbi->s_journal) {
+		/* flush s_error_work before journal destroy. */
+		flush_work(&sbi->s_error_work);
		jbd2_journal_destroy(sbi->s_journal);
		sbi->s_journal = NULL;
	}
 failed_mount3a:
	ext4_es_unregister_shrinker(sbi);
 failed_mount3:
+	/* flush s_error_work before sbi destroy */
	flush_work(&sbi->s_error_work);
	del_timer_sync(&sbi->s_err_report);
	ext4_stop_mmpd(sbi);