mailweb.openeuler.org
Manage this list

Keyboard Shortcuts

Thread View

  • j: Next unread message
  • k: Previous unread message
  • j a: Jump to all threads
  • j l: Jump to MailingList overview

Kernel

Threads by month
  • ----- 2025 -----
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2024 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2023 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2022 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2021 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2020 -----
  • December
  • November
  • October
  • September
  • August
  • July
  • June
  • May
  • April
  • March
  • February
  • January
  • ----- 2019 -----
  • December
kernel@openeuler.org

  • 27 participants
  • 18549 discussions
[PATCH OLK-6.6] acpi: acpica: fix acpi parse and parseext cache leaks
by Zeng Heng 28 Dec '23

28 Dec '23
From: Seunghun Han <kkamagui(a)gmail.com> hulk inclusion category: bugfix bugzilla: 180974 https://gitee.com/openeuler/kernel/issues/I4DDEL CVE: CVE-2017-13694 ------------------------------------------------- I'm Seunghun Han, and I work for National Security Research Institute of South Korea. I have been doing a research on ACPI and found an ACPI cache leak in ACPI early abort cases. Boot log of ACPI cache leak is as follows: [ 0.352414] ACPI: Added _OSI(Module Device) [ 0.353182] ACPI: Added _OSI(Processor Device) [ 0.353182] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.353182] ACPI: Added _OSI(Processor Aggregator Device) [ 0.356028] ACPI: Unable to start the ACPI Interpreter [ 0.356799] ACPI Error: Could not remove SCI handler (20170303/evmisc-281) [ 0.360215] kmem_cache_destroy Acpi-State: Slab cache still has objects [ 0.360648] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #10 [ 0.361273] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 0.361873] Call Trace: [ 0.362243] ? dump_stack+0x5c/0x81 [ 0.362591] ? kmem_cache_destroy+0x1aa/0x1c0 [ 0.362944] ? acpi_sleep_proc_init+0x27/0x27 [ 0.363296] ? acpi_os_delete_cache+0xa/0x10 [ 0.363646] ? acpi_ut_delete_caches+0x6d/0x7b [ 0.364000] ? acpi_terminate+0xa/0x14 [ 0.364000] ? acpi_init+0x2af/0x34f [ 0.364000] ? __class_create+0x4c/0x80 [ 0.364000] ? video_setup+0x7f/0x7f [ 0.364000] ? acpi_sleep_proc_init+0x27/0x27 [ 0.364000] ? do_one_initcall+0x4e/0x1a0 [ 0.364000] ? kernel_init_freeable+0x189/0x20a [ 0.364000] ? rest_init+0xc0/0xc0 [ 0.364000] ? kernel_init+0xa/0x100 [ 0.364000] ? ret_from_fork+0x25/0x30 I analyzed this memory leak in detail. I found that “Acpi-State” cache and “Acpi-Parse” cache were merged because the size of cache objects was same slab cache size. I finally found “Acpi-Parse” cache and “Acpi-ParseExt” cache were leaked using SLAB_NEVER_MERGE flag in kmem_cache_create() function. Real ACPI cache leak point is as follows: [ 0.360101] ACPI: Added _OSI(Module Device) [ 0.360101] ACPI: Added _OSI(Processor Device) [ 0.360101] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.361043] ACPI: Added _OSI(Processor Aggregator Device) [ 0.364016] ACPI: Unable to start the ACPI Interpreter [ 0.365061] ACPI Error: Could not remove SCI handler (20170303/evmisc-281) [ 0.368174] kmem_cache_destroy Acpi-Parse: Slab cache still has objects [ 0.369332] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8 [ 0.371256] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 0.372000] Call Trace: [ 0.372000] ? dump_stack+0x5c/0x81 [ 0.372000] ? kmem_cache_destroy+0x1aa/0x1c0 [ 0.372000] ? acpi_sleep_proc_init+0x27/0x27 [ 0.372000] ? acpi_os_delete_cache+0xa/0x10 [ 0.372000] ? acpi_ut_delete_caches+0x56/0x7b [ 0.372000] ? acpi_terminate+0xa/0x14 [ 0.372000] ? acpi_init+0x2af/0x34f [ 0.372000] ? __class_create+0x4c/0x80 [ 0.372000] ? video_setup+0x7f/0x7f [ 0.372000] ? acpi_sleep_proc_init+0x27/0x27 [ 0.372000] ? do_one_initcall+0x4e/0x1a0 [ 0.372000] ? kernel_init_freeable+0x189/0x20a [ 0.372000] ? rest_init+0xc0/0xc0 [ 0.372000] ? kernel_init+0xa/0x100 [ 0.372000] ? ret_from_fork+0x25/0x30 [ 0.388039] kmem_cache_destroy Acpi-ParseExt: Slab cache still has objects [ 0.389063] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8 [ 0.390557] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 0.392000] Call Trace: [ 0.392000] ? dump_stack+0x5c/0x81 [ 0.392000] ? kmem_cache_destroy+0x1aa/0x1c0 [ 0.392000] ? acpi_sleep_proc_init+0x27/0x27 [ 0.392000] ? acpi_os_delete_cache+0xa/0x10 [ 0.392000] ? acpi_ut_delete_caches+0x6d/0x7b [ 0.392000] ? acpi_terminate+0xa/0x14 [ 0.392000] ? acpi_init+0x2af/0x34f [ 0.392000] ? __class_create+0x4c/0x80 [ 0.392000] ? video_setup+0x7f/0x7f [ 0.392000] ? acpi_sleep_proc_init+0x27/0x27 [ 0.392000] ? do_one_initcall+0x4e/0x1a0 [ 0.392000] ? kernel_init_freeable+0x189/0x20a [ 0.392000] ? rest_init+0xc0/0xc0 [ 0.392000] ? kernel_init+0xa/0x100 [ 0.392000] ? ret_from_fork+0x25/0x30 When early abort is occurred due to invalid ACPI information, Linux kernel terminates ACPI by calling acpi_terminate() function. The function calls acpi_ut_delete_caches() function to delete local caches (acpi_gbl_namespace_ cache, state_cache, operand_cache, ps_node_cache, ps_node_ext_cache). But the deletion codes in acpi_ut_delete_caches() function only delete slab caches using kmem_cache_destroy() function, therefore the cache objects should be flushed before acpi_ut_delete_caches() function. “Acpi-Parse” cache and “Acpi-ParseExt” cache are used in an AML parse function, acpi_ps_parse_loop(). The function should have flush codes to handle an error state due to invalid AML codes. This cache leak has a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR. To fix ACPI cache leak for enhancing security, I made a patch which has flush codes in acpi_ps_parse_loop() function. I hope that this patch improves the security of Linux kernel. Thank you. Signed-off-by: Seunghun Han <kkamagui(a)gmail.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> Signed-off-by: Chen Jun <chenjun102(a)huawei.com> Reviewed-by: Hanjun Guo <guohanjun(a)huawei.com> Signed-off-by: Chen Jun <chenjun102(a)huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com> --- drivers/acpi/acpica/psobject.c | 53 ++++++++++------------------------ 1 file changed, 16 insertions(+), 37 deletions(-) diff --git a/drivers/acpi/acpica/psobject.c b/drivers/acpi/acpica/psobject.c index 54471083ba54..a22d5c0008d6 100644 --- a/drivers/acpi/acpica/psobject.c +++ b/drivers/acpi/acpica/psobject.c @@ -636,7 +636,8 @@ acpi_status acpi_ps_complete_final_op(struct acpi_walk_state *walk_state, union acpi_parse_object *op, acpi_status status) { - acpi_status status2; + acpi_status return_status = AE_OK; + u8 ascending = TRUE; ACPI_FUNCTION_TRACE_PTR(ps_complete_final_op, walk_state); @@ -650,7 +651,8 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state, op)); do { if (op) { - if (walk_state->ascending_callback != NULL) { + if (ascending && + walk_state->ascending_callback != NULL) { walk_state->op = op; walk_state->op_info = acpi_ps_get_opcode_info(op->common. @@ -672,49 +674,26 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state, } if (status == AE_CTRL_TERMINATE) { - status = AE_OK; - - /* Clean up */ - do { - if (op) { - status2 = - acpi_ps_complete_this_op - (walk_state, op); - if (ACPI_FAILURE - (status2)) { - return_ACPI_STATUS - (status2); - } - } - - acpi_ps_pop_scope(& - (walk_state-> - parser_state), - &op, - &walk_state-> - arg_types, - &walk_state-> - arg_count); - - } while (op); - - return_ACPI_STATUS(status); + ascending = FALSE; + return_status = AE_CTRL_TERMINATE; } else if (ACPI_FAILURE(status)) { /* First error is most important */ - (void) - acpi_ps_complete_this_op(walk_state, - op); - return_ACPI_STATUS(status); + ascending = FALSE; + return_status = status; } } - status2 = acpi_ps_complete_this_op(walk_state, op); - if (ACPI_FAILURE(status2)) { - return_ACPI_STATUS(status2); + status = acpi_ps_complete_this_op(walk_state, op); + if (ACPI_FAILURE(status)) { + ascending = FALSE; + if (ACPI_SUCCESS(return_status) || + return_status == AE_CTRL_TERMINATE) { + return_status = status; + } } } @@ -724,5 +703,5 @@ acpi_ps_complete_final_op(struct acpi_walk_state *walk_state, } while (op); - return_ACPI_STATUS(status); + return_ACPI_STATUS(return_status); } -- 2.25.1
2 1
0 0
[PATCH OLK-6.6] pci: Enable acs for QLogic HBA cards
by Zeng Heng 28 Dec '23

28 Dec '23
From: Xishi Qiu <qiuxishi(a)huawei.com> euler inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4K05G?from=project-issue CVE: N/A ------------------------------------------------- Add support of port isolation for QLogic HBA cards. Signed-off-by: Xishi Qiu <qiuxishi(a)huawei.com> Signed-off-by: Fang Ying <fangying1(a)huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com> Signed-off-by: Hui Wang <john.wanghui(a)huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com> Confilicts: drivers/pci/quirks.c Signed-off-by: Xuefeng Wang <wxf.wang(a)hisilicon.com> Reviewed-by: Yang Yingliang <yangyingliang(a)huawei.com> Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com> Confilicts: drivers/pci/quirks.c Reviewed-by: wangxiongfeng <wangxiongfeng2(a)huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com> --- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index ae95d0950772..83a875b53111 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -5050,6 +5050,8 @@ static const struct pci_dev_acs_enabled { { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_intel_spt_pch_acs }, { 0x19a2, 0x710, pci_quirk_mf_endpoint_acs }, /* Emulex BE3-R */ { 0x10df, 0x720, pci_quirk_mf_endpoint_acs }, /* Emulex Skyhawk-R */ + { 0x1077, 0x2031, pci_quirk_mf_endpoint_acs}, /* QLogic QL2672 */ + { 0x1077, 0x2532, pci_quirk_mf_endpoint_acs}, /* Cavium ThunderX */ { PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, pci_quirk_cavium_acs }, /* Cavium multi-function devices */ -- 2.25.1
2 1
0 0
[PATCH OLK-6.6] Revert "posix-cpu-timers: Make timespec to nsec conversion safe"
by Zeng Heng 28 Dec '23

28 Dec '23
From: Yuyao Lin <linyuyao1(a)huawei.com> hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61XP8 -------------------------------- This reverts commit 098b0e01a91c42aaaf0425605cd126b03fcb0bcf. Function timespec64_to_ns() Add the upper and lower limits check in commit cb47755725da ("time: Prevent undefined behaviour in timespec64_to_ns()"), timespec64_to_ktime() only check the upper limits,so revert this patch can fix overflow. Signed-off-by: Yuyao Lin <linyuyao1(a)huawei.com> Reviewed-by: Wei Li <liwei391(a)huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai(a)huawei.com> --- kernel/time/posix-cpu-timers.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index e9c6f9d0e42c..5996e7323c81 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -642,11 +642,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags, return -ESRCH; } - /* - * Use the to_ktime conversion because that clamps the maximum - * value to KTIME_MAX and avoid multiplication overflows. - */ - new_expires = ktime_to_ns(timespec64_to_ktime(new->it_value)); + new_expires = timespec64_to_ns(&new->it_value); /* * Protect against sighand release/switch in exit/exec and p->cpu_timers -- 2.25.1
2 1
0 0
[PATCH OLK-6.6 0/2] KABI reservation for IMA and crypto
by GUO Zihua 28 Dec '23

28 Dec '23
KABI reservation for IMA and crypto module. GUO Zihua (2): crypto: kabi: KABI reservation for crypto ima: kabi: KABI reservation for IMA include/crypto/aead.h | 3 +++ include/crypto/akcipher.h | 4 ++++ include/crypto/algapi.h | 4 ++++ include/crypto/hash.h | 2 ++ include/crypto/if_alg.h | 3 +++ include/crypto/public_key.h | 3 +++ include/crypto/rng.h | 2 ++ include/crypto/skcipher.h | 3 +++ include/linux/crypto.h | 2 ++ include/linux/fs.h | 2 ++ include/linux/kernel_read_file.h | 3 +++ include/linux/kexec.h | 5 +++++ include/linux/user_namespace.h | 2 ++ 13 files changed, 38 insertions(+) -- 2.34.1
2 3
0 0
[PATCH OLK-6.6 0/2] Make the cpuinfo_cur_freq interface read correctly
by Zeng Heng 28 Dec '23

28 Dec '23
Patch 1~2: It is required to ensure that amu can continue to work normally during the sample period while cstate is enabled. Zeng Heng (2): arm64: cpufeature: Export cpu_has_amu_feat() cpufreq: CPPC: Keep the target core awake when reading its cpufreq rate arch/arm64/kernel/cpufeature.c | 1 + drivers/cpufreq/cppc_cpufreq.c | 39 ++++++++++++++++++++++++++-------- 2 files changed, 31 insertions(+), 9 deletions(-) -- 2.25.1
2 3
0 0
[PATCH OLK-6.6] x86/boot/compressed: Register dummy NMI handler in EFI boot loader, to avoid kdump crashes
by Zeng Heng 28 Dec '23

28 Dec '23
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I69VF6 CVE: NA -------------------------------- If kdump is enabled, when using mce_inject to inject errors, EFI boot loader would decompress & load second kernel for saving the vmcore file. For normal errors that is fine. However, in the MCE case, the panic CPU that firstly enters into mce_panic() is running within NMI interrupt context, and the processor blocks delivery of subsequent NMIs until the next execution of the IRET instruction. When the panic CPU takes long time in the panic processing route, and causes the watchdog timeout, at this moment, the processor already receives NMI interrupt in the background. In the reproducer sequence below, panic CPU would run into EFI loader and raise page fault exception (like visiting `vidmem` variable when attempting to call debug_putstr()), the CPU would execute IRET instruction when it exits from the page fault handler. But the loader never registers handler for NMI vector in IDT, lack of vector handler would cause reboot, which interrupts kdump procedure and fails to save the vmcore file. Here is steps to reproduce the above issue (it's sporadic): 1. # cat uncorrected CPU 1 BANK 4 STATUS uncorrected 0xc0 MCGSTATUS EIPV MCIP ADDR 0x1234 RIP 0xdeadbabe RAISINGCPU 0 MCGCAP SER CMCI TES 0x6 2. # modprobe mce_inject 3. # mce-inject uncorrected For increasing the probability of reproduction of this issue, there are two ways to increase the probability of the bug: 1. modify the threshold value of watchdog (increase NMI frequency); 2. and/or add delays before panic() in mce_panic() and modify PANIC_TIMEOUT macro; Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table") Signed-off-by: Zeng Heng <zengheng4(a)huawei.com> [ Tidy up changelog, add comments. ] Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Link: https://lore.kernel.org/r/20230110102745.2514694-1-zengheng4@huawei.com --- arch/x86/boot/compressed/ident_map_64.c | 13 +++++++++++++ arch/x86/boot/compressed/idt_64.c | 1 + arch/x86/boot/compressed/idt_handlers_64.S | 1 + arch/x86/boot/compressed/misc.h | 1 + 4 files changed, 16 insertions(+) diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c index 08f93b0401bb..dc0ec27ff1b1 100644 --- a/arch/x86/boot/compressed/ident_map_64.c +++ b/arch/x86/boot/compressed/ident_map_64.c @@ -385,3 +385,16 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code) */ kernel_add_identity_map(address, end); } + +void do_boot_nmi_fault(struct pt_regs *regs, unsigned long error_code) +{ + /* + * Default boot loader placeholder fault handler - there's no real + * kernel running yet, so there's not much we can do - but NMIs + * can arrive in a kdump scenario, for example by the NMI watchdog. + * + * Not having any handler would cause the CPU to silently reboot, + * so we do the second-worst thing here and ignore the NMI. + */ + debug_putstr("nmi fault raise\n"); +} diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c index 3cdf94b41456..0f3318bdf262 100644 --- a/arch/x86/boot/compressed/idt_64.c +++ b/arch/x86/boot/compressed/idt_64.c @@ -60,6 +60,7 @@ void load_stage2_idt(void) { boot_idt_desc.address = (unsigned long)boot_idt; + set_idt_entry(X86_TRAP_NMI, boot_nmi_fault); set_idt_entry(X86_TRAP_PF, boot_page_fault); #ifdef CONFIG_AMD_MEM_ENCRYPT diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S index 22890e199f5b..2aef8e1b515b 100644 --- a/arch/x86/boot/compressed/idt_handlers_64.S +++ b/arch/x86/boot/compressed/idt_handlers_64.S @@ -69,6 +69,7 @@ SYM_FUNC_END(\name) .text .code64 +EXCEPTION_HANDLER boot_nmi_fault do_boot_nmi_fault error_code=0 EXCEPTION_HANDLER boot_page_fault do_boot_page_fault error_code=1 #ifdef CONFIG_AMD_MEM_ENCRYPT diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index cc70d3fb9049..f06fb86b83bc 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -196,6 +196,7 @@ static inline void cleanup_exception_handling(void) { } #endif /* IDT Entry Points */ +void boot_nmi_fault(void); void boot_page_fault(void); void boot_stage1_vc(void); void boot_stage2_vc(void); -- 2.25.1
2 1
0 0
[PATCH OLK-6.6 0/3] memcg: support OOM priority for memcg
by Jinjiang Tu 28 Dec '23

28 Dec '23
Support memcg oom priority. Changelog: * rename CONFIG_MEMCG_QOS to CONFIG_MEMCG_OOM_PRIORITY * move CONFIG_MEMCG_OOM_PRIORITY to init/Kconfig * set CONFIG_MEMCG_OOM_PRIORITY default from y to n, and enable it in openeuler_defconfig * add rcu_read_lock() protection for mem_cgroup_from_task() * Instead of static key, use variable to check if the feature is enabled * use READ_ONCE/WRITE_ONCE protection when read/write oom_prio * cleanup Jing Xiangfeng (3): memcg: support priority for oom memcg: Add sysctl memcg_qos_enable memcg: enable CONFIG_MEMCG_OOM_PRIORITY by default arch/arm64/configs/openeuler_defconfig | 1 + arch/x86/configs/openeuler_defconfig | 1 + include/linux/memcontrol.h | 25 +++ init/Kconfig | 14 ++ mm/memcontrol.c | 222 +++++++++++++++++++++++++ mm/oom_kill.c | 61 ++++++- 6 files changed, 319 insertions(+), 5 deletions(-) -- 2.25.1
2 4
0 0
[PATCH OLK-5.10] jbd2: fix soft lockup in journal_finish_inode_data_buffers()
by Zizhi Wo 28 Dec '23

28 Dec '23
From: Ye Bin <yebin10(a)huawei.com> mainline inclusion from mainline-v6.7-rc6 commit 6c02757c936063f0631b4e43fe156f8c8f1f351f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8RRXU?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- There's issue when do io test: WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170] CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x24/0x30 dump_stack+0xb0/0x100 watchdog_timer_fn+0x254/0x3f8 __hrtimer_run_queues+0x11c/0x380 hrtimer_interrupt+0xfc/0x2f8 arch_timer_handler_phys+0x38/0x58 handle_percpu_devid_irq+0x90/0x248 generic_handle_irq+0x3c/0x58 __handle_domain_irq+0x68/0xc0 gic_handle_irq+0x90/0x320 el1_irq+0xcc/0x180 queued_spin_lock_slowpath+0x1d8/0x320 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2] kjournald2+0xec/0x2f0 [jbd2] kthread+0x134/0x138 ret_from_fork+0x10/0x18 Analyzed informations from vmcore as follows: (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list'; (2) Now is processing the 855th jbd2_inode; (3) JBD2 task has TIF_NEED_RESCHED flag; (4) There's no pags in address_space around the 855th jbd2_inode; (5) There are some process is doing drop caches; (6) Mounted with 'nodioread_nolock' option; (7) 128 CPUs; According to informations from vmcore we know 'journal->j_list_lock' spin lock competition is fierce. So journal_finish_inode_data_buffers() maybe process slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors(). However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK, will not call cond_resched(). So may lead to soft lockup. journal_finish_inode_data_buffers filemap_fdatawait_range_keep_errors __filemap_fdatawait_range while (index <= end) nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK); if (!nr_pages) break; --> If 'nr_pages' is equal zero will break, then will not call cond_resched() for (i = 0; i < nr_pages; i++) wait_on_page_writeback(page); cond_resched(); To solve above issue, add scheduling point in the journal_finish_inode_data_buffers(); Signed-off-by: Ye Bin <yebin10(a)huawei.com> Reviewed-by: Jan Kara <jack(a)suse.cz> Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com> --- fs/jbd2/commit.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index f082836e6d40..e782f1d42dd7 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -300,6 +300,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal, if (!ret) ret = err; } + cond_resched(); spin_lock(&journal->j_list_lock); jinode->i_flags &= ~JI_COMMIT_RUNNING; smp_mb(); -- 2.39.2
2 1
0 0
[PATCH OLK-6.6 0/8] Introduce qos smt expeller for co-location
by Xia Fukun 27 Dec '23

27 Dec '23
We introduce the qos smt expeller, which lets online tasks to expel offline tasks on the smt sibling cpus, and exclusively occupy CPU resources.In this way we are able to improve QOS of online tasks in co-location. Guan Jing (8): sched: Introduce qos smt expeller for co-location sched: Implement the function of qos smt expeller sched: Add statistics for qos smt expeller sched: Add tracepoint for qos smt expeller config: Enable CONFIG_QOS_SCHED_SMT_EXPELLER sched/fair: Start tracking qos_offline tasks count in cfs_rq sched/fair: Introduce QOS_SMT_EXPELL priority reversion mechanism sched/fair: Add cmdline nosmtexpell arch/arm64/configs/openeuler_defconfig | 1 + arch/x86/configs/openeuler_defconfig | 1 + include/linux/sched.h | 13 + include/trace/events/sched.h | 55 ++++ init/Kconfig | 9 + kernel/sched/debug.c | 5 + kernel/sched/fair.c | 345 +++++++++++++++++++++++-- kernel/sched/sched.h | 27 ++ 8 files changed, 440 insertions(+), 16 deletions(-) -- 2.34.1
2 9
0 0
[PATCH OLK-6.6 0/1] iommu: Enable smmu-v3 when 3408iMR/3416iMRraid card exist
by zhangnaichuan 27 Dec '23

27 Dec '23
arm64 cannot support virtualization pass-through feature when using the 3408iMR/3416iMR raid card. For solving the problem, we add two fuctions: 1.we prepare init bypass level2 entry for specific devices when smmu uses 2 level streamid entry. 2.we add smmu.bypassdev cmdline to allow SMMU bypass streams for some specific devices. usage: 3408iMRraid: smmu.bypassdev=0x1000:0x17 3416iMRraid: smmu.bypassdev=0x1000:0x15 drivers/iommu/Kconfig | 9 ++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 120 +++++++++++++++++++- 2 files changed, 128 insertions(+), 1 deletion(-) -- 2.33.0
2 2
0 0
  • ← Newer
  • 1
  • ...
  • 1367
  • 1368
  • 1369
  • 1370
  • 1371
  • 1372
  • 1373
  • ...
  • 1855
  • Older →

HyperKitty Powered by HyperKitty