[PATCH OLK-5.10 v2 0/9] arm64: Support xcall prefetch

Support arm64 xcall prefetch.

Changes in v2:
- Remove is_read_cache.
- Simplify the ksys_read().
- Use percpu cache_hit/miss_wait and remove cache_queued and sync_mode.
- Remove keep_running in prefetch_item struct.
- Add /proc/xcall/stats dir.
- Remove pfi in file struct and use hash table.

Jinjie Ruan (3):
  arm64: Introduce Xint software solution
  arm64: Add debugfs dir for xint
  xcall: Add /proc/xcall/stats dir for performance tuning

Liao Chen (1):
  revert kpti bypass

Yipeng Zou (5):
  arm64: Introduce xcall a faster svc exception handling
  arm64: Faster SVC exception handler with xcall
  xcall: introduce xcall_select to implement a custom xcall function
  eventpoll: xcall: Support sync and async prefetch data in epoll
  xcall: eventpoll: add tracepoint

 arch/Kconfig                             |  64 +++++++
 arch/arm64/Kconfig                       |   2 +
 arch/arm64/configs/openeuler_defconfig   |   4 +
 arch/arm64/include/asm/cpucaps.h         |   2 +
 arch/arm64/include/asm/exception.h       |   6 +
 arch/arm64/include/asm/syscall.h         |  14 ++
 arch/arm64/include/asm/syscall_wrapper.h |  25 +++
 arch/arm64/kernel/asm-offsets.c          |   3 +
 arch/arm64/kernel/cpufeature.c           |  54 ++++++
 arch/arm64/kernel/entry-common.c         |  22 +++
 arch/arm64/kernel/entry.S                | 183 +++++++++++++++++-
 arch/arm64/kernel/sys.c                  |  20 ++
 arch/arm64/kernel/syscall.c              |  69 +++++++
 drivers/irqchip/irq-gic-v3.c             | 123 ++++++++++++
 fs/eventpoll.c                           | 232 +++++++++++++++++++++++
 fs/open.c                                |   4 +
 fs/proc/base.c                           | 214 +++++++++++++++++++++
 fs/read_write.c                          |  95 +++++++++-
 include/linux/fs.h                       |  31 +++
 include/linux/hardirq.h                  |   5 +
 include/linux/irqchip/arm-gic-v3.h       |  13 ++
 include/linux/sched.h                    |   5 +
 include/linux/syscalls.h                 |  19 ++
 include/trace/events/fs.h                |  91 +++++++++
 include/uapi/asm-generic/unistd.h        |  11 ++
 kernel/fork.c                            |  32 ++++
 kernel/irq/debugfs.c                     |  33 ++++
 kernel/irq/internals.h                   |  18 ++
 kernel/irq/irqdesc.c                     |  19 ++
 kernel/irq/proc.c                        |  10 +
 kernel/softirq.c                         |  73 +++++++
 kernel/sysctl.c                          |  36 ++++
 32 files changed, 1529 insertions(+), 3 deletions(-)
-- 2.34.1

From: Yipeng Zou <zouyipeng@huawei.com> hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IB6JLE -------------------------------- The svc exception handling process in ARM64, which includes auxiliary functions for debug/trace and core functions like KPTI, has been identified as overly "lengthy". This inefficiency is particularly noticeable in short syscalls such as lseek() and getpid(), where the syscall function itself comprises a small percentage of the total instructions executed. To address this, we introduce the concept of xcall, a fast svc exception handling path that only considers necessary features such as security, context saving, and recovery. This approach can be seen as a high-speed syscall processing mechanism bridging the gap between vdso and traditional syscalls. We've implemented a per-task bitmap to enable or disable xcall for specific syscalls. Users can enable a syscall with the following command: echo $syscall_nr > /proc/$PID/xcall To disable a syscall, use: echo \!$syscall_nr > /proc/$PID/xcall The current status of enabled syscalls can be viewed by: cat /proc/$PID/xcall Finally, we've added a kernel boot parameter to control the xcall feature. To enable xcall, include "xcall" in the kernel boot command. By default, xcall is disabled. This patch introduces basic framework and have not modified to syscall path only copy to xcall. Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> --- arch/Kconfig | 19 +++++ arch/arm64/Kconfig | 1 + arch/arm64/configs/openeuler_defconfig | 2 + arch/arm64/include/asm/cpucaps.h | 1 + arch/arm64/kernel/asm-offsets.c | 3 + arch/arm64/kernel/cpufeature.c | 28 +++++++ arch/arm64/kernel/entry.S | 59 +++++++++++++ fs/proc/base.c | 112 +++++++++++++++++++++++++ include/linux/sched.h | 4 + kernel/fork.c | 20 +++++ 10 files changed, 249 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index 0fc9c6d591b8..48ef789de86b 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1186,4 +1186,23 @@ source "kernel/gcov/Kconfig" source "scripts/gcc-plugins/Kconfig" +config ARCH_SUPPORTS_FAST_SYSCALL + bool + +config FAST_SYSCALL + bool "Fast Syscall support" + depends on ARCH_SUPPORTS_FAST_SYSCALL + default n + help + This enable support Fast syscall feature. + The svc exception handling process, which includes auxiliary + functions for debug/trace and core functions like + KPTI, has been identified as overly "lengthy". + This inefficiency is particularly noticeable in short syscalls + such as lseek() and getpid(), where the syscall function itself + comprises a small percentage of the total instructions executed. + To address this, we introduce the concept of fast syscall, a fast svc + exception handling path that only considers necessary features + such as security, context saving, and recovery. + endmenu diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 93ced97f8c6c..d19304745fee 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -219,6 +219,7 @@ config ARM64 select THREAD_INFO_IN_TASK select HAVE_LIVEPATCH_WO_FTRACE select THP_NUMA_CONTROL if ARM64_64K_PAGES && NUMA_BALANCING && TRANSPARENT_HUGEPAGE + select ARCH_SUPPORTS_FAST_SYSCALL if !ARM64_MTE && !KASAN_HW_TAGS help ARM 64-bit (AArch64) Linux support. 
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 5449de73fbbc..8f649fdedfea 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -921,6 +921,8 @@ CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y # end of GCOV-based kernel profiling CONFIG_HAVE_GCC_PLUGINS=y +CONFIG_ARCH_SUPPORTS_FAST_SYSCALL=y +# CONFIG_FAST_SYSCALL is not set # end of General architecture-dependent options CONFIG_RT_MUTEXES=y diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index ce9fbf260a3c..5c4e78ffa264 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -81,6 +81,7 @@ #define ARM64_HAS_PBHA_STAGE2 73 #define ARM64_SME 74 #define ARM64_SME_FA64 75 +#define ARM64_HAS_XCALL 76 #define ARM64_NCAPS 80 diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index c247e11130db..7c6ad4b1667b 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -26,6 +26,9 @@ int main(void) { +#ifdef CONFIG_FAST_SYSCALL + DEFINE(TSK_XCALL, offsetof(struct task_struct, xcall_enable)); +#endif DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm)); BLANK(); DEFINE(TSK_TI_CPU, offsetof(struct task_struct, thread_info.cpu)); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index dee049d27c74..f5ef453593ff 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2155,6 +2155,26 @@ static bool can_clearpage_use_stnp(const struct arm64_cpu_capabilities *entry, return use_clearpage_stnp && has_mor_nontemporal(entry); } +#ifdef CONFIG_FAST_SYSCALL +static bool is_xcall_support; +static int __init xcall_setup(char *str) +{ + is_xcall_support = true; + return 1; +} +__setup("xcall", xcall_setup); + +bool fast_syscall_enabled(void) +{ + return is_xcall_support; +} + +static bool has_xcall_support(const struct arm64_cpu_capabilities *entry, int __unused) +{ + return is_xcall_support; +} +#endif + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -2701,6 +2721,14 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .cpu_enable = fa64_kernel_enable, }, #endif /* CONFIG_ARM64_SME */ +#ifdef CONFIG_FAST_SYSCALL + { + .desc = "Xcall Support", + .capability = ARM64_HAS_XCALL, + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .matches = has_xcall_support, + }, +#endif {}, }; diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 1290f36c8371..e49b5569bb97 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -675,11 +675,70 @@ SYM_CODE_START_LOCAL_NOALIGN(el1_irq) kernel_exit 1 SYM_CODE_END(el1_irq) +#ifdef CONFIG_FAST_SYSCALL + .macro check_esr_el1_ec_svc64 + /* Only support SVC64 for now */ + mrs x20, esr_el1 + lsr w20, w20, #ESR_ELx_EC_SHIFT + cmp x20, #ESR_ELx_EC_SVC64 + .endm + + .macro check_syscall_nr + cmp x8, __NR_syscalls + .endm + + .macro check_xcall_enable + /* x21 = task_struct->xcall_enable */ + ldr_this_cpu x20, __entry_task, x21 + ldr x21, [x20, #TSK_XCALL] + /* x20 = sc_no / 8 */ + lsr x20, x8, 3 + ldr x21, [x21, x20] + /* x8 = sc_no % 8 */ + and x8, x8, 7 + mov x20, 1 + lsl x20, x20, x8 + and x21, x21, x20 + cmp x21, 0 + .endm + + .macro check_xcall_pre_kernel_entry + stp x20, x21, [sp, #0] + /* is ESR_ELx_EC_SVC64 */ + check_esr_el1_ec_svc64 + bne .Lskip_xcall\@ + /* x8 >= __NR_syscalls */ + check_syscall_nr + bhs .Lskip_xcall\@ + str x8, [sp, #16] + /* is xcall enabled */ 
+ check_xcall_enable + ldr x8, [sp, #16] + beq .Lskip_xcall\@ + ldp x20, x21, [sp, #0] + /* do xcall */ + kernel_entry 0, 64 + mov x0, sp + bl el0t_64_sync_handler + b ret_to_user +.Lskip_xcall\@: + ldp x20, x21, [sp, #0] + .endm +#endif + /* * EL0 mode handlers. */ .align 6 SYM_CODE_START_LOCAL_NOALIGN(el0_sync) +#ifdef CONFIG_FAST_SYSCALL + /* Only support el0 aarch64 sync exception */ + alternative_if_not ARM64_HAS_XCALL + b .Lret_to_kernel_entry + alternative_else_nop_endif + check_xcall_pre_kernel_entry + .Lret_to_kernel_entry: +#endif kernel_entry 0 mov x0, sp bl el0_sync_handler diff --git a/fs/proc/base.c b/fs/proc/base.c index 4e0054a37c4c..3206960c4bd7 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3589,6 +3589,115 @@ static const struct file_operations proc_pid_sg_level_operations = { }; #endif +#ifdef CONFIG_FAST_SYSCALL +bool fast_syscall_enabled(void); + +static int xcall_show(struct seq_file *m, void *v) +{ + struct inode *inode = m->private; + struct task_struct *p; + unsigned int rs, re; + + if (!fast_syscall_enabled()) + return -EACCES; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + if (!p->xcall_enable) + goto out; + + seq_printf(m, "Enabled Total[%d/%d]:", bitmap_weight(p->xcall_enable, __NR_syscalls), + __NR_syscalls); + + for (rs = 0, bitmap_next_set_region(p->xcall_enable, &rs, &re, __NR_syscalls); + rs < re; rs = re + 1, + bitmap_next_set_region(p->xcall_enable, &rs, &re, __NR_syscalls)) { + rs == (re - 1) ? seq_printf(m, "%d,", rs) : + seq_printf(m, "%d-%d,", rs, re - 1); + } + seq_puts(m, "\n"); +out: + put_task_struct(p); + + return 0; +} + +static int xcall_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, xcall_show, inode); +} + +static int xcall_enable_one(struct task_struct *p, unsigned int sc_no) +{ + bitmap_set(p->xcall_enable, sc_no, 1); + return 0; +} + +static int xcall_disable_one(struct task_struct *p, unsigned int sc_no) +{ + bitmap_clear(p->xcall_enable, sc_no, 1); + return 0; +} + +static ssize_t xcall_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *p; + char buffer[TASK_COMM_LEN]; + const size_t maxlen = sizeof(buffer) - 1; + unsigned int sc_no = __NR_syscalls; + int ret = 0; + int is_clear = 0; + + if (!fast_syscall_enabled()) + return -EACCES; + + memset(buffer, 0, sizeof(buffer)); + if (!count || copy_from_user(buffer, buf, count > maxlen ? maxlen : count)) + return -EFAULT; + + p = get_proc_task(inode); + if (!p || !p->xcall_enable) + return -ESRCH; + + if (buffer[0] == '!') + is_clear = 1; + + if (kstrtouint(buffer + is_clear, 10, &sc_no)) { + ret = -EINVAL; + goto out; + } + + if (sc_no >= __NR_syscalls) { + ret = -EINVAL; + goto out; + } + + if (!is_clear && !test_bit(sc_no, p->xcall_enable)) + ret = xcall_enable_one(p, sc_no); + else if (is_clear && test_bit(sc_no, p->xcall_enable)) + ret = xcall_disable_one(p, sc_no); + else + ret = -EINVAL; + +out: + put_task_struct(p); + + return ret ? 
ret : count; +} + +static const struct file_operations proc_pid_xcall_operations = { + .open = xcall_open, + .read = seq_read, + .write = xcall_write, + .llseek = seq_lseek, + .release = single_release, +}; +#endif + /* * Thread groups */ @@ -3615,6 +3724,9 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_QOS_SCHED_SMART_GRID REG("smart_grid_level", 0644, proc_pid_sg_level_operations), #endif +#ifdef CONFIG_FAST_SYSCALL + REG("xcall", 0644, proc_pid_xcall_operations), +#endif #ifdef CONFIG_SCHED_AUTOGROUP REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations), #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index e3170b7f81fa..18361e35a377 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1477,7 +1477,11 @@ struct task_struct { #else KABI_RESERVE(14) #endif +#if defined(CONFIG_FAST_SYSCALL) + KABI_USE(15, unsigned long *xcall_enable) +#else KABI_RESERVE(15) +#endif KABI_RESERVE(16) KABI_AUX_PTR(task_struct) diff --git a/kernel/fork.c b/kernel/fork.c index 9b1ea79deaa5..bd7afeb364ab 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -479,6 +479,12 @@ void free_task(struct task_struct *tsk) #endif if (task_relationship_used()) sched_relationship_free(tsk); + +#ifdef CONFIG_FAST_SYSCALL + if (tsk->xcall_enable) + bitmap_free(tsk->xcall_enable); +#endif + free_task_struct(tsk); } EXPORT_SYMBOL(free_task); @@ -1007,6 +1013,11 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #ifdef CONFIG_MEMCG tsk->active_memcg = NULL; #endif + +#ifdef CONFIG_FAST_SYSCALL + tsk->xcall_enable = NULL; +#endif + return tsk; free_stack: @@ -2085,6 +2096,15 @@ static __latent_entropy struct task_struct *copy_process( rt_mutex_init_task(p); +#ifdef CONFIG_FAST_SYSCALL + p->xcall_enable = bitmap_zalloc(__NR_syscalls, GFP_KERNEL); + if (!p->xcall_enable) + goto bad_fork_free; + + if (current->xcall_enable) + bitmap_copy(p->xcall_enable, current->xcall_enable, __NR_syscalls); +#endif + #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY retval = sched_prefer_cpus_fork(p, current->prefer_cpus); if (retval) -- 2.34.1

From: Yipeng Zou <zouyipeng@huawei.com> hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IB6JLE -------------------------------- This patch is designed to optimize the performance of the SVC exception handler by simplifying its operation, which can lead to faster execution times. However, this optimization comes with the trade-off of reduced functionality, particularly in areas related to security and maintenance. When a task is executed with xcall, certain features that are crucial for robust system operation may not be available, which could impact the system's ability to perform essential tasks. Here's a breakdown of the potential impacts: 1. Memory Tagging Extension (MTE) 2. Process Trace (PTRACE) 3. System Call Trace (STRACE) 4. GNU Debugger (GDB) 5. Software single-stepping 6. Secure State Buffer Descriptor (SSBD) 7. Shadow Call Stack 8. Software Translation Table Buffer Zero Protection (SW_TTBR0_PAN) 9. Unmap Kernel at Exception Level 0 (UNMAP_KERNEL_AT_EL0) 10.ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD 11. GCC Plugin Stack Leak Detection (GCC_PLUGIN_STACKLEAK) 12. SYSCALL Trace Point In conclusion, while the patch is intended to enhance the performance of the SVC exception handler, it does so by sacrificing several important features that contribute to security, debugging, and overall system stability. It is imperative for developers and system administrators to be cognizant of these trade-offs and to plan for the potential effects on their applications and operational workflows. Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: l00862714 <liaochen4@huawei.com> --- arch/Kconfig | 24 +++++++++++++ arch/arm64/include/asm/exception.h | 3 ++ arch/arm64/kernel/entry-common.c | 20 +++++++++++ arch/arm64/kernel/entry.S | 51 ++++++++++++++++++++++++-- arch/arm64/kernel/syscall.c | 57 ++++++++++++++++++++++++++++++ 5 files changed, 152 insertions(+), 3 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 48ef789de86b..b80a22ed5545 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1205,4 +1205,28 @@ config FAST_SYSCALL exception handling path that only considers necessary features such as security, context saving, and recovery. +config DEBUG_FEATURE_BYPASS + bool "Bypass debug feature in fast syscall" + depends on FAST_SYSCALL + default y + help + This to bypass debug feature in fast syscall. + The svc exception handling process, which includes auxiliary + functions for debug/trace and core functions like + KPTI, has been identified as overly "lengthy". + In fast syscall we only considers necessary features. + Disable this config to keep debug feature in fast syscall. + +config SECURITY_FEATURE_BYPASS + bool "Bypass security feature in fast syscall" + depends on FAST_SYSCALL + default y + help + This to bypass security feature in fast syscall. + The svc exception handling process, which includes auxiliary + functions for debug/trace and core functions like + KPTI, has been identified as overly "lengthy". + In fast syscall we only considers necessary features. + Disable this config to keep security feature in fast syscall. 
+ endmenu diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h index d38d526d084e..4b7994bd2b94 100644 --- a/arch/arm64/include/asm/exception.h +++ b/arch/arm64/include/asm/exception.h @@ -48,6 +48,9 @@ void do_el0_sys(unsigned long esr, struct pt_regs *regs); void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs); void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr); void do_el0_cp15(unsigned long esr, struct pt_regs *regs); +#ifdef CONFIG_FAST_SYSCALL +void do_el0_xcall(struct pt_regs *regs); +#endif void do_el0_svc(struct pt_regs *regs); void do_el0_svc_compat(struct pt_regs *regs); void do_el0_fpac(struct pt_regs *regs, unsigned long esr); diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index 02cd5d57edb6..e567484fc662 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -388,6 +388,26 @@ static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr) do_el0_fpac(regs, esr); } +#ifdef CONFIG_FAST_SYSCALL +asmlinkage void noinstr fast_enter_from_user_mode(void) +{ +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + lockdep_hardirqs_off(CALLER_ADDR0); + CT_WARN_ON(ct_state() != CONTEXT_USER); +#endif + user_exit_irqoff(); +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + trace_hardirqs_off_finish(); +#endif +} + +asmlinkage void noinstr el0_xcall_handler(struct pt_regs *regs) +{ + fast_enter_from_user_mode(); + do_el0_xcall(regs); +} +#endif + asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs) { unsigned long esr = read_sysreg(esr_el1); diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index e49b5569bb97..e99a60fce105 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -182,7 +182,7 @@ alternative_else_nop_endif #endif .endm - .macro kernel_entry, el, regsize = 64 + .macro kernel_entry, el, regsize = 64, fast_mode = std .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 .endif @@ -212,12 +212,19 @@ alternative_else_nop_endif * Ensure MDSCR_EL1.SS is clear, since we can unmask debug exceptions * when scheduling. */ + .if \fast_mode == std ldr x19, [tsk, #TSK_TI_FLAGS] disable_step_tsk x19, x20 + .endif /* Check for asynchronous tag check faults in user space */ + .if \fast_mode == std check_mte_async_tcf x22, x23 + .endif + + .if \fast_mode == std apply_ssbd 1, x22, x23 + .endif ptrauth_keys_install_kernel tsk, x20, x22, x23 @@ -243,9 +250,11 @@ alternative_else_nop_endif add x29, sp, #S_STACKFRAME #ifdef CONFIG_ARM64_SW_TTBR0_PAN +.if \fast_mode == std alternative_if_not ARM64_HAS_PAN bl __swpan_entry_el\el alternative_else_nop_endif +.endif #endif stp x22, x23, [sp, #S_PC] @@ -268,9 +277,11 @@ alternative_else_nop_endif /* Re-enable tag checking (TCO set on exception entry) */ #ifdef CONFIG_ARM64_MTE +.if \fast_mode == std alternative_if ARM64_MTE SET_PSTATE_TCO(0) alternative_else_nop_endif +.endif #endif /* @@ -283,7 +294,7 @@ alternative_else_nop_endif */ .endm - .macro kernel_exit, el + .macro kernel_exit, el, fast_mode = std .if \el != 0 disable_daif .endif @@ -303,14 +314,18 @@ alternative_else_nop_endif ldp x21, x22, [sp, #S_PC] // load ELR, SPSR #ifdef CONFIG_ARM64_SW_TTBR0_PAN +.if \fast_mode == std alternative_if_not ARM64_HAS_PAN bl __swpan_exit_el\el alternative_else_nop_endif +.endif #endif .if \el == 0 ldr x23, [sp, #S_SP] // load return stack pointer msr sp_el0, x23 + + .if \fast_mode == std tst x22, #PSR_MODE32_BIT // native task? 
b.eq 3f @@ -325,13 +340,17 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: + .endif + scs_save tsk, x0 /* No kernel C function calls after this as user keys are set. */ ptrauth_keys_install_user tsk, x0, x1, x2 + .if \fast_mode == std apply_ssbd 0, x0, x1 .endif + .endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22 @@ -352,6 +371,11 @@ alternative_else_nop_endif ldp x28, x29, [sp, #16 * 14] .if \el == 0 + .if \fast_mode != std + ldr lr, [sp, #S_LR] + add sp, sp, #S_FRAME_SIZE // restore sp + eret + .endif alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0 ldr lr, [sp, #S_LR] add sp, sp, #S_FRAME_SIZE // restore sp @@ -717,10 +741,31 @@ SYM_CODE_END(el1_irq) beq .Lskip_xcall\@ ldp x20, x21, [sp, #0] /* do xcall */ +#ifdef CONFIG_SECURITY_FEATURE_BYPASS + kernel_entry 0, 64, xcall +#else kernel_entry 0, 64 +#endif mov x0, sp - bl el0t_64_sync_handler + bl el0_xcall_handler +#ifdef CONFIG_SECURITY_FEATURE_BYPASS + disable_daif + gic_prio_kentry_setup tmp=x3 + ldr x19, [tsk, #TSK_TI_FLAGS] + and x2, x19, #_TIF_WORK_MASK + cbnz x2, fast_work_pending\@ +fast_finish_ret_to_user\@: + user_enter_irqoff + kernel_exit 0 xcall +fast_work_pending\@: + mov x0, sp // 'regs' + mov x1, x19 + bl do_notify_resume + ldr x19, [tsk, #TSK_TI_FLAGS] // re-check for single-step + b fast_finish_ret_to_user\@ +#else b ret_to_user +#endif .Lskip_xcall\@: ldp x20, x21, [sp, #0] .endm diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 9bd304568d90..72a84fe4a2a7 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -106,6 +106,46 @@ static void cortex_a76_erratum_1463225_svc_handler(void) static void cortex_a76_erratum_1463225_svc_handler(void) { } #endif /* CONFIG_ARM64_ERRATUM_1463225 */ +#ifdef CONFIG_FAST_SYSCALL +static void el0_xcall_common(struct pt_regs *regs, int scno, int sc_nr, + const syscall_fn_t syscall_table[]) +{ + unsigned long flags = read_thread_flags(); + + regs->orig_x0 = regs->regs[0]; + regs->syscallno = scno; + +#ifndef CONFIG_SECURITY_FEATURE_BYPASS + cortex_a76_erratum_1463225_svc_handler(); +#endif + local_daif_restore(DAIF_PROCCTX); + + if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) { + syscall_set_return_value(current, regs, -ERESTARTNOINTR, 0); + return; + } + + if (has_syscall_work(flags)) { + if (scno == NO_SYSCALL) + syscall_set_return_value(current, regs, -ENOSYS, 0); + scno = syscall_trace_enter(regs); + if (scno == NO_SYSCALL) + goto trace_exit; + } + + invoke_syscall(regs, scno, sc_nr, syscall_table); + + if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) { + flags = read_thread_flags(); + if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP)) + return; + } + +trace_exit: + syscall_trace_exit(regs); +} +#endif + static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr, const syscall_fn_t syscall_table[]) { @@ -237,6 +277,23 @@ static inline void delouse_pt_regs(struct pt_regs *regs) } #endif +#ifdef CONFIG_FAST_SYSCALL +void do_el0_xcall(struct pt_regs *regs) +{ + const syscall_fn_t *t = sys_call_table; + +#ifdef CONFIG_ARM64_ILP32 + if (is_ilp32_compat_task()) { + t = ilp32_sys_call_table; + delouse_pt_regs(regs); + } +#endif + + fp_user_discard(); + el0_xcall_common(regs, regs->regs[8], __NR_syscalls, t); +} +#endif + void do_el0_svc(struct pt_regs *regs) { const syscall_fn_t *t = sys_call_table; -- 2.34.1

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IB6JLE -------------------------------- Introduce xint software solution for kernel, it provides a lightweight interrupt processing framework for latency-sensitive interrupts, and enabled dynamically for each irq by /proc/irq/<irq>/xint interface. The main implementation schemes are as follows: 1. For a small number of latency-sensitive interrupts, it could be configured as xint state, and process irq by xint framework instead of the kernel general interrupt framework, so improve performance by remove unnecessary processes. It is not recommended to configure too many interrupts as xint in the system, as this will affect system stability to some extent. 2. For each SGI/PPI/SPI interrupts whoes irq numbers are consecutive and limited, use a bitmap to check whether a hwirq is xint. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: l00862714 <liaochen4@huawei.com> --- arch/Kconfig | 25 +++++- arch/arm64/Kconfig | 1 + arch/arm64/configs/openeuler_defconfig | 2 + arch/arm64/include/asm/cpucaps.h | 1 + arch/arm64/kernel/cpufeature.c | 26 ++++++ arch/arm64/kernel/entry-common.c | 4 +- arch/arm64/kernel/entry.S | 80 +++++++++++++++++ drivers/irqchip/irq-gic-v3.c | 117 +++++++++++++++++++++++++ include/linux/hardirq.h | 5 ++ include/linux/irqchip/arm-gic-v3.h | 13 +++ kernel/irq/irqdesc.c | 19 ++++ kernel/irq/proc.c | 10 +++ kernel/softirq.c | 73 +++++++++++++++ 13 files changed, 373 insertions(+), 3 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index b80a22ed5545..6dc501a4afb1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1205,9 +1205,30 @@ config FAST_SYSCALL exception handling path that only considers necessary features such as security, context saving, and recovery. +config ARCH_SUPPORTS_FAST_IRQ + bool + +config FAST_IRQ + bool "Fast irq support" + depends on ARCH_SUPPORTS_FAST_IRQ + default n + help + The irq handling process, which includes auxiliary + functions for debug/trace and core functions like + KPTI, interrupt time record, interrupt processing as + a random number source, interrupt affinity + modification and interrupt processing race, as well as + spurious and unhandled interrupt debugging, has been + identified as overly "lengthy". + To address this, we introduce the concept of fast irq, + a fast interrupt handling path that only considers + necessary features such as security, context saving + and recovery, which adds an lightweight interrupt processing + framework for latency-sensitive interrupts. + config DEBUG_FEATURE_BYPASS bool "Bypass debug feature in fast syscall" - depends on FAST_SYSCALL + depends on FAST_SYSCALL || FAST_IRQ default y help This to bypass debug feature in fast syscall. @@ -1219,7 +1240,7 @@ config DEBUG_FEATURE_BYPASS config SECURITY_FEATURE_BYPASS bool "Bypass security feature in fast syscall" - depends on FAST_SYSCALL + depends on FAST_SYSCALL || FAST_IRQ default y help This to bypass security feature in fast syscall. 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index d19304745fee..ad38ba5be590 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -220,6 +220,7 @@ config ARM64 select HAVE_LIVEPATCH_WO_FTRACE select THP_NUMA_CONTROL if ARM64_64K_PAGES && NUMA_BALANCING && TRANSPARENT_HUGEPAGE select ARCH_SUPPORTS_FAST_SYSCALL if !ARM64_MTE && !KASAN_HW_TAGS + select ARCH_SUPPORTS_FAST_IRQ if ARM_GIC_V3 && !ARM64_MTE && !KASAN_HW_TAGS help ARM 64-bit (AArch64) Linux support. diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 8f649fdedfea..576da6c47caf 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -923,6 +923,8 @@ CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y CONFIG_HAVE_GCC_PLUGINS=y CONFIG_ARCH_SUPPORTS_FAST_SYSCALL=y # CONFIG_FAST_SYSCALL is not set +CONFIG_ARCH_SUPPORTS_FAST_IRQ=y +# CONFIG_FAST_IRQ is not set # end of General architecture-dependent options CONFIG_RT_MUTEXES=y diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index 5c4e78ffa264..e2a2b3e40c94 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -82,6 +82,7 @@ #define ARM64_SME 74 #define ARM64_SME_FA64 75 #define ARM64_HAS_XCALL 76 +#define ARM64_HAS_XINT 77 #define ARM64_NCAPS 80 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index f5ef453593ff..9b4a315e96bc 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2175,6 +2175,24 @@ static bool has_xcall_support(const struct arm64_cpu_capabilities *entry, int __ } #endif +#ifdef CONFIG_FAST_IRQ +bool is_xint_support; +static int __init xint_setup(char *str) +{ + if (!cpus_have_cap(ARM64_HAS_SYSREG_GIC_CPUIF)) + return 1; + + is_xint_support = true; + return 1; +} +__setup("xint", xint_setup); + +static bool has_xint_support(const struct arm64_cpu_capabilities *entry, int __unused) +{ + return is_xint_support; +} +#endif + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -2728,6 +2746,14 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .type = ARM64_CPUCAP_SYSTEM_FEATURE, .matches = has_xcall_support, }, +#endif +#ifdef CONFIG_FAST_IRQ + { + .desc = "Xint Support", + .capability = ARM64_HAS_XINT, + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .matches = has_xint_support, + }, #endif {}, }; diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index e567484fc662..3e59cbdedc4c 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -388,7 +388,7 @@ static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr) do_el0_fpac(regs, esr); } -#ifdef CONFIG_FAST_SYSCALL +#if defined(CONFIG_FAST_SYSCALL) || defined(CONFIG_FAST_IRQ) asmlinkage void noinstr fast_enter_from_user_mode(void) { #ifndef CONFIG_DEBUG_FEATURE_BYPASS @@ -400,7 +400,9 @@ asmlinkage void noinstr fast_enter_from_user_mode(void) trace_hardirqs_off_finish(); #endif } +#endif +#ifdef CONFIG_FAST_SYSCALL asmlinkage void noinstr el0_xcall_handler(struct pt_regs *regs) { fast_enter_from_user_mode(); diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index e99a60fce105..664a0d3059ab 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -809,10 +809,90 @@ SYM_CODE_START_LOCAL_NOALIGN(el0_error_compat) kernel_entry 0, 32 b el0_error_naked SYM_CODE_END(el0_error_compat) +#endif + +#ifdef CONFIG_FAST_IRQ +.macro el0_xint_handler, 
handler:req +#if defined(CONFIG_CONTEXT_TRACKING) || defined(CONFIG_TRACE_IRQFLAGS) + bl fast_enter_from_user_mode +#endif + enable_da_f +#ifndef CONFIG_SECURITY_FEATURE_BYPASS + tbz x22, #55, 1f + bl do_el0_irq_bp_hardening +1: +#endif + irq_handler \handler +.endm + +.macro check_xint_pre_kernel_entry + stp x0, x1, [sp, #0] + stp x2, x3, [sp, #16] + + ldr x0, =irqnr_xint_map + /* get hpp irqnr */ + mrs_s x1, SYS_ICC_HPPIR1_EL1 + + /* xint hwirq can not exceed 1020 */ + cmp x1, 1020 + b.ge .Lskip_xint\@ + + /* x2 = irqnr % 8 */ + and x2, x1, #7 + /* x3 = irqnr / 8 */ + lsr x3, x1, #3 + /* x1 is the byte of irqnr in irqnr_xint_map */ + ldr x1, [x0, x3] + + /* Get the check mask */ + mov x3, #1 + /* x3 = 1 << (irqnr % 8) */ + lsl x3, x3, x2 + + /* x1 = x1 & x3 */ + ands x1, x1, x3 + b.eq .Lskip_xint\@ + + ldp x0, x1, [sp, #0] + ldp x2, x3, [sp, #16] +#ifdef CONFIG_SECURITY_FEATURE_BYPASS + kernel_entry 0, 64, xint + el0_xint_handler handle_arch_irq + disable_daif + gic_prio_kentry_setup tmp=x3 + ldr x19, [tsk, #TSK_TI_FLAGS] + and x2, x19, #_TIF_WORK_MASK + cbnz x2, xint_fast_work_pending\@ +xint_fast_finish_ret_to_user\@: + user_enter_irqoff + kernel_exit 0 xint +xint_fast_work_pending\@: + mov x0, sp // 'regs' + mov x1, x19 + bl do_notify_resume + b xint_fast_finish_ret_to_user\@ +#else + kernel_entry 0, 64 + el0_xint_handler handle_arch_irq + b ret_to_user +#endif + +.Lskip_xint\@: + ldp x0, x1, [sp, #0] + ldp x2, x3, [sp, #16] +.endm #endif .align 6 SYM_CODE_START_LOCAL_NOALIGN(el0_irq) +#ifdef CONFIG_FAST_IRQ + /* Only support el0 aarch64 irq */ + alternative_if_not ARM64_HAS_XINT + b .Lskip_check_xint + alternative_else_nop_endif + check_xint_pre_kernel_entry +.Lskip_check_xint: +#endif kernel_entry 0 el0_irq_naked: el0_interrupt_handler handle_arch_irq diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 87af452d82dc..e2aef650a884 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -720,6 +720,123 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs } } +#ifdef CONFIG_FAST_IRQ +DECLARE_BITMAP(irqnr_xint_map, 1024); + +static bool can_set_xint(unsigned int hwirq) +{ + if (__get_intid_range(hwirq) == SGI_RANGE || + __get_intid_range(hwirq) == SPI_RANGE) + return true; + + return false; +} + +static bool xint_transform(int irqno, enum xint_op op) +{ + struct irq_data *data = irq_get_irq_data(irqno); + int hwirq; + + while (data->parent_data) + data = data->parent_data; + + hwirq = data->hwirq; + + if (!can_set_xint(hwirq)) + return false; + + switch (op) { + case IRQ_TO_XINT: + set_bit(hwirq, irqnr_xint_map); + return true; + case XINT_TO_IRQ: + clear_bit(hwirq, irqnr_xint_map); + return false; + case XINT_SET_CHECK: + return test_bit(hwirq, irqnr_xint_map); + case XINT_RANGE_CHECK: + return true; + default: + return false; + } +} + +static ssize_t xint_proc_write(struct file *file, + const char __user *buffer, size_t count, loff_t *pos) +{ + int irq = (int)(long)PDE_DATA(file_inode(file)); + bool xint_state = false; + unsigned long val; + char *buf = NULL; + + if (!xint_transform(irq, XINT_RANGE_CHECK)) + return -EPERM; + + buf = memdup_user_nul(buffer, count); + if (IS_ERR(buf)) + return PTR_ERR(buf); + + if (kstrtoul(buf, 0, &val) || (val != 0 && val != 1)) { + kfree(buf); + return -EINVAL; + } + + xint_state = xint_transform(irq, XINT_SET_CHECK); + if (xint_state == val) { + kfree(buf); + return -EBUSY; + } + + local_irq_disable(); + disable_irq(irq); + + xint_transform(irq, xint_state ? 
XINT_TO_IRQ : IRQ_TO_XINT); + + enable_irq(irq); + local_irq_enable(); + + kfree(buf); + + return count; +} + +static int xint_proc_show(struct seq_file *m, void *v) +{ + seq_printf(m, "%d\n", xint_transform((long)m->private, XINT_SET_CHECK)); + return 0; +} + +static int xint_proc_open(struct inode *inode, struct file *file) +{ + return single_open(file, xint_proc_show, PDE_DATA(inode)); +} + +static const struct proc_ops xint_proc_ops = { + .proc_open = xint_proc_open, + .proc_read = seq_read, + .proc_lseek = seq_lseek, + .proc_release = single_release, + .proc_write = xint_proc_write, +}; + +void register_irqchip_proc(struct irq_desc *desc, void *irqp) +{ + if (!is_xint_support) + return; + + /* create /proc/irq/<irq>/xint */ + proc_create_data("xint", 0644, desc->dir, &xint_proc_ops, irqp); +} + +void unregister_irqchip_proc(struct irq_desc *desc) +{ + if (!is_xint_support) + return; + + remove_proc_entry("xint", desc->dir); +} +#endif /* CONFIG_FAST_IRQ */ + static u32 gic_get_pribits(void) { u32 pribits; diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index 754f67ac4326..ad08a37f3bc0 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -86,6 +86,11 @@ void irq_exit(void); */ void irq_exit_rcu(void); +#ifdef CONFIG_FAST_IRQ +void xint_enter(void); +void xint_exit(void); +#endif + #ifndef arch_nmi_enter #define arch_nmi_enter() do { } while (0) #define arch_nmi_exit() do { } while (0) diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 88b02e3b81da..d94b013a091c 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -762,6 +762,19 @@ static inline enum gic_intid_range __get_intid_range(irq_hw_number_t hwirq) } } +#ifdef CONFIG_FAST_IRQ +extern bool is_xint_support; + +enum xint_op { + XINT_TO_IRQ, + IRQ_TO_XINT, + XINT_SET_CHECK, + XINT_RANGE_CHECK, +}; + +void register_irqchip_proc(struct irq_desc *desc, void *irqp); +void unregister_irqchip_proc(struct irq_desc *desc); +#endif #endif #endif diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 6c009a033c73..5dc976d32c74 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -658,6 +658,10 @@ int generic_handle_irq(unsigned int irq) EXPORT_SYMBOL_GPL(generic_handle_irq); #ifdef CONFIG_HANDLE_DOMAIN_IRQ +#ifdef CONFIG_FAST_IRQ +extern DECLARE_BITMAP(irqnr_xint_map, 1024); +#endif + /** * __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain * @domain: The domain where to perform the lookup @@ -673,8 +677,16 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, struct pt_regs *old_regs = set_irq_regs(regs); unsigned int irq = hwirq; int ret = 0; +#ifdef CONFIG_FAST_IRQ + int is_xint = test_bit(hwirq, irqnr_xint_map); + if (is_xint) + xint_enter(); + else + irq_enter(); +#else irq_enter(); +#endif #ifdef CONFIG_IRQ_DOMAIN if (lookup) @@ -692,7 +704,14 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, generic_handle_irq(irq); } +#ifdef CONFIG_FAST_IRQ + if (is_xint) + xint_exit(); + else + irq_exit(); +#else irq_exit(); +#endif set_irq_regs(old_regs); return ret; } diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index 0df62a3a1f37..64805a37c576 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -13,6 +13,10 @@ #include <linux/kernel_stat.h> #include <linux/mutex.h> +#ifdef CONFIG_FAST_IRQ +#include <linux/irqchip/arm-gic-v3.h> +#endif + #include "internals.h" /* @@ -331,6 +335,9 @@ void register_handler_proc(unsigned int irq, struct 
irqaction *action) action->dir = proc_mkdir(name, desc->dir); } +void __weak register_irqchip_proc(struct irq_desc *desc, void *irqp) { } +void __weak unregister_irqchip_proc(struct irq_desc *desc) { } + #undef MAX_NAMELEN #define MAX_NAMELEN 10 @@ -385,6 +392,7 @@ void register_irq_proc(unsigned int irq, struct irq_desc *desc) #endif proc_create_single_data("spurious", 0444, desc->dir, irq_spurious_proc_show, (void *)(long)irq); + register_irqchip_proc(desc, irqp); out_unlock: mutex_unlock(®ister_lock); @@ -408,6 +416,8 @@ void unregister_irq_proc(unsigned int irq, struct irq_desc *desc) #endif remove_proc_entry("spurious", desc->dir); + unregister_irqchip_proc(desc); + sprintf(name, "%u", irq); remove_proc_entry(name, root_irq_dir); } diff --git a/kernel/softirq.c b/kernel/softirq.c index 4196b9f84690..8f12a820a574 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -345,6 +345,42 @@ asmlinkage __visible void do_softirq(void) local_irq_restore(flags); } +#ifdef CONFIG_FAST_IRQ +/** + * xint_enter_rcu - Copy from irq_enter_rcu + */ +void xint_enter_rcu(void) +{ + if (tick_nohz_full_cpu(smp_processor_id()) || + (is_idle_task(current) && !in_interrupt())) { + /* + * Prevent raise_softirq from needlessly waking up ksoftirqd + * here, as softirq will be serviced on return from interrupt. + */ + local_bh_disable(); + tick_irq_enter(); + _local_bh_enable(); + } + +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + account_irq_enter_time(current); +#endif + preempt_count_add(HARDIRQ_OFFSET); +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + lockdep_hardirq_enter(); +#endif +} + +/** + * irq_enter - Copy from irq_enter + */ +void xint_enter(void) +{ + rcu_irq_enter(); + xint_enter_rcu(); +} +#endif + /** * irq_enter_rcu - Enter an interrupt context with RCU watching */ @@ -411,6 +447,43 @@ static inline void tick_irq_exit(void) #endif } +#ifdef CONFIG_FAST_IRQ +static inline void __xint_exit_rcu(void) +{ +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED + local_irq_disable(); +#else +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + lockdep_assert_irqs_disabled(); +#endif +#endif + +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + account_irq_exit_time(current); +#endif + preempt_count_sub(HARDIRQ_OFFSET); + if (!in_interrupt() && local_softirq_pending()) + invoke_softirq(); + + tick_irq_exit(); +} + +/** + * xint_exit - Copy from irq_exit + * + * Also processes softirqs if needed and possible. + */ +void xint_exit(void) +{ + __xint_exit_rcu(); + rcu_irq_exit(); + /* must be last! */ +#ifndef CONFIG_DEBUG_FEATURE_BYPASS + lockdep_hardirq_exit(); +#endif +} +#endif + static inline void __irq_exit_rcu(void) { #ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED -- 2.34.1
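As an illustration (not part of this patch), a small userspace helper that switches one interrupt between the regular path and xint via the new /proc/irq/<irq>/xint file; it assumes CONFIG_FAST_IRQ=y and a kernel booted with "xint", and the IRQ number passed in is just an example:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int set_xint(int irq, int enable)
{
	char path[64];
	char val = enable ? '1' : '0';
	int fd, ret;

	snprintf(path, sizeof(path), "/proc/irq/%d/xint", irq);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* The write handler only accepts 0 or 1, returns -EPERM for hwirqs
	 * outside the SGI/SPI ranges and -EBUSY if the state is unchanged. */
	ret = (write(fd, &val, 1) == 1) ? 0 : -1;
	close(fd);
	return ret;
}

For example, set_xint(45, 1) marks IRQ 45 as latency sensitive; reading the file back shows whether the interrupt is currently handled as an xint.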

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IB6JLE -------------------------------- Add a debugfs dir for xint, so we can get the xint irq information such as 'which interrupts are currently in xint state' with following cmd: # ls /sys/kernel/debug/irq/xints Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- drivers/irqchip/irq-gic-v3.c | 6 ++++++ kernel/irq/debugfs.c | 33 +++++++++++++++++++++++++++++++++ kernel/irq/internals.h | 18 ++++++++++++++++++ 3 files changed, 57 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index e2aef650a884..7293732b5f72 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -31,6 +31,10 @@ #include "irq-gic-common.h" +#ifdef CONFIG_FAST_IRQ +#include "../../../kernel/irq/internals.h" +#endif + #define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80) #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0) @@ -748,9 +752,11 @@ static bool xint_transform(int irqno, enum xint_op op) switch (op) { case IRQ_TO_XINT: set_bit(hwirq, irqnr_xint_map); + xint_add_debugfs_entry(irqno); return true; case XINT_TO_IRQ: clear_bit(hwirq, irqnr_xint_map); + xint_remove_debugfs_entry(irqno); return false; case XINT_SET_CHECK: return test_bit(hwirq, irqnr_xint_map); diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c index e4cff358b437..a4a7f87eab39 100644 --- a/kernel/irq/debugfs.c +++ b/kernel/irq/debugfs.c @@ -236,6 +236,34 @@ void irq_add_debugfs_entry(unsigned int irq, struct irq_desc *desc) &dfs_irq_ops); } +#ifdef CONFIG_FAST_IRQ +static struct dentry *xint_dir; + +void xint_add_debugfs_entry(unsigned int irq) +{ + char name[10]; + char buf[100]; + + if (!xint_dir) + return; + + sprintf(name, "%d", irq); + sprintf(buf, "../irqs/%d", irq); + debugfs_create_symlink(name, xint_dir, buf); +} + +void xint_remove_debugfs_entry(unsigned int irq) +{ + char name[10]; + + if (!xint_dir) + return; + + sprintf(name, "%d", irq); + debugfs_lookup_and_remove(name, xint_dir); +} +#endif + static int __init irq_debugfs_init(void) { struct dentry *root_dir; @@ -247,6 +275,11 @@ static int __init irq_debugfs_init(void) irq_dir = debugfs_create_dir("irqs", root_dir); +#ifdef CONFIG_FAST_IRQ + if (is_xint_support) + xint_dir = debugfs_create_dir("xints", root_dir); +#endif + irq_lock_sparse(); for_each_active_irq(irq) irq_add_debugfs_entry(irq, irq_to_desc(irq)); diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h index 48d6aa8cdbed..d725d8ef5ce7 100644 --- a/kernel/irq/internals.h +++ b/kernel/irq/internals.h @@ -492,6 +492,14 @@ static inline void irq_remove_debugfs_entry(struct irq_desc *desc) debugfs_remove(desc->debugfs_file); kfree(desc->dev_name); } + +#ifdef CONFIG_FAST_IRQ +extern bool is_xint_support; + +void xint_add_debugfs_entry(unsigned int irq); +void xint_remove_debugfs_entry(unsigned int irq); +#endif + void irq_debugfs_copy_devname(int irq, struct device *dev); # ifdef CONFIG_IRQ_DOMAIN void irq_domain_debugfs_init(struct dentry *root); @@ -507,6 +515,16 @@ static inline void irq_add_debugfs_entry(unsigned int irq, struct irq_desc *d) static inline void irq_remove_debugfs_entry(struct irq_desc *d) { } + +#ifdef CONFIG_FAST_IRQ +static inline void xint_add_debugfs_entry(unsigned int irq) +{ +} +static inline void xint_remove_debugfs_entry(unsigned int irq) +{ +} +#endif + static inline void irq_debugfs_copy_devname(int irq, struct device *dev) { } -- 2.34.1
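With the debugfs directory in place, each xint shows up as a symlink back to its regular irq entry (the code creates the link with target "../irqs/<irq>"); for an illustrative IRQ 45 that has been switched to xint, the listing would look roughly like:

# ls -l /sys/kernel/debug/irq/xints
45 -> ../irqs/45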

From: Liao Chen <liaochen4@huawei.com>

Revert the kpti bypass: drop the early eret that the xcall/xint fast path took in kernel_exit, so the fast path returns to user space through the normal ARM64_UNMAP_KERNEL_AT_EL0 exit sequence.

Signed-off-by: Liao Chen <liaochen4@huawei.com>
--- arch/arm64/kernel/entry.S | 5 ----- 1 file changed, 5 deletions(-) diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 664a0d3059ab..5ed8b8e8e58e 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -371,11 +371,6 @@ alternative_else_nop_endif ldp x28, x29, [sp, #16 * 14] .if \el == 0 - .if \fast_mode != std - ldr lr, [sp, #S_LR] - add sp, sp, #S_FRAME_SIZE // restore sp - eret - .endif alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0 ldr lr, [sp, #S_LR] add sp, sp, #S_FRAME_SIZE // restore sp -- 2.34.1

From: Yipeng Zou <zouyipeng@huawei.com> Since xcall has been supported, we can switch all syscall to xcall by using libxcall or set to it /proc/$PID/xcall. This patch introduces xcall_select to support a special syscall implemented. First you need define one XCALL_DEFINEx function in kernel. Then you can switch one syscall to this function by: echo @$syscall_nr > /proc/$PID/xcall Make sure it has been enabled in xcall before switch to xcall function. Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- arch/arm64/include/asm/exception.h | 3 ++ arch/arm64/include/asm/syscall.h | 14 +++++++ arch/arm64/include/asm/syscall_wrapper.h | 25 ++++++++++++ arch/arm64/kernel/sys.c | 20 +++++++++ arch/arm64/kernel/syscall.c | 14 ++++++- fs/proc/base.c | 52 +++++++++++++++++++++--- include/linux/sched.h | 3 +- include/linux/syscalls.h | 12 ++++++ include/uapi/asm-generic/unistd.h | 10 +++++ kernel/fork.c | 12 ++++++ 10 files changed, 157 insertions(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h index 4b7994bd2b94..b427bd61dd31 100644 --- a/arch/arm64/include/asm/exception.h +++ b/arch/arm64/include/asm/exception.h @@ -55,4 +55,7 @@ void do_el0_svc(struct pt_regs *regs); void do_el0_svc_compat(struct pt_regs *regs); void do_el0_fpac(struct pt_regs *regs, unsigned long esr); void do_el1_fpac(struct pt_regs *regs, unsigned long esr); +#ifdef CONFIG_FAST_SYSCALL +void do_el0_xcall(struct pt_regs *regs); +#endif #endif /* __ASM_EXCEPTION_H */ diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h index 45ac51ba05fc..9b108442c162 100644 --- a/arch/arm64/include/asm/syscall.h +++ b/arch/arm64/include/asm/syscall.h @@ -12,6 +12,9 @@ typedef long (*syscall_fn_t)(const struct pt_regs *regs); extern const syscall_fn_t sys_call_table[]; +#ifdef CONFIG_FAST_SYSCALL +extern const syscall_fn_t x_call_table[]; +#endif #ifdef CONFIG_AARCH32_EL0 extern const syscall_fn_t a32_sys_call_table[]; @@ -99,4 +102,15 @@ static inline int syscall_get_arch(struct task_struct *task) return AUDIT_ARCH_AARCH64; } +#ifdef CONFIG_FAST_SYSCALL +asmlinkage long __arm64_sys_ni_syscall(const struct pt_regs *__unused); + +static inline int syscall_is_xcall_register(unsigned int sc_no) +{ + if (x_call_table[sc_no] == __arm64_sys_ni_syscall) + return 0; + return 1; +} +#endif + #endif /* __ASM_SYSCALL_H */ diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h index d30217c21eff..70150380a01f 100644 --- a/arch/arm64/include/asm/syscall_wrapper.h +++ b/arch/arm64/include/asm/syscall_wrapper.h @@ -48,6 +48,31 @@ #endif /* CONFIG_COMPAT */ +#ifdef CONFIG_FAST_SYSCALL +#define __XCALL_DEFINEx(x, name, ...) 
\ + asmlinkage long __arm64_xcall_sys##name(const struct pt_regs *regs); \ + ALLOW_ERROR_INJECTION(__arm64_xcall_sys##name, ERRNO); \ + static long __se_xcall_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ + static inline long __do_xcall_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ + asmlinkage long __arm64_xcall_sys##name(const struct pt_regs *regs) \ + { \ + return __se_xcall_sys##name(SC_ARM64_REGS_TO_ARGS(x,__VA_ARGS__)); \ + } \ + static long __se_xcall_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ + { \ + long ret = __do_xcall_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ + __MAP(x,__SC_TEST,__VA_ARGS__); \ + __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \ + return ret; \ + } \ + static inline long __do_xcall_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) + +#define XCALL_DEFINE0(sname) \ + asmlinkage long __arm64_xcall_sys_##sname(const struct pt_regs *__unused); \ + ALLOW_ERROR_INJECTION(__arm64_xcall_sys_##sname, ERRNO); \ + asmlinkage long __arm64_xcall_sys_##sname(const struct pt_regs *__unused) +#endif + #define __SYSCALL_DEFINEx(x, name, ...) \ asmlinkage long __arm64_sys##name(const struct pt_regs *regs); \ ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO); \ diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c index d5ffaaab31a7..6508f6130bbf 100644 --- a/arch/arm64/kernel/sys.c +++ b/arch/arm64/kernel/sys.c @@ -48,6 +48,10 @@ asmlinkage long __arm64_sys_ni_syscall(const struct pt_regs *__unused) */ #define __arm64_sys_personality __arm64_sys_arm64_personality +#ifdef CONFIG_FAST_SYSCALL +#undef __XCALL +#endif + #undef __SYSCALL #define __SYSCALL(nr, sym) asmlinkage long __arm64_##sym(const struct pt_regs *); #include <asm/unistd.h> @@ -59,3 +63,19 @@ const syscall_fn_t sys_call_table[__NR_syscalls] = { [0 ... __NR_syscalls - 1] = __arm64_sys_ni_syscall, #include <asm/unistd.h> }; + +#ifdef CONFIG_FAST_SYSCALL +#undef __SYSCALL + +#undef __XCALL +#define __XCALL(nr, sym) asmlinkage long __arm64_xcall_##sym(const struct pt_regs *); +#include <asm/unistd.h> + +#undef __XCALL +#define __XCALL(nr, sym) [nr] = __arm64_xcall_##sym, + +const syscall_fn_t x_call_table[__NR_syscalls] = { + [0 ... 
__NR_syscalls - 1] = __arm64_sys_ni_syscall, +#include <asm/unistd.h> +}; +#endif diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 72a84fe4a2a7..b8ed5d8c24d6 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -280,7 +280,9 @@ static inline void delouse_pt_regs(struct pt_regs *regs) #ifdef CONFIG_FAST_SYSCALL void do_el0_xcall(struct pt_regs *regs) { - const syscall_fn_t *t = sys_call_table; + unsigned int scno, scno_nr; + const syscall_fn_t *t; + int xcall_nr; #ifdef CONFIG_ARM64_ILP32 if (is_ilp32_compat_task()) { @@ -289,6 +291,16 @@ void do_el0_xcall(struct pt_regs *regs) } #endif + scno = regs->regs[8]; + scno_nr = __NR_syscalls; + + xcall_nr = array_index_nospec(scno, scno_nr); + if (scno < scno_nr && current->xcall_select && + test_bit(xcall_nr, current->xcall_select)) + t = x_call_table; + else + t = sys_call_table; + fp_user_discard(); el0_xcall_common(regs, regs->regs[8], __NR_syscalls, t); } diff --git a/fs/proc/base.c b/fs/proc/base.c index 3206960c4bd7..4c6fdda92fa4 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3596,7 +3596,7 @@ static int xcall_show(struct seq_file *m, void *v) { struct inode *inode = m->private; struct task_struct *p; - unsigned int rs, re; + unsigned int rs, re, sc_no; if (!fast_syscall_enabled()) return -EACCES; @@ -3617,7 +3617,19 @@ static int xcall_show(struct seq_file *m, void *v) rs == (re - 1) ? seq_printf(m, "%d,", rs) : seq_printf(m, "%d-%d,", rs, re - 1); } - seq_puts(m, "\n"); + seq_printf(m, "\nAvailable:\n"); + + for (sc_no = 0; sc_no < __NR_syscalls; sc_no++) { + if (!syscall_is_xcall_register(sc_no)) + continue; + + seq_printf(m, "NR_syscall: %3d: enabled: %d ", sc_no, test_bit(sc_no, p->xcall_enable)); + + if (p->xcall_select) + seq_printf(m, "xcall_select: %d\n", test_bit(sc_no, p->xcall_select)); + else + seq_printf(m, "xcall_select: NULL\n"); + } out: put_task_struct(p); @@ -3631,6 +3643,15 @@ static int xcall_open(struct inode *inode, struct file *filp) static int xcall_enable_one(struct task_struct *p, unsigned int sc_no) { + /* Alloc in First */ + if (!bitmap_weight(p->xcall_enable, __NR_syscalls)) { + BUG_ON(p->xcall_select); + p->xcall_select = bitmap_zalloc(__NR_syscalls, GFP_KERNEL); + if (!p->xcall_select) + return -EINVAL; + } + + bitmap_clear(p->xcall_select, sc_no, 1); bitmap_set(p->xcall_enable, sc_no, 1); return 0; } @@ -3638,6 +3659,21 @@ static int xcall_enable_one(struct task_struct *p, unsigned int sc_no) static int xcall_disable_one(struct task_struct *p, unsigned int sc_no) { bitmap_clear(p->xcall_enable, sc_no, 1); + bitmap_clear(p->xcall_select, sc_no, 1); + + /* Free in Last */ + if (!bitmap_weight(p->xcall_enable, __NR_syscalls)) { + BUG_ON(!p->xcall_select); + bitmap_free(p->xcall_select); + p->xcall_select = NULL; + } + return 0; +} + +static int xcall_select_table(struct task_struct *p, unsigned int sc_no) +{ + BUG_ON(!p->xcall_select); + test_and_change_bit(sc_no, p->xcall_select); return 0; } @@ -3650,7 +3686,7 @@ static ssize_t xcall_write(struct file *file, const char __user *buf, const size_t maxlen = sizeof(buffer) - 1; unsigned int sc_no = __NR_syscalls; int ret = 0; - int is_clear = 0; + int is_clear = 0, is_switch = 0; if (!fast_syscall_enabled()) return -EACCES; @@ -3665,8 +3701,10 @@ static ssize_t xcall_write(struct file *file, const char __user *buf, if (buffer[0] == '!') is_clear = 1; + else if ((buffer[0] == '@')) + is_switch = 1; - if (kstrtouint(buffer + is_clear, 10, &sc_no)) { + if (kstrtouint(buffer + is_clear + is_switch, 10, 
&sc_no)) { ret = -EINVAL; goto out; } @@ -3676,9 +3714,11 @@ static ssize_t xcall_write(struct file *file, const char __user *buf, goto out; } - if (!is_clear && !test_bit(sc_no, p->xcall_enable)) + if (is_switch && syscall_is_xcall_register(sc_no) && test_bit(sc_no, p->xcall_enable)) + ret = xcall_select_table(p, sc_no); + else if (!is_switch && !is_clear && !test_bit(sc_no, p->xcall_enable)) ret = xcall_enable_one(p, sc_no); - else if (is_clear && test_bit(sc_no, p->xcall_enable)) + else if (!is_switch && is_clear && test_bit(sc_no, p->xcall_enable)) ret = xcall_disable_one(p, sc_no); else ret = -EINVAL; diff --git a/include/linux/sched.h b/include/linux/sched.h index 18361e35a377..a377bae2064e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1479,10 +1479,11 @@ struct task_struct { #endif #if defined(CONFIG_FAST_SYSCALL) KABI_USE(15, unsigned long *xcall_enable) + KABI_USE(16, unsigned long *xcall_select) #else KABI_RESERVE(15) -#endif KABI_RESERVE(16) +#endif KABI_AUX_PTR(task_struct) /* CPU-specific state of this task: */ diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 509733f9214f..0e379bcd8194 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -226,6 +226,18 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) SYSCALL_METADATA(sname, x, __VA_ARGS__) \ __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) +#ifdef CONFIG_FAST_SYSCALL +#define XCALL_DEFINE1(name, ...) XCALL_DEFINEx(1, _##name, __VA_ARGS__) +#define XCALL_DEFINE2(name, ...) XCALL_DEFINEx(2, _##name, __VA_ARGS__) +#define XCALL_DEFINE3(name, ...) XCALL_DEFINEx(3, _##name, __VA_ARGS__) +#define XCALL_DEFINE4(name, ...) XCALL_DEFINEx(4, _##name, __VA_ARGS__) +#define XCALL_DEFINE5(name, ...) XCALL_DEFINEx(5, _##name, __VA_ARGS__) +#define XCALL_DEFINE6(name, ...) XCALL_DEFINEx(6, _##name, __VA_ARGS__) + +#define XCALL_DEFINEx(x, sname, ...) \ + __XCALL_DEFINEx(x, sname, __VA_ARGS__) +#endif + #define __PROTECT(...) 
asmlinkage_protect(__VA_ARGS__) /* diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 064f512fefd9..9b38861d9ea8 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -16,18 +16,28 @@ #define __SYSCALL(x, y) #endif +#ifndef __XCALL +#define __XCALL(x, y) +#endif + #if __BITS_PER_LONG == 32 || defined(__SYSCALL_COMPAT) #define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _32) +#define __XCALL_SC_3264(_nr, _32, _64) __XCALL(_nr, _32) #else #define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _64) +#define __XCALL_SC_3264(_nr, _32, _64) __XCALL(_nr, _64) #endif #ifdef __SYSCALL_COMPAT #define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _comp) #define __SC_COMP_3264(_nr, _32, _64, _comp) __SYSCALL(_nr, _comp) +#define __XCALL_SC_COMP(_nr, _sys, _comp) __XCALL(_nr, _comp) +#define __XCALL_SC_COMP_3264(_nr, _32, _64, _comp) __XCALL(_nr, _comp) #else #define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _sys) #define __SC_COMP_3264(_nr, _32, _64, _comp) __SC_3264(_nr, _32, _64) +#define __XCALL_SC_COMP(_nr, _sys, _comp) __XCALL(_nr, _sys) +#define __XCALL_SC_COMP_3264(_nr, _32, _64, _comp) __XCALL_SC_3264(_nr, _32, _64) #endif #define __NR_io_setup 0 diff --git a/kernel/fork.c b/kernel/fork.c index bd7afeb364ab..b884ac9cdece 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -483,6 +483,9 @@ void free_task(struct task_struct *tsk) #ifdef CONFIG_FAST_SYSCALL if (tsk->xcall_enable) bitmap_free(tsk->xcall_enable); + + if (tsk->xcall_select) + bitmap_free(tsk->xcall_select); #endif free_task_struct(tsk); @@ -1016,6 +1019,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #ifdef CONFIG_FAST_SYSCALL tsk->xcall_enable = NULL; + tsk->xcall_select = NULL; #endif return tsk; @@ -2103,6 +2107,14 @@ static __latent_entropy struct task_struct *copy_process( if (current->xcall_enable) bitmap_copy(p->xcall_enable, current->xcall_enable, __NR_syscalls); + + if (current->xcall_select) { + p->xcall_select = bitmap_zalloc(__NR_syscalls, GFP_KERNEL); + if (!p->xcall_select) + goto bad_fork_free; + + bitmap_copy(p->xcall_select, current->xcall_select, __NR_syscalls); + } #endif #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY -- 2.34.1
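To make the flow concrete, a hypothetical sketch of how a replacement handler could be defined with the new XCALL_DEFINEx() wrapper and made visible in x_call_table; the name xcall_example, its argument list and the placement of the __XCALL() line are placeholders, not something this series adds:

/* Somewhere in kernel code: the body is a stub, a real user would put the
 * trimmed-down fast-path work for the syscall here. */
XCALL_DEFINE3(xcall_example, unsigned int, fd, char __user *, buf, size_t, count)
{
	return -ENOSYS;
}

/* In include/uapi/asm-generic/unistd.h, next to the corresponding __SYSCALL()
 * entry (placeholder number), so arch/arm64/kernel/sys.c picks it up: */
__XCALL(__NR_xcall_example, sys_xcall_example)

At runtime, a task first enables the syscall number (echo <nr> > /proc/<pid>/xcall) and then switches it to the custom table with echo @<nr>; the switch only succeeds when syscall_is_xcall_register() sees an entry other than __arm64_sys_ni_syscall.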

From: Yipeng Zou <zouyipeng@huawei.com> Add cache mode for fd wakeup in epoll_pwait. In epoll_pwait, read data from ready fd into pre-allocated kernel cache buffer. And then, in sys_read, read from cache buffer and copy to user. So, we can async prefetch read data in epoll_pwait. Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- fs/eventpoll.c | 229 ++++++++++++++++++++++++++++++ fs/open.c | 4 + fs/read_write.c | 92 +++++++++++- include/linux/fs.h | 31 ++++ include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 1 + kernel/sysctl.c | 36 +++++ 7 files changed, 399 insertions(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 5ce1ea1f452b..be34d94d26bd 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -128,6 +128,8 @@ struct nested_calls { spinlock_t lock; }; +static struct workqueue_struct *rc_work; + /* * Each file descriptor added to the eventpoll interface will * have an entry of this type linked to the "rbr" RB tree. @@ -768,6 +770,45 @@ static void epi_rcu_free(struct rcu_head *head) kmem_cache_free(epi_cache, epi); } +#ifdef CONFIG_FAST_SYSCALL +#define PREFETCH_ITEM_HASH_BITS 6 +#define PREFETCH_ITEM_TABLE_SIZE (1 << PREFETCH_ITEM_HASH_BITS) +DEFINE_HASHTABLE(xcall_item_table, PREFETCH_ITEM_HASH_BITS); +DEFINE_RAW_SPINLOCK(xcall_table_lock); + +struct prefetch_item *find_prefetch_item(struct file *file) +{ + struct prefetch_item *found = NULL; + unsigned hash = 0; + + hash = hash_64((u64)file, PREFETCH_ITEM_HASH_BITS); + raw_spin_lock(&xcall_table_lock); + hash_for_each_possible(xcall_item_table, found, node, hash) { + if (found->f == file) + break; + } + raw_spin_unlock(&xcall_table_lock); + + return found; +} + +void free_prefetch_item(struct file *file) +{ + struct prefetch_item *pfi = find_prefetch_item(file); + if (pfi) { + raw_spin_lock(&xcall_table_lock); + hlist_del_init(&pfi->node); + raw_spin_unlock(&xcall_table_lock); + + if (pfi->cache) { + kfree(pfi->cache); + pfi->cache = NULL; + } + kfree(pfi); + } +} +#endif + /* * Removes a "struct epitem" from the eventpoll RB tree and deallocates * all the associated resources. Must be called with "mtx" held. 
@@ -783,6 +824,15 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi) */ ep_unregister_pollwait(ep, epi); +#ifdef CONFIG_FAST_SYSCALL + if (current->xcall_select && + test_bit(__NR_epoll_pwait, current->xcall_select)) { + struct prefetch_item *pfi = find_prefetch_item(file); + if (pfi) + cancel_work_sync(&pfi->work); + } +#endif + /* Remove the current item from the list of epoll hooks */ spin_lock(&file->f_lock); list_del_rcu(&epi->fllink); @@ -1191,6 +1241,150 @@ static inline bool chain_epi_lockless(struct epitem *epi) return true; } + +#ifdef CONFIG_FAST_SYSCALL +int max_fd_cache_pages = 1; +static void do_prefetch_item(struct prefetch_item *pfi) +{ + if (pfi && (pfi->state != EPOLL_FILE_CACHE_QUEUED)) + return; + + if (pfi->len > 0) + return; + + pfi->len = kernel_read(pfi->f, pfi->cache, + max_fd_cache_pages * PAGE_SIZE, &pfi->f->f_pos); + pfi->state = EPOLL_FILE_CACHE_READY; +} + +struct cpumask xcall_numa_cpumask[4] __read_mostly; +unsigned long *xcall_numa_cpumask_bits0 = cpumask_bits(&xcall_numa_cpumask[0]); +unsigned long *xcall_numa_cpumask_bits1 = cpumask_bits(&xcall_numa_cpumask[1]); +unsigned long *xcall_numa_cpumask_bits2 = cpumask_bits(&xcall_numa_cpumask[2]); +unsigned long *xcall_numa_cpumask_bits3 = cpumask_bits(&xcall_numa_cpumask[3]); + +#ifdef CONFIG_SYSCTL +static void proc_xcall_update(void) +{ + int i; + + /* Remove impossible cpus to keep sysctl output clean. */ + for (i = 0; i < 4; i++) + cpumask_and(&xcall_numa_cpumask[i], &xcall_numa_cpumask[i], cpu_possible_mask); +} + +int proc_xcall_numa_cpumask(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int err; + + // todo: add lock + err = proc_do_large_bitmap(table, write, buffer, lenp, ppos); + if (!err && write) + proc_xcall_update(); + + return err; +} +#endif /* CONFIG_SYSCTL */ + +static void prefetch_work_fn(struct work_struct *work) +{ + struct prefetch_item *pfi = container_of(work, struct prefetch_item, work); + + spin_lock(&pfi->pfi_lock); + do_prefetch_item(pfi); + spin_unlock(&pfi->pfi_lock); +} + +static int get_nth_cpu_in_cpumask(const struct cpumask *mask, int n) +{ + int count = 0; + int cpu; + + for_each_cpu(cpu, mask) { + if (count == n) + return cpu; + count++; + } + + return cpumask_first(mask); +} + +static struct prefetch_item *alloc_prefetch_item(struct epitem *epi) +{ + struct file *tfile = epi->ffd.file; + struct prefetch_item *pfi; + int fd = epi->ffd.fd; + int cpu, nid; + + if (!current->xcall_select || + !test_bit(__NR_epoll_pwait, current->xcall_select)) + return NULL; + + /* Initialization prefetch item */ + pfi = kmalloc(sizeof(struct prefetch_item), GFP_KERNEL); + if (!pfi) + return NULL; + + pfi->cache = kzalloc(max_fd_cache_pages * PAGE_SIZE, GFP_KERNEL); + if (!pfi->cache) { + kfree(pfi); + return NULL; + } + + /* Init Read Cache mode */ + pfi->state = EPOLL_FILE_CACHE_NONE; + INIT_WORK(&pfi->work, prefetch_work_fn); + INIT_HLIST_NODE(&pfi->node); + spin_lock_init(&pfi->pfi_lock); + pfi->fd = fd; + pfi->f = tfile; + pfi->len = 0; + pfi->pos = 0; + cpu = smp_processor_id(); + nid = numa_node_id(); + cpumask_and(&pfi->related_cpus, cpu_cpu_mask(cpu), cpu_online_mask); + if (nid <= 3 && !cpumask_empty(&xcall_numa_cpumask[nid]) && + cpumask_subset(&xcall_numa_cpumask[nid], cpu_cpu_mask(cpu))) + cpumask_and(&pfi->related_cpus, &pfi->related_cpus, &xcall_numa_cpumask[nid]); + pfi->cpu = get_nth_cpu_in_cpumask(&pfi->related_cpus, fd % cpumask_weight(&pfi->related_cpus)); + + raw_spin_lock(&xcall_table_lock); + hash_add(xcall_item_table, 
&pfi->node, hash_64((u64)tfile, PREFETCH_ITEM_HASH_BITS)); + raw_spin_unlock(&xcall_table_lock); + + return pfi; +} + +static void ep_prefetch_item_enqueue(struct eventpoll *ep, struct epitem *epi) +{ + struct prefetch_item *pfi = find_prefetch_item(epi->ffd.file); + int t_cpu; + + if (!pfi) { + pfi = alloc_prefetch_item(epi); + if (pfi == NULL) + return; + } + + if (!pfi->cache || !(epi->event.events & EPOLLIN) || + pfi->state != EPOLL_FILE_CACHE_NONE) + return; + + if (pfi->cpu == smp_processor_id()) { + t_cpu = cpumask_next(pfi->cpu, &pfi->related_cpus); + if (t_cpu > cpumask_last(&pfi->related_cpus)) + t_cpu = cpumask_first(&pfi->related_cpus); + } else + t_cpu = pfi->cpu; + + spin_lock(&pfi->pfi_lock); + pfi->state = EPOLL_FILE_CACHE_QUEUED; + queue_work_on(t_cpu, rc_work, &pfi->work); + spin_unlock(&pfi->pfi_lock); +} +#endif + /* * This is the callback that is passed to the wait queue wakeup * mechanism. It is called by the stored file descriptors when they @@ -1751,6 +1945,12 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head if (!revents) continue; +#ifdef CONFIG_FAST_SYSCALL + if (current->xcall_select && + test_bit(__NR_epoll_pwait, current->xcall_select)) + ep_prefetch_item_enqueue(ep, epi); +#endif + if (__put_user(revents, &uevent->events) || __put_user(epi->event.data, &uevent->data)) { list_add(&epi->rdllink, head); @@ -2383,6 +2583,26 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, size_t, sigsetsize) { int error; + /* + * If the caller wants a certain signal mask to be set during the wait, + * we apply it here. + */ + error = set_user_sigmask(sigmask, sigsetsize); + if (error) + return error; + + error = do_epoll_wait(epfd, events, maxevents, timeout); + restore_saved_sigmask_unless(error == -EINTR); + + return error; +} + +#ifdef CONFIG_FAST_SYSCALL +XCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, + int, maxevents, int, timeout, const sigset_t __user *, sigmask, + size_t, sigsetsize) +{ + int error; /* * If the caller wants a certain signal mask to be set during the wait, @@ -2397,6 +2617,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, return error; } +#endif #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd, @@ -2454,6 +2675,14 @@ static int __init eventpoll_init(void) pwq_cache = kmem_cache_create("eventpoll_pwq", sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); +#ifdef CONFIG_FAST_SYSCALL + rc_work = alloc_workqueue("eventpoll_rc", 0, 0); + if (!rc_work) + return -ENOMEM; + + hash_init(xcall_item_table); +#endif + return 0; } fs_initcall(eventpoll_init); diff --git a/fs/open.c b/fs/open.c index 96de0d3f1a8b..46308348a774 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1287,6 +1287,10 @@ int filp_close(struct file *filp, fl_owner_t id) return 0; } +#ifdef CONFIG_FAST_SYSCALL + free_prefetch_item(filp); +#endif + if (filp->f_op->flush) retval = filp->f_op->flush(filp, id); diff --git a/fs/read_write.c b/fs/read_write.c index da03b3e65cf3..81ca30ff069c 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -617,13 +617,103 @@ static inline loff_t *file_ppos(struct file *file) return file->f_mode & FMODE_STREAM ? 
NULL : &file->f_pos; } +#ifdef CONFIG_FAST_SYSCALL +DEFINE_PER_CPU_ALIGNED(unsigned long, xcall_cache_hit); +EXPORT_PER_CPU_SYMBOL(xcall_cache_hit); + +DEFINE_PER_CPU_ALIGNED(unsigned long, xcall_cache_miss); +EXPORT_PER_CPU_SYMBOL(xcall_cache_miss); + +DEFINE_PER_CPU_ALIGNED(unsigned long, xcall_cache_wait); +EXPORT_PER_CPU_SYMBOL(xcall_cache_wait); + +static int xcall_read(struct prefetch_item *pfi, struct fd *f, unsigned int fd, + char __user *buf, size_t count) +{ + ssize_t copy_ret = -1; + ssize_t copy_len; + + if (!spin_trylock(&pfi->pfi_lock)) { + this_cpu_inc(xcall_cache_wait); + spin_lock(&pfi->pfi_lock); + } + + copy_len = pfi->len; + if (pfi->state != EPOLL_FILE_CACHE_READY || copy_len < 0) + goto reset_pfi; + + if (copy_len == 0) { + copy_ret = 0; + goto hit_return; + } + + if (copy_len >= count) + copy_len = count; + + copy_ret = copy_to_user(buf, (void *)(pfi->cache + pfi->pos), copy_len); + pfi->len -= copy_len; + if (pfi->len <= 0) { + pfi->len = 0; + pfi->state = EPOLL_FILE_CACHE_NONE; + } + + pfi->pos += copy_len; + if (pfi->pos >= (max_fd_cache_pages * PAGE_SIZE) || pfi->len == 0) + pfi->pos = 0; + +hit_return: + this_cpu_inc(xcall_cache_hit); + fdput_pos(*f); + spin_unlock(&pfi->pfi_lock); + + /* + * 1. copy_len = 0. + * 2. copy_len > 0 && copy_to_user() works fine. + */ + if (copy_ret == 0) + return copy_len; + else + return -EBADF; + +reset_pfi: + /* Always reset cache state to none */ + pfi->len = 0; + pfi->state = EPOLL_FILE_CACHE_NONE; + this_cpu_inc(xcall_cache_miss); + cancel_work(&pfi->work); + spin_unlock(&pfi->pfi_lock); + + return -EAGAIN; +} +#endif + ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count) { struct fd f = fdget_pos(fd); ssize_t ret = -EBADF; + loff_t pos, *ppos; +#ifdef CONFIG_FAST_SYSCALL + struct prefetch_item *pfi; + + if (!current->xcall_select || + !test_bit(__NR_epoll_pwait, current->xcall_select)) + goto vfs_read; + + if (!f.file) + goto vfs_read; + + pfi = find_prefetch_item(f.file); + if (!pfi || !pfi->cache) + goto vfs_read; + + ret = xcall_read(pfi, &f, fd, buf, count); + if (ret != -EAGAIN) + return ret; +vfs_read: +#endif if (f.file) { - loff_t pos, *ppos = file_ppos(f.file); + ppos = file_ppos(f.file); if (ppos) { pos = *ppos; ppos = &pos; diff --git a/include/linux/fs.h b/include/linux/fs.h index a0ea6b64c45d..097b27291044 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -947,6 +947,28 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index) index < ra->start + ra->size); } +#define EPOLL_FILE_CACHE_NONE 0 +#define EPOLL_FILE_CACHE_QUEUED 1 +#define EPOLL_FILE_CACHE_READY 2 + +struct prefetch_item { + struct file *f; + int fd; + struct work_struct work; + int cpu; + cpumask_t related_cpus; + char *cache; + ssize_t len; + /* cache state in epoll_wait */ + int state; + spinlock_t pfi_lock; + loff_t pos; + struct hlist_node node; +}; + +#define MAX_FD_CACHE 1024 +extern int max_fd_cache_pages; + struct file { union { struct llist_node fu_llist; @@ -3750,4 +3772,13 @@ static inline bool cachefiles_ondemand_is_enabled(void) } #endif +#ifdef CONFIG_FAST_SYSCALL +DECLARE_PER_CPU_ALIGNED(unsigned long, xcall_cache_hit); +DECLARE_PER_CPU_ALIGNED(unsigned long, xcall_cache_miss); +DECLARE_PER_CPU_ALIGNED(unsigned long, xcall_cache_wait); + +struct prefetch_item *find_prefetch_item(struct file *file); +void free_prefetch_item(struct file *file); +#endif + #endif /* _LINUX_FS_H */ diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0e379bcd8194..2527c32adad1 100644 --- 
a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -236,6 +236,13 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) #define XCALL_DEFINEx(x, sname, ...) \ __XCALL_DEFINEx(x, sname, __VA_ARGS__) + +extern unsigned long *xcall_numa_cpumask_bits0; +extern unsigned long *xcall_numa_cpumask_bits1; +extern unsigned long *xcall_numa_cpumask_bits2; +extern unsigned long *xcall_numa_cpumask_bits3; +int proc_xcall_numa_cpumask(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos); #endif #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__) diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 9b38861d9ea8..41ed441c3c3a 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -98,6 +98,7 @@ __SYSCALL(__NR_epoll_create1, sys_epoll_create1) __SYSCALL(__NR_epoll_ctl, sys_epoll_ctl) #define __NR_epoll_pwait 22 __SC_COMP(__NR_epoll_pwait, sys_epoll_pwait, compat_sys_epoll_pwait) +__XCALL_SC_COMP(__NR_epoll_pwait, sys_epoll_pwait, compat_sys_epoll_pwait) /* fs/fcntl.c */ #define __NR_dup 23 diff --git a/kernel/sysctl.c b/kernel/sysctl.c index b4b36f8a3149..02b55955b725 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2861,6 +2861,42 @@ static struct ctl_table kern_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = &hundred_thousand, }, +#endif +#ifdef CONFIG_FAST_SYSCALL + { + .procname = "xcall_numa0_cpumask", + .data = &xcall_numa_cpumask_bits0, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_xcall_numa_cpumask, + }, + { + .procname = "xcall_numa1_cpumask", + .data = &xcall_numa_cpumask_bits1, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_xcall_numa_cpumask, + }, + { + .procname = "xcall_numa2_cpumask", + .data = &xcall_numa_cpumask_bits2, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_xcall_numa_cpumask, + }, + { + .procname = "xcall_numa3_cpumask", + .data = &xcall_numa_cpumask_bits3, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_xcall_numa_cpumask, + }, + { .procname = "max_xcall_cache_pages", + .data = &max_fd_cache_pages, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + }, #endif { } }; -- 2.34.1
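
For reference, the new knobs added to kern_table above surface under /proc/sys/kernel/. A minimal usage sketch (the CPU ranges and page count are illustrative only; the cpumask entries take proc_do_large_bitmap's range-list syntax):

    echo 0-15 > /proc/sys/kernel/xcall_numa0_cpumask
    echo 16-31 > /proc/sys/kernel/xcall_numa1_cpumask
    echo 4 > /proc/sys/kernel/max_xcall_cache_pages

With a per-node cpumask configured, alloc_prefetch_item() intersects it with the current CPU's package mask, so the prefetch work item appears intended to run near the reader but, per ep_prefetch_item_enqueue(), not on the CPU that issued epoll_pwait itself.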

Add "/proc/xcall/stats" dir, so we can get the xcall prefetch hit ratio on each CPU that initiates a read system call, which is important for performance tuning. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- fs/proc/base.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/fs/proc/base.c b/fs/proc/base.c index 4c6fdda92fa4..2c05089bfe53 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3592,6 +3592,68 @@ static const struct file_operations proc_pid_sg_level_operations = { #ifdef CONFIG_FAST_SYSCALL bool fast_syscall_enabled(void); +static ssize_t xcall_stats_write(struct file *file, const char __user *buf, + size_t count, loff_t *pos) +{ + int cpu; + + for_each_cpu(cpu, cpu_online_mask) { + *per_cpu_ptr(&xcall_cache_hit, cpu) = 0; + *per_cpu_ptr(&xcall_cache_miss, cpu) = 0; + *per_cpu_ptr(&xcall_cache_wait, cpu) = 0; + } + + return count; +} + +static int xcall_stats_show(struct seq_file *m, void *v) +{ + unsigned long hit = 0, miss = 0, wait = 0; + unsigned int cpu; + u64 percent; + + for_each_cpu(cpu, cpu_online_mask) { + hit = *per_cpu_ptr(&xcall_cache_hit, cpu); + miss = *per_cpu_ptr(&xcall_cache_miss, cpu); + wait = *per_cpu_ptr(&xcall_cache_wait, cpu); + + if (hit == 0 && miss == 0) + continue; + + percent = (hit * 10000ULL) / (hit + miss); + seq_printf(m, "cpu%d epoll cache_{hit,miss,wait}: %ld,%ld,%ld, hit ratio: %3llu.%02llu%%\n", + cpu, hit, miss, wait, percent / 100, percent % 100); + } + return 0; +} + +static int xcall_stats_open(struct inode *inode, struct file *file) +{ + return single_open(file, xcall_stats_show, NULL); +} + +static const struct proc_ops xcall_stats_fops = { + .proc_open = xcall_stats_open, + .proc_read = seq_read, + .proc_write = xcall_stats_write, + .proc_lseek = seq_lseek, + .proc_release = single_release +}; + +static int __init init_xcall_stats_procfs(void) +{ + struct proc_dir_entry *xcall_proc_dir; + + if (!fast_syscall_enabled()) + return 0; + + xcall_proc_dir = proc_mkdir("xcall", NULL); + proc_create("stats", 0444, xcall_proc_dir, &xcall_stats_fops); + return 0; +} + +device_initcall(init_xcall_stats_procfs); + static int xcall_show(struct seq_file *m, void *v) { struct inode *inode = m->private; -- 2.34.1

From: Yipeng Zou <zouyipeng@huawei.com> Add tracepoint. Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- fs/eventpoll.c | 3 ++ fs/read_write.c | 3 ++ include/trace/events/fs.h | 91 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 97 insertions(+) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index be34d94d26bd..8378250cbb4b 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -38,6 +38,7 @@ #include <linux/compat.h> #include <linux/rculist.h> #include <net/busy_poll.h> +#include <trace/events/fs.h> /* * LOCKING: @@ -1255,6 +1256,7 @@ static void do_prefetch_item(struct prefetch_item *pfi) pfi->len = kernel_read(pfi->f, pfi->cache, max_fd_cache_pages * PAGE_SIZE, &pfi->f->f_pos); pfi->state = EPOLL_FILE_CACHE_READY; + trace_epoll_rc_ready(pfi->fd, pfi->len); } struct cpumask xcall_numa_cpumask[4] __read_mostly; @@ -1380,6 +1382,7 @@ static void ep_prefetch_item_enqueue(struct eventpoll *ep, struct epitem *epi) spin_lock(&pfi->pfi_lock); pfi->state = EPOLL_FILE_CACHE_QUEUED; + trace_epoll_rc_queue(epi->ffd.fd, t_cpu); queue_work_on(t_cpu, rc_work, &pfi->work); spin_unlock(&pfi->pfi_lock); } diff --git a/fs/read_write.c b/fs/read_write.c index 81ca30ff069c..55e3f4b0ad3e 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -635,6 +635,7 @@ static int xcall_read(struct prefetch_item *pfi, struct fd *f, unsigned int fd, if (!spin_trylock(&pfi->pfi_lock)) { this_cpu_inc(xcall_cache_wait); + trace_epoll_rc_wait(fd); spin_lock(&pfi->pfi_lock); } @@ -663,6 +664,7 @@ static int xcall_read(struct prefetch_item *pfi, struct fd *f, unsigned int fd, hit_return: this_cpu_inc(xcall_cache_hit); + trace_epoll_rc_hit(fd, copy_len); fdput_pos(*f); spin_unlock(&pfi->pfi_lock); @@ -681,6 +683,7 @@ static int xcall_read(struct prefetch_item *pfi, struct fd *f, unsigned int fd, pfi->state = EPOLL_FILE_CACHE_NONE; this_cpu_inc(xcall_cache_miss); cancel_work(&pfi->work); + trace_epoll_rc_miss(fd); spin_unlock(&pfi->pfi_lock); return -EAGAIN; diff --git a/include/trace/events/fs.h b/include/trace/events/fs.h index ee82dad9d9da..d5bfd22647dd 100644 --- a/include/trace/events/fs.h +++ b/include/trace/events/fs.h @@ -29,5 +29,96 @@ DECLARE_TRACE(fs_file_release, #endif /* _TRACE_FS_H */ +TRACE_EVENT(epoll_rc_ready, + + TP_PROTO(int fd, int len), + + TP_ARGS(fd, len), + + TP_STRUCT__entry( + __field(int, fd) + __field(int, len) + ), + + TP_fast_assign( + __entry->fd = fd; + __entry->len = len; + ), + + TP_printk("%d, len %d", __entry->fd, __entry->len) +); + +TRACE_EVENT(epoll_rc_queue, + + TP_PROTO(int fd, int cpu), + + TP_ARGS(fd, cpu), + + TP_STRUCT__entry( + __field(int, fd) + __field(int, cpu) + ), + + TP_fast_assign( + __entry->fd = fd; + __entry->cpu = cpu; + ), + + TP_printk("%d on cpu %d", __entry->fd, __entry->cpu) +); + +TRACE_EVENT(epoll_rc_hit, + + TP_PROTO(int fd, int len), + + TP_ARGS(fd, len), + + TP_STRUCT__entry( + __field(int, fd) + __field(int, len) + ), + + TP_fast_assign( + __entry->fd = fd; + __entry->len = len; + ), + + TP_printk("%d, len: %d", __entry->fd, __entry->len) +); + +TRACE_EVENT(epoll_rc_miss, + + TP_PROTO(int fd), + + TP_ARGS(fd), + + TP_STRUCT__entry( + __field(int, fd) + ), + + TP_fast_assign( + __entry->fd = fd; + ), + + TP_printk("%d", __entry->fd) +); + +TRACE_EVENT(epoll_rc_wait, + + TP_PROTO(int fd), + + TP_ARGS(fd), + + TP_STRUCT__entry( + __field(int, fd) + ), + + TP_fast_assign( + __entry->fd = fd; + ), + + TP_printk("%d", __entry->fd) +); + /* This part must be outside protection */ #include <trace/define_trace.h> 
-- 2.34.1
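
The new events can be consumed through tracefs in the usual way. Assuming they are grouped under the existing "fs" trace system of include/trace/events/fs.h and tracefs is mounted at /sys/kernel/tracing:

    echo 1 > /sys/kernel/tracing/events/fs/epoll_rc_queue/enable
    echo 1 > /sys/kernel/tracing/events/fs/epoll_rc_ready/enable
    echo 1 > /sys/kernel/tracing/events/fs/epoll_rc_hit/enable
    echo 1 > /sys/kernel/tracing/events/fs/epoll_rc_miss/enable
    echo 1 > /sys/kernel/tracing/events/fs/epoll_rc_wait/enable
    cat /sys/kernel/tracing/trace_pipe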

FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/16228 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/BMR...