Currently, x86, RISC-V and LoongArch use the generic entry code, which
makes maintainers' work easier and the code more elegant. arm64 has
already switched to the generic IRQ entry in commit b3cf07851b6c
("arm64: entry: Switch to generic IRQ entry"), so it is time to convert
arm64 to the generic entry code completely. The goal is to bring arm64
in line with the other architectures that already use the generic entry
infrastructure, reducing duplicated code and making it easier to share
future changes in the entry/exit paths, such as "Syscall User Dispatch".

This patch set is rebased on "sched/core". The performance was measured
on Kunpeng 920 using "perf bench syscall basic" with "arm64.nopauth
selinux=0 audit=1". With the optimizations to syscall_get_arguments()
[1] and el0_svc_common(), the performance numbers are below:

| Metric     | W/O Generic Entry | With Generic Entry (opt) | Change |
| ---------- | ----------------- | ------------------------ | ------ |
| Total time | 2.487 [sec]       | 2.264 [sec]              | ↓9.0%  |
| usecs/op   | 0.248780          | 0.226481                 | ↓9.0%  |
| ops/sec    | 4,019,620         | 4,415,383                | ↑9.8%  |

Therefore, after the optimization, arm64 system call performance
improves by approximately 9%.

The series was tested successfully with the following test cases on
Kunpeng 920 and the QEMU virt platform:

 - Perf tests.
 - Different "dynamic preempt" mode switches.
 - Pseudo-NMI tests.
 - Stress-ng CPU stress test.
 - Hackbench stress test.
 - MTE test case in Documentation/arch/arm64/memory-tagging-extension.rst
   and all test cases in tools/testing/selftests/arm64/mte/*.
 - "sud" selftest testcase.
 - get_set_sud, get_syscall_info, set_syscall_info, peeksiginfo in
   tools/testing/selftests/ptrace.
 - breakpoint_test_arm64 in selftests/breakpoints.
 - syscall-abi and ptrace in tools/testing/selftests/arm64/abi.
 - fp-ptrace, sve-ptrace, za-ptrace in selftests/arm64/fp.
 - vdso_test_getrandom in tools/testing/selftests/vDSO.
 - Strace tests.

The test QEMU configuration is as follows:

	qemu-system-aarch64 \
		-M virt,gic-version=3,virtualization=on,mte=on \
		-cpu max,pauth-impdef=on \
		-kernel Image \
		-smp 8,sockets=1,cores=4,threads=2 \
		-m 512m \
		-nographic \
		-no-reboot \
		-device virtio-rng-pci \
		-append "root=/dev/vda rw console=ttyAMA0 kgdboc=ttyAMA0,115200 \
			 earlycon preempt=voluntary irqchip.gicv3_pseudo_nmi=1" \
		-drive if=none,file=images/rootfs.ext4,format=raw,id=hd0 \
		-device virtio-blk-device,drive=hd0

[1]: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm/+/89bf683c9...

Changes in v12:
- Rebased on "sched/core", so remove the four generic entry patches.
- Move the "Expand secure_computing() in place" and "Use
  syscall_get_arguments() helper" patches forward, which groups all
  non-functional cleanups at the front.
- Adjust the explanation for moving rseq_syscall() before
  audit_syscall_exit().
- Link to v11: https://lore.kernel.org/all/20260128031934.3906955-1-ruanjinjie@huawei.com/

Changes in v11:
- Remove the unused syscall variable in syscall_trace_enter().
- Update and provide a detailed explanation of the differences after
  moving rseq_syscall() before audit_syscall_exit().
- Rebased on arm64 (for-next/entry), and remove the first 3 patches
  that were already applied.
- Reuse syscall_exit_to_user_mode_work() for the arch instead of adding
  a new syscall_exit_to_user_mode_work_prepare() helper.
- Link to v10: https://lore.kernel.org/all/20251222114737.1334364-1-ruanjinjie@huawei.com/

Changes in v10:
- Rebased on v6.19-rc1, rename syscall_exit_to_user_mode_prepare() to
  syscall_exit_to_user_mode_work_prepare() to avoid a conflict.
- Also inline syscall_trace_enter().
- Support aarch64 for sud_benchmark.
- Update and correct the commit message.
- Add Reviewed-by.
- Link to v9: https://lore.kernel.org/all/20251204082123.2792067-1-ruanjinjie@huawei.com/

Changes in v9:
- Move the "Return early for ptrace_report_syscall_entry() error" patch
  ahead so that it does not introduce a regression.
- Do not check _TIF_SECCOMP/SYSCALL_EMU for syscall_exit_work() in a
  separate patch.
- Do not report_syscall_exit() for PTRACE_SYSEMU_SINGLESTEP in a
  separate patch.
- Add two performance patches to improve the arm64 performance.
- Add Reviewed-by.
- Link to v8: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/

Changes in v8:
- Rename "report_syscall_enter()" to "report_syscall_entry()".
- Add ptrace_save_reg() to avoid duplication.
- Remove the unused _TIF_WORK_MASK in a standalone patch.
- Align the syscall_trace_enter() return value with the generic version.
- Use "scno" instead of regs->syscallno in el0_svc_common().
- Move rseq_syscall() ahead in a standalone patch to clarify it clearly.
- Rename "syscall_trace_exit()" to "syscall_exit_work()".
- Keep the goto in el0_svc_common().
- Pass no argument to __secure_computing() and check against -1, not -1L.
- Remove the "Add has_syscall_work() helper" patch.
- Move the "Add syscall_exit_to_user_mode_prepare() helper" patch later.
- Add the missing header for asm/entry-common.h.
- Update the implementation of arch_syscall_is_vdso_sigreturn().
- Define "ARCH_SYSCALL_WORK_EXIT" as "SECCOMP | SYSCALL_EMU" to keep
  the behaviour unchanged.
- Add more testcase coverage.
- Add Reviewed-by.
- Update the commit message.
- Link to v7: https://lore.kernel.org/all/20251117133048.53182-1-ruanjinjie@huawei.com/

Jinjie Ruan (12):
  arm64: Remove unused _TIF_WORK_MASK
  arm64/ptrace: Split report_syscall()
  arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  arm64/ptrace: Refactor syscall_trace_enter/exit()
  arm64/ptrace: Expand secure_computing() in place
  arm64/ptrace: Use syscall_get_arguments() helper
  arm64: ptrace: Move rseq_syscall() before audit_syscall_exit()
  arm64: syscall: Rework el0_svc_common()
  arm64/ptrace: Not check _TIF_SECCOMP/SYSCALL_EMU for syscall_exit_work()
  arm64/ptrace: Do not report_syscall_exit() for PTRACE_SYSEMU_SINGLESTEP
  arm64: entry: Convert to generic entry
  arm64: Inline el0_svc_common()

kemal (1):
  selftests: sud_test: Support aarch64

 arch/arm64/Kconfig                            |  2 +-
 arch/arm64/include/asm/entry-common.h         | 76 ++++++++++++++
 arch/arm64/include/asm/syscall.h              | 19 +++-
 arch/arm64/include/asm/thread_info.h          | 22 +---
 arch/arm64/kernel/debug-monitors.c            |  7 ++
 arch/arm64/kernel/ptrace.c                    | 94 -------------------
 arch/arm64/kernel/signal.c                    |  2 +-
 arch/arm64/kernel/syscall.c                   | 29 ++----
 .../syscall_user_dispatch/sud_benchmark.c     |  2 +-
 .../syscall_user_dispatch/sud_test.c          |  4 +
 10 files changed, 115 insertions(+), 142 deletions(-)

--
2.34.1
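For anyone reproducing the numbers above: the benchmark is perf's
null-syscall loop around getppid(). Assuming a recent perf build, the
invocation and output shape look roughly like this (figures taken from
the "With Generic Entry (opt)" column of the table):

	$ perf bench syscall basic
	# Running 'syscall/basic' benchmark:
	# Executed 10,000,000 getppid() calls
	     Total time: 2.264 [sec]

	       0.226481 usecs/op
	        4415383 ops/sec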
Since commit b3cf07851b6c ("arm64: entry: Switch to generic IRQ entry"),
_TIF_WORK_MASK is never used, so remove it.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/thread_info.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index a803b887b0b4..24fcd6adaa33 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -106,12 +106,6 @@ void arch_setup_new_exec(void);
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_TSC_SIGSEGV	(1 << TIF_TSC_SIGSEGV)
 
-#define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY | \
-				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL | _TIF_SIGPENDING | \
-				 _TIF_PATCH_PENDING)
-
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
				 _TIF_SYSCALL_EMU)
--
2.34.1
The generic syscall entry code has the form:

| syscall_trace_enter()
| {
|	ptrace_report_syscall_entry()
| }
|
| syscall_exit_work()
| {
|	ptrace_report_syscall_exit()
| }

In preparation for moving arm64 over to the generic entry code, split
report_syscall() into two separate enter and exit functions, to align
the structure of the arm64 code with syscall_trace_enter() and
syscall_exit_work() from the generic entry code.

No functional changes.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/ptrace.c | 41 +++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index b9bdd83fbbca..03fe2f8a4d54 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2343,9 +2343,10 @@ enum ptrace_syscall_dir {
 	PTRACE_SYSCALL_EXIT,
 };
 
-static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
+static __always_inline unsigned long ptrace_save_reg(struct pt_regs *regs,
+						     enum ptrace_syscall_dir dir,
+						     int *regno)
 {
-	int regno;
 	unsigned long saved_reg;
 
 	/*
@@ -2364,15 +2365,31 @@ static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
 	 * - Syscall stops behave differently to seccomp and pseudo-step traps
 	 *   (the latter do not nobble any registers).
 	 */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = dir;
+	*regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[*regno];
+	regs->regs[*regno] = dir;
 
-	if (dir == PTRACE_SYSCALL_ENTER) {
-		if (ptrace_report_syscall_entry(regs))
-			forget_syscall(regs);
-		regs->regs[regno] = saved_reg;
-	} else if (!test_thread_flag(TIF_SINGLESTEP)) {
+	return saved_reg;
+}
+
+static void report_syscall_entry(struct pt_regs *regs)
+{
+	unsigned long saved_reg;
+	int regno;
+
+	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_ENTER, &regno);
+	if (ptrace_report_syscall_entry(regs))
+		forget_syscall(regs);
+	regs->regs[regno] = saved_reg;
+}
+
+static void report_syscall_exit(struct pt_regs *regs)
+{
+	unsigned long saved_reg;
+	int regno;
+
+	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_EXIT, &regno);
+	if (!test_thread_flag(TIF_SINGLESTEP)) {
 		ptrace_report_syscall_exit(regs, 0);
 		regs->regs[regno] = saved_reg;
 	} else {
@@ -2392,7 +2409,7 @@ int syscall_trace_enter(struct pt_regs *regs)
 	unsigned long flags = read_thread_flags();
 
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		report_syscall(regs, PTRACE_SYSCALL_ENTER);
+		report_syscall_entry(regs);
 		if (flags & _TIF_SYSCALL_EMU)
 			return NO_SYSCALL;
 	}
@@ -2420,7 +2437,7 @@ void syscall_trace_exit(struct pt_regs *regs)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
 
 	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
-		report_syscall(regs, PTRACE_SYSCALL_EXIT);
+		report_syscall_exit(regs);
 
 	rseq_syscall(regs);
 }
--
2.34.1
The generic entry code aborts the syscall_trace_enter() sequence if
ptrace_report_syscall_entry() errors out, but arm64 does not. When
ptrace requests interception, it should prevent all subsequent
system-call processing, including audit and seccomp.

In preparation for moving arm64 over to the generic entry code, return
early if ptrace_report_syscall_entry() encounters an error.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/ptrace.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 03fe2f8a4d54..f333791ffba6 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2372,15 +2372,18 @@ static __always_inline unsigned long ptrace_save_reg(struct pt_regs *regs,
 	return saved_reg;
 }
 
-static void report_syscall_entry(struct pt_regs *regs)
+static int report_syscall_entry(struct pt_regs *regs)
 {
 	unsigned long saved_reg;
-	int regno;
+	int regno, ret;
 
 	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_ENTER, &regno);
-	if (ptrace_report_syscall_entry(regs))
+	ret = ptrace_report_syscall_entry(regs);
+	if (ret)
 		forget_syscall(regs);
 	regs->regs[regno] = saved_reg;
+
+	return ret;
 }
 
 static void report_syscall_exit(struct pt_regs *regs)
@@ -2407,10 +2410,11 @@
 int syscall_trace_enter(struct pt_regs *regs)
 {
 	unsigned long flags = read_thread_flags();
+	int ret;
 
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		report_syscall_entry(regs);
-		if (flags & _TIF_SYSCALL_EMU)
+		ret = report_syscall_entry(regs);
+		if (ret || (flags & _TIF_SYSCALL_EMU))
 			return NO_SYSCALL;
 	}
 
--
2.34.1
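For comparison, the generic entry's ptrace step bails out the same way;
it looks roughly like the following (a simplified sketch of the
kernel/entry code, not part of this patch):

| 	if (work & SYSCALL_WORK_SYSCALL_TRACE) {
| 		ret = ptrace_report_syscall_entry(regs);
| 		if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
| 			return -1L;
| 	}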
The generic syscall entry code has the following form, which uses the
input syscall work flags:

| syscall_trace_enter(struct pt_regs *regs, unsigned long work)
|
| syscall_exit_work(struct pt_regs *regs, unsigned long work)

In preparation for moving arm64 over to the generic entry code,
refactor syscall_trace_enter/exit() to also take the thread flags as a
parameter, and get the syscall number via the syscall_get_nr() helper.

No functional changes.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h |  4 ++--
 arch/arm64/kernel/ptrace.c       | 26 +++++++++++++++++---------
 arch/arm64/kernel/syscall.c      |  5 +++--
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 712daa90e643..7feb3cda6530 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -114,7 +114,7 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-int syscall_trace_enter(struct pt_regs *regs);
-void syscall_trace_exit(struct pt_regs *regs);
+int syscall_trace_enter(struct pt_regs *regs, unsigned long flags);
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
 
 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index f333791ffba6..9f9aa3087c09 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2407,9 +2407,9 @@ static void report_syscall_exit(struct pt_regs *regs)
 	}
 }
 
-int syscall_trace_enter(struct pt_regs *regs)
+int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 {
-	unsigned long flags = read_thread_flags();
+	long syscall;
 	int ret;
 
 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
@@ -2422,19 +2422,27 @@ int syscall_trace_enter(struct pt_regs *regs)
 	if (secure_computing() == -1)
 		return NO_SYSCALL;
 
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-		trace_sys_enter(regs, regs->syscallno);
+	/* Either of the above might have changed the syscall number */
+	syscall = syscall_get_nr(current, regs);
 
-	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
+	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
+		trace_sys_enter(regs, syscall);
+
+		/*
+		 * Probes or BPF hooks in the tracepoint may have changed the
+		 * system call number as well.
+		 */
+		syscall = syscall_get_nr(current, regs);
+	}
+
+	audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);
 
-	return regs->syscallno;
+	return syscall;
 }
 
-void syscall_trace_exit(struct pt_regs *regs)
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
-	unsigned long flags = read_thread_flags();
-
 	audit_syscall_exit(regs);
 
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index c062badd1a56..e8fd0d60ab09 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -124,7 +124,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		 */
 		if (scno == NO_SYSCALL)
 			syscall_set_return_value(current, regs, -ENOSYS, 0);
-		scno = syscall_trace_enter(regs);
+		scno = syscall_trace_enter(regs, flags);
 		if (scno == NO_SYSCALL)
 			goto trace_exit;
 	}
@@ -143,7 +143,8 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	}
 
 trace_exit:
-	syscall_trace_exit(regs);
+	flags = read_thread_flags();
+	syscall_trace_exit(regs, flags);
 }
 
 void do_el0_svc(struct pt_regs *regs)
--
2.34.1
The generic entry code expands secure_computing() in place and calls
__secure_computing() directly. In order to switch arm64 over to the
generic entry code, expand secure_computing() in the same way in
syscall_trace_enter().

No functional changes.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/ptrace.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 9f9aa3087c09..5a7e97e94358 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2419,8 +2419,11 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 	}
 
 	/* Do the secure computing after ptrace; failures should be fast. */
-	if (secure_computing() == -1)
-		return NO_SYSCALL;
+	if (flags & _TIF_SECCOMP) {
+		ret = __secure_computing();
+		if (ret == -1)
+			return NO_SYSCALL;
+	}
 
 	/* Either of the above might have changed the syscall number */
 	syscall = syscall_get_nr(current, regs);
@@ -2438,7 +2441,7 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 	audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);
 
-	return syscall;
+	return ret ? : syscall;
 }
 
 void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
--
2.34.1
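The generic entry expansion this mirrors looks roughly like the
following (a simplified sketch of the kernel/entry code, not part of
this patch):

| 	/* Do seccomp after ptrace, to catch any tracer changes. */
| 	if (work & SYSCALL_WORK_SECCOMP) {
| 		ret = __secure_computing();
| 		if (ret == -1L)
| 			return ret;
| 	}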
The generic entry code checks the audit context first and uses the
syscall_get_arguments() helper. In order to switch arm64 over to the
generic entry code:

- Also use syscall_get_arguments() to get audit_syscall_entry()'s last
  four parameters.
- Extract a syscall_enter_audit() helper to make it clear.
- Check the audit context first, which saves an unnecessary memcpy when
  the current process's audit context is NULL.

Overall these changes make syscall_enter_audit() exactly equivalent to
the generic one.

No functional changes.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/ptrace.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 5a7e97e94358..e3dcadf13e99 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2407,6 +2407,16 @@ static void report_syscall_exit(struct pt_regs *regs)
 	}
 }
 
+static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
+{
+	if (unlikely(audit_context())) {
+		unsigned long args[6];
+
+		syscall_get_arguments(current, regs, args);
+		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
+	}
+}
+
 int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 {
 	long syscall;
@@ -2438,8 +2448,7 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 		syscall = syscall_get_nr(current, regs);
 	}
 
-	audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
-			    regs->regs[2], regs->regs[3]);
+	syscall_enter_audit(regs, syscall);
 
 	return ret ? : syscall;
 }
--
2.34.1
Commit a9f3a74a29af ("entry: Provide generic syscall exit function")
introduced the generic syscall exit function, which calls rseq_syscall()
before audit_syscall_exit() and arch_syscall_exit_tracehook(). Commit
b74406f37737 ("arm: Add syscall detection for restartable sequences")
added rseq support for arm32, which also calls rseq_syscall() before
audit_syscall_exit() and tracehook_report_syscall().

However, commit 409d5db49867c ("arm64: rseq: Implement backend rseq
calls and select HAVE_RSEQ") implemented arm64 rseq and calls
rseq_syscall() after audit_syscall_exit() and
tracehook_report_syscall(). So compared to the generic entry and arm32
code, arm64 calls rseq_syscall() a bit later.

As commit b74406f37737 ("arm: Add syscall detection for restartable
sequences") said, syscalls are not allowed inside restartable
sequences, so rseq_syscall() should be called at the very beginning of
the syscall exit path on CONFIG_DEBUG_RSEQ=y kernels. This helps detect
syscalls issued inside a restartable sequence.

To align the order of the calls with the generic entry code, move
rseq_syscall() before audit_syscall_exit().

As for the impact of raising SIGSEGV via rseq_syscall(), it makes no
practical difference to signal delivery because signals are processed
in arm64_exit_to_user_mode() at the very end. As for the regs:
rseq_syscall() only checks instruction_pointer(regs) when
CONFIG_DEBUG_RSEQ=y, and audit_syscall_exit() only checks the return
value (x0 on arm64), so calling rseq_syscall() before or after
audit_syscall_exit() makes no difference. trace_sys_exit() only uses
the syscall number and the return value, so calling rseq_syscall()
before or after trace_sys_exit() makes no difference either. But ptrace
can also modify the PC on the syscall exit path, so when
CONFIG_DEBUG_RSEQ=y: before this patch, rseq could observe the PC
modified by ptrace on the syscall exit path; after this patch, rseq no
longer sees modifications made by ptrace.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/ptrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e3dcadf13e99..50344a2d7b88 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2455,6 +2455,8 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 
 void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
+	rseq_syscall(regs);
+
 	audit_syscall_exit(regs);
 
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
@@ -2462,8 +2464,6 @@ void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 
 	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
 		report_syscall_exit(regs);
-
-	rseq_syscall(regs);
 }
 
 /*
--
2.34.1
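For context, the CONFIG_DEBUG_RSEQ check being moved here is roughly
the following (a simplified sketch of kernel/rseq.c, not part of this
patch); it only reads the instruction pointer and raises SIGSEGV when a
syscall was issued from inside a registered rseq critical section:

| void rseq_syscall(struct pt_regs *regs)
| {
| 	unsigned long ip = instruction_pointer(regs);
| 	struct task_struct *t = current;
| 	struct rseq_cs rseq_cs;
|
| 	if (!t->rseq)
| 		return;
| 	if (rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
| 		force_sig(SIGSEGV);
| }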
The generic syscall_exit_work() has the following content:

| audit_syscall_exit(regs)
| trace_sys_exit(regs, ...)
| ptrace_report_syscall_exit(regs, step)

The generic syscall_exit_to_user_mode_work() has the following form:

| unsigned long work = READ_ONCE(current_thread_info()->syscall_work)
| rseq_syscall()
| if (unlikely(work & SYSCALL_WORK_EXIT))
|	syscall_exit_work(regs, work)

In preparation for moving arm64 over to the generic entry code, rework
el0_svc_common() as below:

- Rename syscall_trace_exit() to syscall_exit_work().

- Add a syscall_exit_to_user_mode_work() function to replace the
  combination of read_thread_flags() and syscall_exit_work(), and move
  the syscall exit check logic into it. Move the has_syscall_work()
  helper into asm/syscall.h for reuse.

- Since rseq_syscall() is now always called and is itself guarded by
  the CONFIG_DEBUG_RSEQ macro, remove the CONFIG_DEBUG_RSEQ check.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h |  7 ++++++-
 arch/arm64/kernel/ptrace.c       | 14 +++++++++++---
 arch/arm64/kernel/syscall.c      | 20 +-------------------
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 7feb3cda6530..bee5208c0006 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -114,7 +114,12 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
+static inline bool has_syscall_work(unsigned long flags)
+{
+	return unlikely(flags & _TIF_SYSCALL_WORK);
+}
+
 int syscall_trace_enter(struct pt_regs *regs, unsigned long flags);
-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
+void syscall_exit_to_user_mode_work(struct pt_regs *regs);
 
 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 50344a2d7b88..499ee0a5b9cd 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2453,10 +2453,8 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 	return ret ? : syscall;
 }
 
-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
+static void syscall_exit_work(struct pt_regs *regs, unsigned long flags)
 {
-	rseq_syscall(regs);
-
 	audit_syscall_exit(regs);
 
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
@@ -2466,6 +2464,16 @@ void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 		report_syscall_exit(regs);
 }
 
+void syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	unsigned long flags = read_thread_flags();
+
+	rseq_syscall(regs);
+
+	if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
+		syscall_exit_work(regs, flags);
+}
+
 /*
  * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
  * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e8fd0d60ab09..66d4da641d97 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -65,11 +65,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	choose_random_kstack_offset(get_random_u16());
 }
 
-static inline bool has_syscall_work(unsigned long flags)
-{
-	return unlikely(flags & _TIF_SYSCALL_WORK);
-}
-
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
@@ -130,21 +125,8 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	}
 
 	invoke_syscall(regs, scno, sc_nr, syscall_table);
-
-	/*
-	 * The tracing status may have changed under our feet, so we have to
-	 * check again. However, if we were tracing entry, then we always trace
-	 * exit regardless, as the old entry assembly did.
-	 */
-	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		flags = read_thread_flags();
-		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
-			return;
-	}
-
 trace_exit:
-	flags = read_thread_flags();
-	syscall_trace_exit(regs, flags);
+	syscall_exit_to_user_mode_work(regs);
 }
 
 void do_el0_svc(struct pt_regs *regs)
--
2.34.1
syscall_exit_work() does not handle seccomp, so do not check
_TIF_SECCOMP in syscall_exit_work().

As the ptrace(2) man page says of PTRACE_SYSEMU and
PTRACE_SYSEMU_SINGLESTEP: "For PTRACE_SYSEMU, continue and stop on
entry to the next system call, which will not be executed. For
PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a
system call." So only syscall entry needs to be reported for
SYSCALL_EMU; do not check _TIF_SYSCALL_EMU in syscall_exit_work()
either.

After this, audit_syscall_exit() and report_syscall_exit() are no
longer called if only SECCOMP and/or SYSCALL_EMU is set. Also remove
has_syscall_work(), as it is now only used in el0_svc_common().

This is another preparation for moving arm64 over to the generic entry
code.

Link: https://man7.org/linux/man-pages/man2/ptrace.2.html
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/syscall.h     | 5 -----
 arch/arm64/include/asm/thread_info.h | 3 +++
 arch/arm64/kernel/ptrace.c           | 2 +-
 arch/arm64/kernel/syscall.c          | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index bee5208c0006..85ed31dfe293 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -114,11 +114,6 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-static inline bool has_syscall_work(unsigned long flags)
-{
-	return unlikely(flags & _TIF_SYSCALL_WORK);
-}
-
 int syscall_trace_enter(struct pt_regs *regs, unsigned long flags);
 void syscall_exit_to_user_mode_work(struct pt_regs *regs);
 
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 24fcd6adaa33..ef1462b9b00b 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -110,6 +110,9 @@ void arch_setup_new_exec(void);
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
 				 _TIF_SYSCALL_EMU)
 
+#define _TIF_SYSCALL_EXIT_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
+				 _TIF_SYSCALL_TRACEPOINT)
+
 #ifdef CONFIG_SHADOW_CALL_STACK
 #define INIT_SCS							\
 	.scs_base	= init_shadow_call_stack,			\
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 499ee0a5b9cd..e68aa386ea48 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2470,7 +2470,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs)
 
 	rseq_syscall(regs);
 
-	if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
+	if (unlikely(flags & _TIF_SYSCALL_EXIT_WORK) || flags & _TIF_SINGLESTEP)
 		syscall_exit_work(regs, flags);
 }
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 66d4da641d97..ec478fc37a9f 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -101,7 +101,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		return;
 	}
 
-	if (has_syscall_work(flags)) {
+	if (unlikely(flags & _TIF_SYSCALL_WORK)) {
 		/*
 		 * The de-facto standard way to skip a system call using ptrace
 		 * is to set the system call to -1 (NO_SYSCALL) and set x0 to a
--
2.34.1
The generic report_single_step() always returns false if SYSCALL_EMU is
set, but arm64 only checks _TIF_SINGLESTEP and does not check
_TIF_SYSCALL_EMU. This means that if both _TIF_SINGLESTEP and
_TIF_SYSCALL_EMU are set, the generic entry code will not report a
single-step, whereas arm64 will.

As the ptrace(2) man page says of PTRACE_SYSEMU and
PTRACE_SYSEMU_SINGLESTEP: "For PTRACE_SYSEMU, continue and stop on
entry to the next system call, which will not be executed. For
PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a
system call." And as the comment on the generic report_single_step()
says, if SYSCALL_EMU is set, the only reason to report is when
SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). Because the syscall
instruction has already been reported in syscall_trace_enter(), there
is no need to report the syscall again in syscall_exit_work().

In preparation for moving arm64 over to the generic entry code:

- Add a report_single_step() helper for arm64 to make this clear.
- Do not call report_syscall_exit() if both _TIF_SYSCALL_EMU and
  _TIF_SINGLESTEP are set.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/ptrace.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e68aa386ea48..6e86aec8d607 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2453,14 +2453,25 @@ int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
 	return ret ? : syscall;
 }
 
+static inline bool report_single_step(unsigned long flags)
+{
+	if (flags & _TIF_SYSCALL_EMU)
+		return false;
+
+	return flags & _TIF_SINGLESTEP;
+}
+
 static void syscall_exit_work(struct pt_regs *regs, unsigned long flags)
 {
+	bool step;
+
 	audit_syscall_exit(regs);
 
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
 
-	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
+	step = report_single_step(flags);
+	if (step || flags & _TIF_SYSCALL_TRACE)
 		report_syscall_exit(regs);
 }
 
--
2.34.1
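For comparison, the generic version that this converges towards looks
roughly like the following (a sketch of the kernel/entry code, using
SYSCALL_WORK_* flags instead of TIF flags; not part of this patch):

| static inline bool report_single_step(unsigned long work)
| {
| 	if (work & SYSCALL_WORK_SYSCALL_EMU)
| 		return false;
|
| 	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
| }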
Currently, x86, RISC-V and LoongArch use the generic entry code, which
makes maintainers' work easier and the code more elegant. arm64 has
already switched to the generic IRQ entry, so completely convert arm64
to use the generic entry infrastructure from kernel/entry/*.

The changes are below:

 - Remove the TIF_SYSCALL_* flags.

 - Remove _TIF_SYSCALL_WORK/EXIT_WORK as they are equivalent to
   SYSCALL_WORK_ENTER/EXIT.

 - Implement arch_ptrace_report_syscall_entry/exit() with
   report_syscall_entry/exit() to do the arm64-specific save/restore
   during syscall entry/exit.

 - Remove the arm64 syscall_trace_enter() and related sub-functions
   including syscall_enter_audit(), by calling the generic entry
   functions with similar functionality.

 - Set/clear the SYSCALL_EXIT_TRAP flag when enabling/disabling
   single-step, so that _TIF_SINGLESTEP can be replaced with the
   generic SYSCALL_EXIT_TRAP, _TIF_SYSCALL_EXIT_WORK and
   _TIF_SINGLESTEP can be replaced with the generic SYSCALL_WORK_EXIT,
   and arm64's report_single_step() can be replaced with the generic
   version.

 - Remove arm64's syscall_exit_to_user_mode_work(), syscall_exit_work()
   etc. by using the generic entry functions of the same name.

 - Implement arch_syscall_is_vdso_sigreturn() to support "Syscall User
   Dispatch".

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/Kconfig                    |   2 +-
 arch/arm64/include/asm/entry-common.h |  76 +++++++++++++
 arch/arm64/include/asm/syscall.h      |  19 +++-
 arch/arm64/include/asm/thread_info.h  |  19 +---
 arch/arm64/kernel/debug-monitors.c    |   7 ++
 arch/arm64/kernel/ptrace.c            | 154 --------------------------
 arch/arm64/kernel/signal.c            |   2 +-
 arch/arm64/kernel/syscall.c           |   6 +-
 8 files changed, 107 insertions(+), 178 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..f50b49ce8b65 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -153,9 +153,9 @@ config ARM64
 	select GENERIC_CPU_DEVICES
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_ENTRY
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IOREMAP
-	select GENERIC_IRQ_ENTRY
 	select GENERIC_IRQ_IPI
 	select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	select GENERIC_IRQ_PROBE
diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index cab8cd78f693..d8bf4bf342e8 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -3,14 +3,21 @@
 #ifndef _ASM_ARM64_ENTRY_COMMON_H
 #define _ASM_ARM64_ENTRY_COMMON_H
 
+#include <linux/ptrace.h>
 #include <linux/thread_info.h>
 
+#include <asm/compat.h>
 #include <asm/cpufeature.h>
 #include <asm/daifflags.h>
 #include <asm/fpsimd.h>
 #include <asm/mte.h>
 #include <asm/stacktrace.h>
 
+enum ptrace_syscall_dir {
+	PTRACE_SYSCALL_ENTER = 0,
+	PTRACE_SYSCALL_EXIT,
+};
+
 #define ARCH_EXIT_TO_USER_MODE_WORK (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_FPSTATE)
 
 static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
@@ -54,4 +61,73 @@ static inline bool arch_irqentry_exit_need_resched(void)
 
 #define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
 
+static __always_inline unsigned long ptrace_save_reg(struct pt_regs *regs,
+						     enum ptrace_syscall_dir dir,
+						     int *regno)
+{
+	unsigned long saved_reg;
+
+	/*
+	 * We have some ABI weirdness here in the way that we handle syscall
+	 * exit stops because we indicate whether or not the stop has been
+	 * signalled from syscall entry or syscall exit by clobbering a general
+	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
+	 * and restoring its old value after the stop. This means that:
+	 *
+	 * - Any writes by the tracer to this register during the stop are
+	 *   ignored/discarded.
+	 *
+	 * - The actual value of the register is not available during the stop,
+	 *   so the tracer cannot save it and restore it later.
+	 *
+	 * - Syscall stops behave differently to seccomp and pseudo-step traps
+	 *   (the latter do not nobble any registers).
+	 */
+	*regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[*regno];
+	regs->regs[*regno] = dir;
+
+	return saved_reg;
+}
+
+static __always_inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
+{
+	unsigned long saved_reg;
+	int regno, ret;
+
+	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_ENTER, &regno);
+	ret = ptrace_report_syscall_entry(regs);
+	if (ret)
+		forget_syscall(regs);
+	regs->regs[regno] = saved_reg;
+
+	return ret;
+}
+
+#define arch_ptrace_report_syscall_entry arch_ptrace_report_syscall_entry
+
+static __always_inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+							    int step)
+{
+	unsigned long saved_reg;
+	int regno;
+
+	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_EXIT, &regno);
+	if (!step) {
+		ptrace_report_syscall_exit(regs, 0);
+		regs->regs[regno] = saved_reg;
+	} else {
+		regs->regs[regno] = saved_reg;
+
+		/*
+		 * Signal a pseudo-step exception since we are stepping but
+		 * tracer modifications to the registers may have rewound the
+		 * state machine.
+		 */
+		ptrace_report_syscall_exit(regs, 1);
+	}
+}
+
+#define arch_ptrace_report_syscall_exit arch_ptrace_report_syscall_exit
+
 #endif /* _ASM_ARM64_ENTRY_COMMON_H */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 85ed31dfe293..b6021ff19eee 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -9,6 +9,9 @@
 #include <linux/compat.h>
 #include <linux/err.h>
 
+#include <asm/compat.h>
+#include <asm/vdso.h>
+
 typedef long (*syscall_fn_t)(const struct pt_regs *regs);
 
 extern const syscall_fn_t sys_call_table[];
@@ -114,7 +117,19 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-int syscall_trace_enter(struct pt_regs *regs, unsigned long flags);
-void syscall_exit_to_user_mode_work(struct pt_regs *regs);
+static inline bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
+{
+	unsigned long sigtramp;
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		unsigned long sigpage = (unsigned long)current->mm->context.sigpage;
+
+		return regs->pc >= sigpage && regs->pc < (sigpage + PAGE_SIZE);
+	}
+#endif
+	sigtramp = (unsigned long)VDSO_SYMBOL(current->mm->context.vdso, sigtramp);
+	return regs->pc == (sigtramp + 8);
+}
 
 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index ef1462b9b00b..90be0c590b86 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -43,6 +43,7 @@ struct thread_info {
 	void			*scs_sp;
 #endif
 	u32			cpu;
+	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
 };
 
 #define thread_saved_pc(tsk)	\
@@ -65,11 +66,6 @@ void arch_setup_new_exec(void);
 #define TIF_UPROBE		5	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	6	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	7	/* signal notifications exist */
-#define TIF_SYSCALL_TRACE	8	/* syscall trace active */
-#define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
-#define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
-#define TIF_SECCOMP		11	/* syscall secure computing */
-#define TIF_SYSCALL_EMU		12	/* syscall emulation active */
 #define TIF_PATCH_PENDING	13	/* pending live patching update */
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
@@ -92,27 +88,14 @@ void arch_setup_new_exec(void);
 #define _TIF_NEED_RESCHED_LAZY	(1 << TIF_NEED_RESCHED_LAZY)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
-#define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
-#define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
-#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
-#define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_PATCH_PENDING	(1 << TIF_PATCH_PENDING)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
-#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_TSC_SIGSEGV	(1 << TIF_TSC_SIGSEGV)
 
-#define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
-				 _TIF_SYSCALL_EMU)
-
-#define _TIF_SYSCALL_EXIT_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SYSCALL_TRACEPOINT)
-
 #ifdef CONFIG_SHADOW_CALL_STACK
 #define INIT_SCS							\
 	.scs_base	= init_shadow_call_stack,			\
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 29307642f4c9..e67643a70405 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -385,11 +385,18 @@ void user_enable_single_step(struct task_struct *task)
 
 	if (!test_and_set_ti_thread_flag(ti, TIF_SINGLESTEP))
 		set_regs_spsr_ss(task_pt_regs(task));
+
+	/*
+	 * Ensure that a trap is triggered once stepping out of a system
+	 * call prior to executing any user instruction.
+	 */
+	set_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_enable_single_step);
 
 void user_disable_single_step(struct task_struct *task)
 {
 	clear_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP);
+	clear_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_disable_single_step);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 6e86aec8d607..f575b30b2dc4 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -8,7 +8,6 @@
  * Copyright (C) 2012 ARM Ltd.
  */
 
-#include <linux/audit.h>
 #include <linux/compat.h>
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
@@ -18,7 +17,6 @@
 #include <linux/smp.h>
 #include <linux/ptrace.h>
 #include <linux/user.h>
-#include <linux/seccomp.h>
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
@@ -28,7 +26,6 @@
 #include <linux/hw_breakpoint.h>
 #include <linux/regset.h>
 #include <linux/elf.h>
-#include <linux/rseq.h>
 
 #include <asm/compat.h>
 #include <asm/cpufeature.h>
@@ -38,13 +35,9 @@
 #include <asm/mte.h>
 #include <asm/pointer_auth.h>
 #include <asm/stacktrace.h>
-#include <asm/syscall.h>
 #include <asm/traps.h>
 #include <asm/system_misc.h>
 
-#define CREATE_TRACE_POINTS
-#include <trace/events/syscalls.h>
-
 struct pt_regs_offset {
 	const char *name;
 	int offset;
@@ -2338,153 +2331,6 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ptrace_request(child, request, addr, data);
 }
 
-enum ptrace_syscall_dir {
-	PTRACE_SYSCALL_ENTER = 0,
-	PTRACE_SYSCALL_EXIT,
-};
-
-static __always_inline unsigned long ptrace_save_reg(struct pt_regs *regs,
-						     enum ptrace_syscall_dir dir,
-						     int *regno)
-{
-	unsigned long saved_reg;
-
-	/*
-	 * We have some ABI weirdness here in the way that we handle syscall
-	 * exit stops because we indicate whether or not the stop has been
-	 * signalled from syscall entry or syscall exit by clobbering a general
-	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
-	 * and restoring its old value after the stop. This means that:
-	 *
-	 * - Any writes by the tracer to this register during the stop are
-	 *   ignored/discarded.
-	 *
-	 * - The actual value of the register is not available during the stop,
-	 *   so the tracer cannot save it and restore it later.
-	 *
-	 * - Syscall stops behave differently to seccomp and pseudo-step traps
-	 *   (the latter do not nobble any registers).
-	 */
-	*regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[*regno];
-	regs->regs[*regno] = dir;
-
-	return saved_reg;
-}
-
-static int report_syscall_entry(struct pt_regs *regs)
-{
-	unsigned long saved_reg;
-	int regno, ret;
-
-	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_ENTER, &regno);
-	ret = ptrace_report_syscall_entry(regs);
-	if (ret)
-		forget_syscall(regs);
-	regs->regs[regno] = saved_reg;
-
-	return ret;
-}
-
-static void report_syscall_exit(struct pt_regs *regs)
-{
-	unsigned long saved_reg;
-	int regno;
-
-	saved_reg = ptrace_save_reg(regs, PTRACE_SYSCALL_EXIT, &regno);
-	if (!test_thread_flag(TIF_SINGLESTEP)) {
-		ptrace_report_syscall_exit(regs, 0);
-		regs->regs[regno] = saved_reg;
-	} else {
-		regs->regs[regno] = saved_reg;
-
-		/*
-		 * Signal a pseudo-step exception since we are stepping but
-		 * tracer modifications to the registers may have rewound the
-		 * state machine.
-		 */
-		ptrace_report_syscall_exit(regs, 1);
-	}
-}
-
-static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
-{
-	if (unlikely(audit_context())) {
-		unsigned long args[6];
-
-		syscall_get_arguments(current, regs, args);
-		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
-	}
-}
-
-int syscall_trace_enter(struct pt_regs *regs, unsigned long flags)
-{
-	long syscall;
-	int ret;
-
-	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		ret = report_syscall_entry(regs);
-		if (ret || (flags & _TIF_SYSCALL_EMU))
-			return NO_SYSCALL;
-	}
-
-	/* Do the secure computing after ptrace; failures should be fast. */
-	if (flags & _TIF_SECCOMP) {
-		ret = __secure_computing();
-		if (ret == -1)
-			return NO_SYSCALL;
-	}
-
-	/* Either of the above might have changed the syscall number */
-	syscall = syscall_get_nr(current, regs);
-
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
-		trace_sys_enter(regs, syscall);
-
-		/*
-		 * Probes or BPF hooks in the tracepoint may have changed the
-		 * system call number as well.
-		 */
-		syscall = syscall_get_nr(current, regs);
-	}
-
-	syscall_enter_audit(regs, syscall);
-
-	return ret ? : syscall;
-}
-
-static inline bool report_single_step(unsigned long flags)
-{
-	if (flags & _TIF_SYSCALL_EMU)
-		return false;
-
-	return flags & _TIF_SINGLESTEP;
-}
-
-static void syscall_exit_work(struct pt_regs *regs, unsigned long flags)
-{
-	bool step;
-
-	audit_syscall_exit(regs);
-
-	if (flags & _TIF_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, syscall_get_return_value(current, regs));
-
-	step = report_single_step(flags);
-	if (step || flags & _TIF_SYSCALL_TRACE)
-		report_syscall_exit(regs);
-}
-
-void syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
-	unsigned long flags = read_thread_flags();
-
-	rseq_syscall(regs);
-
-	if (unlikely(flags & _TIF_SYSCALL_EXIT_WORK) || flags & _TIF_SINGLESTEP)
-		syscall_exit_work(regs, flags);
-}
-
 /*
  * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
  * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 1110eeb21f57..d3ec1892b3c7 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -8,8 +8,8 @@
 
 #include <linux/cache.h>
 #include <linux/compat.h>
+#include <linux/entry-common.h>
 #include <linux/errno.h>
-#include <linux/irq-entry-common.h>
 #include <linux/kernel.h>
 #include <linux/signal.h>
 #include <linux/freezer.h>
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index ec478fc37a9f..77d00a5cf0e9 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -2,6 +2,7 @@
 
 #include <linux/compiler.h>
 #include <linux/context_tracking.h>
+#include <linux/entry-common.h>
 #include <linux/errno.h>
 #include <linux/nospec.h>
 #include <linux/ptrace.h>
@@ -68,6 +69,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();
 
 	regs->orig_x0 = regs->regs[0];
@@ -101,7 +103,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		return;
 	}
 
-	if (unlikely(flags & _TIF_SYSCALL_WORK)) {
+	if (unlikely(work & SYSCALL_WORK_ENTER)) {
 		/*
 		 * The de-facto standard way to skip a system call using ptrace
 		 * is to set the system call to -1 (NO_SYSCALL) and set x0 to a
@@ -119,7 +121,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		 */
 		if (scno == NO_SYSCALL)
 			syscall_set_return_value(current, regs, -ENOSYS, 0);
-		scno = syscall_trace_enter(regs, flags);
+		scno = syscall_trace_enter(regs, work);
 		if (scno == NO_SYSCALL)
 			goto trace_exit;
 	}
--
2.34.1
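With the conversion applied, the syscall exit path is the generic one;
its syscall_exit_work() ends up doing roughly the following (a sketch
based on kernel/entry plus the preparatory patches now in sched/core
that introduced the arch_ptrace_report_syscall_exit() hook; not code
from this patch):

| 	audit_syscall_exit(regs);
|
| 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
| 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
|
| 	step = report_single_step(work);
| 	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
| 		arch_ptrace_report_syscall_exit(regs, step);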
After switching arm64 to the generic entry code, the compiler no longer
inlines el0_svc_common() into do_el0_svc(). So mark el0_svc_common() as
__always_inline, which gives a ~1% performance uplift on "perf bench
syscall basic" on Kunpeng 920, based on v6.19-rc1, as below:

| Metric     | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | ------ |
| Total time | 2.195 [sec]    | 2.171 [sec]     | ↓1.1%  |
| usecs/op   | 0.219575       | 0.217192        | ↓1.1%  |
| ops/sec    | 4,554,260      | 4,604,225       | ↑1.1%  |

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 77d00a5cf0e9..6fcd97c46716 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -66,8 +66,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	choose_random_kstack_offset(get_random_u16());
 }
 
-static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
-			   const syscall_fn_t syscall_table[])
+static __always_inline void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
+					   const syscall_fn_t syscall_table[])
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();
--
2.34.1
From: kemal <kmal@cock.li>

Support aarch64 in the sud_test selftest to test "Syscall User
Dispatch", and enable the blocked-return check in sud_benchmark.

Signed-off-by: kemal <kmal@cock.li>
---
 tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c | 2 +-
 tools/testing/selftests/syscall_user_dispatch/sud_test.c      | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c b/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c
index 073a03702ff5..6059abe75cb3 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_benchmark.c
@@ -41,7 +41,7 @@
  * out of the box, but don't enable them until they support syscall user
  * dispatch.
  */
-#if defined(__x86_64__) || defined(__i386__)
+#if defined(__x86_64__) || defined(__i386__) || defined(__aarch64__)
 #define TEST_BLOCKED_RETURN
 #endif
 
diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
index b855c6000287..3ffea2f4a66d 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
@@ -192,6 +192,10 @@ static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
 	((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A0] =
 		((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A7];
 #endif
+#ifdef __aarch64__
+	((ucontext_t *)ucontext)->uc_mcontext.regs[0] = (unsigned int)
+		((ucontext_t *)ucontext)->uc_mcontext.regs[8];
+#endif
 }
 
 int setup_sigsys_handler(void)
--
2.34.1
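For readers unfamiliar with Syscall User Dispatch, a minimal sketch of
the userspace side being exercised here (constants from <linux/prctl.h>,
available since kernel 5.11; error handling omitted):

| #include <signal.h>
| #include <stdio.h>
| #include <sys/prctl.h>
| #include <sys/syscall.h>
| #include <unistd.h>
|
| static volatile char selector = SYSCALL_DISPATCH_FILTER_ALLOW;
|
| static void handle_sigsys(int sig, siginfo_t *info, void *uc)
| {
| 	/* Emulate the syscall here, then let real syscalls through again. */
| 	selector = SYSCALL_DISPATCH_FILTER_ALLOW;
| }
|
| int main(void)
| {
| 	struct sigaction act = { .sa_sigaction = handle_sigsys,
| 				 .sa_flags = SA_SIGINFO };
|
| 	sigaction(SIGSYS, &act, NULL);
| 	/* No always-allowed region; the vDSO sigreturn trampoline is
| 	 * exempted by the kernel, which is exactly what
| 	 * arch_syscall_is_vdso_sigreturn() enables on arm64. */
| 	prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON, 0, 0, &selector);
| 	selector = SYSCALL_DISPATCH_FILTER_BLOCK;
| 	syscall(SYS_getpid);		/* dispatched to handle_sigsys() */
| 	printf("back from SIGSYS\n");
| 	return 0;
| }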