Backport some optimization patches for Kunpeng 920 from upstream, including five preparatory patches from Mark Rutland that prepare for unmasking DAIF in the arm64 syscall path, one patch from Guo Hui that optimizes UnixBench syscall performance, and two patches from Barry Song that optimize test_idle_cores() and the barrier handling in gic_ipi_send_mask().
Barry Song (2):
  sched/fair: Optimize test_idle_cores() for !SMT
  irqchip/gic-v3: Use dsb(ishst) to order writes with ICC_SGI1R_EL1 accesses

Guo Hui (1):
  arm64: syscall: unmask DAIF for tracing status

Mark Rutland (5):
  thread_info: Add helpers to snapshot thread flags
  x86: Snapshot thread flags
  entry: Snapshot thread flags
  sched: Snapshot thread flags
  arm64: Snapshot thread flags

 arch/arm64/kernel/ptrace.c   |  4 ++--
 arch/arm64/kernel/signal.c   |  2 +-
 arch/arm64/kernel/syscall.c  |  6 ++----
 arch/x86/kernel/process.c    |  8 ++++----
 arch/x86/kernel/process.h    |  4 ++--
 arch/x86/mm/tlb.c            |  2 +-
 drivers/irqchip/irq-gic-v3.c |  2 +-
 include/linux/entry-kvm.h    |  2 +-
 include/linux/thread_info.h  | 15 +++++++++++++++
 kernel/entry/common.c        |  4 ++--
 kernel/entry/kvm.c           |  4 ++--
 kernel/sched/core.c          |  2 +-
 kernel/sched/fair.c          |  8 +++++---
 13 files changed, 39 insertions(+), 24 deletions(-)
FeedBack: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/9112
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/4...
From: Barry Song <song.bao.hua@hisilicon.com>

mainline inclusion
from mainline-v5.13-rc1
commit c8987ae5af793a73e2c0d6ce804d8ff454ea377c
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5U46
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
update_idle_core() is only done for the case of sched_smt_present, but test_idle_cores() is done for all machines, even those without SMT.

This can contribute up to 8%+ hackbench performance loss on a machine like Kunpeng 920, which has no SMT. This patch removes the redundant test_idle_cores() for !SMT machines.
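To see why the static-key guard is essentially free, here is a minimal sketch of the pattern (illustrative only: the key and function names below are hypothetical stand-ins for the scheduler's sched_smt_present key and the real sd_llc_shared lookup):

#include <linux/jump_label.h>

/* Hypothetical key; the real one is the scheduler's sched_smt_present,
 * enabled only when an SMT sibling comes online. */
DEFINE_STATIC_KEY_FALSE(smt_present_demo);

static bool demo_test_idle_cores(bool def)
{
	/*
	 * While the key is disabled, static_branch_likely() is patched
	 * into a jump straight over the body: no memory load, no runtime
	 * test. On a !SMT machine such as Kunpeng 920 the whole lookup
	 * therefore vanishes from the fast path.
	 */
	if (static_branch_likely(&smt_present_demo))
		return true;	/* stands in for the sds->has_idle_cores read */

	return def;
}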
Hackbench is run with -g {2..14}; for each g it is run 10 times to get an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g =       2      4      6      8      10      12      14
w/o: 1.8151 3.8499 5.5142 7.2491  9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522  8.2882  9.9535 11.3367
                          +4.1%   +8.3%   +7.3%   +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 kernel/sched/fair.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 273f6844bc2a..f431d0152220 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7167,9 +7167,11 @@ static inline bool test_idle_cores(int cpu, bool def)
 {
 	struct sched_domain_shared *sds;

-	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-	if (sds)
-		return READ_ONCE(sds->has_idle_cores);
+	if (static_branch_likely(&sched_smt_present)) {
+		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+		if (sds)
+			return READ_ONCE(sds->has_idle_cores);
+	}

 	return def;
 }
From: Barry Song <song.bao.hua@hisilicon.com>

mainline inclusion
from mainline-v5.18-rc1
commit 80e4e1f472889f31a4dcaea3a4eb7a565296f1f3
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5VG2
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
A dsb(ishst) barrier should be enough to order previous writes with the system register generating the SGI, as we only need to guarantee the visibility of data to other CPUs in the inner shareable domain before we send the SGI.
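For context, a rough sketch of the barrier flavours involved (paraphrased; the exact arm64 definitions in arch/arm64/include/asm/barrier.h should be treated as an assumption here):

/*
 * dsb(st)    - wait for prior stores to complete, full system
 * dsb(ishst) - wait for prior stores to be observable in the inner
 *              shareable domain only, which covers all the CPUs an
 *              SGI can target
 */
#define dsb(opt)	asm volatile("dsb " #opt : : : "memory")

/* wmb() maps to the heavier full-system store barrier, so replacing it
 * with dsb(ishst) is a strict relaxation that remains strong enough. */
#define wmb()		dsb(st)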
A micro-benchmark was written to verify the performance impact on a kunpeng920 machine with 2 sockets; each socket has 2 dies, and each die has 24 CPUs, so in total the system has 2 * 2 * 24 = 96 CPUs. This benchmark shows a ~2% performance improvement.

The code of the benchmark module:
#include <linux/module.h>
#include <linux/timekeeping.h>

volatile int data0 ____cacheline_aligned;
volatile int data1 ____cacheline_aligned;
volatile int data2 ____cacheline_aligned;
volatile int data3 ____cacheline_aligned;
volatile int data4 ____cacheline_aligned;
volatile int data5 ____cacheline_aligned;
volatile int data6 ____cacheline_aligned;

static void ipi_latency_func(void *val)
{
}

static int __init ipi_latency_init(void)
{
	ktime_t stime, etime, delta;
	int cpu, i;
	int start = smp_processor_id();

	stime = ktime_get();
	for (i = 0; i < 1000; i++)
		for (cpu = 0; cpu < 96; cpu++) {
			data0 = data1 = data2 = data3 = data4 = data5 = data6 = cpu;
			smp_call_function_single(cpu, ipi_latency_func, NULL, 1);
		}
	etime = ktime_get();

	delta = ktime_sub(etime, stime);

	printk("%s ipi from cpu%d to cpu0-95 delta of 1000times:%lld\n",
	       __func__, start, delta);

	return 0;
}
module_init(ipi_latency_init);

static void ipi_latency_exit(void)
{
}
module_exit(ipi_latency_exit);

MODULE_DESCRIPTION("IPI benchmark");
MODULE_LICENSE("GPL");
Run the below commands 10 times on both the vanilla kernel and the kernel with this patch:

# taskset -c 0 insmod test.ko
# rmmod test
The result on vanilla:
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126757449
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126784249
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126177703
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127022281
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126184883
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127374585
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:125778089
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126974441
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:127357625
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:126228184

The result on the kernel with this patch:
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124467401
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123474209
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123558497
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122993951
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:122984223
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123323609
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:124507583
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123386963
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123340664
ipi_latency_init ipi from cpu0 to cpu0-95 delta of 1000times:123285324
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
[maz: tidied up commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220220061910.6155-1-21cnbao@gmail.com
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 8b2a45fbc9f7..fcf81e6b6487 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1283,7 +1283,7 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 	 * Ensure that stores to Normal memory are visible to the
 	 * other CPUs before issuing the IPI.
 	 */
-	wmb();
+	dsb(ishst);

 	for_each_cpu(cpu, mask) {
 		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
From: Mark Rutland <mark.rutland@arm.com>

mainline inclusion
from mainline-v5.17-rc1
commit 7ad639840acf2800b5f387c495795f995a67a329
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
In <linux/thread_info.h> there are helpers to manipulate individual thread flags, but where code wants to check several flags at once, it must open-code reading current_thread_info()->flags and operating on a snapshot.

As some flags can be set remotely it's necessary to use READ_ONCE() to get a consistent snapshot even when IRQs are disabled, but some code forgets to do this. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race.
To make it easier to do the right thing, and to highlight that concurrent modification is possible, add new helpers to snapshot the flags, which should be used in preference to plain reads. Subsequent patches will move existing code to use the new helpers.
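As a quick illustration of the intended conversion (a before/after sketch distilled from the patches later in this series, not code added by this patch):

	/* Before: callers open-code the snapshot, sometimes without READ_ONCE() */
	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);

	/* After: the READ_ONCE() lives in one place */
	unsigned long ti_work = read_thread_flags();

	/* For a task other than current */
	unsigned long tif = read_task_thread_flags(tsk);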
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-2-mark.rutland@arm.com
Conflict: include/linux/thread_info.h
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/linux/thread_info.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 19f76d87f20f..af3ca6429d77 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -36,6 +36,21 @@ static inline long set_restart_fn(struct restart_block *restart,

 #define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)

+/*
+ * This may be used in noinstr code, and needs to be __always_inline to prevent
+ * inadvertent instrumentation.
+ */
+static __always_inline unsigned long read_ti_thread_flags(struct thread_info *ti)
+{
+	return READ_ONCE(ti->flags);
+}
+
+#define read_thread_flags() \
+	read_ti_thread_flags(current_thread_info())
+
+#define read_task_thread_flags(t) \
+	read_ti_thread_flags(task_thread_info(t))
+
 #define tif_need_resched()	test_thread_flag(TIF_NEED_RESCHED)

 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
From: Mark Rutland <mark.rutland@arm.com>

mainline inclusion
from mainline-v5.17-rc1
commit dca99fb643a2e9bc2e67a0f626b09d4f177f0f09
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-12-mark.rutland@arm.com
Conflict: arch/x86/mm/tlb.c
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/x86/kernel/process.c | 8 ++++----
 arch/x86/kernel/process.h | 4 ++--
 arch/x86/mm/tlb.c         | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e0c04a26a176..f9343b826651 100755
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -370,7 +370,7 @@ void arch_setup_new_exec(void)
 		clear_thread_flag(TIF_SSBD);
 		task_clear_spec_ssb_disable(current);
 		task_clear_spec_ssb_noexec(current);
-		speculation_ctrl_update(task_thread_info(current)->flags);
+		speculation_ctrl_update(read_thread_flags());
 	}
 }

@@ -622,7 +622,7 @@ static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
 			clear_tsk_thread_flag(tsk, TIF_SPEC_IB);
 	}
 	/* Return the updated threadinfo flags*/
-	return task_thread_info(tsk)->flags;
+	return read_task_thread_flags(tsk);
 }

 void speculation_ctrl_update(unsigned long tif)
@@ -658,8 +658,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	unsigned long tifp, tifn;

-	tifn = READ_ONCE(task_thread_info(next_p)->flags);
-	tifp = READ_ONCE(task_thread_info(prev_p)->flags);
+	tifn = read_task_thread_flags(next_p);
+	tifp = read_task_thread_flags(prev_p);

 	switch_to_bitmap(tifp);

diff --git a/arch/x86/kernel/process.h b/arch/x86/kernel/process.h
index 1d0797b2338a..76b547b83232 100644
--- a/arch/x86/kernel/process.h
+++ b/arch/x86/kernel/process.h
@@ -13,8 +13,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p);
 static inline void switch_to_extra(struct task_struct *prev,
 				   struct task_struct *next)
 {
-	unsigned long next_tif = task_thread_info(next)->flags;
-	unsigned long prev_tif = task_thread_info(prev)->flags;
+	unsigned long next_tif = read_task_thread_flags(next);
+	unsigned long prev_tif = read_task_thread_flags(prev);

 	if (IS_ENABLED(CONFIG_SMP)) {
 		/*
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f4b162f273f5..86a66efa26dc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -318,7 +318,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,

 static unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
 {
-	unsigned long next_tif = task_thread_info(next)->flags;
+	unsigned long next_tif = read_task_thread_flags(next);
 	unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;

 	return (unsigned long)next->mm | ibpb;
From: Mark Rutland <mark.rutland@arm.com>

mainline inclusion
from mainline-v5.17-rc1
commit 6ce895128b3bff738fe8d9dd74747a03e319e466
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-3-mark.rutland@arm.com
Conflict: kernel/entry/common.c
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 include/linux/entry-kvm.h | 2 +-
 kernel/entry/common.c     | 4 ++--
 kernel/entry/kvm.c        | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
index 859fdfd7d46c..4fe38c2d2706 100644
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -60,7 +60,7 @@ int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu);
  */
 static inline bool __xfer_to_guest_mode_work_pending(void)
 {
-	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ti_work = read_thread_flags();

 	return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
 }
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index a028b28daed5..7e4fc453da7d 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -186,7 +186,7 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		/* Check if any of the above work has queued a deferred wakeup */
 		rcu_nocb_flush_deferred_wakeup();

-		ti_work = READ_ONCE(current_thread_info()->flags);
+		ti_work = read_thread_flags();
 	}

 	/* Return the latest work state for arch_exit_to_user_mode() */
@@ -195,7 +195,7 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,

 static void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ti_work = read_thread_flags();

 	lockdep_assert_irqs_disabled();

diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 7b946847be78..6d8b245aad4f 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -25,7 +25,7 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ret)
 			return ret;

-		ti_work = READ_ONCE(current_thread_info()->flags);
+		ti_work = read_thread_flags();
 	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
 	return 0;
 }
@@ -42,7 +42,7 @@ int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu)
 	 * disabled in the inner loop before going into guest mode. No need
 	 * to disable interrupts here.
 	 */
-	ti_work = READ_ONCE(current_thread_info()->flags);
+	ti_work = read_thread_flags();
 	if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
 		return 0;
From: Mark Rutland <mark.rutland@arm.com>

mainline inclusion
from mainline-v5.17-rc1
commit 0569b245132c40015281610353935a50e282eb94
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
The READ_ONCE(ti->flags) .. cmpxchg(ti->flags) loop in set_nr_if_polling() is left as-is for clarity.
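For reference, that loop looks roughly as follows (paraphrased from kernel/sched/core.c of this era; treat the details as an approximation rather than a verbatim quote):

static bool set_nr_if_polling(struct task_struct *p)
{
	struct thread_info *ti = task_thread_info(p);
	typeof(ti->flags) old, val = READ_ONCE(ti->flags);

	for (;;) {
		if (!(val & _TIF_POLLING_NRFLAG))
			return false;
		if (val & _TIF_NEED_RESCHED)
			return true;
		/* Set TIF_NEED_RESCHED only if TIF_POLLING_NRFLAG is still
		 * set; retry if another CPU changed the flags meanwhile.
		 * The explicit READ_ONCE()/cmpxchg() pairing is the point,
		 * which is why it stays open-coded. */
		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
		if (old == val)
			break;
		val = old;
	}
	return true;
}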
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-4-mark.rutland@arm.com
Conflict: kernel/sched/core.c
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 557b01680ae7..770d3e7ace4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7613,7 +7613,7 @@ void sched_show_task(struct task_struct *p)
 	rcu_read_unlock();
 	pr_cont(" stack:%5lu pid:%5d ppid:%6d flags:0x%08lx\n",
 		free, task_pid_nr(p), ppid,
-		(unsigned long)task_thread_info(p)->flags);
+		read_task_thread_flags(p));

 	print_worker_info(KERN_INFO, p);
 	show_stack(p, NULL, KERN_INFO);
From: Mark Rutland <mark.rutland@arm.com>

mainline inclusion
from mainline-v5.17-rc1
commit 342b3808786518ced347f40b59bae68664e20007
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
Some thread flags can be set remotely, and so even when IRQs are disabled, the flags can change under our feet. Generally this is unlikely to cause a problem in practice, but it is somewhat unsound, and KCSAN will legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-7-mark.rutland@arm.com
Conflict: arch/arm64/kernel/entry-common.c
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/arm64/kernel/ptrace.c  | 4 ++--
 arch/arm64/kernel/signal.c  | 2 +-
 arch/arm64/kernel/syscall.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 82115f1b8c38..10db30242494 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2098,7 +2098,7 @@ static void tracehook_report_syscall(struct pt_regs *regs,

 int syscall_trace_enter(struct pt_regs *regs)
 {
-	unsigned long flags = READ_ONCE(current_thread_info()->flags);
+	unsigned long flags = read_thread_flags();

 	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
@@ -2121,7 +2121,7 @@ int syscall_trace_enter(struct pt_regs *regs)

 void syscall_trace_exit(struct pt_regs *regs)
 {
-	unsigned long flags = READ_ONCE(current_thread_info()->flags);
+	unsigned long flags = read_thread_flags();

 	audit_syscall_exit(regs);

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 7437291ff9d2..a0c85bb01751 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -891,7 +891,7 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 		}

 		local_daif_mask();
-		thread_flags = READ_ONCE(current_thread_info()->flags);
+		thread_flags = read_thread_flags();
 	} while (thread_flags & _TIF_WORK_MASK);
 }

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e52f83668de5..d59471fd679c 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -109,7 +109,7 @@ static void cortex_a76_erratum_1463225_svc_handler(void) { }
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
-	unsigned long flags = current_thread_info()->flags;
+	unsigned long flags = read_thread_flags();

 	regs->orig_x0 = regs->regs[0];
 	regs->syscallno = scno;
@@ -177,7 +177,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	 */
 	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
 		local_daif_mask();
-		flags = current_thread_info()->flags;
+		flags = read_thread_flags();
 		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
 			return;
 		local_daif_restore(DAIF_PROCCTX);
From: Guo Hui <guohui@uniontech.com>

mainline inclusion
from mainline-v6.5-rc1
commit 1da185fc8288fadb952e06881d0b75e924780103
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/IA5WFA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------------------------------
The following code:

static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
			   const syscall_fn_t syscall_table[])
{
	...

	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
		local_daif_mask();
		flags = read_thread_flags(); -------------------------------- A
		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
			return;
		local_daif_restore(DAIF_PROCCTX);
	}

trace_exit:
	syscall_trace_exit(regs); ------- B
}
1. The flags read at line A should be treated the same way as the flags used by syscall_trace_exit() at line B: DAIF is not masked in syscall_trace_exit(), so there is no need to mask DAIF when taking the snapshot at line A either. Modifications to the flags after line A do not matter.
2. Masking DAIF caused syscall performance to deteriorate by about 10%. The Unixbench single core syscall performance data is as follows:
Machine: Kunpeng 920
Mask DAIF:

System Call Overhead                1172314.1 lps   (10.0 s, 7 samples)

System Benchmarks Partial Index     BASELINE       RESULT    INDEX
System Call Overhead                 15000.0    1172314.1    781.5
                                                           ========
System Benchmarks Index Score (Partial Only)                 781.5

Unmask DAIF:

System Call Overhead                1287944.6 lps   (10.0 s, 7 samples)

System Benchmarks Partial Index     BASELINE       RESULT    INDEX
System Call Overhead                 15000.0    1287944.6    858.6
                                                           ========
System Benchmarks Index Score (Partial Only)                 858.6
Rationale from Mark Rutland as for why this is safe:
This masking is an artifact of the old "ret_fast_syscall" assembly that was converted to C in commit:
f37099b6992a0b81 ("arm64: convert syscall trace logic to C")
The assembly would mask DAIF, check the thread flags, and fall through to kernel_exit without unmasking if no tracing was needed. The conversion copied this masking into the C version, though this wasn't strictly necessary.
As (in general) thread flags can be manipulated by other threads, it's not safe to manipulate the thread flags with plain reads and writes, and since commit:
342b3808786518ce ("arm64: Snapshot thread flags")
... we use read_thread_flags() to read the flags atomically.
With this, there is no need to mask DAIF transiently around reading the flags, as we only decide whether to trace while DAIF is masked, and the actual tracing occurs with DAIF unmasked. When el0_svc_common() returns its caller will unconditionally mask DAIF via exit_to_user_mode(), so the masking is redundant.
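Putting the rationale together, the fast-path exit after this patch reduces to the following (the comment is added here for illustration and is not part of the diff below):

	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
		/*
		 * No DAIF masking needed: read_thread_flags() takes an
		 * atomic snapshot, and on return the caller masks DAIF
		 * unconditionally via exit_to_user_mode().
		 */
		flags = read_thread_flags();
		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
			return;
	}

trace_exit:
	syscall_trace_exit(regs);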
Signed-off-by: Guo Hui <guohui@uniontech.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230526024715.8773-1-guohui@uniontech.com
[catalin.marinas@arm.com: updated comment with Mark's rationale]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/arm64/kernel/syscall.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index d59471fd679c..9bd304568d90 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -176,11 +176,9 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	 * exit regardless, as the old entry assembly did.
 	 */
 	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		local_daif_mask();
 		flags = read_thread_flags();
 		if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
 			return;
-		local_daif_restore(DAIF_PROCCTX);
 	}

 trace_exit: