From: Piotr Figiel figiel@google.com
mainline inclusion
from mainline-5.13-rc1
commit 90f093fa8ea48e5d991332cee160b761423d55c1
category: feature
feature: Userspace percpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W2BQ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
For userspace checkpoint and restore (C/R), a way of getting the process state, including the RSEQ configuration, is needed.
There are two ways this information is going to be used:
 - to re-enable RSEQ for threads which had it enabled before C/R
 - to detect if a thread was in a critical section during C/R
Since C/R preserves TLS memory and addresses, the RSEQ ABI will be restored using the address registered before C/R.
Detecting whether the thread is in a critical section during C/R is needed to enforce RSEQ abort behavior across C/R. Attaching with ptrace() before the registers are dumped does not by itself cause an RSEQ abort. Restoring the instruction pointer within the critical section is problematic because rseq_cs may get cleared before control is passed to the migrated application code, so the RSEQ invariants would not be preserved. C/R code will use the RSEQ ABI address to find the abort handler to which the instruction pointer needs to be set.
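The abort-handler lookup described above can be sketched in C. This is a minimal sketch, assuming the C/R tool has already read the tracee's struct rseq at the registered RSEQ ABI address, found a non-zero rseq_cs pointer in it (zero means no critical section is active), and copied out both the struct rseq_cs descriptor and the 32-bit word just before abort_ip (for example with PTRACE_PEEKDATA). The struct mirrors the rseq_cs layout in include/uapi/linux/rseq.h; the helper rseq_restore_ip() is a hypothetical name. The range test is the same unsigned comparison the kernel's in_rseq_cs() uses, and the signature check mirrors the kernel's requirement that the word before abort_ip equal the registered signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of struct rseq_cs (include/uapi/linux/rseq.h). */
struct rseq_cs_mirror {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};
static_assert(sizeof(struct rseq_cs_mirror) == 32, "must match the uapi layout");

/*
 * Hypothetical helper: choose the instruction pointer to restore for a
 * thread that was stopped at `ip`.
 *   sig_before_abort: 32-bit word read from the tracee at abort_ip - 4
 *   expected_sig:     signature reported by PTRACE_GET_RSEQ_CONFIGURATION
 * Returns false if the descriptor would be rejected by the kernel.
 */
static bool rseq_restore_ip(const struct rseq_cs_mirror *cs, uint64_t ip,
			    uint32_t sig_before_abort, uint32_t expected_sig,
			    uint64_t *resume_ip)
{
	/* Unsigned wrap-around makes this one test cover
	 * start_ip <= ip < start_ip + post_commit_offset. */
	if (ip - cs->start_ip >= cs->post_commit_offset) {
		*resume_ip = ip;	/* outside the critical section: keep ip */
		return true;
	}
	if (sig_before_abort != expected_sig)
		return false;		/* signature mismatch */
	*resume_ip = cs->abort_ip;	/* inside: resume at the abort handler */
	return true;
}
```

Redirecting to abort_ip rather than restoring the original ip is what keeps the RSEQ invariants intact: the restored thread behaves as if it had been preempted inside the sequence.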
To achieve the above goals, expose the RSEQ ABI address and the signature value with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.
This new ptrace request can also be used by debuggers so they are aware of stops within restartable sequences in progress.
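A debugger or C/R tool might call the new request roughly as follows. This is a minimal sketch, assuming a Linux 5.13+ kernel built with CONFIG_RSEQ and the C library's ptrace() wrapper; struct rseq_conf_mirror, get_rseq_config() and query_stopped_child() are hypothetical names. As in the ptrace_request() hunk of this patch, the caller's buffer size is passed in the addr argument, and the tracee must be in a ptrace stop. If the C library did not register rseq for the thread, rseq_abi_pointer is 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback for C library headers that predate the request (value from this patch). */
#ifndef PTRACE_GET_RSEQ_CONFIGURATION
#define PTRACE_GET_RSEQ_CONFIGURATION 0x420f
#endif

/* Hypothetical userspace mirror of struct ptrace_rseq_configuration; the
 * distinct name avoids clashing with headers that may already define it. */
struct rseq_conf_mirror {
	uint64_t rseq_abi_pointer;
	uint32_t rseq_abi_size;
	uint32_t signature;
	uint32_t flags;
	uint32_t pad;
};
static_assert(sizeof(struct rseq_conf_mirror) == 24, "must match the uapi layout");

/* Query a stopped tracee. Returns the kernel's struct size or -1 with errno set. */
static long get_rseq_config(pid_t pid, struct rseq_conf_mirror *out)
{
	memset(out, 0, sizeof(*out));
	return ptrace(PTRACE_GET_RSEQ_CONFIGURATION, pid,
		      (void *)sizeof(*out), out);
}

/* Fork a child that stops itself as our tracee, query it, then kill it.
 * Returns 0 on success, -1 if any step fails (ptrace blocked, old kernel...). */
static int query_stopped_child(struct rseq_conf_mirror *out, long *size)
{
	int status, saved;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
			raise(SIGSTOP);	/* becomes a ptrace stop for the parent */
		_exit(1);
	}
	/* WUNTRACED: also report a plain job-control stop, so we never hang. */
	if (waitpid(pid, &status, WUNTRACED) != pid)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;		/* child exited and has been reaped */
	*size = get_rseq_config(pid, out);
	saved = errno;
	kill(pid, SIGKILL);		/* SIGKILL ends a stopped tracee too */
	waitpid(pid, NULL, 0);
	errno = saved;
	return *size < 0 ? -1 : 0;
}
```

On kernels without the request, or built without CONFIG_RSEQ, the call is expected to fail with EIO from the default case of ptrace_request(); in sandboxes that forbid ptrace() the sketch simply reports failure.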
Signed-off-by: Piotr Figiel figiel@google.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Michal Miroslaw emmir@google.com
Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com
Acked-by: Oleg Nesterov oleg@redhat.com
Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com
Signed-off-by: Yunfeng Ye yeyunfeng@huawei.com
Reviewed-by: Chao Liu liuchao173@huawei.com
Reviewed-by: Kuohai Xu xukuohai@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 include/uapi/linux/ptrace.h | 10 ++++++++++
 kernel/ptrace.c             | 25 +++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 83ee45fa634b..3747bf816f9a 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -102,6 +102,16 @@ struct ptrace_syscall_info {
 	};
 };
 
+#define PTRACE_GET_RSEQ_CONFIGURATION	0x420f
+
+struct ptrace_rseq_configuration {
+	__u64 rseq_abi_pointer;
+	__u32 rseq_abi_size;
+	__u32 signature;
+	__u32 flags;
+	__u32 pad;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by tracehook_report_syscall_* to describe the current syscall-stop.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0087ce50d99e..e3210358bcd2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -31,6 +31,7 @@
 #include <linux/cn_proc.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
+#include <linux/minmax.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -795,6 +796,24 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 	return ret;
 }
 
+#ifdef CONFIG_RSEQ
+static long ptrace_get_rseq_configuration(struct task_struct *task,
+					  unsigned long size, void __user *data)
+{
+	struct ptrace_rseq_configuration conf = {
+		.rseq_abi_pointer = (u64)(uintptr_t)task->rseq,
+		.rseq_abi_size = sizeof(*task->rseq),
+		.signature = task->rseq_sig,
+		.flags = 0,
+	};
+
+	size = min_t(unsigned long, size, sizeof(conf));
+	if (copy_to_user(data, &conf, size))
+		return -EFAULT;
+	return sizeof(conf);
+}
+#endif
+
 #ifdef PTRACE_SINGLESTEP
 #define is_singlestep(request)		((request) == PTRACE_SINGLESTEP)
 #else
@@ -1243,6 +1262,12 @@ int ptrace_request(struct task_struct *child, long request,
 		ret = seccomp_get_metadata(child, addr, datavp);
 		break;
 
+#ifdef CONFIG_RSEQ
+	case PTRACE_GET_RSEQ_CONFIGURATION:
+		ret = ptrace_get_rseq_configuration(child, addr, datavp);
+		break;
+#endif
+
 	default:
 		break;
 	}
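The size handling in ptrace_get_rseq_configuration() appears designed so that the structure can grow without breaking callers: the kernel copies at most `size` bytes and always returns sizeof(conf). The sketch below is only a userspace model of that copy-out logic (memcpy stands in for copy_to_user, and struct rseq_conf_model and model_copy_rseq_conf() are hypothetical names), useful for reasoning about what a caller with a smaller or larger buffer will see.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct ptrace_rseq_configuration in this patch. */
struct rseq_conf_model {
	uint64_t rseq_abi_pointer;
	uint32_t rseq_abi_size;
	uint32_t signature;
	uint32_t flags;
	uint32_t pad;
};
static_assert(sizeof(struct rseq_conf_model) == 24, "must match the uapi layout");

/* Model of the kernel helper's copy-out: clamp like min_t(), copy that many
 * bytes, and report the full size of the kernel-side structure. */
static long model_copy_rseq_conf(const struct rseq_conf_model *conf,
				 unsigned long size, void *data)
{
	if (size > sizeof(*conf))
		size = sizeof(*conf);
	memcpy(data, conf, size);
	return (long)sizeof(*conf);
}
```

A caller comparing the return value with its own buffer size can tell whether trailing fields were cut off (return value larger than the buffer) or whether the kernel's structure is smaller than the one it was compiled against (return value smaller, remaining bytes untouched).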
From: Zhang Qiao zhangqiao22@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT
CVE: NA
--------------------------------
Remove the dependency on CONFIG_X86 so that CONFIG_QOS_SCHED can also be enabled on other architectures. Note that, with CONFIG_QOS_SCHED enabled, only x86 servers can handle the priority inversion issue.
Signed-off-by: Zhang Qiao zhangqiao22@huawei.com
Reviewed-by: Cheng Jian cj.chengjian@huawei.com
Reviewed-by: Chen Hui judy.chenhui@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 init/Kconfig | 1 -
 1 file changed, 1 deletion(-)
diff --git a/init/Kconfig b/init/Kconfig
index 4410b711f9dc..17533f1f19d4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -953,7 +953,6 @@ config QOS_SCHED
 	bool "Qos task scheduling"
 	depends on CGROUP_SCHED
 	depends on CFS_BANDWIDTH
-	depends on X86
 
 	default n
From: Zhang Qiao zhangqiao22@huawei.com
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZJT
CVE: NA
--------------------------------
Signed-off-by: Zhang Qiao zhangqiao22@huawei.com
Reviewed-by: Cheng Jian cj.chengjian@huawei.com
Reviewed-by: Chen Hui judy.chenhui@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 arch/arm64/configs/openeuler_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 62f43ff270f9..8d150178b5b8 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -149,6 +149,7 @@ CONFIG_CGROUP_DEVICE=y
 CONFIG_CGROUP_CPUACCT=y
 CONFIG_CGROUP_PERF=y
 CONFIG_CGROUP_BPF=y
+CONFIG_QOS_SCHED=y
 # CONFIG_CGROUP_DEBUG is not set
 CONFIG_SOCK_CGROUP_DATA=y
 CONFIG_CGROUP_FILES=y
From: Chao Liu liuchao173@huawei.com
euler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VPIB
CVE: NA
-------------------------------------------------
If CONFIG_RODATA_FULL_DEFAULT_ENABLED is enabled, block mappings are not used: the linear map is created with 4 KB pages. As a result, the TLB miss rate is high, which hurts performance.
For example, tested with the libMicro benchmark:

                    enable       disable   Improve
memsetP2_10m    3540.37760   2129.715200     66.2%
memset_4k          0.38400      0.204800     87.5%
mprot_twz8k        7.16800      3.072000    133.3%
unmap_ra8k         7.93600      4.096000     93.8%
unmap_wa128k      68.86400     33.024000    108.5%
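The Improve column can be reproduced from the two timing columns: for every row it equals enable / disable - 1, expressed as a percentage (for memset_4k, 0.384 / 0.2048 = 1.875, i.e. 87.5%). A small sketch, assuming the libMicro figures are per-call times where lower is better, as the positive Improve values imply; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* libMicro results from the table: enabled = CONFIG_RODATA_FULL_DEFAULT_ENABLED=y,
 * disabled = option not set. */
struct bench_row {
	const char *name;
	double enabled;
	double disabled;
};

static const struct bench_row rows[] = {
	{ "memsetP2_10m", 3540.37760, 2129.715200 },
	{ "memset_4k",       0.38400,    0.204800 },
	{ "mprot_twz8k",     7.16800,    3.072000 },
	{ "unmap_ra8k",      7.93600,    4.096000 },
	{ "unmap_wa128k",   68.86400,   33.024000 },
};
#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* How much longer a call takes with the option enabled, in percent. */
static double improve_pct(double enabled, double disabled)
{
	assert(disabled > 0.0);
	return (enabled / disabled - 1.0) * 100.0;
}

static void print_improvements(void)
{
	for (size_t i = 0; i < NROWS; i++)
		printf("%-14s %6.1f%%\n", rows[i].name,
		       improve_pct(rows[i].enabled, rows[i].disabled));
}
```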
With this option set to 'n', the additional hardening can still be turned on at boot with rodata=full.
Signed-off-by: Chao Liu liuchao173@huawei.com
Reviewed-by: Kai Liu kai.liu@suse.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 arch/arm64/configs/openeuler_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 8d150178b5b8..18836a160cf4 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -434,7 +434,7 @@ CONFIG_ARM64_CPU_PARK=y
 # CONFIG_XEN is not set
 CONFIG_FORCE_MAX_ZONEORDER=11
 CONFIG_UNMAP_KERNEL_AT_EL0=y
-CONFIG_RODATA_FULL_DEFAULT_ENABLED=y
+# CONFIG_RODATA_FULL_DEFAULT_ENABLED is not set
 CONFIG_ARM64_PMEM_RESERVE=y
 CONFIG_ARM64_PMEM_LEGACY=m
 # CONFIG_ARM64_SW_TTBR0_PAN is not set