From: Piotr Figiel figiel@google.com
mainline inclusion from mainline-5.13-rc1 commit 90f093fa8ea48e5d991332cee160b761423d55c1 category: feature feature: Userspace percpu bugzilla: https://gitee.com/openeuler/kernel/issues/I4W2BQ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
For userspace checkpoint and restore (C/R) a way of getting process state containing RSEQ configuration is needed.
There are two ways this information is going to be used: - to re-enable RSEQ for threads which had it enabled before C/R - to detect if a thread was in a critical section during C/R
Since C/R preserves TLS memory and addresses RSEQ ABI will be restored using the address registered before C/R.
Detection whether the thread is in a critical section during C/R is needed to enforce behavior of RSEQ abort during C/R. Attaching with ptrace() before registers are dumped itself doesn't cause RSEQ abort. Restoring the instruction pointer within the critical section is problematic because rseq_cs may get cleared before the control is passed to the migrated application code leading to RSEQ invariants not being preserved. C/R code will use RSEQ ABI address to find the abort handler to which the instruction pointer needs to be set.
To achieve above goals expose the RSEQ ABI address and the signature value with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.
This new ptrace request can also be used by debuggers so they are aware of stops within restartable sequences in progress.
Signed-off-by: Piotr Figiel figiel@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Michal Miroslaw emmir@google.com Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Acked-by: Oleg Nesterov oleg@redhat.com Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com Signed-off-by: Yunfeng Ye yeyunfeng@huawei.com Reviewed-by: Chao Liu liuchao173@huawei.com Reviewed-by: Kuohai Xu xukuohai@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/uapi/linux/ptrace.h | 10 ++++++++++ kernel/ptrace.c | 25 +++++++++++++++++++++++++ 2 files changed, 35 insertions(+)
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h index 83ee45fa634b..3747bf816f9a 100644 --- a/include/uapi/linux/ptrace.h +++ b/include/uapi/linux/ptrace.h @@ -102,6 +102,16 @@ struct ptrace_syscall_info { }; };
+#define PTRACE_GET_RSEQ_CONFIGURATION 0x420f + +struct ptrace_rseq_configuration { + __u64 rseq_abi_pointer; + __u32 rseq_abi_size; + __u32 signature; + __u32 flags; + __u32 pad; +}; + /* * These values are stored in task->ptrace_message * by tracehook_report_syscall_* to describe the current syscall-stop. diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 0087ce50d99e..e3210358bcd2 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -31,6 +31,7 @@ #include <linux/cn_proc.h> #include <linux/compat.h> #include <linux/sched/signal.h> +#include <linux/minmax.h>
#include <asm/syscall.h> /* for syscall_get_* */
@@ -795,6 +796,24 @@ static int ptrace_peek_siginfo(struct task_struct *child, return ret; }
+#ifdef CONFIG_RSEQ +static long ptrace_get_rseq_configuration(struct task_struct *task, + unsigned long size, void __user *data) +{ + struct ptrace_rseq_configuration conf = { + .rseq_abi_pointer = (u64)(uintptr_t)task->rseq, + .rseq_abi_size = sizeof(*task->rseq), + .signature = task->rseq_sig, + .flags = 0, + }; + + size = min_t(unsigned long, size, sizeof(conf)); + if (copy_to_user(data, &conf, size)) + return -EFAULT; + return sizeof(conf); +} +#endif + #ifdef PTRACE_SINGLESTEP #define is_singlestep(request) ((request) == PTRACE_SINGLESTEP) #else @@ -1243,6 +1262,12 @@ int ptrace_request(struct task_struct *child, long request, ret = seccomp_get_metadata(child, addr, datavp); break;
+#ifdef CONFIG_RSEQ + case PTRACE_GET_RSEQ_CONFIGURATION: + ret = ptrace_get_rseq_configuration(child, addr, datavp); + break; +#endif + default: break; }