[PATCH OLK-6.6 v7 00/10] Add xcall 2.0 support
Add xcall 2.0 support, which is compatible with xcall 1.0 and adds
dynamic instruction replacement for hardware xcall.

Changes in v7:
- Fix alternative bug in sync ventry.
- Add "xcall" limit for xcall 2.0.

Changes in v6:
- Add "xcall=debug" debug mode to use the "fast syscall" path for every
  system call.
- Use alternative to reduce the baseline performance noise floor when
  "xcall" is not set on the kernel cmdline.
- Move binary_path out of the xcall struct and into the xcall_comm struct.
- Fix1: insert_xcall_locked() checks both "name" and "binary" to avoid
  repeatedly attaching an xcall with the same name.
- Fix2: Check that the binary path refers to an executable file.
- Fix3: Check that the binary path is an absolute path.
- Fix4: Remove the extra free_xcall_comm(info) in the default branch of
  proc_xcall_command().
- Fix5: Add a zero-syscall-hijack check to xcall_prog_register().
- Use __arm64_sys_ni_syscall() instead of defining inv_xcall().
- Replace the -ENOSYS return value with -EINVAL to avoid the warning.

Changes in v5:
- Fix build problem.
- el0t_64_sync_ventry jumps to kernel_ventry to do VMAP_STACK.
- Some cleanups.
- Merge the three refactor patches.

Changes in v4:
- Reuse uprobe_write_opcode() instead of using __replace_page() and
  copy_to_page().
- Handle errors from uprobe_write_opcode().
- Support xcall 1.0 before 920G.
- Change the xcall 1.0 bitmap to a byte array to save check insns.
- Use a percpu variable to save check insns for xcall 1.0.
- A little cleanup.

Changes in v3:
- Add validity checks during xcall registration to prevent out-of-bounds
  access and null pointer dereferences.
- Extract __replace_page() and copy_to_page() from uprobe to avoid code
  duplication.
- Replace with new compatible refactored code.
- Remove unnecessary members, such as old_name.

Changes in v2:
- Remove ~300 LOC of duplicated or unnecessary code, such as link_slot
  and init_task.
- Fix kabi.
- Some cleanups.
Jinjie Ruan (2):
  xcall2.0: Support xcall1.0 for hardware xcall
  openeuler_defconfig: Enable CONFIG_DYNAMIC_XCALL by default

Liao Chang (6):
  xcall2.0: Add userspace proc interface
  xcall2.0: Add xcall module register interface
  xcall2.0: Add xcall_area
  xcall2.0: Hijack syscall with dynamic instruction replacement
  xcall: Rework the early exception vector of XCALL and SYNC
  xcall2.0: Add a basic testcase

Yipeng Zou (1):
  xcall: Add debug mode to force fast syscall for all threads

Yuntao Liu (1):
  xcall2.0: Introduce xcall2.0 prefetch kernel module

 arch/arm64/Kconfig.turbo                   |  13 +
 arch/arm64/configs/openeuler_defconfig     |   1 +
 arch/arm64/include/asm/exception.h         |   4 +-
 arch/arm64/include/asm/mmu_context.h       |   7 -
 arch/arm64/include/asm/xcall.h             | 154 ++++----
 arch/arm64/kernel/cpufeature.c             |  76 ++--
 arch/arm64/kernel/entry-common.c           |  28 +-
 arch/arm64/kernel/entry.S                  |  86 +----
 arch/arm64/kernel/probes/uprobes.c         |   6 +
 arch/arm64/kernel/process.c                |   5 +
 arch/arm64/kernel/syscall.c                |  14 +
 arch/arm64/kernel/xcall/Makefile           |   3 +-
 arch/arm64/kernel/xcall/core.c             | 390 +++++++++++++++++++++
 arch/arm64/kernel/xcall/entry.S            | 139 ++++++--
 arch/arm64/kernel/xcall/proc.c             | 226 ++++++++++++
 arch/arm64/kernel/xcall/xcall.c            |  66 ++--
 arch/arm64/kvm/sys_regs.c                  |   1 +
 drivers/staging/Kconfig                    |   2 +
 drivers/staging/Makefile                   |   1 +
 drivers/staging/xcall/Kconfig              |  19 +
 drivers/staging/xcall/Makefile             |   1 +
 drivers/staging/xcall/dynamic_xcall_test.c |  90 +++++
 drivers/staging/xcall/prefetch.c           | 265 ++++++++++++++
 fs/proc/proc_xcall.c                       | 144 ++------
 include/linux/mm_types.h                   |   4 +
 include/linux/xcall.h                      |  51 +++
 kernel/events/uprobes.c                    |  23 ++
 kernel/fork.c                              |   3 +
 mm/mmap.c                                  |  14 +-
 29 files changed, 1462 insertions(+), 374 deletions(-)
 create mode 100644 arch/arm64/kernel/xcall/core.c
 create mode 100644 arch/arm64/kernel/xcall/proc.c
 create mode 100644 drivers/staging/xcall/Kconfig
 create mode 100644 drivers/staging/xcall/Makefile
 create mode 100644 drivers/staging/xcall/dynamic_xcall_test.c
 create mode 100644 drivers/staging/xcall/prefetch.c
 create mode 100644 include/linux/xcall.h

-- 
2.34.1
From: Liao Chang <liaochang1@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Add the "xcall" and "xcall_comm" key structs and provide a procfs
interface to userspace.

Add the '/proc/xcall/comm' interface; this file is used for attaching
xcall programs onto one executable.

Argument syntax:
 +:COMM BINARY KERNEL_MODULE : Attach an xcall
 -:COMM                      : Detach an xcall

 COMM:          Unique string for the attached xcall.
 BINARY:        Path to an executable.
 KERNEL_MODULE: Module name listed in /proc/modules that provides the
                xcall program.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Xinyu <zhengxinyu6@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/Kconfig.turbo         |  12 ++
 arch/arm64/include/asm/xcall.h   |  26 +++-
 arch/arm64/kernel/xcall/Makefile |   3 +-
 arch/arm64/kernel/xcall/core.c   | 158 +++++++++++++++++++++
 arch/arm64/kernel/xcall/proc.c   | 226 +++++++++++++++++++++++++++++++
 include/linux/xcall.h            |  26 ++++
 6 files changed, 449 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/xcall/core.c
 create mode 100644 arch/arm64/kernel/xcall/proc.c
 create mode 100644 include/linux/xcall.h

diff --git a/arch/arm64/Kconfig.turbo b/arch/arm64/Kconfig.turbo
index c4a8e4e889aa..cfefbdb605f8 100644
--- a/arch/arm64/Kconfig.turbo
+++ b/arch/arm64/Kconfig.turbo
@@ -71,4 +71,16 @@ config ACTLR_XCALL_XINT
	  Use the 0x680 as the offset to the exception vector base address
	  for the Armv8.8 NMI taken from EL0.

+config DYNAMIC_XCALL
+	bool "Support dynamic syscall replacement and loading"
+	depends on FAST_SYSCALL
+	default n
+	help
+	  Xcall 2.0 adds the "/proc/xcall/comm" interface to attach
+	  xcall programs onto one executable, and supports custom
+	  syscall implementations via dynamic instruction replacement
+	  with 'svc ffff' plus a kernel module which provides the
+	  customized implementation.
+ endmenu # "Turbo features selection" diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h index 5765a96eed53..9a813d65eb83 100644 --- a/arch/arm64/include/asm/xcall.h +++ b/arch/arm64/include/asm/xcall.h @@ -7,10 +7,34 @@ #include <linux/percpu.h> #include <linux/sched.h> #include <linux/types.h> +#include <linux/xcall.h> #include <asm/actlr.h> #include <asm/cpufeature.h> +struct xcall_comm { + char *name; + char *binary; + struct path binary_path; + char *module; + struct list_head list; +}; + +struct xcall { + /* used for xcall_attach */ + struct list_head list; + refcount_t ref; + /* file attached xcall */ + struct inode *binary; + struct xcall_prog *program; + char *name; +}; + +#ifdef CONFIG_DYNAMIC_XCALL +extern int xcall_attach(struct xcall_comm *info); +extern int xcall_detach(struct xcall_comm *info); +#endif /* CONFIG_DYNAMIC_XCALL */ + DECLARE_STATIC_KEY_FALSE(xcall_enable); struct xcall_info { @@ -93,4 +117,4 @@ static inline void cpu_switch_xcall_entry(struct task_struct *tsk) } #endif /* CONFIG_ACTLR_XCALL_XINT */ -#endif /*__ASM_XCALL_H*/ +#endif /* __ASM_XCALL_H */ diff --git a/arch/arm64/kernel/xcall/Makefile b/arch/arm64/kernel/xcall/Makefile index 0168bd190793..4a9c8eedcba9 100644 --- a/arch/arm64/kernel/xcall/Makefile +++ b/arch/arm64/kernel/xcall/Makefile @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y += xcall.o +obj-y += xcall.o +obj-$(CONFIG_DYNAMIC_XCALL) += core.o proc.o diff --git a/arch/arm64/kernel/xcall/core.c b/arch/arm64/kernel/xcall/core.c new file mode 100644 index 000000000000..863cbb72afa6 --- /dev/null +++ b/arch/arm64/kernel/xcall/core.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Huawei Limited. + */ + +#define pr_fmt(fmt) "xcall: " fmt + +#include <linux/slab.h> +#include <linux/xcall.h> + +#include <asm/xcall.h> + +static DEFINE_SPINLOCK(xcall_list_lock); +static LIST_HEAD(xcalls_list); +static DEFINE_SPINLOCK(prog_list_lock); +static LIST_HEAD(progs_list); + +/* + * Travel the list of all registered xcall_prog during module installation + * to find the xcall_prog. 
+ */ +static struct xcall_prog *get_xcall_prog(const char *module) +{ + struct xcall_prog *p; + + spin_lock(&prog_list_lock); + list_for_each_entry(p, &progs_list, list) { + if (!strcmp(module, p->name)) { + spin_unlock(&prog_list_lock); + return p; + } + } + spin_unlock(&prog_list_lock); + return NULL; +} + + +static struct xcall *get_xcall(struct xcall *xcall) +{ + refcount_inc(&xcall->ref); + return xcall; +} + +static void put_xcall(struct xcall *xcall) +{ + if (!refcount_dec_and_test(&xcall->ref)) + return; + + pr_info("free xcall resource.\n"); + kfree(xcall->name); + if (xcall->program) + module_put(xcall->program->owner); + + kfree(xcall); +} + +static struct xcall *find_xcall(const char *name, struct inode *binary) +{ + struct xcall *xcall; + + list_for_each_entry(xcall, &xcalls_list, list) { + if ((name && !strcmp(name, xcall->name)) || + (binary && xcall->binary == binary)) + return get_xcall(xcall); + } + return NULL; +} + +static struct xcall *find_xcall_by_name_locked(const char *name) +{ + struct xcall *ret = NULL; + + spin_lock(&xcall_list_lock); + ret = find_xcall(name, NULL); + spin_unlock(&xcall_list_lock); + return ret; +} + +static struct xcall *insert_xcall_locked(struct xcall *xcall) +{ + struct xcall *ret = NULL; + + spin_lock(&xcall_list_lock); + ret = find_xcall(xcall->name, xcall->binary); + if (!ret) + list_add(&xcall->list, &xcalls_list); + else + put_xcall(ret); + spin_unlock(&xcall_list_lock); + return ret; +} + +static void delete_xcall(struct xcall *xcall) +{ + spin_lock(&xcall_list_lock); + list_del(&xcall->list); + spin_unlock(&xcall_list_lock); + + put_xcall(xcall); +} + +/* Init xcall with a given inode */ +static int init_xcall(struct xcall *xcall, struct xcall_comm *comm) +{ + struct xcall_prog *program = get_xcall_prog(comm->module); + + if (!program || !try_module_get(program->owner)) + return -EINVAL; + + xcall->binary = d_real_inode(comm->binary_path.dentry); + xcall->program = program; + refcount_set(&xcall->ref, 1); + INIT_LIST_HEAD(&xcall->list); + + return 0; +} + +int xcall_attach(struct xcall_comm *comm) +{ + struct xcall *xcall; + int ret; + + xcall = kzalloc(sizeof(struct xcall), GFP_KERNEL); + if (!xcall) + return -ENOMEM; + + ret = init_xcall(xcall, comm); + if (ret) { + kfree(xcall); + return ret; + } + + xcall->name = kstrdup(comm->name, GFP_KERNEL); + if (!xcall->name) { + delete_xcall(xcall); + return -ENOMEM; + } + + if (insert_xcall_locked(xcall)) { + delete_xcall(xcall); + return -EINVAL; + } + + return 0; +} + +int xcall_detach(struct xcall_comm *comm) +{ + struct xcall *xcall; + + xcall = find_xcall_by_name_locked(comm->name); + if (!xcall) + return -EINVAL; + + put_xcall(xcall); + delete_xcall(xcall); + return 0; +} diff --git a/arch/arm64/kernel/xcall/proc.c b/arch/arm64/kernel/xcall/proc.c new file mode 100644 index 000000000000..4ec52752fb79 --- /dev/null +++ b/arch/arm64/kernel/xcall/proc.c @@ -0,0 +1,226 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Huawei Limited. 
+ */ + +#include <linux/namei.h> +#include <linux/path.h> +#include <linux/proc_fs.h> +#include <linux/seq_file.h> +#include <linux/slab.h> +#include <linux/string.h> +#include <linux/xcall.h> + +#include <asm/xcall.h> + +static LIST_HEAD(comm_list); +static DECLARE_RWSEM(comm_rwsem); + +static void free_xcall_comm(struct xcall_comm *info) +{ + if (!info) + return; + kfree(info->name); + kfree(info->binary); + kfree(info->module); + path_put(&info->binary_path); + kfree(info); +} + +static struct xcall_comm *find_xcall_comm(struct xcall_comm *comm) +{ + struct xcall_comm *temp; + + list_for_each_entry(temp, &comm_list, list) { + if (!strcmp(comm->name, temp->name)) + return temp; + } + + return NULL; +} + +static void delete_xcall_comm_locked(struct xcall_comm *info) +{ + struct xcall_comm *ret; + + down_write(&comm_rwsem); + ret = find_xcall_comm(info); + if (ret) + list_del(&ret->list); + up_write(&comm_rwsem); + free_xcall_comm(ret); +} + +static void insert_xcall_comm_locked(struct xcall_comm *info) +{ + down_write(&comm_rwsem); + if (!find_xcall_comm(info)) + list_add(&info->list, &comm_list); + up_write(&comm_rwsem); +} + +static int is_absolute_path(const char *path) +{ + return path[0] == '/'; +} + +static int parse_xcall_command(int argc, char **argv, + struct xcall_comm *info) +{ + struct dentry *dentry; + + if (strlen(argv[0]) < 3) + return -ECANCELED; + + if (argv[0][0] != '+' && argv[0][0] != '-') + return -ECANCELED; + + if (argv[0][1] != ':') + return -ECANCELED; + + if (argv[0][0] == '+' && argc != 3) + return -ECANCELED; + + if (argv[0][0] == '-' && argc != 1) + return -ECANCELED; + + info->name = kstrdup(&argv[0][2], GFP_KERNEL); + if (!info->name) + return -ENOMEM; + + if (argv[0][0] == '-') + return '-'; + + info->binary = kstrdup(argv[1], GFP_KERNEL); + if (!info->binary || !is_absolute_path(info->binary)) + goto free_name; + + if (kern_path(info->binary, LOOKUP_FOLLOW, &info->binary_path)) + goto free_binary; + + dentry = info->binary_path.dentry; + if (!dentry || !S_ISREG(d_inode(dentry)->i_mode) || + !(d_inode(dentry)->i_mode & 0111)) + goto put_path; + + info->module = kstrdup(argv[2], GFP_KERNEL); + if (!info->module) + goto put_path; + + return argv[0][0]; + +put_path: + path_put(&info->binary_path); +free_binary: + kfree(info->binary); +free_name: + kfree(info->name); + return 'x'; +} + +/* + * /proc/xcall/comm + * Argument syntax: + * +:COMM ELF_FILE [KERNEL_MODULE] : Attach a xcall + * -:COMM : Detach a xcall + * + * COMM: : Unique string for attached xcall. + * ELF_FILE : Path to an executable or library. + * KERNEL_MODULE : Module name listed in /proc/modules provide xcall program. 
+ */ +int proc_xcall_command(int argc, char **argv) +{ + struct xcall_comm *info; + int ret, op; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return -ENOMEM; + INIT_LIST_HEAD(&info->list); + + op = parse_xcall_command(argc, argv, info); + switch (op) { + case '+': + ret = xcall_attach(info); + if (!ret) + insert_xcall_comm_locked(info); + else + free_xcall_comm(info); + break; + case '-': + ret = xcall_detach(info); + if (!ret) + delete_xcall_comm_locked(info); + free_xcall_comm(info); + break; + default: + kfree(info); + return -EINVAL; + } + + return ret; +} + +static int xcall_comm_show(struct seq_file *m, void *v) +{ + struct xcall_comm *info; + + down_read(&comm_rwsem); + list_for_each_entry(info, &comm_list, list) { + seq_printf(m, "+:%s %s %s\n", + info->name, info->binary, + info->module); + } + up_read(&comm_rwsem); + return 0; +} + +static int xcall_comm_open(struct inode *inode, struct file *file) +{ + return single_open(file, xcall_comm_show, NULL); +} + +static ssize_t xcall_comm_write(struct file *file, + const char __user *user_buf, + size_t nbytes, loff_t *ppos) +{ + int argc = 0, ret = 0; + char *raw_comm; + char **argv; + + raw_comm = memdup_user_nul(user_buf, nbytes - 1); + if (IS_ERR(raw_comm)) + return PTR_ERR(raw_comm); + + argv = argv_split(GFP_KERNEL, raw_comm, &argc); + if (!argv) { + kfree(raw_comm); + return -ENOMEM; + } + + ret = proc_xcall_command(argc, argv); + + argv_free(argv); + + kfree(raw_comm); + + return ret ? ret : nbytes; +} + +static const struct proc_ops xcall_comm_ops = { + .proc_open = xcall_comm_open, + .proc_read = seq_read, + .proc_lseek = seq_lseek, + .proc_write = xcall_comm_write, +}; + +static int __init xcall_proc_init(void) +{ + if (!static_key_enabled(&xcall_enable)) + return 0; + + proc_mkdir("xcall", NULL); + proc_create("xcall/comm", 0644, NULL, &xcall_comm_ops); + return 0; +} +module_init(xcall_proc_init); diff --git a/include/linux/xcall.h b/include/linux/xcall.h new file mode 100644 index 000000000000..5d1b6e91ed56 --- /dev/null +++ b/include/linux/xcall.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2025 Huawei. + */ + +#ifndef _LINUX_XCALL_H +#define _LINUX_XCALL_H + +#include <linux/module.h> + +struct xcall_prog_object { + unsigned long scno; + unsigned long func; +}; + +#define PROG_NAME_LEN 64 +#define MAX_NR_SCNO 32 + +struct xcall_prog { + char name[PROG_NAME_LEN]; + struct module *owner; + struct list_head list; + struct xcall_prog_object objs[MAX_NR_SCNO]; + unsigned int nr_scno; +}; +#endif /* _LINUX_XCALL_H */ -- 2.34.1
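[Editor's note] The parser above accepts '+' with exactly three arguments
and '-' with one. An illustrative session (binary path and module name
are made up for the example; the module providing the xcall program must
already be loaded and registered, which the next patch adds):

  echo "+:demo /usr/bin/redis-server xcall_demo" > /proc/xcall/comm
  cat /proc/xcall/comm     # shows "+:demo /usr/bin/redis-server xcall_demo"
  echo "-:demo" > /proc/xcall/comm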
From: Liao Chang <liaochang1@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Add the xcall_prog_register()/xcall_prog_unregister() interface for
user modules to register xcall syscalls.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Xinyu <zhengxinyu6@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/xcall/core.c | 68 +++++++++++++++++++++++++++++---
 include/linux/xcall.h          | 12 ++++++
 2 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/xcall/core.c b/arch/arm64/kernel/xcall/core.c
index 863cbb72afa6..9624342a2bb5 100644
--- a/arch/arm64/kernel/xcall/core.c
+++ b/arch/arm64/kernel/xcall/core.c
@@ -23,17 +23,23 @@ static struct xcall_prog *get_xcall_prog(const char *module)
 {
	struct xcall_prog *p;

-	spin_lock(&prog_list_lock);
	list_for_each_entry(p, &progs_list, list) {
-		if (!strcmp(module, p->name)) {
-			spin_unlock(&prog_list_lock);
+		if (!strcmp(module, p->name))
			return p;
-		}
	}
-	spin_unlock(&prog_list_lock);
	return NULL;
 }

+static struct xcall_prog *get_xcall_prog_locked(const char *module)
+{
+	struct xcall_prog *ret;
+
+	spin_lock(&prog_list_lock);
+	ret = get_xcall_prog(module);
+	spin_unlock(&prog_list_lock);
+
+	return ret;
+}

 static struct xcall *get_xcall(struct xcall *xcall)
 {
@@ -102,7 +108,7 @@ static void delete_xcall(struct xcall *xcall)
 /* Init xcall with a given inode */
 static int init_xcall(struct xcall *xcall, struct xcall_comm *comm)
 {
-	struct xcall_prog *program = get_xcall_prog(comm->module);
+	struct xcall_prog *program = get_xcall_prog_locked(comm->module);

	if (!program || !try_module_get(program->owner))
		return -EINVAL;
@@ -156,3 +162,53 @@ int xcall_detach(struct xcall_comm *comm)
	delete_xcall(xcall);
	return 0;
 }
+
+static int check_prog(struct xcall_prog *prog)
+{
+	struct xcall_prog_object *obj = prog->objs;
+
+	prog->nr_scno = 0;
+	/* objs[] is zero-terminated and at most MAX_NR_SCNO entries long */
+	while (prog->nr_scno < MAX_NR_SCNO && obj->func) {
+		if (obj->scno >= __NR_syscalls)
+			return -EINVAL;
+
+		prog->nr_scno++;
+		obj++;
+	}
+
+	if (!prog->nr_scno)
+		return -EINVAL;
+
+	pr_info("Successfully registered %d syscalls\n", prog->nr_scno);
+	return 0;
+}
+
+int xcall_prog_register(struct xcall_prog *prog)
+{
+	if (!static_key_enabled(&xcall_enable))
+		return -EACCES;
+
+	if (check_prog(prog))
+		return -EINVAL;
+
+	spin_lock(&prog_list_lock);
+	if (get_xcall_prog(prog->name)) {
+		spin_unlock(&prog_list_lock);
+		return -EBUSY;
+	}
+	list_add(&prog->list, &progs_list);
+	spin_unlock(&prog_list_lock);
+	return 0;
+}
+EXPORT_SYMBOL(xcall_prog_register);
+
+void xcall_prog_unregister(struct xcall_prog *prog)
+{
+	if (!static_key_enabled(&xcall_enable))
+		return;
+
+	spin_lock(&prog_list_lock);
+	list_del(&prog->list);
+	spin_unlock(&prog_list_lock);
+}
+EXPORT_SYMBOL(xcall_prog_unregister);
diff --git a/include/linux/xcall.h b/include/linux/xcall.h
index 5d1b6e91ed56..a37994bc1d9a 100644
--- a/include/linux/xcall.h
+++ b/include/linux/xcall.h
@@ -23,4 +23,16 @@ struct xcall_prog {
	struct xcall_prog_object objs[MAX_NR_SCNO];
	unsigned int nr_scno;
 };
+
+#ifdef CONFIG_DYNAMIC_XCALL
+extern int xcall_prog_register(struct xcall_prog *prog);
+extern void xcall_prog_unregister(struct xcall_prog *prog);
+#else /* !CONFIG_DYNAMIC_XCALL */
+static inline int xcall_prog_register(struct xcall_prog *prog)
+{
+	return -EINVAL;
+}
+static inline void xcall_prog_unregister(struct xcall_prog *prog) {}
+#endif /* CONFIG_DYNAMIC_XCALL */
+
 #endif /* _LINUX_XCALL_H */
-- 2.34.1
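[Editor's note] For reference, a minimal sketch of a client of this
register interface. The module and function names are invented for the
example; a real program must follow the arm64 syscall convention, where
each handler takes struct pt_regs * and returns long:

#include <linux/module.h>
#include <linux/sched.h>
#include <linux/xcall.h>
#include <asm/ptrace.h>
#include <asm/unistd.h>

/* Replacement body for getpid(); returns the same result as the stock syscall. */
static long demo_getpid(struct pt_regs *regs)
{
	return task_tgid_vnr(current);
}

static struct xcall_prog demo_prog = {
	.name  = "xcall_demo",	/* matched against KERNEL_MODULE in /proc/xcall/comm */
	.owner = THIS_MODULE,
	.objs  = {
		{ .scno = __NR_getpid, .func = (unsigned long)demo_getpid },
		{ /* zero terminator, see check_prog() */ },
	},
};

static int __init demo_init(void)
{
	return xcall_prog_register(&demo_prog);
}

static void __exit demo_exit(void)
{
	xcall_prog_unregister(&demo_prog);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");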
From: Liao Chang <liaochang1@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

In xcall 2.0, each process is associated with a unique xcall area.
During mmap, associate an xcall area with any matching executable file
and populate its system call table, preparing to hijack and replace the
targeted system calls.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Xinyu <zhengxinyu6@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/xcall.h |  17 ++++++
 arch/arm64/kernel/xcall/core.c | 108 +++++++++++++++++++++++++++++++++
 include/linux/mm_types.h       |   4 ++
 include/linux/xcall.h          |  13 ++++
 kernel/fork.c                  |   3 +
 mm/mmap.c                      |  14 ++++-
 6 files changed, 156 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h
index 9a813d65eb83..fd0232d5bb99 100644
--- a/arch/arm64/include/asm/xcall.h
+++ b/arch/arm64/include/asm/xcall.h
@@ -4,13 +4,16 @@
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
+#include <linux/mm_types.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/xcall.h>
+#include <linux/refcount.h>

 #include <asm/actlr.h>
 #include <asm/cpufeature.h>
+#include <asm/syscall.h>

 struct xcall_comm {
	char *name;
@@ -30,9 +33,23 @@ struct xcall {
	char *name;
 };

+struct xcall_area {
+	/*
+	 * 0 ... NR_syscalls - 1: function pointers used to hijack the default syscalls
+	 * NR_syscalls ... NR_syscalls * 2 - 1: function pointers provided by the kernel module
+	 */
+	unsigned long sys_call_table[NR_syscalls * 2];
+	refcount_t ref;
+	struct xcall *xcall;
+};
+
 #ifdef CONFIG_DYNAMIC_XCALL
 extern int xcall_attach(struct xcall_comm *info);
 extern int xcall_detach(struct xcall_comm *info);
+
+#define mm_xcall_area(mm)	((struct xcall_area *)((mm)->xcall))
+#else
+#define mm_xcall_area(mm)	(NULL)
 #endif /* CONFIG_DYNAMIC_XCALL */

 DECLARE_STATIC_KEY_FALSE(xcall_enable);
diff --git a/arch/arm64/kernel/xcall/core.c b/arch/arm64/kernel/xcall/core.c
index 9624342a2bb5..153dad1aacc8 100644
--- a/arch/arm64/kernel/xcall/core.c
+++ b/arch/arm64/kernel/xcall/core.c
@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) "xcall: " fmt

 #include <linux/slab.h>
+#include <linux/syscalls.h>
 #include <linux/xcall.h>

 #include <asm/xcall.h>
@@ -41,6 +42,8 @@ static struct xcall_prog *get_xcall_prog_locked(const char *module)
	return ret;
 }

+#define inv_xcall_syscall	((unsigned long)__arm64_sys_ni_syscall)
+
 static struct xcall *get_xcall(struct xcall *xcall)
 {
	refcount_inc(&xcall->ref);
@@ -121,6 +124,111 @@ static int init_xcall(struct xcall *xcall, struct xcall_comm *comm)
	return 0;
 }

+static int fill_xcall_syscall(struct xcall_area *area, struct xcall *xcall)
+{
+	unsigned int scno_offset, scno_count = 0;
+	struct xcall_prog_object *obj;
+
+	obj = xcall->program->objs;
+	while (scno_count < xcall->program->nr_scno && obj->func) {
+		scno_offset = NR_syscalls + obj->scno;
+		if (area->sys_call_table[scno_offset] != inv_xcall_syscall) {
+			pr_err("Process cannot mount more than one xcall.\n");
+			return -EINVAL;
+		}
+
+		area->sys_call_table[scno_offset] = obj->func;
+		obj += 1;
+		scno_count++;
+	}
+
+	return 0;
+}
+
+static struct xcall_area *create_xcall_area(struct mm_struct *mm)
+{
+	struct xcall_area *area;
+	int i;
+
+	area = kzalloc(sizeof(*area), GFP_KERNEL);
+	if (!area)
+		return NULL;
+
+	refcount_set(&area->ref, 1);
+
+	for (i = 0; i < NR_syscalls; i++) {
+		area->sys_call_table[i] = inv_xcall_syscall;
+
area->sys_call_table[i + NR_syscalls] = inv_xcall_syscall; + } + + smp_store_release(&mm->xcall, area); + return area; +} + +/* + * Initialize the xcall data of mm_struct data. + * And register xcall into one address space, which includes create + * the mm_struct associated xcall_area data + */ +int xcall_mmap(struct vm_area_struct *vma, struct mm_struct *mm) +{ + struct xcall_area *area; + struct xcall *xcall; + int ret = -EINVAL; + + if (list_empty(&xcalls_list)) + return 0; + + spin_lock(&xcall_list_lock); + xcall = find_xcall(NULL, file_inode(vma->vm_file)); + spin_unlock(&xcall_list_lock); + if (!xcall) + return ret; + + if (!xcall->program) + goto put_xcall; + + area = mm_xcall_area(mm); + if (!area && !create_xcall_area(mm)) { + ret = -ENOMEM; + goto put_xcall; + } + + area = (struct xcall_area *)READ_ONCE(mm->xcall); + // Each process is allowed to be associated with only one xcall. + if (!cmpxchg(&area->xcall, NULL, xcall) && !fill_xcall_syscall(area, xcall)) + return 0; + +put_xcall: + put_xcall(xcall); + return ret; +} + +void mm_init_xcall_area(struct mm_struct *mm, struct task_struct *p) +{ + struct xcall_area *area = mm_xcall_area(mm); + + if (area) + refcount_inc(&area->ref); +} + +void clear_xcall_area(struct mm_struct *mm) +{ + struct xcall_area *area = mm_xcall_area(mm); + + if (!area) + return; + + if (!refcount_dec_and_test(&area->ref)) + return; + + if (area->xcall) + put_xcall(area->xcall); + + kfree(area); + mm->xcall = NULL; +} + int xcall_attach(struct xcall_comm *comm) { struct xcall *xcall; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5d6ee378d7d4..082839935cc6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1025,7 +1025,11 @@ struct mm_struct { #else KABI_RESERVE(2) #endif +#ifdef CONFIG_DYNAMIC_XCALL + KABI_USE(3, void *xcall) +#else KABI_RESERVE(3) +#endif KABI_RESERVE(4) KABI_RESERVE(5) /* diff --git a/include/linux/xcall.h b/include/linux/xcall.h index a37994bc1d9a..510aebe4e7c0 100644 --- a/include/linux/xcall.h +++ b/include/linux/xcall.h @@ -8,6 +8,10 @@ #include <linux/module.h> +struct vm_area_struct; +struct mm_struct; +struct inode; + struct xcall_prog_object { unsigned long scno; unsigned long func; @@ -27,12 +31,21 @@ struct xcall_prog { #ifdef CONFIG_DYNAMIC_XCALL extern int xcall_prog_register(struct xcall_prog *prog); extern void xcall_prog_unregister(struct xcall_prog *prog); +extern void mm_init_xcall_area(struct mm_struct *mm, struct task_struct *p); +extern void clear_xcall_area(struct mm_struct *mm); +extern int xcall_mmap(struct vm_area_struct *vma, struct mm_struct *mm); #else /* !CONFIG_DYNAMIC_XCALL */ static inline int xcall_prog_register(struct xcall_prog *prog) { return -EINVAL; } static inline void xcall_prog_unregister(struct xcall_prog *prog) {} +static inline void mm_init_xcall_area(struct mm_struct *mm, struct task_struct *p) {} +static inline void clear_xcall_area(struct mm_struct *mm) {} +static inline int xcall_mmap(struct vm_area_struct *vma, struct mm_struct *mm) +{ + return 0; +} #endif /* CONFIG_DYNAMIC_XCALL */ #endif /* _LINUX_XCALL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index e9ce45e1f971..1ceb5583c5d7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -106,6 +106,7 @@ #endif #include <linux/share_pool.h> #include <linux/tick.h> +#include <linux/xcall.h> #include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -1373,6 +1374,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, #if defined(CONFIG_DAMON_MEM_SAMPLING) mm->damon_fifo = 
NULL; #endif + mm_init_xcall_area(mm, p); mm_init_uprobes_state(mm); hugetlb_count_init(mm); @@ -1426,6 +1428,7 @@ static inline void __mmput(struct mm_struct *mm) { VM_BUG_ON(atomic_read(&mm->mm_users)); + clear_xcall_area(mm); uprobe_clear_state(mm); exit_aio(mm); ksm_exit(mm); diff --git a/mm/mmap.c b/mm/mmap.c index ce70c8740f8a..bd99a80d4a93 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -49,6 +49,7 @@ #include <linux/ksm.h> #include <linux/share_pool.h> #include <linux/vm_object.h> +#include <linux/xcall.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> @@ -592,9 +593,12 @@ static inline void vma_complete(struct vma_prepare *vp, if (!vp->skip_vma_uprobe) { uprobe_mmap(vp->vma); + xcall_mmap(vp->vma, mm); - if (vp->adj_next) + if (vp->adj_next) { uprobe_mmap(vp->adj_next); + xcall_mmap(vp->adj_next, mm); + } } } @@ -624,8 +628,10 @@ static inline void vma_complete(struct vma_prepare *vp, goto again; } } - if (vp->insert && vp->file) + if (vp->insert && vp->file) { uprobe_mmap(vp->insert); + xcall_mmap(vp->insert, mm); + } validate_mm(mm); } @@ -2998,8 +3004,10 @@ static unsigned long __mmap_region(struct mm_struct *mm, struct file *file, mm->locked_vm += (len >> PAGE_SHIFT); } - if (file) + if (file) { uprobe_mmap(vma); + xcall_mmap(vma, mm); + } /* * New (or expanded) vma always get soft dirty status. -- 2.34.1
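[Editor's note] The double-width sys_call_table above keeps the hijack
hooks in the first half and the module-provided implementations in the
second half. A minimal sketch of the intended indexing (the helper name
is hypothetical, not part of the patch):

static unsigned long xcall_area_entry(struct xcall_area *area,
				      unsigned int scno, bool module_half)
{
	/* [0, NR_syscalls): hijack hooks; [NR_syscalls, 2 * NR_syscalls): module funcs */
	unsigned int idx = module_half ? NR_syscalls + scno : scno;

	return area->sys_call_table[idx];
}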
From: Liao Chang <liaochang1@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Hijack syscalls with dynamic instruction replacement.

With xcall 2.0, hardware xcall can directly modify the SVC instruction
through dynamic instruction replacement, which avoids unnecessary
system call number checks at the exception entry.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Xinyu <zhengxinyu6@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/Kconfig.turbo           |  1 +
 arch/arm64/include/asm/exception.h |  3 ++
 arch/arm64/include/asm/xcall.h     | 38 +++++++++++++++++
 arch/arm64/kernel/entry-common.c   |  2 +-
 arch/arm64/kernel/probes/uprobes.c |  6 +++
 arch/arm64/kernel/syscall.c        | 14 ++++++
 arch/arm64/kernel/xcall/core.c     | 68 ++++++++++++++++++++++++++++++
 kernel/events/uprobes.c            | 23 ++++++++++
 8 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig.turbo b/arch/arm64/Kconfig.turbo
index cfefbdb605f8..778ea1025c2c 100644
--- a/arch/arm64/Kconfig.turbo
+++ b/arch/arm64/Kconfig.turbo
@@ -74,6 +74,7 @@ config ACTLR_XCALL_XINT
 config DYNAMIC_XCALL
	bool "Support dynamic syscall replacement and loading"
	depends on FAST_SYSCALL
+	depends on UPROBES
	default n
	help
	  Xcall 2.0 adds the "/proc/xcall/comm" interface to attach
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index d69f0e6d53f8..94338104a18c 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -75,6 +75,9 @@ void do_el1_fpac(struct pt_regs *regs, unsigned long esr);
 void do_el0_mops(struct pt_regs *regs, unsigned long esr);
 void do_serror(struct pt_regs *regs, unsigned long esr);
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags);
+#ifdef CONFIG_FAST_SYSCALL
+void do_el0_xcall(struct pt_regs *regs);
+#endif

 void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigned long far);
diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h
index fd0232d5bb99..449c1ad4d50f 100644
--- a/arch/arm64/include/asm/xcall.h
+++ b/arch/arm64/include/asm/xcall.h
@@ -15,6 +15,9 @@
 #include <asm/cpufeature.h>
 #include <asm/syscall.h>

+#define SVC_0000	0xd4000001
+#define SVC_FFFF	0xd41fffe1
+
 struct xcall_comm {
	char *name;
	char *binary;
@@ -43,13 +46,48 @@ struct xcall_area {
	struct xcall *xcall;
 };

+extern const syscall_fn_t *default_sys_call_table(void);
 #ifdef CONFIG_DYNAMIC_XCALL
 extern int xcall_attach(struct xcall_comm *info);
 extern int xcall_detach(struct xcall_comm *info);
+extern int xcall_pre_sstep_check(struct pt_regs *regs);
+extern int set_xcall_insn(struct mm_struct *mm, unsigned long vaddr,
+			  uprobe_opcode_t opcode);

 #define mm_xcall_area(mm)	((struct xcall_area *)((mm)->xcall))
+
+static inline long hijack_syscall(struct pt_regs *regs)
+{
+	struct xcall_area *area = mm_xcall_area(current->mm);
+	unsigned int scno = (unsigned int)regs->regs[8];
+	syscall_fn_t syscall_fn;
+
+	if (likely(!area))
+		return -EINVAL;
+
+	if (unlikely(scno >= __NR_syscalls))
+		return -EINVAL;
+
+	syscall_fn = (syscall_fn_t)area->sys_call_table[scno];
+	return syscall_fn(regs);
+}
+
+static inline const syscall_fn_t *real_syscall_table(void)
+{
+	struct xcall_area *area = mm_xcall_area(current->mm);
+
+	if (likely(!area))
+		return default_sys_call_table();
+
+	return (syscall_fn_t *)(&(area->sys_call_table[__NR_syscalls]));
+}
 #else
 #define mm_xcall_area(mm)	(NULL)
+#define hijack_syscall(regs)
(NULL) +static inline const syscall_fn_t *real_syscall_table(void) +{ + return sys_call_table; +} #endif /* CONFIG_DYNAMIC_XCALL */ DECLARE_STATIC_KEY_FALSE(xcall_enable); diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index 1e8171c1efe7..f4a21c66856a 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -827,7 +827,7 @@ static void noinstr el0_xcall(struct pt_regs *regs) #endif fp_user_discard(); local_daif_restore(DAIF_PROCCTX); - do_el0_svc(regs); + do_el0_xcall(regs); fast_exit_to_user_mode(regs); } diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c index a2f137a595fc..677a9589f9ca 100644 --- a/arch/arm64/kernel/probes/uprobes.c +++ b/arch/arm64/kernel/probes/uprobes.c @@ -6,6 +6,7 @@ #include <linux/ptrace.h> #include <linux/uprobes.h> #include <asm/cacheflush.h> +#include <asm/xcall.h> #include "decode-insn.h" @@ -171,6 +172,11 @@ static int uprobe_breakpoint_handler(struct pt_regs *regs, if (uprobe_pre_sstep_notifier(regs)) return DBG_HOOK_HANDLED; +#ifdef CONFIG_DYNAMIC_XCALL + if (xcall_pre_sstep_check(regs)) + return DBG_HOOK_HANDLED; +#endif + return DBG_HOOK_ERROR; } diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 07cc0dfef1fe..33b63e01d181 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -14,6 +14,7 @@ #include <asm/syscall.h> #include <asm/thread_info.h> #include <asm/unistd.h> +#include <asm/xcall.h> long a32_arm_syscall(struct pt_regs *regs, int scno); long sys_ni_syscall(void); @@ -162,6 +163,15 @@ static inline void delouse_pt_regs(struct pt_regs *regs) } #endif +#ifdef CONFIG_FAST_SYSCALL +void do_el0_xcall(struct pt_regs *regs) +{ + const syscall_fn_t *t = real_syscall_table(); + + el0_svc_common(regs, regs->regs[8], __NR_syscalls, t); +} +#endif + void do_el0_svc(struct pt_regs *regs) { const syscall_fn_t *t = sys_call_table; @@ -173,6 +183,10 @@ void do_el0_svc(struct pt_regs *regs) } #endif +#ifdef CONFIG_DYNAMIC_XCALL + if (!hijack_syscall(regs)) + return; +#endif el0_svc_common(regs, regs->regs[8], __NR_syscalls, t); } diff --git a/arch/arm64/kernel/xcall/core.c b/arch/arm64/kernel/xcall/core.c index 153dad1aacc8..0cbe97db188d 100644 --- a/arch/arm64/kernel/xcall/core.c +++ b/arch/arm64/kernel/xcall/core.c @@ -5,6 +5,7 @@ #define pr_fmt(fmt) "xcall: " fmt +#include <linux/mmap_lock.h> #include <linux/slab.h> #include <linux/syscalls.h> #include <linux/xcall.h> @@ -44,6 +45,66 @@ static struct xcall_prog *get_xcall_prog_locked(const char *module) #define inv_xcall_syscall ((unsigned long)__arm64_sys_ni_syscall) +static long patch_syscall(struct pt_regs *regs); + +static long filter_ksyscall(struct pt_regs *regs) +{ + struct xcall_area *area = mm_xcall_area(current->mm); + unsigned int scno = (unsigned int)regs->regs[8]; + + cmpxchg(&(area->sys_call_table[scno]), filter_ksyscall, patch_syscall); + regs->pc -= AARCH64_INSN_SIZE; + return 0; +} + +static long replay_syscall(struct pt_regs *regs) +{ + regs->pc -= AARCH64_INSN_SIZE; + return 0; +} + +static long patch_syscall(struct pt_regs *regs) +{ + struct xcall_area *area = mm_xcall_area(current->mm); + unsigned int scno = (unsigned int)regs->regs[8]; + syscall_fn_t syscall_fn; + unsigned long old; + int ret; + + old = cmpxchg(&(area->sys_call_table[scno]), patch_syscall, replay_syscall); + if (old != (unsigned long)patch_syscall) { + syscall_fn = (syscall_fn_t)area->sys_call_table[scno]; + return syscall_fn(regs); + } + + regs->pc -= AARCH64_INSN_SIZE; + + 
mmap_write_lock(current->mm); + ret = set_xcall_insn(current->mm, regs->pc, SVC_FFFF); + mmap_write_unlock(current->mm); + + if (!ret) { + xchg(&(area->sys_call_table[scno]), filter_ksyscall); + return 0; + } + + regs->pc += AARCH64_INSN_SIZE; + xchg(&(area->sys_call_table[scno]), patch_syscall); + pr_info("patch xcall insn failed for scno %u at %s.\n", + scno, ret > 0 ? "UPROBE_BRK" : "SVC_FFFF"); + + return ret; +} + +int xcall_pre_sstep_check(struct pt_regs *regs) +{ + struct xcall_area *area = mm_xcall_area(current->mm); + unsigned int scno = (unsigned int)regs->regs[8]; + + return area && (scno < NR_syscalls) && + (area->sys_call_table[scno] != inv_xcall_syscall); +} + static struct xcall *get_xcall(struct xcall *xcall) { refcount_inc(&xcall->ref); @@ -138,6 +199,7 @@ static int fill_xcall_syscall(struct xcall_area *area, struct xcall *xcall) } area->sys_call_table[scno_offset] = obj->func; + area->sys_call_table[obj->scno] = (unsigned long)patch_syscall; obj += 1; scno_count++; } @@ -320,3 +382,9 @@ void xcall_prog_unregister(struct xcall_prog *prog) spin_unlock(&prog_list_lock); } EXPORT_SYMBOL(xcall_prog_unregister); + +const syscall_fn_t *default_sys_call_table(void) +{ + return sys_call_table; +} +EXPORT_SYMBOL(default_sys_call_table); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 609e48784f77..e382f7e4d5d9 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -595,6 +595,29 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v *(uprobe_opcode_t *)&auprobe->insn); } +#ifdef CONFIG_DYNAMIC_XCALL +/* + * Force to patch any instruction without checking the old instruction + * is UPROBE_BRK. + */ +int set_xcall_insn(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode) +{ + struct uprobe uprobe = { .ref_ctr_offset = 0 }; + int ret; + + /* Use the UPROBE_SWBP_INSN to occupy the vaddr avoid uprobe writes it */ + ret = uprobe_write_opcode(&uprobe.arch, mm, vaddr, UPROBE_SWBP_INSN); + if (ret) + return 1; + + ret = uprobe_write_opcode(&uprobe.arch, mm, vaddr, opcode); + if (ret) + return -1; + + return 0; +} +#endif + static struct uprobe *get_uprobe(struct uprobe *uprobe) { refcount_inc(&uprobe->ref); -- 2.34.1
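[Editor's note] A short sketch of the per-slot state machine that
patch_syscall(), replay_syscall() and filter_ksyscall() implement above,
under one plausible reading of the flow (the diagram is illustrative,
not part of the patch):

/*
 * area->sys_call_table[scno] for a hijacked scno:
 *
 *   patch_syscall:   the first trap wins the cmpxchg, rewinds the pc and
 *                    rewrites the user 'svc #0' to SVC_FFFF via
 *                    set_xcall_insn(); on success the slot becomes
 *                    filter_ksyscall and the patched instruction is
 *                    replayed, now taking the xcall entry path.
 *   replay_syscall:  a racing thread that traps while the patch is in
 *                    flight just rewinds the pc and retries.
 *   filter_ksyscall: a later 'svc #0' from a not-yet-patched call site
 *                    of the same scno resets the slot to patch_syscall
 *                    and replays, so that call site gets patched too.
 */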
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Because the default HCR_EL2.TACR value is 1, enable ACTLR_XCALL
system-wide by default to avoid the overhead of the vCPU trapping out
when ACTLR_XCALL is accessed during a scheduling context switch.

Also separate the userspace control interface of xcall into two parts.
The first one aims to register xcall for an individual TASK via
/proc/[pid]/xcall. The second one aims to register xcall for an
individual BINARY file via /proc/xcall/comm. Implementing the first one
requires some cleanup of the code.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/mmu_context.h |  7 ---
 arch/arm64/include/asm/xcall.h       | 76 ----------------------------
 arch/arm64/kernel/cpufeature.c       | 71 +++++++++++++-------------
 arch/arm64/kernel/xcall/entry.S      | 22 +-------
 arch/arm64/kernel/xcall/xcall.c      | 40 +--------------
 arch/arm64/kvm/sys_regs.c            |  1 +
 fs/proc/proc_xcall.c                 | 64 +-----------------------
 7 files changed, 40 insertions(+), 241 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 39595fa03491..a6fb325424e7 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -24,9 +24,6 @@
 #include <asm/cputype.h>
 #include <asm/sysreg.h>
 #include <asm/tlbflush.h>
-#ifdef CONFIG_ACTLR_XCALL_XINT
-#include <asm/xcall.h>
-#endif

 extern bool rodata_full;

@@ -267,10 +264,6 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
	if (prev != next)
		__switch_mm(next);

-#ifdef CONFIG_ACTLR_XCALL_XINT
-	cpu_switch_xcall_entry(tsk);
-#endif
-
	/*
	 * Update the saved TTBR0_EL1 of the scheduled-in task as the previous
	 * value may have not been initialised yet (activate_mm caller) or the
diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h
index 449c1ad4d50f..baf766e8c987 100644
--- a/arch/arm64/include/asm/xcall.h
+++ b/arch/arm64/include/asm/xcall.h
@@ -2,17 +2,12 @@
 #ifndef __ASM_XCALL_H
 #define __ASM_XCALL_H

-#include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include <linux/mm_types.h>
-#include <linux/percpu.h>
 #include <linux/sched.h>
-#include <linux/types.h>
 #include <linux/xcall.h>
 #include <linux/refcount.h>

-#include <asm/actlr.h>
-#include <asm/cpufeature.h>
 #include <asm/syscall.h>

 #define SVC_0000	0xd4000001
@@ -101,75 +96,4 @@ struct xcall_info {

 int xcall_init_task(struct task_struct *p, struct task_struct *orig);
 void xcall_task_free(struct task_struct *p);
-
-#ifdef CONFIG_ACTLR_XCALL_XINT
-struct hw_xcall_info {
-	/* Must be first!
*/ - void *xcall_entry[__NR_syscalls + 1]; - atomic_t xcall_scno_count; - /* keep xcall_entry and xcall scno count consistent */ - spinlock_t lock; -}; - -#define TASK_HW_XINFO(p) ((struct hw_xcall_info *)p->xinfo) -#define XCALL_ENTRY_SIZE (sizeof(unsigned long) * (__NR_syscalls + 1)) - -DECLARE_PER_CPU(void *, __cpu_xcall_entry); -extern void xcall_entry(void); -extern void no_xcall_entry(void); - -static inline bool is_xcall_entry(struct hw_xcall_info *xinfo, unsigned int sc_no) -{ - return xinfo->xcall_entry[sc_no] == xcall_entry; -} - -static inline int set_hw_xcall_entry(struct hw_xcall_info *xinfo, - unsigned int sc_no, bool enable) -{ - spin_lock(&xinfo->lock); - if (enable && !is_xcall_entry(xinfo, sc_no)) { - xinfo->xcall_entry[sc_no] = xcall_entry; - atomic_inc(&xinfo->xcall_scno_count); - } - - if (!enable && is_xcall_entry(xinfo, sc_no)) { - xinfo->xcall_entry[sc_no] = no_xcall_entry; - atomic_dec(&xinfo->xcall_scno_count); - } - spin_unlock(&xinfo->lock); - - return 0; -} - -static inline void cpu_set_arch_xcall(bool enable) -{ - u64 el = read_sysreg(CurrentEL); - u64 val; - - if (el == CurrentEL_EL2) { - val = read_sysreg(actlr_el2); - val = enable ? (val | ACTLR_ELx_XCALL) : (val & ~ACTLR_ELx_XCALL); - write_sysreg(val, actlr_el2); - } else { - val = read_sysreg(actlr_el1); - val = enable ? (val | ACTLR_ELx_XCALL) : (val & ~ACTLR_ELx_XCALL); - write_sysreg(val, actlr_el1); - } -} - -static inline void cpu_switch_xcall_entry(struct task_struct *tsk) -{ - struct hw_xcall_info *xinfo = tsk->xinfo; - - if (!system_uses_xcall_xint() || !tsk->xinfo) - return; - - if (unlikely(atomic_read(&xinfo->xcall_scno_count) > 0)) { - __this_cpu_write(__cpu_xcall_entry, xinfo->xcall_entry); - cpu_set_arch_xcall(true); - } else - cpu_set_arch_xcall(false); -} -#endif /* CONFIG_ACTLR_XCALL_XINT */ - #endif /* __ASM_XCALL_H */ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 7aa13adffcda..23cae64e5528 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2444,6 +2444,39 @@ static void mpam_extra_caps(void) #include <asm/xcall.h> DEFINE_STATIC_KEY_FALSE(xcall_enable); +static int __init xcall_setup(char *str) +{ + static_branch_enable(&xcall_enable); + + return 1; +} +__setup("xcall", xcall_setup); + +static bool has_xcall_support(const struct arm64_cpu_capabilities *entry, int __unused) +{ + return static_key_enabled(&xcall_enable); +} +#endif + +#ifdef CONFIG_FAST_IRQ +bool is_xint_support; +static int __init xint_setup(char *str) +{ + if (!cpus_have_cap(ARM64_HAS_GIC_CPUIF_SYSREGS)) + return 1; + + is_xint_support = true; + return 1; +} +__setup("xint", xint_setup); + +static bool has_xint_support(const struct arm64_cpu_capabilities *entry, int __unused) +{ + return is_xint_support; +} +#endif + +#ifdef CONFIG_ACTLR_XCALL_XINT #define AIDR_ELx_XCALL_SHIFT 32 #define AIDR_ELx_XCALL (UL(1) << AIDR_ELx_XCALL_SHIFT) @@ -2478,40 +2511,6 @@ static bool is_arch_xcall_xint_support(void) return false; } -static int __init xcall_setup(char *str) -{ - if (!is_arch_xcall_xint_support()) - static_branch_enable(&xcall_enable); - - return 1; -} -__setup("xcall", xcall_setup); - -static bool has_xcall_support(const struct arm64_cpu_capabilities *entry, int __unused) -{ - return static_key_enabled(&xcall_enable); -} -#endif - -#ifdef CONFIG_FAST_IRQ -bool is_xint_support; -static int __init xint_setup(char *str) -{ - if (!cpus_have_cap(ARM64_HAS_GIC_CPUIF_SYSREGS)) - return 1; - - is_xint_support = true; - return 1; -} -__setup("xint", 
xint_setup); - -static bool has_xint_support(const struct arm64_cpu_capabilities *entry, int __unused) -{ - return is_xint_support; -} -#endif - -#ifdef CONFIG_ACTLR_XCALL_XINT static bool has_arch_xcall_xint_support(const struct arm64_cpu_capabilities *entry, int scope) { return is_arch_xcall_xint_support(); @@ -2555,14 +2554,14 @@ static void cpu_enable_arch_xcall_xint(const struct arm64_cpu_capabilities *__un el = read_sysreg(CurrentEL); if (el == CurrentEL_EL2) { actlr_el2 = read_sysreg(actlr_el2); - actlr_el2 |= ACTLR_ELx_XINT; + actlr_el2 |= (ACTLR_ELx_XINT | ACTLR_ELx_XCALL); write_sysreg(actlr_el2, actlr_el2); isb(); actlr_el2 = read_sysreg(actlr_el2); pr_info("actlr_el2: %llx, cpu:%d\n", actlr_el2, cpu); } else { actlr_el1 = read_sysreg(actlr_el1); - actlr_el1 |= ACTLR_ELx_XINT; + actlr_el1 |= (ACTLR_ELx_XINT | ACTLR_ELx_XCALL); write_sysreg(actlr_el1, actlr_el1); isb(); actlr_el1 = read_sysreg(actlr_el1); diff --git a/arch/arm64/kernel/xcall/entry.S b/arch/arm64/kernel/xcall/entry.S index 401be46f4fc2..7b75e8651a2a 100644 --- a/arch/arm64/kernel/xcall/entry.S +++ b/arch/arm64/kernel/xcall/entry.S @@ -151,33 +151,13 @@ alternative_else_nop_endif sb .endm /* .macro hw_xcal_restore_base_regs */ -SYM_CODE_START(no_xcall_entry) - ldp x20, x21, [sp, #0] - kernel_entry 0, 64 - mov x0, sp - bl el0t_64_sync_handler - b ret_to_user -SYM_CODE_END(no_xcall_entry) - SYM_CODE_START(xcall_entry) - ldp x20, x21, [sp, #0] hw_xcall_save_base_regs mov x0, sp bl el0t_64_xcall_handler hw_xcal_restore_base_regs SYM_CODE_END(xcall_entry) -SYM_CODE_START_LOCAL(el0t_64_hw_xcall) - stp x20, x21, [sp, #0] - ldr_this_cpu x21, __cpu_xcall_entry, x20 - mov x20, __NR_syscalls - /* x8 >= __NR_syscalls */ - cmp x8, __NR_syscalls - csel x20, x8, x20, lt - ldr x21, [x21, x20, lsl #3] - br x21 -SYM_CODE_END(el0t_64_hw_xcall) - .macro xcall_ventry .align 7 .Lventry_start\@: @@ -190,6 +170,6 @@ SYM_CODE_END(el0t_64_hw_xcall) msr tpidrro_el0, xzr .Lskip_tramp_vectors_cleanup\@: sub sp, sp, #PT_REGS_SIZE - b el0t_64_hw_xcall + b xcall_entry .org .Lventry_start\@ + 128 // Did we overflow the ventry slot? .endm diff --git a/arch/arm64/kernel/xcall/xcall.c b/arch/arm64/kernel/xcall/xcall.c index d8eaec7e4637..31072c0402f4 100644 --- a/arch/arm64/kernel/xcall/xcall.c +++ b/arch/arm64/kernel/xcall/xcall.c @@ -6,7 +6,6 @@ */ #include <linux/bitmap.h> -#include <linux/percpu.h> #include <linux/sched.h> #include <linux/slab.h> #include <asm/xcall.h> @@ -25,45 +24,8 @@ static inline int sw_xcall_init_task(struct task_struct *p, struct task_struct * return 0; } -#ifdef CONFIG_ACTLR_XCALL_XINT -static const void *default_syscall_table[__NR_syscalls + 1] = { - [0 ... 
__NR_syscalls] = no_xcall_entry, -}; - -asmlinkage DEFINE_PER_CPU(void *, __cpu_xcall_entry) = default_syscall_table; -static inline int hw_xcall_init_task(struct task_struct *p, struct task_struct *orig) -{ - struct hw_xcall_info *p_xinfo, *orig_xinfo; - - p->xinfo = kzalloc(sizeof(struct hw_xcall_info), GFP_KERNEL); - if (!p->xinfo) - return -ENOMEM; - - p_xinfo = TASK_HW_XINFO(p); - spin_lock_init(&p_xinfo->lock); - - if (!orig->xinfo) { - memcpy(p->xinfo, default_syscall_table, XCALL_ENTRY_SIZE); - atomic_set(&p_xinfo->xcall_scno_count, 0); - } else { - orig_xinfo = TASK_HW_XINFO(orig); - spin_lock(&orig_xinfo->lock); - memcpy(p->xinfo, orig->xinfo, XCALL_ENTRY_SIZE); - atomic_set(&p_xinfo->xcall_scno_count, - atomic_read(&orig_xinfo->xcall_scno_count)); - spin_unlock(&orig_xinfo->lock); - } - - return 0; -} -#endif - int xcall_init_task(struct task_struct *p, struct task_struct *orig) { -#ifdef CONFIG_ACTLR_XCALL_XINT - if (system_uses_xcall_xint()) - return hw_xcall_init_task(p, orig); -#endif if (static_branch_unlikely(&xcall_enable)) return sw_xcall_init_task(p, orig); @@ -72,6 +34,6 @@ int xcall_init_task(struct task_struct *p, struct task_struct *orig) void xcall_task_free(struct task_struct *p) { - if (system_uses_xcall_xint() || static_branch_unlikely(&xcall_enable)) + if (static_branch_unlikely(&xcall_enable)) kfree(p->xinfo); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 416693a3200a..73d5102e7a10 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -17,6 +17,7 @@ #include <linux/printk.h> #include <linux/uaccess.h> +#include <asm/actlr.h> #include <asm/cacheflush.h> #include <asm/cputype.h> #include <asm/debug-monitors.h> diff --git a/fs/proc/proc_xcall.c b/fs/proc/proc_xcall.c index 5a417bc7cb0a..5f45d0799b33 100644 --- a/fs/proc/proc_xcall.c +++ b/fs/proc/proc_xcall.c @@ -4,57 +4,11 @@ * * Copyright (C) 2025 Huawei Ltd. */ -#include <linux/cpufeature.h> #include <linux/sched.h> #include <linux/seq_file.h> #include <asm/xcall.h> #include "internal.h" -#ifdef CONFIG_ACTLR_XCALL_XINT -static void proc_hw_xcall_show(struct task_struct *p, struct seq_file *m) -{ - struct hw_xcall_info *hw_xinfo = TASK_HW_XINFO(p); - unsigned int i, start = 0, end = 0; - bool in_range = false; - - if (!hw_xinfo) - return; - - for (i = 0; i < __NR_syscalls; i++) { - bool scno_xcall_enable = is_xcall_entry(hw_xinfo, i); - - if (scno_xcall_enable && !in_range) { - in_range = true; - start = i; - } - - if ((!scno_xcall_enable || i == __NR_syscalls - 1) && in_range) { - in_range = false; - end = scno_xcall_enable ? 
i : i - 1; - if (i == start + 1) - seq_printf(m, "%u,", start); - else - seq_printf(m, "%u-%u,", start, end); - } - } - seq_puts(m, "\n"); -} - -static int proc_set_hw_xcall(struct task_struct *p, unsigned int sc_no, - bool is_clear) -{ - struct hw_xcall_info *hw_xinfo = TASK_HW_XINFO(p); - - if (!is_clear) - return set_hw_xcall_entry(hw_xinfo, sc_no, true); - - if (is_clear) - return set_hw_xcall_entry(hw_xinfo, sc_no, false); - - return -EINVAL; -} -#endif - static int xcall_show(struct seq_file *m, void *v) { struct inode *inode = m->private; @@ -62,20 +16,13 @@ static int xcall_show(struct seq_file *m, void *v) unsigned int rs, re; struct xcall_info *xinfo; - if (!system_uses_xcall_xint() && !static_key_enabled(&xcall_enable)) + if (!static_key_enabled(&xcall_enable)) return -EACCES; p = get_proc_task(inode); if (!p) return -ESRCH; -#ifdef CONFIG_ACTLR_XCALL_XINT - if (system_uses_xcall_xint()) { - proc_hw_xcall_show(p, m); - goto out; - } -#endif - xinfo = TASK_XINFO(p); if (!xinfo) goto out; @@ -124,7 +71,7 @@ static ssize_t xcall_write(struct file *file, const char __user *buf, int is_clear = 0; struct xcall_info *xinfo; - if (!system_uses_xcall_xint() && !static_key_enabled(&xcall_enable)) + if (!static_key_enabled(&xcall_enable)) return -EACCES; memset(buffer, 0, sizeof(buffer)); @@ -148,13 +95,6 @@ static ssize_t xcall_write(struct file *file, const char __user *buf, goto out; } -#ifdef CONFIG_ACTLR_XCALL_XINT - if (system_uses_xcall_xint()) { - ret = proc_set_hw_xcall(p, sc_no, is_clear); - goto out; - } -#endif - xinfo = TASK_XINFO(p); if (!is_clear && !test_bit(sc_no, xinfo->xcall_enable)) ret = xcall_enable_one(xinfo, sc_no); -- 2.34.1
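[Editor's note] With the split in place, /proc/[pid]/xcall keeps the
per-task xcall 1.0 controls. A hedged usage sketch, assuming the
pre-existing openEuler semantics that xcall_write() implements (a bare
syscall number enables the fast path for that task, a '!' prefix clears
it):

  echo 73 > /proc/4242/xcall     # enable the fast syscall path for scno 73
  cat /proc/4242/xcall           # prints the enabled scno ranges, e.g. "73,"
  echo '!73' > /proc/4242/xcall  # disable it again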
From: Liao Chang <liaochang1@huawei.com> hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS -------------------------------- Use the symbol el0t_64_svc_entry as the dispatcher of svc exception handler: - el0_slow_syscall: use kernel_entry and ret_to_user to do exception context switch, additionally use el0_svc to invoke syscall functions. - el0_fast_syscall: use hw_xcall_save_base_regs and hw_xcall_restore_base_regs to do low-overhead context switch, additionally use el0_svc to invoke syscall functions. - el0_xcall_syscall: use hw_xcall_save_base_regs and hw_xcall_restore_base_regs to do low-overhead context switch, additionally use el0_xcall to invoke dynamically load syscall functions. Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- arch/arm64/include/asm/exception.h | 1 - arch/arm64/include/asm/xcall.h | 3 +- arch/arm64/kernel/entry-common.c | 26 +++--- arch/arm64/kernel/entry.S | 86 +++---------------- arch/arm64/kernel/process.c | 5 ++ arch/arm64/kernel/xcall/entry.S | 127 +++++++++++++++++++++++++++-- arch/arm64/kernel/xcall/xcall.c | 16 +++- fs/proc/proc_xcall.c | 80 ++++++++---------- 8 files changed, 206 insertions(+), 138 deletions(-) diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h index 94338104a18c..1d87f724719d 100644 --- a/arch/arm64/include/asm/exception.h +++ b/arch/arm64/include/asm/exception.h @@ -83,6 +83,5 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne #ifdef CONFIG_ACTLR_XCALL_XINT asmlinkage void el0t_64_xint_handler(struct pt_regs *regs); -asmlinkage void el0t_64_xcall_handler(struct pt_regs *regs); #endif #endif /* __ASM_EXCEPTION_H */ diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h index baf766e8c987..735a14870e4e 100644 --- a/arch/arm64/include/asm/xcall.h +++ b/arch/arm64/include/asm/xcall.h @@ -89,11 +89,12 @@ DECLARE_STATIC_KEY_FALSE(xcall_enable); struct xcall_info { /* Must be first! 
*/ - DECLARE_BITMAP(xcall_enable, __NR_syscalls); + u8 xcall_enable[__NR_syscalls + 1]; }; #define TASK_XINFO(p) ((struct xcall_info *)p->xinfo) int xcall_init_task(struct task_struct *p, struct task_struct *orig); void xcall_task_free(struct task_struct *p); +void xcall_info_switch(struct task_struct *p); #endif /* __ASM_XCALL_H */ diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index f4a21c66856a..c72993bb4563 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -207,7 +207,7 @@ static __always_inline void fast_enter_from_user_mode(struct pt_regs *regs) mte_disable_tco_entry(current); #endif } -#endif +#endif /* CONFIG_FAST_SYSCALL || CONFIG_FAST_IRQ */ /* * Handle IRQ/context state management when entering an NMI from user/kernel @@ -818,8 +818,8 @@ static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr) } #ifdef CONFIG_FAST_SYSCALL -/* Copy from el0_sync */ -static void noinstr el0_xcall(struct pt_regs *regs) +/* dynamically load syscall handler */ +asmlinkage void noinstr el0_xcall_syscall(struct pt_regs *regs) { fast_enter_from_user_mode(regs); #ifndef CONFIG_SECURITY_FEATURE_BYPASS @@ -831,11 +831,21 @@ static void noinstr el0_xcall(struct pt_regs *regs) fast_exit_to_user_mode(regs); } -asmlinkage void noinstr el0t_64_fast_syscall_handler(struct pt_regs *regs) +/* low-overhead syscall handler */ +asmlinkage void noinstr el0_fast_syscall(struct pt_regs *regs) { - el0_xcall(regs); -} + fast_enter_from_user_mode(regs); +#ifndef CONFIG_SECURITY_FEATURE_BYPASS + cortex_a76_erratum_1463225_svc_handler(); #endif + fp_user_discard(); + local_daif_restore(DAIF_PROCCTX); + do_el0_svc(regs); + fast_exit_to_user_mode(regs); +} + +asmlinkage void el0_slow_syscall(struct pt_regs *regs) __alias(el0_svc); +#endif /* CONFIG_FAST_SYSCALL */ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs) { @@ -1052,10 +1062,6 @@ UNHANDLED(el0t, 32, error) #endif /* CONFIG_AARCH32_EL0 */ #ifdef CONFIG_ACTLR_XCALL_XINT -asmlinkage void noinstr el0t_64_xcall_handler(struct pt_regs *regs) -{ - el0_xcall(regs); -} asmlinkage void noinstr el0t_64_xint_handler(struct pt_regs *regs) { el0_interrupt(regs, ISR_EL1_IS, handle_arch_irq, handle_arch_nmi_irq); diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 5648a3119f90..cceb4526745f 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -552,6 +552,10 @@ tsk .req x28 // current thread_info .text +#ifdef CONFIG_FAST_SYSCALL +#include "xcall/entry.S" +#endif + /* * Exception vectors. 
 */
@@ -569,7 +573,11 @@ SYM_CODE_START(vectors)
	kernel_ventry	1, h, 64, fiq		// FIQ EL1h
	kernel_ventry	1, h, 64, error		// Error EL1h

+#ifdef CONFIG_FAST_SYSCALL
+	sync_ventry				// Synchronous 64-bit EL0
+#else
	kernel_ventry	0, t, 64, sync		// Synchronous 64-bit EL0
+#endif
	kernel_ventry	0, t, 64, irq		// IRQ 64-bit EL0
	kernel_ventry	0, t, 64, fiq		// FIQ 64-bit EL0
	kernel_ventry	0, t, 64, error		// Error 64-bit EL0
@@ -581,8 +589,6 @@ SYM_CODE_START(vectors)
 SYM_CODE_END(vectors)

 #ifdef CONFIG_ACTLR_XCALL_XINT
-#include "xcall/entry.S"
-
	.align	11
 SYM_CODE_START(vectors_xcall_xint)
	kernel_ventry	1, t, 64, sync		// Synchronous EL1t
@@ -595,7 +601,11 @@ SYM_CODE_START(vectors_xcall_xint)
	kernel_ventry	1, h, 64, fiq		// FIQ EL1h
	kernel_ventry	1, h, 64, error		// Error EL1h

+#ifdef CONFIG_FAST_SYSCALL
+	sync_ventry				// Synchronous 64-bit EL0
+#else
	kernel_ventry	0, t, 64, sync		// Synchronous 64-bit EL0
+#endif
	kernel_ventry	0, t, 64, irq		// IRQ 64-bit EL0
	kernel_ventry	0, t, 64, fiq		// FIQ 64-bit EL0
	kernel_ventry	0, t, 64, error		// Error 64-bit EL0
@@ -605,7 +615,7 @@ SYM_CODE_START(vectors_xcall_xint)
	kernel_ventry	0, t, 32, fiq		// FIQ 32-bit EL0
	kernel_ventry	0, t, 32, error		// Error 32-bit EL0
 SYM_CODE_END(vectors_xcall_xint)
-#endif
+#endif /* CONFIG_ACTLR_XCALL_XINT */

 #ifdef CONFIG_VMAP_STACK
 SYM_CODE_START_LOCAL(__bad_stack)
@@ -637,65 +647,6 @@ SYM_CODE_START_LOCAL(__bad_stack)
 SYM_CODE_END(__bad_stack)
 #endif /* CONFIG_VMAP_STACK */

-#ifdef CONFIG_FAST_SYSCALL
-	.macro	check_esr_el1_ec_svc64
-	/* Only support SVC64 for now */
-	mrs	x20, esr_el1
-	lsr	w20, w20, #ESR_ELx_EC_SHIFT
-	cmp	x20, #ESR_ELx_EC_SVC64
-	.endm
-
-	.macro	check_syscall_nr
-	cmp	x8, __NR_syscalls
-	.endm
-
-	.macro	check_xcall_enable
-	/* x21 = task_struct->xinfo->xcall_enable */
-	ldr_this_cpu x20, __entry_task, x21
-	ldr	x21, [x20, #TSK_XCALL]
-	/* x20 = sc_no / 8 */
-	lsr	x20, x8, 3
-	ldr	x21, [x21, x20]
-	/* x8 = sc_no % 8 */
-	and	x8, x8, 7
-	mov	x20, 1
-	lsl	x20, x20, x8
-	and	x21, x21, x20
-	cmp	x21, 0
-	.endm
-
-	.macro	check_xcall_pre_kernel_entry
-	stp	x20, x21, [sp, #0]
-	/* is ESR_ELx_EC_SVC64 */
-	check_esr_el1_ec_svc64
-	bne	.Lskip_xcall\@
-	/* x8 >= __NR_syscalls */
-	check_syscall_nr
-	bhs	.Lskip_xcall\@
-	str	x8, [sp, #16]
-	/* is xcall enabled */
-	check_xcall_enable
-	ldr	x8, [sp, #16]
-	beq	.Lskip_xcall\@
-	ldp	x20, x21, [sp, #0]
-	/* do xcall */
-#ifdef CONFIG_SECURITY_FEATURE_BYPASS
-	kernel_entry 0, 64, xcall
-#else
-	kernel_entry 0, 64
-#endif
-	mov	x0, sp
-	bl	el0t_64_fast_syscall_handler
-#ifdef CONFIG_SECURITY_FEATURE_BYPASS
-	kernel_exit 0, xcall
-#else
-	b	ret_to_user
-#endif
-.Lskip_xcall\@:
-	ldp	x20, x21, [sp, #0]
-	.endm
-#endif
-
 #ifdef CONFIG_FAST_IRQ
	.macro check_xint_pre_kernel_entry
	stp	x0, x1, [sp, #0]
@@ -748,16 +699,6 @@ SYM_CODE_END(__bad_stack)

	.macro entry_handler	el:req, ht:req, regsize:req, label:req
 SYM_CODE_START_LOCAL(el\el\ht\()_\regsize\()_\label)
-#ifdef CONFIG_FAST_SYSCALL
-	.if \el == 0 && \regsize == 64 && \label == sync
-	/* Only support el0 aarch64 sync exception */
-	alternative_if_not ARM64_HAS_XCALL
-	b	.Lret_to_kernel_entry\@
-	alternative_else_nop_endif
-	check_xcall_pre_kernel_entry
-	.Lret_to_kernel_entry\@:
-	.endif
-#endif
 #ifdef CONFIG_FAST_IRQ
	.if \regsize == 64 && \label == irq && \el == 0 && \ht == t
	alternative_if_not ARM64_HAS_XINT
@@ -797,7 +738,6 @@ SYM_CODE_END(el\el\ht\()_\regsize\()_\label)
	entry_handler	0, t, 64, error

 #ifdef CONFIG_ACTLR_XCALL_XINT
-	entry_handler	0, t, 64, xcall
	entry_handler	0, t, 64, xint
 #endif
	entry_handler	0, t, 32, sync

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fe3f89445fcb..e9e5ce956f15 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -55,6 +55,7 @@
 #include <asm/stacktrace.h>
 #include <asm/switch_to.h>
 #include <asm/system_misc.h>
+#include <asm/xcall.h>

 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
 #include <linux/stackprotector.h>
@@ -472,6 +473,10 @@ DEFINE_PER_CPU(struct task_struct *, __entry_task);
 static void entry_task_switch(struct task_struct *next)
 {
	__this_cpu_write(__entry_task, next);
+#ifdef CONFIG_FAST_SYSCALL
+	if (static_branch_unlikely(&xcall_enable))
+		xcall_info_switch(next);
+#endif
 }

 /*
diff --git a/arch/arm64/kernel/xcall/entry.S b/arch/arm64/kernel/xcall/entry.S
index 7b75e8651a2a..f531f1e17f48 100644
--- a/arch/arm64/kernel/xcall/entry.S
+++ b/arch/arm64/kernel/xcall/entry.S
@@ -84,7 +84,7 @@ alternative_else_nop_endif
 #endif
	.endm /* .macro hw_xcall_save_base_regs */

-	.macro hw_xcal_restore_base_regs
+	.macro hw_xcall_restore_base_regs
 #ifdef CONFIG_ARM64_PSEUDO_NMI
 alternative_if_not ARM64_HAS_GIC_PRIO_MASKING
	b	.Lskip_pmr_restore\@
@@ -149,15 +149,89 @@ alternative_else_nop_endif
	add	sp, sp, #PT_REGS_SIZE		// restore sp
	eret
	sb
-	.endm /* .macro hw_xcal_restore_base_regs */
+	.endm /* .macro hw_xcall_restore_base_regs */

-SYM_CODE_START(xcall_entry)
+SYM_CODE_START_LOCAL(el0t_64_svc_entry)
+alternative_if_not ARM64_HAS_HW_XCALL_XINT
+	/* Hijack SVC to dynamically load syscalls via '/proc/xcall/comm' */
+	ldr	x20, [sp, #S_SYSCALLNO]		// ESR.bits[15,0]
+	cmp	x20, 0xfff
+	b.ge	el0t_64_xcall_entry
+alternative_else_nop_endif
+
+	/* Hijack SVC to low overhead syscalls via '/proc/[pid]/xcall' */
+	cmp	x8, __NR_syscalls
+	b.hs	.slow_syscall			// unsigned: also rejects negative x8
+	ldr_this_cpu x21, __xcall_info, x20
+	ldrb	w20, [x21, x8]
+	cmp	x20, 0
+	bne	el0t_fast_syscall
+
+.slow_syscall:
+	ldp	x20, x21, [sp, #16 * 10]
+	kernel_entry 0, 64
+	mov	x0, sp
+	bl	el0_slow_syscall
+	b	ret_to_user
+SYM_INNER_LABEL(el0t_64_xcall_entry, SYM_L_GLOBAL)
+	lsr	x20, x20, #12
+	adr	x21, .xcall_func_table
+	ldr	w20, [x21, x20, lsl #2]
+	add	x20, x20, x21
+	br	x20
+	/* ISS==0F~FF: Entry to optimized and customized syscalls
+	 */
+.xcall_func_table:
+	.rept	16
+	.word	el0t_xcall_syscall - .xcall_func_table
+	.endr
+SYM_CODE_END(el0t_64_svc_entry)
+
+SYM_CODE_START_LOCAL(el0t_xcall_syscall)
+	ldp	x20, x21, [sp, #16 * 10]
	hw_xcall_save_base_regs
	mov	x0, sp
-	bl	el0t_64_xcall_handler
-	hw_xcal_restore_base_regs
-SYM_CODE_END(xcall_entry)
+	bl	el0_xcall_syscall
+	hw_xcall_restore_base_regs
+SYM_CODE_END(el0t_xcall_syscall)
+SYM_CODE_START_LOCAL(el0t_fast_syscall)
+	ldp	x20, x21, [sp, #16 * 10]
+	hw_xcall_save_base_regs
+	mov	x0, sp
+	bl	el0_fast_syscall
+	hw_xcall_restore_base_regs
+SYM_CODE_END(el0t_fast_syscall)
+
+SYM_CODE_START_LOCAL(el0t_64_sync_ventry)
+	kernel_ventry	0, t, 64, sync
+SYM_CODE_END(el0t_64_sync_ventry)
+
+SYM_CODE_START_LOCAL(el0t_64_sync_ventry_vector)
+	ldp	x20, x21, [sp, #16 * 10]
+	add	sp, sp, #PT_REGS_SIZE
+	kernel_ventry	0, t, 64, sync
+SYM_CODE_END(el0t_64_sync_ventry_vector)
+
+SYM_CODE_START_LOCAL(el0t_64_sync_table)
+	// 0 - (ESR_ELx_EC_SVC64 - 1)
+	.rept	ESR_ELx_EC_SVC64
+	.word	el0t_64_sync_table - el0t_64_sync_ventry_vector
+	.endr
+	// ESR_ELx_EC_SVC64
+	.word	el0t_64_sync_table - el0t_64_svc_entry
+	// (ESR_ELx_EC_SVC64 + 1) - ESR_ELx_EC_MAX
+	.rept	ESR_ELx_EC_MAX - ESR_ELx_EC_SVC64
+	.word	el0t_64_sync_table - el0t_64_sync_ventry_vector
+	.endr
+SYM_CODE_END(el0t_64_sync_table)
+
+/***********************************************
+ *                                             *
+ *   Xcall exception entry code for 920G CPU   *
+ *                                             *
+ ***********************************************/
+#ifdef CONFIG_ACTLR_XCALL_XINT
	.macro xcall_ventry
	.align 7
 .Lventry_start\@:
@@ -170,6 +244,45 @@ SYM_CODE_END(xcall_entry)
	msr	tpidrro_el0, xzr
 .Lskip_tramp_vectors_cleanup\@:
	sub	sp, sp, #PT_REGS_SIZE
-	b	xcall_entry
+	stp	x20, x21, [sp, #16 * 10]
+	/* Decode ESR.ISS bits[15,0] for use later */
+	mrs	x21, esr_el1
+	uxth	w20, w21
+	b	el0t_64_xcall_entry
 .org .Lventry_start\@ + 128	// Did we overflow the ventry slot?
	.endm
+#endif /* CONFIG_ACTLR_XCALL_XINT */
+
+/****************************************************************
+ *                                                              *
+ *    Sync exception entry code for early CPUs before 920G      *
+ *                                                              *
+ ****************************************************************/
+	.macro sync_ventry
+	.align 7
+.Lventry_start\@:
+	/*
+	 * This must be the first instruction of the EL0 vector entries. It is
+	 * skipped by the trampoline vectors, to trigger the cleanup.
+	 */
+	b	.Lskip_tramp_vectors_cleanup\@
+	mrs	x30, tpidrro_el0
+	msr	tpidrro_el0, xzr
+.Lskip_tramp_vectors_cleanup\@:
+alternative_if_not ARM64_HAS_XCALL
+	b	el0t_64_sync_ventry
+alternative_else_nop_endif
+	sub	sp, sp, #PT_REGS_SIZE
+	/* Save ESR and its ISS bits[15,0] for use later */
+	stp	x20, x21, [sp, #16 * 10]
+	mrs	x20, esr_el1
+	uxth	w21, w20
+	stp	x20, x21, [sp, #(S_SYSCALLNO - 8)]
+	/* Using jump table for different exception causes */
+	lsr	w21, w20, #ESR_ELx_EC_SHIFT
+	adr	x20, el0t_64_sync_table
+	ldr	w21, [x20, x21, lsl #2]
+	sub	x20, x20, x21
+	br	x20
+.org .Lventry_start\@ + 128	// Did we overflow the ventry slot?
+	.endm
diff --git a/arch/arm64/kernel/xcall/xcall.c b/arch/arm64/kernel/xcall/xcall.c
index 31072c0402f4..96e6274571d3 100644
--- a/arch/arm64/kernel/xcall/xcall.c
+++ b/arch/arm64/kernel/xcall/xcall.c
@@ -17,8 +17,9 @@ static inline int sw_xcall_init_task(struct task_struct *p, struct task_struct *
		return -ENOMEM;

	if (orig->xinfo) {
-		bitmap_copy(TASK_XINFO(p)->xcall_enable, TASK_XINFO(orig)->xcall_enable,
-			    __NR_syscalls);
+		memcpy(TASK_XINFO(p)->xcall_enable,
+		       TASK_XINFO(orig)->xcall_enable,
+		       (__NR_syscalls + 1) * sizeof(u8));
	}

	return 0;
@@ -37,3 +38,14 @@ void xcall_task_free(struct task_struct *p)
	if (static_branch_unlikely(&xcall_enable))
		kfree(p->xinfo);
 }
+
+static u8 default_xcall_info[__NR_syscalls + 1] = {
+	[0 ... __NR_syscalls] = 0,
+};
+DEFINE_PER_CPU(u8*, __xcall_info) = default_xcall_info;
+
+void xcall_info_switch(struct task_struct *task)
+{
+	if (TASK_XINFO(task)->xcall_enable)
+		__this_cpu_write(__xcall_info, TASK_XINFO(task)->xcall_enable);
+}
diff --git a/fs/proc/proc_xcall.c b/fs/proc/proc_xcall.c
index 5f45d0799b33..8f7375235803 100644
--- a/fs/proc/proc_xcall.c
+++ b/fs/proc/proc_xcall.c
@@ -12,9 +12,10 @@
 static int xcall_show(struct seq_file *m, void *v)
 {
	struct inode *inode = m->private;
-	struct task_struct *p;
-	unsigned int rs, re;
+	int start = -1, first = 1;
	struct xcall_info *xinfo;
+	struct task_struct *p;
+	int scno = 0;

	if (!static_key_enabled(&xcall_enable))
		return -EACCES;
@@ -27,14 +28,27 @@ static int xcall_show(struct seq_file *m, void *v)
	if (!xinfo)
		goto out;

-	for (rs = 0, bitmap_next_set_region(xinfo->xcall_enable, &rs, &re, __NR_syscalls);
-	     rs < re; rs = re + 1,
-	     bitmap_next_set_region(xinfo->xcall_enable, &rs, &re, __NR_syscalls)) {
-		if (rs == (re - 1))
-			seq_printf(m, "%d,", rs);
-		else
-			seq_printf(m, "%d-%d,", rs, re - 1);
+	for (scno = 0; scno <= __NR_syscalls; scno++) {
+		if (scno == __NR_syscalls || !xinfo->xcall_enable[scno]) {
+			if (start == -1)
+				continue;
+
+			if (!first)
+				seq_puts(m, ",");
+
+			if (start == scno - 1)
+				seq_printf(m, "%d", start);
+			else
+				seq_printf(m, "%d-%d", start, scno - 1);
+
+			first = 0;
+			start = -1;
+		} else {
+			if (start == -1)
+				start = scno;
+		}
	}
+	seq_puts(m, "\n");

 out:
	put_task_struct(p);
@@ -47,45 +61,28 @@ static int xcall_open(struct inode *inode, struct file *filp)
	return single_open(filp, xcall_show, inode);
 }

-static int xcall_enable_one(struct xcall_info *xinfo, unsigned int sc_no)
-{
-	test_and_set_bit(sc_no, xinfo->xcall_enable);
-	return 0;
-}
-
-static int xcall_disable_one(struct xcall_info *xinfo, unsigned int sc_no)
-{
-	test_and_clear_bit(sc_no, xinfo->xcall_enable);
-	return 0;
-}
-
-static ssize_t xcall_write(struct file *file, const char __user *buf,
+static ssize_t xcall_write(struct file *file, const char __user *ubuf,
			   size_t count, loff_t *offset)
 {
-	struct inode *inode = file_inode(file);
-	struct task_struct *p;
-	char buffer[5];
-	const size_t maxlen = sizeof(buffer) - 1;
	unsigned int sc_no = __NR_syscalls;
+	struct task_struct *p;
+	char buf[6];
	int ret = 0;
-	int is_clear = 0;
-	struct xcall_info *xinfo;

	if (!static_key_enabled(&xcall_enable))
		return -EACCES;

-	memset(buffer, 0, sizeof(buffer));
-	if (!count || copy_from_user(buffer, buf, count > maxlen ? maxlen : count))
-		return -EFAULT;
-
-	p = get_proc_task(inode);
-	if (!p || !p->xinfo)
+	p = get_proc_task(file_inode(file));
+	if (!p || !TASK_XINFO(p))
		return -ESRCH;

-	if (buffer[0] == '!')
-		is_clear = 1;
+	/* buf has one spare byte, so the string is always NUL-terminated */
+	memset(buf, '\0', 6);
+	if (!count || (count > 5) || copy_from_user(buf, ubuf, count)) {
+		ret = -EFAULT;
+		goto out;
+	}

-	if (kstrtouint(buffer + is_clear, 10, &sc_no)) {
+	if (kstrtouint((buf + (int)(buf[0] == '!')), 10, &sc_no)) {
		ret = -EINVAL;
		goto out;
	}
@@ -95,13 +92,8 @@ static ssize_t xcall_write(struct file *file, const char __user *buf,
		goto out;
	}

-	xinfo = TASK_XINFO(p);
-	if (!is_clear && !test_bit(sc_no, xinfo->xcall_enable))
-		ret = xcall_enable_one(xinfo, sc_no);
-	else if (is_clear && test_bit(sc_no, xinfo->xcall_enable))
-		ret = xcall_disable_one(xinfo, sc_no);
-	else
-		ret = -EINVAL;
+	(TASK_XINFO(p))->xcall_enable[sc_no] = (int)(buf[0] != '!');
+	ret = 0;

 out:
	put_task_struct(p);
-- 
2.34.1
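As a usage illustration (not part of the series): the per-task byte array above is driven through /proc/<pid>/xcall, where writing a decimal syscall number enables the fast path for that task and a '!' prefix clears it again; reading back prints the enabled numbers as ranges. A minimal userspace sketch, assuming a kernel booted with "xcall" on the cmdline (the proc file otherwise returns -EACCES) and using syscall number 64 purely as an example:

	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char path[64], line[256];
		FILE *f;

		snprintf(path, sizeof(path), "/proc/%d/xcall", getpid());

		/* Enable the fast path for one syscall of this task */
		f = fopen(path, "w");
		if (!f || fputs("64", f) == EOF)
			return 1;
		fclose(f);

		/* Read back the enabled set, printed as ranges, e.g. "64" */
		f = fopen(path, "r");
		if (f && fgets(line, sizeof(line), f))
			fputs(line, stdout);
		if (f)
			fclose(f);
		return 0;
	}

Because the byte array replaces the old bitmap, the per-syscall check in el0t_64_svc_entry is a single ldrb indexed by x8, trading one byte per syscall for fewer check instructions.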
From: Yipeng Zou <zouyipeng@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Add a debug mode that forces the entire system (all tasks, all CPUs)
to use the "fast syscall" path for every system-call invocation. The
switch is activated at boot with "xcall=debug" on the kernel cmdline.

WARNING: This option deliberately bypasses several safety checks and
can expose latent bugs in architecture-specific assembly stubs, ptrace,
audit, compat or instrumentation code. It is explicitly unsupported for
production systems and may render the machine unstable or insecure.
Use only in controlled test environments!

Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/xcall.h  |  8 ++++++++
 arch/arm64/kernel/cpufeature.c  |  5 +++++
 arch/arm64/kernel/xcall/xcall.c | 12 ++++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/xcall.h b/arch/arm64/include/asm/xcall.h
index 735a14870e4e..0f70f03cc3a2 100644
--- a/arch/arm64/include/asm/xcall.h
+++ b/arch/arm64/include/asm/xcall.h
@@ -13,6 +13,14 @@
 #define SVC_0000	0xd4000001
 #define SVC_FFFF	0xd41fffe1

+/*
+ * Can only be switched via the 'xcall=debug' cmdline option.
+ * By default xcall initializes in XCALL_MODE_TASK.
+ */
+#define XCALL_MODE_TASK		0
+#define XCALL_MODE_SYSTEM	1
+extern int sw_xcall_mode;
+
 struct xcall_comm {
	char *name;
	char *binary;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 23cae64e5528..f81acf037a5c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2448,6 +2448,11 @@ static int __init xcall_setup(char *str)
 {
	static_branch_enable(&xcall_enable);

+	if (str && !strcmp(str, "=debug")) {
+		sw_xcall_mode = XCALL_MODE_SYSTEM;
+		pr_warn("Enable xcall across the entire system, for debugging only!\n");
+	}
+
	return 1;
 }
 __setup("xcall", xcall_setup);
diff --git a/arch/arm64/kernel/xcall/xcall.c b/arch/arm64/kernel/xcall/xcall.c
index 96e6274571d3..35bc959a0a51 100644
--- a/arch/arm64/kernel/xcall/xcall.c
+++ b/arch/arm64/kernel/xcall/xcall.c
@@ -10,17 +10,25 @@
 #include <linux/slab.h>
 #include <asm/xcall.h>

+/* Can only be switched via the 'xcall=debug' cmdline option */
+int sw_xcall_mode = XCALL_MODE_TASK;
+
 static inline int sw_xcall_init_task(struct task_struct *p, struct task_struct *orig)
 {
	p->xinfo = kzalloc(sizeof(struct xcall_info), GFP_KERNEL);
	if (!p->xinfo)
		return -ENOMEM;

-	if (orig->xinfo) {
+	if (!orig->xinfo)
+		return 0;
+
+	/* In xcall debug mode, all syscalls are enabled by default! */
+	if (sw_xcall_mode == XCALL_MODE_SYSTEM)
+		memset(TASK_XINFO(p)->xcall_enable, 1, (__NR_syscalls + 1) * sizeof(u8));
+	else
		memcpy(TASK_XINFO(p)->xcall_enable,
		       TASK_XINFO(orig)->xcall_enable,
		       (__NR_syscalls + 1) * sizeof(u8));
-	}

	return 0;
 }
-- 
2.34.1
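For completeness, this is how the mode is meant to be driven; nothing here is new code, just usage. Append the parameter to the kernel command line (bootloader-specific; a GRUB-style entry is assumed below) and look for the pr_warn() above in the boot log:

	linux /vmlinuz-... root=... xcall=debug

	$ dmesg | grep -i xcall
	Enable xcall across the entire system, for debugging only!

Note that a plain "xcall" parameter (without "=debug") still enables the static key but stays in XCALL_MODE_TASK, where the fast path remains opt-in per task.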
From: Liao Chang <liaochang1@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Add an xcall2.0 basic testcase. This module can be combined with the
syscall sub-item of Unixbench to evaluate the baseline noise of
xcall2.0's "Dynamic Instruction Replacement" mechanism. Users can also
use this module as a reference to implement custom system calls.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Xinyu <zhengxinyu6@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 drivers/staging/Kconfig                    |  2 +
 drivers/staging/Makefile                   |  1 +
 drivers/staging/xcall/Kconfig              | 19 +++++
 drivers/staging/xcall/Makefile             |  1 +
 drivers/staging/xcall/dynamic_xcall_test.c | 90 ++++++++++++++++++++++
 5 files changed, 113 insertions(+)
 create mode 100644 drivers/staging/xcall/Kconfig
 create mode 100644 drivers/staging/xcall/Makefile
 create mode 100644 drivers/staging/xcall/dynamic_xcall_test.c

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index f9aef39cac2e..702216e0ddd2 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -78,4 +78,6 @@ source "drivers/staging/qlge/Kconfig"

 source "drivers/staging/vme_user/Kconfig"

+source "drivers/staging/xcall/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index ffa70dda481d..3df57d6ab9b2 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_PI433)		+= pi433/
 obj-$(CONFIG_XIL_AXIS_FIFO)	+= axis-fifo/
 obj-$(CONFIG_FIELDBUS_DEV)	+= fieldbus/
 obj-$(CONFIG_QLGE)		+= qlge/
+obj-$(CONFIG_DYNAMIC_XCALL)	+= xcall/
diff --git a/drivers/staging/xcall/Kconfig b/drivers/staging/xcall/Kconfig
new file mode 100644
index 000000000000..bf7421fa8a14
--- /dev/null
+++ b/drivers/staging/xcall/Kconfig
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0
+menu "Xcall"
+
+if ARM64
+
+config DYNAMIC_XCALL_TESTCASE
+	tristate "xcall2.0 test case"
+	depends on DYNAMIC_XCALL
+	help
+	  A simple example of using the xcall2.0 kernel module.
+	  This module can be combined with the syscall sub-item of
+	  Unixbench to evaluate the baseline noise of xcall2.0's
+	  "Dynamic Instruction Replacement" mechanism. Users can
+	  also use this module as a reference to implement custom
+	  system calls.
+
+endif # if ARM64
+
+endmenu
diff --git a/drivers/staging/xcall/Makefile b/drivers/staging/xcall/Makefile
new file mode 100644
index 000000000000..668ac4f3b471
--- /dev/null
+++ b/drivers/staging/xcall/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DYNAMIC_XCALL_TESTCASE) += dynamic_xcall_test.o
diff --git a/drivers/staging/xcall/dynamic_xcall_test.c b/drivers/staging/xcall/dynamic_xcall_test.c
new file mode 100644
index 000000000000..159c2a15854c
--- /dev/null
+++ b/drivers/staging/xcall/dynamic_xcall_test.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * A simple dummy xcall for syscall testing
+ *
+ * The data structs and functions marked as MANDATORY have to
+ * be included in all kernel xcall modules.
+ *
+ * Copyright (C) 2025 Huawei Limited.
+ */
+
+#define pr_fmt(fmt) "dummy_xcall: " fmt
+
+#include <linux/module.h>
+#include <linux/xcall.h>
+#include <linux/fs.h>
+
+#include <asm/xcall.h>
+
+static long __do_sys_close(struct pt_regs *regs)
+{
+	return default_sys_call_table()[__NR_close](regs);
+}
+
+static long __do_sys_getpid(struct pt_regs *regs)
+{
+	return default_sys_call_table()[__NR_getpid](regs);
+}
+
+static long __do_sys_getuid(struct pt_regs *regs)
+{
+	return default_sys_call_table()[__NR_getuid](regs);
+}
+
+static long __do_sys_umask(struct pt_regs *regs)
+{
+	return default_sys_call_table()[__NR_umask](regs);
+}
+
+static long __do_sys_dup(struct pt_regs *regs)
+{
+	return default_sys_call_table()[__NR_dup](regs);
+}
+
+/* MANDATORY */
+static struct xcall_prog dummy_xcall_prog = {
+	.name = "dummy_xcall",
+	.owner = THIS_MODULE,
+	.objs = {
+		{
+			.scno = (unsigned long)__NR_getpid,
+			.func = (unsigned long)__do_sys_getpid,
+		},
+		{
+			.scno = (unsigned long)__NR_getuid,
+			.func = (unsigned long)__do_sys_getuid,
+		},
+		{
+			.scno = (unsigned long)__NR_close,
+			.func = (unsigned long)__do_sys_close,
+		},
+		{
+			.scno = (unsigned long)__NR_umask,
+			.func = (unsigned long)__do_sys_umask,
+		},
+		{
+			.scno = (unsigned long)__NR_dup,
+			.func = (unsigned long)__do_sys_dup,
+		},
+		{}
+	}
+};
+
+/* MANDATORY */
+static int __init dummy_xcall_init(void)
+{
+	INIT_LIST_HEAD(&dummy_xcall_prog.list);
+	return xcall_prog_register(&dummy_xcall_prog);
+}
+
+/* MANDATORY */
+static void __exit dummy_xcall_exit(void)
+{
+	xcall_prog_unregister(&dummy_xcall_prog);
+}
+
+module_init(dummy_xcall_init);
+module_exit(dummy_xcall_exit);
+MODULE_AUTHOR("Liao Chang <liaochang1@huawei.com>");
+MODULE_DESCRIPTION("Dummy Xcall");
+MODULE_LICENSE("GPL");
-- 
2.34.1
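Since the testcase doubles as a template, the registration boilerplate reduces to the sketch below: one xcall_prog entry that hijacks a single syscall and simply forwards to the stock handler. The module name "min_xcall" is invented for illustration; the API (xcall_prog_register/unregister, default_sys_call_table, the objs sentinel) is exactly what the dummy module above uses:

	// Minimal xcall module sketch, distilled from dynamic_xcall_test.c
	#include <linux/module.h>
	#include <linux/unistd.h>
	#include <linux/xcall.h>

	#include <asm/xcall.h>

	static long min_sys_getpid(struct pt_regs *regs)
	{
		/* Forward to the stock handler through the default table */
		return default_sys_call_table()[__NR_getpid](regs);
	}

	static struct xcall_prog min_xcall_prog = {
		.name = "min_xcall",
		.owner = THIS_MODULE,
		.objs = {
			{
				.scno = (unsigned long)__NR_getpid,
				.func = (unsigned long)min_sys_getpid,
			},
			{}	/* sentinel; at least one syscall must be hijacked */
		},
	};

	static int __init min_xcall_init(void)
	{
		INIT_LIST_HEAD(&min_xcall_prog.list);
		return xcall_prog_register(&min_xcall_prog);
	}

	static void __exit min_xcall_exit(void)
	{
		xcall_prog_unregister(&min_xcall_prog);
	}

	module_init(min_xcall_init);
	module_exit(min_xcall_exit);
	MODULE_LICENSE("GPL");

A real replacement would implement the syscall body itself instead of forwarding; forwarding is what makes this module useful for measuring the baseline noise of the replacement mechanism.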
From: Yuntao Liu <liuyuntao12@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS

--------------------------------

Introduce the xcall2.0 redis async prefetch kernel module.

Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 drivers/staging/xcall/Makefile   |   2 +-
 drivers/staging/xcall/prefetch.c | 280 +++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/xcall/prefetch.c

diff --git a/drivers/staging/xcall/Makefile b/drivers/staging/xcall/Makefile
index 668ac4f3b471..d8c6137e2945 100644
--- a/drivers/staging/xcall/Makefile
+++ b/drivers/staging/xcall/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_DYNAMIC_XCALL_TESTCASE) += dynamic_xcall_test.o
+obj-$(CONFIG_DYNAMIC_XCALL_TESTCASE) += dynamic_xcall_test.o prefetch.o
diff --git a/drivers/staging/xcall/prefetch.c b/drivers/staging/xcall/prefetch.c
new file mode 100644
index 000000000000..f911f3635cb7
--- /dev/null
+++ b/drivers/staging/xcall/prefetch.c
@@ -0,0 +1,280 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * An asynchronous epoll read prefetch example built on xcall2.0
+ *
+ * The data structs and functions marked as MANDATORY have to
+ * be included in all kernel xcall modules.
+ *
+ * Copyright (C) 2025 Huawei Limited.
+ */
+
+#define pr_fmt(fmt) "xcall_prefetch: " fmt
+
+#include <linux/file.h>
+#include <linux/hash.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/socket.h>
+#include <linux/xcall.h>
+#include <net/sock.h>
+#include <net/tcp_states.h>
+
+#include <asm/xcall.h>
+
+#define MAX_FD 100
+
+static unsigned long xcall_cache_hit;
+static unsigned long xcall_cache_miss;
+
+struct proc_dir_entry *xcall_proc_dir;
+
+enum cache_state {
+	XCALL_CACHE_NONE = 0,
+	XCALL_CACHE_PREFETCH,
+	XCALL_CACHE_READY,
+	XCALL_CACHE_CANCEL
+};
+
+struct prefetch_item {
+	int fd;
+	int cpu;
+	int pos;
+	int len;
+	atomic_t state;
+	struct file *file;
+	struct work_struct work;
+	char cache[PAGE_SIZE];
+};
+
+static struct epoll_event events[MAX_FD] = {0};
+
+static struct prefetch_item prefetch_items[MAX_FD] = {0};
+static struct workqueue_struct *rc_work;
+
+static inline bool transition_state(struct prefetch_item *pfi,
+				    enum cache_state old, enum cache_state new)
+{
+	return atomic_cmpxchg(&pfi->state, old, new) == old;
+}
+
+static void prefetch_work_fn(struct work_struct *work)
+{
+	struct prefetch_item *pfi = container_of(work, struct prefetch_item, work);
+
+	if (!transition_state(pfi, XCALL_CACHE_NONE, XCALL_CACHE_PREFETCH))
+		return;
+
+	pfi->pos = 0;
+	pfi->len = kernel_read(pfi->file, pfi->cache, PAGE_SIZE, &pfi->file->f_pos);
+
+	transition_state(pfi, XCALL_CACHE_PREFETCH, XCALL_CACHE_READY);
+}
+
+static long __do_sys_epoll_pwait(struct pt_regs *regs)
+{
+	struct prefetch_item *pfi;
+	int i, fd, err;
+	long ret;
+
+	ret = default_sys_call_table()[__NR_epoll_pwait](regs);
+	/* Nothing to prefetch on error or timeout, and bail out if the
+	 * event count would overflow the fixed-size tables below.
+	 */
+	if (ret <= 0 || ret > MAX_FD)
+		return ret;
+
+	err = copy_from_user(events, (void __user *)regs->regs[1],
+			     ret * sizeof(struct epoll_event));
+	if (err)
+		return -EFAULT;
+
+	for (i = 0; i < ret; i++) {
+		fd = events[i].data;
+		/* Only fds inside the fixed-size table are tracked */
+		if (fd < 0 || fd >= MAX_FD)
+			continue;
+		if (events[i].events & EPOLLIN) {
+			pfi = &prefetch_items[fd];
+			if (!pfi->file)
+				pfi->file = fget(fd);
+
+			queue_work_on(250 + (fd % 4), rc_work, &pfi->work);
+		}
+	}
+
+	return ret;
+}
+
+static long __do_sys_read(struct pt_regs *regs)
+{
+	int fd = regs->regs[0];
+	struct prefetch_item *pfi;
+	void *user_buf = (void *)regs->regs[1];
+	int count = regs->regs[2];
+	int copy_len;
+	long ret;
+
+	/* fds beyond the table size never have a prefetch cache */
+	if (fd < 0 || fd >= MAX_FD)
+		goto not_epoll_fd;
+
+	pfi = &prefetch_items[fd];
+	if (!pfi->file)
+		goto not_epoll_fd;
+
+	while (!transition_state(pfi, XCALL_CACHE_READY, XCALL_CACHE_CANCEL)) {
+		if (transition_state(pfi, XCALL_CACHE_NONE, XCALL_CACHE_CANCEL))
+			goto slow_read;
+	}
+
+	xcall_cache_hit++;
+	copy_len = pfi->len;
+
+	if (copy_len == 0) {
+		transition_state(pfi, XCALL_CACHE_CANCEL, XCALL_CACHE_NONE);
+		return 0;
+	}
+
+	copy_len = (copy_len >= count) ? count : copy_len;
+	copy_len -= copy_to_user(user_buf, (void *)(pfi->cache + pfi->pos), copy_len);
+	pfi->len -= copy_len;
+	pfi->pos += copy_len;
+
+	if (pfi->len == 0)
+		transition_state(pfi, XCALL_CACHE_CANCEL, XCALL_CACHE_NONE);
+	else
+		transition_state(pfi, XCALL_CACHE_CANCEL, XCALL_CACHE_READY);
+	return copy_len;
+
+slow_read:
+	xcall_cache_miss++;
+	pfi->len = 0;
+	pfi->pos = 0;
+	cancel_work_sync(&pfi->work);
+	transition_state(pfi, XCALL_CACHE_CANCEL, XCALL_CACHE_NONE);
+not_epoll_fd:
+	ret = default_sys_call_table()[__NR_read](regs);
+	return ret;
+}
+
+static long __do_sys_close(struct pt_regs *regs)
+{
+	struct prefetch_item *pfi;
+	int fd = regs->regs[0];
+	long ret;
+
+	if (fd >= 0 && fd < MAX_FD) {
+		pfi = &prefetch_items[fd];
+		if (pfi->file) {
+			fput(pfi->file);
+			pfi->file = NULL;
+		}
+	}
+
+	ret = default_sys_call_table()[__NR_close](regs);
+	return ret;
+}
+
+/* MANDATORY */
+static struct xcall_prog xcall_prefetch_prog = {
+	.name = "xcall_prefetch",
+	.owner = THIS_MODULE,
+	.objs = {
+		{
+			.scno = (unsigned long)__NR_epoll_pwait,
+			.func = (unsigned long)__do_sys_epoll_pwait,
+		},
+		{
+			.scno = (unsigned long)__NR_read,
+			.func = (unsigned long)__do_sys_read,
+		},
+		{
+			.scno = (unsigned long)__NR_close,
+			.func = (unsigned long)__do_sys_close,
+		},
+		{}
+	}
+};
+
+static ssize_t xcall_prefetch_reset(struct file *file, const char __user *buf,
+				    size_t count, loff_t *pos)
+{
+	xcall_cache_hit = 0;
+	xcall_cache_miss = 0;
+
+	return count;
+}
+
+static int xcall_prefetch_show(struct seq_file *m, void *v)
+{
+	u64 total = xcall_cache_hit + xcall_cache_miss;
+	u64 percent;
+
+	/* Avoid a division by zero before any read has been sampled */
+	percent = total ? DIV_ROUND_CLOSEST(xcall_cache_hit * 100ULL, total) : 0;
+	seq_printf(m, "epoll cache_{hit,miss}: %lu,%lu, hit ratio: %llu%%\n",
+		   xcall_cache_hit, xcall_cache_miss, percent);
+	return 0;
+}
+
+static int xcall_prefetch_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, xcall_prefetch_show, NULL);
+}
+
+static const struct proc_ops xcall_prefetch_fops = {
+	.proc_open	= xcall_prefetch_open,
+	.proc_read	= seq_read,
+	.proc_write	= xcall_prefetch_reset,
+	.proc_lseek	= seq_lseek,
+	.proc_release	= single_release
+};
+
+static int __init init_xcall_prefetch_procfs(void)
+{
+	struct proc_dir_entry *prefetch_dir;
+
+	xcall_proc_dir = proc_mkdir("xcall_stat", NULL);
+	if (!xcall_proc_dir)
+		return -ENOMEM;
+	prefetch_dir = proc_create("prefetch", 0640, xcall_proc_dir, &xcall_prefetch_fops);
+	if (!prefetch_dir)
+		goto rm_xcall_proc_dir;
+
+	return 0;
+
+rm_xcall_proc_dir:
+	proc_remove(xcall_proc_dir);
+	return -ENOMEM;
+}
+
+/* MANDATORY */
+static int __init xcall_prefetch_init(void)
+{
+	int i;
+
+	rc_work = alloc_workqueue("eventpoll_rc", 0, 0);
+	if (!rc_work)
+		return -ENOMEM;
+
+	for (i = 0; i < MAX_FD; i++)
+		INIT_WORK(&prefetch_items[i].work, prefetch_work_fn);
+
+	init_xcall_prefetch_procfs();
+
+	INIT_LIST_HEAD(&xcall_prefetch_prog.list);
+	return xcall_prog_register(&xcall_prefetch_prog);
+}
+
+/* MANDATORY */
+static void __exit xcall_prefetch_exit(void)
+{
+	proc_remove(xcall_proc_dir);
+	xcall_prog_unregister(&xcall_prefetch_prog);
+	destroy_workqueue(rc_work);
+}
+
+module_init(xcall_prefetch_init);
+module_exit(xcall_prefetch_exit);
+MODULE_AUTHOR("Liao Chang <liaochang1@huawei.com>");
+MODULE_DESCRIPTION("Xcall prefetch");
+MODULE_LICENSE("GPL");
-- 
2.34.1
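The counters exported above can be watched from userspace. A minimal sketch; the path /proc/xcall_stat/prefetch and the output format are taken from the module source, and any write resets both counters per xcall_prefetch_reset():

	#include <stdio.h>

	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/xcall_stat/prefetch", "r");

		/* e.g. "epoll cache_{hit,miss}: 42,7, hit ratio: 86%" */
		if (f && fgets(line, sizeof(line), f))
			fputs(line, stdout);
		if (f)
			fclose(f);

		/* Any write resets both counters */
		f = fopen("/proc/xcall_stat/prefetch", "w");
		if (f) {
			fputs("0", f);
			fclose(f);
		}
		return 0;
	}

The hit ratio is the figure of merit when running a redis or Unixbench read workload: a miss falls back to the ordinary read() path, so the prefetch only pays off when most reads are served from the per-fd page cache.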
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/ID5CMS -------------------------------- Enable CONFIG_DYNAMIC_XCALL by default for arm64. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> --- arch/arm64/configs/openeuler_defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index e110f8d8e844..edaa367d2b0d 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -394,6 +394,7 @@ CONFIG_FAST_IRQ=y CONFIG_DEBUG_FEATURE_BYPASS=y CONFIG_SECURITY_FEATURE_BYPASS=y CONFIG_ACTLR_XCALL_XINT=y +CONFIG_DYNAMIC_XCALL=y # end of Turbo features selection # -- 2.34.1
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/18936 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/B24...