This series adds initial KVM RISC-V support. Currently, we are able to boot 64-bit RISC-V Linux guests with multiple VCPUs.
This series can be found in the riscv_kvm_v13 branch at: https://github.com/avpatel/linux.git
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
This series was backported by Mingwang Li <limingwang@huawei.com>.
Alistair Francis (1):
  Revert "riscv: Use latest system call ABI"
Anup Patel (19):
  RISC-V: Add bitmap reprensenting ISA features common across CPUs
  RISC-V: Add fragmented config for debug options
  RISC-V: Enable CPU Hotplug in defconfigs
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: KVM: Document RISC-V specific parts of KVM API.
  RISC-V: KVM: Add MAINTAINERS entry
  RISC-V: Enable KVM for RV64 and RV32
  RISC-V: KVM: Fix stage2_map_page() and kvm_riscv_vcpu_unpriv_read()
  RISC-V: KVM: Determine instruction correctly from HTINST CSR
Atish Patra (9):
  RISC-V: Mark existing SBI as 0.1 SBI.
  RISC-V: Add basic support for SBI v0.2
  RISC-V: Add SBI v0.2 extension definitions
  RISC-V: Introduce a new config for SBI v0.1
  RISC-V: Implement new SBI v0.2 extensions
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support
limingwang (1):
  RISC-V: KVM: fix some interfaces problems
 Documentation/virt/kvm/api.txt          |  169 +++-
 MAINTAINERS                             |   11 +
 arch/riscv/Kconfig                      |    9 +
 arch/riscv/Makefile                     |    2 +
 arch/riscv/configs/defconfig            |   24 +-
 arch/riscv/configs/extra_debug.config   |   21 +
 arch/riscv/configs/rv32_defconfig       |   24 +-
 arch/riscv/include/asm/csr.h            |   90 +-
 arch/riscv/include/asm/hwcap.h          |   22 +
 arch/riscv/include/asm/kvm_host.h       |  274 ++++++
 arch/riscv/include/asm/kvm_vcpu_timer.h |   44 +
 arch/riscv/include/asm/pgtable-bits.h   |    1 +
 arch/riscv/include/asm/sbi.h            |  178 ++--
 arch/riscv/include/uapi/asm/kvm.h       |  127 +++
 arch/riscv/include/uapi/asm/unistd.h    |    5 +-
 arch/riscv/kernel/asm-offsets.c         |  154 ++++
 arch/riscv/kernel/cpufeature.c          |   83 +-
 arch/riscv/kernel/sbi.c                 |  522 +++++++++++-
 arch/riscv/kernel/setup.c               |    2 +
 arch/riscv/kvm/Kconfig                  |   34 +
 arch/riscv/kvm/Makefile                 |   14 +
 arch/riscv/kvm/main.c                   |   99 +++
 arch/riscv/kvm/mmu.c                    |  804 ++++++++++++++
 arch/riscv/kvm/tlb.S                    |   74 ++
 arch/riscv/kvm/vcpu.c                   | 1037 +++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c              |  678 +++++++++
 arch/riscv/kvm/vcpu_sbi.c               |  173 ++++
 arch/riscv/kvm/vcpu_switch.S            |  391 +++++++
 arch/riscv/kvm/vcpu_timer.c             |  225 +++++
 arch/riscv/kvm/vm.c                     |   86 ++
 arch/riscv/kvm/vmid.c                   |  120 +++
 drivers/clocksource/timer-riscv.c       |    8 +
 include/clocksource/timer-riscv.h       |   16 +
 include/uapi/linux/kvm.h                |    8 +
 34 files changed, 5402 insertions(+), 127 deletions(-)
 create mode 100644 arch/riscv/configs/extra_debug.config
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c
 create mode 100644 arch/riscv/kvm/vcpu_switch.S
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 arch/riscv/kvm/vm.c
 create mode 100644 arch/riscv/kvm/vmid.c
 create mode 100644 include/clocksource/timer-riscv.h
From: Atish Patra <atish.patra@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
As per the new SBI specification, the current SBI implementation version is defined as 0.1 and will be removed/replaced in the future. Each of the function calls in 0.1 is defined as a separate extension, which makes it easier to replace them one at a time.
Rename the existing implementation to reflect that. This is just a preparatory patch for SBI v0.2 and doesn't introduce any functional changes.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/sbi.h | 44 ++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 20 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index 2570c1e683d3..b38bc36f7429 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2015 Regents of the University of California + * Copyright (c) 2019 Western Digital Corporation or its affiliates. */
#ifndef _ASM_RISCV_SBI_H @@ -9,17 +10,17 @@ #include <linux/types.h>
#ifdef CONFIG_RISCV_SBI -#define SBI_SET_TIMER 0 -#define SBI_CONSOLE_PUTCHAR 1 -#define SBI_CONSOLE_GETCHAR 2 -#define SBI_CLEAR_IPI 3 -#define SBI_SEND_IPI 4 -#define SBI_REMOTE_FENCE_I 5 -#define SBI_REMOTE_SFENCE_VMA 6 -#define SBI_REMOTE_SFENCE_VMA_ASID 7 -#define SBI_SHUTDOWN 8 +#define SBI_EXT_0_1_SET_TIMER 0x0 +#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1 +#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2 +#define SBI_EXT_0_1_CLEAR_IPI 0x3 +#define SBI_EXT_0_1_SEND_IPI 0x4 +#define SBI_EXT_0_1_REMOTE_FENCE_I 0x5 +#define SBI_EXT_0_1_REMOTE_SFENCE_VMA 0x6 +#define SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID 0x7 +#define SBI_EXT_0_1_SHUTDOWN 0x8
-#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({ \ +#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({ \ register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0); \ register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1); \ register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2); \ @@ -43,48 +44,50 @@
static inline void sbi_console_putchar(int ch) { - SBI_CALL_1(SBI_CONSOLE_PUTCHAR, ch); + SBI_CALL_1(SBI_EXT_0_1_CONSOLE_PUTCHAR, ch); }
static inline int sbi_console_getchar(void) { - return SBI_CALL_0(SBI_CONSOLE_GETCHAR); + return SBI_CALL_0(SBI_EXT_0_1_CONSOLE_GETCHAR); }
static inline void sbi_set_timer(uint64_t stime_value) { #if __riscv_xlen == 32 - SBI_CALL_2(SBI_SET_TIMER, stime_value, stime_value >> 32); + SBI_CALL_2(SBI_EXT_0_1_SET_TIMER, stime_value, + stime_value >> 32); #else - SBI_CALL_1(SBI_SET_TIMER, stime_value); + SBI_CALL_1(SBI_EXT_0_1_SET_TIMER, stime_value); #endif }
static inline void sbi_shutdown(void) { - SBI_CALL_0(SBI_SHUTDOWN); + SBI_CALL_0(SBI_EXT_0_1_SHUTDOWN); }
static inline void sbi_clear_ipi(void) { - SBI_CALL_0(SBI_CLEAR_IPI); + SBI_CALL_0(SBI_EXT_0_1_CLEAR_IPI); }
static inline void sbi_send_ipi(const unsigned long *hart_mask) { - SBI_CALL_1(SBI_SEND_IPI, hart_mask); + SBI_CALL_1(SBI_EXT_0_1_SEND_IPI, hart_mask); }
static inline void sbi_remote_fence_i(const unsigned long *hart_mask) { - SBI_CALL_1(SBI_REMOTE_FENCE_I, hart_mask); + SBI_CALL_1(SBI_EXT_0_1_REMOTE_FENCE_I, hart_mask); }
static inline void sbi_remote_sfence_vma(const unsigned long *hart_mask, unsigned long start, unsigned long size) { - SBI_CALL_3(SBI_REMOTE_SFENCE_VMA, hart_mask, start, size); + SBI_CALL_3(SBI_EXT_0_1_REMOTE_SFENCE_VMA, hart_mask, + start, size); }
static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, @@ -92,7 +95,8 @@ static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, unsigned long size, unsigned long asid) { - SBI_CALL_4(SBI_REMOTE_SFENCE_VMA_ASID, hart_mask, start, size, asid); + SBI_CALL_4(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, hart_mask, + start, size, asid); } #else /* CONFIG_RISCV_SBI */ /* stubs for code that is only reachable under IS_ENABLED(CONFIG_RISCV_SBI): */
From: Atish Patra <atish.patra@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
SBI v0.2 introduces a base extension which is backward compatible with v0.1. Implement all helper functions and the minimum required SBI calls from v0.2 for now. All other base extension functions will be added later as needed. As the v0.2 calling convention is backward compatible with v0.1, remove the v0.1 helper functions and just use the v0.2 calling convention.
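For illustration (a sketch, not part of the patch): under the v0.2 convention the extension ID goes in a7, the function ID in a6, and the call returns an error/value pair in a0/a1, surfaced as struct sbiret. A hypothetical caller could fetch and decode the spec version using only the sbi_ecall() helper and defines added below:

    struct sbiret ret;

    /* a7 = SBI_EXT_BASE, a6 = SBI_EXT_BASE_GET_SPEC_VERSION */
    ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_SPEC_VERSION,
                    0, 0, 0, 0, 0, 0);
    if (!ret.error)
        pr_info("SBI spec version: %lu.%lu\n",
                (ret.value >> SBI_SPEC_VERSION_MAJOR_SHIFT) &
                        SBI_SPEC_VERSION_MAJOR_MASK,
                ret.value & SBI_SPEC_VERSION_MINOR_MASK);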
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/sbi.h | 139 ++++++++++----------
 arch/riscv/kernel/sbi.c      | 243 ++++++++++++++++++++++++++++++++++-
 arch/riscv/kernel/setup.c    |   2 +
 3 files changed, 310 insertions(+), 74 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index b38bc36f7429..1aeb4bb7baa8 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -10,93 +10,88 @@ #include <linux/types.h>
#ifdef CONFIG_RISCV_SBI -#define SBI_EXT_0_1_SET_TIMER 0x0 -#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1 -#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2 -#define SBI_EXT_0_1_CLEAR_IPI 0x3 -#define SBI_EXT_0_1_SEND_IPI 0x4 -#define SBI_EXT_0_1_REMOTE_FENCE_I 0x5 -#define SBI_EXT_0_1_REMOTE_SFENCE_VMA 0x6 -#define SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID 0x7 -#define SBI_EXT_0_1_SHUTDOWN 0x8 +enum sbi_ext_id { + SBI_EXT_0_1_SET_TIMER = 0x0, + SBI_EXT_0_1_CONSOLE_PUTCHAR = 0x1, + SBI_EXT_0_1_CONSOLE_GETCHAR = 0x2, + SBI_EXT_0_1_CLEAR_IPI = 0x3, + SBI_EXT_0_1_SEND_IPI = 0x4, + SBI_EXT_0_1_REMOTE_FENCE_I = 0x5, + SBI_EXT_0_1_REMOTE_SFENCE_VMA = 0x6, + SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID = 0x7, + SBI_EXT_0_1_SHUTDOWN = 0x8, + SBI_EXT_BASE = 0x10, +};
-#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({ \ - register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0); \ - register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1); \ - register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2); \ - register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3); \ - register uintptr_t a7 asm ("a7") = (uintptr_t)(which); \ - asm volatile ("ecall" \ - : "+r" (a0) \ - : "r" (a1), "r" (a2), "r" (a3), "r" (a7) \ - : "memory"); \ - a0; \ -}) +enum sbi_ext_base_fid { + SBI_EXT_BASE_GET_SPEC_VERSION = 0, + SBI_EXT_BASE_GET_IMP_ID, + SBI_EXT_BASE_GET_IMP_VERSION, + SBI_EXT_BASE_PROBE_EXT, + SBI_EXT_BASE_GET_MVENDORID, + SBI_EXT_BASE_GET_MARCHID, + SBI_EXT_BASE_GET_MIMPID, +};
-/* Lazy implementations until SBI is finalized */ -#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0, 0) -#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0, 0) -#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0, 0) -#define SBI_CALL_3(which, arg0, arg1, arg2) \ - SBI_CALL(which, arg0, arg1, arg2, 0) -#define SBI_CALL_4(which, arg0, arg1, arg2, arg3) \ - SBI_CALL(which, arg0, arg1, arg2, arg3) +#define SBI_SPEC_VERSION_DEFAULT 0x1 +#define SBI_SPEC_VERSION_MAJOR_SHIFT 24 +#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f +#define SBI_SPEC_VERSION_MINOR_MASK 0xffffff
-static inline void sbi_console_putchar(int ch) -{ - SBI_CALL_1(SBI_EXT_0_1_CONSOLE_PUTCHAR, ch); -} +/* SBI return error codes */ +#define SBI_SUCCESS 0 +#define SBI_ERR_FAILURE -1 +#define SBI_ERR_NOT_SUPPORTED -2 +#define SBI_ERR_INVALID_PARAM -3 +#define SBI_ERR_DENIED -4 +#define SBI_ERR_INVALID_ADDRESS -5
-static inline int sbi_console_getchar(void) -{ - return SBI_CALL_0(SBI_EXT_0_1_CONSOLE_GETCHAR); -} +extern unsigned long sbi_spec_version; +struct sbiret { + long error; + long value; +};
-static inline void sbi_set_timer(uint64_t stime_value) -{ -#if __riscv_xlen == 32 - SBI_CALL_2(SBI_EXT_0_1_SET_TIMER, stime_value, - stime_value >> 32); -#else - SBI_CALL_1(SBI_EXT_0_1_SET_TIMER, stime_value); -#endif -} - -static inline void sbi_shutdown(void) -{ - SBI_CALL_0(SBI_EXT_0_1_SHUTDOWN); -} +int sbi_init(void); +struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0, + unsigned long arg1, unsigned long arg2, + unsigned long arg3, unsigned long arg4, + unsigned long arg5);
-static inline void sbi_clear_ipi(void) -{ - SBI_CALL_0(SBI_EXT_0_1_CLEAR_IPI); -} +void sbi_console_putchar(int ch); +int sbi_console_getchar(void); +void sbi_set_timer(uint64_t stime_value); +void sbi_shutdown(void); +void sbi_clear_ipi(void); +void sbi_send_ipi(const unsigned long *hart_mask); +void sbi_remote_fence_i(const unsigned long *hart_mask); +void sbi_remote_sfence_vma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size);
-static inline void sbi_send_ipi(const unsigned long *hart_mask) -{ - SBI_CALL_1(SBI_EXT_0_1_SEND_IPI, hart_mask); -} +void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long asid); +int sbi_probe_extension(int ext);
-static inline void sbi_remote_fence_i(const unsigned long *hart_mask) +/* Check if current SBI specification version is 0.1 or not */ +static inline int sbi_spec_is_0_1(void) { - SBI_CALL_1(SBI_EXT_0_1_REMOTE_FENCE_I, hart_mask); + return (sbi_spec_version == SBI_SPEC_VERSION_DEFAULT) ? 1 : 0; }
-static inline void sbi_remote_sfence_vma(const unsigned long *hart_mask, - unsigned long start, - unsigned long size) +/* Get the major version of SBI */ +static inline unsigned long sbi_major_version(void) { - SBI_CALL_3(SBI_EXT_0_1_REMOTE_SFENCE_VMA, hart_mask, - start, size); + return (sbi_spec_version >> SBI_SPEC_VERSION_MAJOR_SHIFT) & + SBI_SPEC_VERSION_MAJOR_MASK; }
-static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, - unsigned long start, - unsigned long size, - unsigned long asid) +/* Get the minor version of SBI */ +static inline unsigned long sbi_minor_version(void) { - SBI_CALL_4(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, hart_mask, - start, size, asid); + return sbi_spec_version & SBI_SPEC_VERSION_MINOR_MASK; } #else /* CONFIG_RISCV_SBI */ /* stubs for code that is only reachable under IS_ENABLED(CONFIG_RISCV_SBI): */ diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c index f6c7c3e82d28..33632e7f91da 100644 --- a/arch/riscv/kernel/sbi.c +++ b/arch/riscv/kernel/sbi.c @@ -1,17 +1,256 @@ // SPDX-License-Identifier: GPL-2.0-only +/* + * SBI initialilization and all extension implementation. + * + * Copyright (c) 2019 Western Digital Corporation or its affiliates. + */
#include <linux/init.h> #include <linux/pm.h> #include <asm/sbi.h>
+/* default SBI version is 0.1 */ +unsigned long sbi_spec_version = SBI_SPEC_VERSION_DEFAULT; +EXPORT_SYMBOL(sbi_spec_version); + +struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0, + unsigned long arg1, unsigned long arg2, + unsigned long arg3, unsigned long arg4, + unsigned long arg5) +{ + struct sbiret ret; + + register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0); + register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1); + register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2); + register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3); + register uintptr_t a4 asm ("a4") = (uintptr_t)(arg4); + register uintptr_t a5 asm ("a5") = (uintptr_t)(arg5); + register uintptr_t a6 asm ("a6") = (uintptr_t)(fid); + register uintptr_t a7 asm ("a7") = (uintptr_t)(ext); + asm volatile ("ecall" + : "+r" (a0), "+r" (a1) + : "r" (a2), "r" (a3), "r" (a4), "r" (a5), "r" (a6), "r" (a7) + : "memory"); + ret.error = a0; + ret.value = a1; + + return ret; +} +EXPORT_SYMBOL(sbi_ecall); + +static int sbi_err_map_linux_errno(int err) +{ + switch (err) { + case SBI_SUCCESS: + return 0; + case SBI_ERR_DENIED: + return -EPERM; + case SBI_ERR_INVALID_PARAM: + return -EINVAL; + case SBI_ERR_INVALID_ADDRESS: + return -EFAULT; + case SBI_ERR_NOT_SUPPORTED: + case SBI_ERR_FAILURE: + default: + return -ENOTSUPP; + }; +} + +/** + * sbi_console_putchar() - Writes given character to the console device. + * @ch: The data to be written to the console. + * + * Return: None + */ +void sbi_console_putchar(int ch) +{ + sbi_ecall(SBI_EXT_0_1_CONSOLE_PUTCHAR, 0, ch, 0, 0, 0, 0, 0); +} +EXPORT_SYMBOL(sbi_console_putchar); + +/** + * sbi_console_getchar() - Reads a byte from console device. + * + * Returns the value read from console. + */ +int sbi_console_getchar(void) +{ + struct sbiret ret; + + ret = sbi_ecall(SBI_EXT_0_1_CONSOLE_GETCHAR, 0, 0, 0, 0, 0, 0, 0); + + return ret.error; +} +EXPORT_SYMBOL(sbi_console_getchar); + +/** + * sbi_set_timer() - Program the timer for next timer event. + * @stime_value: The value after which next timer event should fire. + * + * Return: None + */ +void sbi_set_timer(uint64_t stime_value) +{ +#if __riscv_xlen == 32 + sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, + stime_value >> 32, 0, 0, 0, 0); +#else + sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, 0, 0, 0, 0, 0); +#endif +} +EXPORT_SYMBOL(sbi_set_timer); + +/** + * sbi_shutdown() - Remove all the harts from executing supervisor code. + * + * Return: None + */ +void sbi_shutdown(void) +{ + sbi_ecall(SBI_EXT_0_1_SHUTDOWN, 0, 0, 0, 0, 0, 0, 0); +} +EXPORT_SYMBOL(sbi_shutdown); + +/** + * sbi_clear_ipi() - Clear any pending IPIs for the calling hart. + * + * Return: None + */ +void sbi_clear_ipi(void) +{ + sbi_ecall(SBI_EXT_0_1_CLEAR_IPI, 0, 0, 0, 0, 0, 0, 0); +} + +/** + * sbi_send_ipi() - Send an IPI to any hart. + * @hart_mask: A cpu mask containing all the target harts. + * + * Return: None + */ +void sbi_send_ipi(const unsigned long *hart_mask) +{ + sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)hart_mask, + 0, 0, 0, 0, 0); +} +EXPORT_SYMBOL(sbi_send_ipi); + +/** + * sbi_remote_fence_i() - Execute FENCE.I instruction on given remote harts. + * @hart_mask: A cpu mask containing all the target harts. 
+ * + * Return: None + */ +void sbi_remote_fence_i(const unsigned long *hart_mask) +{ + sbi_ecall(SBI_EXT_0_1_REMOTE_FENCE_I, 0, (unsigned long)hart_mask, + 0, 0, 0, 0, 0); +} +EXPORT_SYMBOL(sbi_remote_fence_i); + +/** + * sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote + * harts for the specified virtual address range. + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the virtual address + * @size: Total size of the virtual address range. + * + * Return: None + */ +void sbi_remote_sfence_vma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size) +{ + sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA, 0, + (unsigned long)hart_mask, start, size, 0, 0, 0); +} +EXPORT_SYMBOL(sbi_remote_sfence_vma); + +/** + * sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given + * remote harts for a virtual address range belonging to a specific ASID. + * + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the virtual address + * @size: Total size of the virtual address range. + * @asid: The value of address space identifier (ASID). + * + * Return: None + */ +void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long asid) +{ + sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, 0, + (unsigned long)hart_mask, start, size, asid, 0, 0); +} +EXPORT_SYMBOL(sbi_remote_sfence_vma_asid); + +/** + * sbi_probe_extension() - Check if an SBI extension ID is supported or not. + * @extid: The extension ID to be probed. + * + * Return: Extension specific nonzero value f yes, -ENOTSUPP otherwise. + */ +int sbi_probe_extension(int extid) +{ + struct sbiret ret; + + ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_PROBE_EXT, extid, + 0, 0, 0, 0, 0); + if (!ret.error) + if (ret.value) + return ret.value; + + return -ENOTSUPP; +} +EXPORT_SYMBOL(sbi_probe_extension); + +static long __sbi_base_ecall(int fid) +{ + struct sbiret ret; + + ret = sbi_ecall(SBI_EXT_BASE, fid, 0, 0, 0, 0, 0, 0); + if (!ret.error) + return ret.value; + else + return sbi_err_map_linux_errno(ret.error); +} + +static inline long sbi_get_spec_version(void) +{ + return __sbi_base_ecall(SBI_EXT_BASE_GET_SPEC_VERSION); +} + +static inline long sbi_get_firmware_id(void) +{ + return __sbi_base_ecall(SBI_EXT_BASE_GET_IMP_ID); +} + +static inline long sbi_get_firmware_version(void) +{ + return __sbi_base_ecall(SBI_EXT_BASE_GET_IMP_VERSION); +} + static void sbi_power_off(void) { sbi_shutdown(); }
-static int __init sbi_init(void) +int __init sbi_init(void) { + int ret; + pm_power_off = sbi_power_off; + ret = sbi_get_spec_version(); + if (ret > 0) + sbi_spec_version = ret; + + pr_info("SBI specification v%lu.%lu detected\n", + sbi_major_version(), sbi_minor_version()); + if (!sbi_spec_is_0_1()) + pr_info("SBI implementation ID=0x%lx Version=0x%lx\n", + sbi_get_firmware_id(), sbi_get_firmware_version()); return 0; } -early_initcall(sbi_init); diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 365ff8420bfe..f0a3c51e3d1b 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -22,6 +22,7 @@ #include <asm/sections.h> #include <asm/pgtable.h> #include <asm/smp.h> +#include <asm/sbi.h> #include <asm/tlbflush.h> #include <asm/thread_info.h>
@@ -74,6 +75,7 @@ void __init setup_arch(char **cmdline_p) swiotlb_init(1); #endif
+ sbi_init(); #ifdef CONFIG_SMP setup_smp(); #endif
From: Atish Patra <atish.patra@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
A few v0.1 SBI calls are being replaced by new SBI calls that follow the v0.2 calling convention.
This patch just defines these new extensions.
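Worth noting as an aside (not part of the patch): the new extension IDs are simply the ASCII encodings of the extension names, which a compile-time sketch makes explicit:

    /* "TIME" = 0x54,0x49,0x4D,0x45; "sPI" = 0x73,0x50,0x49;
     * "RFNC" = 0x52,0x46,0x4E,0x43
     */
    BUILD_BUG_ON(SBI_EXT_TIME   != 0x54494D45);
    BUILD_BUG_ON(SBI_EXT_IPI    != 0x735049);
    BUILD_BUG_ON(SBI_EXT_RFENCE != 0x52464E43);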
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/sbi.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index 1aeb4bb7baa8..9612133213ba 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -21,6 +21,9 @@ enum sbi_ext_id { SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID = 0x7, SBI_EXT_0_1_SHUTDOWN = 0x8, SBI_EXT_BASE = 0x10, + SBI_EXT_TIME = 0x54494D45, + SBI_EXT_IPI = 0x735049, + SBI_EXT_RFENCE = 0x52464E43, };
enum sbi_ext_base_fid { @@ -33,6 +36,24 @@ enum sbi_ext_base_fid { SBI_EXT_BASE_GET_MIMPID, };
+enum sbi_ext_time_fid { + SBI_EXT_TIME_SET_TIMER = 0, +}; + +enum sbi_ext_ipi_fid { + SBI_EXT_IPI_SEND_IPI = 0, +}; + +enum sbi_ext_rfence_fid { + SBI_EXT_RFENCE_REMOTE_FENCE_I = 0, + SBI_EXT_RFENCE_REMOTE_SFENCE_VMA, + SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID, + SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA, + SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID, + SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA, + SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID, +}; + #define SBI_SPEC_VERSION_DEFAULT 0x1 #define SBI_SPEC_VERSION_MAJOR_SHIFT 24 #define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
From: Atish Patra <atish.patra@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
We now have SBI v0.2, which is more scalable and extensible to handle future needs of RISC-V supervisor interfaces.
Introduce a new config and move all SBI v0.1 code under it. This allows the new replacement SBI extensions to be implemented cleanly and makes it easy to remove the v0.1 extensions in the future. Currently, the config is enabled by default. Once M-mode software that only supports v0.1 is no longer in use, this config option and all relevant code can easily be removed.
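The mechanism that makes later removal easy is runtime dispatch: the public helpers become thin wrappers around function pointers that sbi_init() fills in with the v0.1 backend (and, in a later patch, a v0.2 backend when available). A minimal sketch of the pattern, using names from the diff below:

    static void (*__sbi_set_timer)(uint64_t stime);

    void sbi_set_timer(uint64_t stime_value)
    {
        /* dispatches to __sbi_set_timer_v01 or, later, _v02 */
        __sbi_set_timer(stime_value);
    }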
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/Kconfig           |   6 ++
 arch/riscv/include/asm/sbi.h |   2 +
 arch/riscv/kernel/sbi.c      | 137 +++++++++++++++++++++++++++------
 3 files changed, 121 insertions(+), 24 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 50cfa272f9e3..2946627d2a70 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -307,6 +307,12 @@ config SECCOMP and the task is only allowed to execute a few safe syscalls defined by each seccomp mode.
+config RISCV_SBI_V01 + bool "SBI v0.1 support" + default y + help + This config allows kernel to use SBI v0.1 APIs. This will be + deprecated in future once legacy M-mode software are no longer in use. endmenu
menu "Boot options" diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index 9612133213ba..89a25a049d12 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -11,6 +11,7 @@
#ifdef CONFIG_RISCV_SBI enum sbi_ext_id { +#ifdef CONFIG_RISCV_SBI_V01 SBI_EXT_0_1_SET_TIMER = 0x0, SBI_EXT_0_1_CONSOLE_PUTCHAR = 0x1, SBI_EXT_0_1_CONSOLE_GETCHAR = 0x2, @@ -20,6 +21,7 @@ enum sbi_ext_id { SBI_EXT_0_1_REMOTE_SFENCE_VMA = 0x6, SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID = 0x7, SBI_EXT_0_1_SHUTDOWN = 0x8, +#endif SBI_EXT_BASE = 0x10, SBI_EXT_TIME = 0x54494D45, SBI_EXT_IPI = 0x735049, diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c index 33632e7f91da..265637cb5eb0 100644 --- a/arch/riscv/kernel/sbi.c +++ b/arch/riscv/kernel/sbi.c @@ -13,6 +13,13 @@ unsigned long sbi_spec_version = SBI_SPEC_VERSION_DEFAULT; EXPORT_SYMBOL(sbi_spec_version);
+static void (*__sbi_set_timer)(uint64_t stime); +static int (*__sbi_send_ipi)(const unsigned long *hart_mask); +static int (*__sbi_rfence)(int fid, const unsigned long *hart_mask, + unsigned long hbase, unsigned long start, + unsigned long size, unsigned long arg4, + unsigned long arg5); + struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0, unsigned long arg1, unsigned long arg2, unsigned long arg3, unsigned long arg4, @@ -57,6 +64,7 @@ static int sbi_err_map_linux_errno(int err) }; }
+#ifdef CONFIG_RISCV_SBI_V01 /** * sbi_console_putchar() - Writes given character to the console device. * @ch: The data to be written to the console. @@ -85,41 +93,115 @@ int sbi_console_getchar(void) EXPORT_SYMBOL(sbi_console_getchar);
/** - * sbi_set_timer() - Program the timer for next timer event. - * @stime_value: The value after which next timer event should fire. + * sbi_shutdown() - Remove all the harts from executing supervisor code. * * Return: None */ -void sbi_set_timer(uint64_t stime_value) +void sbi_shutdown(void) { -#if __riscv_xlen == 32 - sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, - stime_value >> 32, 0, 0, 0, 0); -#else - sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, 0, 0, 0, 0, 0); -#endif + sbi_ecall(SBI_EXT_0_1_SHUTDOWN, 0, 0, 0, 0, 0, 0, 0); } EXPORT_SYMBOL(sbi_set_timer);
/** - * sbi_shutdown() - Remove all the harts from executing supervisor code. + * sbi_clear_ipi() - Clear any pending IPIs for the calling hart. * * Return: None */ -void sbi_shutdown(void) +void sbi_clear_ipi(void) { - sbi_ecall(SBI_EXT_0_1_SHUTDOWN, 0, 0, 0, 0, 0, 0, 0); + sbi_ecall(SBI_EXT_0_1_CLEAR_IPI, 0, 0, 0, 0, 0, 0, 0); } EXPORT_SYMBOL(sbi_shutdown);
/** - * sbi_clear_ipi() - Clear any pending IPIs for the calling hart. + * sbi_set_timer_v01() - Program the timer for next timer event. + * @stime_value: The value after which next timer event should fire. * * Return: None */ -void sbi_clear_ipi(void) +static void __sbi_set_timer_v01(uint64_t stime_value) { - sbi_ecall(SBI_EXT_0_1_CLEAR_IPI, 0, 0, 0, 0, 0, 0, 0); +#if __riscv_xlen == 32 + sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, + stime_value >> 32, 0, 0, 0, 0); +#else + sbi_ecall(SBI_EXT_0_1_SET_TIMER, 0, stime_value, 0, 0, 0, 0, 0); +#endif +} + +static int __sbi_send_ipi_v01(const unsigned long *hart_mask) +{ + sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)hart_mask, + 0, 0, 0, 0, 0); + return 0; +} + +static int __sbi_rfence_v01(int fid, const unsigned long *hart_mask, + unsigned long hbase, unsigned long start, + unsigned long size, unsigned long arg4, + unsigned long arg5) +{ + int result = 0; + + /* v0.2 function IDs are equivalent to v0.1 extension IDs */ + switch (fid) { + case SBI_EXT_RFENCE_REMOTE_FENCE_I: + sbi_ecall(SBI_EXT_0_1_REMOTE_FENCE_I, 0, + (unsigned long)hart_mask, 0, 0, 0, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA: + sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA, 0, + (unsigned long)hart_mask, start, size, + 0, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID: + sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, 0, + (unsigned long)hart_mask, start, size, + arg4, 0, 0); + break; + default: + pr_err("SBI call [%d]not supported in SBI v0.1\n", fid); + result = -EINVAL; + } + + return result; +} +#else +static void __sbi_set_timer_v01(uint64_t stime_value) +{ + pr_warn("Timer extension is not available in SBI v%lu.%lu\n", + sbi_major_version(), sbi_minor_version()); +} +static int __sbi_send_ipi_v01(const unsigned long *hart_mask) +{ + pr_warn("IPI extension is not available in SBI v%lu.%lu\n", + sbi_major_version(), sbi_minor_version()); + + return 0; +} +static int __sbi_rfence_v01(int fid, + const unsigned long *hart_mask, + unsigned long hbase, unsigned long start, + unsigned long size, unsigned long arg4, + unsigned long arg5) +{ + pr_warn("remote fence extension is not available in SBI v%lu.%lu\n", + sbi_major_version(), sbi_minor_version()); + + return 0; +} +#endif /* CONFIG_RISCV_SBI_V01 */ + +/** + * sbi_set_timer() - Program the timer for next timer event. + * @stime_value: The value after which next timer event should fire. + * + * Return: None + */ +void sbi_set_timer(uint64_t stime_value) +{ + __sbi_set_timer(stime_value); }
/** @@ -130,11 +212,11 @@ void sbi_clear_ipi(void) */ void sbi_send_ipi(const unsigned long *hart_mask) { - sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)hart_mask, - 0, 0, 0, 0, 0); + __sbi_send_ipi(hart_mask); } EXPORT_SYMBOL(sbi_send_ipi);
+ /** * sbi_remote_fence_i() - Execute FENCE.I instruction on given remote harts. * @hart_mask: A cpu mask containing all the target harts. @@ -143,8 +225,8 @@ EXPORT_SYMBOL(sbi_send_ipi); */ void sbi_remote_fence_i(const unsigned long *hart_mask) { - sbi_ecall(SBI_EXT_0_1_REMOTE_FENCE_I, 0, (unsigned long)hart_mask, - 0, 0, 0, 0, 0); + __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I, + hart_mask, 0, 0, 0, 0, 0); } EXPORT_SYMBOL(sbi_remote_fence_i);
@@ -161,8 +243,8 @@ void sbi_remote_sfence_vma(const unsigned long *hart_mask, unsigned long start, unsigned long size) { - sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA, 0, - (unsigned long)hart_mask, start, size, 0, 0, 0); + __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA, + hart_mask, 0, start, size, 0, 0); } EXPORT_SYMBOL(sbi_remote_sfence_vma);
@@ -182,8 +264,8 @@ void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, unsigned long size, unsigned long asid) { - sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, 0, - (unsigned long)hart_mask, start, size, asid, 0, 0); + __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID, + hart_mask, 0, start, size, asid, 0); } EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
@@ -249,8 +331,15 @@ int __init sbi_init(void)
pr_info("SBI specification v%lu.%lu detected\n", sbi_major_version(), sbi_minor_version()); - if (!sbi_spec_is_0_1()) + + if (!sbi_spec_is_0_1()) { pr_info("SBI implementation ID=0x%lx Version=0x%lx\n", sbi_get_firmware_id(), sbi_get_firmware_version()); + } + + __sbi_set_timer = __sbi_set_timer_v01; + __sbi_send_ipi = __sbi_send_ipi_v01; + __sbi_rfence = __sbi_rfence_v01; + return 0; }
From: Atish Patra <atish.patra@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
A few v0.1 SBI calls are being replaced by new SBI calls that follow the v0.2 calling convention.
Implement the replacement extensions and a few additional new SBI function calls that make way for a better SBI interface in the future.
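As a usage sketch for one of the new calls (hypothetical caller; vmid is assumed to come from a VMID allocator): a hypervisor retiring a VMID could flush all guest-physical mappings for it on every online hart. Passing a NULL hart mask targets all online harts in the v0.2 implementation below, and start = 0 with size = -1UL is assumed to cover the whole address range:

    /* Hypothetical: flush stale guest translations for 'vmid' */
    sbi_remote_hfence_gvma_vmid(NULL, 0, -1UL, vmid);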
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/sbi.h |  14 +++
 arch/riscv/kernel/sbi.c      | 198 ++++++++++++++++++++++++++++++++++-
 2 files changed, 208 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index 89a25a049d12..e5569e7fdd7c 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -96,6 +96,20 @@ void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, unsigned long start, unsigned long size, unsigned long asid); +int sbi_remote_hfence_gvma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size); +int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long vmid); +int sbi_remote_hfence_vvma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size); +int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long asid); int sbi_probe_extension(int ext);
/* Check if current SBI specification version is 0.1 or not */ diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c index 265637cb5eb0..a1f2fca11f71 100644 --- a/arch/riscv/kernel/sbi.c +++ b/arch/riscv/kernel/sbi.c @@ -193,6 +193,102 @@ static int __sbi_rfence_v01(int fid, } #endif /* CONFIG_RISCV_SBI_V01 */
+static void __sbi_set_timer_v02(uint64_t stime_value) +{ +#if __riscv_xlen == 32 + sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value, + stime_value >> 32, 0, 0, 0, 0); +#else + sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value, 0, + 0, 0, 0, 0); +#endif +} + +static int __sbi_send_ipi_v02(const unsigned long *hart_mask) +{ + unsigned long hmask_val; + struct cpumask tmask; + struct sbiret ret = {0}; + int result; + + if (!hart_mask) { + riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask); + hmask_val = *(cpumask_bits(&tmask)); + } else + hmask_val = *hart_mask; + + ret = sbi_ecall(SBI_EXT_IPI, SBI_EXT_IPI_SEND_IPI, hmask_val, + 0, 0, 0, 0, 0); + if (ret.error) { + result = sbi_err_map_linux_errno(ret.error); + pr_err("%s: failed with error [%d]\n", __func__, result); + } else + result = ret.value; + + return result; +} + +static int __sbi_rfence_v02(int fid, const unsigned long *hart_mask, + unsigned long hbase, unsigned long start, + unsigned long size, unsigned long arg4, + unsigned long arg5) +{ + unsigned long hmask_val; + struct cpumask tmask; + struct sbiret ret = {0}; + int ext = SBI_EXT_RFENCE; + int result; + + if (!hart_mask) { + riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask); + hmask_val = *(cpumask_bits(&tmask)); + } else + hmask_val = *hart_mask; + + switch (fid) { + case SBI_EXT_RFENCE_REMOTE_FENCE_I: + ret = sbi_ecall(ext, fid, hmask_val, 0, 0, 0, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, arg4, 0); + break; + /*TODO: Handle non zero hbase cases */ + case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, arg4, 0); + break; + case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, 0, 0); + break; + case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID: + ret = sbi_ecall(ext, fid, hmask_val, 0, start, + size, arg4, 0); + break; + default: + pr_err("unknown function ID [%d] for SBI extension [%d]\n", + fid, ext); + result = -EINVAL; + } + + if (ret.error) { + result = sbi_err_map_linux_errno(ret.error); + pr_err("%s: failed with error [%d]\n", __func__, result); + } else + result = ret.value; + + return result; +} + /** * sbi_set_timer() - Program the timer for next timer event. * @stime_value: The value after which next timer event should fire. @@ -269,6 +365,85 @@ void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask, } EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
+/** + * sbi_remote_hfence_gvma() - Execute HFENCE.GVMA instructions on given remote + * harts for the specified guest physical address range. + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the guest physical address + * @size: Total size of the guest physical address range. + * + * Return: None + */ +int sbi_remote_hfence_gvma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size) +{ + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA, + hart_mask, 0, start, size, 0, 0); +} +EXPORT_SYMBOL_GPL(sbi_remote_hfence_gvma); + +/** + * sbi_remote_hfence_gvma_vmid() - Execute HFENCE.GVMA instructions on given + * remote harts for a guest physical address range belonging to a specific VMID. + * + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the guest physical address + * @size: Total size of the guest physical address range. + * @vmid: The value of guest ID (VMID). + * + * Return: 0 if success, Error otherwise. + */ +int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long vmid) +{ + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID, + hart_mask, 0, start, size, vmid, 0); +} +EXPORT_SYMBOL(sbi_remote_hfence_gvma_vmid); + +/** + * sbi_remote_hfence_vvma() - Execute HFENCE.VVMA instructions on given remote + * harts for the current guest virtual address range. + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the current guest virtual address + * @size: Total size of the current guest virtual address range. + * + * Return: None + */ +int sbi_remote_hfence_vvma(const unsigned long *hart_mask, + unsigned long start, + unsigned long size) +{ + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA, + hart_mask, 0, start, size, 0, 0); +} +EXPORT_SYMBOL(sbi_remote_hfence_vvma); + +/** + * sbi_remote_hfence_vvma_asid() - Execute HFENCE.VVMA instructions on given + * remote harts for current guest virtual address range belonging to a specific + * ASID. + * + * @hart_mask: A cpu mask containing all the target harts. + * @start: Start of the current guest virtual address + * @size: Total size of the current guest virtual address range. + * @asid: The value of address space identifier (ASID). + * + * Return: None + */ +int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask, + unsigned long start, + unsigned long size, + unsigned long asid) +{ + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID, + hart_mask, 0, start, size, asid, 0); +} +EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid); + /** * sbi_probe_extension() - Check if an SBI extension ID is supported or not. * @extid: The extension ID to be probed. 
@@ -335,11 +510,26 @@ int __init sbi_init(void) if (!sbi_spec_is_0_1()) { pr_info("SBI implementation ID=0x%lx Version=0x%lx\n", sbi_get_firmware_id(), sbi_get_firmware_version()); + if (sbi_probe_extension(SBI_EXT_TIME) > 0) { + __sbi_set_timer = __sbi_set_timer_v02; + pr_info("SBI v0.2 TIME extension detected\n"); + } else + __sbi_set_timer = __sbi_set_timer_v01; + if (sbi_probe_extension(SBI_EXT_IPI) > 0) { + __sbi_send_ipi = __sbi_send_ipi_v02; + pr_info("SBI v0.2 IPI extension detected\n"); + } else + __sbi_send_ipi = __sbi_send_ipi_v01; + if (sbi_probe_extension(SBI_EXT_RFENCE) > 0) { + __sbi_rfence = __sbi_rfence_v02; + pr_info("SBI v0.2 RFENCE extension detected\n"); + } else + __sbi_rfence = __sbi_rfence_v01; + } else { + __sbi_set_timer = __sbi_set_timer_v01; + __sbi_send_ipi = __sbi_send_ipi_v01; + __sbi_rfence = __sbi_rfence_v01; }
- __sbi_set_timer = __sbi_set_timer_v01; - __sbi_send_ipi = __sbi_send_ipi_v01; - __sbi_rfence = __sbi_rfence_v01; - return 0; }
From: Anup Patel <anup.patel@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch adds a riscv_isa bitmap which represents the Host ISA features common across all Host CPUs. riscv_isa is not the same as elf_hwcap because elf_hwcap only has the ISA features relevant to user-space apps, whereas riscv_isa has the ISA features relevant to both the kernel and user-space apps.
One use-case for the riscv_isa bitmap is the KVM hypervisor, where we will use it to do the following operations:
1. Check whether the hypervisor extension is available (see the sketch
   after this list).
2. Find ISA features that need to be virtualized (e.g. floating point
   support, vector extension, etc.)
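A sketch of use-case 1 with the helpers this patch adds (KVM's actual check may differ; passing NULL selects the Host ISA bitmap):

    /* Bail out if the H-extension is absent on some Host CPU */
    if (!riscv_isa_extension_available(NULL, h))
        return -ENODEV;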
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/hwcap.h | 22 +++++++++
 arch/riscv/kernel/cpufeature.c | 83 ++++++++++++++++++++++++++++++++--
 2 files changed, 102 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 1bb0cd04aec3..5589c012e004 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -8,6 +8,7 @@ #ifndef _ASM_RISCV_HWCAP_H #define _ASM_RISCV_HWCAP_H
+#include <linux/bits.h> #include <uapi/asm/hwcap.h>
#ifndef __ASSEMBLY__ @@ -22,6 +23,27 @@ enum { };
extern unsigned long elf_hwcap; + +#define RISCV_ISA_EXT_a ('a' - 'a') +#define RISCV_ISA_EXT_c ('c' - 'a') +#define RISCV_ISA_EXT_d ('d' - 'a') +#define RISCV_ISA_EXT_f ('f' - 'a') +#define RISCV_ISA_EXT_h ('h' - 'a') +#define RISCV_ISA_EXT_i ('i' - 'a') +#define RISCV_ISA_EXT_m ('m' - 'a') +#define RISCV_ISA_EXT_s ('s' - 'a') +#define RISCV_ISA_EXT_u ('u' - 'a') + +#define RISCV_ISA_EXT_MAX 256 + +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap); + +#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext) + +bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit); +#define riscv_isa_extension_available(isa_bitmap, ext) \ + __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext) + #endif
#endif /* _ASM_RISCV_HWCAP_H */ diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index a5ad00043104..ac202f44a670 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -6,6 +6,7 @@ * Copyright (C) 2017 SiFive */
+#include <linux/bitmap.h> #include <linux/of.h> #include <asm/processor.h> #include <asm/hwcap.h> @@ -13,15 +14,57 @@ #include <asm/switch_to.h>
unsigned long elf_hwcap __read_mostly; + +/* Host ISA bitmap */ +static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly; + #ifdef CONFIG_FPU bool has_fpu __read_mostly; #endif
+/** + * riscv_isa_extension_base() - Get base extension word + * + * @isa_bitmap: ISA bitmap to use + * Return: base extension word as unsigned long value + * + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used. + */ +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap) +{ + if (!isa_bitmap) + return riscv_isa[0]; + return isa_bitmap[0]; +} +EXPORT_SYMBOL_GPL(riscv_isa_extension_base); + +/** + * __riscv_isa_extension_available() - Check whether given extension + * is available or not + * + * @isa_bitmap: ISA bitmap to use + * @bit: bit position of the desired extension + * Return: true or false + * + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used. + */ +bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit) +{ + const unsigned long *bmap = (isa_bitmap) ? isa_bitmap : riscv_isa; + + if (bit >= RISCV_ISA_EXT_MAX) + return false; + + return test_bit(bit, bmap) ? true : false; +} +EXPORT_SYMBOL_GPL(__riscv_isa_extension_available); + void riscv_fill_hwcap(void) { struct device_node *node; const char *isa; - size_t i; + char print_str[BITS_PER_LONG + 1]; + size_t i, j, isa_len; static unsigned long isa2hwcap[256] = {0};
isa2hwcap['i'] = isa2hwcap['I'] = COMPAT_HWCAP_ISA_I; @@ -33,8 +76,11 @@ void riscv_fill_hwcap(void)
elf_hwcap = 0;
+ bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX); + for_each_of_cpu_node(node) { unsigned long this_hwcap = 0; + unsigned long this_isa = 0;
if (riscv_of_processor_hartid(node) < 0) continue; @@ -44,8 +90,24 @@ void riscv_fill_hwcap(void) continue; }
- for (i = 0; i < strlen(isa); ++i) + i = 0; + isa_len = strlen(isa); +#if IS_ENABLED(CONFIG_32BIT) + if (!strncmp(isa, "rv32", 4)) + i += 4; +#elif IS_ENABLED(CONFIG_64BIT) + if (!strncmp(isa, "rv64", 4)) + i += 4; +#endif + for (; i < isa_len; ++i) { this_hwcap |= isa2hwcap[(unsigned char)(isa[i])]; + /* + * TODO: X, Y and Z extension parsing for Host ISA + * bitmap will be added in-future. + */ + if ('a' <= isa[i] && isa[i] < 'x') + this_isa |= (1UL << (isa[i] - 'a')); + }
/* * All "okay" hart should have same isa. Set HWCAP based on @@ -56,6 +118,11 @@ void riscv_fill_hwcap(void) elf_hwcap &= this_hwcap; else elf_hwcap = this_hwcap; + + if (riscv_isa[0]) + riscv_isa[0] &= this_isa; + else + riscv_isa[0] = this_isa; }
/* We don't support systems with F but without D, so mask those out @@ -65,7 +132,17 @@ void riscv_fill_hwcap(void) elf_hwcap &= ~COMPAT_HWCAP_ISA_F; }
- pr_info("elf_hwcap is 0x%lx\n", elf_hwcap); + memset(print_str, 0, sizeof(print_str)); + for (i = 0, j = 0; i < BITS_PER_LONG; i++) + if (riscv_isa[0] & BIT_MASK(i)) + print_str[j++] = (char)('a' + i); + pr_info("riscv: ISA extensions %s\n", print_str); + + memset(print_str, 0, sizeof(print_str)); + for (i = 0, j = 0; i < BITS_PER_LONG; i++) + if (elf_hwcap & BIT_MASK(i)) + print_str[j++] = (char)('a' + i); + pr_info("riscv: ELF capabilities %s\n", print_str);
#ifdef CONFIG_FPU if (elf_hwcap & (COMPAT_HWCAP_ISA_F | COMPAT_HWCAP_ISA_D))
From: Anup Patel <anup.patel@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
Various Linux kernel DEBUG options have a big performance impact, so they should not be enabled in the normal RISC-V defconfigs.
Instead, we should have a separate fragmented config for RISC-V that enables these DEBUG options. This way, the Linux RISC-V kernel can be built for both non-debug and debug purposes using the same defconfig.
This patch moves additional DEBUG options to extra_debug.config.
To configure a non-debug RV64 kernel, we use our normal defconfig:

$ make O=<linux_build_directory> defconfig

Whereas to configure a debug RV64 kernel, we add extra_debug.config:

$ make O=<linux_build_directory> defconfig extra_debug.config
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/configs/defconfig          | 20 --------------------
 arch/riscv/configs/extra_debug.config | 21 +++++++++++++++++++++
 arch/riscv/configs/rv32_defconfig     | 20 --------------------
 3 files changed, 21 insertions(+), 40 deletions(-)
 create mode 100644 arch/riscv/configs/extra_debug.config
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig index e2ff95cb3390..65483902422f 100644 --- a/arch/riscv/configs/defconfig +++ b/arch/riscv/configs/defconfig @@ -101,27 +101,7 @@ CONFIG_CRYPTO_USER_API_HASH=y CONFIG_CRYPTO_DEV_VIRTIO=y CONFIG_PRINTK_TIME=y CONFIG_DEBUG_FS=y -CONFIG_DEBUG_PAGEALLOC=y -CONFIG_DEBUG_VM=y -CONFIG_DEBUG_VM_PGFLAGS=y -CONFIG_DEBUG_MEMORY_INIT=y -CONFIG_DEBUG_PER_CPU_MAPS=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_WQ_WATCHDOG=y -CONFIG_SCHED_STACK_END_CHECK=y -CONFIG_DEBUG_TIMEKEEPING=y -CONFIG_DEBUG_RT_MUTEXES=y -CONFIG_DEBUG_SPINLOCK=y -CONFIG_DEBUG_MUTEXES=y -CONFIG_DEBUG_RWSEMS=y -CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_STACKTRACE=y -CONFIG_DEBUG_LIST=y -CONFIG_DEBUG_PLIST=y -CONFIG_DEBUG_SG=y # CONFIG_RCU_TRACE is not set -CONFIG_RCU_EQS_DEBUG=y -CONFIG_DEBUG_BLOCK_EXT_DEVT=y -# CONFIG_FTRACE is not set -# CONFIG_RUNTIME_TESTING_MENU is not set -CONFIG_MEMTEST=y diff --git a/arch/riscv/configs/extra_debug.config b/arch/riscv/configs/extra_debug.config new file mode 100644 index 000000000000..66c58bb645a4 --- /dev/null +++ b/arch/riscv/configs/extra_debug.config @@ -0,0 +1,21 @@ +CONFIG_DEBUG_PAGEALLOC=y +CONFIG_DEBUG_VM=y +CONFIG_DEBUG_VM_PGFLAGS=y +CONFIG_DEBUG_MEMORY_INIT=y +CONFIG_DEBUG_PER_CPU_MAPS=y +CONFIG_SOFTLOCKUP_DETECTOR=y +CONFIG_WQ_WATCHDOG=y +CONFIG_SCHED_STACK_END_CHECK=y +CONFIG_DEBUG_TIMEKEEPING=y +CONFIG_DEBUG_RT_MUTEXES=y +CONFIG_DEBUG_SPINLOCK=y +CONFIG_DEBUG_MUTEXES=y +CONFIG_DEBUG_RWSEMS=y +CONFIG_DEBUG_ATOMIC_SLEEP=y +CONFIG_STACKTRACE=y +CONFIG_DEBUG_LIST=y +CONFIG_DEBUG_PLIST=y +CONFIG_DEBUG_SG=y +CONFIG_RCU_EQS_DEBUG=y +CONFIG_DEBUG_BLOCK_EXT_DEVT=y +CONFIG_MEMTEST=y diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig index eb519407c841..bb9d1f3286a3 100644 --- a/arch/riscv/configs/rv32_defconfig +++ b/arch/riscv/configs/rv32_defconfig @@ -98,27 +98,7 @@ CONFIG_CRYPTO_USER_API_HASH=y CONFIG_CRYPTO_DEV_VIRTIO=y CONFIG_PRINTK_TIME=y CONFIG_DEBUG_FS=y -CONFIG_DEBUG_PAGEALLOC=y -CONFIG_DEBUG_VM=y -CONFIG_DEBUG_VM_PGFLAGS=y -CONFIG_DEBUG_MEMORY_INIT=y -CONFIG_DEBUG_PER_CPU_MAPS=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_WQ_WATCHDOG=y -CONFIG_SCHED_STACK_END_CHECK=y -CONFIG_DEBUG_TIMEKEEPING=y -CONFIG_DEBUG_RT_MUTEXES=y -CONFIG_DEBUG_SPINLOCK=y -CONFIG_DEBUG_MUTEXES=y -CONFIG_DEBUG_RWSEMS=y -CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_STACKTRACE=y -CONFIG_DEBUG_LIST=y -CONFIG_DEBUG_PLIST=y -CONFIG_DEBUG_SG=y # CONFIG_RCU_TRACE is not set -CONFIG_RCU_EQS_DEBUG=y -CONFIG_DEBUG_BLOCK_EXT_DEVT=y -# CONFIG_FTRACE is not set -# CONFIG_RUNTIME_TESTING_MENU is not set -CONFIG_MEMTEST=y
From: Anup Patel <anup.patel@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
DO NOT UPSTREAM !!!!!
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/configs/defconfig      | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 2 files changed, 2 insertions(+)
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig index 65483902422f..1e48b9c81e77 100644 --- a/arch/riscv/configs/defconfig +++ b/arch/riscv/configs/defconfig @@ -16,6 +16,7 @@ CONFIG_EXPERT=y CONFIG_BPF_SYSCALL=y CONFIG_SOC_SIFIVE=y CONFIG_SMP=y +CONFIG_HOTPLUG_CPU=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_NET=y diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig index bb9d1f3286a3..de073a3b9150 100644 --- a/arch/riscv/configs/rv32_defconfig +++ b/arch/riscv/configs/rv32_defconfig @@ -16,6 +16,7 @@ CONFIG_EXPERT=y CONFIG_BPF_SYSCALL=y CONFIG_ARCH_RV32I=y CONFIG_SMP=y +CONFIG_HOTPLUG_CPU=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_NET=y
From: Alistair Francis <alistair.francis@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This reverts commit d4c08b9776b392e20efc6198ebe1bc8ec1911d9b.
The latest RISC-V 32-bit glibc submission doesn't work with this patch, so let's revert it. This revert can itself be reverted once the glibc submission is updated to work on the 5.1 kernel.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Upstream-Status: Inappropriate [enable feature]
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/Kconfig                   | 1 +
 arch/riscv/include/uapi/asm/unistd.h | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 2946627d2a70..630ae63c3976 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -12,6 +12,7 @@ config 32BIT
config RISCV def_bool y + select ARCH_32BIT_OFF_T if !64BIT # even on 32-bit, physical (and DMA) addresses are > 32-bits select PHYS_ADDR_T_64BIT select OF diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h index 13ce76cc5aff..95217ebb631d 100644 --- a/arch/riscv/include/uapi/asm/unistd.h +++ b/arch/riscv/include/uapi/asm/unistd.h @@ -17,9 +17,12 @@
#ifdef __LP64__ #define __ARCH_WANT_NEW_STAT -#define __ARCH_WANT_SET_GET_RLIMIT #define __ARCH_WANT_SYS_CLONE3 #endif /* __LP64__ */ +#define __ARCH_WANT_SET_GET_RLIMIT +#ifndef __LP64__ +#define __ARCH_WANT_TIME32_SYSCALLS +#endif
#include <asm-generic/unistd.h>
From: Anup Patel <anup.patel@wdc.com>

euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch extends asm/csr.h by adding RISC-V hypervisor extension related defines.
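As one illustrative use of these defines (a sketch under assumptions, not part of this patch; vmid and pgd_phys are hypothetical inputs), the stage2 code added later in the series composes the hgatp CSR value roughly like this:

    unsigned long hgatp =
        HGATP_MODE |
        ((vmid << HGATP_VMID_SHIFT) & HGATP_VMID_MASK) |
        ((pgd_phys >> PAGE_SHIFT) & HGATP_PPN);

    csr_write(CSR_HGATP, hgatp);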
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/csr.h | 90 ++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index 8e18d2c64399..c647c581a902 100644 --- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -30,6 +30,8 @@ #define SR_XS_CLEAN _AC(0x00010000, UL) #define SR_XS_DIRTY _AC(0x00018000, UL)
+#define SR_MXR _AC(0x00080000, UL) + #ifndef CONFIG_64BIT #define SR_SD _AC(0x80000000, UL) /* FS/XS dirty */ #else @@ -51,26 +53,33 @@ #define CAUSE_IRQ_FLAG (_AC(1, UL) << (__riscv_xlen - 1))
/* Interrupt causes (minus the high bit) */ -#define IRQ_U_SOFT 0 #define IRQ_S_SOFT 1 +#define IRQ_VS_SOFT 2 #define IRQ_M_SOFT 3 -#define IRQ_U_TIMER 4 #define IRQ_S_TIMER 5 +#define IRQ_VS_TIMER 6 #define IRQ_M_TIMER 7 -#define IRQ_U_EXT 8 #define IRQ_S_EXT 9 +#define IRQ_VS_EXT 10 #define IRQ_M_EXT 11
/* Exception causes */ #define EXC_INST_MISALIGNED 0 #define EXC_INST_ACCESS 1 +#define EXC_INST_ILLEGAL 2 #define EXC_BREAKPOINT 3 #define EXC_LOAD_ACCESS 5 #define EXC_STORE_ACCESS 7 #define EXC_SYSCALL 8 +#define EXC_HYPERVISOR_SYSCALL 9 +#define EXC_SUPERVISOR_SYSCALL 10 #define EXC_INST_PAGE_FAULT 12 #define EXC_LOAD_PAGE_FAULT 13 #define EXC_STORE_PAGE_FAULT 15 +#define EXC_INST_GUEST_PAGE_FAULT 20 +#define EXC_LOAD_GUEST_PAGE_FAULT 21 +#define EXC_VIRTUAL_INST_FAULT 22 +#define EXC_STORE_GUEST_PAGE_FAULT 23
/* PMP configuration */ #define PMP_R 0x01 @@ -82,6 +91,56 @@ #define PMP_A_NAPOT 0x18 #define PMP_L 0x80
+/* HSTATUS flags */ +#ifdef CONFIG_64BIT +#define HSTATUS_VSXL _AC(0x300000000, UL) +#define HSTATUS_VSXL_SHIFT 32 +#endif +#define HSTATUS_VTSR _AC(0x00400000, UL) +#define HSTATUS_VTW _AC(0x00200000, UL) +#define HSTATUS_VTVM _AC(0x00100000, UL) +#define HSTATUS_VGEIN _AC(0x0003f000, UL) +#define HSTATUS_VGEIN_SHIFT 12 +#define HSTATUS_HU _AC(0x00000200, UL) +#define HSTATUS_SPVP _AC(0x00000100, UL) +#define HSTATUS_SPV _AC(0x00000080, UL) +#define HSTATUS_GVA _AC(0x00000040, UL) +#define HSTATUS_VSBE _AC(0x00000020, UL) + +/* HGATP flags */ +#define HGATP_MODE_OFF _AC(0, UL) +#define HGATP_MODE_SV32X4 _AC(1, UL) +#define HGATP_MODE_SV39X4 _AC(8, UL) +#define HGATP_MODE_SV48X4 _AC(9, UL) + +#define HGATP32_MODE_SHIFT 31 +#define HGATP32_VMID_SHIFT 22 +#define HGATP32_VMID_MASK _AC(0x1FC00000, UL) +#define HGATP32_PPN _AC(0x003FFFFF, UL) + +#define HGATP64_MODE_SHIFT 60 +#define HGATP64_VMID_SHIFT 44 +#define HGATP64_VMID_MASK _AC(0x03FFF00000000000, UL) +#define HGATP64_PPN _AC(0x00000FFFFFFFFFFF, UL) + +#ifdef CONFIG_64BIT +#define HGATP_PPN HGATP64_PPN +#define HGATP_VMID_SHIFT HGATP64_VMID_SHIFT +#define HGATP_VMID_MASK HGATP64_VMID_MASK +#define HGATP_MODE (HGATP_MODE_SV39X4 << HGATP64_MODE_SHIFT) +#else +#define HGATP_PPN HGATP32_PPN +#define HGATP_VMID_SHIFT HGATP32_VMID_SHIFT +#define HGATP_VMID_MASK HGATP32_VMID_MASK +#define HGATP_MODE (HGATP_MODE_SV32X4 << HGATP32_MODE_SHIFT) +#endif + +/* VSIP & HVIP relation */ +#define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT) +#define VSIP_VALID_MASK ((_AC(1, UL) << IRQ_S_SOFT) | \ + (_AC(1, UL) << IRQ_S_TIMER) | \ + (_AC(1, UL) << IRQ_S_EXT)) + /* symbolic CSR names: */ #define CSR_CYCLE 0xc00 #define CSR_TIME 0xc01 @@ -101,6 +160,31 @@ #define CSR_SIP 0x144 #define CSR_SATP 0x180
+#define CSR_VSSTATUS 0x200 +#define CSR_VSIE 0x204 +#define CSR_VSTVEC 0x205 +#define CSR_VSSCRATCH 0x240 +#define CSR_VSEPC 0x241 +#define CSR_VSCAUSE 0x242 +#define CSR_VSTVAL 0x243 +#define CSR_VSIP 0x244 +#define CSR_VSATP 0x280 + +#define CSR_HSTATUS 0x600 +#define CSR_HEDELEG 0x602 +#define CSR_HIDELEG 0x603 +#define CSR_HIE 0x604 +#define CSR_HTIMEDELTA 0x605 +#define CSR_HCOUNTEREN 0x606 +#define CSR_HGEIE 0x607 +#define CSR_HTIMEDELTAH 0x615 +#define CSR_HTVAL 0x643 +#define CSR_HIP 0x644 +#define CSR_HVIP 0x645 +#define CSR_HTINST 0x64a +#define CSR_HGATP 0x680 +#define CSR_HGEIP 0xe12 + #define CSR_MSTATUS 0x300 #define CSR_MISA 0x301 #define CSR_MIE 0x304
From: Anup Patel <anup.patel@wdc.com>
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch-specific VM functions, except
   kvm_vm_ioctl_get_dirty_log(), which will be implemented in the
   future as part of stage2 page logging.
2. Stubs of the required arch-specific VCPU functions, except
   kvm_arch_vcpu_ioctl_run(), which is semi-complete and is extended
   by subsequent patches.
3. Stubs for the required arch-specific stage2 MMU functions.
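As an editorial aside (not part of the original patch), a minimal user-space sketch that probes for this skeletal support might look as follows; KVM_GET_API_VERSION and KVM_CREATE_VM are standard KVM ioctls, and /dev/kvm is only registered when kvm_arch_init() accepted the host:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int probe_kvm(void)
  {
          /* /dev/kvm does not appear if kvm_arch_init() failed,
           * e.g. when the H-extension is missing */
          int kvm_fd = open("/dev/kvm", O_RDWR);

          if (kvm_fd < 0)
                  return -1;
          /* the stable KVM API version is 12 */
          if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) != 12)
                  return -1;
          return ioctl(kvm_fd, KVM_CREATE_VM, 0);  /* VM fd on success */
  }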
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/Kconfig | 2 +
 arch/riscv/Makefile | 2 +
 arch/riscv/include/asm/kvm_host.h | 91 +++++++++
 arch/riscv/include/uapi/asm/kvm.h | 47 +++++
 arch/riscv/kvm/Kconfig | 33 +++
 arch/riscv/kvm/Makefile | 13 ++
 arch/riscv/kvm/main.c | 95 +++++++++
 arch/riscv/kvm/mmu.c | 86 ++++++++
 arch/riscv/kvm/vcpu.c | 326 ++++++++++++++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c | 35 ++++
 arch/riscv/kvm/vm.c | 79 ++++++++
 11 files changed, 809 insertions(+)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vm.c
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 630ae63c3976..9c6ba54eadae 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -368,3 +368,5 @@ menu "Power management options" source "kernel/power/Kconfig"
endmenu + +source "arch/riscv/kvm/Kconfig" diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index b9009a2fbaf5..65b58cec9abd 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -77,6 +77,8 @@ head-y := arch/riscv/kernel/head.o
core-y += arch/riscv/
+core-$(CONFIG_KVM) += arch/riscv/kvm/ + libs-y += arch/riscv/lib/
PHONY += vdso_install diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h new file mode 100644 index 000000000000..fdd63bafb714 --- /dev/null +++ b/arch/riscv/include/asm/kvm_host.h @@ -0,0 +1,91 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#ifndef __RISCV_KVM_HOST_H__ +#define __RISCV_KVM_HOST_H__ + +#include <linux/types.h> +#include <linux/kvm.h> +#include <linux/kvm_types.h> + +#ifdef CONFIG_64BIT +#define KVM_MAX_VCPUS (1U << 16) +#else +#define KVM_MAX_VCPUS (1U << 9) +#endif + +#define KVM_USER_MEM_SLOTS 512 +#define KVM_HALT_POLL_NS_DEFAULT 500000 + +#define KVM_VCPU_MAX_FEATURES 0 + +#define KVM_REQ_SLEEP \ + KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) +#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1) + +struct kvm_vm_stat { + ulong remote_tlb_flush; +}; + +struct kvm_vcpu_stat { + u64 halt_successful_poll; + u64 halt_attempted_poll; + u64 halt_poll_success_ns; + u64 halt_poll_fail_ns; + u64 halt_poll_invalid; + u64 halt_wakeup; + u64 ecall_exit_stat; + u64 wfi_exit_stat; + u64 mmio_exit_user; + u64 mmio_exit_kernel; + u64 exits; +}; + +struct kvm_arch_memory_slot { +}; + +struct kvm_arch { + /* stage2 page table */ + pgd_t *pgd; + phys_addr_t pgd_phys; +}; + +struct kvm_cpu_trap { + unsigned long sepc; + unsigned long scause; + unsigned long stval; + unsigned long htval; + unsigned long htinst; +}; + +struct kvm_vcpu_arch { + /* Don't run the VCPU (blocked) */ + bool pause; + + /* SRCU lock index for in-kernel run loop */ + int srcu_idx; +}; + +static inline void kvm_arch_hardware_unsetup(void) {} +static inline void kvm_arch_sync_events(struct kvm *kvm) {} +static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} +static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} + +void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu); +int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm); +void kvm_riscv_stage2_free_pgd(struct kvm *kvm); +void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu); + +int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, + struct kvm_cpu_trap *trap); + +static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {} + +#endif /* __RISCV_KVM_HOST_H__ */ diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h new file mode 100644 index 000000000000..d15875818b6e --- /dev/null +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. 
+ * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#ifndef __LINUX_KVM_RISCV_H +#define __LINUX_KVM_RISCV_H + +#ifndef __ASSEMBLY__ + +#include <linux/types.h> +#include <asm/ptrace.h> + +#define __KVM_HAVE_READONLY_MEM + +#define KVM_COALESCED_MMIO_PAGE_OFFSET 1 + +/* for KVM_GET_REGS and KVM_SET_REGS */ +struct kvm_regs { +}; + +/* for KVM_GET_FPU and KVM_SET_FPU */ +struct kvm_fpu { +}; + +/* KVM Debug exit structure */ +struct kvm_debug_exit_arch { +}; + +/* for KVM_SET_GUEST_DEBUG */ +struct kvm_guest_debug_arch { +}; + +/* definition of registers in kvm_run */ +struct kvm_sync_regs { +}; + +/* dummy definition */ +struct kvm_sregs { +}; + +#endif + +#endif /* __LINUX_KVM_RISCV_H */ diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig new file mode 100644 index 000000000000..88edd477b3a8 --- /dev/null +++ b/arch/riscv/kvm/Kconfig @@ -0,0 +1,33 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# KVM configuration +# + +source "virt/kvm/Kconfig" + +menuconfig VIRTUALIZATION + bool "Virtualization" + help + Say Y here to get to see options for using your Linux host to run + other operating systems inside virtual machines (guests). + This option alone does not add any kernel code. + + If you say N, all options in this submenu will be skipped and + disabled. + +if VIRTUALIZATION + +config KVM + tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)" + depends on RISCV_SBI && MMU + select PREEMPT_NOTIFIERS + select ANON_INODES + select KVM_MMIO + select HAVE_KVM_VCPU_ASYNC_IOCTL + select SRCU + help + Support hosting virtualized guest machines. + + If unsure, say N. + +endif # VIRTUALIZATION diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile new file mode 100644 index 000000000000..37b5a59d4f4f --- /dev/null +++ b/arch/riscv/kvm/Makefile @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for RISC-V KVM support +# + +common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o) + +ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm + +kvm-objs := $(common-objs-y) + +kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o + +obj-$(CONFIG_KVM) += kvm.o diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c new file mode 100644 index 000000000000..47926f0c175d --- /dev/null +++ b/arch/riscv/kvm/main.c @@ -0,0 +1,95 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. 
+ * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/kvm_host.h> +#include <asm/csr.h> +#include <asm/hwcap.h> +#include <asm/sbi.h> + +long kvm_arch_dev_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + return -EINVAL; +} + +int kvm_arch_check_processor_compat(void *opaque) +{ + return 0; +} + +int kvm_arch_hardware_setup(void *opaque) +{ + return 0; +} + +int kvm_arch_hardware_enable(void) +{ + unsigned long hideleg, hedeleg; + + hedeleg = 0; + hedeleg |= (1UL << EXC_INST_MISALIGNED); + hedeleg |= (1UL << EXC_BREAKPOINT); + hedeleg |= (1UL << EXC_SYSCALL); + hedeleg |= (1UL << EXC_INST_PAGE_FAULT); + hedeleg |= (1UL << EXC_LOAD_PAGE_FAULT); + hedeleg |= (1UL << EXC_STORE_PAGE_FAULT); + csr_write(CSR_HEDELEG, hedeleg); + + hideleg = 0; + hideleg |= (1UL << IRQ_VS_SOFT); + hideleg |= (1UL << IRQ_VS_TIMER); + hideleg |= (1UL << IRQ_VS_EXT); + csr_write(CSR_HIDELEG, hideleg); + + csr_write(CSR_HCOUNTEREN, -1UL); + + csr_write(CSR_HVIP, 0); + + return 0; +} + +void kvm_arch_hardware_disable(void) +{ + csr_write(CSR_HEDELEG, 0); + csr_write(CSR_HIDELEG, 0); +} + +int kvm_arch_init(void *opaque) +{ + if (!riscv_isa_extension_available(NULL, h)) { + kvm_info("hypervisor extension not available\n"); + return -ENODEV; + } + + if (sbi_spec_is_0_1()) { + kvm_info("require SBI v0.2 or higher\n"); + return -ENODEV; + } + + if (sbi_probe_extension(SBI_EXT_RFENCE) <= 0) { + kvm_info("require SBI RFENCE extension\n"); + return -ENODEV; + } + + kvm_info("hypervisor extension available\n"); + + return 0; +} + +void kvm_arch_exit(void) +{ +} + +static int riscv_kvm_init(void) +{ + return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); +} +module_init(riscv_kvm_init); diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c new file mode 100644 index 000000000000..ec13507e8a18 --- /dev/null +++ b/arch/riscv/kvm/mmu.c @@ -0,0 +1,86 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. 
+ * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/bitops.h> +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/hugetlb.h> +#include <linux/module.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/kvm_host.h> +#include <linux/sched/signal.h> +#include <asm/page.h> +#include <asm/pgtable.h> + +void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ +} + +void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free) +{ +} + +int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, + unsigned long npages) +{ + return 0; +} + +void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) +{ +} + +void kvm_arch_flush_shadow_all(struct kvm *kvm) +{ + /* TODO: */ +} + +void kvm_arch_flush_shadow_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot) +{ +} + +void kvm_arch_commit_memory_region(struct kvm *kvm, + const struct kvm_userspace_memory_region *mem, + struct kvm_memory_slot *old, + const struct kvm_memory_slot *new, + enum kvm_mr_change change) +{ + /* TODO: */ +} + +int kvm_arch_prepare_memory_region(struct kvm *kvm, + struct kvm_memory_slot *memslot, + const struct kvm_userspace_memory_region *mem, + enum kvm_mr_change change) +{ + /* TODO: */ + return 0; +} + +void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu) +{ + /* TODO: */ +} + +int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm) +{ + /* TODO: */ + return 0; +} + +void kvm_riscv_stage2_free_pgd(struct kvm *kvm) +{ + /* TODO: */ +} + +void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu) +{ + /* TODO: */ +} diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c new file mode 100644 index 000000000000..8d8d140a0caf --- /dev/null +++ b/arch/riscv/kvm/vcpu.c @@ -0,0 +1,326 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. 
+ * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/bitops.h> +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kdebug.h> +#include <linux/module.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/sched/signal.h> +#include <linux/fs.h> +#include <linux/kvm_host.h> +#include <asm/csr.h> +#include <asm/delay.h> +#include <asm/hwcap.h> + +struct kvm_stats_debugfs_item debugfs_entries[] = { + VCPU_STAT("halt_successful_poll", halt_successful_poll), + VCPU_STAT("halt_attempted_poll", halt_attempted_poll), + VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns), + VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns), + VCPU_STAT("halt_poll_invalid", halt_poll_invalid), + VCPU_STAT("halt_wakeup", halt_wakeup), + VCPU_STAT("ecall_exit_stat", ecall_exit_stat), + VCPU_STAT("wfi_exit_stat", wfi_exit_stat), + VCPU_STAT("mmio_exit_user", mmio_exit_user), + VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel), + VCPU_STAT("exits", exits), + { NULL } +}; + +int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) +{ + return 0; +} + +int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return 0; +} + +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) +{ + return 0; +} + +void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) +{ +} + +int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return 0; +} + +void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) +{ + /* TODO: */ +} + +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return 0; +} + +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) +{ +} + +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) +{ +} + +int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return 0; +} + +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return 0; +} + +bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu) +{ + /* TODO: */ + return false; +} + +bool kvm_arch_has_vcpu_debugfs(void) +{ + return false; +} + +int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu) +{ + return 0; +} + +vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) +{ + return VM_FAULT_SIGBUS; +} + +long kvm_arch_vcpu_async_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + /* TODO; */ + return -ENOIOCTLCMD; +} + +long kvm_arch_vcpu_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + /* TODO: */ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, + struct kvm_translation *tr) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, + struct kvm_mp_state *mp_state) +{ + /* TODO: */ + return 0; +} + +int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, + struct kvm_mp_state *mp_state) +{ + /* TODO: */ + return 0; +} + +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, + struct kvm_guest_debug *dbg) 
+{ + /* TODO; To be implemented later. */ + return -EINVAL; +} + +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + /* TODO: */ + + kvm_riscv_stage2_update_hgatp(vcpu); +} + +void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) +{ + /* TODO: */ +} + +static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu) +{ + /* TODO: */ +} + +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) +{ + int ret; + struct kvm_cpu_trap trap; + struct kvm_run *run = vcpu->run; + + vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); + + /* Process MMIO value returned from user-space */ + if (run->exit_reason == KVM_EXIT_MMIO) { + ret = kvm_riscv_vcpu_mmio_return(vcpu, vcpu->run); + if (ret) { + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + return ret; + } + } + + if (run->immediate_exit) { + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + return -EINTR; + } + + vcpu_load(vcpu); + + kvm_sigset_activate(vcpu); + + ret = 1; + run->exit_reason = KVM_EXIT_UNKNOWN; + while (ret > 0) { + /* Check conditions before entering the guest */ + cond_resched(); + + kvm_riscv_check_vcpu_requests(vcpu); + + preempt_disable(); + + local_irq_disable(); + + /* + * Exit if we have a signal pending so that we can deliver + * the signal to user space. + */ + if (signal_pending(current)) { + ret = -EINTR; + run->exit_reason = KVM_EXIT_INTR; + } + + /* + * Ensure we set mode to IN_GUEST_MODE after we disable + * interrupts and before the final VCPU requests check. + * See the comment in kvm_vcpu_exiting_guest_mode() and + * Documentation/virtual/kvm/vcpu-requests.rst + */ + vcpu->mode = IN_GUEST_MODE; + + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + smp_mb__after_srcu_read_unlock(); + + if (ret <= 0 || + kvm_request_pending(vcpu)) { + vcpu->mode = OUTSIDE_GUEST_MODE; + local_irq_enable(); + preempt_enable(); + vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); + continue; + } + + guest_enter_irqoff(); + + __kvm_riscv_switch_to(&vcpu->arch); + + vcpu->mode = OUTSIDE_GUEST_MODE; + vcpu->stat.exits++; + + /* + * Save SCAUSE, STVAL, HTVAL, and HTINST because we might + * get an interrupt between __kvm_riscv_switch_to() and + * local_irq_enable() which can potentially change CSRs. + */ + trap.sepc = 0; + trap.scause = csr_read(CSR_SCAUSE); + trap.stval = csr_read(CSR_STVAL); + trap.htval = csr_read(CSR_HTVAL); + trap.htinst = csr_read(CSR_HTINST); + + /* + * We may have taken a host interrupt in VS/VU-mode (i.e. + * while executing the guest). This interrupt is still + * pending, as we haven't serviced it yet! + * + * We're now back in HS-mode with interrupts disabled + * so enabling the interrupts now will have the effect + * of taking the interrupt again, in HS-mode this time. + */ + local_irq_enable(); + + /* + * We do local_irq_enable() before calling guest_exit() so + * that if a timer interrupt hits while running the guest + * we account that tick as being spent in the guest. We + * enable preemption after calling guest_exit() so that if + * we get preempted we make sure ticks after that is not + * counted as guest time. 
+ */ + guest_exit(); + + preempt_enable(); + + vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); + + ret = kvm_riscv_vcpu_exit(vcpu, run, &trap); + } + + kvm_sigset_deactivate(vcpu); + + vcpu_put(vcpu); + + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + + return ret; +} diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c new file mode 100644 index 000000000000..4484e9200fe4 --- /dev/null +++ b/arch/riscv/kvm/vcpu_exit.c @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kvm_host.h> + +/** + * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation + * or in-kernel IO emulation + * + * @vcpu: The VCPU pointer + * @run: The VCPU run struct containing the mmio data + */ +int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) +{ + /* TODO: */ + return 0; +} + +/* + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on + * proper exit to userspace. + */ +int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, + struct kvm_cpu_trap *trap) +{ + /* TODO: */ + return 0; +} diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c new file mode 100644 index 000000000000..ac0211820521 --- /dev/null +++ b/arch/riscv/kvm/vm.c @@ -0,0 +1,79 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/uaccess.h> +#include <linux/kvm_host.h> + +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +{ + /* TODO: To be added later. */ + return -ENOTSUPP; +} + +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) +{ + int r; + + r = kvm_riscv_stage2_alloc_pgd(kvm); + if (r) + return r; + + return 0; +} + +void kvm_arch_destroy_vm(struct kvm *kvm) +{ + int i; + + for (i = 0; i < KVM_MAX_VCPUS; ++i) { + if (kvm->vcpus[i]) { + kvm_arch_vcpu_destroy(kvm->vcpus[i]); + kvm->vcpus[i] = NULL; + } + } +} + +int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) +{ + int r; + + switch (ext) { + case KVM_CAP_DEVICE_CTRL: + case KVM_CAP_USER_MEMORY: + case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: + case KVM_CAP_ONE_REG: + case KVM_CAP_READONLY_MEM: + case KVM_CAP_MP_STATE: + case KVM_CAP_IMMEDIATE_EXIT: + r = 1; + break; + case KVM_CAP_NR_VCPUS: + r = num_online_cpus(); + break; + case KVM_CAP_MAX_VCPUS: + r = KVM_MAX_VCPUS; + break; + case KVM_CAP_NR_MEMSLOTS: + r = KVM_USER_MEM_SLOTS; + break; + default: + r = 0; + break; + } + + return r; +} + +long kvm_arch_vm_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + return -EINVAL; +}
From: Anup Patel <anup.patel@wdc.com>
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch implements the VCPU create, init and destroy functions required by the generic KVM module. We don't have many dynamic resources in struct kvm_vcpu_arch, so these functions are quite simple for KVM RISC-V.
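For context, a hedged sketch of the generic user-space flow that exercises these functions (error handling omitted; the fds line up with the arch hooks named above):

  int kvm_fd  = open("/dev/kvm", O_RDWR);
  int vm_fd   = ioctl(kvm_fd, KVM_CREATE_VM, 0);   /* kvm_arch_init_vm() */
  int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);  /* kvm_arch_vcpu_create() */
  /* ... run the guest ...; closing the fds eventually reaches
   * kvm_arch_vcpu_destroy() and kvm_arch_destroy_vm() */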
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/kvm_host.h | 68 +++++++++++++++++++++++++++++++
 arch/riscv/kvm/vcpu.c | 55 +++++++++++++++++++++----
 2 files changed, 114 insertions(+), 9 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index fdd63bafb714..e0aeed7d5144 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -63,7 +63,75 @@ struct kvm_cpu_trap { unsigned long htinst; };
+struct kvm_cpu_context { + unsigned long zero; + unsigned long ra; + unsigned long sp; + unsigned long gp; + unsigned long tp; + unsigned long t0; + unsigned long t1; + unsigned long t2; + unsigned long s0; + unsigned long s1; + unsigned long a0; + unsigned long a1; + unsigned long a2; + unsigned long a3; + unsigned long a4; + unsigned long a5; + unsigned long a6; + unsigned long a7; + unsigned long s2; + unsigned long s3; + unsigned long s4; + unsigned long s5; + unsigned long s6; + unsigned long s7; + unsigned long s8; + unsigned long s9; + unsigned long s10; + unsigned long s11; + unsigned long t3; + unsigned long t4; + unsigned long t5; + unsigned long t6; + unsigned long sepc; + unsigned long sstatus; + unsigned long hstatus; +}; + +struct kvm_vcpu_csr { + unsigned long vsstatus; + unsigned long hie; + unsigned long vstvec; + unsigned long vsscratch; + unsigned long vsepc; + unsigned long vscause; + unsigned long vstval; + unsigned long hvip; + unsigned long vsatp; +}; + struct kvm_vcpu_arch { + /* VCPU ran atleast once */ + bool ran_atleast_once; + + /* ISA feature bits (similar to MISA) */ + unsigned long isa; + + /* CPU context of Guest VCPU */ + struct kvm_cpu_context guest_context; + + /* CPU CSR context of Guest VCPU */ + struct kvm_vcpu_csr guest_csr; + + /* CPU context upon Guest VCPU reset */ + struct kvm_cpu_context guest_reset_context; + + /* CPU CSR context upon Guest VCPU reset */ + struct kvm_vcpu_csr guest_reset_csr; + /* Don't run the VCPU (blocked) */ bool pause;
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 8d8d140a0caf..84deeddbffbe 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -35,6 +35,27 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { NULL } };
+#define KVM_RISCV_ISA_ALLOWED (riscv_isa_extension_mask(a) | \ + riscv_isa_extension_mask(c) | \ + riscv_isa_extension_mask(d) | \ + riscv_isa_extension_mask(f) | \ + riscv_isa_extension_mask(i) | \ + riscv_isa_extension_mask(m) | \ + riscv_isa_extension_mask(s) | \ + riscv_isa_extension_mask(u)) + +static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr; + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context; + + memcpy(csr, reset_csr, sizeof(*csr)); + + memcpy(cntx, reset_cntx, sizeof(*cntx)); +} + int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) { return 0; @@ -42,7 +63,25 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) { - /* TODO: */ + struct kvm_cpu_context *cntx; + + /* Mark this VCPU never ran */ + vcpu->arch.ran_atleast_once = false; + + /* Setup ISA features available to VCPU */ + vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED; + + /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */ + cntx = &vcpu->arch.guest_reset_context; + cntx->sstatus = SR_SPP | SR_SPIE; + cntx->hstatus = 0; + cntx->hstatus |= HSTATUS_VTW; + cntx->hstatus |= HSTATUS_SPVP; + cntx->hstatus |= HSTATUS_SPV; + + /* Reset VCPU */ + kvm_riscv_reset_vcpu(vcpu); + return 0; }
@@ -55,15 +94,10 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) { }
-int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) -{ - /* TODO: */ - return 0; -} - void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) { - /* TODO: */ + /* Flush the pages pre-allocated for Stage2 page table mappings */ + kvm_riscv_stage2_flush_cache(vcpu); }
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) @@ -209,6 +243,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) struct kvm_cpu_trap trap; struct kvm_run *run = vcpu->run;
+ /* Mark this VCPU ran at least once */ + vcpu->arch.ran_atleast_once = true; + vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
/* Process MMIO value returned from user-space */ @@ -282,7 +319,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) * get an interrupt between __kvm_riscv_switch_to() and * local_irq_enable() which can potentially change CSRs. */ - trap.sepc = 0; + trap.sepc = vcpu->arch.guest_context.sepc; trap.scause = csr_read(CSR_SCAUSE); trap.stval = csr_read(CSR_STVAL); trap.htval = csr_read(CSR_HTVAL);
From: Anup Patel <anup.patel@wdc.com>
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch implements VCPU interrupts and VCPU requests, which are both asynchronous events.
The VCPU interrupts can be set/unset using the KVM_INTERRUPT ioctl from user-space. In the future, the in-kernel IRQCHIP emulation will use the kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt() functions to set/unset VCPU interrupts.
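For illustration only, a user-space sketch of the ioctl (assuming vcpu_fd was obtained via KVM_CREATE_VCPU):

  struct kvm_interrupt irq;

  irq.irq = KVM_INTERRUPT_SET;          /* asserts IRQ_VS_EXT in-kernel */
  ioctl(vcpu_fd, KVM_INTERRUPT, &irq);

  irq.irq = KVM_INTERRUPT_UNSET;        /* deasserts it again */
  ioctl(vcpu_fd, KVM_INTERRUPT, &irq);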
Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP      - set whenever the VCPU itself goes to the sleep state
KVM_REQ_VCPU_RESET - set whenever a VCPU reset is requested
The WFI trap-and-emulate (added later) will use the KVM_REQ_SLEEP request and the kvm_riscv_vcpu_has_interrupts() function.
The KVM_REQ_VCPU_RESET request will be used by the SBI emulation (added later) to power up a VCPU that is in the power-off state. User-space can use the KVM_GET_MP_STATE/KVM_SET_MP_STATE ioctls to get/set the power state of a VCPU.
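A minimal sketch of those calls from user-space (standard KVM ioctl names):

  struct kvm_mp_state mp;

  ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp);  /* STOPPED or RUNNABLE */

  mp.mp_state = KVM_MP_STATE_RUNNABLE;    /* power the VCPU back on */
  ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);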
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/kvm_host.h | 23 ++
 arch/riscv/include/uapi/asm/kvm.h | 3 +
 arch/riscv/kvm/vcpu.c | 182 +++++++++++++++++++++++++++---
 3 files changed, 195 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index e0aeed7d5144..7dc52559b4fc 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -132,6 +132,21 @@ struct kvm_vcpu_arch { /* CPU CSR context upon Guest VCPU reset */ struct kvm_vcpu_csr guest_reset_csr;
+ /* + * VCPU interrupts + * + * We have a lockless approach for tracking pending VCPU interrupts + * implemented using atomic bitops. The irqs_pending bitmap represent + * pending interrupts whereas irqs_pending_mask represent bits changed + * in irqs_pending. Our approach is modeled around multiple producer + * and single consumer problem where the consumer is the VCPU itself. + */ + unsigned long irqs_pending; + unsigned long irqs_pending_mask; + + /* VCPU power-off state */ + bool power_off; + /* Don't run the VCPU (blocked) */ bool pause;
@@ -156,4 +171,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); +int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); +void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu); +void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu); +bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask); +void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu); +void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu); + #endif /* __RISCV_KVM_HOST_H__ */ diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index d15875818b6e..6dbc056d58ba 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -18,6 +18,9 @@
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_INTERRUPT_SET -1U +#define KVM_INTERRUPT_UNSET -2U + /* for KVM_GET_REGS and KVM_SET_REGS */ struct kvm_regs { }; diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 84deeddbffbe..7acb2e622597 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -11,6 +11,7 @@ #include <linux/err.h> #include <linux/kdebug.h> #include <linux/module.h> +#include <linux/percpu.h> #include <linux/uaccess.h> #include <linux/vmalloc.h> #include <linux/sched/signal.h> @@ -54,6 +55,9 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu) memcpy(csr, reset_csr, sizeof(*csr));
memcpy(cntx, reset_cntx, sizeof(*cntx)); + + WRITE_ONCE(vcpu->arch.irqs_pending, 0); + WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0); }
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) @@ -102,8 +106,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) { - /* TODO: */ - return 0; + return kvm_riscv_vcpu_has_interrupts(vcpu, 1UL << IRQ_VS_TIMER); }
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) @@ -116,20 +119,18 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { - /* TODO: */ - return 0; + return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) && + !vcpu->arch.power_off && !vcpu->arch.pause); }
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) { - /* TODO: */ - return 0; + return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE; }
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu) { - /* TODO: */ - return false; + return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false; }
bool kvm_arch_has_vcpu_debugfs(void) @@ -150,7 +151,21 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) long kvm_arch_vcpu_async_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - /* TODO; */ + struct kvm_vcpu *vcpu = filp->private_data; + void __user *argp = (void __user *)arg; + + if (ioctl == KVM_INTERRUPT) { + struct kvm_interrupt irq; + + if (copy_from_user(&irq, argp, sizeof(irq))) + return -EFAULT; + + if (irq.irq == KVM_INTERRUPT_SET) + return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_EXT); + else + return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT); + } + return -ENOIOCTLCMD; }
@@ -199,18 +214,121 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) return -EINVAL; }
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + unsigned long mask, val; + + if (READ_ONCE(vcpu->arch.irqs_pending_mask)) { + mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0); + val = READ_ONCE(vcpu->arch.irqs_pending) & mask; + + csr->hvip &= ~mask; + csr->hvip |= val; + } +} + +void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu) +{ + unsigned long hvip; + struct kvm_vcpu_arch *v = &vcpu->arch; + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + + /* Read current HVIP and HIE CSRs */ + hvip = csr_read(CSR_HVIP); + csr->hie = csr_read(CSR_HIE); + + /* Sync-up HVIP.VSSIP bit changes does by Guest */ + if ((csr->hvip ^ hvip) & (1UL << IRQ_VS_SOFT)) { + if (hvip & (1UL << IRQ_VS_SOFT)) { + if (!test_and_set_bit(IRQ_VS_SOFT, + &v->irqs_pending_mask)) + set_bit(IRQ_VS_SOFT, &v->irqs_pending); + } else { + if (!test_and_set_bit(IRQ_VS_SOFT, + &v->irqs_pending_mask)) + clear_bit(IRQ_VS_SOFT, &v->irqs_pending); + } + } +} + +int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq) +{ + if (irq != IRQ_VS_SOFT && + irq != IRQ_VS_TIMER && + irq != IRQ_VS_EXT) + return -EINVAL; + + set_bit(irq, &vcpu->arch.irqs_pending); + smp_mb__before_atomic(); + set_bit(irq, &vcpu->arch.irqs_pending_mask); + + kvm_vcpu_kick(vcpu); + + return 0; +} + +int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq) +{ + if (irq != IRQ_VS_SOFT && + irq != IRQ_VS_TIMER && + irq != IRQ_VS_EXT) + return -EINVAL; + + clear_bit(irq, &vcpu->arch.irqs_pending); + smp_mb__before_atomic(); + set_bit(irq, &vcpu->arch.irqs_pending_mask); + + return 0; +} + +bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask) +{ + return (READ_ONCE(vcpu->arch.irqs_pending) & + vcpu->arch.guest_csr.hie & mask) ? true : false; +} + +void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu) +{ + vcpu->arch.power_off = true; + kvm_make_request(KVM_REQ_SLEEP, vcpu); + kvm_vcpu_kick(vcpu); +} + +void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu) +{ + vcpu->arch.power_off = false; + kvm_vcpu_wake_up(vcpu); +} + int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state) { - /* TODO: */ + if (vcpu->arch.power_off) + mp_state->mp_state = KVM_MP_STATE_STOPPED; + else + mp_state->mp_state = KVM_MP_STATE_RUNNABLE; + return 0; }
int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state) { - /* TODO: */ - return 0; + int ret = 0; + + switch (mp_state->mp_state) { + case KVM_MP_STATE_RUNNABLE: + vcpu->arch.power_off = false; + break; + case KVM_MP_STATE_STOPPED: + kvm_riscv_vcpu_power_off(vcpu); + break; + default: + ret = -EINVAL; + } + + return ret; }
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, @@ -234,7 +352,33 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu) { - /* TODO: */ + struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu); + + if (kvm_request_pending(vcpu)) { + if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) { + rcuwait_wait_event(wait, + (!vcpu->arch.power_off) && (!vcpu->arch.pause), + TASK_INTERRUPTIBLE); + + if (vcpu->arch.power_off || vcpu->arch.pause) { + /* + * Awaken to handle a signal, request to + * sleep again later. + */ + kvm_make_request(KVM_REQ_SLEEP, vcpu); + } + } + + if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu)) + kvm_riscv_reset_vcpu(vcpu); + } +} + +static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + + csr_write(CSR_HVIP, csr->hvip); }
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) @@ -298,6 +442,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); smp_mb__after_srcu_read_unlock();
+ /* + * We might have got VCPU interrupts updated asynchronously + * so update it in HW. + */ + kvm_riscv_vcpu_flush_interrupts(vcpu); + + /* Update HVIP CSR for current CPU */ + kvm_riscv_update_hvip(vcpu); + if (ret <= 0 || kvm_request_pending(vcpu)) { vcpu->mode = OUTSIDE_GUEST_MODE; @@ -325,6 +478,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) trap.htval = csr_read(CSR_HTVAL); trap.htinst = csr_read(CSR_HTINST);
+ /* Syncup interrupts state with HW */ + kvm_riscv_vcpu_sync_interrupts(vcpu); + /* * We may have taken a host interrupt in VS/VU-mode (i.e. * while executing the guest). This interrupt is still
From: Anup Patel <anup.patel@wdc.com>
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
For KVM RISC-V, we use the KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access VCPU config and registers from user-space.
We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR    - these are VCPU control and status registers
The CONFIG register available to user-space is ISA. The ISA register is a read/write register, but user-space can only write the desired VCPU ISA capabilities to it before running the VCPU.
The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7, T0-T6, S0-S11 and MODE. Most of these are RISC-V general-purpose registers, except PC and MODE. The PC register represents the program counter, whereas the MODE register represents the VCPU privilege mode (i.e. S/U-mode).
The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC, SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
In the future, more VCPU register types will be added (such as FP) for the KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
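To illustrate the index encoding, a hedged user-space sketch that reads the guest PC on RV64 (this assumes KVM_REG_RISCV is the arch identifier this series adds to include/uapi/linux/kvm.h):

  unsigned long pc;
  struct kvm_one_reg reg = {
          .id   = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
                  KVM_REG_RISCV_CORE | KVM_REG_RISCV_CORE_REG(regs.pc),
          .addr = (unsigned long)&pc,
  };

  ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);  /* pc now holds the guest SEPC */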
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/uapi/asm/kvm.h | 52 ++++++-
 arch/riscv/kvm/vcpu.c | 246 +++++++++++++++++++++++++++++-
 2 files changed, 294 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index 6dbc056d58ba..3a20327242f1 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -41,10 +41,60 @@ struct kvm_guest_debug_arch { struct kvm_sync_regs { };
-/* dummy definition */ +/* for KVM_GET_SREGS and KVM_SET_SREGS */ struct kvm_sregs { };
+/* CONFIG registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ +struct kvm_riscv_config { + unsigned long isa; +}; + +/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ +struct kvm_riscv_core { + struct user_regs_struct regs; + unsigned long mode; +}; + +/* Possible privilege modes for kvm_riscv_core */ +#define KVM_RISCV_MODE_S 1 +#define KVM_RISCV_MODE_U 0 + +/* CSR registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ +struct kvm_riscv_csr { + unsigned long sstatus; + unsigned long sie; + unsigned long stvec; + unsigned long sscratch; + unsigned long sepc; + unsigned long scause; + unsigned long stval; + unsigned long sip; + unsigned long satp; +}; + +#define KVM_REG_SIZE(id) \ + (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) + +/* If you need to interpret the index values, here is the key: */ +#define KVM_REG_RISCV_TYPE_MASK 0x00000000FF000000 +#define KVM_REG_RISCV_TYPE_SHIFT 24 + +/* Config registers are mapped as type 1 */ +#define KVM_REG_RISCV_CONFIG (0x01 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CONFIG_REG(name) \ + (offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long)) + +/* Core registers are mapped as type 2 */ +#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CORE_REG(name) \ + (offsetof(struct kvm_riscv_core, name) / sizeof(unsigned long)) + +/* Control and status registers are mapped as type 3 */ +#define KVM_REG_RISCV_CSR (0x03 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CSR_REG(name) \ + (offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long)) + #endif
#endif /* __LINUX_KVM_RISCV_H */ diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 7acb2e622597..1a692ac4a6f3 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -18,7 +18,6 @@ #include <linux/fs.h> #include <linux/kvm_host.h> #include <asm/csr.h> -#include <asm/delay.h> #include <asm/hwcap.h>
struct kvm_stats_debugfs_item debugfs_entries[] = { @@ -148,6 +147,225 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) return VM_FAULT_SIGBUS; }
+static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CONFIG); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + switch (reg_num) { + case KVM_REG_RISCV_CONFIG_REG(isa): + reg_val = vcpu->arch.isa; + break; + default: + return -EINVAL; + }; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CONFIG); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + switch (reg_num) { + case KVM_REG_RISCV_CONFIG_REG(isa): + if (!vcpu->arch.ran_atleast_once) { + vcpu->arch.isa = reg_val; + vcpu->arch.isa &= riscv_isa_extension_base(NULL); + vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED; + } else { + return -ENOTSUPP; + } + break; + default: + return -EINVAL; + }; + + return 0; +} + +static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CORE); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long)) + return -EINVAL; + + if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc)) + reg_val = cntx->sepc; + else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num && + reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6)) + reg_val = ((unsigned long *)cntx)[reg_num]; + else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) + reg_val = (cntx->sstatus & SR_SPP) ? 
+ KVM_RISCV_MODE_S : KVM_RISCV_MODE_U; + else + return -EINVAL; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CORE); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc)) + cntx->sepc = reg_val; + else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num && + reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6)) + ((unsigned long *)cntx)[reg_num] = reg_val; + else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) { + if (reg_val == KVM_RISCV_MODE_S) + cntx->sstatus |= SR_SPP; + else + cntx->sstatus &= ~SR_SPP; + } else + return -EINVAL; + + return 0; +} + +static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CSR); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long)) + return -EINVAL; + + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) { + kvm_riscv_vcpu_flush_interrupts(vcpu); + reg_val = csr->hvip >> VSIP_TO_HVIP_SHIFT; + reg_val = reg_val & VSIP_VALID_MASK; + } else if (reg_num == KVM_REG_RISCV_CSR_REG(sie)) { + reg_val = csr->hie >> VSIP_TO_HVIP_SHIFT; + reg_val = reg_val & VSIP_VALID_MASK; + } else + reg_val = ((unsigned long *)csr)[reg_num]; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CSR); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + if (reg_num == KVM_REG_RISCV_CSR_REG(sip) || + reg_num == KVM_REG_RISCV_CSR_REG(sie)) { + reg_val = reg_val << VSIP_TO_HVIP_SHIFT; + reg_val = reg_val & VSIP_VALID_MASK; + } + + ((unsigned long *)csr)[reg_num] = reg_val; + + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) + WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0); + + return 0; +} + +static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG) + return kvm_riscv_vcpu_set_reg_config(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE) + return kvm_riscv_vcpu_set_reg_core(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) + return 
kvm_riscv_vcpu_set_reg_csr(vcpu, reg); + + return -EINVAL; +} + +static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG) + return kvm_riscv_vcpu_get_reg_config(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE) + return kvm_riscv_vcpu_get_reg_core(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) + return kvm_riscv_vcpu_get_reg_csr(vcpu, reg); + + return -EINVAL; +} + long kvm_arch_vcpu_async_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -172,8 +390,30 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp, long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - /* TODO: */ - return -EINVAL; + struct kvm_vcpu *vcpu = filp->private_data; + void __user *argp = (void __user *)arg; + long r = -EINVAL; + + switch (ioctl) { + case KVM_SET_ONE_REG: + case KVM_GET_ONE_REG: { + struct kvm_one_reg reg; + + r = -EFAULT; + if (copy_from_user(®, argp, sizeof(reg))) + break; + + if (ioctl == KVM_SET_ONE_REG) + r = kvm_riscv_vcpu_set_reg(vcpu, ®); + else + r = kvm_riscv_vcpu_get_reg(vcpu, ®); + break; + } + default: + break; + } + + return r; }
int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
From: Anup Patel <anup.patel@wdc.com>
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch implements the VCPU world-switch for KVM RISC-V.
The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly switches the general-purpose registers and the SSTATUS, STVEC, SSCRATCH and HSTATUS CSRs. Other CSRs are switched via the vcpu_load() and vcpu_put() interface, in the kvm_arch_vcpu_load() and kvm_arch_vcpu_put() functions respectively.
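Background for the asm-offsets.c hunk below: the OFFSET() entries turn struct kvm_vcpu_arch member offsets into assembler-visible constants (KVM_ARCH_GUEST_*, KVM_ARCH_HOST_*) that vcpu_switch.S uses as load/store displacements. A simplified sketch of the kbuild mechanism, approximating include/linux/kbuild.h:

  /* Emits a marker line that the build's sed script turns into
   * "#define KVM_ARCH_GUEST_RA <offset>" in asm-offsets.h. */
  #define DEFINE(sym, val) \
          asm volatile("\n.ascii \"->" #sym " %0 " #val "\"" : : "i" (val))
  #define OFFSET(sym, str, mem) \
          DEFINE(sym, offsetof(struct str, mem))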
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mingwang Li <limingwang@huawei.com>
Reviewed-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/riscv/include/asm/kvm_host.h | 9 +-
 arch/riscv/kernel/asm-offsets.c | 76 ++++++++++
 arch/riscv/kvm/Makefile | 2 +-
 arch/riscv/kvm/vcpu.c | 30 ++++-
 arch/riscv/kvm/vcpu_switch.S | 194 ++++++++++++++++++++++++++++++
 5 files changed, 307 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kvm/vcpu_switch.S
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 7dc52559b4fc..3a8f7b5eafd6 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -120,6 +120,13 @@ struct kvm_vcpu_arch { /* ISA feature bits (similar to MISA) */ unsigned long isa;
+ /* SSCRATCH and STVEC of Host */ + unsigned long host_sscratch; + unsigned long host_stvec; + + /* CPU context of Host */ + struct kvm_cpu_context host_context; + /* CPU context of Guest VCPU */ struct kvm_cpu_context guest_context;
@@ -169,7 +176,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_cpu_trap *trap);
-static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {} +void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 07cb9c10de4e..3a230882b91e 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -7,7 +7,9 @@ #define GENERATING_ASM_OFFSETS
#include <linux/kbuild.h> +#include <linux/mm.h> #include <linux/sched.h> +#include <asm/kvm_host.h> #include <asm/thread_info.h> #include <asm/ptrace.h>
@@ -109,6 +111,80 @@ void asm_offsets(void) OFFSET(PT_BADADDR, pt_regs, badaddr); OFFSET(PT_CAUSE, pt_regs, cause);
+ OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero); + OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra); + OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp); + OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp); + OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp); + OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0); + OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1); + OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2); + OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0); + OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1); + OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0); + OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1); + OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2); + OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3); + OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4); + OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5); + OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6); + OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7); + OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2); + OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3); + OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4); + OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5); + OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6); + OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7); + OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8); + OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9); + OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10); + OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11); + OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3); + OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4); + OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5); + OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6); + OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc); + OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus); + OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus); + + OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch, host_context.zero); + OFFSET(KVM_ARCH_HOST_RA, kvm_vcpu_arch, host_context.ra); + OFFSET(KVM_ARCH_HOST_SP, kvm_vcpu_arch, host_context.sp); + OFFSET(KVM_ARCH_HOST_GP, kvm_vcpu_arch, host_context.gp); + OFFSET(KVM_ARCH_HOST_TP, kvm_vcpu_arch, host_context.tp); + OFFSET(KVM_ARCH_HOST_T0, kvm_vcpu_arch, host_context.t0); + OFFSET(KVM_ARCH_HOST_T1, kvm_vcpu_arch, host_context.t1); + OFFSET(KVM_ARCH_HOST_T2, kvm_vcpu_arch, host_context.t2); + OFFSET(KVM_ARCH_HOST_S0, kvm_vcpu_arch, host_context.s0); + OFFSET(KVM_ARCH_HOST_S1, kvm_vcpu_arch, host_context.s1); + OFFSET(KVM_ARCH_HOST_A0, kvm_vcpu_arch, host_context.a0); + OFFSET(KVM_ARCH_HOST_A1, kvm_vcpu_arch, host_context.a1); + OFFSET(KVM_ARCH_HOST_A2, kvm_vcpu_arch, host_context.a2); + OFFSET(KVM_ARCH_HOST_A3, kvm_vcpu_arch, host_context.a3); + OFFSET(KVM_ARCH_HOST_A4, kvm_vcpu_arch, host_context.a4); + OFFSET(KVM_ARCH_HOST_A5, kvm_vcpu_arch, host_context.a5); + OFFSET(KVM_ARCH_HOST_A6, kvm_vcpu_arch, host_context.a6); + OFFSET(KVM_ARCH_HOST_A7, kvm_vcpu_arch, host_context.a7); + OFFSET(KVM_ARCH_HOST_S2, kvm_vcpu_arch, host_context.s2); + OFFSET(KVM_ARCH_HOST_S3, kvm_vcpu_arch, host_context.s3); + OFFSET(KVM_ARCH_HOST_S4, kvm_vcpu_arch, host_context.s4); + OFFSET(KVM_ARCH_HOST_S5, kvm_vcpu_arch, host_context.s5); + OFFSET(KVM_ARCH_HOST_S6, 
kvm_vcpu_arch, host_context.s6); + OFFSET(KVM_ARCH_HOST_S7, kvm_vcpu_arch, host_context.s7); + OFFSET(KVM_ARCH_HOST_S8, kvm_vcpu_arch, host_context.s8); + OFFSET(KVM_ARCH_HOST_S9, kvm_vcpu_arch, host_context.s9); + OFFSET(KVM_ARCH_HOST_S10, kvm_vcpu_arch, host_context.s10); + OFFSET(KVM_ARCH_HOST_S11, kvm_vcpu_arch, host_context.s11); + OFFSET(KVM_ARCH_HOST_T3, kvm_vcpu_arch, host_context.t3); + OFFSET(KVM_ARCH_HOST_T4, kvm_vcpu_arch, host_context.t4); + OFFSET(KVM_ARCH_HOST_T5, kvm_vcpu_arch, host_context.t5); + OFFSET(KVM_ARCH_HOST_T6, kvm_vcpu_arch, host_context.t6); + OFFSET(KVM_ARCH_HOST_SEPC, kvm_vcpu_arch, host_context.sepc); + OFFSET(KVM_ARCH_HOST_SSTATUS, kvm_vcpu_arch, host_context.sstatus); + OFFSET(KVM_ARCH_HOST_HSTATUS, kvm_vcpu_arch, host_context.hstatus); + OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch); + OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec); + /* * THREAD_{F,X}* might be larger than a S-type offset can handle, but * these are used in performance-sensitive assembly so we can't resort diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile index 37b5a59d4f4f..845579273727 100644 --- a/arch/riscv/kvm/Makefile +++ b/arch/riscv/kvm/Makefile @@ -8,6 +8,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o +kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
obj-$(CONFIG_KVM) += kvm.o diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 1a692ac4a6f3..faa6f9673691 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -580,14 +580,40 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { - /* TODO: */ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + + csr_write(CSR_VSSTATUS, csr->vsstatus); + csr_write(CSR_HIE, csr->hie); + csr_write(CSR_VSTVEC, csr->vstvec); + csr_write(CSR_VSSCRATCH, csr->vsscratch); + csr_write(CSR_VSEPC, csr->vsepc); + csr_write(CSR_VSCAUSE, csr->vscause); + csr_write(CSR_VSTVAL, csr->vstval); + csr_write(CSR_HVIP, csr->hvip); + csr_write(CSR_VSATP, csr->vsatp);
kvm_riscv_stage2_update_hgatp(vcpu); + + vcpu->cpu = cpu; }
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) { - /* TODO: */ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + + vcpu->cpu = -1; + + csr_write(CSR_HGATP, 0); + + csr->vsstatus = csr_read(CSR_VSSTATUS); + csr->hie = csr_read(CSR_HIE); + csr->vstvec = csr_read(CSR_VSTVEC); + csr->vsscratch = csr_read(CSR_VSSCRATCH); + csr->vsepc = csr_read(CSR_VSEPC); + csr->vscause = csr_read(CSR_VSCAUSE); + csr->vstval = csr_read(CSR_VSTVAL); + csr->hvip = csr_read(CSR_HVIP); + csr->vsatp = csr_read(CSR_VSATP); }
static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu) diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S new file mode 100644 index 000000000000..e1a17df1b379 --- /dev/null +++ b/arch/riscv/kvm/vcpu_switch.S @@ -0,0 +1,194 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/linkage.h> +#include <asm/asm.h> +#include <asm/asm-offsets.h> +#include <asm/csr.h> + + .text + .altmacro + .option norelax + +ENTRY(__kvm_riscv_switch_to) + /* Save Host GPRs (except A0 and T0-T6) */ + REG_S ra, (KVM_ARCH_HOST_RA)(a0) + REG_S sp, (KVM_ARCH_HOST_SP)(a0) + REG_S gp, (KVM_ARCH_HOST_GP)(a0) + REG_S tp, (KVM_ARCH_HOST_TP)(a0) + REG_S s0, (KVM_ARCH_HOST_S0)(a0) + REG_S s1, (KVM_ARCH_HOST_S1)(a0) + REG_S a1, (KVM_ARCH_HOST_A1)(a0) + REG_S a2, (KVM_ARCH_HOST_A2)(a0) + REG_S a3, (KVM_ARCH_HOST_A3)(a0) + REG_S a4, (KVM_ARCH_HOST_A4)(a0) + REG_S a5, (KVM_ARCH_HOST_A5)(a0) + REG_S a6, (KVM_ARCH_HOST_A6)(a0) + REG_S a7, (KVM_ARCH_HOST_A7)(a0) + REG_S s2, (KVM_ARCH_HOST_S2)(a0) + REG_S s3, (KVM_ARCH_HOST_S3)(a0) + REG_S s4, (KVM_ARCH_HOST_S4)(a0) + REG_S s5, (KVM_ARCH_HOST_S5)(a0) + REG_S s6, (KVM_ARCH_HOST_S6)(a0) + REG_S s7, (KVM_ARCH_HOST_S7)(a0) + REG_S s8, (KVM_ARCH_HOST_S8)(a0) + REG_S s9, (KVM_ARCH_HOST_S9)(a0) + REG_S s10, (KVM_ARCH_HOST_S10)(a0) + REG_S s11, (KVM_ARCH_HOST_S11)(a0) + + /* Save Host SSTATUS, HSTATUS, SCRATCH and STVEC */ + csrr t0, CSR_SSTATUS + REG_S t0, (KVM_ARCH_HOST_SSTATUS)(a0) + csrr t1, CSR_HSTATUS + REG_S t1, (KVM_ARCH_HOST_HSTATUS)(a0) + csrr t2, CSR_SSCRATCH + REG_S t2, (KVM_ARCH_HOST_SSCRATCH)(a0) + csrr t3, CSR_STVEC + REG_S t3, (KVM_ARCH_HOST_STVEC)(a0) + + /* Change Host exception vector to return path */ + la t4, __kvm_switch_return + csrw CSR_STVEC, t4 + + /* Restore Guest HSTATUS, SSTATUS and SEPC */ + REG_L t4, (KVM_ARCH_GUEST_SEPC)(a0) + csrw CSR_SEPC, t4 + REG_L t5, (KVM_ARCH_GUEST_SSTATUS)(a0) + csrw CSR_SSTATUS, t5 + REG_L t6, (KVM_ARCH_GUEST_HSTATUS)(a0) + csrw CSR_HSTATUS, t6 + + /* Restore Guest GPRs (except A0) */ + REG_L ra, (KVM_ARCH_GUEST_RA)(a0) + REG_L sp, (KVM_ARCH_GUEST_SP)(a0) + REG_L gp, (KVM_ARCH_GUEST_GP)(a0) + REG_L tp, (KVM_ARCH_GUEST_TP)(a0) + REG_L t0, (KVM_ARCH_GUEST_T0)(a0) + REG_L t1, (KVM_ARCH_GUEST_T1)(a0) + REG_L t2, (KVM_ARCH_GUEST_T2)(a0) + REG_L s0, (KVM_ARCH_GUEST_S0)(a0) + REG_L s1, (KVM_ARCH_GUEST_S1)(a0) + REG_L a1, (KVM_ARCH_GUEST_A1)(a0) + REG_L a2, (KVM_ARCH_GUEST_A2)(a0) + REG_L a3, (KVM_ARCH_GUEST_A3)(a0) + REG_L a4, (KVM_ARCH_GUEST_A4)(a0) + REG_L a5, (KVM_ARCH_GUEST_A5)(a0) + REG_L a6, (KVM_ARCH_GUEST_A6)(a0) + REG_L a7, (KVM_ARCH_GUEST_A7)(a0) + REG_L s2, (KVM_ARCH_GUEST_S2)(a0) + REG_L s3, (KVM_ARCH_GUEST_S3)(a0) + REG_L s4, (KVM_ARCH_GUEST_S4)(a0) + REG_L s5, (KVM_ARCH_GUEST_S5)(a0) + REG_L s6, (KVM_ARCH_GUEST_S6)(a0) + REG_L s7, (KVM_ARCH_GUEST_S7)(a0) + REG_L s8, (KVM_ARCH_GUEST_S8)(a0) + REG_L s9, (KVM_ARCH_GUEST_S9)(a0) + REG_L s10, (KVM_ARCH_GUEST_S10)(a0) + REG_L s11, (KVM_ARCH_GUEST_S11)(a0) + REG_L t3, (KVM_ARCH_GUEST_T3)(a0) + REG_L t4, (KVM_ARCH_GUEST_T4)(a0) + REG_L t5, (KVM_ARCH_GUEST_T5)(a0) + REG_L t6, (KVM_ARCH_GUEST_T6)(a0) + + /* Save Host A0 in SSCRATCH */ + csrw CSR_SSCRATCH, a0 + + /* Restore Guest A0 */ + REG_L a0, (KVM_ARCH_GUEST_A0)(a0) + + /* Resume Guest */ + sret + + /* Back to Host */ + .align 2 +__kvm_switch_return: + /* Swap Guest A0 with SSCRATCH */ + csrrw a0, CSR_SSCRATCH, a0 + + /* Save Guest GPRs (except A0) */ + REG_S ra, 
(KVM_ARCH_GUEST_RA)(a0) + REG_S sp, (KVM_ARCH_GUEST_SP)(a0) + REG_S gp, (KVM_ARCH_GUEST_GP)(a0) + REG_S tp, (KVM_ARCH_GUEST_TP)(a0) + REG_S t0, (KVM_ARCH_GUEST_T0)(a0) + REG_S t1, (KVM_ARCH_GUEST_T1)(a0) + REG_S t2, (KVM_ARCH_GUEST_T2)(a0) + REG_S s0, (KVM_ARCH_GUEST_S0)(a0) + REG_S s1, (KVM_ARCH_GUEST_S1)(a0) + REG_S a1, (KVM_ARCH_GUEST_A1)(a0) + REG_S a2, (KVM_ARCH_GUEST_A2)(a0) + REG_S a3, (KVM_ARCH_GUEST_A3)(a0) + REG_S a4, (KVM_ARCH_GUEST_A4)(a0) + REG_S a5, (KVM_ARCH_GUEST_A5)(a0) + REG_S a6, (KVM_ARCH_GUEST_A6)(a0) + REG_S a7, (KVM_ARCH_GUEST_A7)(a0) + REG_S s2, (KVM_ARCH_GUEST_S2)(a0) + REG_S s3, (KVM_ARCH_GUEST_S3)(a0) + REG_S s4, (KVM_ARCH_GUEST_S4)(a0) + REG_S s5, (KVM_ARCH_GUEST_S5)(a0) + REG_S s6, (KVM_ARCH_GUEST_S6)(a0) + REG_S s7, (KVM_ARCH_GUEST_S7)(a0) + REG_S s8, (KVM_ARCH_GUEST_S8)(a0) + REG_S s9, (KVM_ARCH_GUEST_S9)(a0) + REG_S s10, (KVM_ARCH_GUEST_S10)(a0) + REG_S s11, (KVM_ARCH_GUEST_S11)(a0) + REG_S t3, (KVM_ARCH_GUEST_T3)(a0) + REG_S t4, (KVM_ARCH_GUEST_T4)(a0) + REG_S t5, (KVM_ARCH_GUEST_T5)(a0) + REG_S t6, (KVM_ARCH_GUEST_T6)(a0) + + /* Save Guest A0 */ + csrr t0, CSR_SSCRATCH + REG_S t0, (KVM_ARCH_GUEST_A0)(a0) + + /* Save Guest HSTATUS, SSTATUS, and SEPC */ + csrr t0, CSR_SEPC + REG_S t0, (KVM_ARCH_GUEST_SEPC)(a0) + csrr t1, CSR_SSTATUS + REG_S t1, (KVM_ARCH_GUEST_SSTATUS)(a0) + csrr t2, CSR_HSTATUS + REG_S t2, (KVM_ARCH_GUEST_HSTATUS)(a0) + + /* Restore Host SSTATUS, HSTATUS, SCRATCH and STVEC */ + REG_L t3, (KVM_ARCH_HOST_SSTATUS)(a0) + csrw CSR_SSTATUS, t3 + REG_L t4, (KVM_ARCH_HOST_HSTATUS)(a0) + csrw CSR_HSTATUS, t4 + REG_L t5, (KVM_ARCH_HOST_SSCRATCH)(a0) + csrw CSR_SSCRATCH, t5 + REG_L t6, (KVM_ARCH_HOST_STVEC)(a0) + csrw CSR_STVEC, t6 + + /* Restore Host GPRs (except A0 and T0-T6) */ + REG_L ra, (KVM_ARCH_HOST_RA)(a0) + REG_L sp, (KVM_ARCH_HOST_SP)(a0) + REG_L gp, (KVM_ARCH_HOST_GP)(a0) + REG_L tp, (KVM_ARCH_HOST_TP)(a0) + REG_L s0, (KVM_ARCH_HOST_S0)(a0) + REG_L s1, (KVM_ARCH_HOST_S1)(a0) + REG_L a1, (KVM_ARCH_HOST_A1)(a0) + REG_L a2, (KVM_ARCH_HOST_A2)(a0) + REG_L a3, (KVM_ARCH_HOST_A3)(a0) + REG_L a4, (KVM_ARCH_HOST_A4)(a0) + REG_L a5, (KVM_ARCH_HOST_A5)(a0) + REG_L a6, (KVM_ARCH_HOST_A6)(a0) + REG_L a7, (KVM_ARCH_HOST_A7)(a0) + REG_L s2, (KVM_ARCH_HOST_S2)(a0) + REG_L s3, (KVM_ARCH_HOST_S3)(a0) + REG_L s4, (KVM_ARCH_HOST_S4)(a0) + REG_L s5, (KVM_ARCH_HOST_S5)(a0) + REG_L s6, (KVM_ARCH_HOST_S6)(a0) + REG_L s7, (KVM_ARCH_HOST_S7)(a0) + REG_L s8, (KVM_ARCH_HOST_S8)(a0) + REG_L s9, (KVM_ARCH_HOST_S9)(a0) + REG_L s10, (KVM_ARCH_HOST_S10)(a0) + REG_L s11, (KVM_ARCH_HOST_S11)(a0) + + /* Return to C code */ + ret +ENDPROC(__kvm_riscv_switch_to)
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
We will get stage2 page faults whenever a Guest/VM accesses a SW-emulated MMIO device or unmapped Guest RAM.
This patch implements MMIO read/write emulation by extracting the MMIO details from the trapped load/store instruction and forwarding the MMIO read/write to user-space. The actual MMIO emulation happens in user-space, and the KVM kernel module only takes care of register updates before resuming the trapped VCPU.
The handling of stage2 page faults for unmapped Guest RAM will be implemented by a separate patch later.
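For context, the user-space side of such an MMIO exit is a plain KVM_RUN loop over the mmap()ed kvm_run area. The sketch below is illustrative only and not part of this patch; handle_mmio_read() and handle_mmio_write() are hypothetical stand-ins for the VMM's device model:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical device-model callbacks, not part of this series. */
void handle_mmio_read(__u64 addr, void *data, __u32 len);
void handle_mmio_write(__u64 addr, void *data, __u32 len);

/* run points at the mmap()ed kvm_run area of vcpu_fd. */
static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
	while (ioctl(vcpu_fd, KVM_RUN, 0) >= 0) {
		if (run->exit_reason != KVM_EXIT_MMIO)
			continue;
		if (run->mmio.is_write)
			/* Data was already copied out of the trapped
			 * guest register by emulate_store(). */
			handle_mmio_write(run->mmio.phys_addr,
					  run->mmio.data, run->mmio.len);
		else
			/* KVM copies this into the trapped rd register
			 * via kvm_riscv_vcpu_mmio_return() on re-entry. */
			handle_mmio_read(run->mmio.phys_addr,
					 run->mmio.data, run->mmio.len);
	}
}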
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Alexander Graf graf@amazon.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/include/asm/kvm_host.h | 21 ++
 arch/riscv/kernel/asm-offsets.c | 6 +
 arch/riscv/kvm/mmu.c | 8 +
 arch/riscv/kvm/vcpu_exit.c | 561 +++++++++++++++++++++++++++++-
 arch/riscv/kvm/vcpu_switch.S | 23 ++
 5 files changed, 616 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 3a8f7b5eafd6..b57a7a861cc1 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -55,6 +55,13 @@ struct kvm_arch { phys_addr_t pgd_phys; };
+struct kvm_mmio_decode { + unsigned long insn; + int len; + int shift; + int return_handled; +}; + struct kvm_cpu_trap { unsigned long sepc; unsigned long scause; @@ -151,6 +158,9 @@ struct kvm_vcpu_arch { unsigned long irqs_pending; unsigned long irqs_pending_mask;
+ /* MMIO instruction details */ + struct kvm_mmio_decode mmio_decode; + /* VCPU power-off state */ bool power_off;
@@ -167,11 +177,22 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, + gpa_t gpa, unsigned long hva, bool is_write); void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu); int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm); void kvm_riscv_stage2_free_pgd(struct kvm *kvm); void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+void __kvm_riscv_unpriv_trap(void); + +unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu, + bool read_insn, + unsigned long guest_addr, + struct kvm_cpu_trap *trap); +void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu, + struct kvm_cpu_trap *trap); int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run); int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_cpu_trap *trap); diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 3a230882b91e..f7e43fe55335 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -185,6 +185,12 @@ void asm_offsets(void) OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch); OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
+ OFFSET(KVM_ARCH_TRAP_SEPC, kvm_cpu_trap, sepc); + OFFSET(KVM_ARCH_TRAP_SCAUSE, kvm_cpu_trap, scause); + OFFSET(KVM_ARCH_TRAP_STVAL, kvm_cpu_trap, stval); + OFFSET(KVM_ARCH_TRAP_HTVAL, kvm_cpu_trap, htval); + OFFSET(KVM_ARCH_TRAP_HTINST, kvm_cpu_trap, htinst); + /* * THREAD_{F,X}* might be larger than a S-type offset can handle, but * these are used in performance-sensitive assembly so we can't resort diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index ec13507e8a18..8fb356e68cc5 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -64,6 +64,14 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, return 0; }
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, + gpa_t gpa, unsigned long hva, bool is_write) +{ + /* TODO: */ + return 0; +} + void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu) { /* TODO: */ diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c index 4484e9200fe4..7fd603acd97a 100644 --- a/arch/riscv/kvm/vcpu_exit.c +++ b/arch/riscv/kvm/vcpu_exit.c @@ -6,9 +6,487 @@ * Anup Patel anup.patel@wdc.com */
+#include <linux/bitops.h> #include <linux/errno.h> #include <linux/err.h> #include <linux/kvm_host.h> +#include <asm/csr.h> + +#define INSN_MATCH_LB 0x3 +#define INSN_MASK_LB 0x707f +#define INSN_MATCH_LH 0x1003 +#define INSN_MASK_LH 0x707f +#define INSN_MATCH_LW 0x2003 +#define INSN_MASK_LW 0x707f +#define INSN_MATCH_LD 0x3003 +#define INSN_MASK_LD 0x707f +#define INSN_MATCH_LBU 0x4003 +#define INSN_MASK_LBU 0x707f +#define INSN_MATCH_LHU 0x5003 +#define INSN_MASK_LHU 0x707f +#define INSN_MATCH_LWU 0x6003 +#define INSN_MASK_LWU 0x707f +#define INSN_MATCH_SB 0x23 +#define INSN_MASK_SB 0x707f +#define INSN_MATCH_SH 0x1023 +#define INSN_MASK_SH 0x707f +#define INSN_MATCH_SW 0x2023 +#define INSN_MASK_SW 0x707f +#define INSN_MATCH_SD 0x3023 +#define INSN_MASK_SD 0x707f + +#define INSN_MATCH_C_LD 0x6000 +#define INSN_MASK_C_LD 0xe003 +#define INSN_MATCH_C_SD 0xe000 +#define INSN_MASK_C_SD 0xe003 +#define INSN_MATCH_C_LW 0x4000 +#define INSN_MASK_C_LW 0xe003 +#define INSN_MATCH_C_SW 0xc000 +#define INSN_MASK_C_SW 0xe003 +#define INSN_MATCH_C_LDSP 0x6002 +#define INSN_MASK_C_LDSP 0xe003 +#define INSN_MATCH_C_SDSP 0xe002 +#define INSN_MASK_C_SDSP 0xe003 +#define INSN_MATCH_C_LWSP 0x4002 +#define INSN_MASK_C_LWSP 0xe003 +#define INSN_MATCH_C_SWSP 0xc002 +#define INSN_MASK_C_SWSP 0xe003 + +#define INSN_16BIT_MASK 0x3 + +#define INSN_IS_16BIT(insn) (((insn) & INSN_16BIT_MASK) != INSN_16BIT_MASK) + +#define INSN_LEN(insn) (INSN_IS_16BIT(insn) ? 2 : 4) + +#ifdef CONFIG_64BIT +#define LOG_REGBYTES 3 +#else +#define LOG_REGBYTES 2 +#endif +#define REGBYTES (1 << LOG_REGBYTES) + +#define SH_RD 7 +#define SH_RS1 15 +#define SH_RS2 20 +#define SH_RS2C 2 + +#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1)) +#define RVC_LW_IMM(x) ((RV_X(x, 6, 1) << 2) | \ + (RV_X(x, 10, 3) << 3) | \ + (RV_X(x, 5, 1) << 6)) +#define RVC_LD_IMM(x) ((RV_X(x, 10, 3) << 3) | \ + (RV_X(x, 5, 2) << 6)) +#define RVC_LWSP_IMM(x) ((RV_X(x, 4, 3) << 2) | \ + (RV_X(x, 12, 1) << 5) | \ + (RV_X(x, 2, 2) << 6)) +#define RVC_LDSP_IMM(x) ((RV_X(x, 5, 2) << 3) | \ + (RV_X(x, 12, 1) << 5) | \ + (RV_X(x, 2, 3) << 6)) +#define RVC_SWSP_IMM(x) ((RV_X(x, 9, 4) << 2) | \ + (RV_X(x, 7, 2) << 6)) +#define RVC_SDSP_IMM(x) ((RV_X(x, 10, 3) << 3) | \ + (RV_X(x, 7, 3) << 6)) +#define RVC_RS1S(insn) (8 + RV_X(insn, SH_RD, 3)) +#define RVC_RS2S(insn) (8 + RV_X(insn, SH_RS2C, 3)) +#define RVC_RS2(insn) RV_X(insn, SH_RS2C, 5) + +#define SHIFT_RIGHT(x, y) \ + ((y) < 0 ? 
((x) << -(y)) : ((x) >> (y))) + +#define REG_MASK \ + ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES)) + +#define REG_OFFSET(insn, pos) \ + (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK) + +#define REG_PTR(insn, pos, regs) \ + (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos)) + +#define GET_RM(insn) (((insn) >> 12) & 7) + +#define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs)) +#define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs)) +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs)) +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs)) +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs)) +#define GET_SP(regs) (*REG_PTR(2, 0, regs)) +#define SET_RD(insn, regs, val) (*REG_PTR(insn, SH_RD, regs) = (val)) +#define IMM_I(insn) ((s32)(insn) >> 20) +#define IMM_S(insn) (((s32)(insn) >> 25 << 5) | \ + (s32)(((insn) >> 7) & 0x1f)) +#define MASK_FUNCT3 0x7000 + +static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run, + unsigned long fault_addr, unsigned long htinst) +{ + unsigned long insn; + int shift = 0, len = 0; + struct kvm_cpu_trap utrap = { 0 }; + struct kvm_cpu_context *ct = &vcpu->arch.guest_context; + + /* Determine trapped instruction */ + if (htinst & 0x1) { + /* + * Bit[0] == 1 implies trapped instruction value is + * transformed instruction or custom instruction. + */ + insn = htinst | INSN_16BIT_MASK; + } else { + /* + * Bit[0] == 0 implies trapped instruction value is + * zero or special value. + */ + insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc, + &utrap); + if (utrap.scause) { + /* Redirect trap if we failed to read instruction */ + utrap.sepc = ct->sepc; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + return 1; + } + } + + /* Decode length of MMIO and shift */ + if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) { + len = 4; + shift = 8 * (sizeof(ulong) - len); + } else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) { + len = 1; + shift = 8 * (sizeof(ulong) - len); + } else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) { + len = 1; + shift = 8 * (sizeof(ulong) - len); +#ifdef CONFIG_64BIT + } else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) { + len = 8; + shift = 8 * (sizeof(ulong) - len); + } else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) { + len = 4; +#endif + } else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) { + len = 2; + shift = 8 * (sizeof(ulong) - len); + } else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) { + len = 2; +#ifdef CONFIG_64BIT + } else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) { + len = 8; + shift = 8 * (sizeof(ulong) - len); + insn = RVC_RS2S(insn) << SH_RD; + } else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP && + ((insn >> SH_RD) & 0x1f)) { + len = 8; + shift = 8 * (sizeof(ulong) - len); +#endif + } else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) { + len = 4; + shift = 8 * (sizeof(ulong) - len); + insn = RVC_RS2S(insn) << SH_RD; + } else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP && + ((insn >> SH_RD) & 0x1f)) { + len = 4; + shift = 8 * (sizeof(ulong) - len); + } else { + return -ENOTSUPP; + } + + /* Fault address should be aligned to length of MMIO */ + if (fault_addr & (len - 1)) + return -EIO; + + /* Save instruction decode info */ + vcpu->arch.mmio_decode.insn = insn; + vcpu->arch.mmio_decode.shift = shift; + vcpu->arch.mmio_decode.len = len; + vcpu->arch.mmio_decode.return_handled = 0; + + /* Exit to userspace for MMIO emulation */ + vcpu->stat.mmio_exit_user++; + run->exit_reason = KVM_EXIT_MMIO; + run->mmio.is_write = false; + run->mmio.phys_addr = 
fault_addr; + run->mmio.len = len; + + return 0; +} + +static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run, + unsigned long fault_addr, unsigned long htinst) +{ + u8 data8; + u16 data16; + u32 data32; + u64 data64; + ulong data; + int len = 0; + unsigned long insn; + struct kvm_cpu_trap utrap = { 0 }; + struct kvm_cpu_context *ct = &vcpu->arch.guest_context; + + /* Determine trapped instruction */ + if (htinst & 0x1) { + /* + * Bit[0] == 1 implies trapped instruction value is + * transformed instruction or custom instruction. + */ + insn = htinst | INSN_16BIT_MASK; + } else { + /* + * Bit[0] == 0 implies trapped instruction value is + * zero or special value. + */ + insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc, + &utrap); + if (utrap.scause) { + /* Redirect trap if we failed to read instruction */ + utrap.sepc = ct->sepc; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + return 1; + } + } + + data = GET_RS2(insn, &vcpu->arch.guest_context); + data8 = data16 = data32 = data64 = data; + + if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) { + len = 4; + } else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) { + len = 1; +#ifdef CONFIG_64BIT + } else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) { + len = 8; +#endif + } else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) { + len = 2; +#ifdef CONFIG_64BIT + } else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) { + len = 8; + data64 = GET_RS2S(insn, &vcpu->arch.guest_context); + } else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP && + ((insn >> SH_RD) & 0x1f)) { + len = 8; + data64 = GET_RS2C(insn, &vcpu->arch.guest_context); +#endif + } else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) { + len = 4; + data32 = GET_RS2S(insn, &vcpu->arch.guest_context); + } else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP && + ((insn >> SH_RD) & 0x1f)) { + len = 4; + data32 = GET_RS2C(insn, &vcpu->arch.guest_context); + } else { + return -ENOTSUPP; + } + + /* Fault address should be aligned to length of MMIO */ + if (fault_addr & (len - 1)) + return -EIO; + + /* Save instruction decode info */ + vcpu->arch.mmio_decode.insn = insn; + vcpu->arch.mmio_decode.shift = 0; + vcpu->arch.mmio_decode.len = len; + vcpu->arch.mmio_decode.return_handled = 0; + + /* Copy data to kvm_run instance */ + switch (len) { + case 1: + *((u8 *)run->mmio.data) = data8; + break; + case 2: + *((u16 *)run->mmio.data) = data16; + break; + case 4: + *((u32 *)run->mmio.data) = data32; + break; + case 8: + *((u64 *)run->mmio.data) = data64; + break; + default: + return -ENOTSUPP; + }; + + /* Exit to userspace for MMIO emulation */ + vcpu->stat.mmio_exit_user++; + run->exit_reason = KVM_EXIT_MMIO; + run->mmio.is_write = true; + run->mmio.phys_addr = fault_addr; + run->mmio.len = len; + + return 0; +} + +static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run, + struct kvm_cpu_trap *trap) +{ + struct kvm_memory_slot *memslot; + unsigned long hva, fault_addr; + bool writable; + gfn_t gfn; + int ret; + + fault_addr = (trap->htval << 2) | (trap->stval & 0x3); + gfn = fault_addr >> PAGE_SHIFT; + memslot = gfn_to_memslot(vcpu->kvm, gfn); + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable); + + if (kvm_is_error_hva(hva) || + (trap->scause == EXC_STORE_GUEST_PAGE_FAULT && !writable)) { + switch (trap->scause) { + case EXC_LOAD_GUEST_PAGE_FAULT: + return emulate_load(vcpu, run, fault_addr, + trap->htinst); + case EXC_STORE_GUEST_PAGE_FAULT: + return emulate_store(vcpu, run, fault_addr, + trap->htinst); + default: + return -ENOTSUPP; + }; + } + + ret = 
kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva, + (trap->scause == EXC_STORE_GUEST_PAGE_FAULT) ? true : false); + if (ret < 0) + return ret; + + return 1; +} + +/** + * kvm_riscv_vcpu_unpriv_read -- Read machine word from Guest memory + * + * @vcpu: The VCPU pointer + * @read_insn: Flag representing whether we are reading instruction + * @guest_addr: Guest address to read + * @trap: Output pointer to trap details + */ +unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu, + bool read_insn, + unsigned long guest_addr, + struct kvm_cpu_trap *trap) +{ + register unsigned long taddr asm("a0") = (unsigned long)trap; + register unsigned long ttmp asm("a1"); + register unsigned long val asm("t0"); + register unsigned long tmp asm("t1"); + register unsigned long addr asm("t2") = guest_addr; + unsigned long flags; + unsigned long old_stvec; + + local_irq_save(flags); + + old_stvec = csr_swap(CSR_STVEC, (ulong)&__kvm_riscv_unpriv_trap); + + if (read_insn) { + /* + * HLVX.HU instruction + * 0110010 00011 rs1 100 rd 1110011 + */ + asm volatile ("\n" + ".option push\n" + ".option norvc\n" + "add %[ttmp], %[taddr], 0\n" + /* + * HLVX.HU %[val], (%[addr]) + * HLVX.HU t0, (t2) + * 0110010 00011 00111 100 00101 1110011 + */ + ".word 0x6433c2f3\n" + "andi %[tmp], %[val], 3\n" + "addi %[tmp], %[tmp], -3\n" + "bne %[tmp], zero, 2f\n" + "addi %[addr], %[addr], 2\n" + /* + * HLVX.HU %[tmp], (%[addr]) + * HLVX.HU t1, (t2) + * 0110010 00011 00111 100 00110 1110011 + */ + ".word 0x6433c373\n" + "sll %[tmp], %[tmp], 16\n" + "add %[val], %[val], %[tmp]\n" + "2:\n" + ".option pop" + : [val] "=&r" (val), [tmp] "=&r" (tmp), + [taddr] "+&r" (taddr), [ttmp] "+&r" (ttmp), + [addr] "+&r" (addr) : : "memory"); + + if (trap->scause == EXC_LOAD_PAGE_FAULT) + trap->scause = EXC_INST_PAGE_FAULT; + } else { + /* + * HLV.D instruction + * 0110110 00000 rs1 100 rd 1110011 + * + * HLV.W instruction + * 0110100 00000 rs1 100 rd 1110011 + */ + asm volatile ("\n" + ".option push\n" + ".option norvc\n" + "add %[ttmp], %[taddr], 0\n" +#ifdef CONFIG_64BIT + /* + * HLV.D %[val], (%[addr]) + * HLV.D t0, (t2) + * 0110110 00000 00111 100 00101 1110011 + */ + ".word 0x6c03c2f3\n" +#else + /* + * HLV.W %[val], (%[addr]) + * HLV.W t0, (t2) + * 0110100 00000 00111 100 00101 1110011 + */ + ".word 0x6803c2f3\n" +#endif + ".option pop" + : [val] "=&r" (val), + [taddr] "+&r" (taddr), [ttmp] "+&r" (ttmp) + : [addr] "r" (addr) : "memory"); + } + + csr_write(CSR_STVEC, old_stvec); + + local_irq_restore(flags); + + return val; +} + +/** + * kvm_riscv_vcpu_trap_redirect -- Redirect trap to Guest + * + * @vcpu: The VCPU pointer + * @trap: Trap details + */ +void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu, + struct kvm_cpu_trap *trap) +{ + unsigned long vsstatus = csr_read(CSR_VSSTATUS); + + /* Change Guest SSTATUS.SPP bit */ + vsstatus &= ~SR_SPP; + if (vcpu->arch.guest_context.sstatus & SR_SPP) + vsstatus |= SR_SPP; + + /* Change Guest SSTATUS.SPIE bit */ + vsstatus &= ~SR_SPIE; + if (vsstatus & SR_SIE) + vsstatus |= SR_SPIE; + + /* Clear Guest SSTATUS.SIE bit */ + vsstatus &= ~SR_SIE; + + /* Update Guest SSTATUS */ + csr_write(CSR_VSSTATUS, vsstatus); + + /* Update Guest SCAUSE, STVAL, and SEPC */ + csr_write(CSR_VSCAUSE, trap->scause); + csr_write(CSR_VSTVAL, trap->stval); + csr_write(CSR_VSEPC, trap->sepc); + + /* Set Guest PC to Guest exception vector */ + vcpu->arch.guest_context.sepc = csr_read(CSR_VSTVEC); +}
/** * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation @@ -19,7 +497,54 @@ */ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) { - /* TODO: */ + u8 data8; + u16 data16; + u32 data32; + u64 data64; + ulong insn; + int len, shift; + + if (vcpu->arch.mmio_decode.return_handled) + return 0; + + vcpu->arch.mmio_decode.return_handled = 1; + insn = vcpu->arch.mmio_decode.insn; + + if (run->mmio.is_write) + goto done; + + len = vcpu->arch.mmio_decode.len; + shift = vcpu->arch.mmio_decode.shift; + + switch (len) { + case 1: + data8 = *((u8 *)run->mmio.data); + SET_RD(insn, &vcpu->arch.guest_context, + (ulong)data8 << shift >> shift); + break; + case 2: + data16 = *((u16 *)run->mmio.data); + SET_RD(insn, &vcpu->arch.guest_context, + (ulong)data16 << shift >> shift); + break; + case 4: + data32 = *((u32 *)run->mmio.data); + SET_RD(insn, &vcpu->arch.guest_context, + (ulong)data32 << shift >> shift); + break; + case 8: + data64 = *((u64 *)run->mmio.data); + SET_RD(insn, &vcpu->arch.guest_context, + (ulong)data64 << shift >> shift); + break; + default: + return -ENOTSUPP; + }; + +done: + /* Move to next instruction */ + vcpu->arch.guest_context.sepc += INSN_LEN(insn); + return 0; }
@@ -30,6 +555,36 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_cpu_trap *trap) { - /* TODO: */ - return 0; + int ret; + + /* If we got a host interrupt then do nothing */ + if (trap->scause & CAUSE_IRQ_FLAG) + return 1; + + /* Handle guest traps */ + ret = -EFAULT; + run->exit_reason = KVM_EXIT_UNKNOWN; + switch (trap->scause) { + case EXC_INST_GUEST_PAGE_FAULT: + case EXC_LOAD_GUEST_PAGE_FAULT: + case EXC_STORE_GUEST_PAGE_FAULT: + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) + ret = stage2_page_fault(vcpu, run, trap); + break; + default: + break; + }; + + /* Print details in case of error */ + if (ret < 0) { + kvm_err("VCPU exit error %d\n", ret); + kvm_err("SEPC=0x%lx SSTATUS=0x%lx HSTATUS=0x%lx\n", + vcpu->arch.guest_context.sepc, + vcpu->arch.guest_context.sstatus, + vcpu->arch.guest_context.hstatus); + kvm_err("SCAUSE=0x%lx STVAL=0x%lx HTVAL=0x%lx HTINST=0x%lx\n", + trap->scause, trap->stval, trap->htval, trap->htinst); + } + + return ret; } diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S index e1a17df1b379..ee2a2326e281 100644 --- a/arch/riscv/kvm/vcpu_switch.S +++ b/arch/riscv/kvm/vcpu_switch.S @@ -192,3 +192,26 @@ __kvm_switch_return: /* Return to C code */ ret ENDPROC(__kvm_riscv_switch_to) + +ENTRY(__kvm_riscv_unpriv_trap) + /* + * We assume that the faulting unpriv load/store instruction + * is 4-byte long and blindly increment SEPC by 4. + * + * The trap details will be saved at the address pointed to by + * the 'A0' register and we use the 'A1' register as a temporary. + */ + csrr a1, CSR_SEPC + REG_S a1, (KVM_ARCH_TRAP_SEPC)(a0) + addi a1, a1, 4 + csrw CSR_SEPC, a1 + csrr a1, CSR_SCAUSE + REG_S a1, (KVM_ARCH_TRAP_SCAUSE)(a0) + csrr a1, CSR_STVAL + REG_S a1, (KVM_ARCH_TRAP_STVAL)(a0) + csrr a1, CSR_HTVAL + REG_S a1, (KVM_ARCH_TRAP_HTVAL)(a0) + csrr a1, CSR_HTINST + REG_S a1, (KVM_ARCH_TRAP_HTINST)(a0) + sret +ENDPROC(__kvm_riscv_unpriv_trap)
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
We get a virtual instruction trap whenever a Guest/VM executes the WFI instruction.
This patch handles the WFI trap by blocking the trapped VCPU using the kvm_vcpu_block() API. The blocked VCPU is automatically resumed whenever a VCPU interrupt is injected from user-space or by the in-kernel IRQCHIP emulation.
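For context, the wake-up typically comes from user-space raising the VCPU's external interrupt line with the KVM_INTERRUPT ioctl wired up earlier in this series; the in-kernel injection path then kicks the VCPU out of kvm_vcpu_block(). A minimal, illustrative sketch, assuming vcpu_fd is an open VCPU fd and KVM_INTERRUPT_SET behaves as in this series' uapi:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Raise the VCPU's external interrupt line; KVM marks the interrupt
 * pending and wakes the VCPU if it is parked in kvm_vcpu_block()
 * after a Guest WFI. KVM_INTERRUPT_UNSET lowers the line again. */
static int inject_ext_irq(int vcpu_fd)
{
	struct kvm_interrupt irq = { .irq = KVM_INTERRUPT_SET };

	return ioctl(vcpu_fd, KVM_INTERRUPT, &irq);
}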
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/kvm/vcpu_exit.c | 76 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c index 7fd603acd97a..6ef833e29e71 100644 --- a/arch/riscv/kvm/vcpu_exit.c +++ b/arch/riscv/kvm/vcpu_exit.c @@ -12,6 +12,13 @@ #include <linux/kvm_host.h> #include <asm/csr.h>
+#define INSN_OPCODE_MASK 0x007c +#define INSN_OPCODE_SHIFT 2 +#define INSN_OPCODE_SYSTEM 28 + +#define INSN_MASK_WFI 0xffffff00 +#define INSN_MATCH_WFI 0x10500000 + #define INSN_MATCH_LB 0x3 #define INSN_MASK_LB 0x707f #define INSN_MATCH_LH 0x1003 @@ -116,6 +123,71 @@ (s32)(((insn) >> 7) & 0x1f)) #define MASK_FUNCT3 0x7000
+static int truly_illegal_insn(struct kvm_vcpu *vcpu, + struct kvm_run *run, + ulong insn) +{ + struct kvm_cpu_trap utrap = { 0 }; + + /* Redirect trap to Guest VCPU */ + utrap.sepc = vcpu->arch.guest_context.sepc; + utrap.scause = EXC_INST_ILLEGAL; + utrap.stval = insn; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + + return 1; +} + +static int system_opcode_insn(struct kvm_vcpu *vcpu, + struct kvm_run *run, + ulong insn) +{ + if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) { + vcpu->stat.wfi_exit_stat++; + if (!kvm_arch_vcpu_runnable(vcpu)) { + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + kvm_vcpu_block(vcpu); + vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); + kvm_clear_request(KVM_REQ_UNHALT, vcpu); + } + vcpu->arch.guest_context.sepc += INSN_LEN(insn); + return 1; + } + + return truly_illegal_insn(vcpu, run, insn); +} + +static int virtual_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run, + struct kvm_cpu_trap *trap) +{ + unsigned long insn = trap->stval; + struct kvm_cpu_trap utrap = { 0 }; + struct kvm_cpu_context *ct; + + if (unlikely(INSN_IS_16BIT(insn))) { + if (insn == 0) { + ct = &vcpu->arch.guest_context; + insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, + ct->sepc, + &utrap); + if (utrap.scause) { + utrap.sepc = ct->sepc; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + return 1; + } + } + if (INSN_IS_16BIT(insn)) + return truly_illegal_insn(vcpu, run, insn); + } + + switch ((insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT) { + case INSN_OPCODE_SYSTEM: + return system_opcode_insn(vcpu, run, insn); + default: + return truly_illegal_insn(vcpu, run, insn); + } +} + static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run, unsigned long fault_addr, unsigned long htinst) { @@ -565,6 +637,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, ret = -EFAULT; run->exit_reason = KVM_EXIT_UNKNOWN; switch (trap->scause) { + case EXC_VIRTUAL_INST_FAULT: + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) + ret = virtual_inst_fault(vcpu, run, trap); + break; case EXC_INST_GUEST_PAGE_FAULT: case EXC_LOAD_GUEST_PAGE_FAULT: case EXC_STORE_GUEST_PAGE_FAULT:
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
We implement a simple VMID allocator for Guests/VMs which:
1. Detects the number of VMID bits at boot-time
2. Uses an atomic number to track the VMID version and increments the VMID version whenever we run out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run out of VMIDs
4. Force-updates the HW stage2 VMID for each Guest VCPU whenever its VMID changes, using the VCPU request KVM_REQ_UPDATE_HGATP
A simplified model of this scheme is sketched below.
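Illustrative only, not part of this patch: a single-threaded model of the versioned allocator, assuming 7 VMID bits (so 127 usable VMIDs per generation; VMID 0 is never handed out). The real vmid.c below guards this with vmid_lock, uses READ_ONCE()/WRITE_ONCE(), and issues a remote HFENCE.GVMA on roll-over.

#include <stdbool.h>

#define VMID_BITS 7UL	/* assumed for this example */

static unsigned long vmid_version = 1;
static unsigned long vmid_next;

struct vm { unsigned long vmid; unsigned long vmid_version; };

static bool vmid_stale(const struct vm *vm)
{
	return vm->vmid_version != vmid_version;
}

static void vmid_update(struct vm *vm)
{
	if (!vmid_stale(vm))
		return;
	if (vmid_next == 0) {
		/* Ran out of VMIDs: start a new generation, which
		 * invalidates every VM's VMID; all Guest TLBs must
		 * be flushed at this point. */
		vmid_version++;
		vmid_next = 1;
	}
	vm->vmid = vmid_next++;
	vmid_next &= (1UL << VMID_BITS) - 1;
	vm->vmid_version = vmid_version;
}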
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Alexander Graf graf@amazon.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/include/asm/kvm_host.h | 24 ++++
 arch/riscv/kvm/Makefile | 3 +-
 arch/riscv/kvm/main.c | 4 +
 arch/riscv/kvm/tlb.S | 74 ++++++++++++++++++
 arch/riscv/kvm/vcpu.c | 9 +++
 arch/riscv/kvm/vm.c | 6 ++
 arch/riscv/kvm/vmid.c | 120 ++++++++++++++++++++++++++++++
 7 files changed, 239 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vmid.c
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index b57a7a861cc1..981f3cade325 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -27,6 +27,7 @@ #define KVM_REQ_SLEEP \ KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1) +#define KVM_REQ_UPDATE_HGATP KVM_ARCH_REQ(2)
struct kvm_vm_stat { ulong remote_tlb_flush; @@ -49,7 +50,19 @@ struct kvm_vcpu_stat { struct kvm_arch_memory_slot { };
+struct kvm_vmid { + /* + * Writes to vmid_version and vmid happen with vmid_lock held + * whereas reads happen without any lock held. + */ + unsigned long vmid_version; + unsigned long vmid; +}; + struct kvm_arch { + /* stage2 vmid */ + struct kvm_vmid vmid; + /* stage2 page table */ pgd_t *pgd; phys_addr_t pgd_phys; @@ -177,6 +190,11 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa, unsigned long vmid); +void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid); +void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa); +void __kvm_riscv_hfence_gvma_all(void); + int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, gpa_t gpa, unsigned long hva, bool is_write); @@ -185,6 +203,12 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm); void kvm_riscv_stage2_free_pgd(struct kvm *kvm); void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+void kvm_riscv_stage2_vmid_detect(void); +unsigned long kvm_riscv_stage2_vmid_bits(void); +int kvm_riscv_stage2_vmid_init(struct kvm *kvm); +bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid); +void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu); + void __kvm_riscv_unpriv_trap(void);
unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu, diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile index 845579273727..c0f57f26c13d 100644 --- a/arch/riscv/kvm/Makefile +++ b/arch/riscv/kvm/Makefile @@ -8,6 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o +kvm-objs += main.o vm.o vmid.o tlb.o mmu.o +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
obj-$(CONFIG_KVM) += kvm.o diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 47926f0c175d..6f213bcec0e8 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -79,8 +79,12 @@ int kvm_arch_init(void *opaque) return -ENODEV; }
+ kvm_riscv_stage2_vmid_detect(); + kvm_info("hypervisor extension available\n");
+ kvm_info("host has %ld VMID bits\n", kvm_riscv_stage2_vmid_bits()); + return 0; }
diff --git a/arch/riscv/kvm/tlb.S b/arch/riscv/kvm/tlb.S new file mode 100644 index 000000000000..c858570f0856 --- /dev/null +++ b/arch/riscv/kvm/tlb.S @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/linkage.h> +#include <asm/asm.h> + + .text + .altmacro + .option norelax + + /* + * Instruction encoding of hfence.gvma is: + * HFENCE.GVMA rs1, rs2 + * HFENCE.GVMA zero, rs2 + * HFENCE.GVMA rs1 + * HFENCE.GVMA + * + * rs1!=zero and rs2!=zero ==> HFENCE.GVMA rs1, rs2 + * rs1==zero and rs2!=zero ==> HFENCE.GVMA zero, rs2 + * rs1!=zero and rs2==zero ==> HFENCE.GVMA rs1 + * rs1==zero and rs2==zero ==> HFENCE.GVMA + * + * Instruction encoding of HFENCE.GVMA is: + * 0110001 rs2(5) rs1(5) 000 00000 1110011 + */ + +ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa) + /* + * rs1 = a0 (GPA) + * rs2 = a1 (VMID) + * HFENCE.GVMA a0, a1 + * 0110001 01011 01010 000 00000 1110011 + */ + .word 0x62b50073 + ret +ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa) + +ENTRY(__kvm_riscv_hfence_gvma_vmid) + /* + * rs1 = zero + * rs2 = a0 (VMID) + * HFENCE.GVMA zero, a0 + * 0110001 01010 00000 000 00000 1110011 + */ + .word 0x62a00073 + ret +ENDPROC(__kvm_riscv_hfence_gvma_vmid) + +ENTRY(__kvm_riscv_hfence_gvma_gpa) + /* + * rs1 = a0 (GPA) + * rs2 = zero + * HFENCE.GVMA a0 + * 0110001 00000 01010 000 00000 1110011 + */ + .word 0x62050073 + ret +ENDPROC(__kvm_riscv_hfence_gvma_gpa) + +ENTRY(__kvm_riscv_hfence_gvma_all) + /* + * rs1 = zero + * rs2 = zero + * HFENCE.GVMA + * 0110001 00000 00000 000 00000 1110011 + */ + .word 0x62000073 + ret +ENDPROC(__kvm_riscv_hfence_gvma_all) diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index faa6f9673691..82228e8b6cab 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -637,6 +637,12 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu)) kvm_riscv_reset_vcpu(vcpu); + + if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu)) + kvm_riscv_stage2_update_hgatp(vcpu); + + if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) + __kvm_riscv_hfence_gvma_all(); } }
@@ -682,6 +688,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) /* Check conditions before entering the guest */ cond_resched();
+ kvm_riscv_stage2_vmid_update(vcpu); + kvm_riscv_check_vcpu_requests(vcpu);
preempt_disable(); @@ -718,6 +726,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) kvm_riscv_update_hvip(vcpu);
if (ret <= 0 || + kvm_riscv_stage2_vmid_ver_changed(&vcpu->kvm->arch.vmid) || kvm_request_pending(vcpu)) { vcpu->mode = OUTSIDE_GUEST_MODE; local_irq_enable(); diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index ac0211820521..c5aab5478c38 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -26,6 +26,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) if (r) return r;
+ r = kvm_riscv_stage2_vmid_init(kvm); + if (r) { + kvm_riscv_stage2_free_pgd(kvm); + return r; + } + return 0; }
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c new file mode 100644 index 000000000000..2c6253b293bc --- /dev/null +++ b/arch/riscv/kvm/vmid.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Anup Patel anup.patel@wdc.com + */ + +#include <linux/bitops.h> +#include <linux/cpumask.h> +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/kvm_host.h> +#include <asm/csr.h> +#include <asm/sbi.h> + +static unsigned long vmid_version = 1; +static unsigned long vmid_next; +static unsigned long vmid_bits; +static DEFINE_SPINLOCK(vmid_lock); + +void kvm_riscv_stage2_vmid_detect(void) +{ + unsigned long old; + + /* Figure out the number of VMID bits in HW */ + old = csr_read(CSR_HGATP); + csr_write(CSR_HGATP, old | HGATP_VMID_MASK); + vmid_bits = csr_read(CSR_HGATP); + vmid_bits = (vmid_bits & HGATP_VMID_MASK) >> HGATP_VMID_SHIFT; + vmid_bits = fls_long(vmid_bits); + csr_write(CSR_HGATP, old); + + /* We polluted the local TLB so flush all Guest TLBs */ + __kvm_riscv_hfence_gvma_all(); + + /* We don't use VMID bits if they are not sufficient */ + if ((1UL << vmid_bits) < num_possible_cpus()) + vmid_bits = 0; +} + +unsigned long kvm_riscv_stage2_vmid_bits(void) +{ + return vmid_bits; +} + +int kvm_riscv_stage2_vmid_init(struct kvm *kvm) +{ + /* Mark the initial VMID and VMID version invalid */ + kvm->arch.vmid.vmid_version = 0; + kvm->arch.vmid.vmid = 0; + + return 0; +} + +bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid) +{ + if (!vmid_bits) + return false; + + return unlikely(READ_ONCE(vmid->vmid_version) != + READ_ONCE(vmid_version)); +} + +void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu) +{ + int i; + struct kvm_vcpu *v; + struct cpumask hmask; + struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid; + + if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) + return; + + spin_lock(&vmid_lock); + + /* + * Re-check the VMID version under the lock in case another + * VCPU already allocated a valid VMID for this VM. + */ + if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) { + spin_unlock(&vmid_lock); + return; + } + + /* First user of a new VMID version? */ + if (unlikely(vmid_next == 0)) { + WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1); + vmid_next = 1; + + /* + * We ran out of VMIDs so we increment vmid_version and + * start assigning VMIDs from 1. + * + * This also means the existing VMID assignments to all Guest + * instances are invalid and we have to force VMID + * re-assignment for all Guest instances. Guest instances that + * were not running will automatically pick up new VMIDs + * because they call kvm_riscv_stage2_vmid_update() whenever + * they enter the in-kernel run loop. For Guest instances that + * are already running, we force VM exits on all host CPUs + * using IPI and flush all Guest TLBs. + */ + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); + sbi_remote_hfence_gvma(cpumask_bits(&hmask), 0, 0); + } + + vmid->vmid = vmid_next; + vmid_next++; + vmid_next &= (1 << vmid_bits) - 1; + + WRITE_ONCE(vmid->vmid_version, READ_ONCE(vmid_version)); + + spin_unlock(&vmid_lock); + + /* Request stage2 page table update for all VCPUs */ + kvm_for_each_vcpu(i, v, vcpu->kvm) + kvm_make_request(KVM_REQ_UPDATE_HGATP, v); +}
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch implements all required functions for programming the stage2 page table for each Guest/VM.
At a high level, the flow of the stage2-related functions is similar to the KVM ARM/ARM64 implementation, but the stage2 page table format is quite different for KVM RISC-V.
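For reference, here is a worked example of the GPA index split that stage2_pte_index() in the hunk below implements, assuming the RV64 layout (three levels, 9 index bits per level on top of the 12-bit page offset). This is a standalone illustration, not kernel code:

#include <stdio.h>

#define PAGE_SHIFT	12
#define INDEX_BITS	9
#define PTRS_PER_PTE	(1UL << INDEX_BITS)

static unsigned long pte_index(unsigned long gpa, unsigned int level)
{
	return (gpa >> (PAGE_SHIFT + INDEX_BITS * level)) &
	       (PTRS_PER_PTE - 1);
}

int main(void)
{
	unsigned long gpa = 0x80200000UL;

	/* Walk order in stage2_set_pte() is level 2 down to level 0. */
	printf("L2=%lu L1=%lu L0=%lu\n",
	       pte_index(gpa, 2), pte_index(gpa, 1), pte_index(gpa, 0));
	/* Prints: L2=2 L1=1 L0=0 for this GPA */
	return 0;
}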
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/include/asm/kvm_host.h | 10 +
 arch/riscv/include/asm/pgtable-bits.h | 1 +
 arch/riscv/kvm/mmu.c | 574 +++++++++++++++++++++++++-
 3 files changed, 575 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 981f3cade325..9e754c0848b3 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -75,6 +75,13 @@ struct kvm_mmio_decode { int return_handled; };
+#define KVM_MMU_PAGE_CACHE_NR_OBJS 32 + +struct kvm_mmu_page_cache { + int nobjs; + void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS]; +}; + struct kvm_cpu_trap { unsigned long sepc; unsigned long scause; @@ -174,6 +181,9 @@ struct kvm_vcpu_arch { /* MMIO instruction details */ struct kvm_mmio_decode mmio_decode;
+ /* Cache pages needed to program page tables with spinlock held */ + struct kvm_mmu_page_cache mmu_page_cache; + /* VCPU power-off state */ bool power_off;
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h index bbaeb5d35842..be49d62fcc2b 100644 --- a/arch/riscv/include/asm/pgtable-bits.h +++ b/arch/riscv/include/asm/pgtable-bits.h @@ -26,6 +26,7 @@
#define _PAGE_SPECIAL _PAGE_SOFT #define _PAGE_TABLE _PAGE_PRESENT +#define _PAGE_LEAF (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
/* * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 8fb356e68cc5..3d13e15e7555 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -17,6 +17,357 @@ #include <linux/sched/signal.h> #include <asm/page.h> #include <asm/pgtable.h> +#include <asm/sbi.h> + +#ifdef CONFIG_64BIT +#define stage2_have_pmd true +#define stage2_gpa_size ((gpa_t)(1ULL << 39)) +#define stage2_pgd_levels 3 +#define stage2_index_bits 9 +#else +#define stage2_have_pmd false +#define stage2_gpa_size ((gpa_t)(1ULL << 32)) +#define stage2_pgd_levels 2 +#define stage2_index_bits 10 +#endif + +#define stage2_pte_index(addr, level) \ +(((addr) >> (PAGE_SHIFT + stage2_index_bits * (level))) & (PTRS_PER_PTE - 1)) + +static inline unsigned long stage2_pte_page_vaddr(pte_t pte) +{ + return (unsigned long)pfn_to_virt(pte_val(pte) >> _PAGE_PFN_SHIFT); +} + +static int stage2_page_size_to_level(unsigned long page_size, u32 *out_level) +{ + if (page_size == PAGE_SIZE) + *out_level = 0; + else if (page_size == PMD_SIZE) + *out_level = 1; + else if (page_size == PGDIR_SIZE) + *out_level = (stage2_have_pmd) ? 2 : 1; + else + return -EINVAL; + + return 0; +} + +static int stage2_level_to_page_size(u32 level, unsigned long *out_pgsize) +{ + switch (level) { + case 0: + *out_pgsize = PAGE_SIZE; + break; + case 1: + *out_pgsize = (stage2_have_pmd) ? PMD_SIZE : PGDIR_SIZE; + break; + case 2: + *out_pgsize = PGDIR_SIZE; + break; + default: + return -EINVAL; + } + + return 0; +} + +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache, + int min, int max) +{ + void *page; + + BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS); + if (pcache->nobjs >= min) + return 0; + while (pcache->nobjs < max) { + page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return -ENOMEM; + pcache->objects[pcache->nobjs++] = page; + } + + return 0; +} + +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache) +{ + while (pcache && pcache->nobjs) + free_page((unsigned long)pcache->objects[--pcache->nobjs]); +} + +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache) +{ + void *p; + + if (!pcache) + return NULL; + + BUG_ON(!pcache->nobjs); + p = pcache->objects[--pcache->nobjs]; + + return p; +} + +static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr, + pte_t **ptepp, u32 *ptep_level) +{ + pte_t *ptep; + u32 current_level = stage2_pgd_levels - 1; + + *ptep_level = current_level; + ptep = (pte_t *)kvm->arch.pgd; + ptep = &ptep[stage2_pte_index(addr, current_level)]; + while (ptep && pte_val(*ptep)) { + if (pte_val(*ptep) & _PAGE_LEAF) { + *ptep_level = current_level; + *ptepp = ptep; + return true; + } + + if (current_level) { + current_level--; + *ptep_level = current_level; + ptep = (pte_t *)stage2_pte_page_vaddr(*ptep); + ptep = &ptep[stage2_pte_index(addr, current_level)]; + } else { + ptep = NULL; + } + } + + return false; +} + +static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr) +{ + struct cpumask hmask; + unsigned long size = PAGE_SIZE; + struct kvm_vmid *vmid = &kvm->arch.vmid; + + if (stage2_level_to_page_size(level, &size)) + return; + addr &= ~(size - 1); + + /* + * TODO: Instead of cpu_online_mask, we should only target CPUs + * where the Guest/VM is running. 
+ */ + preempt_disable(); + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); + sbi_remote_hfence_gvma_vmid(cpumask_bits(&hmask), addr, size, + READ_ONCE(vmid->vmid)); + preempt_enable(); +} + +static int stage2_set_pte(struct kvm *kvm, u32 level, + struct kvm_mmu_page_cache *pcache, + gpa_t addr, const pte_t *new_pte) +{ + u32 current_level = stage2_pgd_levels - 1; + pte_t *next_ptep = (pte_t *)kvm->arch.pgd; + pte_t *ptep = &next_ptep[stage2_pte_index(addr, current_level)]; + + if (current_level < level) + return -EINVAL; + + while (current_level != level) { + if (pte_val(*ptep) & _PAGE_LEAF) + return -EEXIST; + + if (!pte_val(*ptep)) { + next_ptep = stage2_cache_alloc(pcache); + if (!next_ptep) + return -ENOMEM; + *ptep = pfn_pte(PFN_DOWN(__pa(next_ptep)), + __pgprot(_PAGE_TABLE)); + } else { + if (pte_val(*ptep) & _PAGE_LEAF) + return -EEXIST; + next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep); + } + + current_level--; + ptep = &next_ptep[stage2_pte_index(addr, current_level)]; + } + + *ptep = *new_pte; + if (pte_val(*ptep) & _PAGE_LEAF) + stage2_remote_tlb_flush(kvm, current_level, addr); + + return 0; +} + +static int stage2_map_page(struct kvm *kvm, + struct kvm_mmu_page_cache *pcache, + gpa_t gpa, phys_addr_t hpa, + unsigned long page_size, pgprot_t prot) +{ + int ret; + u32 level = 0; + pte_t new_pte; + + ret = stage2_page_size_to_level(page_size, &level); + if (ret) + return ret; + + new_pte = pfn_pte(PFN_DOWN(hpa), prot); + return stage2_set_pte(kvm, level, pcache, gpa, &new_pte); +} + +enum stage2_op { + STAGE2_OP_NOP = 0, /* Nothing */ + STAGE2_OP_CLEAR, /* Clear/Unmap */ + STAGE2_OP_WP, /* Write-protect */ +}; + +static void stage2_op_pte(struct kvm *kvm, gpa_t addr, + pte_t *ptep, u32 ptep_level, enum stage2_op op) +{ + int i, ret; + pte_t *next_ptep; + u32 next_ptep_level; + unsigned long next_page_size, page_size; + + ret = stage2_level_to_page_size(ptep_level, &page_size); + if (ret) + return; + + BUG_ON(addr & (page_size - 1)); + + if (!pte_val(*ptep)) + return; + + if (ptep_level && !(pte_val(*ptep) & _PAGE_LEAF)) { + next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep); + next_ptep_level = ptep_level - 1; + ret = stage2_level_to_page_size(next_ptep_level, + &next_page_size); + if (ret) + return; + + if (op == STAGE2_OP_CLEAR) + set_pte(ptep, __pte(0)); + for (i = 0; i < PTRS_PER_PTE; i++) + stage2_op_pte(kvm, addr + i * next_page_size, + &next_ptep[i], next_ptep_level, op); + if (op == STAGE2_OP_CLEAR) + put_page(virt_to_page(next_ptep)); + } else { + if (op == STAGE2_OP_CLEAR) + set_pte(ptep, __pte(0)); + else if (op == STAGE2_OP_WP) + set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE)); + stage2_remote_tlb_flush(kvm, ptep_level, addr); + } +} + +static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size) +{ + int ret; + pte_t *ptep; + u32 ptep_level; + bool found_leaf; + unsigned long page_size; + gpa_t addr = start, end = start + size; + + while (addr < end) { + found_leaf = stage2_get_leaf_entry(kvm, addr, + &ptep, &ptep_level); + ret = stage2_level_to_page_size(ptep_level, &page_size); + if (ret) + break; + + if (!found_leaf) + goto next; + + if (!(addr & (page_size - 1)) && ((end - addr) >= page_size)) + stage2_op_pte(kvm, addr, ptep, + ptep_level, STAGE2_OP_CLEAR); + +next: + addr += page_size; + } +} + +static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end) +{ + int ret; + pte_t *ptep; + u32 ptep_level; + bool found_leaf; + gpa_t addr = start; + unsigned long page_size; + + while (addr < end) { + found_leaf = 
stage2_get_leaf_entry(kvm, addr, + &ptep, &ptep_level); + ret = stage2_level_to_page_size(ptep_level, &page_size); + if (ret) + break; + + if (!found_leaf) + goto next; + + if (!(addr & (page_size - 1)) && ((end - addr) >= page_size)) + stage2_op_pte(kvm, addr, ptep, + ptep_level, STAGE2_OP_WP); + +next: + addr += page_size; + } +} + +void stage2_wp_memory_region(struct kvm *kvm, int slot) +{ + struct kvm_memslots *slots = kvm_memslots(kvm); + struct kvm_memory_slot *memslot = id_to_memslot(slots, slot); + phys_addr_t start = memslot->base_gfn << PAGE_SHIFT; + phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT; + + spin_lock(&kvm->mmu_lock); + stage2_wp_range(kvm, start, end); + spin_unlock(&kvm->mmu_lock); + kvm_flush_remote_tlbs(kvm); +} + +int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa, + unsigned long size, bool writable) +{ + pte_t pte; + int ret = 0; + unsigned long pfn; + phys_addr_t addr, end; + struct kvm_mmu_page_cache pcache = { 0, }; + + end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK; + pfn = __phys_to_pfn(hpa); + + for (addr = gpa; addr < end; addr += PAGE_SIZE) { + pte = pfn_pte(pfn, PAGE_KERNEL); + + if (!writable) + pte = pte_wrprotect(pte); + + ret = stage2_cache_topup(&pcache, + stage2_pgd_levels, + KVM_MMU_PAGE_CACHE_NR_OBJS); + if (ret) + goto out; + + spin_lock(&kvm->mmu_lock); + ret = stage2_set_pte(kvm, 0, &pcache, addr, &pte); + spin_unlock(&kvm->mmu_lock); + if (ret) + goto out; + + pfn++; + } + +out: + stage2_cache_flush(&pcache); + return ret; + +}
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) { @@ -38,7 +389,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
void kvm_arch_flush_shadow_all(struct kvm *kvm) { - /* TODO: */ + kvm_riscv_stage2_free_pgd(kvm); }
void kvm_arch_flush_shadow_memslot(struct kvm *kvm, @@ -52,7 +403,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_memory_slot *new, enum kvm_mr_change change) { - /* TODO: */ + /* + * At this point the memslot has been committed and there is an + * allocated dirty_bitmap[]; dirty pages will be tracked while the + * memory slot is write-protected. + */ + if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) + stage2_wp_memory_region(kvm, mem->slot); }
int kvm_arch_prepare_memory_region(struct kvm *kvm, @@ -60,35 +417,232 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, enum kvm_mr_change change) { - /* TODO: */ - return 0; + hva_t hva = mem->userspace_addr; + hva_t reg_end = hva + mem->memory_size; + bool writable = !(mem->flags & KVM_MEM_READONLY); + int ret = 0; + + if (change != KVM_MR_CREATE && change != KVM_MR_MOVE && + change != KVM_MR_FLAGS_ONLY) + return 0; + + /* + * Prevent userspace from creating a memory region outside of the GPA + * space addressable by the KVM guest GPA space. + */ + if ((memslot->base_gfn + memslot->npages) >= + (stage2_gpa_size >> PAGE_SHIFT)) + return -EFAULT; + + mmap_read_lock(current->mm); + + /* + * A memory region could potentially cover multiple VMAs, and + * any holes between them, so iterate over all of them to find + * out if we can map any of them right now. + * + * +--------------------------------------------+ + * +---------------+----------------+ +----------------+ + * | : VMA 1 | VMA 2 | | VMA 3 : | + * +---------------+----------------+ +----------------+ + * | memory region | + * +--------------------------------------------+ + */ + do { + struct vm_area_struct *vma = find_vma(current->mm, hva); + hva_t vm_start, vm_end; + + if (!vma || vma->vm_start >= reg_end) + break; + + /* + * Mapping a read-only VMA is only allowed if the + * memory region is configured as read-only. + */ + if (writable && !(vma->vm_flags & VM_WRITE)) { + ret = -EPERM; + break; + } + + /* Take the intersection of this VMA with the memory region */ + vm_start = max(hva, vma->vm_start); + vm_end = min(reg_end, vma->vm_end); + + if (vma->vm_flags & VM_PFNMAP) { + gpa_t gpa = mem->guest_phys_addr + + (vm_start - mem->userspace_addr); + phys_addr_t pa; + + pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT; + pa += vm_start - vma->vm_start; + + /* IO region dirty page logging not allowed */ + if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) { + ret = -EINVAL; + goto out; + } + + ret = stage2_ioremap(kvm, gpa, pa, + vm_end - vm_start, writable); + if (ret) + break; + } + hva = vm_end; + } while (hva < reg_end); + + if (change == KVM_MR_FLAGS_ONLY) + goto out; + + spin_lock(&kvm->mmu_lock); + if (ret) + stage2_unmap_range(kvm, mem->guest_phys_addr, + mem->memory_size); + spin_unlock(&kvm->mmu_lock); + +out: + mmap_read_unlock(current->mm); + return ret; }
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, gpa_t gpa, unsigned long hva, bool is_write) { - /* TODO: */ - return 0; + int ret; + kvm_pfn_t hfn; + bool writeable; + short vma_pageshift; + gfn_t gfn = gpa >> PAGE_SHIFT; + struct vm_area_struct *vma; + struct kvm *kvm = vcpu->kvm; + struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache; + bool logging = (memslot->dirty_bitmap && + !(memslot->flags & KVM_MEM_READONLY)) ? true : false; + unsigned long vma_pagesize; + + mmap_read_lock(current->mm); + + vma = find_vma_intersection(current->mm, hva, hva + 1); + if (unlikely(!vma)) { + kvm_err("Failed to find VMA for hva 0x%lx\n", hva); + mmap_read_unlock(current->mm); + return -EFAULT; + } + + if (is_vm_hugetlb_page(vma)) + vma_pageshift = huge_page_shift(hstate_vma(vma)); + else + vma_pageshift = PAGE_SHIFT; + vma_pagesize = 1ULL << vma_pageshift; + if (logging || (vma->vm_flags & VM_PFNMAP)) + vma_pagesize = PAGE_SIZE; + + if (vma_pagesize == PMD_SIZE || vma_pagesize == PGDIR_SIZE) + gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; + + mmap_read_unlock(current->mm); + + if (vma_pagesize != PGDIR_SIZE && + vma_pagesize != PMD_SIZE && + vma_pagesize != PAGE_SIZE) { + kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize); + return -EFAULT; + } + + /* We need minimum second+third level pages */ + ret = stage2_cache_topup(pcache, stage2_pgd_levels, + KVM_MMU_PAGE_CACHE_NR_OBJS); + if (ret) { + kvm_err("Failed to topup stage2 cache\n"); + return ret; + } + + hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable); + if (hfn == KVM_PFN_ERR_HWPOISON) { + send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, + vma_pageshift, current); + return 0; + } + if (is_error_noslot_pfn(hfn)) + return -EFAULT; + + /* + * If logging is active then we allow writable pages only + * for write faults. + */ + if (logging && !is_write) + writeable = false; + + spin_lock(&kvm->mmu_lock); + + if (writeable) { + kvm_set_pfn_dirty(hfn); + mark_page_dirty(kvm, gfn); + ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT, + vma_pagesize, PAGE_WRITE_EXEC); + } else { + ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT, + vma_pagesize, PAGE_READ_EXEC); + } + + if (ret) + kvm_err("Failed to map in stage2\n"); + + spin_unlock(&kvm->mmu_lock); + kvm_set_pfn_accessed(hfn); + kvm_release_pfn_clean(hfn); + return ret; }
void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu) { - /* TODO: */ + stage2_cache_flush(&vcpu->arch.mmu_page_cache); }
int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm) { - /* TODO: */ + if (kvm->arch.pgd != NULL) { + kvm_err("kvm_arch already initialized?\n"); + return -EINVAL; + } + + kvm->arch.pgd = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO); + if (!kvm->arch.pgd) + return -ENOMEM; + kvm->arch.pgd_phys = virt_to_phys(kvm->arch.pgd); + return 0; }
void kvm_riscv_stage2_free_pgd(struct kvm *kvm) { - /* TODO: */ + void *pgd = NULL; + + spin_lock(&kvm->mmu_lock); + if (kvm->arch.pgd) { + stage2_unmap_range(kvm, 0UL, stage2_gpa_size); + pgd = READ_ONCE(kvm->arch.pgd); + kvm->arch.pgd = NULL; + kvm->arch.pgd_phys = 0; + } + spin_unlock(&kvm->mmu_lock); + + /* Free the HW pgd, one page at a time */ + if (pgd) + free_pages_exact(pgd, PAGE_SIZE); }
void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu) { - /* TODO: */ + unsigned long hgatp = HGATP_MODE; + struct kvm_arch *k = &vcpu->kvm->arch; + + hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & + HGATP_VMID_MASK; + hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; + + csr_write(CSR_HGATP, hgatp); + + if (!kvm_riscv_stage2_vmid_bits()) + __kvm_riscv_hfence_gvma_all(); }
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
This patch implements MMU notifiers for KVM RISC-V so that the Guest physical address space stays in sync with the Host physical address space.
This allows swapping, page migration, etc. to work transparently with KVM RISC-V.
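For reference, a worked example of the HVA-to-GPA clamping that handle_hva_to_gpa() in the hunk below performs, using a made-up memslot (standalone illustration, not kernel code; assumes an LP64 host):

#include <stdio.h>

#define PAGE_SHIFT 12

struct memslot {
	unsigned long userspace_addr;	/* HVA base */
	unsigned long base_gfn;		/* GPA base >> PAGE_SHIFT */
	unsigned long npages;
};

int main(void)
{
	struct memslot s = {
		.userspace_addr = 0x7f0000000000UL,
		.base_gfn	= 0x80000UL,	/* GPA 0x80000000 */
		.npages		= 0x10000UL,	/* 256 MiB */
	};
	unsigned long start = 0x7f0000123000UL, end = 0x7f0000125000UL;

	/* Clamp the notifier range to the memslot's HVA range */
	unsigned long hva_start = start > s.userspace_addr ?
				  start : s.userspace_addr;
	unsigned long hva_end = s.userspace_addr +
				(s.npages << PAGE_SHIFT);
	if (end < hva_end)
		hva_end = end;

	/* Same arithmetic as hva_to_gfn_memslot() */
	unsigned long gfn = ((hva_start - s.userspace_addr) >> PAGE_SHIFT)
			    + s.base_gfn;
	printf("gpa=0x%lx size=0x%lx\n", gfn << PAGE_SHIFT,
	       hva_end - hva_start);	/* gpa=0x80123000 size=0x2000 */
	return 0;
}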
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Alexander Graf graf@amazon.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/include/asm/kvm_host.h | 7 ++
 arch/riscv/kvm/Kconfig | 1 +
 arch/riscv/kvm/mmu.c | 129 +++++++++++++++++++++++++++++-
 arch/riscv/kvm/vm.c | 1 +
 4 files changed, 137 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 9e754c0848b3..24985abbe2e7 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -200,6 +200,13 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+#define KVM_ARCH_WANT_MMU_NOTIFIER +int kvm_unmap_hva_range(struct kvm *kvm, + unsigned long start, unsigned long end); +int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end); +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); + void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa, unsigned long vmid); void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid); void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa); diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig index 88edd477b3a8..2356dc52ebb3 100644 --- a/arch/riscv/kvm/Kconfig +++ b/arch/riscv/kvm/Kconfig @@ -20,6 +20,7 @@ if VIRTUALIZATION config KVM tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)" depends on RISCV_SBI && MMU + select MMU_NOTIFIER select PREEMPT_NOTIFIERS select ANON_INODES select KVM_MMIO diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 3d13e15e7555..705c341e68cb 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -369,6 +369,38 @@ int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
}
+static int handle_hva_to_gpa(struct kvm *kvm, + unsigned long start, + unsigned long end, + int (*handler)(struct kvm *kvm, + gpa_t gpa, u64 size, + void *data), + void *data) +{ + struct kvm_memslots *slots; + struct kvm_memory_slot *memslot; + int ret = 0; + + slots = kvm_memslots(kvm); + + /* we only care about the pages that the guest sees */ + kvm_for_each_memslot(memslot, slots) { + unsigned long hva_start, hva_end; + gfn_t gpa; + + hva_start = max(start, memslot->userspace_addr); + hva_end = min(end, memslot->userspace_addr + + (memslot->npages << PAGE_SHIFT)); + if (hva_start >= hva_end) + continue; + + gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT; + ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data); + } + + return ret; +} + void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) { } @@ -504,6 +536,95 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, return ret; }
+static int kvm_unmap_hva_handler(struct kvm *kvm, + gpa_t gpa, u64 size, void *data) +{ + stage2_unmap_range(kvm, gpa, size); + return 0; +} + +int kvm_unmap_hva_range(struct kvm *kvm, + unsigned long start, unsigned long end) +{ + if (!kvm->arch.pgd) + return 0; + + handle_hva_to_gpa(kvm, start, end, + &kvm_unmap_hva_handler, NULL); + return 0; +} + +static int kvm_set_spte_handler(struct kvm *kvm, + gpa_t gpa, u64 size, void *data) +{ + pte_t *pte = (pte_t *)data; + + WARN_ON(size != PAGE_SIZE); + stage2_set_pte(kvm, 0, NULL, gpa, pte); + + return 0; +} + +int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) +{ + unsigned long end = hva + PAGE_SIZE; + kvm_pfn_t pfn = pte_pfn(pte); + pte_t stage2_pte; + + if (!kvm->arch.pgd) + return 0; + + stage2_pte = pfn_pte(pfn, PAGE_WRITE_EXEC); + handle_hva_to_gpa(kvm, hva, end, + &kvm_set_spte_handler, &stage2_pte); + + return 0; +} + +static int kvm_age_hva_handler(struct kvm *kvm, + gpa_t gpa, u64 size, void *data) +{ + pte_t *ptep; + u32 ptep_level = 0; + + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE); + + if (!stage2_get_leaf_entry(kvm, gpa, &ptep, &ptep_level)) + return 0; + + return ptep_test_and_clear_young(NULL, 0, ptep); +} + +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) +{ + if (!kvm->arch.pgd) + return 0; + + return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL); +} + +static int kvm_test_age_hva_handler(struct kvm *kvm, + gpa_t gpa, u64 size, void *data) +{ + pte_t *ptep; + u32 ptep_level = 0; + + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &ptep, &ptep_level)) + return 0; + + return pte_young(*ptep); +} + +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva) +{ + if (!kvm->arch.pgd) + return 0; + + return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE, + kvm_test_age_hva_handler, NULL); +} + int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, gpa_t gpa, unsigned long hva, bool is_write) @@ -518,7 +639,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache; bool logging = (memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY)) ? true : false; - unsigned long vma_pagesize; + unsigned long vma_pagesize, mmu_seq;
mmap_read_lock(current->mm);
@@ -557,6 +678,8 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, return ret; }
+ mmu_seq = kvm->mmu_notifier_seq; + hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable); if (hfn == KVM_PFN_ERR_HWPOISON) { send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, @@ -575,6 +698,9 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
spin_lock(&kvm->mmu_lock);
+ if (mmu_notifier_retry(kvm, mmu_seq)) + goto out_unlock; + if (writeable) { kvm_set_pfn_dirty(hfn); mark_page_dirty(kvm, gfn); @@ -588,6 +714,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, if (ret) kvm_err("Failed to map in stage2\n");
+out_unlock: spin_unlock(&kvm->mmu_lock); kvm_set_pfn_accessed(hfn); kvm_release_pfn_clean(hfn); diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index c5aab5478c38..fd84b4d914dc 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -54,6 +54,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) switch (ext) { case KVM_CAP_DEVICE_CTRL: case KVM_CAP_USER_MEMORY: + case KVM_CAP_SYNC_MMU: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_ONE_REG: case KVM_CAP_READONLY_MEM:
From: Atish Patra atish.patra@wdc.com
euleros inclusion category: feature bugzilla: NA CVE: NA
The RISC-V hypervisor specification doesn't have any virtual timer feature.
Due to this, the guest VCPU timer will be programmed via SBI calls. The host will use a separate hrtimer event for each guest VCPU to provide timer functionality. We inject a virtual timer interrupt to the guest VCPU whenever the guest VCPU hrtimer event expires.
This patch adds the guest VCPU timer implementation along with a ONE_REG interface to access VCPU timer state from user space.
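As a minimal model of the conversion involved (a hypothetical helper; nsec_mult/nsec_shift are the host clocksource factors used by the patch):

/*
 * Sketch: turn the guest's target cycle count (from an SBI set_timer
 * call) into the nanosecond delay used to arm the per-VCPU hrtimer.
 */
static u64 sketch_timer_delta_ns(u64 target_cycles, u64 current_cycles,
				 u32 nsec_mult, u32 nsec_shift)
{
	u64 delta_cycles;

	/* Target already in the past: fire immediately */
	if (target_cycles <= current_cycles)
		return 0;

	delta_cycles = target_cycles - current_cycles;

	/* Same mult/shift scaling the clocksource core uses */
	return (delta_cycles * nsec_mult) >> nsec_shift;
}

For example, with a 10 MHz timebase, a target 10,000,000 cycles ahead arms the hrtimer roughly one second out.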
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y Signed-off-by: Atish Patra atish.patra@wdc.com Signed-off-by: Anup Patel anup.patel@wdc.com Acked-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Daniel Lezcano daniel.lezcano@linaro.org Signed-off-by: Mingwang Li limingwang@huawei.com Reviewed-by: Yifei Jiang jiangyifei@huawei.com Signed-off-by: Xie XiuQi xiexiuqi@huawei.com --- arch/riscv/include/asm/kvm_host.h | 7 + arch/riscv/include/asm/kvm_vcpu_timer.h | 44 +++++ arch/riscv/include/uapi/asm/kvm.h | 17 ++ arch/riscv/kvm/Makefile | 2 +- arch/riscv/kvm/vcpu.c | 14 ++ arch/riscv/kvm/vcpu_timer.c | 225 ++++++++++++++++++++++++ arch/riscv/kvm/vm.c | 2 +- drivers/clocksource/timer-riscv.c | 8 + include/clocksource/timer-riscv.h | 16 ++ 9 files changed, 333 insertions(+), 2 deletions(-) create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h create mode 100644 arch/riscv/kvm/vcpu_timer.c create mode 100644 include/clocksource/timer-riscv.h
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 24985abbe2e7..0ce8329c84b0 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -12,6 +12,7 @@ #include <linux/types.h> #include <linux/kvm.h> #include <linux/kvm_types.h> +#include <asm/kvm_vcpu_timer.h>
#ifdef CONFIG_64BIT #define KVM_MAX_VCPUS (1U << 16) @@ -66,6 +67,9 @@ struct kvm_arch { /* stage2 page table */ pgd_t *pgd; phys_addr_t pgd_phys; + + /* Guest Timer */ + struct kvm_guest_timer timer; };
struct kvm_mmio_decode { @@ -178,6 +182,9 @@ struct kvm_vcpu_arch { unsigned long irqs_pending; unsigned long irqs_pending_mask;
+ /* VCPU Timer */ + struct kvm_vcpu_timer timer; + /* MMIO instruction details */ struct kvm_mmio_decode mmio_decode;
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h new file mode 100644 index 000000000000..375281eb49e0 --- /dev/null +++ b/arch/riscv/include/asm/kvm_vcpu_timer.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Atish Patra atish.patra@wdc.com + */ + +#ifndef __KVM_VCPU_RISCV_TIMER_H +#define __KVM_VCPU_RISCV_TIMER_H + +#include <linux/hrtimer.h> + +struct kvm_guest_timer { + /* Mult & Shift values to get nanoseconds from cycles */ + u32 nsec_mult; + u32 nsec_shift; + /* Time delta value */ + u64 time_delta; +}; + +struct kvm_vcpu_timer { + /* Flag for whether init is done */ + bool init_done; + /* Flag for whether timer event is configured */ + bool next_set; + /* Next timer event cycles */ + u64 next_cycles; + /* Underlying hrtimer instance */ + struct hrtimer hrt; +}; + +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles); +int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg); +int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg); +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu); +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu); +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu); +void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu); +int kvm_riscv_guest_timer_init(struct kvm *kvm); + +#endif diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index 3a20327242f1..8f15eee35a1e 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -73,6 +73,18 @@ struct kvm_riscv_csr { unsigned long satp; };
+/* TIMER registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ +struct kvm_riscv_timer { + u64 frequency; + u64 time; + u64 compare; + u64 state; +}; + +/* Possible states for kvm_riscv_timer */ +#define KVM_RISCV_TIMER_STATE_OFF 0 +#define KVM_RISCV_TIMER_STATE_ON 1 + #define KVM_REG_SIZE(id) \ (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
@@ -95,6 +107,11 @@ struct kvm_riscv_csr { #define KVM_REG_RISCV_CSR_REG(name) \ (offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
+/* Timer registers are mapped as type 4 */ +#define KVM_REG_RISCV_TIMER (0x04 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_TIMER_REG(name) \ + (offsetof(struct kvm_riscv_timer, name) / sizeof(u64)) + #endif
#endif /* __LINUX_KVM_RISCV_H */ diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile index c0f57f26c13d..3e0c7558320d 100644 --- a/arch/riscv/kvm/Makefile +++ b/arch/riscv/kvm/Makefile @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm kvm-objs := $(common-objs-y)
kvm-objs += main.o vm.o vmid.o tlb.o mmu.o -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
obj-$(CONFIG_KVM) += kvm.o diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 82228e8b6cab..f2de6f120d50 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -55,6 +55,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(cntx, reset_cntx, sizeof(*cntx));
+ kvm_riscv_vcpu_timer_reset(vcpu); + WRITE_ONCE(vcpu->arch.irqs_pending, 0); WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0); } @@ -82,6 +84,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) cntx->hstatus |= HSTATUS_SPVP; cntx->hstatus |= HSTATUS_SPV;
+ /* Setup VCPU timer */ + kvm_riscv_vcpu_timer_init(vcpu); + /* Reset VCPU */ kvm_riscv_reset_vcpu(vcpu);
@@ -99,6 +104,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) { + /* Cleanup VCPU timer */ + kvm_riscv_vcpu_timer_deinit(vcpu); + /* Flush the pages pre-allocated for Stage2 page table mappings */ kvm_riscv_stage2_flush_cache(vcpu); } @@ -349,6 +357,8 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu, return kvm_riscv_vcpu_set_reg_core(vcpu, reg); else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) return kvm_riscv_vcpu_set_reg_csr(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER) + return kvm_riscv_vcpu_set_reg_timer(vcpu, reg);
return -EINVAL; } @@ -362,6 +372,8 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu, return kvm_riscv_vcpu_get_reg_core(vcpu, reg); else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) return kvm_riscv_vcpu_get_reg_csr(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER) + return kvm_riscv_vcpu_get_reg_timer(vcpu, reg);
return -EINVAL; } @@ -594,6 +606,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_riscv_stage2_update_hgatp(vcpu);
+ kvm_riscv_vcpu_timer_restore(vcpu); + vcpu->cpu = cpu; }
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c new file mode 100644 index 000000000000..5fb9fe378800 --- /dev/null +++ b/arch/riscv/kvm/vcpu_timer.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Atish Patra atish.patra@wdc.com + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kvm_host.h> +#include <linux/uaccess.h> +#include <clocksource/timer-riscv.h> +#include <asm/csr.h> +#include <asm/delay.h> +#include <asm/kvm_vcpu_timer.h> + +static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt) +{ + return get_cycles64() + gt->time_delta; +} + +static u64 kvm_riscv_delta_cycles2ns(u64 cycles, + struct kvm_guest_timer *gt, + struct kvm_vcpu_timer *t) +{ + unsigned long flags; + u64 cycles_now, cycles_delta, delta_ns; + + local_irq_save(flags); + cycles_now = kvm_riscv_current_cycles(gt); + if (cycles_now < cycles) + cycles_delta = cycles - cycles_now; + else + cycles_delta = 0; + delta_ns = (cycles_delta * gt->nsec_mult) >> gt->nsec_shift; + local_irq_restore(flags); + + return delta_ns; +} + +static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h) +{ + u64 delta_ns; + struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt); + struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer); + struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer; + + if (kvm_riscv_current_cycles(gt) < t->next_cycles) { + delta_ns = kvm_riscv_delta_cycles2ns(t->next_cycles, gt, t); + hrtimer_forward_now(&t->hrt, ktime_set(0, delta_ns)); + return HRTIMER_RESTART; + } + + t->next_set = false; + kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_TIMER); + + return HRTIMER_NORESTART; +} + +static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t) +{ + if (!t->init_done || !t->next_set) + return -EINVAL; + + hrtimer_cancel(&t->hrt); + t->next_set = false; + + return 0; +} + +int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles) +{ + struct kvm_vcpu_timer *t = &vcpu->arch.timer; + struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer; + u64 delta_ns; + + if (!t->init_done) + return -EINVAL; + + kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_TIMER); + + delta_ns = kvm_riscv_delta_cycles2ns(ncycles, gt, t); + t->next_cycles = ncycles; + hrtimer_start(&t->hrt, ktime_set(0, delta_ns), HRTIMER_MODE_REL); + t->next_set = true; + + return 0; +} + +int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_timer *t = &vcpu->arch.timer; + struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer; + u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_TIMER); + u64 reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(u64)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64)) + return -EINVAL; + + switch (reg_num) { + case KVM_REG_RISCV_TIMER_REG(frequency): + reg_val = riscv_timebase; + break; + case KVM_REG_RISCV_TIMER_REG(time): + reg_val = kvm_riscv_current_cycles(gt); + break; + case KVM_REG_RISCV_TIMER_REG(compare): + reg_val = t->next_cycles; + break; + case KVM_REG_RISCV_TIMER_REG(state): + reg_val = (t->next_set) ? 
KVM_RISCV_TIMER_STATE_ON : + KVM_RISCV_TIMER_STATE_OFF; + break; + default: + return -EINVAL; + }; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_timer *t = &vcpu->arch.timer; + struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer; + u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_TIMER); + u64 reg_val; + int ret = 0; + + if (KVM_REG_SIZE(reg->id) != sizeof(u64)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + switch (reg_num) { + case KVM_REG_RISCV_TIMER_REG(frequency): + ret = -ENOTSUPP; + break; + case KVM_REG_RISCV_TIMER_REG(time): + gt->time_delta = reg_val - get_cycles64(); + break; + case KVM_REG_RISCV_TIMER_REG(compare): + t->next_cycles = reg_val; + break; + case KVM_REG_RISCV_TIMER_REG(state): + if (reg_val == KVM_RISCV_TIMER_STATE_ON) + ret = kvm_riscv_vcpu_timer_next_event(vcpu, reg_val); + else + ret = kvm_riscv_vcpu_timer_cancel(t); + break; + default: + ret = -EINVAL; + break; + }; + + return ret; +} + +int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu) +{ + struct kvm_vcpu_timer *t = &vcpu->arch.timer; + + if (t->init_done) + return -EINVAL; + + hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + t->hrt.function = kvm_riscv_vcpu_hrtimer_expired; + t->init_done = true; + t->next_set = false; + + return 0; +} + +int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu) +{ + int ret; + + ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer); + vcpu->arch.timer.init_done = false; + + return ret; +} + +int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu) +{ + return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer); +} + +void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu) +{ + struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer; + +#ifdef CONFIG_64BIT + csr_write(CSR_HTIMEDELTA, gt->time_delta); +#else + csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta)); + csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32)); +#endif +} + +int kvm_riscv_guest_timer_init(struct kvm *kvm) +{ + struct kvm_guest_timer *gt = &kvm->arch.timer; + + riscv_cs_get_mult_shift(>->nsec_mult, >->nsec_shift); + gt->time_delta = -get_cycles64(); + + return 0; +} diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index fd84b4d914dc..4f2498198cb5 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -32,7 +32,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) return r; }
- return 0; + return kvm_riscv_guest_timer_init(kvm); }
void kvm_arch_destroy_vm(struct kvm *kvm) diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c index c4f15c4068c0..ab6605c33d7b 100644 --- a/drivers/clocksource/timer-riscv.c +++ b/drivers/clocksource/timer-riscv.c @@ -12,6 +12,7 @@ #include <linux/cpu.h> #include <linux/delay.h> #include <linux/irq.h> +#include <linux/module.h> #include <linux/sched_clock.h> #include <linux/io-64-nonatomic-lo-hi.h> #include <asm/smp.h> @@ -86,6 +87,13 @@ static int riscv_timer_dying_cpu(unsigned int cpu) return 0; }
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift) +{ + *mult = riscv_clocksource.mult; + *shift = riscv_clocksource.shift; +} +EXPORT_SYMBOL_GPL(riscv_cs_get_mult_shift); + /* called directly from the low-level interrupt handler */ void riscv_timer_interrupt(void) { diff --git a/include/clocksource/timer-riscv.h b/include/clocksource/timer-riscv.h new file mode 100644 index 000000000000..e94e4feecbe8 --- /dev/null +++ b/include/clocksource/timer-riscv.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Atish Patra atish.patra@wdc.com + */ + +#ifndef __TIMER_RISCV_H +#define __TIMER_RISCV_H + +#include <linux/types.h> + +void riscv_cs_get_mult_shift(u32 *mult, u32 *shift); + +#endif
From: Atish Patra atish.patra@wdc.com
euleros inclusion category: feature bugzilla: NA CVE: NA
This patch adds floating point (F and D extension) context save/restore for guest VCPUs. The FP context is saved and restored lazily, only when the kernel enters/exits the in-kernel run loop, and not during the KVM world switch. This way, FP save/restore has minimal impact on KVM performance.
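A condensed sketch of the save-side policy (a hypothetical wrapper; SR_FS* and the __kvm_riscv_fp_* helpers come from the patch below):

/*
 * Sketch: write guest FP state back only if sstatus.FS says the
 * guest dirtied it; prefer the wider D-extension state when present.
 */
static void sketch_lazy_guest_fp_save(struct kvm_cpu_context *cntx,
				      unsigned long isa)
{
	if ((cntx->sstatus & SR_FS) != SR_FS_DIRTY)
		return;	/* guest never touched FP since last save */

	if (riscv_isa_extension_available(&isa, d))
		__kvm_riscv_fp_d_save(cntx);	/* 64-bit regs + fcsr */
	else if (riscv_isa_extension_available(&isa, f))
		__kvm_riscv_fp_f_save(cntx);	/* 32-bit regs + fcsr */

	/* Mark clean so the next run-loop exit can skip the save */
	cntx->sstatus = (cntx->sstatus & ~SR_FS) | SR_FS_CLEAN;
}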
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y Signed-off-by: Atish Patra atish.patra@wdc.com Signed-off-by: Anup Patel anup.patel@wdc.com Acked-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Alexander Graf graf@amazon.com Signed-off-by: Mingwang Li limingwang@huawei.com Reviewed-by: Yifei Jiang jiangyifei@huawei.com Signed-off-by: Xie XiuQi xiexiuqi@huawei.com --- arch/riscv/include/asm/kvm_host.h | 5 + arch/riscv/kernel/asm-offsets.c | 72 +++++++++++++ arch/riscv/kvm/vcpu.c | 91 ++++++++++++++++ arch/riscv/kvm/vcpu_switch.S | 174 ++++++++++++++++++++++++++++++ 4 files changed, 342 insertions(+)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 0ce8329c84b0..4571938e937f 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -130,6 +130,7 @@ struct kvm_cpu_context { unsigned long sepc; unsigned long sstatus; unsigned long hstatus; + union __riscv_fp_state fp; };
struct kvm_vcpu_csr { @@ -246,6 +247,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_cpu_trap *trap);
void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch); +void __kvm_riscv_fp_f_save(struct kvm_cpu_context *context); +void __kvm_riscv_fp_f_restore(struct kvm_cpu_context *context); +void __kvm_riscv_fp_d_save(struct kvm_cpu_context *context); +void __kvm_riscv_fp_d_restore(struct kvm_cpu_context *context);
int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq); diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index f7e43fe55335..80673f7ef5cf 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -191,6 +191,78 @@ void asm_offsets(void) OFFSET(KVM_ARCH_TRAP_HTVAL, kvm_cpu_trap, htval); OFFSET(KVM_ARCH_TRAP_HTINST, kvm_cpu_trap, htinst);
+ /* F extension */ + + OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]); + OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]); + OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]); + OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]); + OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]); + OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]); + OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]); + OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]); + OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]); + OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]); + OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]); + OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]); + OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]); + OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]); + OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]); + OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]); + OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]); + OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]); + OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]); + OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]); + OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]); + OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]); + OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]); + OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]); + OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]); + OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]); + OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]); + OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]); + OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]); + OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]); + OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]); + OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]); + OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr); + + /* D extension */ + + OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]); + OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]); + OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]); + OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]); + OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]); + OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]); + OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]); + OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]); + OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]); + OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]); + OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]); + OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_context, fp.d.f[11]); + OFFSET(KVM_ARCH_FP_D_F12, kvm_cpu_context, fp.d.f[12]); + OFFSET(KVM_ARCH_FP_D_F13, kvm_cpu_context, fp.d.f[13]); + OFFSET(KVM_ARCH_FP_D_F14, kvm_cpu_context, fp.d.f[14]); + OFFSET(KVM_ARCH_FP_D_F15, kvm_cpu_context, fp.d.f[15]); + OFFSET(KVM_ARCH_FP_D_F16, kvm_cpu_context, fp.d.f[16]); + OFFSET(KVM_ARCH_FP_D_F17, kvm_cpu_context, fp.d.f[17]); + OFFSET(KVM_ARCH_FP_D_F18, kvm_cpu_context, fp.d.f[18]); + OFFSET(KVM_ARCH_FP_D_F19, kvm_cpu_context, fp.d.f[19]); + OFFSET(KVM_ARCH_FP_D_F20, kvm_cpu_context, fp.d.f[20]); + OFFSET(KVM_ARCH_FP_D_F21, kvm_cpu_context, fp.d.f[21]); + OFFSET(KVM_ARCH_FP_D_F22, kvm_cpu_context, fp.d.f[22]); + OFFSET(KVM_ARCH_FP_D_F23, kvm_cpu_context, fp.d.f[23]); + OFFSET(KVM_ARCH_FP_D_F24, kvm_cpu_context, fp.d.f[24]); + OFFSET(KVM_ARCH_FP_D_F25, kvm_cpu_context, fp.d.f[25]); + OFFSET(KVM_ARCH_FP_D_F26, kvm_cpu_context, fp.d.f[26]); + OFFSET(KVM_ARCH_FP_D_F27, kvm_cpu_context, fp.d.f[27]); + 
OFFSET(KVM_ARCH_FP_D_F28, kvm_cpu_context, fp.d.f[28]); + OFFSET(KVM_ARCH_FP_D_F29, kvm_cpu_context, fp.d.f[29]); + OFFSET(KVM_ARCH_FP_D_F30, kvm_cpu_context, fp.d.f[30]); + OFFSET(KVM_ARCH_FP_D_F31, kvm_cpu_context, fp.d.f[31]); + OFFSET(KVM_ARCH_FP_D_FCSR, kvm_cpu_context, fp.d.fcsr); + /* * THREAD_{F,X}* might be larger than a S-type offset can handle, but * these are used in performance-sensitive assembly so we can't resort diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index f2de6f120d50..8c35775c0722 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -35,6 +35,86 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { NULL } };
+#ifdef CONFIG_FPU +static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu) +{ + unsigned long isa = vcpu->arch.isa; + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + + cntx->sstatus &= ~SR_FS; + if (riscv_isa_extension_available(&isa, f) || + riscv_isa_extension_available(&isa, d)) + cntx->sstatus |= SR_FS_INITIAL; + else + cntx->sstatus |= SR_FS_OFF; +} + +static void kvm_riscv_vcpu_fp_clean(struct kvm_cpu_context *cntx) +{ + cntx->sstatus &= ~SR_FS; + cntx->sstatus |= SR_FS_CLEAN; +} + +static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx, + unsigned long isa) +{ + if ((cntx->sstatus & SR_FS) == SR_FS_DIRTY) { + if (riscv_isa_extension_available(&isa, d)) + __kvm_riscv_fp_d_save(cntx); + else if (riscv_isa_extension_available(&isa, f)) + __kvm_riscv_fp_f_save(cntx); + kvm_riscv_vcpu_fp_clean(cntx); + } +} + +static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx, + unsigned long isa) +{ + if ((cntx->sstatus & SR_FS) != SR_FS_OFF) { + if (riscv_isa_extension_available(&isa, d)) + __kvm_riscv_fp_d_restore(cntx); + else if (riscv_isa_extension_available(&isa, f)) + __kvm_riscv_fp_f_restore(cntx); + kvm_riscv_vcpu_fp_clean(cntx); + } +} + +static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx) +{ + /* No need to check host sstatus as it can be modified outside */ + if (riscv_isa_extension_available(NULL, d)) + __kvm_riscv_fp_d_save(cntx); + else if (riscv_isa_extension_available(NULL, f)) + __kvm_riscv_fp_f_save(cntx); +} + +static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx) +{ + if (riscv_isa_extension_available(NULL, d)) + __kvm_riscv_fp_d_restore(cntx); + else if (riscv_isa_extension_available(NULL, f)) + __kvm_riscv_fp_f_restore(cntx); +} +#else +static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu) +{ +} +static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx, + unsigned long isa) +{ +} +static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx, + unsigned long isa) +{ +} +static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx) +{ +} +static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx) +{ +} +#endif + #define KVM_RISCV_ISA_ALLOWED (riscv_isa_extension_mask(a) | \ riscv_isa_extension_mask(c) | \ riscv_isa_extension_mask(d) | \ @@ -55,6 +135,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(cntx, reset_cntx, sizeof(*cntx));
+ kvm_riscv_vcpu_fp_reset(vcpu); + kvm_riscv_vcpu_timer_reset(vcpu);
WRITE_ONCE(vcpu->arch.irqs_pending, 0); @@ -204,6 +286,7 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu, vcpu->arch.isa = reg_val; vcpu->arch.isa &= riscv_isa_extension_base(NULL); vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED; + kvm_riscv_vcpu_fp_reset(vcpu); } else { return -ENOTSUPP; } @@ -608,6 +691,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_riscv_vcpu_timer_restore(vcpu);
+ kvm_riscv_vcpu_host_fp_save(&vcpu->arch.host_context); + kvm_riscv_vcpu_guest_fp_restore(&vcpu->arch.guest_context, + vcpu->arch.isa); + vcpu->cpu = cpu; }
@@ -617,6 +704,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->cpu = -1;
+ kvm_riscv_vcpu_guest_fp_save(&vcpu->arch.guest_context, + vcpu->arch.isa); + kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context); + csr_write(CSR_HGATP, 0);
csr->vsstatus = csr_read(CSR_VSSTATUS); diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S index ee2a2326e281..a474979de8b4 100644 --- a/arch/riscv/kvm/vcpu_switch.S +++ b/arch/riscv/kvm/vcpu_switch.S @@ -215,3 +215,177 @@ ENTRY(__kvm_riscv_unpriv_trap) REG_S a1, (KVM_ARCH_TRAP_HTINST)(a0) sret ENDPROC(__kvm_riscv_unpriv_trap) + +#ifdef CONFIG_FPU + .align 3 + .global __kvm_riscv_fp_f_save +__kvm_riscv_fp_f_save: + csrr t2, CSR_SSTATUS + li t1, SR_FS + csrs CSR_SSTATUS, t1 + frcsr t0 + fsw f0, KVM_ARCH_FP_F_F0(a0) + fsw f1, KVM_ARCH_FP_F_F1(a0) + fsw f2, KVM_ARCH_FP_F_F2(a0) + fsw f3, KVM_ARCH_FP_F_F3(a0) + fsw f4, KVM_ARCH_FP_F_F4(a0) + fsw f5, KVM_ARCH_FP_F_F5(a0) + fsw f6, KVM_ARCH_FP_F_F6(a0) + fsw f7, KVM_ARCH_FP_F_F7(a0) + fsw f8, KVM_ARCH_FP_F_F8(a0) + fsw f9, KVM_ARCH_FP_F_F9(a0) + fsw f10, KVM_ARCH_FP_F_F10(a0) + fsw f11, KVM_ARCH_FP_F_F11(a0) + fsw f12, KVM_ARCH_FP_F_F12(a0) + fsw f13, KVM_ARCH_FP_F_F13(a0) + fsw f14, KVM_ARCH_FP_F_F14(a0) + fsw f15, KVM_ARCH_FP_F_F15(a0) + fsw f16, KVM_ARCH_FP_F_F16(a0) + fsw f17, KVM_ARCH_FP_F_F17(a0) + fsw f18, KVM_ARCH_FP_F_F18(a0) + fsw f19, KVM_ARCH_FP_F_F19(a0) + fsw f20, KVM_ARCH_FP_F_F20(a0) + fsw f21, KVM_ARCH_FP_F_F21(a0) + fsw f22, KVM_ARCH_FP_F_F22(a0) + fsw f23, KVM_ARCH_FP_F_F23(a0) + fsw f24, KVM_ARCH_FP_F_F24(a0) + fsw f25, KVM_ARCH_FP_F_F25(a0) + fsw f26, KVM_ARCH_FP_F_F26(a0) + fsw f27, KVM_ARCH_FP_F_F27(a0) + fsw f28, KVM_ARCH_FP_F_F28(a0) + fsw f29, KVM_ARCH_FP_F_F29(a0) + fsw f30, KVM_ARCH_FP_F_F30(a0) + fsw f31, KVM_ARCH_FP_F_F31(a0) + sw t0, KVM_ARCH_FP_F_FCSR(a0) + csrw CSR_SSTATUS, t2 + ret + + .align 3 + .global __kvm_riscv_fp_d_save +__kvm_riscv_fp_d_save: + csrr t2, CSR_SSTATUS + li t1, SR_FS + csrs CSR_SSTATUS, t1 + frcsr t0 + fsd f0, KVM_ARCH_FP_D_F0(a0) + fsd f1, KVM_ARCH_FP_D_F1(a0) + fsd f2, KVM_ARCH_FP_D_F2(a0) + fsd f3, KVM_ARCH_FP_D_F3(a0) + fsd f4, KVM_ARCH_FP_D_F4(a0) + fsd f5, KVM_ARCH_FP_D_F5(a0) + fsd f6, KVM_ARCH_FP_D_F6(a0) + fsd f7, KVM_ARCH_FP_D_F7(a0) + fsd f8, KVM_ARCH_FP_D_F8(a0) + fsd f9, KVM_ARCH_FP_D_F9(a0) + fsd f10, KVM_ARCH_FP_D_F10(a0) + fsd f11, KVM_ARCH_FP_D_F11(a0) + fsd f12, KVM_ARCH_FP_D_F12(a0) + fsd f13, KVM_ARCH_FP_D_F13(a0) + fsd f14, KVM_ARCH_FP_D_F14(a0) + fsd f15, KVM_ARCH_FP_D_F15(a0) + fsd f16, KVM_ARCH_FP_D_F16(a0) + fsd f17, KVM_ARCH_FP_D_F17(a0) + fsd f18, KVM_ARCH_FP_D_F18(a0) + fsd f19, KVM_ARCH_FP_D_F19(a0) + fsd f20, KVM_ARCH_FP_D_F20(a0) + fsd f21, KVM_ARCH_FP_D_F21(a0) + fsd f22, KVM_ARCH_FP_D_F22(a0) + fsd f23, KVM_ARCH_FP_D_F23(a0) + fsd f24, KVM_ARCH_FP_D_F24(a0) + fsd f25, KVM_ARCH_FP_D_F25(a0) + fsd f26, KVM_ARCH_FP_D_F26(a0) + fsd f27, KVM_ARCH_FP_D_F27(a0) + fsd f28, KVM_ARCH_FP_D_F28(a0) + fsd f29, KVM_ARCH_FP_D_F29(a0) + fsd f30, KVM_ARCH_FP_D_F30(a0) + fsd f31, KVM_ARCH_FP_D_F31(a0) + sw t0, KVM_ARCH_FP_D_FCSR(a0) + csrw CSR_SSTATUS, t2 + ret + + .align 3 + .global __kvm_riscv_fp_f_restore +__kvm_riscv_fp_f_restore: + csrr t2, CSR_SSTATUS + li t1, SR_FS + lw t0, KVM_ARCH_FP_F_FCSR(a0) + csrs CSR_SSTATUS, t1 + flw f0, KVM_ARCH_FP_F_F0(a0) + flw f1, KVM_ARCH_FP_F_F1(a0) + flw f2, KVM_ARCH_FP_F_F2(a0) + flw f3, KVM_ARCH_FP_F_F3(a0) + flw f4, KVM_ARCH_FP_F_F4(a0) + flw f5, KVM_ARCH_FP_F_F5(a0) + flw f6, KVM_ARCH_FP_F_F6(a0) + flw f7, KVM_ARCH_FP_F_F7(a0) + flw f8, KVM_ARCH_FP_F_F8(a0) + flw f9, KVM_ARCH_FP_F_F9(a0) + flw f10, KVM_ARCH_FP_F_F10(a0) + flw f11, KVM_ARCH_FP_F_F11(a0) + flw f12, KVM_ARCH_FP_F_F12(a0) + flw f13, KVM_ARCH_FP_F_F13(a0) + flw f14, KVM_ARCH_FP_F_F14(a0) + flw f15, KVM_ARCH_FP_F_F15(a0) + flw f16, KVM_ARCH_FP_F_F16(a0) 
+ flw f17, KVM_ARCH_FP_F_F17(a0) + flw f18, KVM_ARCH_FP_F_F18(a0) + flw f19, KVM_ARCH_FP_F_F19(a0) + flw f20, KVM_ARCH_FP_F_F20(a0) + flw f21, KVM_ARCH_FP_F_F21(a0) + flw f22, KVM_ARCH_FP_F_F22(a0) + flw f23, KVM_ARCH_FP_F_F23(a0) + flw f24, KVM_ARCH_FP_F_F24(a0) + flw f25, KVM_ARCH_FP_F_F25(a0) + flw f26, KVM_ARCH_FP_F_F26(a0) + flw f27, KVM_ARCH_FP_F_F27(a0) + flw f28, KVM_ARCH_FP_F_F28(a0) + flw f29, KVM_ARCH_FP_F_F29(a0) + flw f30, KVM_ARCH_FP_F_F30(a0) + flw f31, KVM_ARCH_FP_F_F31(a0) + fscsr t0 + csrw CSR_SSTATUS, t2 + ret + + .align 3 + .global __kvm_riscv_fp_d_restore +__kvm_riscv_fp_d_restore: + csrr t2, CSR_SSTATUS + li t1, SR_FS + lw t0, KVM_ARCH_FP_D_FCSR(a0) + csrs CSR_SSTATUS, t1 + fld f0, KVM_ARCH_FP_D_F0(a0) + fld f1, KVM_ARCH_FP_D_F1(a0) + fld f2, KVM_ARCH_FP_D_F2(a0) + fld f3, KVM_ARCH_FP_D_F3(a0) + fld f4, KVM_ARCH_FP_D_F4(a0) + fld f5, KVM_ARCH_FP_D_F5(a0) + fld f6, KVM_ARCH_FP_D_F6(a0) + fld f7, KVM_ARCH_FP_D_F7(a0) + fld f8, KVM_ARCH_FP_D_F8(a0) + fld f9, KVM_ARCH_FP_D_F9(a0) + fld f10, KVM_ARCH_FP_D_F10(a0) + fld f11, KVM_ARCH_FP_D_F11(a0) + fld f12, KVM_ARCH_FP_D_F12(a0) + fld f13, KVM_ARCH_FP_D_F13(a0) + fld f14, KVM_ARCH_FP_D_F14(a0) + fld f15, KVM_ARCH_FP_D_F15(a0) + fld f16, KVM_ARCH_FP_D_F16(a0) + fld f17, KVM_ARCH_FP_D_F17(a0) + fld f18, KVM_ARCH_FP_D_F18(a0) + fld f19, KVM_ARCH_FP_D_F19(a0) + fld f20, KVM_ARCH_FP_D_F20(a0) + fld f21, KVM_ARCH_FP_D_F21(a0) + fld f22, KVM_ARCH_FP_D_F22(a0) + fld f23, KVM_ARCH_FP_D_F23(a0) + fld f24, KVM_ARCH_FP_D_F24(a0) + fld f25, KVM_ARCH_FP_D_F25(a0) + fld f26, KVM_ARCH_FP_D_F26(a0) + fld f27, KVM_ARCH_FP_D_F27(a0) + fld f28, KVM_ARCH_FP_D_F28(a0) + fld f29, KVM_ARCH_FP_D_F29(a0) + fld f30, KVM_ARCH_FP_D_F30(a0) + fld f31, KVM_ARCH_FP_D_F31(a0) + fscsr t0 + csrw CSR_SSTATUS, t2 + ret +#endif
From: Atish Patra atish.patra@wdc.com
euleros inclusion category: feature bugzilla: NA CVE: NA
Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for the floating point registers F0-F31 and FCSR. This support is added for both the 'F' and 'D' extensions.
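For instance, a VMM could read the single-precision register f[3] with the new encoding as follows (a hypothetical user-space snippet; KVM_REG_RISCV and KVM_REG_SIZE_U32 are the generic ONE_REG arch/size bits, and error handling is omitted):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static uint32_t sketch_read_guest_f3(int vcpu_fd)
{
	uint32_t val = 0;
	struct kvm_one_reg reg = {
		/* arch | size | type-5 (FP_F) | index of f[3] */
		.id = KVM_REG_RISCV | KVM_REG_SIZE_U32 |
		      KVM_REG_RISCV_FP_F | KVM_REG_RISCV_FP_F_REG(f[3]),
		.addr = (uintptr_t)&val,
	};

	ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	return val;
}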
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y Signed-off-by: Atish Patra atish.patra@wdc.com Signed-off-by: Anup Patel anup.patel@wdc.com Acked-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Alexander Graf graf@amazon.com Signed-off-by: Mingwang Li limingwang@huawei.com Reviewed-by: Yifei Jiang jiangyifei@huawei.com Signed-off-by: Xie XiuQi xiexiuqi@huawei.com --- arch/riscv/include/uapi/asm/kvm.h | 10 +++ arch/riscv/kvm/vcpu.c | 104 ++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index 8f15eee35a1e..f4274c2e5cdc 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -112,6 +112,16 @@ struct kvm_riscv_timer { #define KVM_REG_RISCV_TIMER_REG(name) \ (offsetof(struct kvm_riscv_timer, name) / sizeof(u64))
+/* F extension registers are mapped as type 5 */ +#define KVM_REG_RISCV_FP_F (0x05 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_FP_F_REG(name) \ + (offsetof(struct __riscv_f_ext_state, name) / sizeof(u32)) + +/* D extension registers are mapped as type 6 */ +#define KVM_REG_RISCV_FP_D (0x06 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_FP_D_REG(name) \ + (offsetof(struct __riscv_d_ext_state, name) / sizeof(u64)) + #endif
#endif /* __LINUX_KVM_RISCV_H */ diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 8c35775c0722..96605f773933 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -431,6 +431,98 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu, return 0; }
+static int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg, + unsigned long rtype) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long isa = vcpu->arch.isa; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + rtype); + void *reg_val; + + if ((rtype == KVM_REG_RISCV_FP_F) && + riscv_isa_extension_available(&isa, f)) { + if (KVM_REG_SIZE(reg->id) != sizeof(u32)) + return -EINVAL; + if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr)) + reg_val = &cntx->fp.f.fcsr; + else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) && + reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) + reg_val = &cntx->fp.f.f[reg_num]; + else + return -EINVAL; + } else if ((rtype == KVM_REG_RISCV_FP_D) && + riscv_isa_extension_available(&isa, d)) { + if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) { + if (KVM_REG_SIZE(reg->id) != sizeof(u32)) + return -EINVAL; + reg_val = &cntx->fp.d.fcsr; + } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) && + reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) { + if (KVM_REG_SIZE(reg->id) != sizeof(u64)) + return -EINVAL; + reg_val = &cntx->fp.d.f[reg_num]; + } else + return -EINVAL; + } else + return -EINVAL; + + if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg, + unsigned long rtype) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long isa = vcpu->arch.isa; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + rtype); + void *reg_val; + + if ((rtype == KVM_REG_RISCV_FP_F) && + riscv_isa_extension_available(&isa, f)) { + if (KVM_REG_SIZE(reg->id) != sizeof(u32)) + return -EINVAL; + if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr)) + reg_val = &cntx->fp.f.fcsr; + else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) && + reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) + reg_val = &cntx->fp.f.f[reg_num]; + else + return -EINVAL; + } else if ((rtype == KVM_REG_RISCV_FP_D) && + riscv_isa_extension_available(&isa, d)) { + if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) { + if (KVM_REG_SIZE(reg->id) != sizeof(u32)) + return -EINVAL; + reg_val = &cntx->fp.d.fcsr; + } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) && + reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) { + if (KVM_REG_SIZE(reg->id) != sizeof(u64)) + return -EINVAL; + reg_val = &cntx->fp.d.f[reg_num]; + } else + return -EINVAL; + } else + return -EINVAL; + + if (copy_from_user(reg_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) { @@ -442,6 +534,12 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu, return kvm_riscv_vcpu_set_reg_csr(vcpu, reg); else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER) return kvm_riscv_vcpu_set_reg_timer(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F) + return kvm_riscv_vcpu_set_reg_fp(vcpu, reg, + KVM_REG_RISCV_FP_F); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D) + return kvm_riscv_vcpu_set_reg_fp(vcpu, reg, + KVM_REG_RISCV_FP_D);
return -EINVAL; } @@ -457,6 +555,12 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu, return kvm_riscv_vcpu_get_reg_csr(vcpu, reg); else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER) return kvm_riscv_vcpu_get_reg_timer(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F) + return kvm_riscv_vcpu_get_reg_fp(vcpu, reg, + KVM_REG_RISCV_FP_F); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D) + return kvm_riscv_vcpu_get_reg_fp(vcpu, reg, + KVM_REG_RISCV_FP_D);
return -EINVAL; }
From: Atish Patra atish.patra@wdc.com
euleros inclusion category: feature bugzilla: NA CVE: NA
The KVM host kernel runs in HS-mode, so we need to handle the SBI calls coming from the guest kernel running in VS-mode.
This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1 calls are implemented in the KVM kernel module, except the GETCHAR and PUTCHAR calls, which are forwarded to user space because they cannot be implemented in kernel space. In the future, when we implement SBI v0.2 for the Guest, we will also forward SBI v0.2 experimental and vendor extension calls to user space.
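On the user-space side, a VMM's handler for these forwarded calls can be a simple dispatch (a hypothetical sketch; the legacy extension IDs 0x01/0x02 follow the SBI v0.1 numbering, and the surrounding KVM_RUN loop is assumed):

#include <stdio.h>
#include <linux/kvm.h>

static void sketch_handle_sbi_exit(struct kvm_run *run)
{
	switch (run->riscv_sbi.extension_id) {
	case 0x01: /* legacy console_putchar: character in args[0] */
		putchar((int)run->riscv_sbi.args[0]);
		run->riscv_sbi.ret[0] = 0;
		break;
	case 0x02: /* legacy console_getchar: return char (or -1) */
		run->riscv_sbi.ret[0] = (unsigned long)getchar();
		break;
	default: /* anything else forwarded here is unexpected */
		run->riscv_sbi.ret[0] = (unsigned long)-1;
		break;
	}
	/* The next KVM_RUN resumes the guest with ret[0]/ret[1] in a0/a1 */
}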
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y Signed-off-by: Atish Patra atish.patra@wdc.com Signed-off-by: Anup Patel anup.patel@wdc.com Acked-by: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Mingwang Li limingwang@huawei.com Reviewed-by: Yifei Jiang jiangyifei@huawei.com Signed-off-by: Xie XiuQi xiexiuqi@huawei.com --- arch/riscv/include/asm/kvm_host.h | 10 ++ arch/riscv/kvm/Makefile | 2 +- arch/riscv/kvm/vcpu.c | 9 ++ arch/riscv/kvm/vcpu_exit.c | 4 + arch/riscv/kvm/vcpu_sbi.c | 173 ++++++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 8 ++ 6 files changed, 205 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/kvm/vcpu_sbi.c
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 4571938e937f..ae43bd204284 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -79,6 +79,10 @@ struct kvm_mmio_decode { int return_handled; };
+struct kvm_sbi_context { + int return_handled; +}; + #define KVM_MMU_PAGE_CACHE_NR_OBJS 32
struct kvm_mmu_page_cache { @@ -189,6 +193,9 @@ struct kvm_vcpu_arch { /* MMIO instruction details */ struct kvm_mmio_decode mmio_decode;
+ /* SBI context */ + struct kvm_sbi_context sbi_context; + /* Cache pages needed to program page tables with spinlock held */ struct kvm_mmu_page_cache mmu_page_cache;
@@ -260,4 +267,7 @@ bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask); void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run); +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run); + #endif /* __RISCV_KVM_HOST_H__ */ diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile index 3e0c7558320d..b56dc1650d2c 100644 --- a/arch/riscv/kvm/Makefile +++ b/arch/riscv/kvm/Makefile @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm kvm-objs := $(common-objs-y)
kvm-objs += main.o vm.o vmid.o tlb.o mmu.o -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
obj-$(CONFIG_KVM) += kvm.o diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 96605f773933..adb0815951aa 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -882,6 +882,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) } }
+ /* Process SBI value returned from user-space */ + if (run->exit_reason == KVM_EXIT_RISCV_SBI) { + ret = kvm_riscv_vcpu_sbi_return(vcpu, vcpu->run); + if (ret) { + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); + return ret; + } + } + if (run->immediate_exit) { srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx); return -EINTR; diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c index 6ef833e29e71..defe92594cd2 100644 --- a/arch/riscv/kvm/vcpu_exit.c +++ b/arch/riscv/kvm/vcpu_exit.c @@ -647,6 +647,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) ret = stage2_page_fault(vcpu, run, trap); break; + case EXC_SUPERVISOR_SYSCALL: + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) + ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run); + break; default: break; }; diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c new file mode 100644 index 000000000000..9d1d25cf217f --- /dev/null +++ b/arch/riscv/kvm/vcpu_sbi.c @@ -0,0 +1,173 @@ +// SPDX-License-Identifier: GPL-2.0 +/** + * Copyright (c) 2019 Western Digital Corporation or its affiliates. + * + * Authors: + * Atish Patra atish.patra@wdc.com + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kvm_host.h> +#include <asm/csr.h> +#include <asm/sbi.h> +#include <asm/kvm_vcpu_timer.h> + +#define SBI_VERSION_MAJOR 0 +#define SBI_VERSION_MINOR 1 + +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, + struct kvm_run *run, u32 type) +{ + int i; + struct kvm_vcpu *tmp; + + kvm_for_each_vcpu(i, tmp, vcpu->kvm) + tmp->arch.power_off = true; + kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP); + + memset(&run->system_event, 0, sizeof(run->system_event)); + run->system_event.type = type; + run->exit_reason = KVM_EXIT_SYSTEM_EVENT; +} + +static void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, + struct kvm_run *run) +{ + struct kvm_cpu_context *cp = &vcpu->arch.guest_context; + + vcpu->arch.sbi_context.return_handled = 0; + vcpu->stat.ecall_exit_stat++; + run->exit_reason = KVM_EXIT_RISCV_SBI; + run->riscv_sbi.extension_id = cp->a7; + run->riscv_sbi.function_id = cp->a6; + run->riscv_sbi.args[0] = cp->a0; + run->riscv_sbi.args[1] = cp->a1; + run->riscv_sbi.args[2] = cp->a2; + run->riscv_sbi.args[3] = cp->a3; + run->riscv_sbi.args[4] = cp->a4; + run->riscv_sbi.args[5] = cp->a5; + run->riscv_sbi.ret[0] = cp->a0; + run->riscv_sbi.ret[1] = cp->a1; +} + +int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run) +{ + struct kvm_cpu_context *cp = &vcpu->arch.guest_context; + + /* Handle SBI return only once */ + if (vcpu->arch.sbi_context.return_handled) + return 0; + vcpu->arch.sbi_context.return_handled = 1; + + /* Update return values */ + cp->a0 = run->riscv_sbi.ret[0]; + cp->a1 = run->riscv_sbi.ret[1]; + + /* Move to next instruction */ + vcpu->arch.guest_context.sepc += 4; + + return 0; +} + +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run) +{ + ulong hmask; + int i, ret = 1; + u64 next_cycle; + struct kvm_vcpu *rvcpu; + bool next_sepc = true; + struct cpumask cm, hm; + struct kvm *kvm = vcpu->kvm; + struct kvm_cpu_trap utrap = { 0 }; + struct kvm_cpu_context *cp = &vcpu->arch.guest_context; + + if (!cp) + return -EINVAL; + + switch (cp->a7) { + case SBI_EXT_0_1_CONSOLE_GETCHAR: + case SBI_EXT_0_1_CONSOLE_PUTCHAR: + /* + * The CONSOLE_GETCHAR/CONSOLE_PUTCHAR SBI calls cannot be + * handled in kernel so we forward these to user-space + */ + kvm_riscv_vcpu_sbi_forward(vcpu, 
run); + next_sepc = false; + ret = 0; + break; + case SBI_EXT_0_1_SET_TIMER: +#if __riscv_xlen == 32 + next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0; +#else + next_cycle = (u64)cp->a0; +#endif + kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle); + break; + case SBI_EXT_0_1_CLEAR_IPI: + kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_SOFT); + break; + case SBI_EXT_0_1_SEND_IPI: + if (cp->a0) + hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0, + &utrap); + else + hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1; + if (utrap.scause) { + utrap.sepc = cp->sepc; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + next_sepc = false; + break; + } + for_each_set_bit(i, &hmask, BITS_PER_LONG) { + rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i); + kvm_riscv_vcpu_set_interrupt(rvcpu, IRQ_VS_SOFT); + } + break; + case SBI_EXT_0_1_SHUTDOWN: + kvm_sbi_system_shutdown(vcpu, run, KVM_SYSTEM_EVENT_SHUTDOWN); + next_sepc = false; + ret = 0; + break; + case SBI_EXT_0_1_REMOTE_FENCE_I: + case SBI_EXT_0_1_REMOTE_SFENCE_VMA: + case SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID: + if (cp->a0) + hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0, + &utrap); + else + hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1; + if (utrap.scause) { + utrap.sepc = cp->sepc; + kvm_riscv_vcpu_trap_redirect(vcpu, &utrap); + next_sepc = false; + break; + } + cpumask_clear(&cm); + for_each_set_bit(i, &hmask, BITS_PER_LONG) { + rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i); + if (rvcpu->cpu < 0) + continue; + cpumask_set_cpu(rvcpu->cpu, &cm); + } + riscv_cpuid_to_hartid_mask(&cm, &hm); + if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I) + sbi_remote_fence_i(cpumask_bits(&hm)); + else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA) + sbi_remote_hfence_vvma(cpumask_bits(&hm), + cp->a1, cp->a2); + else + sbi_remote_hfence_vvma_asid(cpumask_bits(&hm), + cp->a1, cp->a2, cp->a3); + break; + default: + /* Return error for unsupported SBI calls */ + cp->a0 = SBI_ERR_NOT_SUPPORTED; + break; + }; + + if (next_sepc) + cp->sepc += 4; + + return ret; +} diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f0a16b4adbbd..b6a90dd8f779 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -236,6 +236,7 @@ struct kvm_hyperv_exit { #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 #define KVM_EXIT_ARM_NISV 28 +#define KVM_EXIT_RISCV_SBI 29
/* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -400,6 +401,13 @@ struct kvm_run { __u64 esr_iss; __u64 fault_ipa; } arm_nisv; + /* KVM_EXIT_RISCV_SBI */ + struct { + unsigned long extension_id; + unsigned long function_id; + unsigned long args[6]; + unsigned long ret[2]; + } riscv_sbi; /* Fix the size of the union. */ char padding[256]; };
From: Anup Patel anup.patel@wdc.com
euleros inclusion category: feature bugzilla: NA CVE: NA
Document RISC-V specific parts of the KVM API, such as: - The interrupt numbers passed to the KVM_INTERRUPT ioctl. - The states supported by the KVM_{GET,SET}_MP_STATE ioctls. - The registers supported by the KVM_{GET,SET}_ONE_REG interface and the encoding of those register ids (illustrated in the sketch below). - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to a userspace tool.
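As a quick illustration of the documented register id layout (a hypothetical helper; the constants mirror the generic KVM_REG_* encoding):

#include <stdint.h>

/* Compose a RISC-V ONE_REG id on a 64-bit host: arch | size | type | index */
static uint64_t sketch_riscv_reg_id(uint64_t type, uint64_t index)
{
	const uint64_t arch = UINT64_C(0x8000000000000000); /* KVM_REG_RISCV */
	const uint64_t size = UINT64_C(0x0030000000000000); /* KVM_REG_SIZE_U64 */

	return arch | size | (type << 24) | index;
}

/* Example: the csr group is type 3 and sepc has index 4, so
 * sketch_riscv_reg_id(3, 4) == 0x8030000003000004, matching the tables. */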
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y CC: Jonathan Corbet corbet@lwn.net CC: linux-doc@vger.kernel.org Signed-off-by: Anup Patel anup.patel@wdc.com Signed-off-by: Mingwang Li limingwang@huawei.com Reviewed-by: Yifei Jiang jiangyifei@huawei.com Signed-off-by: Xie XiuQi xiexiuqi@huawei.com --- Documentation/virt/kvm/api.txt | 169 +++++++++++++++++++++++++++++++-- 1 file changed, 162 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index ebb37b34dcfc..381a2e6df8c6 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -471,7 +471,7 @@ struct kvm_translation { 4.16 KVM_INTERRUPT
Capability: basic -Architectures: x86, ppc, mips +Architectures: x86, ppc, mips, riscv Type: vcpu ioctl Parameters: struct kvm_interrupt (in) Returns: 0 on success, negative on failure. @@ -531,6 +531,22 @@ interrupt number dequeues the interrupt.
This is an asynchronous vcpu ioctl and can be invoked from any thread.
+RISC-V: + +Queues an external interrupt to be injected into the virtual CPU. This ioctl +is overloaded with 2 different irq values: + +a) KVM_INTERRUPT_SET + + This sets an external interrupt for a virtual CPU, which will receive + it once it is ready. + +b) KVM_INTERRUPT_UNSET + + This clears the pending external interrupt for a virtual CPU. + +This is an asynchronous vcpu ioctl and can be invoked from any thread. +
4.17 KVM_DEBUG_GUEST
@@ -1239,7 +1255,7 @@ for vm-wide capabilities. 4.38 KVM_GET_MP_STATE
Capability: KVM_CAP_MP_STATE -Architectures: x86, s390, arm, arm64 +Architectures: x86, s390, arm, arm64, riscv Type: vcpu ioctl Parameters: struct kvm_mp_state (out) Returns: 0 on success; -1 on error @@ -1253,7 +1269,8 @@ uniprocessor guests).
Possible values are:
- - KVM_MP_STATE_RUNNABLE: the vcpu is currently running [x86,arm/arm64] + - KVM_MP_STATE_RUNNABLE: the vcpu is currently running + [x86,arm/arm64,riscv] - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) which has not yet received an INIT signal [x86] - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is @@ -1262,7 +1279,7 @@ Possible values are: is waiting for an interrupt [x86] - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector accessible via KVM_GET_VCPU_EVENTS) [x86] - - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64] + - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64,riscv] - KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390] - KVM_MP_STATE_OPERATING: the vcpu is operating (running or halted) [s390] @@ -1273,7 +1290,7 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures.
-For arm/arm64: +For arm/arm64/riscv:
The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. @@ -1281,7 +1298,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. 4.39 KVM_SET_MP_STATE
Capability: KVM_CAP_MP_STATE -Architectures: x86, s390, arm, arm64 +Architectures: x86, s390, arm, arm64, riscv Type: vcpu ioctl Parameters: struct kvm_mp_state (in) Returns: 0 on success; -1 on error @@ -1293,7 +1310,7 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures.
-For arm/arm64: +For arm/arm64/riscv:
The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. @@ -2302,6 +2319,127 @@ following id bit patterns: 0x7020 0000 0003 02 <0:3> reg:5
+RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of +that are the register group type. + +RISC-V config registers are meant for configuring a Guest VCPU and have +the following id bit patterns: + 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host) + 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host) + +Following are the RISC-V config registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU + +The isa config register can be read anytime but can only be written before +a Guest VCPU runs. By default, it has the ISA feature bits matching the +underlying host. + +RISC-V core registers represent the general execution state of a Guest VCPU +and have the following id bit patterns: + 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host) + 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host) + +Following are the RISC-V core registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x80x0 0000 0200 0000 regs.pc Program counter + 0x80x0 0000 0200 0001 regs.ra Return address + 0x80x0 0000 0200 0002 regs.sp Stack pointer + 0x80x0 0000 0200 0003 regs.gp Global pointer + 0x80x0 0000 0200 0004 regs.tp Task pointer + 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0 + 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1 + 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2 + 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0 + 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1 + 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0 + 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1 + 0x80x0 0000 0200 000c regs.a2 Function argument 2 + 0x80x0 0000 0200 000d regs.a3 Function argument 3 + 0x80x0 0000 0200 000e regs.a4 Function argument 4 + 0x80x0 0000 0200 000f regs.a5 Function argument 5 + 0x80x0 0000 0200 0010 regs.a6 Function argument 6 + 0x80x0 0000 0200 0011 regs.a7 Function argument 7 + 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2 + 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3 + 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4 + 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5 + 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6 + 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7 + 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8 + 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9 + 0x80x0 0000 0200 001a regs.s10 Callee saved register 10 + 0x80x0 0000 0200 001b regs.s11 Callee saved register 11 + 0x80x0 0000 0200 001c regs.t3 Caller saved register 3 + 0x80x0 0000 0200 001d regs.t4 Caller saved register 4 + 0x80x0 0000 0200 001e regs.t5 Caller saved register 5 + 0x80x0 0000 0200 001f regs.t6 Caller saved register 6 + 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode) + +RISC-V csr registers represent the supervisor mode control/status registers +of a Guest VCPU and have the following id bit patterns: + 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host) + 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host) + +Following are the RISC-V csr registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x80x0 0000 0300 0000 sstatus Supervisor status + 0x80x0 0000 0300 0001 sie Supervisor interrupt enable + 0x80x0 0000 0300 0002 stvec Supervisor trap vector base
+ 0x80x0 0000 0300 0003 sscratch Supervisor scratch register + 0x80x0 0000 0300 0004 sepc Supervisor exception program counter + 0x80x0 0000 0300 0005 scause Supervisor trap cause + 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction + 0x80x0 0000 0300 0007 sip Supervisor interrupt pending + 0x80x0 0000 0300 0008 satp Supervisor address translation and protection + +RISC-V timer registers represent the timer state of a Guest VCPU and have +the following id bit patterns: + 0x8030 0000 04 <index into the kvm_riscv_timer struct:24> + +Following are the RISC-V timer registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x8030 0000 0400 0000 frequency Time base frequency (read-only) + 0x8030 0000 0400 0001 time Time value visible to Guest + 0x8030 0000 0400 0002 compare Time compare programmed by Guest + 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF) + +RISC-V F-extension registers represent the single precision floating point +state of a Guest VCPU and have the following id bit patterns: + 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24> + +Following are the RISC-V F-extension registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x8020 0000 0500 0000 f[0] Floating point register 0 + ... + 0x8020 0000 0500 001f f[31] Floating point register 31 + 0x8020 0000 0500 0020 fcsr Floating point control and status register + +RISC-V D-extension registers represent the double precision floating point +state of a Guest VCPU and have the following id bit patterns: + 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr) + 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr) + +Following are the RISC-V D-extension registers: + + Encoding Register Description +------------------------------------------------------------------ + 0x8030 0000 0600 0000 f[0] Floating point register 0 + ... + 0x8030 0000 0600 001f f[31] Floating point register 31 + 0x8020 0000 0600 0020 fcsr Floating point control and status register + + 4.69 KVM_GET_ONE_REG
 Capability: KVM_CAP_ONE_REG

@@ -4542,6 +4680,23 @@ Note that KVM does not skip the faulting instruction as it does for
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
+		/* KVM_EXIT_RISCV_SBI */
+		struct {
+			unsigned long extension_id;
+			unsigned long function_id;
+			unsigned long args[6];
+			unsigned long ret[2];
+		} riscv_sbi;
+
+If exit reason is KVM_EXIT_RISCV_SBI then it indicates that the VCPU has
+made an SBI call which is not handled by the KVM RISC-V kernel module. The
+details of the SBI call are available in the 'riscv_sbi' member of the
+kvm_run structure. The 'extension_id' field of 'riscv_sbi' represents the
+SBI extension ID whereas the 'function_id' field represents the function ID
+of the given SBI extension. The 'args' array field of 'riscv_sbi' holds the
+parameters for the SBI call and the 'ret' array field holds the return
+values. Userspace should update the return values of the SBI call before
+resuming the VCPU. For more details on the RISC-V SBI spec, refer to
+https://github.com/riscv/riscv-sbi-doc.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
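For illustration, here is a minimal userspace sketch (not part of the patch)
of how a VMM might service this exit and use the ONE_REG id patterns
documented above. The SBI_EXT_0_1_CONSOLE_PUTCHAR value, the riscv_core_reg()
helper, and the a0 index are assumptions for the example; KVM_EXIT_RISCV_SBI
and the RISC-V register ids require a kernel with this series applied.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Legacy SBI v0.1 console putchar extension id (assumed here). */
    #define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1

    /* Build a 64bit-host core-group (0x02) register id from the table
     * above: 0x8030 0000 02 <index:24>. regs.a0 is index 0x0a. */
    static uint64_t riscv_core_reg(uint64_t index)
    {
        return 0x8030000002000000ULL | index;
    }

    static int handle_exit(int vcpu_fd, struct kvm_run *run)
    {
        if (run->exit_reason == KVM_EXIT_RISCV_SBI &&
            run->riscv_sbi.extension_id == SBI_EXT_0_1_CONSOLE_PUTCHAR) {
            putchar((int)run->riscv_sbi.args[0]);
            run->riscv_sbi.ret[0] = 0;  /* report success */
            return 0;                   /* resume the VCPU */
        }

        /* Unrelated example: read the guest's a0 via ONE_REG. */
        uint64_t a0;
        struct kvm_one_reg reg = {
            .id   = riscv_core_reg(0x0a),
            .addr = (uintptr_t)&a0,
        };
        return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
    }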
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Atish Patra atish.patra@wdc.com
Signed-off-by: Anup Patel anup.patel@wdc.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Paolo Bonzini pbonzini@redhat.com
Reviewed-by: Alexander Graf graf@amazon.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 MAINTAINERS | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index e73a47a881b0..9c4a76b989f0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9099,6 +9099,17 @@ F: arch/powerpc/include/asm/kvm*
 F: arch/powerpc/kvm/
 F: arch/powerpc/kernel/kvm*
+KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
+M: Anup Patel anup.patel@wdc.com
+R: Atish Patra atish.patra@wdc.com
+L: kvm@vger.kernel.org
+L: kvm-riscv@lists.infradead.org
+S: Maintained
+T: git git://github.com/kvm-riscv/linux.git
+F: arch/riscv/include/asm/kvm*
+F: arch/riscv/include/uapi/asm/kvm*
+F: arch/riscv/kvm/
+
 KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
 M: Christian Borntraeger borntraeger@de.ibm.com
 M: Janosch Frank frankja@linux.ibm.com
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
Enable KVM in the RISC-V defconfigs.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/configs/defconfig      | 3 +++
 arch/riscv/configs/rv32_defconfig | 3 +++
 2 files changed, 6 insertions(+)
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 1e48b9c81e77..1b8375b63c38 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -16,6 +16,8 @@ CONFIG_EXPERT=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SMP=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_KVM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -92,6 +94,7 @@ CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_HUGETLBFS=y
 CONFIG_NFS_FS=y
 CONFIG_NFS_V4=y
 CONFIG_NFS_V4_1=y
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
index de073a3b9150..dfc35c60229a 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -16,6 +16,8 @@ CONFIG_EXPERT=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_KVM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -89,6 +91,7 @@ CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_HUGETLBFS=y
 CONFIG_NFS_FS=y
 CONFIG_NFS_V4=y
 CONFIG_NFS_V4_1=y
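As a quick sanity check of a kernel built with these options (an illustrative
sketch, not part of the patch), userspace should be able to open /dev/kvm and
query the API version:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);

        if (kvm < 0) {
            perror("open /dev/kvm");
            return 1;
        }
        /* KVM_GET_API_VERSION has returned 12 for a long time. */
        printf("KVM API version: %d\n",
               ioctl(kvm, KVM_GET_API_VERSION, 0));
        return 0;
    }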
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
These changes will be included in the next revision of the KVM RISC-V series.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/kvm/mmu.c       | 34 +++++++++++++++++++++++++++++++---
 arch/riscv/kvm/vcpu_exit.c |  4 +++-
 2 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 705c341e68cb..88bce80ee983 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -200,17 +200,45 @@ static int stage2_set_pte(struct kvm *kvm, u32 level,
 static int stage2_map_page(struct kvm *kvm,
                            struct kvm_mmu_page_cache *pcache,
                            gpa_t gpa, phys_addr_t hpa,
-                           unsigned long page_size, pgprot_t prot)
+                           unsigned long page_size,
+                           bool page_rdonly, bool page_exec)
 {
     int ret;
     u32 level = 0;
     pte_t new_pte;
+    pgprot_t prot;
     ret = stage2_page_size_to_level(page_size, &level);
     if (ret)
         return ret;
+    /*
+     * A RISC-V implementation can choose to either:
+     * 1) Update 'A' and 'D' PTE bits in hardware
+     * 2) Generate a page fault when the 'A' and/or 'D' bits are not
+     *    set in the PTE so that software can update these bits
+     *
+     * We support both options mentioned above. To achieve this, we
+     * always set 'A' and 'D' PTE bits at time of creating stage2
+     * mapping. To support KVM dirty page logging with both options
+     * mentioned above, we will write-protect stage2 PTEs to track
+     * dirty pages.
+     */
+
+    if (page_exec) {
+        if (page_rdonly)
+            prot = PAGE_READ_EXEC;
+        else
+            prot = PAGE_WRITE_EXEC;
+    } else {
+        if (page_rdonly)
+            prot = PAGE_READ;
+        else
+            prot = PAGE_WRITE;
+    }
     new_pte = pfn_pte(PFN_DOWN(hpa), prot);
+    new_pte = pte_mkdirty(new_pte);
+
     return stage2_set_pte(kvm, level, pcache, gpa, &new_pte);
 }
@@ -705,10 +733,10 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
         kvm_set_pfn_dirty(hfn);
         mark_page_dirty(kvm, gfn);
         ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
-                              vma_pagesize, PAGE_WRITE_EXEC);
+                              vma_pagesize, false, true);
     } else {
         ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
-                              vma_pagesize, PAGE_READ_EXEC);
+                              vma_pagesize, true, true);
     }
     if (ret)

diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index defe92594cd2..3908f7bacdf3 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -443,10 +443,11 @@ unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
     register unsigned long tmp asm("t1");
     register unsigned long addr asm("t2") = guest_addr;
     unsigned long flags;
-    unsigned long old_stvec;
+    unsigned long old_stvec, old_hstatus;
local_irq_save(flags);
+    old_hstatus = csr_swap(CSR_HSTATUS, vcpu->arch.guest_context.hstatus);
     old_stvec = csr_swap(CSR_STVEC, (ulong)&__kvm_riscv_unpriv_trap);
     if (read_insn) {
@@ -518,6 +519,7 @@ unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
     }
     csr_write(CSR_STVEC, old_stvec);
+    csr_write(CSR_HSTATUS, old_hstatus);
local_irq_restore(flags);
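For context, csr_swap() atomically writes a CSR and returns its previous
value, which is what makes this save/override/restore of HSTATUS and STVEC
safe while interrupts are disabled. A sketch of the macro, close to (though
not necessarily identical to) the one in arch/riscv/include/asm/csr.h:

    #define csr_swap(csr, val)                                      \
    ({                                                              \
        unsigned long __v = (unsigned long)(val);                   \
        /* csrrw writes the new value and reads back the old one */ \
        __asm__ __volatile__ ("csrrw %0, " __ASM_STR(csr) ", %1"    \
                              : "=r" (__v) : "rK" (__v)             \
                              : "memory");                          \
        __v;                                                        \
    })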
From: Anup Patel anup.patel@wdc.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
These changes will be included in the next revision of the KVM RISC-V series.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Anup Patel anup.patel@wdc.com
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/include/asm/kvm_host.h |  1 +
 arch/riscv/kvm/vcpu_exit.c        | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index ae43bd204284..e330512985a7 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -74,6 +74,7 @@ struct kvm_arch {
 struct kvm_mmio_decode {
     unsigned long insn;
+    int insn_len;
     int len;
     int shift;
     int return_handled;
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 3908f7bacdf3..f3a3acdc8db0 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -192,7 +192,7 @@ static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
                         unsigned long fault_addr, unsigned long htinst)
 {
     unsigned long insn;
-    int shift = 0, len = 0;
+    int shift = 0, len = 0, insn_len = 0;
     struct kvm_cpu_trap utrap = { 0 };
     struct kvm_cpu_context *ct = &vcpu->arch.guest_context;
@@ -203,6 +203,7 @@ static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
          * transformed instruction or custom instruction.
          */
         insn = htinst | INSN_16BIT_MASK;
+        insn_len = (htinst & BIT(1)) ? INSN_LEN(insn) : 2;
     } else {
         /*
          * Bit[0] == 0 implies trapped instruction value is
@@ -216,6 +217,7 @@ static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
             kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
             return 1;
         }
+        insn_len = INSN_LEN(insn);
     }
     /* Decode length of MMIO and shift */
@@ -268,6 +270,7 @@ static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
     /* Save instruction decode info */
     vcpu->arch.mmio_decode.insn = insn;
+    vcpu->arch.mmio_decode.insn_len = insn_len;
     vcpu->arch.mmio_decode.shift = shift;
     vcpu->arch.mmio_decode.len = len;
     vcpu->arch.mmio_decode.return_handled = 0;
@@ -290,7 +293,7 @@ static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
     u32 data32;
     u64 data64;
     ulong data;
-    int len = 0;
+    int len = 0, insn_len = 0;
     unsigned long insn;
     struct kvm_cpu_trap utrap = { 0 };
     struct kvm_cpu_context *ct = &vcpu->arch.guest_context;
@@ -302,6 +305,7 @@ static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
          * transformed instruction or custom instruction.
          */
         insn = htinst | INSN_16BIT_MASK;
+        insn_len = (htinst & BIT(1)) ? INSN_LEN(insn) : 2;
     } else {
         /*
          * Bit[0] == 0 implies trapped instruction value is
@@ -315,6 +319,7 @@ static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
             kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
             return 1;
         }
+        insn_len = INSN_LEN(insn);
     }
     data = GET_RS2(insn, &vcpu->arch.guest_context);
@@ -356,6 +361,7 @@ static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
     /* Save instruction decode info */
     vcpu->arch.mmio_decode.insn = insn;
+    vcpu->arch.mmio_decode.insn_len = insn_len;
     vcpu->arch.mmio_decode.shift = 0;
     vcpu->arch.mmio_decode.len = len;
     vcpu->arch.mmio_decode.return_handled = 0;
@@ -617,7 +623,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 done:
     /* Move to next instruction */
-    vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+    vcpu->arch.guest_context.sepc += vcpu->arch.mmio_decode.insn_len;
     return 0;
 }
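The insn_len logic above leans on two encoding facts: in the RISC-V ISA an
instruction whose lowest two bits are not 0b11 is a 16-bit compressed
instruction, and bit 1 of a transformed instruction in HTINST is 0 when the
trapped instruction was compressed. A sketch of what INSN_LEN() plausibly
expands to (the real macros live in the KVM RISC-V sources and may differ
in detail):

    #define INSN_16BIT_MASK     0x3
    /* Low two bits != 0b11 means a 16-bit compressed instruction. */
    #define INSN_IS_16BIT(insn) (((insn) & INSN_16BIT_MASK) != INSN_16BIT_MASK)
    #define INSN_LEN(insn)      (INSN_IS_16BIT(insn) ? 2 : 4)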
From: limingwang limingwang@huawei.com
euleros inclusion
category: feature
bugzilla: NA
CVE: NA
riscv_kvm_v13 was developed on Linux 5.8-rc4, while this tag is based on
v5.5.19, so some interfaces need to be adapted.
Link: https://gitee.com/openeuler/kernel/issues/I1RR1Y
Signed-off-by: Mingwang Li limingwang@huawei.com
Reviewed-by: Yifei Jiang jiangyifei@huawei.com
Signed-off-by: Xie XiuQi xiexiuqi@huawei.com
---
 arch/riscv/kvm/main.c |  4 ++--
 arch/riscv/kvm/mmu.c  | 15 ++++++-------
 arch/riscv/kvm/vcpu.c | 49 ++++++++++++++++++++++++++++++++-----------
 3 files changed, 47 insertions(+), 21 deletions(-)
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 6f213bcec0e8..9460e291e906 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,12 +20,12 @@ long kvm_arch_dev_ioctl(struct file *filp,
     return -EINVAL;
 }
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
     return 0;
 }
-int kvm_arch_hardware_setup(void *opaque)
+int kvm_arch_hardware_setup(void)
 {
     return 0;
 }
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 88bce80ee983..81d15ca5264a 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -433,7 +433,8 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 }
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
+                           struct kvm_memory_slot *dont)
 {
 }
@@ -459,7 +460,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 void kvm_arch_commit_memory_region(struct kvm *kvm,
                            const struct kvm_userspace_memory_region *mem,
-                           struct kvm_memory_slot *old,
+                           const struct kvm_memory_slot *old,
                            const struct kvm_memory_slot *new,
                            enum kvm_mr_change change)
 {
@@ -494,7 +495,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
         (stage2_gpa_size >> PAGE_SHIFT))
         return -EFAULT;
-    mmap_read_lock(current->mm);
+    down_read(&current->mm->mmap_sem);
     /*
      * A memory region could potentially cover multiple VMAs, and
@@ -560,7 +561,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
     spin_unlock(&kvm->mmu_lock);
 out:
-    mmap_read_unlock(current->mm);
+    up_read(&current->mm->mmap_sem);
     return ret;
 }
@@ -669,12 +670,12 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
              !(memslot->flags & KVM_MEM_READONLY)) ? true : false;
     unsigned long vma_pagesize, mmu_seq;
-    mmap_read_lock(current->mm);
+    down_read(&current->mm->mmap_sem);
     vma = find_vma_intersection(current->mm, hva, hva + 1);
     if (unlikely(!vma)) {
         kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-        mmap_read_unlock(current->mm);
+        up_read(&current->mm->mmap_sem);
         return -EFAULT;
     }
@@ -689,7 +690,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
     if (vma_pagesize == PMD_SIZE || vma_pagesize == PGDIR_SIZE)
         gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-    mmap_read_unlock(current->mm);
+    up_read(&current->mm->mmap_sem);
     if (vma_pagesize != PGDIR_SIZE &&
         vma_pagesize != PMD_SIZE &&

diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index adb0815951aa..d10905c18cb3 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -20,6 +20,9 @@
 #include <asm/csr.h>
 #include <asm/hwcap.h>
+#define VCPU_STAT(n, x, ...) \
+    { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
     VCPU_STAT("halt_successful_poll", halt_successful_poll),
     VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
@@ -148,7 +151,35 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
     return 0;
 }
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+    int err;
+    struct kvm_vcpu *vcpu;
+
+    vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+    if (!vcpu) {
+        err = -ENOMEM;
+        goto out;
+    }
+
+    err = kvm_vcpu_init(vcpu, kvm, id);
+    if (err)
+        goto free_vcpu;
+
+    return vcpu;
+
+free_vcpu:
+    kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+    return ERR_PTR(err);
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+    return 0;
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
     struct kvm_cpu_context *cntx;
@@ -175,11 +206,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
     return 0;
 }
-int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
-{
-    return 0;
-}
-
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 {
 }
@@ -827,13 +853,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
-    struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
+    struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
     if (kvm_request_pending(vcpu)) {
         if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
-            rcuwait_wait_event(wait,
-                (!vcpu->arch.power_off) && (!vcpu->arch.pause),
-                TASK_INTERRUPTIBLE);
+            swait_event_interruptible_exclusive(*wq,
+                ((!vcpu->arch.power_off) &&
+                 (!vcpu->arch.pause)));
             if (vcpu->arch.power_off || vcpu->arch.pause) {
                 /*
@@ -862,11 +888,10 @@ static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu)
     csr_write(CSR_HVIP, csr->hvip);
 }
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
     int ret;
     struct kvm_cpu_trap trap;
-    struct kvm_run *run = vcpu->run;
     /* Mark this VCPU ran at least once */
     vcpu->arch.ran_atleast_once = true;
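Most of these adaptations exist because the v13 series targets v5.8-era
internals: the mmap_read_lock()/mmap_read_unlock() wrappers, rcuwait-based
VCPU blocking, and the newer vcpu create/run prototypes are simply absent on
v5.5.19. As a rough sketch of the first equivalence (hypothetical back-compat
shims for illustration, not code from this patch):

    /* v5.8-style helpers expressed with the v5.5 mmap_sem rwsem. */
    static inline void mmap_read_lock(struct mm_struct *mm)
    {
        down_read(&mm->mmap_sem);
    }

    static inline void mmap_read_unlock(struct mm_struct *mm)
    {
        up_read(&mm->mmap_sem);
    }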
For kernel-5.5 branch.
On 2020/8/18 20:13, Xie XiuQi wrote:
> [...]
Applied to kernel-5.5
On 2020/8/18 20:14, Xie XiuQi wrote:
> For kernel-5.5 branch.
> [...]