v2: 1. full test in this branch. 2. correct implict declare function in arch/arm64/kvm/hyp/include/hyp/switch.h.
v1: backport from mainline
Alexandru Elisei (1): arm64: Do not trap PMSNEVFR_EL1
Marc Zyngier (10): KVM: arm64: Let vcpu_sve_pffr() handle HYP VAs KVM: arm64: Provide KVM's own save/restore SVE primitives KVM: arm64: Use {read,write}_sysreg_el1 to access ZCR_EL1 KVM: arm64: Introduce vcpu_sve_vq() helper arm64: sve: Provide a conditional update accessor for ZCR_ELx KVM: arm64: Map SVE context at EL2 when available KVM: arm64: Rework SVE host-save/guest-restore KVM: arm64: Save guest's ZCR_EL1 before saving the FPSIMD state KVM: arm64: Save/restore SVE state for nVHE KVM: arm64: Always start with clearing SME flag on load
Mark Brown (54): arm64/sve: Remove redundant system_supports_sve() tests arm64/sve: Add compile time checks for SVE hooks in generic functions arm64/sve: Rework SVE access trap to convert state in registers arm64/sve: Split _sve_flush macro into separate Z and predicate flushes arm64/sve: Skip flushing Z registers with 128 bit vectors arm64/sve: Use the sve_flush macros in sve_load_from_fpsimd_state() arm64/sve: Remove sve_load_from_fpsimd_state() arm64/sve: Better handle failure to allocate SVE register storage arm64/fp: Reindent fpsimd_save() arm64/sve: Make access to FFR optional arm64/sve: Rename find_supported_vector_length() arm64/sve: Use accessor functions for vector lengths in thread_struct arm64/sve: Put system wide vector length information into structs arm64/sve: Explicitly load vector length when restoring SVE state arm64/sve: Track vector lengths for tasks in an array arm64/sve: Make sysctl interface for SVE reusable by SME arm64/sve: Generalise vector length configuration prctl() for SME arm64/sve: Minor clarification of ABI documentation arm64: cpufeature: Always specify and use a field width for capabilities arm64/sme: Provide ABI documentation for SME arm64/sme: System register and exception syndrome definitions arm64/sme: Manually encode SME instructions arm64: Disable fine grained traps on boot arm64/sme: Early CPU setup for SME arm64/sme: Basic enumeration support arm64/sme: Identify supported SME vector lengths at boot arm64/sme: Implement sysctl to set the default vector length arm64/sme: Implement vector length configuration prctl()s arm64/sme: Implement support for TPIDR2 arm64/sme: Implement SVCR context switching arm64/sme: Implement streaming SVE context switching arm64/sme: Implement ZA context switching arm64/sme: Implement traps and syscall handling for SME arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement streaming SVE signal handling arm64/sme: Implement ZA signal handling arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Add ptrace support for ZA arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Provide Kconfig for SME arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: More sensibly define the size for the ZA register set KVM: arm64: Hide SME system registers from guests KVM: arm64: Trap SME usage in guest KVM: arm64: Handle SME host state when running guests arm64/fp: Make SVE and SME length register definition match architecture arm64/fp: Rename SVE and SME LEN field name to _WIDTH arm64/sme: Drop SYS_ from SMIDR_EL1 defines arm64/sme: Standardise bitfield names for SVCR arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.h arm64/sme: Fix tests for 0b1111 value ID registers arm64/sme: Fix SVE/SME typo in ABI documentation arm64/sme: Fix EFI save/restore
Wan Jiabing (1): arm64/sme: Fix NULL check after kzalloc
Xiaofei Tan (1): arm64: sve: Provide sve_cond_update_zcr_vq fallback when !ARM64_SVE
Documentation/arm64/elf_hwcaps.rst | 34 + Documentation/arm64/index.rst | 1 + Documentation/arm64/sme.rst | 428 ++++++++++ Documentation/arm64/sve.rst | 72 +- arch/arm64/Kconfig | 11 + arch/arm64/include/asm/cpu.h | 4 + arch/arm64/include/asm/cpucaps.h | 2 + arch/arm64/include/asm/cpufeature.h | 25 + arch/arm64/include/asm/esr.h | 13 +- arch/arm64/include/asm/exception.h | 1 + arch/arm64/include/asm/fpsimd.h | 262 ++++++- arch/arm64/include/asm/fpsimdmacros.h | 108 ++- arch/arm64/include/asm/hwcap.h | 8 + arch/arm64/include/asm/kvm_arm.h | 1 + arch/arm64/include/asm/kvm_host.h | 12 +- arch/arm64/include/asm/kvm_hyp.h | 2 + arch/arm64/include/asm/processor.h | 78 +- arch/arm64/include/asm/signal_common.h | 19 +- arch/arm64/include/asm/sysreg.h | 78 +- arch/arm64/include/asm/thread_info.h | 4 +- arch/arm64/include/uapi/asm/hwcap.h | 8 + arch/arm64/include/uapi/asm/ptrace.h | 69 +- arch/arm64/include/uapi/asm/sigcontext.h | 55 +- arch/arm64/kernel/cpufeature.c | 280 +++++-- arch/arm64/kernel/cpuinfo.c | 13 + arch/arm64/kernel/entry-common.c | 11 + arch/arm64/kernel/entry-fpsimd.S | 72 +- arch/arm64/kernel/fpsimd.c | 944 ++++++++++++++++++----- arch/arm64/kernel/head.S | 87 +++ arch/arm64/kernel/process.c | 45 +- arch/arm64/kernel/ptrace.c | 373 +++++++-- arch/arm64/kernel/signal.c | 175 ++++- arch/arm64/kernel/syscall.c | 29 +- arch/arm64/kernel/traps.c | 1 + arch/arm64/kvm/fpsimd.c | 70 +- arch/arm64/kvm/guest.c | 6 +- arch/arm64/kvm/hyp/fpsimd.S | 12 + arch/arm64/kvm/hyp/include/hyp/switch.h | 75 +- arch/arm64/kvm/hyp/nvhe/switch.c | 38 +- arch/arm64/kvm/hyp/vhe/switch.c | 10 +- arch/arm64/kvm/reset.c | 14 +- arch/arm64/kvm/sys_regs.c | 8 +- include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 9 + kernel/sys.c | 12 + 45 files changed, 3123 insertions(+), 458 deletions(-) create mode 100644 Documentation/arm64/sme.rst
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit ef9c5d09797db874a29a97407c3ea3990210432b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Currently there are a number of places in the SVE code where we check both system_supports_sve() and TIF_SVE. This is a bit redundant given that we should never get into a situation where we have set TIF_SVE without having SVE support and it is not clear that silently ignoring a mistakenly set TIF_SVE flag is the most sensible error handling approach. For now let's just drop the system_supports_sve() checks since this will at least reduce overhead a little.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210412172320.3315-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 5335a6bd1a0d..43c1bd22a2bb 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -285,7 +285,7 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
- if (system_supports_sve() && test_thread_flag(TIF_SVE)) + if (test_thread_flag(TIF_SVE)) sve_load_state(sve_pffr(¤t->thread), ¤t->thread.uw.fpsimd_state.fpsr, sve_vq_from_vl(current->thread.sve_vl) - 1); @@ -307,7 +307,7 @@ static void fpsimd_save(void) WARN_ON(!have_cpu_fpsimd_context());
if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { - if (system_supports_sve() && test_thread_flag(TIF_SVE)) { + if (test_thread_flag(TIF_SVE)) { if (WARN_ON(sve_get_vl() != last->sve_vl)) { /* * Can't save the user regs, so current would @@ -1092,7 +1092,7 @@ void fpsimd_preserve_current_state(void) void fpsimd_signal_preserve_current_state(void) { fpsimd_preserve_current_state(); - if (system_supports_sve() && test_thread_flag(TIF_SVE)) + if (test_thread_flag(TIF_SVE)) sve_to_fpsimd(current); }
@@ -1181,7 +1181,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state) get_cpu_fpsimd_context();
current->thread.uw.fpsimd_state = *state; - if (system_supports_sve() && test_thread_flag(TIF_SVE)) + if (test_thread_flag(TIF_SVE)) fpsimd_to_sve(current);
task_fpsimd_load();
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit 087dfa5ca7d89c3cf6f4e972e279406a5dee5f67 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The FPSIMD code was relying on IS_ENABLED() checks in system_suppors_sve() to cause the compiler to delete references to SVE functions in some places, add explicit IS_ENABLED() checks back.
Fixes: ef9c5d09797d ("arm64/sve: Remove redundant system_supports_sve() tests") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210415121742.36628-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 43c1bd22a2bb..901096d74b7e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -285,7 +285,7 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
- if (test_thread_flag(TIF_SVE)) + if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) sve_load_state(sve_pffr(¤t->thread), ¤t->thread.uw.fpsimd_state.fpsr, sve_vq_from_vl(current->thread.sve_vl) - 1); @@ -307,7 +307,8 @@ static void fpsimd_save(void) WARN_ON(!have_cpu_fpsimd_context());
if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { - if (test_thread_flag(TIF_SVE)) { + if (IS_ENABLED(CONFIG_ARM64_SVE) && + test_thread_flag(TIF_SVE)) { if (WARN_ON(sve_get_vl() != last->sve_vl)) { /* * Can't save the user regs, so current would
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit cccb78ce89c45a4414db712be4986edfb92434bd category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When we enable SVE usage in userspace after taking a SVE access trap we need to ensure that the portions of the register state that are not shared with the FPSIMD registers are zeroed. Currently we do this by forcing the FPSIMD registers to be saved to the task struct and converting them there. This is wasteful in the common case where the task state is loaded into the registers and we will immediately return to userspace since we can initialise the SVE state directly in registers instead of accessing multiple copies of the register state in memory.
Instead in that common case do the conversion in the registers and update the task metadata so that we can return to userspace without spilling the register state to memory unless there is some other reason to do so.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210312190313.24598-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 1 + arch/arm64/kernel/entry-fpsimd.S | 5 +++++ arch/arm64/kernel/fpsimd.c | 26 +++++++++++++++++--------- 3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 34fb33a2a110..7de0b34476f3 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -73,6 +73,7 @@ extern void sve_flush_live(void); extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); +extern void sve_set_vq(unsigned long vq_minus_1);
struct arm64_cpu_capabilities; extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 2ca395c25448..3ecec60d3295 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl) ret SYM_FUNC_END(sve_get_vl)
+SYM_FUNC_START(sve_set_vq) + sve_load_vq x0, x1, x2 + ret +SYM_FUNC_END(sve_set_vq) + /* * Load SVE state from FPSIMD state. * diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 901096d74b7e..9c3edb0899aa 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -927,9 +927,8 @@ void fpsimd_release_task(struct task_struct *dead_task) * Trapped SVE access * * Storage is allocated for the full SVE state, the current FPSIMD - * register contents are migrated across, and TIF_SVE is set so that - * the SVE access trap will be disabled the next time this task - * reaches ret_to_user. + * register contents are migrated across, and the access trap is + * disabled. * * TIF_SVE should be clear on entry: otherwise, fpsimd_restore_current_state() * would have disabled the SVE access trap for userspace during @@ -947,15 +946,24 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
get_cpu_fpsimd_context();
- fpsimd_save(); - - /* Force ret_to_user to reload the registers: */ - fpsimd_flush_task_state(current); - - fpsimd_to_sve(current); if (test_and_set_thread_flag(TIF_SVE)) WARN_ON(1); /* SVE access shouldn't have trapped */
+ /* + * Convert the FPSIMD state to SVE, zeroing all the state that + * is not shared with FPSIMD. If (as is likely) the current + * state is live in the registers then do this there and + * update our metadata for the current task including + * disabling the trap, otherwise update our in-memory copy. + */ + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { + sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1); + sve_flush_live(); + fpsimd_bind_task_to_cpu(); + } else { + fpsimd_to_sve(current); + } + put_cpu_fpsimd_context(); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.14-rc1 commit 483dbf6a35907610597fdc304bd32ecba40cdff0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Trivial refactoring to support further work, no change to generated code.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Dave Martin Dave.Martin@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20210512151131.27877-2-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimdmacros.h | 4 +++- arch/arm64/kernel/entry-fpsimd.S | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index af43367534c7..96a983230bb2 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -211,8 +211,10 @@ mov v\nz().16b, v\nz().16b .endm
-.macro sve_flush +.macro sve_flush_z _for n, 0, 31, _sve_flush_z \n +.endm +.macro sve_flush_p_ffr _for n, 0, 15, _sve_pfalse \n _sve_wrffr 0 .endm diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 3ecec60d3295..7921d58427c2 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -72,7 +72,8 @@ SYM_FUNC_END(sve_load_from_fpsimd_state)
/* Zero all SVE registers but the first 128-bits of each vector */ SYM_FUNC_START(sve_flush_live) - sve_flush + sve_flush_z + sve_flush_p_ffr ret SYM_FUNC_END(sve_flush_live)
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.14-rc1 commit ad4711f962e08eff8d6e9b03f9670b1af6ea9395 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When the SVE vector length is 128 bits then there are no bits in the Z registers which are not shared with the V registers so we can skip them when zeroing state not shared with FPSIMD, this results in a minor performance improvement.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Dave Martin Dave.Martin@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20210512151131.27877-4-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 2 +- arch/arm64/kernel/entry-fpsimd.S | 12 ++++++++++-- arch/arm64/kernel/fpsimd.c | 6 ++++-- 3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 7de0b34476f3..8dcc31d0db0d 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread) extern void sve_save_state(void *state, u32 *pfpsr); extern void sve_load_state(void const *state, u32 const *pfpsr, unsigned long vq_minus_1); -extern void sve_flush_live(void); +extern void sve_flush_live(unsigned long vq_minus_1); extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 7921d58427c2..9bc201ef5559 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -70,10 +70,18 @@ SYM_FUNC_START(sve_load_from_fpsimd_state) ret SYM_FUNC_END(sve_load_from_fpsimd_state)
-/* Zero all SVE registers but the first 128-bits of each vector */ +/* + * Zero all SVE registers but the first 128-bits of each vector + * + * VQ must already be configured by caller, any further updates of VQ + * will need to ensure that the register state remains valid. + * + * x0 = VQ - 1 + */ SYM_FUNC_START(sve_flush_live) + cbz x0, 1f // A VQ-1 of 0 is 128 bits so no extra Z state sve_flush_z - sve_flush_p_ffr +1: sve_flush_p_ffr ret SYM_FUNC_END(sve_flush_live)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 9c3edb0899aa..3599b9a2f1df 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs) * disabling the trap, otherwise update our in-memory copy. */ if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { - sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1); - sve_flush_live(); + unsigned long vq_minus_one = + sve_vq_from_vl(current->thread.sve_vl) - 1; + sve_set_vq(vq_minus_one); + sve_flush_live(vq_minus_one); fpsimd_bind_task_to_cpu(); } else { fpsimd_to_sve(current);
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.14-rc1 commit c9f6890bca111a879a8af1f2390ac49cf05b11df category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
This makes the code a bit clearer and as a result we can also make the indentation more normal, there is no change to the generated code.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Dave Martin Dave.Martin@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20210512151131.27877-3-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/entry-fpsimd.S | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 9bc201ef5559..0a7a64753878 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -63,11 +63,10 @@ SYM_FUNC_END(sve_set_vq) * and the rest zeroed. All the other SVE registers will be zeroed. */ SYM_FUNC_START(sve_load_from_fpsimd_state) - sve_load_vq x1, x2, x3 - fpsimd_restore x0, 8 - _for n, 0, 15, _sve_pfalse \n - _sve_wrffr 0 - ret + sve_load_vq x1, x2, x3 + fpsimd_restore x0, 8 + sve_flush_p_ffr + ret SYM_FUNC_END(sve_load_from_fpsimd_state)
/*
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit b53223e0a4d9fbdba1a1dd1161f7240506666946 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Following optimisations of the SVE register handling we no longer load the SVE state from a saved copy of the FPSIMD registers, we convert directly in registers or from one saved state to another. Remove the function so we don't need to update it during further refactoring.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-3-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 2 -- arch/arm64/kernel/entry-fpsimd.S | 16 ---------------- 2 files changed, 18 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 8dcc31d0db0d..3ac646014623 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -70,8 +70,6 @@ extern void sve_save_state(void *state, u32 *pfpsr); extern void sve_load_state(void const *state, u32 const *pfpsr, unsigned long vq_minus_1); extern void sve_flush_live(unsigned long vq_minus_1); -extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state, - unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); extern void sve_set_vq(unsigned long vq_minus_1);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 0a7a64753878..e40cb29fdfd8 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -53,22 +53,6 @@ SYM_FUNC_START(sve_set_vq) ret SYM_FUNC_END(sve_set_vq)
-/* - * Load SVE state from FPSIMD state. - * - * x0 = pointer to struct fpsimd_state - * x1 = VQ - 1 - * - * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD - * and the rest zeroed. All the other SVE registers will be zeroed. - */ -SYM_FUNC_START(sve_load_from_fpsimd_state) - sve_load_vq x1, x2, x3 - fpsimd_restore x0, 8 - sve_flush_p_ffr - ret -SYM_FUNC_END(sve_load_from_fpsimd_state) - /* * Zero all SVE registers but the first 128-bits of each vector *
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.15-rc1 commit 7559b7d7d651d397debbcd838bd49ec4b9e0a4a4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Currently we "handle" failure to allocate the SVE register storage by doing a BUG_ON() and hoping for the best. This is obviously not great and the memory allocation failure will already be loud enough without the BUG_ON(). As the comment says it is a corner case but let's try to do a bit better, remove the BUG_ON() and add code to handle the failure in the callers.
For the ptrace and signal code we can return -ENOMEM gracefully however we have no real error reporting path available to us for the SVE access trap so instead generate a SIGKILL if the allocation fails there. This at least means that we won't try to soldier on and end up trying to access the nonexistant state and while it's obviously not ideal for userspace SIGKILL doesn't allow any handling so minimises the ABI impact, making it easier to improve the interface later if we come up with a better idea.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210824153417.18371-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 10 ++++------ arch/arm64/kernel/ptrace.c | 5 +++++ arch/arm64/kernel/signal.c | 5 +++++ 3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 3599b9a2f1df..de82f71f63f9 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -518,12 +518,6 @@ void sve_alloc(struct task_struct *task) /* This is a small allocation (maximum ~8KB) and Should Not Fail. */ task->thread.sve_state = kzalloc(sve_state_size(task), GFP_KERNEL); - - /* - * If future SVE revisions can have larger vectors though, - * this may cease to be true: - */ - BUG_ON(!task->thread.sve_state); }
@@ -943,6 +937,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs) }
sve_alloc(current); + if (!current->thread.sve_state) { + force_sig(SIGKILL); + return; + }
get_cpu_fpsimd_context();
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 74b925c8d534..03b84ab27519 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -846,6 +846,11 @@ static int sve_set(struct task_struct *target, }
sve_alloc(target); + if (!target->thread.sve_state) { + ret = -ENOMEM; + clear_tsk_thread_flag(target, TIF_SVE); + goto out; + }
/* * Ensure target->thread.sve_state is up to date with target's diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index e5e2f1e888a2..5455816a083b 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -243,6 +243,11 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) /* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sve_alloc(current); + if (!current->thread.sve_state) { + clear_thread_flag(TIF_SVE); + return -ENOMEM; + } + err = __copy_from_user(current->thread.sve_state, (char __user const *)user->sve + SVE_SIG_REGS_OFFSET,
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 985d3a1beab543875e0c857ce263cad8233923bb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The vcpu_sve_pffr() returns a pointer, which can be an interesting thing to do on nVHE. Wrap the pointer with kern_hyp_va(), and take this opportunity to remove the unnecessary casts (sve_state being a void *).
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 7e8deca33a02..0259ec092725 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -394,8 +394,8 @@ struct kvm_vcpu_arch { };
/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */ -#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \ - sve_ffr_offset((vcpu)->arch.sve_max_vl))) +#define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) + \ + sve_ffr_offset((vcpu)->arch.sve_max_vl))
#define vcpu_sve_state_size(vcpu) ({ \ size_t __size_ret; \
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 297b8603e356ad82c1345cc75fad4d89310a3c34 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
as we are about to change the way KVM deals with SVE, provide KVM with its own save/restore SVE primitives.
No functional change intended.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimdmacros.h | 2 ++ arch/arm64/include/asm/kvm_hyp.h | 2 ++ arch/arm64/kvm/hyp/fpsimd.S | 10 ++++++++++ arch/arm64/kvm/hyp/include/hyp/switch.h | 10 +++++----- 4 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index 96a983230bb2..f6a1cab3cea7 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -6,6 +6,8 @@ * Author: Catalin Marinas catalin.marinas@arm.com */
+#include <asm/assembler.h> + .macro fpsimd_save state, tmpnr stp q0, q1, [\state, #16 * 0] stp q2, q3, [\state, #16 * 2] diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index 123e67cb8505..09942966707f 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -89,6 +89,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); +void __sve_save_state(void *sve_pffr, u32 *fpsr); +void __sve_restore_state(void *sve_pffr, u32 *fpsr, unsigned int vqminus1);
#ifndef __KVM_NVHE_HYPERVISOR__ void activate_traps_vhe_load(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S index 01f114aa47b0..95b22e10996c 100644 --- a/arch/arm64/kvm/hyp/fpsimd.S +++ b/arch/arm64/kvm/hyp/fpsimd.S @@ -19,3 +19,13 @@ SYM_FUNC_START(__fpsimd_restore_state) fpsimd_restore x0, 1 ret SYM_FUNC_END(__fpsimd_restore_state) + +SYM_FUNC_START(__sve_restore_state) + sve_load 0, x1, x2, 3, x4 + ret +SYM_FUNC_END(__sve_restore_state) + +SYM_FUNC_START(__sve_save_state) + sve_save 0, x1, 2 + ret +SYM_FUNC_END(__sve_save_state) diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 5d1973b58e3c..d1d2ad0fb89f 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -255,8 +255,8 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) vcpu->arch.host_fpsimd_state, struct thread_struct, uw.fpsimd_state);
- sve_save_state(sve_pffr(thread), - &vcpu->arch.host_fpsimd_state->fpsr); + __sve_save_state(sve_pffr(thread), + &vcpu->arch.host_fpsimd_state->fpsr); } else { __fpsimd_save_state(vcpu->arch.host_fpsimd_state); } @@ -265,9 +265,9 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) }
if (sve_guest) { - sve_load_state(vcpu_sve_pffr(vcpu), - &vcpu->arch.ctxt.fp_regs.fpsr, - sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1); + __sve_restore_state(vcpu_sve_pffr(vcpu), + &vcpu->arch.ctxt.fp_regs.fpsr, + sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1); write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12); } else { __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 83857371d4cbeff8551fa770e045be9c6b04715c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Switch to the unified EL1 accessors for ZCR_EL1, which will make things easier for nVHE support.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/fpsimd.c | 3 ++- arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 3e081d556e81..b7e36a506d3d 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -11,6 +11,7 @@ #include <linux/kvm_host.h> #include <asm/fpsimd.h> #include <asm/kvm_asm.h> +#include <asm/kvm_hyp.h> #include <asm/kvm_mmu.h> #include <asm/sysreg.h>
@@ -112,7 +113,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) fpsimd_save_and_flush_cpu_state();
if (guest_has_sve) - __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12); + __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR); } else if (host_has_sve) { /* * The FPSIMD/SVE state in the CPU has not been touched, and we diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index d1d2ad0fb89f..a08f9bbd2ea8 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -268,7 +268,7 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) __sve_restore_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1); - write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12); + write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR); } else { __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs); }
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 468f3477ef8bda1beeb91dd7f423c9bc248ac39d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The KVM code contains a number of "sve_vq_from_vl(vcpu->arch.sve_max_vl)" instances, and we are about to add more.
Introduce vcpu_sve_vq() as a shorthand for this expression.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Conflicts: correct undef function in switch.h Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 4 +++- arch/arm64/kvm/guest.c | 6 +++--- arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 0259ec092725..299137135c91 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -397,6 +397,8 @@ struct kvm_vcpu_arch { #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) + \ sve_ffr_offset((vcpu)->arch.sve_max_vl))
+#define vcpu_sve_max_vq(vcpu) sve_vq_from_vl((vcpu)->arch.sve_max_vl) + #define vcpu_sve_state_size(vcpu) ({ \ size_t __size_ret; \ unsigned int __vcpu_vq; \ @@ -404,7 +406,7 @@ struct kvm_vcpu_arch { if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \ __size_ret = 0; \ } else { \ - __vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl); \ + __vcpu_vq = vcpu_sve_max_vq(vcpu); \ __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \ } \ \ diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 9f678decc8e5..9e83560bc152 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -311,7 +311,7 @@ static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
memset(vqs, 0, sizeof(vqs));
- max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl); + max_vq = vcpu_sve_max_vq(vcpu); for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq) if (sve_vq_available(vq)) vqs[vq_word(vq)] |= vq_mask(vq); @@ -439,7 +439,7 @@ static int sve_reg_to_region(struct sve_state_reg_region *region, if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0) return -ENOENT;
- vq = sve_vq_from_vl(vcpu->arch.sve_max_vl); + vq = vcpu_sve_max_vq(vcpu);
reqoffset = SVE_SIG_ZREG_OFFSET(vq, reg_num) - SVE_SIG_REGS_OFFSET; @@ -449,7 +449,7 @@ static int sve_reg_to_region(struct sve_state_reg_region *region, if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0) return -ENOENT;
- vq = sve_vq_from_vl(vcpu->arch.sve_max_vl); + vq = vcpu_sve_max_vq(vcpu);
reqoffset = SVE_SIG_PREG_OFFSET(vq, reg_num) - SVE_SIG_REGS_OFFSET; diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index a08f9bbd2ea8..2e31320ec410 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -267,7 +267,7 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) if (sve_guest) { __sve_restore_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, - sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1); + vcpu_sve_max_vq(vcpu) - 1); write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR); } else { __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 71ce1ae56e4d43a0c568e2d4bfb154cd15306a82 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
A common pattern is to conditionally update ZCR_ELx in order to avoid the "self-synchronizing" effect that writing to this register has.
Let's provide an accessor that does exactly this.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 3ac646014623..201ba1a44738 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -129,6 +129,15 @@ static inline void sve_user_enable(void) sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_ZEN_EL0EN); }
+#define sve_cond_update_zcr_vq(val, reg) \ + do { \ + u64 __zcr = read_sysreg_s((reg)); \ + u64 __new = __zcr & ~ZCR_ELx_LEN_MASK; \ + __new |= (val) & ZCR_ELx_LEN_MASK; \ + if (__zcr != __new) \ + write_sysreg_s(__new, (reg)); \ + } while (0) + /* * Probing and setup functions. * Calls to these functions must be serialised with one another.
From: Xiaofei Tan tanxiaofei@huawei.com
mainline inclusion from mainline-v5.13-rc1~76^2 commit a9f8696d4be5228de9d1d4f0e9f027b64d77dab6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Compilation fails when KVM is selected and ARM64_SVE isn't.
The root cause is that sve_cond_update_zcr_vq is not defined when ARM64_SVE is not selected. Fix it by adding an empty definition when CONFIG_ARM64_SVE=n.
Signed-off-by: Xiaofei Tan tanxiaofei@huawei.com [maz: simplified commit message, fleshed out dummy #define] Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/1617183879-48748-1-git-send-email-tanxiaofei@huawe... Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 201ba1a44738..5cc9eaf1d52d 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -167,6 +167,8 @@ static inline int sve_get_current_vl(void) static inline void sve_user_disable(void) { BUILD_BUG(); } static inline void sve_user_enable(void) { BUILD_BUG(); }
+#define sve_cond_update_zcr_vq(val, reg) do { } while (0) + static inline void sve_init_vq_map(void) { } static inline void sve_update_vq_map(void) { } static inline int sve_verify_vq_map(void) { return 0; }
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 0a9a98fda3a24b0775ace4be096290b221f2f6a5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When running on nVHE, and that the vcpu supports SVE, map the SVE state at EL2 so that KVM can access it.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/fpsimd.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index b7e36a506d3d..3c37a419fa82 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -43,6 +43,17 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu) if (ret) goto error;
+ if (vcpu->arch.sve_state) { + void *sve_end; + + sve_end = vcpu->arch.sve_state + vcpu_sve_state_size(vcpu); + + ret = create_hyp_mappings(vcpu->arch.sve_state, sve_end, + PAGE_HYP); + if (ret) + goto error; + } + vcpu->arch.host_thread_info = kern_hyp_va(ti); vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd); error:
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 52029198c1cec1e21513d74f87363a0408f28650 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
In order to keep the code readable, move the host-save/guest-restore sequences in their own functions, with the following changes: - the hypervisor ZCR is now set from C code - ZCR_EL2 is always used as the EL2 accessor
This results in some minor assembler macro rework. No functional change intended.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimdmacros.h | 8 +++-- arch/arm64/include/asm/kvm_hyp.h | 2 +- arch/arm64/kvm/hyp/fpsimd.S | 2 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 40 +++++++++++++++---------- 4 files changed, 32 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index f6a1cab3cea7..059204477ce6 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -234,8 +234,7 @@ str w\nxtmp, [\xpfpsr, #4] .endm
-.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2 - sve_load_vq \xvqminus1, x\nxtmp, \xtmp2 +.macro __sve_load nxbase, xpfpsr, nxtmp _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34 _sve_ldr_p 0, \nxbase _sve_wrffr 0 @@ -246,3 +245,8 @@ ldr w\nxtmp, [\xpfpsr, #4] msr fpcr, x\nxtmp .endm + +.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2 + sve_load_vq \xvqminus1, x\nxtmp, \xtmp2 + __sve_load \nxbase, \xpfpsr, \nxtmp +.endm diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index 09942966707f..1addcb3476f9 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -90,7 +90,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu); void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); void __sve_save_state(void *sve_pffr, u32 *fpsr); -void __sve_restore_state(void *sve_pffr, u32 *fpsr, unsigned int vqminus1); +void __sve_restore_state(void *sve_pffr, u32 *fpsr);
#ifndef __KVM_NVHE_HYPERVISOR__ void activate_traps_vhe_load(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S index 95b22e10996c..3c635929771a 100644 --- a/arch/arm64/kvm/hyp/fpsimd.S +++ b/arch/arm64/kvm/hyp/fpsimd.S @@ -21,7 +21,7 @@ SYM_FUNC_START(__fpsimd_restore_state) SYM_FUNC_END(__fpsimd_restore_state)
SYM_FUNC_START(__sve_restore_state) - sve_load 0, x1, x2, 3, x4 + __sve_load 0, x1, 2 ret SYM_FUNC_END(__sve_restore_state)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 2e31320ec410..37a115f90372 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -194,6 +194,24 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu) return true; }
+static inline void __hyp_sve_save_host(struct kvm_vcpu *vcpu) +{ + struct thread_struct *thread; + + thread = container_of(vcpu->arch.host_fpsimd_state, struct thread_struct, + uw.fpsimd_state); + + __sve_save_state(sve_pffr(thread), &vcpu->arch.host_fpsimd_state->fpsr); +} + +static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu) +{ + sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2); + __sve_restore_state(vcpu_sve_pffr(vcpu), + &vcpu->arch.ctxt.fp_regs.fpsr); + write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR); +} + /* Check for an FPSIMD/SVE trap and handle as appropriate */ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) { @@ -250,28 +268,18 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) * In the SVE case, VHE is assumed: it is enforced by * Kconfig and kvm_arch_init(). */ - if (sve_host) { - struct thread_struct *thread = container_of( - vcpu->arch.host_fpsimd_state, - struct thread_struct, uw.fpsimd_state); - - __sve_save_state(sve_pffr(thread), - &vcpu->arch.host_fpsimd_state->fpsr); - } else { + if (sve_host) + __hyp_sve_save_host(vcpu); + else __fpsimd_save_state(vcpu->arch.host_fpsimd_state); - }
vcpu->arch.flags &= ~KVM_ARM64_FP_HOST; }
- if (sve_guest) { - __sve_restore_state(vcpu_sve_pffr(vcpu), - &vcpu->arch.ctxt.fp_regs.fpsr, - vcpu_sve_max_vq(vcpu) - 1); - write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR); - } else { + if (sve_guest) + __hyp_sve_restore_guest(vcpu); + else __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs); - }
/* Skip restoring fpexc32 for AArch64 guests */ if (!(read_sysreg(hcr_el2) & HCR_RW))
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit 2d481bd3b6361ed16c3ddeb58537f149623d30a0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Currently all the active code in fpsimd_save() is inside a check for TIF_FOREIGN_FPSTATE. Reduce the indentation level by changing to return from the function if TIF_FOREIGN_FPSTATE is set.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-2-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index de82f71f63f9..8ad742939859 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -306,24 +306,26 @@ static void fpsimd_save(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
- if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { - if (IS_ENABLED(CONFIG_ARM64_SVE) && - test_thread_flag(TIF_SVE)) { - if (WARN_ON(sve_get_vl() != last->sve_vl)) { - /* - * Can't save the user regs, so current would - * re-enter user with corrupt state. - * There's no way to recover, so kill it: - */ - force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); - return; - } - - sve_save_state((char *)last->sve_state + - sve_ffr_offset(last->sve_vl), - &last->st->fpsr); - } else - fpsimd_save_state(last->st); + if (test_thread_flag(TIF_FOREIGN_FPSTATE)) + return; + + if (IS_ENABLED(CONFIG_ARM64_SVE) && + test_thread_flag(TIF_SVE)) { + if (WARN_ON(sve_get_vl() != last->sve_vl)) { + /* + * Can't save the user regs, so current would + * re-enter user with corrupt state. + * There's no way to recover, so kill it: + */ + force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); + return; + } + + sve_save_state((char *)last->sve_state + + sve_ffr_offset(last->sve_vl), + &last->st->fpsr); + } else { + fpsimd_save_state(last->st); } }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit 9f5848665788a0f07bc175cb2cdd06d367b7556e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
SME introduces streaming SVE mode in which FFR is not present and the instructions for accessing it UNDEF. In preparation for handling this update the low level SVE state access functions to take a flag specifying if FFR should be handled. When saving the register state we store a zero for FFR to guard against uninitialized data being read. No behaviour change should be introduced by this patch.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-5-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 6 +++--- arch/arm64/include/asm/fpsimdmacros.h | 20 ++++++++++++++------ arch/arm64/kernel/entry-fpsimd.S | 15 +++++++++------ arch/arm64/kernel/fpsimd.c | 10 ++++++---- arch/arm64/kvm/hyp/fpsimd.S | 6 ++++-- 5 files changed, 36 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 5cc9eaf1d52d..ed2def998377 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -66,10 +66,10 @@ static inline void *sve_pffr(struct thread_struct *thread) return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl); }
-extern void sve_save_state(void *state, u32 *pfpsr); +extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr); extern void sve_load_state(void const *state, u32 const *pfpsr, - unsigned long vq_minus_1); -extern void sve_flush_live(unsigned long vq_minus_1); + int restore_ffr, unsigned long vq_minus_1); +extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); extern void sve_set_vq(unsigned long vq_minus_1);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index 059204477ce6..dc0f0b4df881 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -216,28 +216,36 @@ .macro sve_flush_z _for n, 0, 31, _sve_flush_z \n .endm -.macro sve_flush_p_ffr +.macro sve_flush_p _for n, 0, 15, _sve_pfalse \n +.endm +.macro sve_flush_ffr _sve_wrffr 0 .endm
-.macro sve_save nxbase, xpfpsr, nxtmp +.macro sve_save nxbase, xpfpsr, save_ffr, nxtmp _for n, 0, 31, _sve_str_v \n, \nxbase, \n - 34 _for n, 0, 15, _sve_str_p \n, \nxbase, \n - 16 + cbz \save_ffr, 921f _sve_rdffr 0 _sve_str_p 0, \nxbase _sve_ldr_p 0, \nxbase, -16 - + b 922f +921: + str xzr, [x\nxbase] // Zero out FFR +922: mrs x\nxtmp, fpsr str w\nxtmp, [\xpfpsr] mrs x\nxtmp, fpcr str w\nxtmp, [\xpfpsr, #4] .endm
-.macro __sve_load nxbase, xpfpsr, nxtmp +.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34 + cbz \restore_ffr, 921f _sve_ldr_p 0, \nxbase _sve_wrffr 0 +921: _for n, 0, 15, _sve_ldr_p \n, \nxbase, \n - 16
ldr w\nxtmp, [\xpfpsr] @@ -246,7 +254,7 @@ msr fpcr, x\nxtmp .endm
-.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2 +.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2 sve_load_vq \xvqminus1, x\nxtmp, \xtmp2 - __sve_load \nxbase, \xpfpsr, \nxtmp + __sve_load \nxbase, \xpfpsr, \restore_ffr, \nxtmp .endm diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index e40cb29fdfd8..5060d91b73e2 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -34,12 +34,12 @@ SYM_FUNC_END(fpsimd_load_state) #ifdef CONFIG_ARM64_SVE
SYM_FUNC_START(sve_save_state) - sve_save 0, x1, 2 + sve_save 0, x1, x2, 3 ret SYM_FUNC_END(sve_save_state)
SYM_FUNC_START(sve_load_state) - sve_load 0, x1, x2, 3, x4 + sve_load 0, x1, x2, x3, 4, x5 ret SYM_FUNC_END(sve_load_state)
@@ -59,13 +59,16 @@ SYM_FUNC_END(sve_set_vq) * VQ must already be configured by caller, any further updates of VQ * will need to ensure that the register state remains valid. * - * x0 = VQ - 1 + * x0 = include FFR? + * x1 = VQ - 1 */ SYM_FUNC_START(sve_flush_live) - cbz x0, 1f // A VQ-1 of 0 is 128 bits so no extra Z state + cbz x1, 1f // A VQ-1 of 0 is 128 bits so no extra Z state sve_flush_z -1: sve_flush_p_ffr - ret +1: sve_flush_p + tbz x0, #0, 2f + sve_flush_ffr +2: ret SYM_FUNC_END(sve_flush_live)
#endif /* CONFIG_ARM64_SVE */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 8ad742939859..a36552149323 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -287,7 +287,7 @@ static void task_fpsimd_load(void)
if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) sve_load_state(sve_pffr(¤t->thread), - ¤t->thread.uw.fpsimd_state.fpsr, + ¤t->thread.uw.fpsimd_state.fpsr, true, sve_vq_from_vl(current->thread.sve_vl) - 1); else fpsimd_load_state(¤t->thread.uw.fpsimd_state); @@ -323,7 +323,7 @@ static void fpsimd_save(void)
sve_save_state((char *)last->sve_state + sve_ffr_offset(last->sve_vl), - &last->st->fpsr); + &last->st->fpsr, true); } else { fpsimd_save_state(last->st); } @@ -960,7 +960,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs) unsigned long vq_minus_one = sve_vq_from_vl(current->thread.sve_vl) - 1; sve_set_vq(vq_minus_one); - sve_flush_live(vq_minus_one); + sve_flush_live(true, vq_minus_one); fpsimd_bind_task_to_cpu(); } else { fpsimd_to_sve(current); @@ -1354,7 +1354,8 @@ void __efi_fpsimd_begin(void) __this_cpu_write(efi_sve_state_used, true);
sve_save_state(sve_state + sve_ffr_offset(sve_max_vl), - &this_cpu_ptr(&efi_fpsimd_state)->fpsr); + &this_cpu_ptr(&efi_fpsimd_state)->fpsr, + true); } else { fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state)); } @@ -1380,6 +1381,7 @@ void __efi_fpsimd_end(void)
sve_load_state(sve_state + sve_ffr_offset(sve_max_vl), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, + true, sve_vq_from_vl(sve_get_vl()) - 1);
__this_cpu_write(efi_sve_state_used, false); diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S index 3c635929771a..1bb3b04b84e6 100644 --- a/arch/arm64/kvm/hyp/fpsimd.S +++ b/arch/arm64/kvm/hyp/fpsimd.S @@ -21,11 +21,13 @@ SYM_FUNC_START(__fpsimd_restore_state) SYM_FUNC_END(__fpsimd_restore_state)
SYM_FUNC_START(__sve_restore_state) - __sve_load 0, x1, 2 + mov x2, #1 + __sve_load 0, x1, x2, 3 ret SYM_FUNC_END(__sve_restore_state)
SYM_FUNC_START(__sve_save_state) - sve_save 0, x1, 2 + mov x2, #1 + sve_save 0, x1, x2, 3 ret SYM_FUNC_END(__sve_save_state)
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit b145a8437aab2799969f6ad8e384b557872333c2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Make sure the guest's ZCR_EL1 is saved before we save/flush the state. This will be useful in later patches.
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/fpsimd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 3c37a419fa82..14ea05c5134a 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -121,10 +121,10 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) local_irq_save(flags);
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { - fpsimd_save_and_flush_cpu_state(); - if (guest_has_sve) __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR); + + fpsimd_save_and_flush_cpu_state(); } else if (host_has_sve) { /* * The FPSIMD/SVE state in the CPU has not been touched, and we
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.13-rc1~76^2 commit 8c8010d69c1322734a272eb95dbbf42b5190e565 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Implement the SVE save/restore for nVHE, following a similar logic to that of the VHE implementation:
- the SVE state is switched on trap from EL1 to EL2
- no further changes to ZCR_EL2 occur as long as the guest isn't preempted or exit to userspace
- ZCR_EL2 is reset to its default value on the first SVE access from the host EL1, and ZCR_EL1 restored to the default guest value in vcpu_put()
Acked-by: Will Deacon will@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/fpsimd.c | 10 +++++-- arch/arm64/kvm/hyp/include/hyp/switch.h | 35 +++++++++---------------- arch/arm64/kvm/hyp/nvhe/switch.c | 4 +-- 3 files changed, 23 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 14ea05c5134a..5621020b28de 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -121,11 +121,17 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) local_irq_save(flags);
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { - if (guest_has_sve) + if (guest_has_sve) { __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
+ /* Restore the VL that was saved when bound to the CPU */ + if (!has_vhe()) + sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, + SYS_ZCR_EL1); + } + fpsimd_save_and_flush_cpu_state(); - } else if (host_has_sve) { + } else if (has_vhe() && host_has_sve) { /* * The FPSIMD/SVE state in the CPU has not been touched, and we * have SVE (and VHE): CPACR_EL1 (alias CPTR_EL2) has been diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 37a115f90372..9b173f5102d3 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -215,25 +215,19 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu) /* Check for an FPSIMD/SVE trap and handle as appropriate */ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) { - bool vhe, sve_guest, sve_host; + bool sve_guest, sve_host; u8 esr_ec; + u64 reg;
if (!system_supports_fpsimd()) return false;
- /* - * Currently system_supports_sve() currently implies has_vhe(), - * so the check is redundant. However, has_vhe() can be determined - * statically and helps the compiler remove dead code. - */ - if (has_vhe() && system_supports_sve()) { + if (system_supports_sve()) { sve_guest = vcpu_has_sve(vcpu); sve_host = vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE; - vhe = true; } else { sve_guest = false; sve_host = false; - vhe = has_vhe(); }
esr_ec = kvm_vcpu_trap_get_class(vcpu); @@ -243,31 +237,28 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
vcpu->stat.fp_asimd_exit_stat++; /* Don't handle SVE traps for non-SVE vcpus here: */ - if (!sve_guest) - if (esr_ec != ESR_ELx_EC_FP_ASIMD) - return false; + if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD) + return false;
/* Valid trap. Switch the context: */
- if (vhe) { - u64 reg = read_sysreg(cpacr_el1) | CPACR_EL1_FPEN; - + if (has_vhe()) { + reg = CPACR_EL1_FPEN; if (sve_guest) reg |= CPACR_EL1_ZEN;
- write_sysreg(reg, cpacr_el1); + sysreg_clear_set(cpacr_el1, 0, reg); } else { - write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP, - cptr_el2); + reg = CPTR_EL2_TFP; + if (sve_guest) + reg |= CPTR_EL2_TZ; + + sysreg_clear_set(cptr_el2, reg, 0); }
isb();
if (vcpu->arch.flags & KVM_ARM64_FP_HOST) { - /* - * In the SVE case, VHE is assumed: it is enforced by - * Kconfig and kvm_arch_init(). - */ if (sve_host) __hyp_sve_save_host(vcpu); else diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 6624596846d3..c6d7451a6daf 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -40,9 +40,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu) __activate_traps_common(vcpu);
val = CPTR_EL2_DEFAULT; - val |= CPTR_EL2_TTA | CPTR_EL2_TZ | CPTR_EL2_TAM; + val |= CPTR_EL2_TTA | CPTR_EL2_TAM; if (!update_fp_enabled(vcpu)) { - val |= CPTR_EL2_TFP; + val |= CPTR_EL2_TFP | CPTR_EL2_TZ; __activate_traps_fpsimd32(vcpu); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit 059613f546b67423a5b49cb6e6fa8b72fbaa4e0b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The function has SVE specific checks in it and it will be more trouble to add conditional code for SME than it is to simply rename it to be SVE specific.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-6-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index a36552149323..c356aaa28d70 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -335,7 +335,7 @@ static void fpsimd_save(void) * If things go wrong there's a bug somewhere, but try to fall back to a * safe choice. */ -static unsigned int find_supported_vector_length(unsigned int vl) +static unsigned int find_supported_sve_vector_length(unsigned int vl) { int bit; int max_vl = sve_max_vl; @@ -377,7 +377,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write, if (!sve_vl_valid(vl)) return -EINVAL;
- set_sve_default_vl(find_supported_vector_length(vl)); + set_sve_default_vl(find_supported_sve_vector_length(vl)); return 0; }
@@ -596,7 +596,7 @@ int sve_set_vector_length(struct task_struct *task, if (vl > SVE_VL_ARCH_MAX) vl = SVE_VL_ARCH_MAX;
- vl = find_supported_vector_length(vl); + vl = find_supported_sve_vector_length(vl);
if (flags & (PR_SVE_VL_INHERIT | PR_SVE_SET_VL_ONEXEC)) @@ -871,14 +871,14 @@ void __init sve_setup(void) * Sanity-check that the max VL we determined through CPU features * corresponds properly to sve_vq_map. If not, do our best: */ - if (WARN_ON(sve_max_vl != find_supported_vector_length(sve_max_vl))) - sve_max_vl = find_supported_vector_length(sve_max_vl); + if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl))) + sve_max_vl = find_supported_sve_vector_length(sve_max_vl);
/* * For the default VL, pick the maximum supported value <= 64. * VL == 64 is guaranteed not to grow the signal frame. */ - set_sve_default_vl(find_supported_vector_length(64)); + set_sve_default_vl(find_supported_sve_vector_length(64));
bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX); @@ -1064,7 +1064,7 @@ void fpsimd_flush_thread(void) if (WARN_ON(!sve_vl_valid(vl))) vl = SVE_VL_MIN;
- supported_vl = find_supported_vector_length(vl); + supported_vl = find_supported_sve_vector_length(vl); if (WARN_ON(supported_vl != vl)) vl = supported_vl;
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit 0423eedcf4e1ba49f262a9e925ad9ab8ad8eaa36 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
In a system with SME there are parallel vector length controls for SVE and SME vectors which function in much the same way so it is desirable to share the code for handling them as much as possible. In order to prepare for doing this add a layer of accessor functions for the various VL related operations on tasks.
Since almost all current interactions are actually via task->thread rather than directly with the thread_info the accessors use that. Accessors are provided for both generic and SVE specific usage, the generic accessors should be used for cases where register state is being manipulated since the registers are shared between streaming and regular SVE so we know that when SME support is implemented we will always have to be in the appropriate mode already and hence can generalise now.
Since we are using task_struct and we don't want to cause widespread inclusion of sched.h the acessors are all out of line, it is hoped that none of the uses are in a sufficiently critical path for this to be an issue. Those that are most likely to present an issue are in the same translation unit so hopefully the compiler may be able to inline anyway.
This is purely adding the layer of abstraction, additional work will be needed to support tasks using SME.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-7-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 2 +- arch/arm64/include/asm/processor.h | 10 ++++++ arch/arm64/kernel/fpsimd.c | 55 +++++++++++++++++++++--------- arch/arm64/kernel/ptrace.c | 4 +-- arch/arm64/kernel/signal.c | 6 ++-- 5 files changed, 54 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index ed2def998377..6f858f7ecaee 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -63,7 +63,7 @@ static inline size_t sve_ffr_offset(int vl)
static inline void *sve_pffr(struct thread_struct *thread) { - return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl); + return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread)); }
extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr); diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 617cdd40c61b..c1c59f6fe3f2 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -168,6 +168,16 @@ struct thread_struct { KABI_RESERVE(8) };
+static inline unsigned int thread_get_sve_vl(struct thread_struct *thread) +{ + return thread->sve_vl; +} + +unsigned int task_get_sve_vl(const struct task_struct *task); +void task_set_sve_vl(struct task_struct *task, unsigned long vl); +unsigned int task_get_sve_vl_onexec(const struct task_struct *task); +void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl); + static inline void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size) { diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index c356aaa28d70..31f9652b3a94 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -226,6 +226,26 @@ static void sve_free(struct task_struct *task) __sve_free(task); }
+unsigned int task_get_sve_vl(const struct task_struct *task) +{ + return task->thread.sve_vl; +} + +void task_set_sve_vl(struct task_struct *task, unsigned long vl) +{ + task->thread.sve_vl = vl; +} + +unsigned int task_get_sve_vl_onexec(const struct task_struct *task) +{ + return task->thread.sve_vl_onexec; +} + +void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl) +{ + task->thread.sve_vl_onexec = vl; +} + /* * TIF_SVE controls whether a task can use SVE without trapping while * in userspace, and also the way a task's FPSIMD/SVE state is stored @@ -288,7 +308,7 @@ static void task_fpsimd_load(void) if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) sve_load_state(sve_pffr(¤t->thread), ¤t->thread.uw.fpsimd_state.fpsr, true, - sve_vq_from_vl(current->thread.sve_vl) - 1); + sve_vq_from_vl(task_get_sve_vl(current)) - 1); else fpsimd_load_state(¤t->thread.uw.fpsimd_state); } @@ -456,7 +476,7 @@ static void fpsimd_to_sve(struct task_struct *task) if (!system_supports_sve()) return;
- vq = sve_vq_from_vl(task->thread.sve_vl); + vq = sve_vq_from_vl(task_get_sve_vl(task)); __fpsimd_to_sve(sst, fst, vq); }
@@ -482,7 +502,7 @@ static void sve_to_fpsimd(struct task_struct *task) if (!system_supports_sve()) return;
- vq = sve_vq_from_vl(task->thread.sve_vl); + vq = sve_vq_from_vl(task_get_sve_vl(task)); for (i = 0; i < SVE_NUM_ZREGS; ++i) { p = (__uint128_t const *)ZREG(sst, vq, i); fst->vregs[i] = arm64_le128_to_cpu(*p); @@ -497,7 +517,7 @@ static void sve_to_fpsimd(struct task_struct *task) */ size_t sve_state_size(struct task_struct const *task) { - return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task->thread.sve_vl)); + return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task))); }
/* @@ -572,7 +592,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task) if (!test_tsk_thread_flag(task, TIF_SVE)) return;
- vq = sve_vq_from_vl(task->thread.sve_vl); + vq = sve_vq_from_vl(task_get_sve_vl(task));
memset(sst, 0, SVE_SIG_REGS_SIZE(vq)); __fpsimd_to_sve(sst, fst, vq); @@ -600,16 +620,16 @@ int sve_set_vector_length(struct task_struct *task,
if (flags & (PR_SVE_VL_INHERIT | PR_SVE_SET_VL_ONEXEC)) - task->thread.sve_vl_onexec = vl; + task_set_sve_vl_onexec(task, vl); else /* Reset VL to system default on next exec: */ - task->thread.sve_vl_onexec = 0; + task_set_sve_vl_onexec(task, 0);
/* Only actually set the VL if not deferred: */ if (flags & PR_SVE_SET_VL_ONEXEC) goto out;
- if (vl == task->thread.sve_vl) + if (vl == task_get_sve_vl(task)) goto out;
/* @@ -636,7 +656,7 @@ int sve_set_vector_length(struct task_struct *task, */ sve_free(task);
- task->thread.sve_vl = vl; + task_set_sve_vl(task, vl);
out: update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT, @@ -656,9 +676,9 @@ static int sve_prctl_status(unsigned long flags) int ret;
if (flags & PR_SVE_SET_VL_ONEXEC) - ret = current->thread.sve_vl_onexec; + ret = task_get_sve_vl_onexec(current); else - ret = current->thread.sve_vl; + ret = task_get_sve_vl(current);
if (test_thread_flag(TIF_SVE_VL_INHERIT)) ret |= PR_SVE_VL_INHERIT; @@ -958,7 +978,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs) */ if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { unsigned long vq_minus_one = - sve_vq_from_vl(current->thread.sve_vl) - 1; + sve_vq_from_vl(task_get_sve_vl(current)) - 1; sve_set_vq(vq_minus_one); sve_flush_live(true, vq_minus_one); fpsimd_bind_task_to_cpu(); @@ -1058,8 +1078,9 @@ void fpsimd_flush_thread(void) * If a bug causes this to go wrong, we make some noise and * try to fudge thread.sve_vl to a safe value here. */ - vl = current->thread.sve_vl_onexec ? - current->thread.sve_vl_onexec : get_sve_default_vl(); + vl = task_get_sve_vl_onexec(current); + if (!vl) + vl = get_sve_default_vl();
if (WARN_ON(!sve_vl_valid(vl))) vl = SVE_VL_MIN; @@ -1068,14 +1089,14 @@ void fpsimd_flush_thread(void) if (WARN_ON(supported_vl != vl)) vl = supported_vl;
- current->thread.sve_vl = vl; + task_set_sve_vl(current, vl);
/* * If the task is not set to inherit, ensure that the vector * length will be reset by a subsequent exec: */ if (!test_thread_flag(TIF_SVE_VL_INHERIT)) - current->thread.sve_vl_onexec = 0; + task_set_sve_vl_onexec(current, 0); }
put_cpu_fpsimd_context(); @@ -1120,7 +1141,7 @@ void fpsimd_bind_task_to_cpu(void) WARN_ON(!system_supports_fpsimd()); last->st = ¤t->thread.uw.fpsimd_state; last->sve_state = current->thread.sve_state; - last->sve_vl = current->thread.sve_vl; + last->sve_vl = task_get_sve_vl(current); current->thread.fpsimd_cpu = smp_processor_id();
if (system_supports_sve()) { diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 03b84ab27519..39dc37a6784a 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -726,7 +726,7 @@ static void sve_init_header_from_task(struct user_sve_header *header, if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) header->flags |= SVE_PT_VL_INHERIT;
- header->vl = target->thread.sve_vl; + header->vl = task_get_sve_vl(target); vq = sve_vq_from_vl(header->vl);
header->max_vl = sve_max_vl; @@ -821,7 +821,7 @@ static int sve_set(struct task_struct *target, goto out;
/* Actual VL set may be less than the user asked for: */ - vq = sve_vq_from_vl(target->thread.sve_vl); + vq = sve_vq_from_vl(task_get_sve_vl(target));
/* Registers: FPSIMD-only case */
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 5455816a083b..cc1762674ddf 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -180,7 +180,7 @@ int preserve_sve_context(struct sve_context __user *ctx) { int err = 0; u16 reserved[ARRAY_SIZE(ctx->__reserved)]; - unsigned int vl = current->thread.sve_vl; + unsigned int vl = task_get_sve_vl(current); unsigned int vq = 0;
if (test_thread_flag(TIF_SVE)) @@ -219,7 +219,7 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) if (__copy_from_user(&sve, user->sve, sizeof(sve))) return -EFAULT;
- if (sve.vl != current->thread.sve_vl) + if (sve.vl != task_get_sve_vl(current)) return -EINVAL;
if (sve.head.size <= sizeof(*user->sve)) { @@ -474,7 +474,7 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all) int vl = sve_max_vl;
if (!add_all) - vl = current->thread.sve_vl; + vl = task_get_sve_vl(current);
vq = sve_vq_from_vl(vl); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit b5bc00ffddc08c20a799514cbcfd2abaa6718014 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
With the introduction of SME we will have a second vector length in the system, enumerated and configured in a very similar fashion to the existing SVE vector length. While there are a few differences in how things are handled this is a relatively small portion of the overall code so in order to avoid code duplication we factor out
We create two structs, one vl_info for the static hardware properties and one vl_config for the runtime configuration, with an array instantiated for each and update all the users to reference these. Some accessor functions are provided where helpful for readability, and the write to set the vector length is put into a function since the system register being updated needs to be chosen at compile time.
This is a mostly mechanical replacement, further work will be required to actually make things generic, ensuring that we handle those places where there are differences properly.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-8-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 101 +++++++++++++++--- arch/arm64/include/asm/processor.h | 5 + arch/arm64/kernel/cpufeature.c | 6 +- arch/arm64/kernel/fpsimd.c | 163 ++++++++++++++++------------- arch/arm64/kernel/ptrace.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kvm/reset.c | 6 +- 7 files changed, 191 insertions(+), 94 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 6f858f7ecaee..c84370e15fed 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -78,10 +78,6 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void);
-extern int __ro_after_init sve_max_vl; -extern int __ro_after_init sve_max_virtualisable_vl; -extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX); - /* * Helpers to translate bit indices in sve_vq_map to VQ values (and * vice versa). This allows find_next_bit() to be used to find the @@ -97,11 +93,27 @@ static inline unsigned int __bit_to_vq(unsigned int bit) return SVE_VQ_MAX - bit; }
-/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */ -static inline bool sve_vq_available(unsigned int vq) -{ - return test_bit(__vq_to_bit(vq), sve_vq_map); -} + +struct vl_info { + enum vec_type type; + const char *name; /* For display purposes */ + + /* Minimum supported vector length across all CPUs */ + int min_vl; + + /* Maximum supported vector length across all CPUs */ + int max_vl; + int max_virtualisable_vl; + + /* + * Set of available vector lengths, + * where length vq encoded as bit __vq_to_bit(vq): + */ + DECLARE_BITMAP(vq_map, SVE_VQ_MAX); + + /* Set of vector lengths present on at least one cpu: */ + DECLARE_BITMAP(vq_partial_map, SVE_VQ_MAX); +};
#ifdef CONFIG_ARM64_SVE
@@ -142,11 +154,63 @@ static inline void sve_user_enable(void) * Probing and setup functions. * Calls to these functions must be serialised with one another. */ -extern void __init sve_init_vq_map(void); -extern void sve_update_vq_map(void); -extern int sve_verify_vq_map(void); +enum vec_type; + +extern void __init vec_init_vq_map(enum vec_type type); +extern void vec_update_vq_map(enum vec_type type); +extern int vec_verify_vq_map(enum vec_type type); extern void __init sve_setup(void);
+extern __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX]; + +static inline void write_vl(enum vec_type type, u64 val) +{ + u64 tmp; + + switch (type) { +#ifdef CONFIG_ARM64_SVE + case ARM64_VEC_SVE: + tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK; + write_sysreg_s(tmp | val, SYS_ZCR_EL1); + break; +#endif + default: + WARN_ON_ONCE(1); + break; + } +} + +static inline int vec_max_vl(enum vec_type type) +{ + return vl_info[type].max_vl; +} + +static inline int vec_max_virtualisable_vl(enum vec_type type) +{ + return vl_info[type].max_virtualisable_vl; +} + +static inline int sve_max_vl(void) +{ + return vec_max_vl(ARM64_VEC_SVE); +} + +static inline int sve_max_virtualisable_vl(void) +{ + return vec_max_virtualisable_vl(ARM64_VEC_SVE); +} + +/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */ +static inline bool vq_available(enum vec_type type, unsigned int vq) +{ + return test_bit(__vq_to_bit(vq), vl_info[type].vq_map); +} + +static inline bool sve_vq_available(unsigned int vq) +{ + return vq_available(ARM64_VEC_SVE, vq); +} + #else /* ! CONFIG_ARM64_SVE */
static inline void sve_alloc(struct task_struct *task) { } @@ -164,14 +228,21 @@ static inline int sve_get_current_vl(void) return -EINVAL; }
+static inline int sve_max_vl(void) +{ + return -EINVAL; +} + +static inline bool sve_vq_available(unsigned int vq) { return false; } + static inline void sve_user_disable(void) { BUILD_BUG(); } static inline void sve_user_enable(void) { BUILD_BUG(); }
#define sve_cond_update_zcr_vq(val, reg) do { } while (0)
-static inline void sve_init_vq_map(void) { } -static inline void sve_update_vq_map(void) { } -static inline int sve_verify_vq_map(void) { return 0; } +static inline void vec_init_vq_map(enum vec_type t) { } +static inline void vec_update_vq_map(enum vec_type t) { } +static inline int vec_verify_vq_map(enum vec_type t) { return 0; } static inline void sve_setup(void) { }
#endif /* ! CONFIG_ARM64_SVE */ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index c1c59f6fe3f2..ef4e8142a1b4 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -113,6 +113,11 @@ struct debug_info { #endif };
+enum vec_type { + ARM64_VEC_SVE = 0, + ARM64_VEC_MAX, +}; + struct cpu_context { unsigned long x19; unsigned long x20; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index f5ce1e3a532f..1a21eaf14de6 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -873,7 +873,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) { init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr); - sve_init_vq_map(); + vec_init_vq_map(ARM64_VEC_SVE); }
/* @@ -1098,7 +1098,7 @@ void update_cpu_features(int cpu, /* Probe vector lengths, unless we already gave up on SVE */ if (id_aa64pfr0_sve(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) && !system_capabilities_finalized()) - sve_update_vq_map(); + vec_update_vq_map(ARM64_VEC_SVE); }
/* @@ -2663,7 +2663,7 @@ static void verify_sve_features(void) unsigned int safe_len = safe_zcr & ZCR_ELx_LEN_MASK; unsigned int len = zcr & ZCR_ELx_LEN_MASK;
- if (len < safe_len || sve_verify_vq_map()) { + if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) { pr_crit("CPU%d: SVE: vector length support mismatch\n", smp_processor_id()); cpu_die_early(); diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 31f9652b3a94..cb024befaccd 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -121,40 +121,51 @@ struct fpsimd_last_state_struct {
static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
-/* Default VL for tasks that don't set it explicitly: */ -static int __sve_default_vl = -1; +__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = { +#ifdef CONFIG_ARM64_SVE + [ARM64_VEC_SVE] = { + .type = ARM64_VEC_SVE, + .name = "SVE", + .min_vl = SVE_VL_MIN, + .max_vl = SVE_VL_MIN, + .max_virtualisable_vl = SVE_VL_MIN, + }, +#endif +}; + +struct vl_config { + int __default_vl; /* Default VL for tasks */ +}; + +static struct vl_config vl_config[ARM64_VEC_MAX]; + +static int get_default_vl(enum vec_type type) +{ + return READ_ONCE(vl_config[type].__default_vl); +}
static int get_sve_default_vl(void) { - return READ_ONCE(__sve_default_vl); + return get_default_vl(ARM64_VEC_SVE); }
#ifdef CONFIG_ARM64_SVE
-static void set_sve_default_vl(int val) +static void set_default_vl(enum vec_type type, int val) { - WRITE_ONCE(__sve_default_vl, val); + WRITE_ONCE(vl_config[type].__default_vl, val); }
-/* Maximum supported vector length across all CPUs (initially poisoned) */ -int __ro_after_init sve_max_vl = SVE_VL_MIN; -int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN; - -/* - * Set of available vector lengths, - * where length vq encoded as bit __vq_to_bit(vq): - */ -__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX); -/* Set of vector lengths present on at least one cpu: */ -static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX); +static void set_sve_default_vl(int val) +{ + set_default_vl(ARM64_VEC_SVE, val); +}
static void __percpu *efi_sve_state;
#else /* ! CONFIG_ARM64_SVE */
/* Dummy declaration for code that will be optimised out: */ -extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX); -extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX); extern void __percpu *efi_sve_state;
#endif /* ! CONFIG_ARM64_SVE */ @@ -355,21 +366,23 @@ static void fpsimd_save(void) * If things go wrong there's a bug somewhere, but try to fall back to a * safe choice. */ -static unsigned int find_supported_sve_vector_length(unsigned int vl) +static unsigned int find_supported_vector_length(enum vec_type type, + unsigned int vl) { + struct vl_info *info = &vl_info[type]; int bit; - int max_vl = sve_max_vl; + int max_vl = info->max_vl;
if (WARN_ON(!sve_vl_valid(vl))) - vl = SVE_VL_MIN; + vl = info->min_vl;
if (WARN_ON(!sve_vl_valid(max_vl))) - max_vl = SVE_VL_MIN; + max_vl = info->min_vl;
if (vl > max_vl) vl = max_vl;
- bit = find_next_bit(sve_vq_map, SVE_VQ_MAX, + bit = find_next_bit(info->vq_map, SVE_VQ_MAX, __vq_to_bit(sve_vq_from_vl(vl))); return sve_vl_from_vq(__bit_to_vq(bit)); } @@ -379,6 +392,7 @@ static unsigned int find_supported_sve_vector_length(unsigned int vl) static int sve_proc_do_default_vl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { + struct vl_info *info = &vl_info[ARM64_VEC_SVE]; int ret; int vl = get_sve_default_vl(); struct ctl_table tmp_table = { @@ -392,12 +406,12 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
/* Writing -1 has the special meaning "set to max": */ if (vl == -1) - vl = sve_max_vl; + vl = info->max_vl;
if (!sve_vl_valid(vl)) return -EINVAL;
- set_sve_default_vl(find_supported_sve_vector_length(vl)); + set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl)); return 0; }
@@ -616,7 +630,7 @@ int sve_set_vector_length(struct task_struct *task, if (vl > SVE_VL_ARCH_MAX) vl = SVE_VL_ARCH_MAX;
- vl = find_supported_sve_vector_length(vl); + vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
if (flags & (PR_SVE_VL_INHERIT | PR_SVE_SET_VL_ONEXEC)) @@ -714,18 +728,15 @@ int sve_get_current_vl(void) return sve_prctl_status(0); }
-static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX)) +static void vec_probe_vqs(struct vl_info *info, + DECLARE_BITMAP(map, SVE_VQ_MAX)) { unsigned int vq, vl; - unsigned long zcr;
bitmap_zero(map, SVE_VQ_MAX);
- zcr = ZCR_ELx_LEN_MASK; - zcr = read_sysreg_s(SYS_ZCR_EL1) & ~zcr; - for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) { - write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */ + write_vl(info->type, vq - 1); /* self-syncing */ vl = sve_get_vl(); vq = sve_vq_from_vl(vl); /* skip intervening lengths */ set_bit(__vq_to_bit(vq), map); @@ -736,10 +747,11 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX)) * Initialise the set of known supported VQs for the boot CPU. * This is called during kernel boot, before secondary CPUs are brought up. */ -void __init sve_init_vq_map(void) +void __init vec_init_vq_map(enum vec_type type) { - sve_probe_vqs(sve_vq_map); - bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX); + struct vl_info *info = &vl_info[type]; + vec_probe_vqs(info, info->vq_map); + bitmap_copy(info->vq_partial_map, info->vq_map, SVE_VQ_MAX); }
/* @@ -747,30 +759,33 @@ void __init sve_init_vq_map(void) * those not supported by the current CPU. * This function is called during the bring-up of early secondary CPUs only. */ -void sve_update_vq_map(void) +void vec_update_vq_map(enum vec_type type) { + struct vl_info *info = &vl_info[type]; DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
- sve_probe_vqs(tmp_map); - bitmap_and(sve_vq_map, sve_vq_map, tmp_map, SVE_VQ_MAX); - bitmap_or(sve_vq_partial_map, sve_vq_partial_map, tmp_map, SVE_VQ_MAX); + vec_probe_vqs(info, tmp_map); + bitmap_and(info->vq_map, info->vq_map, tmp_map, SVE_VQ_MAX); + bitmap_or(info->vq_partial_map, info->vq_partial_map, tmp_map, + SVE_VQ_MAX); }
/* * Check whether the current CPU supports all VQs in the committed set. * This function is called during the bring-up of late secondary CPUs only. */ -int sve_verify_vq_map(void) +int vec_verify_vq_map(enum vec_type type) { + struct vl_info *info = &vl_info[type]; DECLARE_BITMAP(tmp_map, SVE_VQ_MAX); unsigned long b;
- sve_probe_vqs(tmp_map); + vec_probe_vqs(info, tmp_map);
bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX); - if (bitmap_intersects(tmp_map, sve_vq_map, SVE_VQ_MAX)) { - pr_warn("SVE: cpu%d: Required vector length(s) missing\n", - smp_processor_id()); + if (bitmap_intersects(tmp_map, info->vq_map, SVE_VQ_MAX)) { + pr_warn("%s: cpu%d: Required vector length(s) missing\n", + info->name, smp_processor_id()); return -EINVAL; }
@@ -786,7 +801,7 @@ int sve_verify_vq_map(void) /* Recover the set of supported VQs: */ bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX); /* Find VQs supported that are not globally supported: */ - bitmap_andnot(tmp_map, tmp_map, sve_vq_map, SVE_VQ_MAX); + bitmap_andnot(tmp_map, tmp_map, info->vq_map, SVE_VQ_MAX);
/* Find the lowest such VQ, if any: */ b = find_last_bit(tmp_map, SVE_VQ_MAX); @@ -797,9 +812,9 @@ int sve_verify_vq_map(void) * Mismatches above sve_max_virtualisable_vl are fine, since * no guest is allowed to configure ZCR_EL2.LEN to exceed this: */ - if (sve_vl_from_vq(__bit_to_vq(b)) <= sve_max_virtualisable_vl) { - pr_warn("SVE: cpu%d: Unsupported vector length(s) present\n", - smp_processor_id()); + if (sve_vl_from_vq(__bit_to_vq(b)) <= info->max_virtualisable_vl) { + pr_warn("%s: cpu%d: Unsupported vector length(s) present\n", + info->name, smp_processor_id()); return -EINVAL; }
@@ -808,6 +823,8 @@ int sve_verify_vq_map(void)
static void __init sve_efi_setup(void) { + struct vl_info *info = &vl_info[ARM64_VEC_SVE]; + if (!IS_ENABLED(CONFIG_EFI)) return;
@@ -816,11 +833,11 @@ static void __init sve_efi_setup(void) * This is evidence of a crippled system and we are returning void, * so no attempt is made to handle this situation here. */ - if (!sve_vl_valid(sve_max_vl)) + if (!sve_vl_valid(info->max_vl)) goto fail;
efi_sve_state = __alloc_percpu( - SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES); + SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES); if (!efi_sve_state) goto fail;
@@ -869,6 +886,7 @@ u64 read_zcr_features(void)
void __init sve_setup(void) { + struct vl_info *info = &vl_info[ARM64_VEC_SVE]; u64 zcr; DECLARE_BITMAP(tmp_map, SVE_VQ_MAX); unsigned long b; @@ -881,49 +899,52 @@ void __init sve_setup(void) * so sve_vq_map must have at least SVE_VQ_MIN set. * If something went wrong, at least try to patch it up: */ - if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map))) - set_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map); + if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map))) + set_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map);
zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1); - sve_max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1); + info->max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
/* * Sanity-check that the max VL we determined through CPU features * corresponds properly to sve_vq_map. If not, do our best: */ - if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl))) - sve_max_vl = find_supported_sve_vector_length(sve_max_vl); + if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SVE, + info->max_vl))) + info->max_vl = find_supported_vector_length(ARM64_VEC_SVE, + info->max_vl);
/* * For the default VL, pick the maximum supported value <= 64. * VL == 64 is guaranteed not to grow the signal frame. */ - set_sve_default_vl(find_supported_sve_vector_length(64)); + set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, 64));
- bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map, + bitmap_andnot(tmp_map, info->vq_partial_map, info->vq_map, SVE_VQ_MAX);
b = find_last_bit(tmp_map, SVE_VQ_MAX); if (b >= SVE_VQ_MAX) /* No non-virtualisable VLs found */ - sve_max_virtualisable_vl = SVE_VQ_MAX; + info->max_virtualisable_vl = SVE_VQ_MAX; else if (WARN_ON(b == SVE_VQ_MAX - 1)) /* No virtualisable VLs? This is architecturally forbidden. */ - sve_max_virtualisable_vl = SVE_VQ_MIN; + info->max_virtualisable_vl = SVE_VQ_MIN; else /* b + 1 < SVE_VQ_MAX */ - sve_max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1)); + info->max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
- if (sve_max_virtualisable_vl > sve_max_vl) - sve_max_virtualisable_vl = sve_max_vl; + if (info->max_virtualisable_vl > info->max_vl) + info->max_virtualisable_vl = info->max_vl;
- pr_info("SVE: maximum available vector length %u bytes per vector\n", - sve_max_vl); - pr_info("SVE: default vector length %u bytes per vector\n", - get_sve_default_vl()); + pr_info("%s: maximum available vector length %u bytes per vector\n", + info->name, info->max_vl); + pr_info("%s: default vector length %u bytes per vector\n", + info->name, get_sve_default_vl());
/* KVM decides whether to support mismatched systems. Just warn here: */ - if (sve_max_virtualisable_vl < sve_max_vl) - pr_warn("SVE: unvirtualisable vector lengths present\n"); + if (sve_max_virtualisable_vl() < sve_max_vl()) + pr_warn("%s: unvirtualisable vector lengths present\n", + info->name);
sve_efi_setup(); } @@ -1085,7 +1106,7 @@ void fpsimd_flush_thread(void) if (WARN_ON(!sve_vl_valid(vl))) vl = SVE_VL_MIN;
- supported_vl = find_supported_sve_vector_length(vl); + supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl); if (WARN_ON(supported_vl != vl)) vl = supported_vl;
@@ -1374,7 +1395,7 @@ void __efi_fpsimd_begin(void)
__this_cpu_write(efi_sve_state_used, true);
- sve_save_state(sve_state + sve_ffr_offset(sve_max_vl), + sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, true); } else { @@ -1400,7 +1421,7 @@ void __efi_fpsimd_end(void) likely(__this_cpu_read(efi_sve_state_used))) { char const *sve_state = this_cpu_ptr(efi_sve_state);
- sve_load_state(sve_state + sve_ffr_offset(sve_max_vl), + sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, true, sve_vq_from_vl(sve_get_vl()) - 1); diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 39dc37a6784a..1bd130a4d488 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -729,7 +729,7 @@ static void sve_init_header_from_task(struct user_sve_header *header, header->vl = task_get_sve_vl(target); vq = sve_vq_from_vl(header->vl);
- header->max_vl = sve_max_vl; + header->max_vl = sve_max_vl(); header->size = SVE_PT_SIZE(vq, header->flags); header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl), SVE_PT_REGS_SVE); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index cc1762674ddf..a58a94bc6ad0 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -471,7 +471,7 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all) unsigned int vq = 0;
if (add_all || test_thread_flag(TIF_SVE)) { - int vl = sve_max_vl; + int vl = sve_max_vl();
if (!add_all) vl = task_get_sve_vl(current); diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 204c62debf06..032911b34f5a 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -99,7 +99,7 @@ unsigned int kvm_sve_max_vl; int kvm_arm_init_sve(void) { if (system_supports_sve()) { - kvm_sve_max_vl = sve_max_virtualisable_vl; + kvm_sve_max_vl = sve_max_virtualisable_vl();
/* * The get_sve_reg()/set_sve_reg() ioctl interface will need @@ -114,7 +114,7 @@ int kvm_arm_init_sve(void) * Don't even try to make use of vector lengths that * aren't available on all CPUs, for now: */ - if (kvm_sve_max_vl < sve_max_vl) + if (kvm_sve_max_vl < sve_max_vl()) pr_warn("KVM: SVE vector length for guests limited to %u bytes\n", kvm_sve_max_vl); } @@ -159,7 +159,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu) * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and * set_sve_vls(). Double-check here just to be sure: */ - if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl || + if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() || vl > SVE_VL_ARCH_MAX)) return -EIO;
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit ddc806b5c4752d35bdaa4dfa2aaa72785711a3da category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Currently when restoring the SVE state we supply the SVE vector length as an argument to sve_load_state() and the underlying macros. This becomes inconvenient with the addition of SME since we may need to restore any combination of SVE and SME vector lengths, and we already separately restore the vector length in the KVM code. We don't need to know the vector length during the actual register load since the SME load instructions can index into the data array for us.
Refactor the interface so we explicitly set the vector length separately to restoring the SVE registers in preparation for adding SME support, no functional change should be involved.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-9-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 2 +- arch/arm64/include/asm/fpsimdmacros.h | 7 +------ arch/arm64/kernel/entry-fpsimd.S | 2 +- arch/arm64/kernel/fpsimd.c | 13 +++++++------ arch/arm64/kvm/hyp/fpsimd.S | 2 +- 5 files changed, 11 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index c84370e15fed..785bcf91c021 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -68,7 +68,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr); extern void sve_load_state(void const *state, u32 const *pfpsr, - int restore_ffr, unsigned long vq_minus_1); + int restore_ffr); extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); extern void sve_set_vq(unsigned long vq_minus_1); diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index dc0f0b4df881..ab4290653a11 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -240,7 +240,7 @@ str w\nxtmp, [\xpfpsr, #4] .endm
-.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp +.macro sve_load nxbase, xpfpsr, restore_ffr, nxtmp _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34 cbz \restore_ffr, 921f _sve_ldr_p 0, \nxbase @@ -253,8 +253,3 @@ ldr w\nxtmp, [\xpfpsr, #4] msr fpcr, x\nxtmp .endm - -.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2 - sve_load_vq \xvqminus1, x\nxtmp, \xtmp2 - __sve_load \nxbase, \xpfpsr, \restore_ffr, \nxtmp -.endm diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 5060d91b73e2..a5f9475ce6eb 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -39,7 +39,7 @@ SYM_FUNC_START(sve_save_state) SYM_FUNC_END(sve_save_state)
SYM_FUNC_START(sve_load_state) - sve_load 0, x1, x2, x3, 4, x5 + sve_load 0, x1, x2, 4 ret SYM_FUNC_END(sve_load_state)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index cb024befaccd..d5c582625793 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -316,12 +316,13 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
- if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) + if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) { + sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1); sve_load_state(sve_pffr(¤t->thread), - ¤t->thread.uw.fpsimd_state.fpsr, true, - sve_vq_from_vl(task_get_sve_vl(current)) - 1); - else + ¤t->thread.uw.fpsimd_state.fpsr, true); + } else { fpsimd_load_state(¤t->thread.uw.fpsimd_state); + } }
/* @@ -1421,10 +1422,10 @@ void __efi_fpsimd_end(void) likely(__this_cpu_read(efi_sve_state_used))) { char const *sve_state = this_cpu_ptr(efi_sve_state);
+ sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1); sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, - true, - sve_vq_from_vl(sve_get_vl()) - 1); + true);
__this_cpu_write(efi_sve_state_used, false); } else { diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S index 1bb3b04b84e6..e950875e31ce 100644 --- a/arch/arm64/kvm/hyp/fpsimd.S +++ b/arch/arm64/kvm/hyp/fpsimd.S @@ -22,7 +22,7 @@ SYM_FUNC_END(__fpsimd_restore_state)
SYM_FUNC_START(__sve_restore_state) mov x2, #1 - __sve_load 0, x1, x2, 3 + sve_load 0, x1, x2, 3 ret SYM_FUNC_END(__sve_restore_state)
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.16-rc1 commit 5838a155798479e3fe7e1482a31f0db657d5bbdd category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
As for SVE we will track a per task SME vector length for tasks. Convert the existing storage for the vector length into an array and update fpsimd_flush_task() to initialise this in a function.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211019172247.3045838-10-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/processor.h | 44 +++++++++++-- arch/arm64/include/asm/thread_info.h | 2 +- arch/arm64/kernel/fpsimd.c | 97 ++++++++++++++++------------ 3 files changed, 95 insertions(+), 48 deletions(-)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index ef4e8142a1b4..39fe01aaf76d 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -163,8 +163,8 @@ struct thread_struct { u64 sctlr_tcf0; u64 gcr_user_incl; #endif - KABI_RESERVE(1) - KABI_RESERVE(2) + KABI_USE(1, unsigned int vl[ARM64_VEC_MAX]) + KABI_USE(2, unsigned int vl_onexec[ARM64_VEC_MAX]) KABI_RESERVE(3) KABI_RESERVE(4) KABI_RESERVE(5) @@ -173,15 +173,45 @@ struct thread_struct { KABI_RESERVE(8) };
+static inline unsigned int thread_get_vl(struct thread_struct *thread, + enum vec_type type) +{ + return thread->vl[type]; +} + static inline unsigned int thread_get_sve_vl(struct thread_struct *thread) { - return thread->sve_vl; + return thread_get_vl(thread, ARM64_VEC_SVE); +} + +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type); +void task_set_vl(struct task_struct *task, enum vec_type type, + unsigned long vl); +void task_set_vl_onexec(struct task_struct *task, enum vec_type type, + unsigned long vl); +unsigned int task_get_vl_onexec(const struct task_struct *task, + enum vec_type type); + +static inline unsigned int task_get_sve_vl(const struct task_struct *task) +{ + return task_get_vl(task, ARM64_VEC_SVE); }
-unsigned int task_get_sve_vl(const struct task_struct *task); -void task_set_sve_vl(struct task_struct *task, unsigned long vl); -unsigned int task_get_sve_vl_onexec(const struct task_struct *task); -void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl); +static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl) +{ + task_set_vl(task, ARM64_VEC_SVE, vl); +} + +static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task) +{ + return task_get_vl_onexec(task, ARM64_VEC_SVE); +} + +static inline void task_set_sve_vl_onexec(struct task_struct *task, + unsigned long vl) +{ + task_set_vl_onexec(task, ARM64_VEC_SVE, vl); +}
static inline void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size) diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index af49b6190aee..be1fe9363968 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -81,7 +81,7 @@ void arch_release_task_struct(struct task_struct *tsk); #define TIF_SINGLESTEP 21 #define TIF_32BIT 22 /* AARCH32 process */ #define TIF_SVE 23 /* Scalable Vector Extension in use */ -#define TIF_SVE_VL_INHERIT 24 /* Inherit sve_vl_onexec across exec */ +#define TIF_SVE_VL_INHERIT 24 /* Inherit SVE vl_onexec across exec */ #define TIF_SSBD 25 /* Wants SSB mitigation */ #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_32BIT_AARCH64 27 /* 32 bit process on AArch64(ILP32) */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index d5c582625793..852a20c40e3e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -133,6 +133,17 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = { #endif };
+static unsigned int vec_vl_inherit_flag(enum vec_type type) +{ + switch (type) { + case ARM64_VEC_SVE: + return TIF_SVE_VL_INHERIT; + default: + WARN_ON_ONCE(1); + return 0; + } +} + struct vl_config { int __default_vl; /* Default VL for tasks */ }; @@ -237,24 +248,27 @@ static void sve_free(struct task_struct *task) __sve_free(task); }
-unsigned int task_get_sve_vl(const struct task_struct *task) +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type) { - return task->thread.sve_vl; + return task->thread.vl[type]; }
-void task_set_sve_vl(struct task_struct *task, unsigned long vl) +void task_set_vl(struct task_struct *task, enum vec_type type, + unsigned long vl) { - task->thread.sve_vl = vl; + task->thread.vl[type] = vl; }
-unsigned int task_get_sve_vl_onexec(const struct task_struct *task) +unsigned int task_get_vl_onexec(const struct task_struct *task, + enum vec_type type) { - return task->thread.sve_vl_onexec; + return task->thread.vl_onexec[type]; }
-void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl) +void task_set_vl_onexec(struct task_struct *task, enum vec_type type, + unsigned long vl) { - task->thread.sve_vl_onexec = vl; + task->thread.vl_onexec[type] = vl; }
/* @@ -1072,10 +1086,43 @@ void fpsimd_thread_switch(struct task_struct *next) __put_cpu_fpsimd_context(); }
-void fpsimd_flush_thread(void) +static void fpsimd_flush_thread_vl(enum vec_type type) { int vl, supported_vl;
+ /* + * Reset the task vector length as required. This is where we + * ensure that all user tasks have a valid vector length + * configured: no kernel task can become a user task without + * an exec and hence a call to this function. By the time the + * first call to this function is made, all early hardware + * probing is complete, so __sve_default_vl should be valid. + * If a bug causes this to go wrong, we make some noise and + * try to fudge thread.sve_vl to a safe value here. + */ + vl = task_get_vl_onexec(current, type); + if (!vl) + vl = get_default_vl(type); + + if (WARN_ON(!sve_vl_valid(vl))) + vl = SVE_VL_MIN; + + supported_vl = find_supported_vector_length(type, vl); + if (WARN_ON(supported_vl != vl)) + vl = supported_vl; + + task_set_vl(current, type, vl); + + /* + * If the task is not set to inherit, ensure that the vector + * length will be reset by a subsequent exec: + */ + if (!test_thread_flag(vec_vl_inherit_flag(type))) + task_set_vl_onexec(current, type, 0); +} + +void fpsimd_flush_thread(void) +{ if (!system_supports_fpsimd()) return;
@@ -1088,37 +1135,7 @@ void fpsimd_flush_thread(void) if (system_supports_sve()) { clear_thread_flag(TIF_SVE); sve_free(current); - - /* - * Reset the task vector length as required. - * This is where we ensure that all user tasks have a valid - * vector length configured: no kernel task can become a user - * task without an exec and hence a call to this function. - * By the time the first call to this function is made, all - * early hardware probing is complete, so __sve_default_vl - * should be valid. - * If a bug causes this to go wrong, we make some noise and - * try to fudge thread.sve_vl to a safe value here. - */ - vl = task_get_sve_vl_onexec(current); - if (!vl) - vl = get_sve_default_vl(); - - if (WARN_ON(!sve_vl_valid(vl))) - vl = SVE_VL_MIN; - - supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl); - if (WARN_ON(supported_vl != vl)) - vl = supported_vl; - - task_set_sve_vl(current, vl); - - /* - * If the task is not set to inherit, ensure that the vector - * length will be reset by a subsequent exec: - */ - if (!test_thread_flag(TIF_SVE_VL_INHERIT)) - task_set_sve_vl_onexec(current, 0); + fpsimd_flush_thread_vl(ARM64_VEC_SVE); }
put_cpu_fpsimd_context();
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.17-rc1 commit 97bcbee404e386195614b58e98ab2e7eddd89a5c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The vector length configuration for SME is very similar to that for SVE so in order to allow reuse refactor the SVE configuration so that it takes the vector type from the struct ctl_table. Since there's no dedicated space for this we repurpose the extra1 field to store the vector type, this is otherwise unused for integer sysctls.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 852a20c40e3e..2ee70c4413e9 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -15,6 +15,7 @@ #include <linux/compiler.h> #include <linux/cpu.h> #include <linux/cpu_pm.h> +#include <linux/ctype.h> #include <linux/kernel.h> #include <linux/linkage.h> #include <linux/irqflags.h> @@ -404,12 +405,13 @@ static unsigned int find_supported_vector_length(enum vec_type type,
#if defined(CONFIG_ARM64_SVE) && defined(CONFIG_SYSCTL)
-static int sve_proc_do_default_vl(struct ctl_table *table, int write, +static int vec_proc_do_default_vl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct vl_info *info = &vl_info[ARM64_VEC_SVE]; + struct vl_info *info = table->extra1; + enum vec_type type = info->type; int ret; - int vl = get_sve_default_vl(); + int vl = get_default_vl(type); struct ctl_table tmp_table = { .data = &vl, .maxlen = sizeof(vl), @@ -426,7 +428,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write, if (!sve_vl_valid(vl)) return -EINVAL;
- set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl)); + set_default_vl(type, find_supported_vector_length(type, vl)); return 0; }
@@ -434,7 +436,8 @@ static struct ctl_table sve_default_vl_table[] = { { .procname = "sve_default_vector_length", .mode = 0644, - .proc_handler = sve_proc_do_default_vl, + .proc_handler = vec_proc_do_default_vl, + .extra1 = &vl_info[ARM64_VEC_SVE], }, { } }; @@ -1105,7 +1108,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type) vl = get_default_vl(type);
if (WARN_ON(!sve_vl_valid(vl))) - vl = SVE_VL_MIN; + vl = vl_info[type].min_vl;
supported_vl = find_supported_vector_length(type, vl); if (WARN_ON(supported_vl != vl))
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.17-rc1 commit 30c43e73b3fa2d75f5d4e72fea345380ba5eb21b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
In preparation for adding SME support update the bulk of the implementation for the vector length configuration prctl() calls to be independent of vector type.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211210184133.320748-3-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 6 ++--- arch/arm64/kernel/fpsimd.c | 47 ++++++++++++++++++--------------- arch/arm64/kernel/ptrace.c | 4 +-- arch/arm64/kvm/reset.c | 8 +++--- 4 files changed, 34 insertions(+), 31 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 785bcf91c021..b7f42a72cde5 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -52,8 +52,8 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_save_and_flush_cpu_state(void);
-/* Maximum VL that SVE VL-agnostic software can transparently support */ -#define SVE_VL_ARCH_MAX 0x100 +/* Maximum VL that SVE/SME VL-agnostic software can transparently support */ +#define VL_ARCH_MAX 0x100
/* Offset of FFR in the SVE register dump */ static inline size_t sve_ffr_offset(int vl) @@ -125,7 +125,7 @@ extern void fpsimd_sync_to_sve(struct task_struct *task); extern void sve_sync_to_fpsimd(struct task_struct *task); extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
-extern int sve_set_vector_length(struct task_struct *task, +extern int vec_set_vector_length(struct task_struct *task, enum vec_type type, unsigned long vl, unsigned long flags);
extern int sve_set_current_vl(unsigned long arg); diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 2ee70c4413e9..633319462794 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -630,7 +630,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task) __fpsimd_to_sve(sst, fst, vq); }
-int sve_set_vector_length(struct task_struct *task, +int vec_set_vector_length(struct task_struct *task, enum vec_type type, unsigned long vl, unsigned long flags) { if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT | @@ -641,33 +641,35 @@ int sve_set_vector_length(struct task_struct *task, return -EINVAL;
/* - * Clamp to the maximum vector length that VL-agnostic SVE code can - * work with. A flag may be assigned in the future to allow setting - * of larger vector lengths without confusing older software. + * Clamp to the maximum vector length that VL-agnostic code + * can work with. A flag may be assigned in the future to + * allow setting of larger vector lengths without confusing + * older software. */ - if (vl > SVE_VL_ARCH_MAX) - vl = SVE_VL_ARCH_MAX; + if (vl > VL_ARCH_MAX) + vl = VL_ARCH_MAX;
- vl = find_supported_vector_length(ARM64_VEC_SVE, vl); + vl = find_supported_vector_length(type, vl);
if (flags & (PR_SVE_VL_INHERIT | PR_SVE_SET_VL_ONEXEC)) - task_set_sve_vl_onexec(task, vl); + task_set_vl_onexec(task, type, vl); else /* Reset VL to system default on next exec: */ - task_set_sve_vl_onexec(task, 0); + task_set_vl_onexec(task, type, 0);
/* Only actually set the VL if not deferred: */ if (flags & PR_SVE_SET_VL_ONEXEC) goto out;
- if (vl == task_get_sve_vl(task)) + if (vl == task_get_vl(task, type)) goto out;
/* * To ensure the FPSIMD bits of the SVE vector registers are preserved, * write any live register state back to task_struct, and convert to a - * non-SVE thread. + * regular FPSIMD thread. Since the vector length can only be changed + * with a syscall we can't be in streaming mode while reconfiguring. */ if (task == current) { get_cpu_fpsimd_context(); @@ -688,10 +690,10 @@ int sve_set_vector_length(struct task_struct *task, */ sve_free(task);
- task_set_sve_vl(task, vl); + task_set_vl(task, type, vl);
out: - update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT, + update_tsk_thread_flag(task, vec_vl_inherit_flag(type), flags & PR_SVE_VL_INHERIT);
return 0; @@ -699,20 +701,21 @@ int sve_set_vector_length(struct task_struct *task,
/* * Encode the current vector length and flags for return. - * This is only required for prctl(): ptrace has separate fields + * This is only required for prctl(): ptrace has separate fields. + * SVE and SME use the same bits for _ONEXEC and _INHERIT. * - * flags are as for sve_set_vector_length(). + * flags are as for vec_set_vector_length(). */ -static int sve_prctl_status(unsigned long flags) +static int vec_prctl_status(enum vec_type type, unsigned long flags) { int ret;
if (flags & PR_SVE_SET_VL_ONEXEC) - ret = task_get_sve_vl_onexec(current); + ret = task_get_vl_onexec(current, type); else - ret = task_get_sve_vl(current); + ret = task_get_vl(current, type);
- if (test_thread_flag(TIF_SVE_VL_INHERIT)) + if (test_thread_flag(vec_vl_inherit_flag(type))) ret |= PR_SVE_VL_INHERIT;
return ret; @@ -730,11 +733,11 @@ int sve_set_current_vl(unsigned long arg) if (!system_supports_sve() || is_compat_task()) return -EINVAL;
- ret = sve_set_vector_length(current, vl, flags); + ret = vec_set_vector_length(current, ARM64_VEC_SVE, vl, flags); if (ret) return ret;
- return sve_prctl_status(flags); + return vec_prctl_status(ARM64_VEC_SVE, flags); }
/* PR_SVE_GET_VL */ @@ -743,7 +746,7 @@ int sve_get_current_vl(void) if (!system_supports_sve() || is_compat_task()) return -EINVAL;
- return sve_prctl_status(0); + return vec_prctl_status(ARM64_VEC_SVE, 0); }
static void vec_probe_vqs(struct vl_info *info, diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 1bd130a4d488..b125580797d9 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -813,9 +813,9 @@ static int sve_set(struct task_struct *target,
/* * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by - * sve_set_vector_length(), which will also validate them for us: + * vec_set_vector_length(), which will also validate them for us: */ - ret = sve_set_vector_length(target, header.vl, + ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl, ((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16); if (ret) goto out; diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 032911b34f5a..4814d45a9cd4 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -105,10 +105,10 @@ int kvm_arm_init_sve(void) * The get_sve_reg()/set_sve_reg() ioctl interface will need * to be extended with multiple register slice support in * order to support vector lengths greater than - * SVE_VL_ARCH_MAX: + * VL_ARCH_MAX: */ - if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX)) - kvm_sve_max_vl = SVE_VL_ARCH_MAX; + if (WARN_ON(kvm_sve_max_vl > VL_ARCH_MAX)) + kvm_sve_max_vl = VL_ARCH_MAX;
/* * Don't even try to make use of vector lengths that @@ -160,7 +160,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu) * set_sve_vls(). Double-check here just to be sure: */ if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() || - vl > SVE_VL_ARCH_MAX)) + vl > VL_ARCH_MAX)) return -EIO;
buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.17-rc1 commit aed34d9e52b80b4e7485119272c77c5553a21499 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
As suggested by Luis for the SME version of this explicitly say that the vector length should be extracted from the return value of a set vector length prctl() with a bitwise and rather than just any old and.
Suggested-by: Luis Machado Luis.Machado@arm.com Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20211210184133.320748-4-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- Documentation/arm64/sve.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst index 03137154299e..9d9a4de5bc34 100644 --- a/Documentation/arm64/sve.rst +++ b/Documentation/arm64/sve.rst @@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL) vector length change (which would only normally be the case between a fork() or vfork() and the corresponding execve() in typical use).
- To extract the vector length from the result, and it with + To extract the vector length from the result, bitwise and it with PR_SVE_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error:
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.18-rc1 commit 0a2eec83c2c23cf609e781732b338a9a4f18e00c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Since all the fields in the main ID registers are 4 bits wide we have up until now not bothered specifying the width in the code. Since we now wish to use this mechanism to enumerate features from the floating point feature registers which do not follow this pattern add a width to the table. This means updating all the existing table entries but makes it less likely that we run into issues in future due to implicitly assuming a 4 bit width.
Signed-off-by: Mark Brown broonie@kernel.org Cc: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220207152109.197566-4-broonie@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 165 +++++++++++++++++----------- 2 files changed, 100 insertions(+), 66 deletions(-)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 44c856df8848..3fa6cf044c52 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -335,6 +335,7 @@ struct arm64_cpu_capabilities { struct { /* Feature register checking */ u32 sys_reg; u8 field_pos; + u8 field_width; u8 min_field_value; u8 hwcap_type; bool sign; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 1a21eaf14de6..610bc0671705 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1213,7 +1213,9 @@ early_param("lse", parse_lse); static bool feature_matches(u64 reg, const struct arm64_cpu_capabilities *entry) { - int val = cpuid_feature_extract_field(reg, entry->field_pos, entry->sign); + int val = cpuid_feature_extract_field_width(reg, entry->field_pos, + entry->field_width, + entry->sign);
return val >= entry->min_field_value; } @@ -1839,6 +1841,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64MMFR0_EL1, .field_pos = ID_AA64MMFR0_ECV_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, }, @@ -1850,6 +1853,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64MMFR1_EL1, .field_pos = ID_AA64MMFR1_PAN_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, .cpu_enable = cpu_enable_pan, @@ -1863,6 +1867,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64MMFR1_EL1, .field_pos = ID_AA64MMFR1_PAN_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 3, }, @@ -1875,6 +1880,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature_lse, .sys_reg = SYS_ID_AA64ISAR0_EL1, .field_pos = ID_AA64ISAR0_ATOMICS_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 2, }, @@ -1902,6 +1908,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64PFR0_EL0_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR0_EL0_32BIT_64BIT, }, #ifdef CONFIG_KVM @@ -1913,6 +1920,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64PFR0_EL1_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR0_EL1_32BIT_64BIT, }, #endif @@ -1927,6 +1935,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { */ .sys_reg = SYS_ID_AA64PFR0_EL1, .field_pos = ID_AA64PFR0_CSV3_SHIFT, + .field_width = 4, .min_field_value = 1, .matches = unmap_kernel_at_el0, .cpu_enable = kpti_install_ng_mappings, @@ -1946,6 +1955,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64ISAR1_EL1, .field_pos = ID_AA64ISAR1_DPB_SHIFT, + .field_width = 4, .min_field_value = 1, }, { @@ -1956,6 +1966,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64ISAR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64ISAR1_DPB_SHIFT, + .field_width = 4, .min_field_value = 2, }, #endif @@ -1967,6 +1978,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64PFR0_SVE_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR0_SVE, .matches = has_cpuid_feature, .cpu_enable = sve_kernel_enable, @@ -1981,6 +1993,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64PFR0_RAS_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR0_RAS_V1, .cpu_enable = cpu_clear_disr, }, @@ -2011,6 +2024,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64PFR0_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64PFR0_AMU_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR0_AMU, .cpu_enable = cpu_amu_enable, }, @@ -2035,6 +2049,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64MMFR2_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64MMFR2_FWB_SHIFT, + .field_width = 4, .min_field_value = 1, .matches = has_cpuid_feature, .cpu_enable = cpu_has_fwb, @@ -2046,6 +2061,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64MMFR2_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64MMFR2_TTL_SHIFT, + .field_width = 4, .min_field_value = 1, .matches = has_cpuid_feature, }, @@ -2056,6 +2072,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64ISAR0_EL1, .field_pos = ID_AA64ISAR0_TLB_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = ID_AA64ISAR0_TLB_RANGE, }, @@ -2074,6 +2091,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64MMFR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64MMFR1_HADBS_SHIFT, + .field_width = 4, .min_field_value = 2, .matches = has_hw_dbm, .cpu_enable = cpu_enable_hw_dbm, @@ -2086,6 +2104,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64ISAR0_EL1, .field_pos = ID_AA64ISAR0_CRC32_SHIFT, + .field_width = 4, .min_field_value = 1, }, { @@ -2095,6 +2114,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64PFR1_EL1, .field_pos = ID_AA64PFR1_SSBS_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = ID_AA64PFR1_SSBS_PSTATE_ONLY, }, @@ -2107,6 +2127,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64MMFR2_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64MMFR2_CNP_SHIFT, + .field_width = 4, .min_field_value = 1, .cpu_enable = cpu_enable_cnp, }, @@ -2118,6 +2139,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64ISAR1_EL1, .field_pos = ID_AA64ISAR1_SB_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, }, @@ -2129,6 +2151,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64ISAR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64ISAR1_APA_SHIFT, + .field_width = 4, .min_field_value = ID_AA64ISAR1_APA_ARCHITECTED, .matches = has_address_auth_cpucap, }, @@ -2139,6 +2162,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64ISAR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64ISAR1_API_SHIFT, + .field_width = 4, .min_field_value = ID_AA64ISAR1_API_IMP_DEF, .matches = has_address_auth_cpucap, }, @@ -2154,6 +2178,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64ISAR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64ISAR1_GPA_SHIFT, + .field_width = 4, .min_field_value = ID_AA64ISAR1_GPA_ARCHITECTED, .matches = has_cpuid_feature, }, @@ -2164,6 +2189,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .sys_reg = SYS_ID_AA64ISAR1_EL1, .sign = FTR_UNSIGNED, .field_pos = ID_AA64ISAR1_GPI_SHIFT, + .field_width = 4, .min_field_value = ID_AA64ISAR1_GPI_IMP_DEF, .matches = has_cpuid_feature, }, @@ -2184,6 +2210,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = can_use_gic_priorities, .sys_reg = SYS_ID_AA64PFR0_EL1, .field_pos = ID_AA64PFR0_GIC_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, }, @@ -2195,6 +2222,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .type = ARM64_CPUCAP_SYSTEM_FEATURE, .sys_reg = SYS_ID_AA64MMFR2_EL1, .sign = FTR_UNSIGNED, + .field_width = 4, .field_pos = ID_AA64MMFR2_E0PD_SHIFT, .matches = has_cpuid_feature, .min_field_value = 1, @@ -2209,6 +2237,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64ISAR0_EL1, .field_pos = ID_AA64ISAR0_RNDR_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, }, @@ -2226,6 +2255,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .cpu_enable = bti_enable, .sys_reg = SYS_ID_AA64PFR1_EL1, .field_pos = ID_AA64PFR1_BT_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR1_BT_BTI, .sign = FTR_UNSIGNED, }, @@ -2238,6 +2268,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64PFR1_EL1, .field_pos = ID_AA64PFR1_MTE_SHIFT, + .field_width = 4, .min_field_value = ID_AA64PFR1_MTE, .sign = FTR_UNSIGNED, .cpu_enable = cpu_enable_mte, @@ -2264,10 +2295,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = { {}, };
-#define HWCAP_CPUID_MATCH(reg, field, s, min_value) \ +#define HWCAP_CPUID_MATCH(reg, field, width, s, min_value) \ .matches = has_cpuid_feature, \ .sys_reg = reg, \ .field_pos = field, \ + .field_width = width, \ .sign = s, \ .min_field_value = min_value,
@@ -2277,10 +2309,10 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .hwcap_type = cap_type, \ .hwcap = cap, \
-#define HWCAP_CAP(reg, field, s, min_value, cap_type, cap) \ +#define HWCAP_CAP(reg, field, width, s, min_value, cap_type, cap) \ { \ __HWCAP_CAP(#cap, cap_type, cap) \ - HWCAP_CPUID_MATCH(reg, field, s, min_value) \ + HWCAP_CPUID_MATCH(reg, field, width, s, min_value) \ }
#define HWCAP_MULTI_CAP(list, cap_type, cap) \ @@ -2300,11 +2332,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = { static const struct arm64_cpu_capabilities ptr_auth_hwcap_addr_matches[] = { { HWCAP_CPUID_MATCH(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_APA_SHIFT, - FTR_UNSIGNED, ID_AA64ISAR1_APA_ARCHITECTED) + 4, FTR_UNSIGNED, + ID_AA64ISAR1_APA_ARCHITECTED) }, { HWCAP_CPUID_MATCH(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_API_SHIFT, - FTR_UNSIGNED, ID_AA64ISAR1_API_IMP_DEF) + 4, FTR_UNSIGNED, ID_AA64ISAR1_API_IMP_DEF) }, {}, }; @@ -2312,77 +2345,77 @@ static const struct arm64_cpu_capabilities ptr_auth_hwcap_addr_matches[] = { static const struct arm64_cpu_capabilities ptr_auth_hwcap_gen_matches[] = { { HWCAP_CPUID_MATCH(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_GPA_SHIFT, - FTR_UNSIGNED, ID_AA64ISAR1_GPA_ARCHITECTED) + 4, FTR_UNSIGNED, ID_AA64ISAR1_GPA_ARCHITECTED) }, { HWCAP_CPUID_MATCH(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_GPI_SHIFT, - FTR_UNSIGNED, ID_AA64ISAR1_GPI_IMP_DEF) + 4, FTR_UNSIGNED, ID_AA64ISAR1_GPI_IMP_DEF) }, {}, }; #endif
static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = { - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_PMULL), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AES), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA1_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA1), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA2_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA2), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA2_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_SHA512), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_CRC32_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_CRC32), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_ATOMICS_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_ATOMICS), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_RDM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDRDM), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA3_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA3), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SM3), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SM4), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDDP), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDFHM), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FLAGM), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_FLAGM2), - HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_RNDR_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RNG), - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_FP), - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FPHP), - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_ASIMD), - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDHP), - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_DIT_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DIT), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DPB_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DCPOP), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DPB_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_DCPODP), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_JSCVT_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_JSCVT), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FCMA_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FCMA), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_LRCPC), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_ILRCPC), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FRINTTS_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FRINT), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_SB_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SB), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_BF16_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_BF16), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DGH_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DGH), - HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_I8MM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_I8MM), - HWCAP_CAP(SYS_ID_AA64MMFR2_EL1, ID_AA64MMFR2_AT_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_USCAT), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_PMULL), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AES), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA1_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA1), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA2_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA2), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA2_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_SHA512), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_CRC32_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_CRC32), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_ATOMICS_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_ATOMICS), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_RDM_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDRDM), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SHA3_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SHA3), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SM3), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SM4), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDDP), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDFHM), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FLAGM), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_FLAGM2), + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_RNDR_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RNG), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, 4, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_FP), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, 4, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FPHP), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, 4, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_ASIMD), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, 4, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDHP), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_DIT_SHIFT, 4, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DIT), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DPB_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DCPOP), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DPB_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_DCPODP), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_JSCVT_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_JSCVT), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FCMA_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FCMA), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_LRCPC), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, 4, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_ILRCPC), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FRINTTS_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FRINT), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_SB_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SB), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_BF16_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_BF16), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_DGH_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_DGH), + HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_I8MM_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_I8MM), + HWCAP_CAP(SYS_ID_AA64MMFR2_EL1, ID_AA64MMFR2_AT_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_USCAT), #ifdef CONFIG_ARM64_SVE - HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_SVE_SHIFT, FTR_UNSIGNED, ID_AA64PFR0_SVE, CAP_HWCAP, KERNEL_HWCAP_SVE), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SVEVER_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_SVEVER_SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_AES_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_AES, CAP_HWCAP, KERNEL_HWCAP_SVEAES), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_AES_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_AES_PMULL, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_BITPERM_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_BITPERM, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_BF16_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_BF16, CAP_HWCAP, KERNEL_HWCAP_SVEBF16), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SHA3_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_SHA3, CAP_HWCAP, KERNEL_HWCAP_SVESHA3), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SM4_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_SM4, CAP_HWCAP, KERNEL_HWCAP_SVESM4), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_I8MM_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_I8MM, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_F32MM_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_F32MM, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM), - HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_F64MM_SHIFT, FTR_UNSIGNED, ID_AA64ZFR0_F64MM, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM), + HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_SVE_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR0_SVE, CAP_HWCAP, KERNEL_HWCAP_SVE), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SVEVER_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_SVEVER_SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_AES_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_AES, CAP_HWCAP, KERNEL_HWCAP_SVEAES), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_AES_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_AES_PMULL, CAP_HWCAP, KERNEL_HWCAP_SVEPMULL), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_BITPERM_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_BITPERM, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_BF16_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_BF16, CAP_HWCAP, KERNEL_HWCAP_SVEBF16), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SHA3_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_SHA3, CAP_HWCAP, KERNEL_HWCAP_SVESHA3), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_SM4_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_SM4, CAP_HWCAP, KERNEL_HWCAP_SVESM4), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_I8MM_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_I8MM, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_F32MM_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_F32MM, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM), + HWCAP_CAP(SYS_ID_AA64ZFR0_EL1, ID_AA64ZFR0_F64MM_SHIFT, 4, FTR_UNSIGNED, ID_AA64ZFR0_F64MM, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM), #endif - HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SSBS_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_SSBS_PSTATE_INSNS, CAP_HWCAP, KERNEL_HWCAP_SSBS), + HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SSBS_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_SSBS_PSTATE_INSNS, CAP_HWCAP, KERNEL_HWCAP_SSBS), #ifdef CONFIG_ARM64_BTI - HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_BT_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_BT_BTI, CAP_HWCAP, KERNEL_HWCAP_BTI), + HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_BT_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_BT_BTI, CAP_HWCAP, KERNEL_HWCAP_BTI), #endif #ifdef CONFIG_ARM64_PTR_AUTH HWCAP_MULTI_CAP(ptr_auth_hwcap_addr_matches, CAP_HWCAP, KERNEL_HWCAP_PACA), HWCAP_MULTI_CAP(ptr_auth_hwcap_gen_matches, CAP_HWCAP, KERNEL_HWCAP_PACG), #endif #ifdef CONFIG_ARM64_MTE - HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE), + HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE), #endif /* CONFIG_ARM64_MTE */ - HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV), - HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP), - HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES), + HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV), + HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP), + HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES), {}, };
@@ -2411,15 +2444,15 @@ static bool compat_has_neon(const struct arm64_cpu_capabilities *cap, int scope) static const struct arm64_cpu_capabilities a32_elf_hwcaps[] = { #ifdef CONFIG_AARCH32_EL0 HWCAP_CAP_MATCH(compat_has_neon, CAP_COMPAT_HWCAP, COMPAT_HWCAP_NEON), - HWCAP_CAP(SYS_MVFR1_EL1, MVFR1_SIMDFMAC_SHIFT, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFPv4), + HWCAP_CAP(SYS_MVFR1_EL1, MVFR1_SIMDFMAC_SHIFT, 4, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFPv4), /* Arm v8 mandates MVFR0.FPDP == {0, 2}. So, piggy back on this for the presence of VFP support */ - HWCAP_CAP(SYS_MVFR0_EL1, MVFR0_FPDP_SHIFT, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFP), - HWCAP_CAP(SYS_MVFR0_EL1, MVFR0_FPDP_SHIFT, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFPv3), - HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_AES_SHIFT, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_PMULL), - HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_AES_SHIFT, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_AES), - HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_SHA1_SHIFT, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_SHA1), - HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_SHA2_SHIFT, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_SHA2), - HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_CRC32_SHIFT, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_CRC32), + HWCAP_CAP(SYS_MVFR0_EL1, MVFR0_FPDP_SHIFT, 4, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFP), + HWCAP_CAP(SYS_MVFR0_EL1, MVFR0_FPDP_SHIFT, 4, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP, COMPAT_HWCAP_VFPv3), + HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_AES_SHIFT, 4, FTR_UNSIGNED, 2, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_PMULL), + HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_AES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_AES), + HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_SHA1_SHIFT, 4, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_SHA1), + HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_SHA2_SHIFT, 4, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_SHA2), + HWCAP_CAP(SYS_ID_ISAR5_EL1, ID_ISAR5_CRC32_SHIFT, 4, FTR_UNSIGNED, 1, CAP_COMPAT_HWCAP2, COMPAT_HWCAP2_CRC32), #endif {}, };
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 96d32e630935c1636b0236c88779e81eff120e0a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Provide ABI documentation for SME similar to that for SVE. Due to the very large overlap around streaming SVE mode in both implementation and interfaces documentation for streaming mode SVE is added to the SVE document rather than the SME one.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Szabolcs Nagy szabolcs.nagy@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-5-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- Documentation/arm64/index.rst | 1 + Documentation/arm64/sme.rst | 428 ++++++++++++++++++++++++++++++++++ Documentation/arm64/sve.rst | 70 +++++- 3 files changed, 489 insertions(+), 10 deletions(-) create mode 100644 Documentation/arm64/sme.rst
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index 937634c49979..130a3d0d9a36 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -20,6 +20,7 @@ ARM64 Architecture perf pointer-authentication silicon-errata + sme sve tagged-address-abi tagged-pointers diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst new file mode 100644 index 000000000000..8ba677b87e90 --- /dev/null +++ b/Documentation/arm64/sme.rst @@ -0,0 +1,428 @@ +=================================================== +Scalable Matrix Extension support for AArch64 Linux +=================================================== + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Matrix Extension (SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. It should be read in conjunction with the SVE +documentation in sve.rst which provides details on the Streaming SVE mode +included in SME. + +This document does not aim to describe the SME architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SME is included in Appendix A. + + +1. General +----------- + +* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA + register state and TPIDR2_EL0 are tracked per thread. + +* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector + AT_HWCAP2 entry. Presence of this flag implies the presence of the SME + instructions and registers, and the Linux-specific system interfaces + described in this document. SME is reported in /proc/cpuinfo as "sme". + +* Support for the execution of SME instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS + instruction, and checking that the value of the SME field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP2_SME instead. + +* There are a number of optional SME features, presence of these is reported + through AT_HWCAP2 through: + + HWCAP2_SME_I16I64 + HWCAP2_SME_F64F64 + HWCAP2_SME_I8I32 + HWCAP2_SME_F16F32 + HWCAP2_SME_B16F32 + HWCAP2_SME_F32F32 + HWCAP2_SME_FA64 + + This list may be extended over time as the SME architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way + of detecting support for these regsets is to connect to a target process + first and then attempt a + + ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov). + +* Whenever ZA register values are exchanged in memory between userspace and + the kernel, the register value is encoded in memory as a series of horizontal + vectors from 0 to VL/8-1 stored in the same endianness invariant format as is + used for SVE vectors. + +* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, + in which case it is set to 0. + +2. Vector lengths +------------------ + +SME defines a second vector length similar to the SVE vector length which is +controls the size of the streaming mode SVE vectors and the ZA matrix array. +The ZA matrix is square with each side having as many bytes as a streaming +mode SVE vector. + + +3. Sharing of streaming and non-streaming mode SVE state +--------------------------------------------------------- + +It is implementation defined which if any parts of the SVE state are shared +between streaming and non-streaming modes. When switching between modes +via software interfaces such as ptrace if no register content is provided as +part of switching no state will be assumed to be shared and everything will +be zeroed. + + +4. System call behaviour +------------------------- + +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the + ZA matrix are preserved. + +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled + as per the standard SVE ABI. + +* Neither the SVE registers nor ZA are used to pass arguments to or receive + results from any syscall. + +* On process creation (eg, clone()) the newly created process will have + PSTATE.SM cleared. + +* All other SME state of a thread, including the currently configured vector + length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + +5. Signal handling +------------------- + +* Signal handlers are invoked with streaming mode and ZA disabled. + +* A new signal frame record za_context encodes the ZA register contents on + signal delivery. [1] + +* The signal frame record for ZA always contains basic metadata, in particular + the thread's vector length (in za_context.vl). + +* The ZA matrix may or may not be included in the record, depending on + the value of PSTATE.ZA. The registers are present if and only if: + za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + in which case PSTATE.ZA == 1. + +* If matrix data is present, the remainder of the record has a vl-dependent + size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to + them. + +* The matrix is stored as a series of horizontal vectors in the same format as + is used for SVE vectors. + +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. za_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no za_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then ZA is disabled. + +* If za_context is present in the signal frame and contains matrix data then + PSTATE.ZA is set to 1 and ZA is populated with the specified data. + +* The vector length cannot be changed via signal return. If za_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SME vector +length: + +prctl(PR_SME_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SME_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SME_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + Without PR_SME_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SME not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SME_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SME_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of + Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified, including both streaming and non-streaming SVE state. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + +prctl(PR_SME_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SME_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SME_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SME not supported. + + +7. ptrace extensions +--------------------- + +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE + state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in + sve.rst. + +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + + Refer to [2] for definitions. + +The regset data starts with struct user_za_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current streaming vector length, in bytes. + + max_vl + + Maximum possible streaming vector length for the target thread. + + flags + + Zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SME_PT_VL_INHERIT + + SME_PT_VL_ONEXEC (SETREGSET only). + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SME_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* The size and layout of the payload depends on the header fields. The + SME_PT_ZA_*() macros are provided to facilitate access to the data. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case the vector length and flags are changed and PSTATE.ZA is set to 0 + (along with any consequences of those changes). If a payload is provided + then PSTATE.ZA will be set to 1. + +* For SETREGSET, if the requested VL is not supported, the effect will be the + same as if the payload were omitted, except that an EIO error is reported. + No attempt is made to translate the payload data to the correct layout + for the vector length actually set. It is up to the caller to translate the + payload layout for the actual VL and retry. + +* The effect of writing a partial, incomplete payload is unspecified. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +* A NT_ARM_ZA note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread + when the coredump was generated. + + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sme_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 32 or the maximum + supported vector length, whichever is smaller and supported. This + determines the initial vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + + +Appendix A. SME programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SVE to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SME adds the following: + +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE + features are available. When supported EL0 software may enter and leave + streaming mode at any time. + + For best system performance it is strongly encouraged for software to enable + streaming mode only when it is actively being used. + +* A new vector length controlling the size of ZA and the Z registers when in + streaming mode, separately to the vector length used for SVE when not in + streaming mode. There is no requirement that either the currently selected + vector length or the set of vector lengths supported for the two modes in + a given system have any relationship. The streaming mode vector length + is referred to as SVL. + +* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most + operations on ZA require that streaming mode be enabled but ZA can be + enabled without streaming mode in order to load, save and retain data. + + For best system performance it is strongly encouraged for software to enable + ZA only when it is actively being used. + +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and + SMSTOP instructions or by access to the SVCR system register: + + * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid + data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is + changed from 0 to 1 all bits in ZA are cleared. + + * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value + of PSTATE.SM is changed then it is implementation defined if the subset + of the floating point register bits valid in both modes may be retained. + Any other bits will be cleared. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arm64/cpu-feature-registers.rst diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst index 9d9a4de5bc34..93c2c2990584 100644 --- a/Documentation/arm64/sve.rst +++ b/Documentation/arm64/sve.rst @@ -7,7 +7,9 @@ Author: Dave Martin Dave.Martin@arm.com Date: 4 August 2017
This document outlines briefly the interface provided to userspace by Linux in -order to support use of the ARM Scalable Vector Extension (SVE). +order to support use of the ARM Scalable Vector Extension (SVE), including +interactions with Streaming SVE mode added by the Scalable Matrix Extension +(SME).
This is an outline of the most important features and issues only and not intended to be exhaustive. @@ -23,6 +25,10 @@ model features for SVE is included in Appendix A. * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are tracked per-thread.
+* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present + in the system, when it is not supported and these interfaces are used to + access streaming mode FFR is read and written as zero. + * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector AT_HWCAP entry. Presence of this flag implies the presence of the SVE instructions and registers, and the Linux-specific system interfaces @@ -53,10 +59,19 @@ model features for SVE is included in Appendix A. which userspace can read using an MRS instruction. See elf_hwcaps.txt and cpu-feature-registers.txt for details.
+* On hardware that supports the SME extensions, HWCAP2_SME will also be + reported in the AT_HWCAP2 aux vector entry. Among other things SME adds + streaming mode which provides a subset of the SVE feature set using a + separate SME vector length and the same Z/V registers. See sme.rst + for more details. + * Debuggers should restrict themselves to interacting with the target via the NT_ARM_SVE regset. The recommended way of detecting support for this regset is to connect to a target process first and then attempt a - ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). + ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is + present and streaming SVE mode is in use the FPSIMD subset of registers + will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode + in the target.
* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory between userspace and the kernel, the register value is encoded in memory in @@ -126,6 +141,11 @@ the SVE instruction set architecture. are only present in fpsimd_context. For convenience, the content of V0..V31 is duplicated between sve_context and fpsimd_context.
+* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which + if set indicates that the thread is in streaming mode and the vector length + and register data (if present) describe the streaming SVE data and vector + length. + * The signal frame record for SVE always contains basic metadata, in particular the thread's vector length (in sve_context.vl).
@@ -170,6 +190,11 @@ When returning from a signal handler: the signal frame does not match the current vector length, the signal return attempt is treated as illegal, resulting in a forced SIGSEGV.
+* It is permitted to enter or leave streaming mode by setting or clearing + the SVE_SIG_FLAG_SM flag but applications should take care to ensure that + when doing so sve_context.vl and any register data are appropriate for the + vector length in the new mode. +
6. prctl extensions -------------------- @@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL) 7. ptrace extensions ---------------------
-* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and - PTRACE_SETREGSET. +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with + PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the + streaming mode SVE registers and NT_ARM_SVE describes the + non-streaming mode SVE registers. + + In this description a register set is referred to as being "live" when + the target is in the appropriate streaming or non-streaming mode and is + using data beyond the subset shared with the FPSIMD Vn registers.
Refer to [2] for definitions.
@@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing:
flags
- either + at most one of
SVE_PT_REGS_FPSIMD
@@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing:
SVE_PT_VL_ONEXEC (SETREGSET only).
+ If neither FPSIMD nor SVE flags are provided then no register + payload is available, this is only possible when SME is implemented. + + * The effects of changing the vector length and/or flags are equivalent to those documented for PR_SVE_SET_VL.
@@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing: case only the vector length and flags are changed (along with any consequences of those changes).
+* In systems supporting SME when in streaming mode a GETREGSET for + NT_REG_SVE will return only the user_sve_header with no register data, + similarly a GETREGSET for NT_REG_SSVE will not return any register data + when not in streaming mode. + +* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. + * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the requested VL is not supported, the effect will be the same as if the payload were omitted, except that an EIO error is reported. No @@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing: unspecified. It is up to the caller to translate the payload layout for the actual VL and retry.
+* Where SME is implemented it is not possible to GETREGSET the register + state for normal SVE when in streaming mode, nor the streaming mode + register state when in normal mode, regardless of the implementation defined + behaviour of the hardware for sharing data between the two modes. + +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in + streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode + if the target was not in streaming mode. + * The effect of writing a partial, incomplete payload is unspecified.
8. ELF coredump extensions ---------------------------
-* A NT_ARM_SVE note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread - when the coredump was generated. - +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated.
9. System runtime configuration --------------------------------
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit b4adc83b07706042ad6e6a767f6c04636db69bcc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The arm64 Scalable Matrix Extension (SME) adds some new system registers, fields in existing system registers and exception syndromes. This patch adds definitions for these for use in future patches implementing support for this extension.
Since SME will be the first user of FEAT_HCX in the kernel also include the definitions for enumerating it and the HCRX system register it adds.
Signed-off-by: Mark Brown broonie@kernel.org Acked-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-6-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/esr.h | 12 +++++- arch/arm64/include/asm/kvm_arm.h | 1 + arch/arm64/include/asm/sysreg.h | 68 ++++++++++++++++++++++++++++++++ arch/arm64/kernel/traps.c | 1 + 4 files changed, 81 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h index d7638b528c97..2daf0f3a4377 100644 --- a/arch/arm64/include/asm/esr.h +++ b/arch/arm64/include/asm/esr.h @@ -37,7 +37,8 @@ #define ESR_ELx_EC_ERET (0x1a) /* EL2 only */ /* Unallocated EC: 0x1B */ #define ESR_ELx_EC_FPAC (0x1C) /* EL1 and above */ -/* Unallocated EC: 0x1D - 0x1E */ +#define ESR_ELx_EC_SME (0x1D) +/* Unallocated EC: 0x1E */ #define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */ #define ESR_ELx_EC_IABT_LOW (0x20) #define ESR_ELx_EC_IABT_CUR (0x21) @@ -326,6 +327,15 @@ #define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\ ESR_ELx_CP15_32_ISS_DIR_READ)
+/* + * ISS values for SME traps + */ + +#define ESR_ELx_SME_ISS_SME_DISABLED 0 +#define ESR_ELx_SME_ISS_ILL 1 +#define ESR_ELx_SME_ISS_SM_DISABLED 2 +#define ESR_ELx_SME_ISS_ZA_DISABLED 3 + #ifndef __ASSEMBLY__ #include <asm/types.h>
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index cf35d1968c93..b7116d779017 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -279,6 +279,7 @@ #define CPTR_EL2_TCPAC (1U << 31) #define CPTR_EL2_TAM (1 << 30) #define CPTR_EL2_TTA (1 << 20) +#define CPTR_EL2_TSM (1 << 12) #define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT) #define CPTR_EL2_TZ (1 << 8) #define CPTR_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 098247c4c668..37b5ebf2953b 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -115,6 +115,10 @@ * System registers, organised loosely by encoding but grouped together * where the architected name contains an index. e.g. ID_MMFR<n>_EL1. */ +#define SYS_SVCR_SMSTOP_SM_EL0 sys_reg(0, 3, 4, 2, 3) +#define SYS_SVCR_SMSTART_SM_EL0 sys_reg(0, 3, 4, 3, 3) +#define SYS_SVCR_SMSTOP_SMZA_EL0 sys_reg(0, 3, 4, 6, 3) + #define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2) #define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0) #define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2) @@ -170,6 +174,7 @@ #define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0) #define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1) #define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4) +#define SYS_ID_AA64SMFR0_EL1 sys_reg(3, 0, 0, 4, 5)
#define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0) #define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1) @@ -192,6 +197,8 @@ #define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6)
#define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0) +#define SYS_SMPRI_EL1 sys_reg(3, 0, 1, 2, 4) +#define SYS_SMCR_EL1 sys_reg(3, 0, 1, 2, 6)
#define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) #define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1) @@ -333,6 +340,8 @@
/*** End of Statistical Profiling Extension ***/
+#define SMPRI_EL1_PRIORITY_MASK 0xf + #define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1) #define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2)
@@ -388,8 +397,13 @@ #define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0) #define SYS_CLIDR_EL1 sys_reg(3, 1, 0, 0, 1) #define SYS_GMID_EL1 sys_reg(3, 1, 0, 0, 4) +#define SYS_SMIDR_EL1 sys_reg(3, 1, 0, 0, 6) #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
+#define SYS_SMIDR_EL1_IMPLEMENTER_SHIFT 24 +#define SYS_SMIDR_EL1_SMPS_SHIFT 15 +#define SYS_SMIDR_EL1_AFFINITY_SHIFT 0 + #define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0)
#define SYS_CTR_EL0 sys_reg(3, 3, 0, 0, 1) @@ -398,6 +412,10 @@ #define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0) #define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1)
+#define SYS_SVCR_EL0 sys_reg(3, 3, 4, 2, 2) +#define SYS_SVCR_EL0_ZA_MASK 2 +#define SYS_SVCR_EL0_SM_MASK 1 + #define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0) #define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1) #define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2) @@ -414,6 +432,7 @@
#define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2) #define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3) +#define SYS_TPIDR2_EL0 sys_reg(3, 3, 13, 0, 5)
#define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7)
@@ -478,6 +497,9 @@ #define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7)
#define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0) +#define SYS_HCRX_EL2 sys_reg(3, 4, 1, 2, 2) +#define SYS_SMPRIMAP_EL2 sys_reg(3, 4, 1, 2, 5) +#define SYS_SMCR_EL2 sys_reg(3, 4, 1, 2, 6) #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0) #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) @@ -534,6 +556,7 @@ #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0) #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2) #define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0) +#define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6) #define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0) #define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1) #define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2) @@ -557,6 +580,7 @@ #define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2)
/* Common SCTLR_ELx flags. */ +#define SCTLR_ELx_ENTP2 (BIT(60)) #define SCTLR_ELx_DSSBS (BIT(44)) #define SCTLR_ELx_ATA (BIT(43))
@@ -752,6 +776,7 @@ #define ID_AA64PFR0_EL0_32BIT_64BIT 0x2
/* id_aa64pfr1 */ +#define ID_AA64PFR1_SME_SHIFT 24 #define ID_AA64PFR1_MPAMFRAC_SHIFT 16 #define ID_AA64PFR1_RASFRAC_SHIFT 12 #define ID_AA64PFR1_MTE_SHIFT 8 @@ -762,6 +787,7 @@ #define ID_AA64PFR1_SSBS_PSTATE_ONLY 1 #define ID_AA64PFR1_SSBS_PSTATE_INSNS 2 #define ID_AA64PFR1_BT_BTI 0x1 +#define ID_AA64PFR1_SME 1
#define ID_AA64PFR1_MTE_NI 0x0 #define ID_AA64PFR1_MTE_EL0 0x1 @@ -789,6 +815,23 @@ #define ID_AA64ZFR0_AES_PMULL 0x2 #define ID_AA64ZFR0_SVEVER_SVE2 0x1
+/* id_aa64smfr0 */ +#define ID_AA64SMFR0_FA64_SHIFT 63 +#define ID_AA64SMFR0_I16I64_SHIFT 52 +#define ID_AA64SMFR0_F64F64_SHIFT 48 +#define ID_AA64SMFR0_I8I32_SHIFT 36 +#define ID_AA64SMFR0_F16F32_SHIFT 35 +#define ID_AA64SMFR0_B16F32_SHIFT 34 +#define ID_AA64SMFR0_F32F32_SHIFT 32 + +#define ID_AA64SMFR0_FA64 0x1 +#define ID_AA64SMFR0_I16I64 0x4 +#define ID_AA64SMFR0_F64F64 0x1 +#define ID_AA64SMFR0_I8I32 0x4 +#define ID_AA64SMFR0_F16F32 0x1 +#define ID_AA64SMFR0_B16F32 0x1 +#define ID_AA64SMFR0_F32F32 0x1 + /* id_aa64mmfr0 */ #define ID_AA64MMFR0_ECV_SHIFT 60 #define ID_AA64MMFR0_FGT_SHIFT 56 @@ -822,6 +865,7 @@
/* id_aa64mmfr1 */ #define ID_AA64MMFR1_ECBHB_SHIFT 60 +#define ID_AA64MMFR1_HCX_SHIFT 40 #define ID_AA64MMFR1_AFP_SHIFT 44 #define ID_AA64MMFR1_ETS_SHIFT 36 #define ID_AA64MMFR1_TWED_SHIFT 32 @@ -1002,6 +1046,21 @@ #define ZCR_ELx_LEN_SIZE 9 #define ZCR_ELx_LEN_MASK 0x1ff
+#define SMCR_ELx_FA64_SHIFT 31 +#define SMCR_ELx_FA64_MASK (1 << SMCR_ELx_FA64_SHIFT) + +/* + * The SMCR_ELx_LEN_* definitions intentionally include bits [8:4] which + * are reserved by the SME architecture for future expansion of the LEN + * field, with compatible semantics. + */ +#define SMCR_ELx_LEN_SHIFT 0 +#define SMCR_ELx_LEN_SIZE 9 +#define SMCR_ELx_LEN_MASK 0x1ff + +#define CPACR_EL1_SMEN_EL1EN (BIT(24)) /* enable EL1 access */ +#define CPACR_EL1_SMEN_EL0EN (BIT(25)) /* enable EL0 access, if EL1EN set */ + #define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */ #define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */ #define CPACR_EL1_ZEN (CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN) @@ -1029,6 +1088,9 @@ #define SYS_TFSR_EL1_TF0 (UL(1) << SYS_TFSR_EL1_TF0_SHIFT) #define SYS_TFSR_EL1_TF1 (UL(1) << SYS_TFSR_EL1_TF1_SHIFT)
+/* HCRX_EL2 definitions */ +#define HCRX_EL2_SMPME_MASK (1 << 5) + /* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */ #define SYS_MPIDR_SAFE_VAL (BIT(31))
@@ -1150,4 +1212,10 @@
#endif
+/* HFG[WR]TR_EL2 bit definitions */ +#define HFGxTR_EL2_nTPIDR2_EL0_SHIFT 55 +#define HFGxTR_EL2_nTPIDR2_EL0_MASK BIT_MASK(HFGxTR_EL2_nTPIDR2_EL0_SHIFT) +#define HFGxTR_EL2_nSMPRI_EL1_SHIFT 54 +#define HFGxTR_EL2_nSMPRI_EL1_MASK BIT_MASK(HFGxTR_EL2_nSMPRI_EL1_SHIFT) + #endif /* __ASM_SYSREG_H */ diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 7c05bcf29013..e3339115c78f 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -736,6 +736,7 @@ static const char *esr_class_str[] = { [ESR_ELx_EC_SVE] = "SVE", [ESR_ELx_EC_ERET] = "ERET/ERETAA/ERETAB", [ESR_ELx_EC_FPAC] = "FPAC", + [ESR_ELx_EC_SME] = "SME", [ESR_ELx_EC_IMP_DEF] = "EL3 IMP DEF", [ESR_ELx_EC_IABT_LOW] = "IABT (lower EL)", [ESR_ELx_EC_IABT_CUR] = "IABT (current EL)",
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit ca8a4ebcff4465f0272637433c789a5e4a272626 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
As with SVE rather than impose ambitious toolchain requirements for SME we manually encode the few instructions which we require in order to perform the work the kernel needs to do. The instructions used to save and restore context are provided as assembler macros while those for entering and leaving streaming mode are done in asm volatile blocks since they are expected to be used from C.
We could do the SMSTART and SMSTOP operations with read/modify/write cycles on SVCR but using the aliases provided for individual field accesses should be slightly faster. These instructions are aliases for MSR but since our minimum toolchain requirements are old enough to mean that we can't use the sX_X_cX_cX_X form and they always use xzr rather than taking a value like write_sysreg_s() wants we just use .inst.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-7-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 25 +++++++++++++ arch/arm64/include/asm/fpsimdmacros.h | 54 +++++++++++++++++++++++++++ 2 files changed, 79 insertions(+)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index b7f42a72cde5..814136d00f94 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -247,6 +247,31 @@ static inline void sve_setup(void) { }
#endif /* ! CONFIG_ARM64_SVE */
+#ifdef CONFIG_ARM64_SME + +static inline void sme_smstart_sm(void) +{ + asm volatile(__msr_s(SYS_SVCR_SMSTART_SM_EL0, "xzr")); +} + +static inline void sme_smstop_sm(void) +{ + asm volatile(__msr_s(SYS_SVCR_SMSTOP_SM_EL0, "xzr")); +} + +static inline void sme_smstop(void) +{ + asm volatile(__msr_s(SYS_SVCR_SMSTOP_SMZA_EL0, "xzr")); +} + +#else + +static inline void sme_smstart_sm(void) { } +static inline void sme_smstop_sm(void) { } +static inline void sme_smstop(void) { } + +#endif /* ! CONFIG_ARM64_SME */ + /* For use by EFI runtime services calls only */ extern void __efi_fpsimd_begin(void); extern void __efi_fpsimd_end(void); diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index ab4290653a11..e638890fef27 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -93,6 +93,12 @@ .endif .endm
+.macro _sme_check_wv v + .if (\v) < 12 || (\v) > 15 + .error "Bad vector select register \v." + .endif +.endm + /* SVE instruction encodings for non-SVE-capable assemblers */
/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */ @@ -173,6 +179,54 @@ | (\np) .endm
+/* SME instruction encodings for non-SME-capable assemblers */ +/* (pre binutils 2.38/LLVM 13) */ + +/* RDSVL X\nx, #\imm */ +.macro _sme_rdsvl nx, imm + _check_general_reg \nx + _check_num (\imm), -0x20, 0x1f + .inst 0x04bf5800 \ + | (\nx) \ + | (((\imm) & 0x3f) << 5) +.endm + +/* + * STR (vector from ZA array): + * STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL] + */ +.macro _sme_str_zav nw, nxbase, offset=0 + _sme_check_wv \nw + _check_general_reg \nxbase + _check_num (\offset), -0x100, 0xff + .inst 0xe1200000 \ + | (((\nw) & 3) << 13) \ + | ((\nxbase) << 5) \ + | ((\offset) & 7) +.endm + +/* + * LDR (vector to ZA array): + * LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL] + */ +.macro _sme_ldr_zav nw, nxbase, offset=0 + _sme_check_wv \nw + _check_general_reg \nxbase + _check_num (\offset), -0x100, 0xff + .inst 0xe1000000 \ + | (((\nw) & 3) << 13) \ + | ((\nxbase) << 5) \ + | ((\offset) & 7) +.endm + +/* + * Zero the entire ZA array + * ZERO ZA + */ +.macro zero_za + .inst 0xc00800ff +.endm + .macro __for from:req, to:req .if (\from) == (\to) _for__body %\from
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.13-rc1 commit 31c00d2aeaa2da89361f5b64a64ca831433be5fc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The arm64 FEAT_FGT extension introduces a set of traps to EL2 for accesses to small sets of registers and instructions from EL1 and EL0. Currently Linux makes no use of this feature, ensure that it is not active at boot by disabling the traps during EL2 setup.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210401180942.35815-3-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/sysreg.h | 6 ++++++ arch/arm64/kernel/head.S | 22 ++++++++++++++++++++++ 2 files changed, 28 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 37b5ebf2953b..4691bc456bd5 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -496,11 +496,17 @@
#define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7)
+#define SYS_HFGRTR_EL2 sys_reg(3, 4, 1, 1, 4) +#define SYS_HFGWTR_EL2 sys_reg(3, 4, 1, 1, 5) +#define SYS_HFGITR_EL2 sys_reg(3, 4, 1, 1, 6) #define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0) #define SYS_HCRX_EL2 sys_reg(3, 4, 1, 2, 2) #define SYS_SMPRIMAP_EL2 sys_reg(3, 4, 1, 2, 5) #define SYS_SMCR_EL2 sys_reg(3, 4, 1, 2, 6) #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0) +#define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4) +#define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5) +#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6) #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) #define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index e4ba7c189bb2..5ad40c88087b 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -613,6 +613,26 @@ set_hcr: isb ret
+/* Disable any fine grained traps */ +.macro __init_el2_fgt + mrs x1, id_aa64mmfr0_el1 + ubfx x1, x1, #ID_AA64MMFR0_FGT_SHIFT, #4 + cbz x1, .Lskip_fgt_@ + + msr_s SYS_HDFGRTR_EL2, xzr + msr_s SYS_HDFGWTR_EL2, xzr + msr_s SYS_HFGRTR_EL2, xzr + msr_s SYS_HFGWTR_EL2, xzr + msr_s SYS_HFGITR_EL2, xzr + + mrs x1, id_aa64pfr0_el1 // AMU traps UNDEF without AMU + ubfx x1, x1, #ID_AA64PFR0_AMU_SHIFT, #4 + cbz x1, .Lskip_fgt_@ + + msr_s SYS_HAFGRTR_EL2, xzr +.Lskip_fgt_@: +.endm + SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) /* * When VHE is not in use, early init of EL2 and EL1 needs to be @@ -639,6 +659,8 @@ SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector msr_s SYS_ZCR_EL2, x1 // length for EL1.
+ __init_el2_fgt + /* Hypervisor stub */ 7: adr_l x0, __hyp_stub_vectors msr vbar_el2, x0
From: Alexandru Elisei alexandru.elisei@arm.com
mainline inclusion from mainline-v5.15-rc1 commit 50cb99fa89aa2bec2cab2f9917010bbd7769bfa3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Commit 31c00d2aeaa2 ("arm64: Disable fine grained traps on boot") zeroed the fine grained trap registers to prevent unwanted register traps from occuring. However, for the PMSNEVFR_EL1 register, the corresponding HDFG{R,W}TR_EL2.nPMSNEVFR_EL1 fields must be 1 to disable trapping. Set both fields to 1 if FEAT_SPEv1p2 is detected to disable read and write traps.
Fixes: 31c00d2aeaa2 ("arm64: Disable fine grained traps on boot") Cc: stable@vger.kernel.org # 5.13.x Signed-off-by: Alexandru Elisei alexandru.elisei@arm.com Reviewed-by: Mark Brown broonie@kernel.org Acked-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210824154523.906270-1-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Conflict: arch/arm64/include/asm/el2_setup.h does not exist Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/head.S | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 5ad40c88087b..dba96bcde727 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -619,8 +619,17 @@ set_hcr: ubfx x1, x1, #ID_AA64MMFR0_FGT_SHIFT, #4 cbz x1, .Lskip_fgt_@
- msr_s SYS_HDFGRTR_EL2, xzr - msr_s SYS_HDFGWTR_EL2, xzr + mov x0, xzr + mrs x1, id_aa64dfr0_el1 + ubfx x1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 + cmp x1, #3 + b.lt .Lset_fgt_@ + /* Disable PMSNEVFR_EL1 read and write traps */ + orr x0, x0, #(1 << 62) + +.Lset_fgt_@: + msr_s SYS_HDFGRTR_EL2, x0 + msr_s SYS_HDFGWTR_EL2, x0 msr_s SYS_HFGRTR_EL2, xzr msr_s SYS_HFGWTR_EL2, xzr msr_s SYS_HFGITR_EL2, xzr
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit b2cf6a23289b3268cc7915a09c0c8372147b2727 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
SME requires similar setup to that for SVE: disable traps to EL2 and make sure that the maximum vector length is available to EL1, for SME we have two traps - one for SME itself and one for TPIDR2.
In addition since we currently make no active use of priority control for SCMUs we map all SME priorities lower ELs may configure to 0, the architecture specified minimum priority, to ensure that nothing we manage is able to configure itself to consume excessive resources. This will need to be revisited should there be a need to manage SME priorities at runtime.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-8-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/head.S | 64 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 60 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index dba96bcde727..fdb7be3307b7 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -623,15 +623,26 @@ set_hcr: mrs x1, id_aa64dfr0_el1 ubfx x1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 cmp x1, #3 - b.lt .Lset_fgt_@ + b.lt .Lset_debug_fgt_@ /* Disable PMSNEVFR_EL1 read and write traps */ orr x0, x0, #(1 << 62)
-.Lset_fgt_@: +.Lset_debug_fgt_@: msr_s SYS_HDFGRTR_EL2, x0 msr_s SYS_HDFGWTR_EL2, x0 - msr_s SYS_HFGRTR_EL2, xzr - msr_s SYS_HFGWTR_EL2, xzr + + mov x0, xzr + mrs x1, id_aa64pfr1_el1 + ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4 + cbz x1, .Lset_fgt_@ + + /* Disable nVHE traps of TPIDR2 and SMPRI */ + orr x0, x0, #HFGxTR_EL2_nSMPRI_EL1_MASK + orr x0, x0, #HFGxTR_EL2_nTPIDR2_EL0_MASK + +.Lset_fgt_@: + msr_s SYS_HFGRTR_EL2, x0 + msr_s SYS_HFGWTR_EL2, x0 msr_s SYS_HFGITR_EL2, xzr
mrs x1, id_aa64pfr0_el1 // AMU traps UNDEF without AMU @@ -642,6 +653,50 @@ set_hcr: .Lskip_fgt_@: .endm
+/* SME register access and priority mapping */ +.macro __init_el2_nvhe_sme + mrs x1, id_aa64pfr1_el1 + ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4 + cbz x1, .Lskip_sme_@ + + bic x0, x0, #CPTR_EL2_TSM // Also disable SME traps + msr cptr_el2, x0 // Disable copro. traps to EL2 + isb + + mrs x1, sctlr_el2 + orr x1, x1, #SCTLR_ELx_ENTP2 // Disable TPIDR2 traps + msr sctlr_el2, x1 + isb + + mov x1, #0 // SMCR controls + + mrs_s x2, SYS_ID_AA64SMFR0_EL1 + ubfx x2, x2, #ID_AA64SMFR0_FA64_SHIFT, #1 // Full FP in SM? + cbz x2, .Lskip_sme_fa64_@ + + orr x1, x1, SMCR_ELx_FA64_MASK +.Lskip_sme_fa64_@: + + orr x1, x1, #SMCR_ELx_LEN_MASK // Enable full SME vector + msr_s SYS_SMCR_EL2, x1 // length for EL1. + + mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported? + ubfx x1, x1, #SYS_SMIDR_EL1_SMPS_SHIFT, #1 + cbz x1, .Lskip_sme_@ + + msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal + + mrs x1, id_aa64mmfr1_el1 // HCRX_EL2 present? + ubfx x1, x1, #ID_AA64MMFR1_HCX_SHIFT, #4 + cbz x1, .Lskip_sme_@ + + mrs_s x1, SYS_HCRX_EL2 + orr x1, x1, #HCRX_EL2_SMPME_MASK // Enable priority mapping + msr_s SYS_HCRX_EL2, x1 + +.Lskip_sme_@: +.endm + SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) /* * When VHE is not in use, early init of EL2 and EL1 needs to be @@ -668,6 +723,7 @@ SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector msr_s SYS_ZCR_EL2, x1 // length for EL1.
+ __init_el2_nvhe_sme __init_el2_fgt
/* Hypervisor stub */
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 5e64b862c4823ab53aac028042abd918c2f27041 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
This patch introduces basic cpufeature support for discovering the presence of the Scalable Matrix Extension.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-9-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- Documentation/arm64/elf_hwcaps.rst | 34 +++++++++++++++++ arch/arm64/include/asm/cpu.h | 1 + arch/arm64/include/asm/cpucaps.h | 2 + arch/arm64/include/asm/cpufeature.h | 12 ++++++ arch/arm64/include/asm/fpsimd.h | 2 + arch/arm64/include/asm/hwcap.h | 8 ++++ arch/arm64/include/uapi/asm/hwcap.h | 8 ++++ arch/arm64/kernel/cpufeature.c | 59 +++++++++++++++++++++++++++++ arch/arm64/kernel/cpuinfo.c | 9 +++++ arch/arm64/kernel/fpsimd.c | 30 +++++++++++++++ 10 files changed, 165 insertions(+)
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst index e88d245d426d..4d594f465b4d 100644 --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -257,6 +257,40 @@ HWCAP2_RPRES
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
+HWCAP2_SME + + Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described + by Documentation/arm64/sme.rst. + +HWCAP2_SME_I16I64 + + Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111. + +HWCAP2_SME_F64F64 + + Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1. + +HWCAP2_SME_I8I32 + + Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111. + +HWCAP2_SME_F16F32 + + Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1. + +HWCAP2_SME_B16F32 + + Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1. + +HWCAP2_SME_F32F32 + + Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1. + +HWCAP2_SME_FA64 + + Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1. + + 4. Unused AT_HWCAP bits -----------------------
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index 24ed6643da26..3479fbaeeb73 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -32,6 +32,7 @@ struct cpuinfo_arm64 { u64 reg_id_aa64pfr0; u64 reg_id_aa64pfr1; u64 reg_id_aa64zfr0; + u64 reg_id_aa64smfr0;
u32 reg_id_dfr0; u32 reg_id_dfr1; diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index 3ffa6108c96d..38de6dcecfcd 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -72,6 +72,8 @@ #define ARM64_HAS_ECV 64 #define ARM64_HAS_EPAN 65 #define ARM64_SPECTRE_BHB 66 +#define ARM64_SME 67 +#define ARM64_SME_FA64 68
#define ARM64_NCAPS 80
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 3fa6cf044c52..774f2fb13b07 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -718,6 +718,18 @@ static __always_inline bool system_supports_sve(void) cpus_have_const_cap(ARM64_SVE); }
+static __always_inline bool system_supports_sme(void) +{ + return IS_ENABLED(CONFIG_ARM64_SME) && + cpus_have_const_cap(ARM64_SME); +} + +static __always_inline bool system_supports_fa64(void) +{ + return IS_ENABLED(CONFIG_ARM64_SME) && + cpus_have_const_cap(ARM64_SME_FA64); +} + static __always_inline bool system_supports_cnp(void) { return IS_ENABLED(CONFIG_ARM64_CNP) && diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 814136d00f94..dbf232891b4d 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -75,6 +75,8 @@ extern void sve_set_vq(unsigned long vq_minus_1);
struct arm64_cpu_capabilities; extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); +extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused); +extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void);
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h index b2cb230ef21e..e534aa527161 100644 --- a/arch/arm64/include/asm/hwcap.h +++ b/arch/arm64/include/asm/hwcap.h @@ -108,6 +108,14 @@ #define KERNEL_HWCAP_ECV __khwcap2_feature(ECV) #define KERNEL_HWCAP_AFP __khwcap2_feature(AFP) #define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES) +#define KERNEL_HWCAP_SME __khwcap2_feature(SME) +#define KERNEL_HWCAP_SME_I16I64 __khwcap2_feature(SME_I16I64) +#define KERNEL_HWCAP_SME_F64F64 __khwcap2_feature(SME_F64F64) +#define KERNEL_HWCAP_SME_I8I32 __khwcap2_feature(SME_I8I32) +#define KERNEL_HWCAP_SME_F16F32 __khwcap2_feature(SME_F16F32) +#define KERNEL_HWCAP_SME_B16F32 __khwcap2_feature(SME_B16F32) +#define KERNEL_HWCAP_SME_F32F32 __khwcap2_feature(SME_F32F32) +#define KERNEL_HWCAP_SME_FA64 __khwcap2_feature(SME_FA64)
/* * This yields a mask that user programs can use to figure out what diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h index f03731847d9d..b086717eb0f5 100644 --- a/arch/arm64/include/uapi/asm/hwcap.h +++ b/arch/arm64/include/uapi/asm/hwcap.h @@ -78,5 +78,13 @@ #define HWCAP2_ECV (1 << 19) #define HWCAP2_AFP (1 << 20) #define HWCAP2_RPRES (1 << 21) +#define HWCAP2_SME (1 << 23) +#define HWCAP2_SME_I16I64 (1 << 24) +#define HWCAP2_SME_F64F64 (1 << 25) +#define HWCAP2_SME_I8I32 (1 << 26) +#define HWCAP2_SME_F16F32 (1 << 27) +#define HWCAP2_SME_B16F32 (1 << 28) +#define HWCAP2_SME_F32F32 (1 << 29) +#define HWCAP2_SME_FA64 (1 << 30)
#endif /* _UAPI__ASM_HWCAP_H */ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 610bc0671705..5bd971fea36f 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -233,6 +233,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { };
static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = { + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SME_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE), @@ -265,6 +267,24 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = { ARM64_FTR_END, };
+static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = { + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_FA64_SHIFT, 1, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I16I64_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F64F64_SHIFT, 1, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I8I32_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F16F32_SHIFT, 1, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_B16F32_SHIFT, 1, 0), + ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), + FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F32F32_SHIFT, 1, 0), + ARM64_FTR_END, +}; + static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0), @@ -596,6 +616,7 @@ static const struct __ftr_reg_entry { ARM64_FTR_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0), ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1), ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0), + ARM64_FTR_REG(SYS_ID_AA64SMFR0_EL1, ftr_id_aa64smfr0),
/* Op1 = 0, CRn = 0, CRm = 5 */ ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0), @@ -846,6 +867,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0); init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1); init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0); + init_cpu_ftr_reg(SYS_ID_AA64SMFR0_EL1, info->reg_id_aa64smfr0);
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0); @@ -2292,6 +2314,33 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .min_field_value = 1, }, #endif +#ifdef CONFIG_ARM64_SME + { + .desc = "Scalable Matrix Extension", + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .capability = ARM64_SME, + .sys_reg = SYS_ID_AA64PFR1_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64PFR1_SME_SHIFT, + .field_width = 4, + .min_field_value = ID_AA64PFR1_SME, + .matches = has_cpuid_feature, + .cpu_enable = sme_kernel_enable, + }, + /* FA64 should be sorted after the base SME capability */ + { + .desc = "FA64", + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .capability = ARM64_SME_FA64, + .sys_reg = SYS_ID_AA64SMFR0_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64SMFR0_FA64_SHIFT, + .field_width = 1, + .min_field_value = ID_AA64SMFR0_FA64, + .matches = has_cpuid_feature, + .cpu_enable = fa64_kernel_enable, + }, +#endif /* CONFIG_ARM64_SME */ {}, };
@@ -2416,6 +2465,16 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = { HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV), HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP), HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES), +#ifdef CONFIG_ARM64_SME + HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SME_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_SME, CAP_HWCAP, KERNEL_HWCAP_SME), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_FA64_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_FA64, CAP_HWCAP, KERNEL_HWCAP_SME_FA64), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I16I64_SHIFT, 4, FTR_UNSIGNED, ID_AA64SMFR0_I16I64, CAP_HWCAP, KERNEL_HWCAP_SME_I16I64), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F64F64_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F64F64, CAP_HWCAP, KERNEL_HWCAP_SME_F64F64), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I8I32_SHIFT, 4, FTR_UNSIGNED, ID_AA64SMFR0_I8I32, CAP_HWCAP, KERNEL_HWCAP_SME_I8I32), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F16F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F16F32, CAP_HWCAP, KERNEL_HWCAP_SME_F16F32), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_B16F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_B16F32, CAP_HWCAP, KERNEL_HWCAP_SME_B16F32), + HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F32F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F32F32, CAP_HWCAP, KERNEL_HWCAP_SME_F32F32), +#endif /* CONFIG_ARM64_SME */ {}, };
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c index 97dab8f4634f..972315bf8302 100644 --- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -97,6 +97,14 @@ static const char *const hwcap_str[] = { [KERNEL_HWCAP_ECV] = "ecv", [KERNEL_HWCAP_AFP] = "afp", [KERNEL_HWCAP_RPRES] = "rpres", + [KERNEL_HWCAP_SME] = "sme", + [KERNEL_HWCAP_SME_I16I64] = "smei16i64", + [KERNEL_HWCAP_SME_F64F64] = "smef64f64", + [KERNEL_HWCAP_SME_I8I32] = "smei8i32", + [KERNEL_HWCAP_SME_F16F32] = "smef16f32", + [KERNEL_HWCAP_SME_B16F32] = "smeb16f32", + [KERNEL_HWCAP_SME_F32F32] = "smef32f32", + [KERNEL_HWCAP_SME_FA64] = "smefa64", };
#ifdef CONFIG_AARCH32_EL0 @@ -374,6 +382,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1); info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1); info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1); + info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
/* Update the 32bit ID registers only if AArch32 is implemented */ if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 633319462794..411356fea1a9 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -981,6 +981,32 @@ void fpsimd_release_task(struct task_struct *dead_task)
#endif /* CONFIG_ARM64_SVE */
+#ifdef CONFIG_ARM64_SME + +void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p) +{ + /* Set priority for all PEs to architecturally defined minimum */ + write_sysreg_s(read_sysreg_s(SYS_SMPRI_EL1) & ~SMPRI_EL1_PRIORITY_MASK, + SYS_SMPRI_EL1); + + /* Allow SME in kernel */ + write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1); + isb(); +} + +/* + * This must be called after sme_kernel_enable(), we rely on the + * feature table being sorted to ensure this. + */ +void fa64_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p) +{ + /* Allow use of FA64 */ + write_sysreg_s(read_sysreg_s(SYS_SMCR_EL1) | SMCR_ELx_FA64_MASK, + SYS_SMCR_EL1); +} + +#endif /* CONFIG_ARM64_SVE */ + /* * Trapped SVE access * @@ -1523,6 +1549,10 @@ static int __init fpsimd_init(void) if (!cpu_have_named_feature(ASIMD)) pr_notice("Advanced SIMD is not implemented\n");
+ + if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE)) + pr_notice("SME is implemented but not SVE\n"); + return sve_sysctl_init(); } core_initcall(fpsimd_init);
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit b42990d3bf77cc29d7c33e21518c1f806dae6b21 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The vector lengths used for SME are controlled through a similar set of registers to those for SVE and enumerated using a similar algorithm with some slight differences due to the fact that unlike SVE there are no restrictions on which combinations of vector lengths can be supported nor any mandatory vector lengths which must be implemented. Add a new vector type and implement support for enumerating it.
One slightly awkward feature is that we need to read the current vector length using a different instruction (or enter streaming mode which would have the same issue and be higher cost). Rather than add an ops structure we add special cases directly in the otherwise generic vec_probe_vqs() function, this is a bit inelegant but it's the only place where this is an issue.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-10-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/cpu.h | 3 + arch/arm64/include/asm/cpufeature.h | 7 ++ arch/arm64/include/asm/fpsimd.h | 26 ++++++ arch/arm64/include/asm/processor.h | 1 + arch/arm64/kernel/cpufeature.c | 47 +++++++++++ arch/arm64/kernel/cpuinfo.c | 4 + arch/arm64/kernel/entry-fpsimd.S | 9 ++ arch/arm64/kernel/fpsimd.c | 123 +++++++++++++++++++++++++++- 8 files changed, 218 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index 3479fbaeeb73..d05306f3f5e9 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -59,6 +59,9 @@ struct cpuinfo_arm64 {
/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */ u64 reg_zcr; + + /* pseudo-SMCR for recording maximum SMCR_EL1 LEN value: */ + u64 reg_smcr; };
DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data); diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 774f2fb13b07..600fc40b313f 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -599,6 +599,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0) return val > 0; }
+static inline bool id_aa64pfr1_sme(u64 pfr1) +{ + u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_SME_SHIFT); + + return val > 0; +} + void __init setup_cpu_features(void); void check_local_cpu_capabilities(void);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index dbf232891b4d..6cb4b10a2c93 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -79,6 +79,7 @@ extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused); extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void); +extern u64 read_smcr_features(void);
/* * Helpers to translate bit indices in sve_vq_map to VQ values (and @@ -175,6 +176,12 @@ static inline void write_vl(enum vec_type type, u64 val) tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK; write_sysreg_s(tmp | val, SYS_ZCR_EL1); break; +#endif +#ifdef CONFIG_ARM64_SME + case ARM64_VEC_SME: + tmp = read_sysreg_s(SYS_SMCR_EL1) & ~SMCR_ELx_LEN_MASK; + write_sysreg_s(tmp | val, SYS_SMCR_EL1); + break; #endif default: WARN_ON_ONCE(1); @@ -266,12 +273,31 @@ static inline void sme_smstop(void) asm volatile(__msr_s(SYS_SVCR_SMSTOP_SMZA_EL0, "xzr")); }
+extern void __init sme_setup(void); + +static inline int sme_max_vl(void) +{ + return vec_max_vl(ARM64_VEC_SME); +} + +static inline int sme_max_virtualisable_vl(void) +{ + return vec_max_virtualisable_vl(ARM64_VEC_SME); +} + +extern unsigned int sme_get_vl(void); + #else
static inline void sme_smstart_sm(void) { } static inline void sme_smstop_sm(void) { } static inline void sme_smstop(void) { }
+static inline void sme_setup(void) { } +static inline unsigned int sme_get_vl(void) { return 0; } +static inline int sme_max_vl(void) { return 0; } +static inline int sme_max_virtualisable_vl(void) { return 0; } + #endif /* ! CONFIG_ARM64_SME */
/* For use by EFI runtime services calls only */ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 39fe01aaf76d..a121b4976d10 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -115,6 +115,7 @@ struct debug_info {
enum vec_type { ARM64_VEC_SVE = 0, + ARM64_VEC_SME, ARM64_VEC_MAX, };
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5bd971fea36f..b443de8bcf70 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -545,6 +545,12 @@ static const struct arm64_ftr_bits ftr_zcr[] = { ARM64_FTR_END, };
+static const struct arm64_ftr_bits ftr_smcr[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, + SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_SIZE, 0), /* LEN */ + ARM64_FTR_END, +}; + /* * Common ftr bits for a 32bit register with all hidden, strict * attributes, with 4bit feature fields and a default safe value of @@ -634,6 +640,7 @@ static const struct __ftr_reg_entry {
/* Op1 = 0, CRn = 1, CRm = 2 */ ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr), + ARM64_FTR_REG(SYS_SMCR_EL1, ftr_smcr),
/* Op1 = 3, CRn = 0, CRm = 0 */ { SYS_CTR_EL0, &arm64_ftr_reg_ctrel0 }, @@ -898,6 +905,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) vec_init_vq_map(ARM64_VEC_SVE); }
+ if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) { + init_cpu_ftr_reg(SYS_SMCR_EL1, info->reg_smcr); + if (IS_ENABLED(CONFIG_ARM64_SME)) + vec_init_vq_map(ARM64_VEC_SME); + } + /* * Initialize the indirect array of CPU hwcaps capabilities pointers * before we handle the boot CPU below. @@ -1113,6 +1126,9 @@ void update_cpu_features(int cpu, taint |= check_update_ftr_reg(SYS_ID_AA64ZFR0_EL1, cpu, info->reg_id_aa64zfr0, boot->reg_id_aa64zfr0);
+ taint |= check_update_ftr_reg(SYS_ID_AA64SMFR0_EL1, cpu, + info->reg_id_aa64smfr0, boot->reg_id_aa64smfr0); + if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) { taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu, info->reg_zcr, boot->reg_zcr); @@ -1123,6 +1139,16 @@ void update_cpu_features(int cpu, vec_update_vq_map(ARM64_VEC_SVE); }
+ if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) { + taint |= check_update_ftr_reg(SYS_SMCR_EL1, cpu, + info->reg_smcr, boot->reg_smcr); + + /* Probe vector lengths, unless we already gave up on SME */ + if (id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1)) && + !system_capabilities_finalized()) + vec_update_vq_map(ARM64_VEC_SME); + } + /* * This relies on a sanitised view of the AArch64 ID registers * (e.g. SYS_ID_AA64PFR0_EL1), so we call it last. @@ -2764,6 +2790,23 @@ static void verify_sve_features(void) /* Add checks on other ZCR bits here if necessary */ }
+static void verify_sme_features(void) +{ + u64 safe_smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1); + u64 smcr = read_smcr_features(); + + unsigned int safe_len = safe_smcr & SMCR_ELx_LEN_MASK; + unsigned int len = smcr & SMCR_ELx_LEN_MASK; + + if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SME)) { + pr_crit("CPU%d: SME: vector length support mismatch\n", + smp_processor_id()); + cpu_die_early(); + } + + /* Add checks on other SMCR bits here if necessary */ +} + static void verify_hyp_capabilities(void) { u64 safe_mmfr1, mmfr0, mmfr1; @@ -2820,6 +2863,9 @@ static void verify_local_cpu_capabilities(void) if (system_supports_sve()) verify_sve_features();
+ if (system_supports_sme()) + verify_sme_features(); + if (is_hyp_mode_available()) verify_hyp_capabilities(); } @@ -2936,6 +2982,7 @@ void __init setup_cpu_features(void) pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
sve_setup(); + sme_setup(); minsigstksz_setup();
/* Advertise that we have computed the system capabilities */ diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c index 972315bf8302..3899083ff555 100644 --- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -414,6 +414,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) id_aa64pfr0_sve(info->reg_id_aa64pfr0)) info->reg_zcr = read_zcr_features();
+ if (IS_ENABLED(CONFIG_ARM64_SME) && + id_aa64pfr1_sme(info->reg_id_aa64pfr1)) + info->reg_smcr = read_smcr_features(); + cpuinfo_detect_icache_policy(info); }
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index a5f9475ce6eb..048dfcfadaed 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -72,3 +72,12 @@ SYM_FUNC_START(sve_flush_live) SYM_FUNC_END(sve_flush_live)
#endif /* CONFIG_ARM64_SVE */ + +#ifdef CONFIG_ARM64_SME + +SYM_FUNC_START(sme_get_vl) + _sme_rdsvl 0, 1 + ret +SYM_FUNC_END(sme_get_vl) + +#endif /* CONFIG_ARM64_SME */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 411356fea1a9..6cc51288531f 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -132,6 +132,12 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = { .max_virtualisable_vl = SVE_VL_MIN, }, #endif +#ifdef CONFIG_ARM64_SME + [ARM64_VEC_SME] = { + .type = ARM64_VEC_SME, + .name = "SME", + }, +#endif };
static unsigned int vec_vl_inherit_flag(enum vec_type type) @@ -182,6 +188,20 @@ extern void __percpu *efi_sve_state;
#endif /* ! CONFIG_ARM64_SVE */
+#ifdef CONFIG_ARM64_SME + +static int get_sme_default_vl(void) +{ + return get_default_vl(ARM64_VEC_SME); +} + +static void set_sme_default_vl(int val) +{ + set_default_vl(ARM64_VEC_SME, val); +} + +#endif + DEFINE_PER_CPU(bool, fpsimd_context_busy); EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
@@ -397,6 +417,8 @@ static unsigned int find_supported_vector_length(enum vec_type type,
if (vl > max_vl) vl = max_vl; + if (vl < info->min_vl) + vl = info->min_vl;
bit = find_next_bit(info->vq_map, SVE_VQ_MAX, __vq_to_bit(sve_vq_from_vl(vl))); @@ -758,7 +780,23 @@ static void vec_probe_vqs(struct vl_info *info,
for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) { write_vl(info->type, vq - 1); /* self-syncing */ - vl = sve_get_vl(); + + switch (info->type) { + case ARM64_VEC_SVE: + vl = sve_get_vl(); + break; + case ARM64_VEC_SME: + vl = sme_get_vl(); + break; + default: + vl = 0; + break; + } + + /* Minimum VL identified? */ + if (sve_vq_from_vl(vl) > vq) + break; + vq = sve_vq_from_vl(vl); /* skip intervening lengths */ set_bit(__vq_to_bit(vq), map); } @@ -1005,7 +1043,88 @@ void fa64_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p) SYS_SMCR_EL1); }
-#endif /* CONFIG_ARM64_SVE */ +/* + * Read the pseudo-SMCR used by cpufeatures to identify the supported + * vector length. + * + * Use only if SME is present. + * This function clobbers the SME vector length. + */ +u64 read_smcr_features(void) +{ + u64 smcr; + unsigned int vq_max; + + sme_kernel_enable(NULL); + sme_smstart_sm(); + + /* + * Set the maximum possible VL. + */ + write_sysreg_s(read_sysreg_s(SYS_SMCR_EL1) | SMCR_ELx_LEN_MASK, + SYS_SMCR_EL1); + + smcr = read_sysreg_s(SYS_SMCR_EL1); + smcr &= ~(u64)SMCR_ELx_LEN_MASK; /* Only the LEN field */ + vq_max = sve_vq_from_vl(sve_get_vl()); + smcr |= vq_max - 1; /* set LEN field to maximum effective value */ + + sme_smstop_sm(); + + return smcr; +} + +void __init sme_setup(void) +{ + struct vl_info *info = &vl_info[ARM64_VEC_SME]; + u64 smcr; + int min_bit; + + if (!system_supports_sme()) + return; + + /* + * SME doesn't require any particular vector length be + * supported but it does require at least one. We should have + * disabled the feature entirely while bringing up CPUs but + * let's double check here. + */ + WARN_ON(bitmap_empty(info->vq_map, SVE_VQ_MAX)); + + min_bit = find_last_bit(info->vq_map, SVE_VQ_MAX); + info->min_vl = sve_vl_from_vq(__bit_to_vq(min_bit)); + + smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1); + info->max_vl = sve_vl_from_vq((smcr & SMCR_ELx_LEN_MASK) + 1); + + /* + * Sanity-check that the max VL we determined through CPU features + * corresponds properly to sme_vq_map. If not, do our best: + */ + if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SME, + info->max_vl))) + info->max_vl = find_supported_vector_length(ARM64_VEC_SME, + info->max_vl); + + WARN_ON(info->min_vl > info->max_vl); + + /* + * For the default VL, pick the maximum supported value <= 32 + * (256 bits) if there is one since this is guaranteed not to + * grow the signal frame when in streaming mode, otherwise the + * minimum available VL will be used. + */ + set_sme_default_vl(find_supported_vector_length(ARM64_VEC_SME, 32)); + + pr_info("SME: minimum available vector length %u bytes per vector\n", + info->min_vl); + pr_info("SME: maximum available vector length %u bytes per vector\n", + info->max_vl); + pr_info("SME: default vector length %u bytes per vector\n", + get_sme_default_vl()); +} + +#endif /* CONFIG_ARM64_SME */
/* * Trapped SVE access
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 12f1bacfc5d9e55bedbfc7a25bf42ff6d19d1dab category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
As for SVE provide a sysctl which allows the default SME vector length to be configured.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-11-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 6cc51288531f..71690049a97e 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -477,6 +477,30 @@ static int __init sve_sysctl_init(void) static int __init sve_sysctl_init(void) { return 0; } #endif /* ! (CONFIG_ARM64_SVE && CONFIG_SYSCTL) */
+#if defined(CONFIG_ARM64_SME) && defined(CONFIG_SYSCTL) +static struct ctl_table sme_default_vl_table[] = { + { + .procname = "sme_default_vector_length", + .mode = 0644, + .proc_handler = vec_proc_do_default_vl, + .extra1 = &vl_info[ARM64_VEC_SME], + }, + { } +}; + +static int __init sme_sysctl_init(void) +{ + if (system_supports_sme()) + if (!register_sysctl("abi", sme_default_vl_table)) + return -EINVAL; + + return 0; +} + +#else /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */ +static int __init sme_sysctl_init(void) { return 0; } +#endif /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */ + #define ZREG(sve_state, vq, n) ((char *)(sve_state) + \ (SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
@@ -1672,6 +1696,9 @@ static int __init fpsimd_init(void) if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE)) pr_notice("SME is implemented but not SVE\n");
- return sve_sysctl_init(); + sve_sysctl_init(); + sme_sysctl_init(); + + return 0; } core_initcall(fpsimd_init);
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 9e4ab6c89109472082616f8d2f6ada7deaffe161 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
As for SVE provide a prctl() interface which allows processes to configure their SME vector length.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-12-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 4 ++++ arch/arm64/include/asm/processor.h | 4 +++- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/fpsimd.c | 32 ++++++++++++++++++++++++++++ include/uapi/linux/prctl.h | 9 ++++++++ kernel/sys.c | 12 +++++++++++ 6 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 6cb4b10a2c93..c6804639319e 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -286,6 +286,8 @@ static inline int sme_max_virtualisable_vl(void) }
extern unsigned int sme_get_vl(void); +extern int sme_set_current_vl(unsigned long arg); +extern int sme_get_current_vl(void);
#else
@@ -297,6 +299,8 @@ static inline void sme_setup(void) { } static inline unsigned int sme_get_vl(void) { return 0; } static inline int sme_max_vl(void) { return 0; } static inline int sme_max_virtualisable_vl(void) { return 0; } +static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; } +static inline int sme_get_current_vl(void) { return -EINVAL; }
#endif /* ! CONFIG_ARM64_SME */
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index a121b4976d10..ff37d2d618d0 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -352,9 +352,11 @@ extern void __init minsigstksz_setup(void); */ #include <asm/fpsimd.h>
-/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */ +/* Userspace interface for PR_S[MV]E_{SET,GET}_VL prctl()s: */ #define SVE_SET_VL(arg) sve_set_current_vl(arg) #define SVE_GET_VL() sve_get_current_vl() +#define SME_SET_VL(arg) sme_set_current_vl(arg) +#define SME_GET_VL() sme_get_current_vl()
/* PR_PAC_RESET_KEYS prctl */ #define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg) diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index be1fe9363968..5be2aedb2614 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -86,6 +86,7 @@ void arch_release_task_struct(struct task_struct *tsk); #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_32BIT_AARCH64 27 /* 32 bit process on AArch64(ILP32) */ #define TIF_PATCH_PENDING 28 /* pending live patching update */ +#define TIF_SME_VL_INHERIT 30 /* Inherit SME vl_onexec across exec */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 71690049a97e..0d1eb210fd4d 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -145,6 +145,8 @@ static unsigned int vec_vl_inherit_flag(enum vec_type type) switch (type) { case ARM64_VEC_SVE: return TIF_SVE_VL_INHERIT; + case ARM64_VEC_SME: + return TIF_SME_VL_INHERIT; default: WARN_ON_ONCE(1); return 0; @@ -795,6 +797,36 @@ int sve_get_current_vl(void) return vec_prctl_status(ARM64_VEC_SVE, 0); }
+#ifdef CONFIG_ARM64_SME +/* PR_SME_SET_VL */ +int sme_set_current_vl(unsigned long arg) +{ + unsigned long vl, flags; + int ret; + + vl = arg & PR_SME_VL_LEN_MASK; + flags = arg & ~vl; + + if (!system_supports_sme() || is_compat_task()) + return -EINVAL; + + ret = vec_set_vector_length(current, ARM64_VEC_SME, vl, flags); + if (ret) + return ret; + + return vec_prctl_status(ARM64_VEC_SME, flags); +} + +/* PR_SME_GET_VL */ +int sme_get_current_vl(void) +{ + if (!system_supports_sme() || is_compat_task()) + return -EINVAL; + + return vec_prctl_status(ARM64_VEC_SME, 0); +} +#endif /* CONFIG_ARM64_SME */ + static void vec_probe_vqs(struct vl_info *info, DECLARE_BITMAP(map, SVE_VQ_MAX)) { diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 7f0827705c9a..09edcd6c1689 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -247,4 +247,13 @@ struct prctl_mm_map { #define PR_SET_IO_FLUSHER 57 #define PR_GET_IO_FLUSHER 58
+/* arm64 Scalable Matrix Extension controls */ +/* Flag values must be in sync with SVE versions */ +#define PR_SME_SET_VL 63 /* set task vector length */ +# define PR_SME_SET_VL_ONEXEC (1 << 18) /* defer effect until exec */ +#define PR_SME_GET_VL 64 /* get task vector length */ +/* Bits common to PR_SME_SET_VL and PR_SME_GET_VL */ +# define PR_SME_VL_LEN_MASK 0xffff +# define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */ + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index 24a3a28ae228..980015c3ebfb 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -116,6 +116,12 @@ #ifndef SVE_GET_VL # define SVE_GET_VL() (-EINVAL) #endif +#ifndef SME_SET_VL +# define SME_SET_VL(a) (-EINVAL) +#endif +#ifndef SME_GET_VL +# define SME_GET_VL() (-EINVAL) +#endif #ifndef PAC_RESET_KEYS # define PAC_RESET_KEYS(a, b) (-EINVAL) #endif @@ -2475,6 +2481,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_SVE_GET_VL: error = SVE_GET_VL(); break; + case PR_SME_SET_VL: + error = SME_SET_VL(arg2); + break; + case PR_SME_GET_VL: + error = SME_GET_VL(); + break; case PR_GET_SPECULATION_CTRL: if (arg3 || arg4 || arg5) return -EINVAL;
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit a9d69158595017d260ab37bf88b8f125e5e8144c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The Scalable Matrix Extension introduces support for a new thread specific data register TPIDR2 intended for use by libc. The kernel must save the value of TPIDR2 on context switch and should ensure that all new threads start off with a default value of 0. Add a field to the thread_struct to store TPIDR2 and context switch it with the other thread specific data.
In case there are future extensions which also use TPIDR2 we introduce system_supports_tpidr2() and use that rather than system_supports_sme() for TPIDR2 handling.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-13-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/cpufeature.h | 5 +++++ arch/arm64/include/asm/processor.h | 2 +- arch/arm64/kernel/fpsimd.c | 4 ++++ arch/arm64/kernel/process.c | 15 +++++++++++++-- 4 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 600fc40b313f..e0dc240946aa 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -737,6 +737,11 @@ static __always_inline bool system_supports_fa64(void) cpus_have_const_cap(ARM64_SME_FA64); }
+static __always_inline bool system_supports_tpidr2(void) +{ + return system_supports_sme(); +} + static __always_inline bool system_supports_cnp(void) { return IS_ENABLED(CONFIG_ARM64_CNP) && diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index ff37d2d618d0..37cb6c10d67b 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -166,7 +166,7 @@ struct thread_struct { #endif KABI_USE(1, unsigned int vl[ARM64_VEC_MAX]) KABI_USE(2, unsigned int vl_onexec[ARM64_VEC_MAX]) - KABI_RESERVE(3) + KABI_USE(3, u64 tpidr2_el0) KABI_RESERVE(4) KABI_RESERVE(5) KABI_RESERVE(6) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 0d1eb210fd4d..1956bcc5713f 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1086,6 +1086,10 @@ void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p) /* Allow SME in kernel */ write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1); isb(); + + /* Allow EL0 to access TPIDR2 */ + write_sysreg(read_sysreg(SCTLR_EL1) | SCTLR_ELx_ENTP2, SCTLR_EL1); + isb(); }
/* diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 7317c79db9e8..f8e8a1335bed 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -320,6 +320,9 @@ void show_regs(struct pt_regs * regs) static void tls_thread_flush(void) { write_sysreg(0, tpidr_el0); + if (system_supports_tpidr2()) + write_sysreg_s(0, SYS_TPIDR2_EL0); +
if (is_a32_compat_task()) { current->thread.uw.tp_value = 0; @@ -414,6 +417,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, * out-of-sync with the saved value. */ *task_user_tls(p) = read_sysreg(tpidr_el0); + if (system_supports_tpidr2()) + p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
if (stack_start) { if (is_a32_compat_thread(task_thread_info(p))) @@ -424,10 +429,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
/* * If a TLS pointer was passed to clone, use it for the new - * thread. + * thread. We also reset TPIDR2 if it's in use. */ - if (clone_flags & CLONE_SETTLS) + if (clone_flags & CLONE_SETTLS) { p->thread.uw.tp_value = tls; + p->thread.tpidr2_el0 = 0; + } } else { /* * A kthread has no context to ERET to, so ensure any buggy @@ -453,6 +460,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, void tls_preserve_current_state(void) { *task_user_tls(current) = read_sysreg(tpidr_el0); + if (system_supports_tpidr2() && !is_compat_task()) + current->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0); }
static void tls_thread_switch(struct task_struct *next) @@ -465,6 +474,8 @@ static void tls_thread_switch(struct task_struct *next) write_sysreg(0, tpidrro_el0);
write_sysreg(*task_user_tls(next), tpidr_el0); + if (system_supports_tpidr2()) + write_sysreg_s(next->thread.tpidr2_el0, SYS_TPIDR2_EL0); }
/*
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit b40c559b45bec736f588c57dd5be967fe573058b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
In SME the use of both streaming SVE mode and ZA are tracked through PSTATE.SM and PSTATE.ZA, visible through the system register SVCR. In order to context switch the floating point state for SME we need to context switch the contents of this register as part of context switching the floating point state.
Since changing the vector length exits streaming SVE mode and disables ZA we also make sure we update SVCR appropriately when setting vector length, and similarly ensure that new threads have streaming SVE mode and ZA disabled.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-14-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 3 ++- arch/arm64/include/asm/processor.h | 2 +- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/fpsimd.c | 18 +++++++++++++++++- arch/arm64/kernel/process.c | 2 ++ arch/arm64/kvm/fpsimd.c | 7 ++++++- 6 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index c6804639319e..9c37b98327b5 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -47,7 +47,8 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
extern void fpsimd_bind_task_to_cpu(void); extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, - void *sve_state, unsigned int sve_vl); + void *sve_state, unsigned int sve_vl, + u64 *svcr);
extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_save_and_flush_cpu_state(void); diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 37cb6c10d67b..39aef25b233f 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -167,7 +167,7 @@ struct thread_struct { KABI_USE(1, unsigned int vl[ARM64_VEC_MAX]) KABI_USE(2, unsigned int vl_onexec[ARM64_VEC_MAX]) KABI_USE(3, u64 tpidr2_el0) - KABI_RESERVE(4) + KABI_USE(4, u64 svcr) KABI_RESERVE(5) KABI_RESERVE(6) KABI_RESERVE(7) diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 5be2aedb2614..3eb1934927fe 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -86,6 +86,7 @@ void arch_release_task_struct(struct task_struct *tsk); #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_32BIT_AARCH64 27 /* 32 bit process on AArch64(ILP32) */ #define TIF_PATCH_PENDING 28 /* pending live patching update */ +#define TIF_SME 29 /* SME in use */ #define TIF_SME_VL_INHERIT 30 /* Inherit SME vl_onexec across exec */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1956bcc5713f..6bc7bd157cf4 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -117,6 +117,7 @@ struct fpsimd_last_state_struct { struct user_fpsimd_state *st; void *sve_state; + u64 *svcr; unsigned int sve_vl; };
@@ -353,6 +354,9 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
+ if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME)) + write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0); + if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) { sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1); sve_load_state(sve_pffr(¤t->thread), @@ -378,6 +382,12 @@ static void fpsimd_save(void) if (test_thread_flag(TIF_FOREIGN_FPSTATE)) return;
+ if (IS_ENABLED(CONFIG_ARM64_SME) && + test_thread_flag(TIF_SME)) { + u64 *svcr = last->svcr; + *svcr = read_sysreg_s(SYS_SVCR_EL0); + } + if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) { if (WARN_ON(sve_get_vl() != last->sve_vl)) { @@ -729,6 +739,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, if (test_and_clear_tsk_thread_flag(task, TIF_SVE)) sve_to_fpsimd(task);
+ if (system_supports_sme() && type == ARM64_VEC_SME) + task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK | + SYS_SVCR_EL0_ZA_MASK); + if (task == current) put_cpu_fpsimd_context();
@@ -1392,6 +1406,7 @@ void fpsimd_bind_task_to_cpu(void) last->st = ¤t->thread.uw.fpsimd_state; last->sve_state = current->thread.sve_state; last->sve_vl = task_get_sve_vl(current); + last->svcr = ¤t->thread.svcr; current->thread.fpsimd_cpu = smp_processor_id();
if (system_supports_sve()) { @@ -1406,7 +1421,7 @@ void fpsimd_bind_task_to_cpu(void) }
void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, - unsigned int sve_vl) + unsigned int sve_vl, u64 *svcr) { struct fpsimd_last_state_struct *last = this_cpu_ptr(&fpsimd_last_state); @@ -1415,6 +1430,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, WARN_ON(!in_softirq() && !irqs_disabled());
last->st = st; + last->svcr = svcr; last->sve_state = sve_state; last->sve_vl = sve_vl; } diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index f8e8a1335bed..c658967bb1d3 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -382,6 +382,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) dst->thread.sve_state = NULL; clear_tsk_thread_flag(dst, TIF_SVE);
+ dst->thread.svcr = 0; + /* clear any pending asynchronous tag fault raised by the parent */ clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 5621020b28de..12d0732d5aec 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -97,9 +97,14 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { + /* + * Currently we do not support SME guests so SVCR is + * always 0 and we just need a variable to point to. + */ fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs, vcpu->arch.sve_state, - vcpu->arch.sve_max_vl); + vcpu->arch.sve_max_vl, + NULL);
clear_thread_flag(TIF_FOREIGN_FPSTATE); update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit af7167d6d2675f3343eff3ad6c9b4a8e30122e2c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When in streaming mode we need to save and restore the streaming mode SVE register state rather than the regular SVE register state. This uses the streaming mode vector length and omits FFR but is otherwise identical, if TIF_SVE is enabled when we are in streaming mode then streaming mode takes precedence.
This does not handle use of streaming SVE state with KVM, ptrace or signals. This will be updated in further patches.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-15-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 22 +++++- arch/arm64/include/asm/fpsimdmacros.h | 11 +++ arch/arm64/include/asm/processor.h | 10 +++ arch/arm64/kernel/entry-fpsimd.S | 5 ++ arch/arm64/kernel/fpsimd.c | 109 +++++++++++++++++++++----- arch/arm64/kvm/fpsimd.c | 2 +- 6 files changed, 136 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 9c37b98327b5..08f62c63b658 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -48,11 +48,21 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state); extern void fpsimd_bind_task_to_cpu(void); extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, void *sve_state, unsigned int sve_vl, - u64 *svcr); + unsigned int sme_vl, u64 *svcr);
extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_save_and_flush_cpu_state(void);
+static inline bool thread_sm_enabled(struct thread_struct *thread) +{ + return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK); +} + +static inline bool thread_za_enabled(struct thread_struct *thread) +{ + return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_ZA_MASK); +} + /* Maximum VL that SVE/SME VL-agnostic software can transparently support */ #define VL_ARCH_MAX 0x100
@@ -64,7 +74,14 @@ static inline size_t sve_ffr_offset(int vl)
static inline void *sve_pffr(struct thread_struct *thread) { - return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread)); + unsigned int vl; + + if (system_supports_sme() && thread_sm_enabled(thread)) + vl = thread_get_sme_vl(thread); + else + vl = thread_get_sve_vl(thread); + + return (char *)thread->sve_state + sve_ffr_offset(vl); }
extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr); @@ -73,6 +90,7 @@ extern void sve_load_state(void const *state, u32 const *pfpsr, extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); extern void sve_set_vq(unsigned long vq_minus_1); +extern void sme_set_vq(unsigned long vq_minus_1);
struct arm64_cpu_capabilities; extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index e638890fef27..fef5db370913 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -261,6 +261,17 @@ 921: .endm
+/* Update SMCR_EL1.LEN with the new VQ */ +.macro sme_load_vq xvqminus1, xtmp, xtmp2 + mrs_s \xtmp, SYS_SMCR_EL1 + bic \xtmp2, \xtmp, SMCR_ELx_LEN_MASK + orr \xtmp2, \xtmp2, \xvqminus1 + cmp \xtmp2, \xtmp + b.eq 921f + msr_s SYS_SMCR_EL1, \xtmp2 //self-synchronising +921: +.endm + /* Preserve the first 128-bits of Znz and zero the rest. */ .macro _sve_flush_z nz _sve_check_zreg \nz diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 39aef25b233f..9734a312dddd 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -185,6 +185,11 @@ static inline unsigned int thread_get_sve_vl(struct thread_struct *thread) return thread_get_vl(thread, ARM64_VEC_SVE); }
+static inline unsigned int thread_get_sme_vl(struct thread_struct *thread) +{ + return thread_get_vl(thread, ARM64_VEC_SME); +} + unsigned int task_get_vl(const struct task_struct *task, enum vec_type type); void task_set_vl(struct task_struct *task, enum vec_type type, unsigned long vl); @@ -198,6 +203,11 @@ static inline unsigned int task_get_sve_vl(const struct task_struct *task) return task_get_vl(task, ARM64_VEC_SVE); }
+static inline unsigned int task_get_sme_vl(const struct task_struct *task) +{ + return task_get_vl(task, ARM64_VEC_SME); +} + static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl) { task_set_vl(task, ARM64_VEC_SVE, vl); diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 048dfcfadaed..25bcb6a39e67 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -80,4 +80,9 @@ SYM_FUNC_START(sme_get_vl) ret SYM_FUNC_END(sme_get_vl)
+SYM_FUNC_START(sme_set_vq) + sme_load_vq x0, x1, x2 + ret +SYM_FUNC_END(sme_set_vq) + #endif /* CONFIG_ARM64_SME */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 6bc7bd157cf4..967a38277fbd 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -119,6 +119,7 @@ struct fpsimd_last_state_struct { void *sve_state; u64 *svcr; unsigned int sve_vl; + unsigned int sme_vl; };
static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state); @@ -295,17 +296,28 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type, task->thread.vl_onexec[type] = vl; }
+/* + * TIF_SME controls whether a task can use SME without trapping while + * in userspace, when TIF_SME is set then we must have storage + * alocated in sve_state and za_state to store the contents of both ZA + * and the SVE registers for both streaming and non-streaming modes. + * + * If both SVCR.ZA and SVCR.SM are disabled then at any point we + * may disable TIF_SME and reenable traps. + */ + + /* * TIF_SVE controls whether a task can use SVE without trapping while - * in userspace, and also the way a task's FPSIMD/SVE state is stored - * in thread_struct. + * in userspace, and also (together with TIF_SME) the way a task's + * FPSIMD/SVE state is stored in thread_struct. * * The kernel uses this flag to track whether a user task is actively * using SVE, and therefore whether full SVE register state needs to * be tracked. If not, the cheaper FPSIMD context handling code can * be used instead of the more costly SVE equivalents. * - * * TIF_SVE set: + * * TIF_SVE or SVCR.SM set: * * The task can execute SVE instructions while in userspace without * trapping to the kernel. @@ -313,7 +325,8 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type, * When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the * corresponding Zn), P0-P15 and FFR are encoded in in * task->thread.sve_state, formatted appropriately for vector - * length task->thread.sve_vl. + * length task->thread.sve_vl or, if SVCR.SM is set, + * task->thread.sme_vl. * * task->thread.sve_state must point to a valid buffer at least * sve_state_size(task) bytes in size. @@ -351,19 +364,40 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type, */ static void task_fpsimd_load(void) { + bool restore_sve_regs = false; + bool restore_ffr; + WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context());
- if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME)) - write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0); - + /* Check if we should restore SVE first */ if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) { sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1); + restore_sve_regs = true; + restore_ffr = true; + } + + /* Restore SME, override SVE register configuration if needed */ + if (system_supports_sme()) { + unsigned long sme_vl = task_get_sme_vl(current); + + if (test_thread_flag(TIF_SME)) + sme_set_vq(sve_vq_from_vl(sme_vl) - 1); + + write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0); + + if (thread_sm_enabled(¤t->thread)) { + restore_sve_regs = true; + restore_ffr = system_supports_fa64(); + } + } + + if (restore_sve_regs) sve_load_state(sve_pffr(¤t->thread), - ¤t->thread.uw.fpsimd_state.fpsr, true); - } else { + ¤t->thread.uw.fpsimd_state.fpsr, + restore_ffr); + else fpsimd_load_state(¤t->thread.uw.fpsimd_state); - } }
/* @@ -375,6 +409,9 @@ static void fpsimd_save(void) struct fpsimd_last_state_struct const *last = this_cpu_ptr(&fpsimd_last_state); /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */ + bool save_sve_regs = false; + bool save_ffr; + unsigned int vl;
WARN_ON(!system_supports_fpsimd()); WARN_ON(!have_cpu_fpsimd_context()); @@ -382,15 +419,33 @@ static void fpsimd_save(void) if (test_thread_flag(TIF_FOREIGN_FPSTATE)) return;
- if (IS_ENABLED(CONFIG_ARM64_SME) && - test_thread_flag(TIF_SME)) { + if (test_thread_flag(TIF_SVE)) { + save_sve_regs = true; + save_ffr = true; + vl = last->sve_vl; + } + + if (system_supports_sme()) { u64 *svcr = last->svcr; *svcr = read_sysreg_s(SYS_SVCR_EL0); + + if (thread_za_enabled(¤t->thread)) { + /* ZA state managment is not implemented yet */ + force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); + return; + } + + /* If we are in streaming mode override regular SVE. */ + if (*svcr & SYS_SVCR_EL0_SM_MASK) { + save_sve_regs = true; + save_ffr = system_supports_fa64(); + vl = last->sme_vl; + } }
- if (IS_ENABLED(CONFIG_ARM64_SVE) && - test_thread_flag(TIF_SVE)) { - if (WARN_ON(sve_get_vl() != last->sve_vl)) { + if (IS_ENABLED(CONFIG_ARM64_SVE) && save_sve_regs) { + /* Get the configured VL from RDVL, will account for SM */ + if (WARN_ON(sve_get_vl() != vl)) { /* * Can't save the user regs, so current would * re-enter user with corrupt state. @@ -401,8 +456,8 @@ static void fpsimd_save(void) }
sve_save_state((char *)last->sve_state + - sve_ffr_offset(last->sve_vl), - &last->st->fpsr, true); + sve_ffr_offset(vl), + &last->st->fpsr, save_ffr); } else { fpsimd_save_state(last->st); } @@ -607,7 +662,14 @@ static void sve_to_fpsimd(struct task_struct *task) */ size_t sve_state_size(struct task_struct const *task) { - return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task))); + unsigned int vl = 0; + + if (system_supports_sve()) + vl = task_get_sve_vl(task); + if (system_supports_sme()) + vl = max(vl, task_get_sme_vl(task)); + + return SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)); }
/* @@ -736,7 +798,8 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, }
fpsimd_flush_task_state(task); - if (test_and_clear_tsk_thread_flag(task, TIF_SVE)) + if (test_and_clear_tsk_thread_flag(task, TIF_SVE) || + thread_sm_enabled(&task->thread)) sve_to_fpsimd(task);
if (system_supports_sme() && type == ARM64_VEC_SME) @@ -1363,6 +1426,9 @@ void fpsimd_flush_thread(void) fpsimd_flush_thread_vl(ARM64_VEC_SVE); }
+ if (system_supports_sme()) + fpsimd_flush_thread_vl(ARM64_VEC_SME); + put_cpu_fpsimd_context(); }
@@ -1406,6 +1472,7 @@ void fpsimd_bind_task_to_cpu(void) last->st = ¤t->thread.uw.fpsimd_state; last->sve_state = current->thread.sve_state; last->sve_vl = task_get_sve_vl(current); + last->sme_vl = task_get_sme_vl(current); last->svcr = ¤t->thread.svcr; current->thread.fpsimd_cpu = smp_processor_id();
@@ -1421,7 +1488,8 @@ void fpsimd_bind_task_to_cpu(void) }
void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, - unsigned int sve_vl, u64 *svcr) + unsigned int sve_vl, unsigned int sme_vl, + u64 *svcr) { struct fpsimd_last_state_struct *last = this_cpu_ptr(&fpsimd_last_state); @@ -1433,6 +1501,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, last->svcr = svcr; last->sve_state = sve_state; last->sve_vl = sve_vl; + last->sme_vl = sme_vl; }
/* diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 12d0732d5aec..4af62af52b95 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -104,7 +104,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs, vcpu->arch.sve_state, vcpu->arch.sve_max_vl, - NULL); + 0, NULL);
clear_thread_flag(TIF_FOREIGN_FPSTATE); update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 0033cd9339642f9b7bef23f96aa2e7277ab51cce category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Allocate space for storing ZA on first access to SME and use that to save and restore ZA state when context switching. We do this by using the vector form of the LDR and STR ZA instructions, these do not require streaming mode and have implementation recommendations that they avoid contention issues in shared SMCU implementations.
Since ZA is architecturally guaranteed to be zeroed when enabled we do not need to explicitly zero ZA, either we will be restoring from a saved copy or trapping on first use of SME so we know that ZA must be disabled.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-16-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 5 ++++- arch/arm64/include/asm/fpsimdmacros.h | 22 ++++++++++++++++++++++ arch/arm64/include/asm/kvm_host.h | 3 +++ arch/arm64/include/asm/processor.h | 2 +- arch/arm64/kernel/entry-fpsimd.S | 22 ++++++++++++++++++++++ arch/arm64/kernel/fpsimd.c | 20 +++++++++++++------- arch/arm64/kvm/fpsimd.c | 2 +- 7 files changed, 66 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 08f62c63b658..4cd12d5f4ad9 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -48,7 +48,8 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state); extern void fpsimd_bind_task_to_cpu(void); extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, void *sve_state, unsigned int sve_vl, - unsigned int sme_vl, u64 *svcr); + void *za_state, unsigned int sme_vl, + u64 *svcr);
extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_save_and_flush_cpu_state(void); @@ -91,6 +92,8 @@ extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1); extern unsigned int sve_get_vl(void); extern void sve_set_vq(unsigned long vq_minus_1); extern void sme_set_vq(unsigned long vq_minus_1); +extern void za_save_state(void *state); +extern void za_load_state(void const *state);
struct arm64_cpu_capabilities; extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h index fef5db370913..4f45dec6c970 100644 --- a/arch/arm64/include/asm/fpsimdmacros.h +++ b/arch/arm64/include/asm/fpsimdmacros.h @@ -318,3 +318,25 @@ ldr w\nxtmp, [\xpfpsr, #4] msr fpcr, x\nxtmp .endm + +.macro sme_save_za nxbase, xvl, nw + mov w\nw, #0 + +423: + _sme_str_zav \nw, \nxbase + add x\nxbase, x\nxbase, \xvl + add x\nw, x\nw, #1 + cmp \xvl, x\nw + bne 423b +.endm + +.macro sme_load_za nxbase, xvl, nw + mov w\nw, #0 + +423: + _sme_ldr_zav \nw, \nxbase + add x\nxbase, x\nxbase, \xvl + add x\nw, x\nw, #1 + cmp \xvl, x\nw + bne 423b +.endm diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 299137135c91..dfa409d640a3 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -285,8 +285,11 @@ struct vcpu_reset_state {
struct kvm_vcpu_arch { struct kvm_cpu_context ctxt; + + /* Guest floating point state */ void *sve_state; unsigned int sve_max_vl; + u64 svcr;
/* Stage 2 paging state used by the hardware on next switch */ struct kvm_s2_mmu *hw_mmu; diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 9734a312dddd..0ce3f74926b1 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -168,7 +168,7 @@ struct thread_struct { KABI_USE(2, unsigned int vl_onexec[ARM64_VEC_MAX]) KABI_USE(3, u64 tpidr2_el0) KABI_USE(4, u64 svcr) - KABI_RESERVE(5) + KABI_USE(5, void *za_state) /* ZA register, if any */ KABI_RESERVE(6) KABI_RESERVE(7) KABI_RESERVE(8) diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S index 25bcb6a39e67..8d12aaac7862 100644 --- a/arch/arm64/kernel/entry-fpsimd.S +++ b/arch/arm64/kernel/entry-fpsimd.S @@ -85,4 +85,26 @@ SYM_FUNC_START(sme_set_vq) ret SYM_FUNC_END(sme_set_vq)
+/* + * Save the SME state + * + * x0 - pointer to buffer for state + */ +SYM_FUNC_START(za_save_state) + _sme_rdsvl 1, 1 // x1 = VL/8 + sme_save_za 0, x1, 12 + ret +SYM_FUNC_END(za_save_state) + +/* + * Load the SME state + * + * x0 - pointer to buffer for state + */ +SYM_FUNC_START(za_load_state) + _sme_rdsvl 1, 1 // x1 = VL/8 + sme_load_za 0, x1, 12 + ret +SYM_FUNC_END(za_load_state) + #endif /* CONFIG_ARM64_SME */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 967a38277fbd..891e7085e190 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -117,6 +117,7 @@ struct fpsimd_last_state_struct { struct user_fpsimd_state *st; void *sve_state; + void *za_state; u64 *svcr; unsigned int sve_vl; unsigned int sme_vl; @@ -381,11 +382,15 @@ static void task_fpsimd_load(void) if (system_supports_sme()) { unsigned long sme_vl = task_get_sme_vl(current);
+ /* Ensure VL is set up for restoring data */ if (test_thread_flag(TIF_SME)) sme_set_vq(sve_vq_from_vl(sme_vl) - 1);
write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
+ if (thread_za_enabled(¤t->thread)) + za_load_state(current->thread.za_state); + if (thread_sm_enabled(¤t->thread)) { restore_sve_regs = true; restore_ffr = system_supports_fa64(); @@ -429,11 +434,10 @@ static void fpsimd_save(void) u64 *svcr = last->svcr; *svcr = read_sysreg_s(SYS_SVCR_EL0);
- if (thread_za_enabled(¤t->thread)) { - /* ZA state managment is not implemented yet */ - force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); - return; - } + *svcr = read_sysreg_s(SYS_SVCR_EL0); + + if (*svcr & SYS_SVCR_EL0_ZA_MASK) + za_save_state(last->za_state);
/* If we are in streaming mode override regular SVE. */ if (*svcr & SYS_SVCR_EL0_SM_MASK) { @@ -1471,6 +1475,7 @@ void fpsimd_bind_task_to_cpu(void) WARN_ON(!system_supports_fpsimd()); last->st = ¤t->thread.uw.fpsimd_state; last->sve_state = current->thread.sve_state; + last->za_state = current->thread.za_state; last->sve_vl = task_get_sve_vl(current); last->sme_vl = task_get_sme_vl(current); last->svcr = ¤t->thread.svcr; @@ -1488,8 +1493,8 @@ void fpsimd_bind_task_to_cpu(void) }
void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, - unsigned int sve_vl, unsigned int sme_vl, - u64 *svcr) + unsigned int sve_vl, void *za_state, + unsigned int sme_vl, u64 *svcr) { struct fpsimd_last_state_struct *last = this_cpu_ptr(&fpsimd_last_state); @@ -1500,6 +1505,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state, last->st = st; last->svcr = svcr; last->sve_state = sve_state; + last->za_state = za_state; last->sve_vl = sve_vl; last->sme_vl = sme_vl; } diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 4af62af52b95..80e54a600441 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -104,7 +104,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs, vcpu->arch.sve_state, vcpu->arch.sve_max_vl, - 0, NULL); + NULL, 0, &vcpu->arch.svcr);
clear_thread_flag(TIF_FOREIGN_FPSTATE); update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 8bd7f91c03d886f41d35f6108078d20be5a4a1bd category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
By default all SME operations in userspace will trap. When this happens we allocate storage space for the SME register state, set up the SVE registers and disable traps. We do not need to initialize ZA since the architecture guarantees that it will be zeroed when enabled and when we trap ZA is disabled.
On syscall we exit streaming mode if we were previously in it and ensure that all but the lower 128 bits of the registers are zeroed while preserving the state of ZA. This follows the aarch64 PCS for SME, ZA state is preserved over a function call and streaming mode is exited. Since the traps for SME do not distinguish between streaming mode SVE and ZA usage if ZA is in use rather than reenabling traps we instead zero the parts of the SVE registers not shared with FPSIMD and leave SME enabled, this simplifies handling SME traps. If ZA is not in use then we reenable SME traps and fall through to normal handling of SVE.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-17-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/esr.h | 1 + arch/arm64/include/asm/exception.h | 1 + arch/arm64/include/asm/fpsimd.h | 39 +++++++ arch/arm64/kernel/entry-common.c | 11 ++ arch/arm64/kernel/fpsimd.c | 165 ++++++++++++++++++++++++++--- arch/arm64/kernel/process.c | 30 +++++- arch/arm64/kernel/syscall.c | 29 ++++- 7 files changed, 254 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h index 2daf0f3a4377..843c5d2249c3 100644 --- a/arch/arm64/include/asm/esr.h +++ b/arch/arm64/include/asm/esr.h @@ -76,6 +76,7 @@ #define ESR_ELx_IL_SHIFT (25) #define ESR_ELx_IL (UL(1) << ESR_ELx_IL_SHIFT) #define ESR_ELx_ISS_MASK (ESR_ELx_IL - 1) +#define ESR_ELx_ISS(esr) ((esr) & ESR_ELx_ISS_MASK)
/* ISS field definitions shared by different classes */ #define ESR_ELx_WNR_SHIFT (6) diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h index 731cf01d9296..5a9fb95f5ff7 100644 --- a/arch/arm64/include/asm/exception.h +++ b/arch/arm64/include/asm/exception.h @@ -58,6 +58,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr, struct pt_regs *regs); void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs); void do_sve_acc(unsigned int esr, struct pt_regs *regs); +void do_sme_acc(unsigned int esr, struct pt_regs *regs); void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs); void do_sysinstr(unsigned int esr, struct pt_regs *regs); void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs); diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 4cd12d5f4ad9..d3aff96a1d62 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -242,6 +242,8 @@ static inline bool sve_vq_available(unsigned int vq) return vq_available(ARM64_VEC_SVE, vq); }
+size_t sve_state_size(struct task_struct const *task); + #else /* ! CONFIG_ARM64_SVE */
static inline void sve_alloc(struct task_struct *task) { } @@ -276,10 +278,25 @@ static inline void vec_update_vq_map(enum vec_type t) { } static inline int vec_verify_vq_map(enum vec_type t) { return 0; } static inline void sve_setup(void) { }
+static inline size_t sve_state_size(struct task_struct const *task) +{ + return 0; +} + #endif /* ! CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
+static inline void sme_user_disable(void) +{ + sysreg_clear_set(cpacr_el1, CPACR_EL1_SMEN_EL0EN, 0); +} + +static inline void sme_user_enable(void) +{ + sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_SMEN_EL0EN); +} + static inline void sme_smstart_sm(void) { asm volatile(__msr_s(SYS_SVCR_SMSTART_SM_EL0, "xzr")); @@ -307,16 +324,33 @@ static inline int sme_max_virtualisable_vl(void) return vec_max_virtualisable_vl(ARM64_VEC_SME); }
+extern void sme_alloc(struct task_struct *task); extern unsigned int sme_get_vl(void); extern int sme_set_current_vl(unsigned long arg); extern int sme_get_current_vl(void);
+/* + * Return how many bytes of memory are required to store the full SME + * specific state (currently just ZA) for task, given task's currently + * configured vector length. + */ +static inline size_t za_state_size(struct task_struct const *task) +{ + unsigned int vl = task_get_sme_vl(task); + + return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl)); +} + #else
+static inline void sme_user_disable(void) { BUILD_BUG(); } +static inline void sme_user_enable(void) { BUILD_BUG(); } + static inline void sme_smstart_sm(void) { } static inline void sme_smstop_sm(void) { } static inline void sme_smstop(void) { }
+static inline void sme_alloc(struct task_struct *task) { } static inline void sme_setup(void) { } static inline unsigned int sme_get_vl(void) { return 0; } static inline int sme_max_vl(void) { return 0; } @@ -324,6 +358,11 @@ static inline int sme_max_virtualisable_vl(void) { return 0; } static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; } static inline int sme_get_current_vl(void) { return -EINVAL; }
+static inline size_t za_state_size(struct task_struct const *task) +{ + return 0; +} + #endif /* ! CONFIG_ARM64_SME */
/* For use by EFI runtime services calls only */ diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index 387c755b79f6..87fd4d5b7429 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -283,6 +283,14 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr) do_sve_acc(esr, regs); }
+static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr) +{ + enter_from_user_mode(); + local_daif_restore(DAIF_PROCCTX); + do_sme_acc(esr, regs); + exit_to_user_mode(); +} + static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr) { enter_from_user_mode(); @@ -380,6 +388,9 @@ asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs) case ESR_ELx_EC_SVE: el0_sve_acc(regs, esr); break; + case ESR_ELx_EC_SME: + el0_sme_acc(regs, esr); + break; case ESR_ELx_EC_FP_EXC64: el0_fpsimd_exc(regs, esr); break; diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 891e7085e190..43493265caa3 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -205,6 +205,12 @@ static void set_sme_default_vl(int val) set_default_vl(ARM64_VEC_SME, val); }
+static void sme_free(struct task_struct *); + +#else + +static inline void sme_free(struct task_struct *t) { } + #endif
DEFINE_PER_CPU(bool, fpsimd_context_busy); @@ -806,18 +812,22 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, thread_sm_enabled(&task->thread)) sve_to_fpsimd(task);
- if (system_supports_sme() && type == ARM64_VEC_SME) + if (system_supports_sme() && type == ARM64_VEC_SME) { task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK | SYS_SVCR_EL0_ZA_MASK); + clear_thread_flag(TIF_SME); + }
if (task == current) put_cpu_fpsimd_context();
/* - * Force reallocation of task SVE state to the correct size - * on next use: + * Force reallocation of task SVE and SME state to the correct + * size on next use: */ sve_free(task); + if (system_supports_sme() && type == ARM64_VEC_SME) + sme_free(task);
task_set_vl(task, type, vl);
@@ -1152,12 +1162,43 @@ void __init sve_setup(void) void fpsimd_release_task(struct task_struct *dead_task) { __sve_free(dead_task); + sme_free(dead_task); }
#endif /* CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
+/* This will move to uapi/asm/sigcontext.h when signals are implemented */ +#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES)) + +/* + * Ensure that task->thread.za_state is allocated and sufficiently large. + * + * This function should be used only in preparation for replacing + * task->thread.za_state with new data. The memory is always zeroed + * here to prevent stale data from showing through: this is done in + * the interest of testability and predictability, the architecture + * guarantees that when ZA is enabled it will be zeroed. + */ +void sme_alloc(struct task_struct *task) +{ + if (task->thread.za_state) { + memset(task->thread.za_state, 0, za_state_size(task)); + return; + } + + /* This could potentially be up to 64K. */ + task->thread.za_state = + kzalloc(za_state_size(task), GFP_KERNEL); +} + +static void sme_free(struct task_struct *task) +{ + kfree(task->thread.za_state); + task->thread.za_state = NULL; +} + void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p) { /* Set priority for all PEs to architecturally defined minimum */ @@ -1267,6 +1308,29 @@ void __init sme_setup(void)
#endif /* CONFIG_ARM64_SME */
+static void sve_init_regs(void) +{ + /* + * Convert the FPSIMD state to SVE, zeroing all the state that + * is not shared with FPSIMD. If (as is likely) the current + * state is live in the registers then do this there and + * update our metadata for the current task including + * disabling the trap, otherwise update our in-memory copy. + * We are guaranteed to not be in streaming mode, we can only + * take a SVE trap when not in streaming mode and we can't be + * in streaming mode when taking a SME trap. + */ + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { + unsigned long vq_minus_one = + sve_vq_from_vl(task_get_sve_vl(current)) - 1; + sve_set_vq(vq_minus_one); + sve_flush_live(true, vq_minus_one); + fpsimd_bind_task_to_cpu(); + } else { + fpsimd_to_sve(current); + } +} + /* * Trapped SVE access * @@ -1298,22 +1362,77 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs) WARN_ON(1); /* SVE access shouldn't have trapped */
/* - * Convert the FPSIMD state to SVE, zeroing all the state that - * is not shared with FPSIMD. If (as is likely) the current - * state is live in the registers then do this there and - * update our metadata for the current task including - * disabling the trap, otherwise update our in-memory copy. + * Even if the task can have used streaming mode we can only + * generate SVE access traps in normal SVE mode and + * transitioning out of streaming mode may discard any + * streaming mode state. Always clear the high bits to avoid + * any potential errors tracking what is properly initialised. + */ + sve_init_regs(); + + put_cpu_fpsimd_context(); +} + +/* + * Trapped SME access + * + * Storage is allocated for the full SVE and SME state, the current + * FPSIMD register contents are migrated to SVE if SVE is not already + * active, and the access trap is disabled. + * + * TIF_SME should be clear on entry: otherwise, fpsimd_restore_current_state() + * would have disabled the SME access trap for userspace during + * ret_to_user, making an SVE access trap impossible in that case. + */ +void do_sme_acc(unsigned int esr, struct pt_regs *regs) +{ + /* Even if we chose not to use SME, the hardware could still trap: */ + if (unlikely(!system_supports_sme()) || WARN_ON(is_compat_task())) { + force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0); + return; + } + + /* + * If this not a trap due to SME being disabled then something + * is being used in the wrong mode, report as SIGILL. */ + if (ESR_ELx_ISS(esr) != ESR_ELx_SME_ISS_SME_DISABLED) { + force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0); + return; + } + + sve_alloc(current); + sme_alloc(current); + if (!current->thread.sve_state || !current->thread.za_state) { + force_sig(SIGKILL); + return; + } + + get_cpu_fpsimd_context(); + + /* With TIF_SME userspace shouldn't generate any traps */ + if (test_and_set_thread_flag(TIF_SME)) + WARN_ON(1); + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { unsigned long vq_minus_one = - sve_vq_from_vl(task_get_sve_vl(current)) - 1; - sve_set_vq(vq_minus_one); - sve_flush_live(true, vq_minus_one); + sve_vq_from_vl(task_get_sme_vl(current)) - 1; + sme_set_vq(vq_minus_one); + fpsimd_bind_task_to_cpu(); - } else { - fpsimd_to_sve(current); }
+ /* + * If SVE was not already active initialise the SVE registers, + * any non-shared state between the streaming and regular SVE + * registers is architecturally guaranteed to be zeroed when + * we enter streaming mode. We do not need to initialize ZA + * since ZA must be disabled at this point and enabling ZA is + * architecturally defined to zero ZA. + */ + if (system_supports_sve() && !test_thread_flag(TIF_SVE)) + sve_init_regs(); + put_cpu_fpsimd_context(); }
@@ -1430,8 +1549,12 @@ void fpsimd_flush_thread(void) fpsimd_flush_thread_vl(ARM64_VEC_SVE); }
- if (system_supports_sme()) + if (system_supports_sme()) { + clear_thread_flag(TIF_SME); + sme_free(current); fpsimd_flush_thread_vl(ARM64_VEC_SME); + current->thread.svcr = 0; + }
put_cpu_fpsimd_context(); } @@ -1481,14 +1604,22 @@ void fpsimd_bind_task_to_cpu(void) last->svcr = ¤t->thread.svcr; current->thread.fpsimd_cpu = smp_processor_id();
+ /* + * Toggle SVE and SME trapping for userspace if needed, these + * are serialsied by ret_to_user(). + */ + if (system_supports_sme()) { + if (test_thread_flag(TIF_SME)) + sme_user_enable(); + else + sme_user_disable(); + } + if (system_supports_sve()) { - /* Toggle SVE trapping for userspace if needed */ if (test_thread_flag(TIF_SVE)) sve_user_enable(); else sve_user_disable(); - - /* Serialised by exception return to user */ } }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index c658967bb1d3..9af564773dda 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -372,17 +372,41 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
/* * Detach src's sve_state (if any) from dst so that it does not - * get erroneously used or freed prematurely. dst's sve_state + * get erroneously used or freed prematurely. dst's copies * will be allocated on demand later on if dst uses SVE. * For consistency, also clear TIF_SVE here: this could be done * later in copy_process(), but to avoid tripping up future - * maintainers it is best not to leave TIF_SVE and sve_state in + * maintainers it is best not to leave TIF flags and buffers in * an inconsistent state, even temporarily. */ dst->thread.sve_state = NULL; clear_tsk_thread_flag(dst, TIF_SVE);
- dst->thread.svcr = 0; + /* + * In the unlikely event that we create a new thread with ZA + * enabled we should retain the ZA state so duplicate it here. + * This may be shortly freed if we exec() or if CLONE_SETTLS + * but it's simpler to do it here. To avoid confusing the rest + * of the code ensure that we have a sve_state allocated + * whenever za_state is allocated. + */ + if (thread_za_enabled(&src->thread)) { + dst->thread.sve_state = kzalloc(sve_state_size(src), + GFP_KERNEL); + if (!dst->thread.za_state) + return -ENOMEM; + dst->thread.za_state = kmemdup(src->thread.za_state, + za_state_size(src), + GFP_KERNEL); + if (!dst->thread.za_state) { + kfree(dst->thread.sve_state); + dst->thread.sve_state = NULL; + return -ENOMEM; + } + } else { + dst->thread.za_state = NULL; + clear_tsk_thread_flag(dst, TIF_SME); + }
/* clear any pending asynchronous tag fault raised by the parent */ clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT); diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 66ca9534bd69..14986b67054d 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -171,11 +171,36 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr, syscall_trace_exit(regs); }
-static inline void sve_user_discard(void) +/* + * As per the ABI exit SME streaming mode and clear the SVE state not + * shared with FPSIMD on syscall entry. + */ +static inline void fp_user_discard(void) { + /* + * If SME is active then exit streaming mode. If ZA is active + * then flush the SVE registers but leave userspace access to + * both SVE and SME enabled, otherwise disable SME for the + * task and fall through to disabling SVE too. This means + * that after a syscall we never have any streaming mode + * register state to track, if this changes the KVM code will + * need updating. + */ + if (system_supports_sme() && test_thread_flag(TIF_SME)) { + u64 svcr = read_sysreg_s(SYS_SVCR_EL0); + + if (svcr & SYS_SVCR_EL0_SM_MASK) + sme_smstop_sm(); + } + if (!system_supports_sve()) return;
+ /* + * If SME is not active then disable SVE, the registers will + * be cleared when userspace next attempts to access them and + * we do not need to track the SVE register state until then. + */ clear_thread_flag(TIF_SVE);
/* @@ -213,7 +238,7 @@ void do_el0_svc(struct pt_regs *regs) } #endif
- sve_user_discard(); + fp_user_discard(); el0_svc_common(regs, regs->regs[8], __NR_syscalls, t); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 40a8e87bb32855b39839d35b5b5b125494b3a604 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The ABI requires that streaming mode and ZA are disabled when invoking signal handlers, do this in setup_return() when we prepare the task state for the signal handler.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-18-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/signal.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index a58a94bc6ad0..0de0264975d1 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -559,6 +559,13 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, /* TCO (Tag Check Override) always cleared for signal handlers */ regs->pstate &= ~PSR_TCO_BIT;
+ /* Signal handlers are invoked with ZA and streaming mode disabled */ + if (system_supports_sme()) { + current->thread.svcr &= ~(SYS_SVCR_EL0_ZA_MASK | + SYS_SVCR_EL0_SM_MASK); + sme_smstop(); + } + if (ka->sa.sa_flags & SA_RESTORER) sigtramp = ka->sa.sa_restorer; else
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 85ed24dad2904f7c141911d91b7807ab02694b5e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When in streaming mode we have the same set of SVE registers as we do in regular SVE mode with the exception of FFR and the use of the SME vector length. Provide signal handling for these registers by taking one of the reserved words in the SVE signal context as a flags field and defining a flag which is set for streaming mode. When the flag is set the vector length is set to the streaming mode vector length and we save and restore streaming mode data. We support entering or leaving streaming mode based on the value of the flag but do not support changing the vector length, this is not currently supported SVE signal handling.
We could instead allocate a separate record in the signal frame for the streaming mode SVE context but this inflates the size of the maximal signal frame required and adds complication when validating signal frames from userspace, especially given the current structure of the code.
Any implementation of support for streaming mode vectors in signals will have some potential for causing issues for applications that attempt to handle SVE vectors in signals, use streaming mode but do not understand streaming mode in their signal handling code, it is hard to identify a case that is clearly better than any other - they all have cases where they could cause unexpected register corruption or faults.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-19-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Conflicts: arch/arm64/kernel/signal.c setup_sigframe move to arch/arm64/include/asm/signal_common.h. Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/processor.h | 8 +++++ arch/arm64/include/asm/signal_common.h | 5 ++-- arch/arm64/include/uapi/asm/sigcontext.h | 16 ++++++++-- arch/arm64/kernel/signal.c | 37 +++++++++++++++++++----- 4 files changed, 53 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 0ce3f74926b1..8b2bb5ce3256 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -190,6 +190,14 @@ static inline unsigned int thread_get_sme_vl(struct thread_struct *thread) return thread_get_vl(thread, ARM64_VEC_SME); }
+static inline unsigned int thread_get_cur_vl(struct thread_struct *thread) +{ + if (system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK)) + return thread_get_sme_vl(thread); + else + return thread_get_sve_vl(thread); +} + unsigned int task_get_vl(const struct task_struct *task, enum vec_type type); void task_set_vl(struct task_struct *task, enum vec_type type, unsigned long vl); diff --git a/arch/arm64/include/asm/signal_common.h b/arch/arm64/include/asm/signal_common.h index d9f0770a6c42..4f20ab2abac7 100644 --- a/arch/arm64/include/asm/signal_common.h +++ b/arch/arm64/include/asm/signal_common.h @@ -212,8 +212,9 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, &esr_ctx->esr, err); }
- /* Scalable Vector Extension state, if present */ - if (system_supports_sve() && err == 0 && user->sve_offset) { + /* Scalable Vector Extension state (including streaming), if present */ + if ((system_supports_sve() || system_supports_sme()) && + err == 0 && user->sve_offset) { struct sve_context __user *sve_ctx = apply_user_offset(user, user->sve_offset); err |= preserve_sve_context(sve_ctx); diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 0c796c795dbe..57e9f8c3ee9e 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -134,9 +134,12 @@ struct extra_context { struct sve_context { struct _aarch64_ctx head; __u16 vl; - __u16 __reserved[3]; + __u16 flags; + __u16 __reserved[2]; };
+#define SVE_SIG_FLAG_SM 0x1 /* Context describes streaming mode */ + #endif /* !__ASSEMBLY__ */
#include <asm/sve_context.h> @@ -186,9 +189,16 @@ struct sve_context { * sve_context.vl must equal the thread's current vector length when * doing a sigreturn. * + * On systems with support for SME the SVE register state may reflect either + * streaming or non-streaming mode. In streaming mode the streaming mode + * vector length will be used and the flag SVE_SIG_FLAG_SM will be set in + * the flags field. It is permitted to enter or leave streaming mode in + * a signal return, applications should take care to ensure that any difference + * in vector length between the two modes is handled, including any resizing + * and movement of context blocks. * - * Note: for all these macros, the "vq" argument denotes the SVE - * vector length in quadwords (i.e., units of 128 bits). + * Note: for all these macros, the "vq" argument denotes the vector length + * in quadwords (i.e., units of 128 bits). * * The correct way to obtain vq is to use sve_vq_from_vl(vl). The * result is valid if and only if sve_vl_valid(vl) is true. This is diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 0de0264975d1..a1c2dffbfee4 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -180,11 +180,17 @@ int preserve_sve_context(struct sve_context __user *ctx) { int err = 0; u16 reserved[ARRAY_SIZE(ctx->__reserved)]; + u16 flags = 0; unsigned int vl = task_get_sve_vl(current); unsigned int vq = 0;
- if (test_thread_flag(TIF_SVE)) + if (thread_sm_enabled(¤t->thread)) { + vl = task_get_sme_vl(current); vq = sve_vq_from_vl(vl); + flags |= SVE_SIG_FLAG_SM; + } else if (test_thread_flag(TIF_SVE)) { + vq = sve_vq_from_vl(vl); + }
memset(reserved, 0, sizeof(reserved));
@@ -192,6 +198,7 @@ int preserve_sve_context(struct sve_context __user *ctx) __put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16), &ctx->head.size, err); __put_user_error(vl, &ctx->vl, err); + __put_user_error(flags, &ctx->flags, err); BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
@@ -212,18 +219,28 @@ int preserve_sve_context(struct sve_context __user *ctx) int restore_sve_fpsimd_context(struct user_ctxs *user) { int err; - unsigned int vq; + unsigned int vl, vq; struct user_fpsimd_state fpsimd; struct sve_context sve;
if (__copy_from_user(&sve, user->sve, sizeof(sve))) return -EFAULT;
- if (sve.vl != task_get_sve_vl(current)) + if (sve.flags & SVE_SIG_FLAG_SM) { + if (!system_supports_sme()) + return -EINVAL; + + vl = task_get_sme_vl(current); + } else { + vl = task_get_sve_vl(current); + } + + if (sve.vl != vl) return -EINVAL;
if (sve.head.size <= sizeof(*user->sve)) { clear_thread_flag(TIF_SVE); + current->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK; goto fpsimd_only; }
@@ -255,7 +272,10 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) if (err) return -EFAULT;
- set_thread_flag(TIF_SVE); + if (sve.flags & SVE_SIG_FLAG_SM) + current->thread.svcr |= SYS_SVCR_EL0_SM_MASK; + else + set_thread_flag(TIF_SVE);
fpsimd_only: /* copy the FP and status/control registers */ @@ -340,7 +360,7 @@ int __parse_user_sigcontext(struct user_ctxs *user, break;
case SVE_MAGIC: - if (!system_supports_sve()) + if (!system_supports_sve() && !system_supports_sme()) goto invalid;
if (user->sve) @@ -470,11 +490,12 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all) if (system_supports_sve()) { unsigned int vq = 0;
- if (add_all || test_thread_flag(TIF_SVE)) { - int vl = sve_max_vl(); + if (add_all || test_thread_flag(TIF_SVE) || + thread_sm_enabled(¤t->thread)) { + int vl = max(sve_max_vl(), sme_max_vl());
if (!add_all) - vl = task_get_sve_vl(current); + vl = thread_get_cur_vl(¤t->thread);
vq = sve_vq_from_vl(vl); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 39782210eb7e87634d96cacb6ece370bc59d74ba category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Implement support for ZA in signal handling in a very similar way to how we implement support for SVE registers, using a signal context structure with optional register state after it. Where present this register state stores the ZA matrix as a series of horizontal vectors numbered from 0 to VL/8 in the endinanness independent format used for vectors.
As with SVE we do not allow changes in the vector length during signal return but we do allow ZA to be enabled or disabled.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-20-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Conflicts: arch/arm64/kernel/signal.c Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/signal_common.h | 14 +++ arch/arm64/include/uapi/asm/sigcontext.h | 41 ++++++++ arch/arm64/kernel/fpsimd.c | 3 - arch/arm64/kernel/signal.c | 124 +++++++++++++++++++++++ 4 files changed, 179 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/signal_common.h b/arch/arm64/include/asm/signal_common.h index 4f20ab2abac7..b7389dffd367 100644 --- a/arch/arm64/include/asm/signal_common.h +++ b/arch/arm64/include/asm/signal_common.h @@ -37,6 +37,7 @@ struct rt_sigframe_user_layout { unsigned long fpsimd_offset; unsigned long esr_offset; unsigned long sve_offset; + unsigned long za_offset; unsigned long extra_offset; unsigned long end_offset; }; @@ -44,6 +45,7 @@ struct rt_sigframe_user_layout { struct user_ctxs { struct fpsimd_context __user *fpsimd; struct sve_context __user *sve; + struct za_context __user *za; };
struct frame_record { @@ -129,6 +131,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user, return 0; }
+extern int restore_za_context(struct user_ctxs *user); static int restore_sigframe(struct pt_regs *regs, struct rt_sigframe __user *sf) { @@ -170,9 +173,13 @@ static int restore_sigframe(struct pt_regs *regs, } }
+ if (err == 0 && system_supports_sme() && user.za) + err = restore_za_context(&user); + return err; }
+extern int preserve_za_context(struct za_context __user *ctx); static int setup_sigframe(struct rt_sigframe_user_layout *user, struct pt_regs *regs, sigset_t *set) { @@ -220,6 +227,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, err |= preserve_sve_context(sve_ctx); }
+ /* ZA state if present */ + if (system_supports_sme() && err == 0 && user->za_offset) { + struct za_context __user *za_ctx = + apply_user_offset(user, user->za_offset); + err |= preserve_za_context(za_ctx); + } + if (err == 0 && user->extra_offset) setup_extra_context((char __user *)user->sigframe, user->size, (char __user *)apply_user_offset(user, diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 57e9f8c3ee9e..4aaf31e3bf16 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -140,6 +140,14 @@ struct sve_context {
#define SVE_SIG_FLAG_SM 0x1 /* Context describes streaming mode */
+#define ZA_MAGIC 0x54366345 + +struct za_context { + struct _aarch64_ctx head; + __u16 vl; + __u16 __reserved[3]; +}; + #endif /* !__ASSEMBLY__ */
#include <asm/sve_context.h> @@ -259,4 +267,37 @@ struct sve_context { #define SVE_SIG_CONTEXT_SIZE(vq) \ (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
+/* + * If the ZA register is enabled for the thread at signal delivery then, + * za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + * and the register data may be accessed using the ZA_SIG_*() macros. + * + * If za_context.head.size < ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + * then ZA was not enabled and no register data was included in which case + * ZA register was not enabled for the thread and no register data + * the ZA_SIG_*() macros should not be used except for this check. + * + * The same convention applies when returning from a signal: a caller + * will need to remove or resize the za_context block if it wants to + * enable the ZA register when it was previously non-live or vice-versa. + * This may require the caller to allocate fresh memory and/or move other + * context blocks in the signal frame. + * + * Changing the vector length during signal return is not permitted: + * za_context.vl must equal the thread's current SME vector length when + * doing a sigreturn. + */ + +#define ZA_SIG_REGS_OFFSET \ + ((sizeof(struct za_context) + (__SVE_VQ_BYTES - 1)) \ + / __SVE_VQ_BYTES * __SVE_VQ_BYTES) + +#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES)) + +#define ZA_SIG_ZAV_OFFSET(vq, n) (ZA_SIG_REGS_OFFSET + \ + (SVE_SIG_ZREG_SIZE(vq) * n)) + +#define ZA_SIG_CONTEXT_SIZE(vq) \ + (ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq)) + #endif /* _UAPI__ASM_SIGCONTEXT_H */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 43493265caa3..41e3622bf92c 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1169,9 +1169,6 @@ void fpsimd_release_task(struct task_struct *dead_task)
#ifdef CONFIG_ARM64_SME
-/* This will move to uapi/asm/sigcontext.h when signals are implemented */ -#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES)) - /* * Ensure that task->thread.za_state is allocated and sufficiently large. * diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index a1c2dffbfee4..2bec57cafbec 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -294,6 +294,98 @@ int restore_sve_fpsimd_context(struct user_ctxs *user)
#endif /* ! CONFIG_ARM64_SVE */
+#ifdef CONFIG_ARM64_SME + +int preserve_za_context(struct za_context __user *ctx) +{ + int err = 0; + u16 reserved[ARRAY_SIZE(ctx->__reserved)]; + unsigned int vl = task_get_sme_vl(current); + unsigned int vq; + + if (thread_za_enabled(¤t->thread)) + vq = sve_vq_from_vl(vl); + else + vq = 0; + + memset(reserved, 0, sizeof(reserved)); + + __put_user_error(ZA_MAGIC, &ctx->head.magic, err); + __put_user_error(round_up(ZA_SIG_CONTEXT_SIZE(vq), 16), + &ctx->head.size, err); + __put_user_error(vl, &ctx->vl, err); + BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); + err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); + + if (vq) { + /* + * This assumes that the ZA state has already been saved to + * the task struct by calling the function + * fpsimd_signal_preserve_current_state(). + */ + err |= __copy_to_user((char __user *)ctx + ZA_SIG_REGS_OFFSET, + current->thread.za_state, + ZA_SIG_REGS_SIZE(vq)); + } + + return err ? -EFAULT : 0; +} + +int restore_za_context(struct user_ctxs __user *user) +{ + int err; + unsigned int vq; + struct za_context za; + + if (__copy_from_user(&za, user->za, sizeof(za))) + return -EFAULT; + + if (za.vl != task_get_sme_vl(current)) + return -EINVAL; + + if (za.head.size <= sizeof(*user->za)) { + current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + return 0; + } + + vq = sve_vq_from_vl(za.vl); + + if (za.head.size < ZA_SIG_CONTEXT_SIZE(vq)) + return -EINVAL; + + /* + * Careful: we are about __copy_from_user() directly into + * thread.za_state with preemption enabled, so protection is + * needed to prevent a racing context switch from writing stale + * registers back over the new data. + */ + + fpsimd_flush_task_state(current); + /* From now, fpsimd_thread_switch() won't touch thread.sve_state */ + + sme_alloc(current); + if (!current->thread.za_state) { + current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + clear_thread_flag(TIF_SME); + return -ENOMEM; + } + + err = __copy_from_user(current->thread.za_state, + (char __user const *)user->za + + ZA_SIG_REGS_OFFSET, + ZA_SIG_REGS_SIZE(vq)); + if (err) + return -EFAULT; + + set_thread_flag(TIF_SME); + current->thread.svcr |= SYS_SVCR_EL0_ZA_MASK; + + return 0; +} +#else /* ! CONFIG_ARM64_SME */ + +#endif /* ! CONFIG_ARM64_SME */ + int __parse_user_sigcontext(struct user_ctxs *user, struct sigcontext __user const *sc, void __user const *sigframe_base) @@ -307,6 +399,7 @@ int __parse_user_sigcontext(struct user_ctxs *user,
user->fpsimd = NULL; user->sve = NULL; + user->za = NULL;
if (!IS_ALIGNED((unsigned long)base, 16)) goto invalid; @@ -372,6 +465,19 @@ int __parse_user_sigcontext(struct user_ctxs *user, user->sve = (struct sve_context __user *)head; break;
+ case ZA_MAGIC: + if (!system_supports_sme()) + goto invalid; + + if (user->za) + goto invalid; + + if (size < sizeof(*user->za)) + goto invalid; + + user->za = (struct za_context __user *)head; + break; + case EXTRA_MAGIC: if (have_extra_context) goto invalid; @@ -506,6 +612,24 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all) return err; }
+ if (system_supports_sme()) { + unsigned int vl; + unsigned int vq = 0; + + if (add_all) + vl = sme_max_vl(); + else + vl = task_get_sme_vl(current); + + if (thread_za_enabled(¤t->thread)) + vq = sve_vq_from_vl(vl); + + err = sigframe_alloc(user, &user->za_offset, + ZA_SIG_CONTEXT_SIZE(vq)); + if (err) + return err; + } + return sigframe_alloc_end(user); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit e12310a0d30f260b26297bc8d7c95769489af038 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The streaming mode SVE registers are represented using the same data structures as for SVE but since the vector lengths supported and in use may not be the same as SVE we represent them with a new type NT_ARM_SSVE. Unfortunately we only have a single 16 bit reserved field available in the header so there is no space to fit the current and maximum vector length for both standard and streaming SVE mode without redefining the structure in a way the creates a complicatd and fragile ABI. Since FFR is not present in streaming mode it is read and written as zero.
Setting NT_ARM_SSVE registers will put the task into streaming mode, similarly setting NT_ARM_SVE registers will exit it. Reads that do not correspond to the current mode of the task will return the header with no register data. For compatibility reasons on write setting no flag for the register type will be interpreted as setting SVE registers, though users can provide no register data as an alternative mechanism for doing so.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-21-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 1 + arch/arm64/include/uapi/asm/ptrace.h | 13 +- arch/arm64/kernel/fpsimd.c | 31 +++- arch/arm64/kernel/ptrace.c | 214 +++++++++++++++++++++------ include/uapi/linux/elf.h | 1 + 5 files changed, 201 insertions(+), 59 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index d3aff96a1d62..5d401aebcef5 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -147,6 +147,7 @@ extern size_t sve_state_size(struct task_struct const *task); extern void sve_alloc(struct task_struct *task); extern void fpsimd_release_task(struct task_struct *task); extern void fpsimd_sync_to_sve(struct task_struct *task); +extern void fpsimd_force_sync_to_sve(struct task_struct *task); extern void sve_sync_to_fpsimd(struct task_struct *task); extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 758ae984ff97..522b925a78c1 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -109,7 +109,7 @@ struct user_hwdebug_state { } dbg_regs[16]; };
-/* SVE/FP/SIMD state (NT_ARM_SVE) */ +/* SVE/FP/SIMD state (NT_ARM_SVE & NT_ARM_SSVE) */
struct user_sve_header { __u32 size; /* total meaningful regset content in bytes */ @@ -220,6 +220,7 @@ struct user_sve_header { (SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \ SVE_PT_SVE_PREGS_OFFSET(vq))
+/* For streaming mode SVE (SSVE) FFR must be read and written as zero */ #define SVE_PT_SVE_FFR_OFFSET(vq) \ (SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
@@ -240,10 +241,12 @@ struct user_sve_header { - SVE_PT_SVE_OFFSET + (__SVE_VQ_BYTES - 1)) \ / __SVE_VQ_BYTES * __SVE_VQ_BYTES)
-#define SVE_PT_SIZE(vq, flags) \ - (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \ - SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ - : SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) +#define SVE_PT_SIZE(vq, flags) \ + (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \ + SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ + : ((((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD ? \ + SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags) \ + : SVE_PT_REGS_OFFSET)))
/* pointer authentication masks (NT_ARM_PAC_MASK) */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 41e3622bf92c..971577f54ca2 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -631,7 +631,7 @@ static void fpsimd_to_sve(struct task_struct *task) if (!system_supports_sve()) return;
- vq = sve_vq_from_vl(task_get_sve_vl(task)); + vq = sve_vq_from_vl(thread_get_cur_vl(&task->thread)); __fpsimd_to_sve(sst, fst, vq); }
@@ -648,7 +648,7 @@ static void fpsimd_to_sve(struct task_struct *task) */ static void sve_to_fpsimd(struct task_struct *task) { - unsigned int vq; + unsigned int vq, vl; void const *sst = task->thread.sve_state; struct user_fpsimd_state *fst = &task->thread.uw.fpsimd_state; unsigned int i; @@ -657,7 +657,8 @@ static void sve_to_fpsimd(struct task_struct *task) if (!system_supports_sve()) return;
- vq = sve_vq_from_vl(task_get_sve_vl(task)); + vl = thread_get_cur_vl(&task->thread); + vq = sve_vq_from_vl(vl); for (i = 0; i < SVE_NUM_ZREGS; ++i) { p = (__uint128_t const *)ZREG(sst, vq, i); fst->vregs[i] = arm64_le128_to_cpu(*p); @@ -705,6 +706,19 @@ void sve_alloc(struct task_struct *task) }
+/* + * Force the FPSIMD state shared with SVE to be updated in the SVE state + * even if the SVE state is the current active state. + * + * This should only be called by ptrace. task must be non-runnable. + * task->thread.sve_state must point to at least sve_state_size(task) + * bytes of allocated kernel memory. + */ +void fpsimd_force_sync_to_sve(struct task_struct *task) +{ + fpsimd_to_sve(task); +} + /* * Ensure that task->thread.sve_state is up to date with respect to * the user task, irrespective of when SVE is in use or not. @@ -715,7 +729,8 @@ void sve_alloc(struct task_struct *task) */ void fpsimd_sync_to_sve(struct task_struct *task) { - if (!test_tsk_thread_flag(task, TIF_SVE)) + if (!test_tsk_thread_flag(task, TIF_SVE) && + !thread_sm_enabled(&task->thread)) fpsimd_to_sve(task); }
@@ -729,7 +744,8 @@ void fpsimd_sync_to_sve(struct task_struct *task) */ void sve_sync_to_fpsimd(struct task_struct *task) { - if (test_tsk_thread_flag(task, TIF_SVE)) + if (test_tsk_thread_flag(task, TIF_SVE) || + thread_sm_enabled(&task->thread)) sve_to_fpsimd(task); }
@@ -754,7 +770,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task) if (!test_tsk_thread_flag(task, TIF_SVE)) return;
- vq = sve_vq_from_vl(task_get_sve_vl(task)); + vq = sve_vq_from_vl(thread_get_cur_vl(&task->thread));
memset(sst, 0, SVE_SIG_REGS_SIZE(vq)); __fpsimd_to_sve(sst, fst, vq); @@ -798,8 +814,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, /* * To ensure the FPSIMD bits of the SVE vector registers are preserved, * write any live register state back to task_struct, and convert to a - * regular FPSIMD thread. Since the vector length can only be changed - * with a syscall we can't be in streaming mode while reconfiguring. + * regular FPSIMD thread. */ if (task == current) { get_cpu_fpsimd_context(); diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index b125580797d9..f478b1590111 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -715,21 +715,51 @@ static int system_call_set(struct task_struct *target, #ifdef CONFIG_ARM64_SVE
static void sve_init_header_from_task(struct user_sve_header *header, - struct task_struct *target) + struct task_struct *target, + enum vec_type type) { unsigned int vq; + bool active; + bool fpsimd_only; + enum vec_type task_type;
memset(header, 0, sizeof(*header));
- header->flags = test_tsk_thread_flag(target, TIF_SVE) ? - SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD; - if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) - header->flags |= SVE_PT_VL_INHERIT; + /* Check if the requested registers are active for the task */ + if (thread_sm_enabled(&target->thread)) + task_type = ARM64_VEC_SME; + else + task_type = ARM64_VEC_SVE; + active = (task_type == type); + + switch (type) { + case ARM64_VEC_SVE: + if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) + header->flags |= SVE_PT_VL_INHERIT; + fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE); + break; + case ARM64_VEC_SME: + if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT)) + header->flags |= SVE_PT_VL_INHERIT; + fpsimd_only = false; + break; + default: + WARN_ON_ONCE(1); + return; + }
- header->vl = task_get_sve_vl(target); + if (active) { + if (fpsimd_only) { + header->flags |= SVE_PT_REGS_FPSIMD; + } else { + header->flags |= SVE_PT_REGS_SVE; + } + } + + header->vl = task_get_vl(target, type); vq = sve_vq_from_vl(header->vl);
- header->max_vl = sve_max_vl(); + header->max_vl = vec_max_vl(type); header->size = SVE_PT_SIZE(vq, header->flags); header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl), SVE_PT_REGS_SVE); @@ -740,19 +770,17 @@ static unsigned int sve_size_from_header(struct user_sve_header const *header) return ALIGN(header->size, SVE_VQ_BYTES); }
-static int sve_get(struct task_struct *target, - const struct user_regset *regset, - struct membuf to) +static int sve_get_common(struct task_struct *target, + const struct user_regset *regset, + struct membuf to, + enum vec_type type) { struct user_sve_header header; unsigned int vq; unsigned long start, end;
- if (!system_supports_sve()) - return -EINVAL; - /* Header */ - sve_init_header_from_task(&header, target); + sve_init_header_from_task(&header, target, type); vq = sve_vq_from_vl(header.vl);
membuf_write(&to, &header, sizeof(header)); @@ -760,49 +788,61 @@ static int sve_get(struct task_struct *target, if (target == current) fpsimd_preserve_current_state();
- /* Registers: FPSIMD-only case */ - BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); - if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); + + switch ((header.flags & SVE_PT_REGS_MASK)) { + case SVE_PT_REGS_FPSIMD: return __fpr_get(target, regset, to);
- /* Otherwise: full SVE case */ + case SVE_PT_REGS_SVE: + start = SVE_PT_SVE_OFFSET; + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); + membuf_write(&to, target->thread.sve_state, end - start);
- BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); - start = SVE_PT_SVE_OFFSET; - end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); - membuf_write(&to, target->thread.sve_state, end - start); + start = end; + end = SVE_PT_SVE_FPSR_OFFSET(vq); + membuf_zero(&to, end - start);
- start = end; - end = SVE_PT_SVE_FPSR_OFFSET(vq); - membuf_zero(&to, end - start); + /* + * Copy fpsr, and fpcr which must follow contiguously in + * struct fpsimd_state: + */ + start = end; + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; + membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr, + end - start);
- /* - * Copy fpsr, and fpcr which must follow contiguously in - * struct fpsimd_state: - */ - start = end; - end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; - membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr, end - start); + start = end; + end = sve_size_from_header(&header); + return membuf_zero(&to, end - start);
- start = end; - end = sve_size_from_header(&header); - return membuf_zero(&to, end - start); + default: + return 0; + } }
-static int sve_set(struct task_struct *target, +static int sve_get(struct task_struct *target, const struct user_regset *regset, - unsigned int pos, unsigned int count, - const void *kbuf, const void __user *ubuf) + struct membuf to) +{ + if (!system_supports_sve()) + return -EINVAL; + + return sve_get_common(target, regset, to, ARM64_VEC_SVE); +} + +static int sve_set_common(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf, + enum vec_type type) { int ret; struct user_sve_header header; unsigned int vq; unsigned long start, end;
- if (!system_supports_sve()) - return -EINVAL; - /* Header */ if (count < sizeof(header)) return -EINVAL; @@ -815,13 +855,37 @@ static int sve_set(struct task_struct *target, * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by * vec_set_vector_length(), which will also validate them for us: */ - ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl, + ret = vec_set_vector_length(target, type, header.vl, ((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16); if (ret) goto out;
/* Actual VL set may be less than the user asked for: */ - vq = sve_vq_from_vl(task_get_sve_vl(target)); + vq = sve_vq_from_vl(task_get_vl(target, type)); + + /* Enter/exit streaming mode */ + if (system_supports_sme()) { + u64 old_svcr = target->thread.svcr; + + switch (type) { + case ARM64_VEC_SVE: + target->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK; + break; + case ARM64_VEC_SME: + target->thread.svcr |= SYS_SVCR_EL0_SM_MASK; + break; + default: + WARN_ON_ONCE(1); + return -EINVAL; + } + + /* + * If we switched then invalidate any existing SVE + * state and ensure there's storage. + */ + if (target->thread.svcr != old_svcr) + sve_alloc(target); + }
/* Registers: FPSIMD-only case */
@@ -830,10 +894,15 @@ static int sve_set(struct task_struct *target, ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, SVE_PT_FPSIMD_OFFSET); clear_tsk_thread_flag(target, TIF_SVE); + if (type == ARM64_VEC_SME) + fpsimd_force_sync_to_sve(target); goto out; }
- /* Otherwise: full SVE case */ + /* + * Otherwise: no registers or full SVE case. For backwards + * compatibility reasons we treat empty flags as SVE registers. + */
/* * If setting a different VL from the requested VL and there is @@ -854,8 +923,9 @@ static int sve_set(struct task_struct *target,
/* * Ensure target->thread.sve_state is up to date with target's - * FPSIMD regs, so that a short copyin leaves trailing registers - * unmodified. + * FPSIMD regs, so that a short copyin leaves trailing + * registers unmodified. Always enable SVE even if going into + * streaming mode. */ fpsimd_sync_to_sve(target); set_tsk_thread_flag(target, TIF_SVE); @@ -891,8 +961,46 @@ static int sve_set(struct task_struct *target, return ret; }
+static int sve_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + if (!system_supports_sve()) + return -EINVAL; + + return sve_set_common(target, regset, pos, count, kbuf, ubuf, + ARM64_VEC_SVE); +} + #endif /* CONFIG_ARM64_SVE */
+#ifdef CONFIG_ARM64_SME + +static int ssve_get(struct task_struct *target, + const struct user_regset *regset, + struct membuf to) +{ + if (!system_supports_sme()) + return -EINVAL; + + return sve_get_common(target, regset, to, ARM64_VEC_SME); +} + +static int ssve_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + if (!system_supports_sme()) + return -EINVAL; + + return sve_set_common(target, regset, pos, count, kbuf, ubuf, + ARM64_VEC_SME); +} + +#endif /* CONFIG_ARM64_SME */ + #ifdef CONFIG_ARM64_PTR_AUTH static int pac_mask_get(struct task_struct *target, const struct user_regset *regset, @@ -1078,6 +1186,9 @@ enum aarch64_regset { #ifdef CONFIG_ARM64_SVE REGSET_SVE, #endif +#ifdef CONFIG_ARM64_SVE + REGSET_SSVE, +#endif #ifdef CONFIG_ARM64_PTR_AUTH REGSET_PAC_MASK, #ifdef CONFIG_CHECKPOINT_RESTORE @@ -1157,6 +1268,17 @@ static const struct user_regset aarch64_regsets[] = { .set = sve_set, }, #endif +#ifdef CONFIG_ARM64_SME + [REGSET_SSVE] = { /* Streaming mode SVE */ + .core_note_type = NT_ARM_SSVE, + .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), + SVE_VQ_BYTES), + .size = SVE_VQ_BYTES, + .align = SVE_VQ_BYTES, + .regset_get = ssve_get, + .set = ssve_set, + }, +#endif #ifdef CONFIG_ARM64_PTR_AUTH [REGSET_PAC_MASK] = { .core_note_type = NT_ARM_PAC_MASK, diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 30f68b42eeb5..70c808747299 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -426,6 +426,7 @@ typedef struct elf64_shdr { #define NT_ARM_PACA_KEYS 0x407 /* ARM pointer authentication address keys */ #define NT_ARM_PACG_KEYS 0x408 /* ARM pointer authentication generic key */ #define NT_ARM_TAGGED_ADDR_CTRL 0x409 /* arm64 tagged address control (prctl()) */ +#define NT_ARM_SSVE 0x40b /* ARM Streaming SVE registers */ #define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */ #define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */ #define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 776b4a1cf36411e96972455ca72906b722b80ea1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The ZA array can be read and written with the NT_ARM_ZA. Similarly to our interface for the SVE vector registers the regset consists of a header with information on the current vector length followed by an optional register data payload, represented as for signals as a series of horizontal vectors from 0 to VL/8 in the endianness independent format used for vectors.
On get if ZA is enabled then register data will be provided, otherwise it will be omitted. On set if register data is provided then ZA is enabled and initialized using the provided data, otherwise it is disabled.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-22-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/uapi/asm/ptrace.h | 56 +++++++++++ arch/arm64/kernel/ptrace.c | 144 +++++++++++++++++++++++++++ include/uapi/linux/elf.h | 1 + 3 files changed, 201 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 522b925a78c1..7fa2f7036aa7 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -268,6 +268,62 @@ struct user_pac_generic_keys { __uint128_t apgakey; };
+/* ZA state (NT_ARM_ZA) */ + +struct user_za_header { + __u32 size; /* total meaningful regset content in bytes */ + __u32 max_size; /* maxmium possible size for this thread */ + __u16 vl; /* current vector length */ + __u16 max_vl; /* maximum possible vector length */ + __u16 flags; + __u16 __reserved; +}; + +/* + * Common ZA_PT_* flags: + * These must be kept in sync with prctl interface in <linux/prctl.h> + */ +#define ZA_PT_VL_INHERIT ((1 << 17) /* PR_SME_VL_INHERIT */ >> 16) +#define ZA_PT_VL_ONEXEC ((1 << 18) /* PR_SME_SET_VL_ONEXEC */ >> 16) + + +/* + * The remainder of the ZA state follows struct user_za_header. The + * total size of the ZA state (including header) depends on the + * metadata in the header: ZA_PT_SIZE(vq, flags) gives the total size + * of the state in bytes, including the header. + * + * Refer to <asm/sigcontext.h> for details of how to pass the correct + * "vq" argument to these macros. + */ + +/* Offset from the start of struct user_za_header to the register data */ +#define ZA_PT_ZA_OFFSET \ + ((sizeof(struct user_za_header) + (__SVE_VQ_BYTES - 1)) \ + / __SVE_VQ_BYTES * __SVE_VQ_BYTES) + +/* + * The payload starts at offset ZA_PT_ZA_OFFSET, and is of size + * ZA_PT_ZA_SIZE(vq, flags). + * + * The ZA array is stored as a sequence of horizontal vectors ZAV of SVL/8 + * bytes each, starting from vector 0. + * + * Additional data might be appended in the future. + * + * The ZA matrix is represented in memory in an endianness-invariant layout + * which differs from the layout used for the FPSIMD V-registers on big-endian + * systems: see sigcontext.h for more explanation. + */ + +#define ZA_PT_ZAV_OFFSET(vq, n) \ + (ZA_PT_ZA_OFFSET + ((vq * __SVE_VQ_BYTES) * n)) + +#define ZA_PT_ZA_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES)) + +#define ZA_PT_SIZE(vq) \ + (ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq)) + #endif /* __ASSEMBLY__ */
#endif /* _UAPI__ASM_PTRACE_H */ diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index f478b1590111..aa4347f4c6aa 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -999,6 +999,141 @@ static int ssve_set(struct task_struct *target, ARM64_VEC_SME); }
+static int za_get(struct task_struct *target, + const struct user_regset *regset, + struct membuf to) +{ + struct user_za_header header; + unsigned int vq; + unsigned long start, end; + + if (!system_supports_sme()) + return -EINVAL; + + /* Header */ + memset(&header, 0, sizeof(header)); + + if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT)) + header.flags |= ZA_PT_VL_INHERIT; + + header.vl = task_get_sme_vl(target); + vq = sve_vq_from_vl(header.vl); + header.max_vl = sme_max_vl(); + header.max_size = ZA_PT_SIZE(vq); + + /* If ZA is not active there is only the header */ + if (thread_za_enabled(&target->thread)) + header.size = ZA_PT_SIZE(vq); + else + header.size = ZA_PT_ZA_OFFSET; + + membuf_write(&to, &header, sizeof(header)); + + BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header)); + end = ZA_PT_ZA_OFFSET; + + if (target == current) + fpsimd_preserve_current_state(); + + /* Any register data to include? */ + if (thread_za_enabled(&target->thread)) { + start = end; + end = ZA_PT_SIZE(vq); + membuf_write(&to, target->thread.za_state, end - start); + } + + /* Zero any trailing padding */ + start = end; + end = ALIGN(header.size, SVE_VQ_BYTES); + return membuf_zero(&to, end - start); +} + +static int za_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + struct user_za_header header; + unsigned int vq; + unsigned long start, end; + + if (!system_supports_sme()) + return -EINVAL; + + /* Header */ + if (count < sizeof(header)) + return -EINVAL; + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header, + 0, sizeof(header)); + if (ret) + goto out; + + /* + * All current ZA_PT_* flags are consumed by + * vec_set_vector_length(), which will also validate them for + * us: + */ + ret = vec_set_vector_length(target, ARM64_VEC_SME, header.vl, + ((unsigned long)header.flags) << 16); + if (ret) + goto out; + + /* Actual VL set may be less than the user asked for: */ + vq = sve_vq_from_vl(task_get_sme_vl(target)); + + /* Ensure there is some SVE storage for streaming mode */ + if (!target->thread.sve_state) { + sve_alloc(target); + if (!target->thread.sve_state) { + clear_thread_flag(TIF_SME); + ret = -ENOMEM; + goto out; + } + } + + /* Allocate/reinit ZA storage */ + sme_alloc(target); + if (!target->thread.za_state) { + ret = -ENOMEM; + clear_tsk_thread_flag(target, TIF_SME); + goto out; + } + + /* If there is no data then disable ZA */ + if (!count) { + target->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + goto out; + } + + /* + * If setting a different VL from the requested VL and there is + * register data, the data layout will be wrong: don't even + * try to set the registers in this case. + */ + if (vq != sve_vq_from_vl(header.vl)) { + ret = -EIO; + goto out; + } + + BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header)); + start = ZA_PT_ZA_OFFSET; + end = ZA_PT_SIZE(vq); + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + target->thread.za_state, + start, end); + if (ret) + goto out; + + /* Mark ZA as active and let userspace use it */ + set_tsk_thread_flag(target, TIF_SME); + target->thread.svcr |= SYS_SVCR_EL0_ZA_MASK; + +out: + fpsimd_flush_task_state(target); + return ret; +} + #endif /* CONFIG_ARM64_SME */
#ifdef CONFIG_ARM64_PTR_AUTH @@ -1188,6 +1323,7 @@ enum aarch64_regset { #endif #ifdef CONFIG_ARM64_SVE REGSET_SSVE, + REGSET_ZA, #endif #ifdef CONFIG_ARM64_PTR_AUTH REGSET_PAC_MASK, @@ -1278,6 +1414,14 @@ static const struct user_regset aarch64_regsets[] = { .regset_get = ssve_get, .set = ssve_set, }, + [REGSET_ZA] = { /* SME ZA */ + .core_note_type = NT_ARM_ZA, + .n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES), + .size = SVE_VQ_BYTES, + .align = SVE_VQ_BYTES, + .regset_get = za_get, + .set = za_set, + }, #endif #ifdef CONFIG_ARM64_PTR_AUTH [REGSET_PAC_MASK] = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 70c808747299..fff8982b0cc8 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -427,6 +427,7 @@ typedef struct elf64_shdr { #define NT_ARM_PACG_KEYS 0x408 /* ARM pointer authentication generic key */ #define NT_ARM_TAGGED_ADDR_CTRL 0x409 /* arm64 tagged address control (prctl()) */ #define NT_ARM_SSVE 0x40b /* ARM Streaming SVE registers */ +#define NT_ARM_ZA 0x40c /* ARM SME ZA registers */ #define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */ #define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */ #define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit d45d7ff7047f7f6c3221b0f028fade640812f931 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Both streaming mode and ZA may increase power consumption when they are enabled and streaming mode makes many FPSIMD and SVE instructions undefined which will cause problems for any kernel mode floating point so disable both when we flush the CPU state. This covers both kernel_neon_begin() and idle and after flushing the state a reload is always required anyway.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-23-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 971577f54ca2..5d747b332742 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1744,6 +1744,15 @@ static void fpsimd_flush_cpu_state(void) { WARN_ON(!system_supports_fpsimd()); __this_cpu_write(fpsimd_last_state.st, NULL); + + /* + * Leaving streaming mode enabled will cause issues for any kernel + * NEON and leaving streaming mode or ZA enabled may increase power + * consumption. + */ + if (system_supports_sme()) + sme_smstop(); + set_thread_flag(TIF_FOREIGN_FPSTATE); }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit e0838f6373e5cb72516fc4c26bba309097e2a80a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When saving and restoring the floating point state over an EFI runtime call ensure that we handle streaming mode, only handling FFR if we are not in streaming mode and ensuring that we are in normal mode over the call into runtime services.
We currently assume that ZA will not be modified by runtime services, the specification is not yet finalised so this may need updating if that changes.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-24-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 48 +++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 5d747b332742..1e1de099777c 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1044,21 +1044,25 @@ int vec_verify_vq_map(enum vec_type type)
static void __init sve_efi_setup(void) { - struct vl_info *info = &vl_info[ARM64_VEC_SVE]; + int max_vl = 0; + int i;
if (!IS_ENABLED(CONFIG_EFI)) return;
+ for (i = 0; i < ARRAY_SIZE(vl_info); i++) + max_vl = max(vl_info[i].max_vl, max_vl); + /* * alloc_percpu() warns and prints a backtrace if this goes wrong. * This is evidence of a crippled system and we are returning void, * so no attempt is made to handle this situation here. */ - if (!sve_vl_valid(info->max_vl)) + if (!sve_vl_valid(max_vl)) goto fail;
efi_sve_state = __alloc_percpu( - SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES); + SVE_SIG_REGS_SIZE(sve_vq_from_vl(max_vl)), SVE_VQ_BYTES); if (!efi_sve_state) goto fail;
@@ -1830,6 +1834,7 @@ EXPORT_SYMBOL(kernel_neon_end); static DEFINE_PER_CPU(struct user_fpsimd_state, efi_fpsimd_state); static DEFINE_PER_CPU(bool, efi_fpsimd_state_used); static DEFINE_PER_CPU(bool, efi_sve_state_used); +static DEFINE_PER_CPU(bool, efi_sm_state);
/* * EFI runtime services support functions @@ -1864,12 +1869,28 @@ void __efi_fpsimd_begin(void) */ if (system_supports_sve() && likely(efi_sve_state)) { char *sve_state = this_cpu_ptr(efi_sve_state); + bool ffr = true; + u64 svcr;
__this_cpu_write(efi_sve_state_used, true);
+ if (system_supports_sme()) { + svcr = read_sysreg_s(SYS_SVCR_EL0); + + if (!system_supports_fa64()) + ffr = svcr & SYS_SVCR_EL0_SM_MASK; + + __this_cpu_write(efi_sm_state, ffr); + } + sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, - true); + ffr); + + if (system_supports_sme()) + sysreg_clear_set_s(SYS_SVCR_EL0, + SYS_SVCR_EL0_SM_MASK, 0); + } else { fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state)); } @@ -1892,11 +1913,26 @@ void __efi_fpsimd_end(void) if (system_supports_sve() && likely(__this_cpu_read(efi_sve_state_used))) { char const *sve_state = this_cpu_ptr(efi_sve_state); + bool ffr = true; + + /* + * Restore streaming mode; EFI calls are + * normal function calls so should not return in + * streaming mode. + */ + if (system_supports_sme()) { + if (__this_cpu_read(efi_sm_state)) { + sysreg_clear_set_s(SYS_SVCR_EL0, + 0, + SYS_SVCR_EL0_SM_MASK); + if (!system_supports_fa64()) + ffr = efi_sm_state; + } + }
- sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1); sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()), &this_cpu_ptr(&efi_fpsimd_state)->fpsr, - true); + ffr);
__this_cpu_write(efi_sve_state_used, false); } else {
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit a1f4ccd25cc256255813f584f10e5527369d4a02 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Now that basline support for the Scalable Matrix Extension (SME) is present introduce the Kconfig option allowing it to be built. While the feature registers don't impose a strong requirement for a system with SME to support SVE at runtime the support for streaming mode SVE is mostly shared with normal SVE so depend on SVE.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Link: https://lore.kernel.org/r/20220419112247.711548-28-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/Kconfig | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4f6c80ea976..d268cca515b0 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1917,6 +1917,17 @@ config ARM64_SVE Thus, you will need to enable CONFIG_ARM64_VHE if you want to support KVM in the same kernel image.
+config ARM64_SME + bool "ARM Scalable Matrix Extension support" + default y + depends on ARM64_SVE + help + The Scalable Matrix Extension (SME) is an extension to the AArch64 + execution state which utilises a substantial subset of the SVE + instruction set, together with the addition of new architectural + register state capable of holding two dimensional matrix tiles to + enable various matrix operations. + config ARM64_MODULE_PLTS bool "Use PLTs to allow module memory to spill over into vmalloc area" depends on MODULES
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 8a58bcd00e2e8d46afce468adc09fcd7968f514c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
We need to explicitly enumerate all the ID registers which we rely on for CPU capabilities in __read_sysreg_by_encoding(), ID_AA64SMFR0_EL1 was missed from this list so we trip a BUG() in paths which rely on that function such as CPU hotplug. Add the register.
Reported-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Mark Brown broonie@kernel.org Tested-by: Marek Szyprowski m.szyprowski@samsung.com Link: https://lore.kernel.org/r/20220427130828.162615-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/cpufeature.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index b443de8bcf70..5fad051bacb3 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1223,6 +1223,7 @@ static u64 __read_sysreg_by_encoding(u32 sys_id) read_sysreg_case(SYS_ID_AA64PFR0_EL1); read_sysreg_case(SYS_ID_AA64PFR1_EL1); read_sysreg_case(SYS_ID_AA64ZFR0_EL1); + read_sysreg_case(SYS_ID_AA64SMFR0_EL1); read_sysreg_case(SYS_ID_AA64DFR0_EL1); read_sysreg_case(SYS_ID_AA64DFR1_EL1); read_sysreg_case(SYS_ID_AA64MMFR0_EL1);
From: Wan Jiabing wanjiabing@vivo.com
mainline inclusion from mainline-v5.19-rc1 commit 2e29b9971ac54dec88baa58856a230ec2f2a2dff category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Fix following coccicheck error: ./arch/arm64/kernel/process.c:322:2-23: alloc with no test, possible model on line 326
Here should be dst->thread.sve_state.
Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Signed-off-by: Wan Jiabing wanjiabing@vivo.com Reviwed-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220426113054.630983-1-wanjiabing@vivo.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 9af564773dda..61df01a8a4de 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -393,7 +393,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) if (thread_za_enabled(&src->thread)) { dst->thread.sve_state = kzalloc(sve_state_size(src), GFP_KERNEL); - if (!dst->thread.za_state) + if (!dst->thread.sve_state) return -ENOMEM; dst->thread.za_state = kmemdup(src->thread.za_state, za_state_size(src),
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit d158a0608eb85f29f98357b97d01b80250636613 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Since the vector length configuration mechanism is identical between SVE and SME we share large elements of the code including the definition for the maximum vector length. Unfortunately when we were defining the ABI for SVE we included not only the actual maximum vector length of 2048 bits but also the value possible if all the bits reserved in the architecture for expansion of the LEN field were used, 16384 bits.
This starts creating problems if we try to allocate anything for the ZA matrix based on the maximum possible vector length, as we do for the regset used with ptrace during the process of generating a core dump. While the maximum potential size for ZA with the current architecture is a reasonably managable 64K with the higher reserved limit ZA would be 64M which leads to entirely reasonable complaints from the memory management code when we try to allocate a buffer of that size. Avoid these issues by defining the actual maximum vector length for the architecture and using it for the SME regsets.
Also use the full ZA_PT_SIZE() with the header rather than just the actual register payload when specifying the size, fixing support for the largest vector lengths now that we have this new, lower define. With the SVE maximum this did not cause problems due to the extra headroom we had.
While we're at it add a comment clarifying why even though ZA is a single register we tell the regset code that it is a multi-register regset.
Reported-by: Qian Cai quic_qiancai@quicinc.com Signed-off-by: Mark Brown broonie@kernel.org Tested-by: Naresh Kamboju naresh.kamboju@linaro.org Link: https://lore.kernel.org/r/20220505221517.1642014-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 12 ++++++++++++ arch/arm64/kernel/ptrace.c | 12 ++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 5d401aebcef5..153d10747128 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -32,6 +32,18 @@ #define VFP_STATE_SIZE ((32 * 8) + 4) #endif
+/* + * When we defined the maximum SVE vector length we defined the ABI so + * that the maximum vector length included all the reserved for future + * expansion bits in ZCR rather than those just currently defined by + * the architecture. While SME follows a similar pattern the fact that + * it includes a square matrix means that any allocations that attempt + * to cover the maximum potential vector length (such as happen with + * the regset used for ptrace) end up being extremely large. Define + * the much lower actual limit for use in such situations. + */ +#define SME_VQ_MAX 16 + struct task_struct;
extern void fpsimd_save_state(struct user_fpsimd_state *state); diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index aa4347f4c6aa..46a806a88df9 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -1407,7 +1407,7 @@ static const struct user_regset aarch64_regsets[] = { #ifdef CONFIG_ARM64_SME [REGSET_SSVE] = { /* Streaming mode SVE */ .core_note_type = NT_ARM_SSVE, - .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), + .n = DIV_ROUND_UP(SVE_PT_SIZE(SME_VQ_MAX, SVE_PT_REGS_SVE), SVE_VQ_BYTES), .size = SVE_VQ_BYTES, .align = SVE_VQ_BYTES, @@ -1416,7 +1416,15 @@ static const struct user_regset aarch64_regsets[] = { }, [REGSET_ZA] = { /* SME ZA */ .core_note_type = NT_ARM_ZA, - .n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES), + /* + * ZA is a single register but it's variably sized and + * the ptrace core requires that the size of any data + * be an exact multiple of the configured register + * size so report as though we had SVE_VQ_BYTES + * registers. These values aren't exposed to + * userspace. + */ + .n = DIV_ROUND_UP(ZA_PT_SIZE(SME_VQ_MAX), SVE_VQ_BYTES), .size = SVE_VQ_BYTES, .align = SVE_VQ_BYTES, .regset_get = za_get,
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 90807748ca3ac4874853b2148928529bf1f13e5e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
For the time being we do not support use of SME by KVM guests, support for this will be enabled in future. In order to prevent any side effects or side channels via the new system registers, including the EL0 read/write register TPIDR2, explicitly undefine all the system registers added by SME and mask out the SME bitfield in SYS_ID_AA64PFR1.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220419112247.711548-25-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/sys_regs.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 6c1bc564b91f..33692d669065 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1161,6 +1161,7 @@ static u64 read_id_reg(struct kvm_vcpu *vcpu, val |= ((u64)vcpu->kvm->arch.pfr0_csv2 << ID_AA64PFR0_CSV2_SHIFT); } else if (id == SYS_ID_AA64PFR1_EL1) { val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); + val &= ~(0xfUL << ID_AA64PFR1_SME_SHIFT); } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) { val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) | (0xfUL << ID_AA64ISAR1_API_SHIFT) | @@ -1531,7 +1532,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,3), ID_SANITISED(ID_AA64ZFR0_EL1), - ID_UNALLOCATED(4,5), + ID_HIDDEN(ID_AA64SMFR0_EL1), ID_UNALLOCATED(4,6), ID_UNALLOCATED(4,7),
@@ -1573,6 +1574,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_GCR_EL1), undef_access },
{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility }, + { SYS_DESC(SYS_SMPRI_EL1), undef_access }, + { SYS_DESC(SYS_SMCR_EL1), undef_access }, { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 }, { SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 }, { SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 }, @@ -1639,8 +1642,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr }, { SYS_DESC(SYS_CLIDR_EL1), access_clidr }, + { SYS_DESC(SYS_SMIDR_EL1), undef_access }, { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 }, { SYS_DESC(SYS_CTR_EL0), access_ctr }, + { SYS_DESC(SYS_SVCR_EL0), undef_access },
{ SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, PMCR_EL0 }, { SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, PMCNTENSET_EL0 }, @@ -1662,6 +1667,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 }, { SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 }, + { SYS_DESC(SYS_TPIDR2_EL0), undef_access },
{ SYS_DESC(SYS_SCXTNUM_EL0), undef_access },
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 51729fb1d0683df5e9e4d5dbe2ec46188f011da9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
SME defines two new traps which need to be enabled for guests to ensure that they can't use SME, one for the main SME operations which mirrors the traps for SVE and another for access to TPIDR2 in SCTLR_EL2.
For VHE manage SMEN along with ZEN in activate_traps() and the FP state management callbacks, along with SCTLR_EL2.EnTPIDR2. There is no existing dynamic management of SCTLR_EL2.
For nVHE manage TSM in activate_traps() along with the fine grained traps for TPIDR2 and SMPRI. There is no existing dynamic management of fine grained traps.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220419112247.711548-26-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/hyp/nvhe/switch.c | 34 +++++++++++++++++++++++++++++++- arch/arm64/kvm/hyp/vhe/switch.c | 10 +++++++++- 2 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index c6d7451a6daf..87f969922fca 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -45,10 +45,24 @@ static void __activate_traps(struct kvm_vcpu *vcpu) val |= CPTR_EL2_TFP | CPTR_EL2_TZ; __activate_traps_fpsimd32(vcpu); } + if (cpus_have_final_cap(ARM64_SME)) + val |= CPTR_EL2_TSM;
write_sysreg(val, cptr_el2); write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el2);
+ if (cpus_have_final_cap(ARM64_SME)) { + val = read_sysreg_s(SYS_HFGRTR_EL2); + val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK | + HFGxTR_EL2_nSMPRI_EL1_MASK); + write_sysreg_s(val, SYS_HFGRTR_EL2); + + val = read_sysreg_s(SYS_HFGWTR_EL2); + val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK | + HFGxTR_EL2_nSMPRI_EL1_MASK); + write_sysreg_s(val, SYS_HFGWTR_EL2); + } + if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -68,6 +82,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) { extern char __kvm_hyp_host_vector[]; u64 mdcr_el2; + u64 cptr;
___deactivate_traps(vcpu);
@@ -97,7 +112,24 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
write_sysreg(mdcr_el2, mdcr_el2); write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2); - write_sysreg(CPTR_EL2_DEFAULT, cptr_el2); + + if (cpus_have_final_cap(ARM64_SME)) { + u64 val; + + val = read_sysreg_s(SYS_HFGRTR_EL2); + val |= HFGxTR_EL2_nTPIDR2_EL0_MASK | + HFGxTR_EL2_nSMPRI_EL1_MASK; + write_sysreg_s(val, SYS_HFGRTR_EL2); + + val = read_sysreg_s(SYS_HFGWTR_EL2); + val |= HFGxTR_EL2_nTPIDR2_EL0_MASK | + HFGxTR_EL2_nSMPRI_EL1_MASK; + write_sysreg_s(val, SYS_HFGWTR_EL2); + } + cptr = CPTR_EL2_DEFAULT; + if (cpus_have_final_cap(ARM64_SME)) + cptr &= ~CPTR_EL2_TSM; + write_sysreg(cptr, cptr_el2); write_sysreg(__kvm_hyp_host_vector, vbar_el2); }
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c index 532e687f6936..e109d6db41f3 100644 --- a/arch/arm64/kvm/hyp/vhe/switch.c +++ b/arch/arm64/kvm/hyp/vhe/switch.c @@ -43,7 +43,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1); val |= CPACR_EL1_TTA; - val &= ~CPACR_EL1_ZEN; + val &= ~(CPACR_EL1_ZEN | CPACR_EL1_SMEN_EL0EN | CPACR_EL1_SMEN_EL1EN);
/* * With VHE (HCR.E2H == 1), accesses to CPACR_EL1 are routed to @@ -64,6 +64,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu) __activate_traps_fpsimd32(vcpu); }
+ if (cpus_have_final_cap(ARM64_SME)) + write_sysreg(read_sysreg(sctlr_el2) & ~SCTLR_ELx_ENTP2, + sctlr_el2); + write_sysreg(val, cpacr_el1);
write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1); @@ -85,6 +89,10 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) */ asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
+ if (cpus_have_final_cap(ARM64_SME)) + write_sysreg(read_sysreg(sctlr_el2) | SCTLR_ELx_ENTP2, + sctlr_el2); + write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
if (!arm64_kernel_unmapped_at_el0())
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 861262ab862702061ae3355b811a07b15d1b2fc0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
While we don't currently support SME in guests we do currently support it for the host system so we need to take care of SME's impact, including the floating point register state, when running guests. Simiarly to SVE we need to manage the traps in CPACR_RL1, what is new is the handling of streaming mode and ZA.
Normally we defer any handling of the floating point register state until the guest first uses it however if the system is in streaming mode FPSIMD and SVE operations may generate SME traps which we would need to distinguish from actual attempts by the guest to use SME. Rather than do this for the time being if we are in streaming mode when entering the guest we force the floating point state to be saved immediately and exit streaming mode, meaning that the guest won't generate SME traps for supported operations.
We could handle ZA in the access trap similarly to the FPSIMD/SVE state without the disruption caused by streaming mode but for simplicity handle it the same way as streaming mode for now.
This will be revisited when we support SME for guests (hopefully before SME hardware becomes available), for now it will only incur additional cost on systems with SME and even there only if streaming mode or ZA are enabled.
Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220419112247.711548-27-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/fpsimd.c | 36 +++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index dfa409d640a3..d44d91e020a4 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -425,6 +425,7 @@ struct kvm_vcpu_arch { #define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */ #define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */ #define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */ +#define KVM_ARM64_HOST_SME_ENABLED (1 << 16) /* SME enabled for EL0 */
#define vcpu_has_sve(vcpu) (system_supports_sve() && \ ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE)) diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index 80e54a600441..e4d1ee49bc2d 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -84,6 +84,26 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN) vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED; + + /* + * We don't currently support SME guests but if we leave + * things in streaming mode then when the guest starts running + * FPSIMD or SVE code it may generate SME traps so as a + * special case if we are in streaming mode we force the host + * state to be saved now and exit streaming mode so that we + * don't have to handle any SME traps for valid guest + * operations. Do this for ZA as well for now for simplicity. + */ + if (system_supports_sme()) { + if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) + vcpu->arch.flags |= KVM_ARM64_HOST_SME_ENABLED; + + if (read_sysreg_s(SYS_SVCR_EL0) & + (SYS_SVCR_EL0_SM_MASK | SYS_SVCR_EL0_ZA_MASK)) { + vcpu->arch.flags &= ~KVM_ARM64_FP_HOST; + fpsimd_save_and_flush_cpu_state(); + } + } }
/* @@ -125,6 +145,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
local_irq_save(flags);
+ /* + * If we have VHE then the Hyp code will reset CPACR_EL1 to + * CPACR_EL1_DEFAULT and we need to reenable SME. + */ + if (has_vhe() && system_supports_sme()) { + /* Also restore EL0 state seen on entry */ + if (vcpu->arch.flags & KVM_ARM64_HOST_SME_ENABLED) + sysreg_clear_set(CPACR_EL1, 0, + CPACR_EL1_SMEN_EL0EN | + CPACR_EL1_SMEN_EL1EN); + else + sysreg_clear_set(CPACR_EL1, + CPACR_EL1_SMEN_EL0EN, + CPACR_EL1_SMEN_EL1EN); + } + if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { if (guest_has_sve) { __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.19-rc3 commit 039f49c4cafb785504c678f28664d088e0108d35 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
On each vcpu load, we set the KVM_ARM64_HOST_SME_ENABLED flag if SME is enabled for EL0 on the host. This is used to restore the correct state on vpcu put.
However, it appears that nothing ever clears this flag. Once set, it will stick until the vcpu is destroyed, which has the potential to spuriously enable SME for userspace. As it turns out, this is due to the SME code being more or less copied from SVE, and inheriting the same shortcomings.
We never saw the issue because nothing uses SME, and the amount of testing is probably still pretty low.
Fixes: 861262ab8627 ("KVM: arm64: Handle SME host state when running guests") Signed-off-by: Marc Zyngier maz@kernel.org Reviwed-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220528113829.1043361-3-maz@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kvm/fpsimd.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index e4d1ee49bc2d..b3c9b06220f9 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -95,6 +95,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) * operations. Do this for ZA as well for now for simplicity. */ if (system_supports_sme()) { + vcpu->arch.flags &= ~KVM_ARM64_HOST_SME_ENABLED; if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) vcpu->arch.flags |= KVM_ARM64_HOST_SME_ENABLED;
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit f171f9e4097d29db88a99ea96bb6c08e819a52a4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Currently (as of DDI0487H.a) the architecture defines the vector length control field in ZCR and SMCR as being 4 bits wide with an additional 5 bits reserved above it marked as RAZ/WI for future expansion. The kernel currently attempts to anticipate such expansion by treating these extra bits as part of the LEN field but this will be inconvenient when we start generating the defines and would cause problems in the event that the architecture goes a different direction with these fields. Let's instead change the defines to reflect the currently defined architecture, we can update in future as needed.
No change in behaviour should be seen in any system, even emulated systems using the maximum allowed vector length for the current architecture.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220510161208.631259-2-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/sysreg.h | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 4691bc456bd5..e49cacb13034 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -1043,26 +1043,16 @@ #define DCZID_DZP_SHIFT 4 #define DCZID_BS_SHIFT 0
-/* - * The ZCR_ELx_LEN_* definitions intentionally include bits [8:4] which - * are reserved by the SVE architecture for future expansion of the LEN - * field, with compatible semantics. - */ #define ZCR_ELx_LEN_SHIFT 0 -#define ZCR_ELx_LEN_SIZE 9 -#define ZCR_ELx_LEN_MASK 0x1ff +#define ZCR_ELx_LEN_SIZE 4 +#define ZCR_ELx_LEN_MASK 0xf
#define SMCR_ELx_FA64_SHIFT 31 #define SMCR_ELx_FA64_MASK (1 << SMCR_ELx_FA64_SHIFT)
-/* - * The SMCR_ELx_LEN_* definitions intentionally include bits [8:4] which - * are reserved by the SME architecture for future expansion of the LEN - * field, with compatible semantics. - */ #define SMCR_ELx_LEN_SHIFT 0 -#define SMCR_ELx_LEN_SIZE 9 -#define SMCR_ELx_LEN_MASK 0x1ff +#define SMCR_ELx_LEN_SIZE 4 +#define SMCR_ELx_LEN_MASK 0xf
#define CPACR_EL1_SMEN_EL1EN (BIT(24)) /* enable EL1 access */ #define CPACR_EL1_SMEN_EL0EN (BIT(25)) /* enable EL0 access, if EL1EN set */
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit 5b06dcfd9e0a5dd63ecadf9169ee92a80b063322 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The SVE and SVE length configuration field LEN have constants specifying their width called _SIZE rather than the more normal _WIDTH, in preparation for automatic generation rename to _WIDTH. No functional change.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220510161208.631259-3-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/sysreg.h | 4 ++-- arch/arm64/kernel/cpufeature.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index e49cacb13034..88804a17cf34 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -1044,14 +1044,14 @@ #define DCZID_BS_SHIFT 0
#define ZCR_ELx_LEN_SHIFT 0 -#define ZCR_ELx_LEN_SIZE 4 +#define ZCR_ELx_LEN_WIDTH 4 #define ZCR_ELx_LEN_MASK 0xf
#define SMCR_ELx_FA64_SHIFT 31 #define SMCR_ELx_FA64_MASK (1 << SMCR_ELx_FA64_SHIFT)
#define SMCR_ELx_LEN_SHIFT 0 -#define SMCR_ELx_LEN_SIZE 4 +#define SMCR_ELx_LEN_WIDTH 4 #define SMCR_ELx_LEN_MASK 0xf
#define CPACR_EL1_SMEN_EL1EN (BIT(24)) /* enable EL1 access */ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5fad051bacb3..b1f8a2b93b3d 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -541,13 +541,13 @@ static const struct arm64_ftr_bits ftr_id_dfr1[] = {
static const struct arm64_ftr_bits ftr_zcr[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, - ZCR_ELx_LEN_SHIFT, ZCR_ELx_LEN_SIZE, 0), /* LEN */ + ZCR_ELx_LEN_SHIFT, ZCR_ELx_LEN_WIDTH, 0), /* LEN */ ARM64_FTR_END, };
static const struct arm64_ftr_bits ftr_smcr[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, - SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_SIZE, 0), /* LEN */ + SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_WIDTH, 0), /* LEN */ ARM64_FTR_END, };
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit a6dab6cc0f4cd0b341f002ce7d0683701612f527 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
We currently have a non-standard SYS_ prefix in the constants generated for SMIDR_EL1 bitfields. Drop this in preparation for automatic register definition generation, no functional change.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220510161208.631259-4-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/sysreg.h | 6 +++--- arch/arm64/kernel/head.S | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 88804a17cf34..abb7c95275c0 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -400,9 +400,9 @@ #define SYS_SMIDR_EL1 sys_reg(3, 1, 0, 0, 6) #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
-#define SYS_SMIDR_EL1_IMPLEMENTER_SHIFT 24 -#define SYS_SMIDR_EL1_SMPS_SHIFT 15 -#define SYS_SMIDR_EL1_AFFINITY_SHIFT 0 +#define SMIDR_EL1_IMPLEMENTER_SHIFT 24 +#define SMIDR_EL1_SMPS_SHIFT 15 +#define SMIDR_EL1_AFFINITY_SHIFT 0
#define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index fdb7be3307b7..681c2f1f1288 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -681,7 +681,7 @@ set_hcr: msr_s SYS_SMCR_EL2, x1 // length for EL1.
mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported? - ubfx x1, x1, #SYS_SMIDR_EL1_SMPS_SHIFT, #1 + ubfx x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1 cbz x1, .Lskip_sme_@
msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit e65fc01bf271cefa6269b7ab1badcf7cddae5d40 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The bitfield definitions for SVCR have a SYS_ added to the names of the constant which will be a problem for automatic generation. Remove the prefixes, no functional change.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220510161208.631259-5-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 4 ++-- arch/arm64/include/asm/processor.h | 2 +- arch/arm64/include/asm/sysreg.h | 4 ++-- arch/arm64/kernel/fpsimd.c | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 153d10747128..d72c8980936b 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -68,12 +68,12 @@ extern void fpsimd_save_and_flush_cpu_state(void);
static inline bool thread_sm_enabled(struct thread_struct *thread) { - return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK); + return system_supports_sme() && (thread->svcr & SVCR_EL0_SM_MASK); }
static inline bool thread_za_enabled(struct thread_struct *thread) { - return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_ZA_MASK); + return system_supports_sme() && (thread->svcr & SVCR_EL0_ZA_MASK); }
/* Maximum VL that SVE/SME VL-agnostic software can transparently support */ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 8b2bb5ce3256..a9f72b72276c 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -192,7 +192,7 @@ static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
static inline unsigned int thread_get_cur_vl(struct thread_struct *thread) { - if (system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK)) + if (system_supports_sme() && (thread->svcr & SVCR_EL0_SM_MASK)) return thread_get_sme_vl(thread); else return thread_get_sve_vl(thread); diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index abb7c95275c0..026fb7e37a0d 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -413,8 +413,8 @@ #define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1)
#define SYS_SVCR_EL0 sys_reg(3, 3, 4, 2, 2) -#define SYS_SVCR_EL0_ZA_MASK 2 -#define SYS_SVCR_EL0_SM_MASK 1 +#define SVCR_EL0_ZA_MASK 2 +#define SVCR_EL0_SM_MASK 1
#define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0) #define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1e1de099777c..e61a38a1fb6a 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1878,7 +1878,7 @@ void __efi_fpsimd_begin(void) svcr = read_sysreg_s(SYS_SVCR_EL0);
if (!system_supports_fa64()) - ffr = svcr & SYS_SVCR_EL0_SM_MASK; + ffr = svcr & SVCR_EL0_SM_MASK;
__this_cpu_write(efi_sm_state, ffr); } @@ -1889,7 +1889,7 @@ void __efi_fpsimd_begin(void)
if (system_supports_sme()) sysreg_clear_set_s(SYS_SVCR_EL0, - SYS_SVCR_EL0_SM_MASK, 0); + SVCR_EL0_SM_MASK, 0);
} else { fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state)); @@ -1924,7 +1924,7 @@ void __efi_fpsimd_end(void) if (__this_cpu_read(efi_sm_state)) { sysreg_clear_set_s(SYS_SVCR_EL0, 0, - SYS_SVCR_EL0_SM_MASK); + SVCR_EL0_SM_MASK); if (!system_supports_fa64()) ffr = efi_sm_state; }
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc1 commit ec0067a63e5a37de74025d46095cfe7a7af3114a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The defines for SVCR call it SVCR_EL0 however the architecture calls the register SVCR with no _EL0 suffix. In preparation for generating the sysreg definitions rename to match the architecture, no functional change.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220510161208.631259-6-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Conflicts: Leave kvm's modification Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/fpsimd.h | 4 ++-- arch/arm64/include/asm/processor.h | 2 +- arch/arm64/include/asm/sysreg.h | 6 +++--- arch/arm64/kernel/fpsimd.c | 26 +++++++++++++------------- arch/arm64/kernel/ptrace.c | 8 ++++---- arch/arm64/kernel/signal.c | 14 +++++++------- arch/arm64/kernel/syscall.c | 4 ++-- arch/arm64/kvm/fpsimd.c | 4 ++-- arch/arm64/kvm/sys_regs.c | 2 +- 9 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index d72c8980936b..a2b340e9a9d2 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -68,12 +68,12 @@ extern void fpsimd_save_and_flush_cpu_state(void);
static inline bool thread_sm_enabled(struct thread_struct *thread) { - return system_supports_sme() && (thread->svcr & SVCR_EL0_SM_MASK); + return system_supports_sme() && (thread->svcr & SVCR_SM_MASK); }
static inline bool thread_za_enabled(struct thread_struct *thread) { - return system_supports_sme() && (thread->svcr & SVCR_EL0_ZA_MASK); + return system_supports_sme() && (thread->svcr & SVCR_ZA_MASK); }
/* Maximum VL that SVE/SME VL-agnostic software can transparently support */ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index a9f72b72276c..a29559483f26 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -192,7 +192,7 @@ static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
static inline unsigned int thread_get_cur_vl(struct thread_struct *thread) { - if (system_supports_sme() && (thread->svcr & SVCR_EL0_SM_MASK)) + if (system_supports_sme() && (thread->svcr & SVCR_SM_MASK)) return thread_get_sme_vl(thread); else return thread_get_sve_vl(thread); diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 026fb7e37a0d..f3ec3820f1fc 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -412,9 +412,9 @@ #define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0) #define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1)
-#define SYS_SVCR_EL0 sys_reg(3, 3, 4, 2, 2) -#define SVCR_EL0_ZA_MASK 2 -#define SVCR_EL0_SM_MASK 1 +#define SYS_SVCR sys_reg(3, 3, 4, 2, 2) +#define SVCR_ZA_MASK 2 +#define SVCR_SM_MASK 1
#define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0) #define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index e61a38a1fb6a..649b6653a570 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -392,7 +392,7 @@ static void task_fpsimd_load(void) if (test_thread_flag(TIF_SME)) sme_set_vq(sve_vq_from_vl(sme_vl) - 1);
- write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0); + write_sysreg_s(current->thread.svcr, SYS_SVCR);
if (thread_za_enabled(¤t->thread)) za_load_state(current->thread.za_state); @@ -438,15 +438,15 @@ static void fpsimd_save(void)
if (system_supports_sme()) { u64 *svcr = last->svcr; - *svcr = read_sysreg_s(SYS_SVCR_EL0); + *svcr = read_sysreg_s(SYS_SVCR);
- *svcr = read_sysreg_s(SYS_SVCR_EL0); + *svcr = read_sysreg_s(SYS_SVCR);
- if (*svcr & SYS_SVCR_EL0_ZA_MASK) + if (*svcr & SVCR_ZA_MASK) za_save_state(last->za_state);
/* If we are in streaming mode override regular SVE. */ - if (*svcr & SYS_SVCR_EL0_SM_MASK) { + if (*svcr & SVCR_SM_MASK) { save_sve_regs = true; save_ffr = system_supports_fa64(); vl = last->sme_vl; @@ -828,8 +828,8 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, sve_to_fpsimd(task);
if (system_supports_sme() && type == ARM64_VEC_SME) { - task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK | - SYS_SVCR_EL0_ZA_MASK); + task->thread.svcr &= ~(SVCR_SM_MASK | + SVCR_ZA_MASK); clear_thread_flag(TIF_SME); }
@@ -1875,10 +1875,10 @@ void __efi_fpsimd_begin(void) __this_cpu_write(efi_sve_state_used, true);
if (system_supports_sme()) { - svcr = read_sysreg_s(SYS_SVCR_EL0); + svcr = read_sysreg_s(SYS_SVCR);
if (!system_supports_fa64()) - ffr = svcr & SVCR_EL0_SM_MASK; + ffr = svcr & SVCR_SM_MASK;
__this_cpu_write(efi_sm_state, ffr); } @@ -1888,8 +1888,8 @@ void __efi_fpsimd_begin(void) ffr);
if (system_supports_sme()) - sysreg_clear_set_s(SYS_SVCR_EL0, - SVCR_EL0_SM_MASK, 0); + sysreg_clear_set_s(SYS_SVCR, + SVCR_SM_MASK, 0);
} else { fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state)); @@ -1922,9 +1922,9 @@ void __efi_fpsimd_end(void) */ if (system_supports_sme()) { if (__this_cpu_read(efi_sm_state)) { - sysreg_clear_set_s(SYS_SVCR_EL0, + sysreg_clear_set_s(SYS_SVCR, 0, - SVCR_EL0_SM_MASK); + SVCR_SM_MASK); if (!system_supports_fa64()) ffr = efi_sm_state; } diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 46a806a88df9..3ee76c69e7f6 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -869,10 +869,10 @@ static int sve_set_common(struct task_struct *target,
switch (type) { case ARM64_VEC_SVE: - target->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK; + target->thread.svcr &= ~SVCR_SM_MASK; break; case ARM64_VEC_SME: - target->thread.svcr |= SYS_SVCR_EL0_SM_MASK; + target->thread.svcr |= SVCR_SM_MASK; break; default: WARN_ON_ONCE(1); @@ -1102,7 +1102,7 @@ static int za_set(struct task_struct *target,
/* If there is no data then disable ZA */ if (!count) { - target->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + target->thread.svcr &= ~SVCR_ZA_MASK; goto out; }
@@ -1127,7 +1127,7 @@ static int za_set(struct task_struct *target,
/* Mark ZA as active and let userspace use it */ set_tsk_thread_flag(target, TIF_SME); - target->thread.svcr |= SYS_SVCR_EL0_ZA_MASK; + target->thread.svcr |= SVCR_ZA_MASK;
out: fpsimd_flush_task_state(target); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 2bec57cafbec..3307896f19c1 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -240,7 +240,7 @@ int restore_sve_fpsimd_context(struct user_ctxs *user)
if (sve.head.size <= sizeof(*user->sve)) { clear_thread_flag(TIF_SVE); - current->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK; + current->thread.svcr &= ~SVCR_SM_MASK; goto fpsimd_only; }
@@ -273,7 +273,7 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) return -EFAULT;
if (sve.flags & SVE_SIG_FLAG_SM) - current->thread.svcr |= SYS_SVCR_EL0_SM_MASK; + current->thread.svcr |= SVCR_SM_MASK; else set_thread_flag(TIF_SVE);
@@ -344,7 +344,7 @@ int restore_za_context(struct user_ctxs __user *user) return -EINVAL;
if (za.head.size <= sizeof(*user->za)) { - current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + current->thread.svcr &= ~SVCR_ZA_MASK; return 0; }
@@ -365,7 +365,7 @@ int restore_za_context(struct user_ctxs __user *user)
sme_alloc(current); if (!current->thread.za_state) { - current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK; + current->thread.svcr &= ~SVCR_ZA_MASK; clear_thread_flag(TIF_SME); return -ENOMEM; } @@ -378,7 +378,7 @@ int restore_za_context(struct user_ctxs __user *user) return -EFAULT;
set_thread_flag(TIF_SME); - current->thread.svcr |= SYS_SVCR_EL0_ZA_MASK; + current->thread.svcr |= SVCR_ZA_MASK;
return 0; } @@ -706,8 +706,8 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
/* Signal handlers are invoked with ZA and streaming mode disabled */ if (system_supports_sme()) { - current->thread.svcr &= ~(SYS_SVCR_EL0_ZA_MASK | - SYS_SVCR_EL0_SM_MASK); + current->thread.svcr &= ~(SVCR_ZA_MASK | + SVCR_SM_MASK); sme_smstop(); }
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c index 14986b67054d..44eda63c0799 100644 --- a/arch/arm64/kernel/syscall.c +++ b/arch/arm64/kernel/syscall.c @@ -187,9 +187,9 @@ static inline void fp_user_discard(void) * need updating. */ if (system_supports_sme() && test_thread_flag(TIF_SME)) { - u64 svcr = read_sysreg_s(SYS_SVCR_EL0); + u64 svcr = read_sysreg_s(SYS_SVCR);
- if (svcr & SYS_SVCR_EL0_SM_MASK) + if (svcr & SVCR_SM_MASK) sme_smstop_sm(); }
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index b3c9b06220f9..28f7372a5230 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -99,8 +99,8 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN) vcpu->arch.flags |= KVM_ARM64_HOST_SME_ENABLED;
- if (read_sysreg_s(SYS_SVCR_EL0) & - (SYS_SVCR_EL0_SM_MASK | SYS_SVCR_EL0_ZA_MASK)) { + if (read_sysreg_s(SYS_SVCR) & + (SVCR_SM_MASK | SVCR_ZA_MASK)) { vcpu->arch.flags &= ~KVM_ARM64_FP_HOST; fpsimd_save_and_flush_cpu_state(); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 33692d669065..1be46b970647 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1645,7 +1645,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_SMIDR_EL1), undef_access }, { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 }, { SYS_DESC(SYS_CTR_EL0), access_ctr }, - { SYS_DESC(SYS_SVCR_EL0), undef_access }, + { SYS_DESC(SYS_SVCR), undef_access },
{ SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, PMCR_EL0 }, { SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, PMCNTENSET_EL0 },
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc2 commit a3d52ac7750025b5a1f99eb1ccea0e31b58bf7bb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
For both ID_AA64SMFR0_EL1.I16I64 and ID_AA64SMFR0_EL1.I8I32 we check for the presence of the feature by looking for a specific ID value of 0x4 but should instead be checking for the value 0xf defined by the architecture.
This had no practical effect since we are looking for values >= our define and the only valid values in the architecture are 0b0000 and 0b1111 so we would detect things appropriately with the architecture as it stands even with the incorrect defines.
Signed-off-by: Mark Brown broonie@kernel.org Fixes: b4adc83b0770 ("arm64/sme: System register and exception syndrome definitions") Link: https://lore.kernel.org/r/20220607165128.2833157-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/include/asm/sysreg.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index f3ec3820f1fc..b75a7697eee6 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -831,9 +831,9 @@ #define ID_AA64SMFR0_F32F32_SHIFT 32
#define ID_AA64SMFR0_FA64 0x1 -#define ID_AA64SMFR0_I16I64 0x4 +#define ID_AA64SMFR0_I16I64 0xf #define ID_AA64SMFR0_F64F64 0x1 -#define ID_AA64SMFR0_I8I32 0x4 +#define ID_AA64SMFR0_I8I32 0xf #define ID_AA64SMFR0_F16F32 0x1 #define ID_AA64SMFR0_B16F32 0x1 #define ID_AA64SMFR0_F32F32 0x1
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc2 commit f539316fe8106b4f4b4e95c1e70a31b545523b03 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Fix a cut'n'paste error.
Reported-by: Luis Machado luis.machado@arm.com Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220608115915.251870-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- Documentation/arm64/sme.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst index 8ba677b87e90..937147f58cc5 100644 --- a/Documentation/arm64/sme.rst +++ b/Documentation/arm64/sme.rst @@ -371,7 +371,7 @@ The regset data starts with struct user_za_header, containing: Appendix A. SME programmer's model (informative) =================================================
-This section provides a minimal description of the additions made by SVE to the +This section provides a minimal description of the additions made by SME to the ARMv8-A programmer's model that are relevant to this document.
Note: This section is for information only and not intended to be complete or
From: Mark Brown broonie@kernel.org
mainline inclusion from mainline-v5.19-rc2 commit 2e990e63220bb01e2755b55b93878ce7c8cbe747 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ITJT CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
The EFI save/restore code is confused. When saving the check for saving FFR is inverted due to confusion with the streaming mode check, and when restoring we check if we need to restore FFR by checking the percpu efi_sm_state without the required wrapper rather than based on the combination of FA64 support and streaming mode.
Fixes: e0838f6373e5 ("arm64/sme: Save and restore streaming mode over EFI runtime calls") Reported-by: kernel test robot lkp@intel.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20220602124132.3528951-1-broonie@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com --- arch/arm64/kernel/fpsimd.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 649b6653a570..18b760388540 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1877,10 +1877,15 @@ void __efi_fpsimd_begin(void) if (system_supports_sme()) { svcr = read_sysreg_s(SYS_SVCR);
- if (!system_supports_fa64()) - ffr = svcr & SVCR_SM_MASK; + __this_cpu_write(efi_sm_state, + svcr & SVCR_SM_MASK);
- __this_cpu_write(efi_sm_state, ffr); + /* + * Unless we have FA64 FFR does not + * exist in streaming mode. + */ + if (!system_supports_fa64()) + ffr = !(svcr & SVCR_SM_MASK); }
sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()), @@ -1925,8 +1930,13 @@ void __efi_fpsimd_end(void) sysreg_clear_set_s(SYS_SVCR, 0, SVCR_SM_MASK); + + /* + * Unless we have FA64 FFR does not + * exist in streaming mode. + */ if (!system_supports_fa64()) - ffr = efi_sm_state; + ffr = false; } }