hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZN72
CVE: NA
---------------------------
Generally, clear_page() clears the page with 'dc zva'. However, a freshly cleared page is usually not used immediately, so the cache activity caused by the zeroing is wasted.
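To make the DC ZVA path concrete, here is a rough userspace model of the loop in clear_page.S. It is a sketch under stated assumptions: 4 KiB pages, memset() standing in for both 'dc zva' and 'stnp', and hypothetical names (model_clear_page(), dczva_block_bytes(), MODEL_PAGE_SIZE) that do not exist in the kernel. It only models the stride of each path: one 'dc zva' zeroes a block of 4 << DCZID_EL0.BS bytes, while one 'stnp xzr, xzr' stores 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u      /* assumption: 4 KiB pages */
#define DCZID_BS_MASK   0xfu       /* DCZID_EL0.BS: log2(block size in 4-byte words) */
#define DCZID_DZP       (1u << 4)  /* DCZID_EL0.DZP: DC ZVA prohibited */

/* Bytes zeroed by one DC ZVA: 4 << BS, the same value computed by
 * "and w1, w1, #0xf; mov x2, #4; lsl x1, x2, x1" in clear_page.S. */
static size_t dczva_block_bytes(uint64_t dczid)
{
	return (size_t)4 << (dczid & DCZID_BS_MASK);
}

/*
 * Hypothetical userspace model of clear_page(). memset() replaces the
 * real instructions, so only the stride of each path is modelled.
 * no_dczva models the ARM64_HAS_NO_DCZVA capability added by this patch.
 * Returns false if one DC ZVA block would not fit in a page.
 */
static bool model_clear_page(unsigned char *page, uint64_t dczid, bool no_dczva)
{
	size_t step;

	if (no_dczva || (dczid & DCZID_DZP))
		step = 16;                        /* one "stnp xzr, xzr" stores 16 bytes */
	else
		step = dczva_block_bytes(dczid);  /* one "dc zva" zeroes a whole block */

	if (step > MODEL_PAGE_SIZE)
		return false;

	for (size_t off = 0; off < MODEL_PAGE_SIZE; off += step)
		memset(page + off, 0, step);
	return true;
}
```

On a CPU reporting BS = 4, for example, the DC ZVA path clears a 4 KiB page in 64 iterations of 64 bytes each.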
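Commit f0616abd4e67 ("arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1") introduced an alternative implementation using 'stnp' (store pair, non-temporal), which also brings a performance benefit in some cases. Make it selectable through the boot command-line parameter 'mm.use_clearpage_stnp': when it is set, the new ARM64_HAS_NO_DCZVA capability is enabled and clear_page() takes the 'stnp' path even if DC ZVA is permitted.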
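The boot-time selection can be modelled in userspace as below. This is an illustrative sketch, not kernel code: model_parse_bool() is a hypothetical, simplified parser that accepts fewer spellings than the kernel's own helper, and model_boot_selects_stnp() only imitates how the value of 'mm.use_clearpage_stnp' reaches the early_param() handler. The decision itself follows is_datazero_prohibited() from the patch: the capability is set if DCZID_EL0.DZP is 1, or otherwise if the user asked for 'stnp'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DCZID_DZP (1u << 4)   /* DCZID_EL0.DZP: DC ZVA prohibited */

/* Hypothetical simplified parser: only "1"/"y"/"Y" and "0"/"n"/"N". */
static int model_parse_bool(const char *s, bool *res)
{
	if (!s)
		return -1;
	switch (s[0]) {
	case '1': case 'y': case 'Y':
		*res = true;
		return 0;
	case '0': case 'n': case 'N':
		*res = false;
		return 0;
	default:
		return -1;            /* *res is left untouched on error */
	}
}

/* Same logic as is_datazero_prohibited(): hardware DZP wins, else the flag. */
static bool model_has_no_dczva(uint64_t dczid, bool use_clearpage_stnp)
{
	if (dczid & DCZID_DZP)
		return true;
	return use_clearpage_stnp;
}

/* Hypothetical: scan a space-separated command line for the parameter. */
static bool model_boot_selects_stnp(const char *cmdline, uint64_t dczid)
{
	static const char key[] = "mm.use_clearpage_stnp=";
	bool use_stnp = false;    /* default when the parameter is absent */
	const char *p = cmdline;

	while (p && *p) {
		while (*p == ' ')
			p++;
		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			bool v;
			/* in this model, the last valid occurrence wins */
			if (model_parse_bool(p + sizeof(key) - 1, &v) == 0)
				use_stnp = v;
		}
		p = strchr(p, ' ');
	}
	return model_has_no_dczva(dczid, use_stnp);
}
```

In the patch, the resulting ARM64_HAS_NO_DCZVA capability is consumed by ALTERNATIVE(), which turns the 'nop' in clear_page() into 'b 2f' once at boot, so clear_page() no longer tests the DZP bit on every call.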
In a hugetlb clear test, the average run time dropped by about 53.7% (from about 8.28 s to about 3.84 s per run):
Set mm.use_clearpage_stnp = 0         | Set mm.use_clearpage_stnp = 1
[root@localhost liwei]# ./a.out 50 20 | [root@localhost liwei]# ./a.out 50 20
size is 50 Gib, test times is 20      | size is 50 Gib, test times is 20
test_time[0] : use 8.438046 sec       | test_time[0] : use 3.722682 sec
test_time[1] : use 8.028493 sec       | test_time[1] : use 3.640274 sec
test_time[2] : use 8.646547 sec       | test_time[2] : use 4.095052 sec
test_time[3] : use 8.122490 sec       | test_time[3] : use 3.998446 sec
test_time[4] : use 8.053038 sec       | test_time[4] : use 4.084259 sec
test_time[5] : use 8.843512 sec       | test_time[5] : use 3.933871 sec
test_time[6] : use 8.308906 sec       | test_time[6] : use 3.934334 sec
test_time[7] : use 8.093817 sec       | test_time[7] : use 3.869142 sec
test_time[8] : use 8.303504 sec       | test_time[8] : use 3.902916 sec
test_time[9] : use 8.178336 sec       | test_time[9] : use 3.541885 sec
test_time[10] : use 8.003625 sec      | test_time[10] : use 3.595554 sec
test_time[11] : use 8.163807 sec      | test_time[11] : use 3.583813 sec
test_time[12] : use 8.267464 sec      | test_time[12] : use 3.863033 sec
test_time[13] : use 8.055326 sec      | test_time[13] : use 3.770953 sec
test_time[14] : use 8.246986 sec      | test_time[14] : use 3.808006 sec
test_time[15] : use 8.546992 sec      | test_time[15] : use 3.653194 sec
test_time[16] : use 8.727256 sec      | test_time[16] : use 3.722395 sec
test_time[17] : use 8.288951 sec      | test_time[17] : use 3.683508 sec
test_time[18] : use 8.019322 sec      | test_time[18] : use 4.253087 sec
test_time[19] : use 8.250685 sec      | test_time[19] : use 4.082845 sec
hugetlb test end!                     | hugetlb test end!
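The 53.7% figure is a reduction in elapsed time rather than a throughput ratio. The short C program below recomputes it from the 20 timings in each column of the log; mean() and time_reduction() are illustrative helpers, not part of the test program that produced the log.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run timings in seconds, copied from the log above. */
static const double stnp_off[] = {
	8.438046, 8.028493, 8.646547, 8.122490, 8.053038,
	8.843512, 8.308906, 8.093817, 8.303504, 8.178336,
	8.003625, 8.163807, 8.267464, 8.055326, 8.246986,
	8.546992, 8.727256, 8.288951, 8.019322, 8.250685,
};
static const double stnp_on[] = {
	3.722682, 3.640274, 4.095052, 3.998446, 4.084259,
	3.933871, 3.934334, 3.869142, 3.902916, 3.541885,
	3.595554, 3.583813, 3.863033, 3.770953, 3.808006,
	3.653194, 3.722395, 3.683508, 4.253087, 4.082845,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative reduction in run time: 1 - mean(stnp=1) / mean(stnp=0). */
static double time_reduction(void)
{
	size_t n = sizeof(stnp_off) / sizeof(stnp_off[0]);

	return 1.0 - mean(stnp_on, n) / mean(stnp_off, n);
}
```

The averages come out at about 8.28 s and 3.84 s, so with 'stnp' the test runs roughly 2.16 times faster.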
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 arch/arm64/kernel/cpufeature.c | 24 ++++++++++++++++++++++++
 arch/arm64/lib/clear_page.S    |  3 ++-
 arch/arm64/tools/cpucaps       |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 455e72ce080a..3517dd2b0358 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2242,6 +2242,24 @@ cpucap_panic_on_conflict(const struct arm64_cpu_capabilities *cap)
 	return !!(cap->type & ARM64_CPUCAP_PANIC_ON_CONFLICT);
 }
 
+static bool use_clearpage_stnp;
+
+static int __init early_use_clearpage_stnp(char *p)
+{
+	return strtobool(p, &use_clearpage_stnp);
+}
+early_param("mm.use_clearpage_stnp", early_use_clearpage_stnp);
+
+static bool is_datazero_prohibited(const struct arm64_cpu_capabilities *entry,
+				   int scope)
+{
+	/* Active if DC ZVA is prohibited */
+	if (read_sysreg(dczid_el0) & BIT(DCZID_EL0_DZP_SHIFT))
+		return true;
+
+	return use_clearpage_stnp;
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.capability = ARM64_ALWAYS_BOOT,
@@ -2722,6 +2740,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
 	},
+	{
+		.desc = "Data Zero Prohibited",
+		.capability = ARM64_HAS_NO_DCZVA,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = is_datazero_prohibited,
+	},
 	{},
 };
 
diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index ebde40e7fa2b..3884572296d6 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -7,6 +7,7 @@
 #include <linux/const.h>
 #include <asm/assembler.h>
 #include <asm/page.h>
+#include <asm/alternative.h>
 
 /*
  * Clear page @dest
@@ -16,7 +17,7 @@
  */
 SYM_FUNC_START(__pi_clear_page)
 	mrs	x1, dczid_el0
-	tbnz	x1, #4, 2f	/* Branch if DC ZVA is prohibited */
+	ALTERNATIVE("nop", "b 2f", ARM64_HAS_NO_DCZVA)
 	and	w1, w1, #0xf
 	mov	x2, #4
 	lsl	x1, x2, x1
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 569ecec76c16..88f1f4022b8f 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -39,6 +39,7 @@ HAS_LDAPR
 HAS_LSE_ATOMICS
 HAS_MOPS
 HAS_NESTED_VIRT
+HAS_NO_DCZVA
 HAS_NO_FPSIMD
 HAS_NO_HW_PREFETCH
 HAS_PAN