Applied, thanks!
On 2021/12/13 11:34, shenzijun wrote:
From: 沈子俊 shenzijun@kylinos.cn
This patchset adds GCM/CCM mode tests for SM4. The GCM/CCM modes of SM4 are defined in RFC 8998: https://datatracker.ietf.org/doc/html/rfc8998
This patchset extracts the common SM4 algorithm into a separate library and adjusts the arm64 accelerated implementation of SM4 to use this library. It then introduces accelerated implementations based on AESNI/AVX and AESNI/AVX2 for x86_64. The AESNI/AVX2 implementation reuses functions from the AESNI/AVX implementation.
The optimization covers four SM4 modes: ECB, CBC, CFB, and CTR. Since CBC and CFB encryption cannot process multiple blocks in parallel, the optimization has little effect on them. All selftests pass.
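The CBC limitation above can be illustrated with a minimal sketch. It assumes a toy byte-wise block function (toy_encrypt_block / toy_decrypt_block, hypothetical names) standing in for SM4; it is not SM4 and is not secure. The point is only the data dependency: each CBC encryption step needs the previous ciphertext block, while each decryption step needs only ciphertext that is already available, so decrypted blocks can be computed independently.

```python
# Toy stand-in for the SM4 block function (assumption: NOT SM4, NOT secure).
BLOCK = 16

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    return bytes(((b ^ k) + 1) & 0xFF for b, k in zip(block, key))

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    return bytes(((b - 1) & 0xFF) ^ k for b, k in zip(block, key))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, plain_blocks):
    # Serial: block i cannot start until ciphertext block i-1 exists.
    out, prev = [], iv
    for p in plain_blocks:
        prev = toy_encrypt_block(key, xor(p, prev))
        out.append(prev)
    return out

def cbc_decrypt_one(key, iv, cipher_blocks, i):
    # Independent: only ciphertext blocks i and i-1 are needed.
    prev = iv if i == 0 else cipher_blocks[i - 1]
    return xor(toy_decrypt_block(key, cipher_blocks[i]), prev)

key = bytes(range(BLOCK))
iv = bytes([0xA5] * BLOCK)
plain = [bytes([i] * BLOCK) for i in range(8)]
cipher = cbc_encrypt(key, iv, plain)

# Decrypt in reverse order to show that no block waits on another result.
recovered = [None] * len(cipher)
for i in reversed(range(len(cipher))):
    recovered[i] = cbc_decrypt_one(key, iv, cipher, i)

print(recovered == plain)
```

The same dependency pattern holds for CFB, which is consistent with the table: only the decryption direction of CBC and CFB speeds up under AVX/AVX2.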
The main algorithm implementation comes from SM4 AES-NI work by libgcrypt and Markku-Juhani O. Saarinen at: https://github.com/mjosaarinen/sm4ni
Finally, add the configuration options for accelerated SM4, and include some bug fixes for the AESNI/AVX2 accelerated implementation.
Benchmark on Intel i5-6200U 2.30GHz, comparing three implementations: pure software sm4-generic, aesni/avx acceleration, and aesni/avx2 acceleration. The data comes from modes 218 and 518 of tcrypt. The columns are block sizes, and the unit is Mb/s:

block-size  |     16      64     128     256    1024    1420    4096
sm4-generic
    ECB enc |  60.94   70.41   72.27   73.02   73.87   73.58   73.59
    ECB dec |  61.87   70.53   72.15   73.09   73.89   73.92   73.86
    CBC enc |  56.71   66.31   68.05   69.84   70.02   70.12   70.24
    CBC dec |  54.54   65.91   68.22   69.51   70.63   70.79   70.82
    CFB enc |  57.21   67.24   69.10   70.25   70.73   70.52   71.42
    CFB dec |  57.22   64.74   66.31   67.24   67.40   67.64   67.58
    CTR enc |  59.47   68.64   69.91   71.02   71.86   71.61   71.95
    CTR dec |  59.94   68.77   69.95   71.00   71.84   71.55   71.95
sm4-aesni-avx
    ECB enc |  44.95  177.35  292.06  316.98  339.48  322.27  330.59
    ECB dec |  45.28  178.66  292.31  317.52  339.59  322.52  331.16
    CBC enc |  57.75   67.68   69.72   70.60   71.48   71.63   71.74
    CBC dec |  44.32  176.83  284.32  307.24  328.61  312.61  325.82
    CFB enc |  57.81   67.64   69.63   70.55   71.40   71.35   71.70
    CFB dec |  43.14  167.78  282.03  307.20  328.35  318.24  325.95
    CTR enc |  42.35  163.32  279.11  302.93  320.86  310.56  317.93
    CTR dec |  42.39  162.81  278.49  302.37  321.11  310.33  318.37
sm4-aesni-avx2
    ECB enc |  45.19  177.41  292.42  316.12  339.90  322.53  330.54
    ECB dec |  44.83  178.90  291.45  317.31  339.85  322.55  331.07
    CBC enc |  57.66   67.62   69.73   70.55   71.58   71.66   71.77
    CBC dec |  44.34  176.86  286.10  501.68  559.58  483.87  527.46
    CFB enc |  57.43   67.60   69.61   70.52   71.43   71.28   71.65
    CFB dec |  43.12  167.75  268.09  499.33  558.35  490.36  524.73
    CTR enc |  42.42  163.39  256.17  493.95  552.45  481.58  517.19
    CTR dec |  42.49  163.11  256.36  493.34  552.62  481.49  516.83

The benchmark data shows that at a block size of 1024, in the modes that benefit from wider parallelism (CBC dec, CFB dec, and CTR), AVX2 is about 70% faster than AVX acceleration and about 7.7 times as fast as the pure software sm4-generic implementation.
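The two ratios can be checked directly from the 1024 column of the benchmark table. The sketch below copies those figures; the helper names gain_percent and ratio are illustrative only.

```python
# Throughput at block size 1024, copied from the benchmark table (Mb/s).
generic = {"CBC dec": 70.63, "CFB dec": 67.40, "CTR enc": 71.86}
avx = {"CBC dec": 328.61, "CFB dec": 328.35, "CTR enc": 320.86}
avx2 = {"CBC dec": 559.58, "CFB dec": 558.35, "CTR enc": 552.45}

def gain_percent(new: float, old: float) -> float:
    """Relative throughput increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

def ratio(new: float, old: float) -> float:
    """How many times faster `new` is than `old`."""
    return new / old

for mode in avx2:
    print(f"{mode}: AVX2 vs AVX +{gain_percent(avx2[mode], avx[mode]):.1f}%, "
          f"AVX2 vs generic {ratio(avx2[mode], generic[mode]):.2f}x")
```

The "about 70%" figure matches CBC dec and CFB dec, and the "7.7 times" figure matches CTR; the other accelerated modes come out slightly higher against sm4-generic.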
沈子俊 (13):
  crypto: tcrypt - Fix missing return value check
  crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm
  crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm
  crypto: sm4 - create SM4 library based on sm4 generic code
  crypto: arm64/sm4-ce - Make dependent on sm4 library instead of sm4-generic
  crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
  crypto: tcrypt - add the asynchronous speed test for SM4
  crypto: x86/sm4 - export reusable AESNI/AVX functions
  crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation
  Add the configuration for accelerated of SM4
  crypto: x86/sm4 - Fix frame pointer stack corruption
  crypto: sm4 - Do not change section of ck and sbox
  crypto: x86/sm4 - Fix invalid section entry size

 arch/arm64/crypto/Kconfig               |   2 +-
 arch/arm64/crypto/sm4-ce-glue.c         |  20 +-
 arch/x86/configs/openeuler_defconfig    |   2 +
 arch/x86/crypto/Makefile                |   6 +
 arch/x86/crypto/sm4-aesni-avx-asm_64.S  | 594 ++++++++++++++++++++++++
 arch/x86/crypto/sm4-aesni-avx2-asm_64.S | 501 ++++++++++++++++++++
 arch/x86/crypto/sm4-avx.h               |  24 +
 arch/x86/crypto/sm4_aesni_avx2_glue.c   | 169 +++++++
 arch/x86/crypto/sm4_aesni_avx_glue.c    | 487 +++++++++++++++++++
 crypto/Kconfig                          |  44 ++
 crypto/sm4_generic.c                    | 180 +------
 crypto/tcrypt.c                         |  99 +++-
 crypto/testmgr.c                        |  29 ++
 crypto/testmgr.h                        | 148 ++++++
 include/crypto/sm4.h                    |  25 +-
 lib/crypto/Kconfig                      |   3 +
 lib/crypto/Makefile                     |   3 +
 lib/crypto/sm4.c                        | 176 +++++++
 18 files changed, 2325 insertions(+), 187 deletions(-)
 create mode 100644 arch/x86/crypto/sm4-aesni-avx-asm_64.S
 create mode 100644 arch/x86/crypto/sm4-aesni-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/sm4-avx.h
 create mode 100644 arch/x86/crypto/sm4_aesni_avx2_glue.c
 create mode 100644 arch/x86/crypto/sm4_aesni_avx_glue.c
 create mode 100644 lib/crypto/sm4.c