New subject: [PATCH OLK-5.10 01/13] crypto: tcrypt - Fix missing return value check

13 Dec 2021

      From: 沈子俊 <shenzijun@kylinos.cn>

This patchset adds tests for the GCM and CCM modes of SM4. These
modes are specified for SM4 in RFC 8998:
https://datatracker.ietf.org/doc/html/rfc8998

This patchset extracts the common SM4 algorithm code into a separate
library and adjusts the arm64 accelerated implementation of SM4 to
use that library. It then introduces AESNI/AVX and AESNI/AVX2
accelerated implementations for x86_64. The AESNI/AVX2 implementation
reuses functions from the AESNI/AVX implementation.

The optimization covers four SM4 modes: ECB, CBC, CFB, and CTR.
CBC and CFB encryption cannot process multiple blocks in parallel,
so they gain little from the optimization; decryption in these modes
can be parallelized and is accelerated. All selftests pass.
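To illustrate why CBC encryption cannot be parallelized while CBC
decryption can, here is a minimal user-space sketch. It is not code
from this patchset: toy_encrypt() and toy_decrypt() are hypothetical
stand-ins for a 16-byte block cipher (they are not SM4), and the
function names are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define BLK 16 /* block size in bytes, the same as SM4 */

/* Hypothetical toy cipher (NOT SM4): XOR with the key, then rotate bytes. */
static void toy_encrypt(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
	uint8_t tmp[BLK];
	size_t i;

	for (i = 0; i < BLK; i++)
		tmp[i] = in[i] ^ key[i];
	for (i = 0; i < BLK; i++)
		out[i] = tmp[(i + 1) % BLK];
}

/* Inverse of toy_encrypt(): undo the rotation, then XOR with the key. */
static void toy_decrypt(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
	uint8_t tmp[BLK];
	size_t i;

	for (i = 0; i < BLK; i++)
		tmp[(i + 1) % BLK] = in[i];
	for (i = 0; i < BLK; i++)
		out[i] = tmp[i] ^ key[i];
}

/*
 * CBC encryption: block n needs the ciphertext of block n - 1, which
 * only exists after the previous iteration, so the loop is serial.
 * src and dst must not overlap.
 */
static void cbc_encrypt(const uint8_t *key, const uint8_t *iv,
			const uint8_t *src, uint8_t *dst, size_t nblocks)
{
	const uint8_t *prev = iv;
	size_t n, i;

	for (n = 0; n < nblocks; n++) {
		uint8_t x[BLK];

		for (i = 0; i < BLK; i++)
			x[i] = src[n * BLK + i] ^ prev[i];
		toy_encrypt(key, x, dst + n * BLK);
		prev = dst + n * BLK;
	}
}

/*
 * CBC decryption: block n needs only ciphertext blocks n and n - 1,
 * which are all available up front. The iterations are independent;
 * walking them in reverse order here demonstrates that.
 * src and dst must not overlap.
 */
static void cbc_decrypt(const uint8_t *key, const uint8_t *iv,
			const uint8_t *src, uint8_t *dst, size_t nblocks)
{
	size_t n, i;

	for (n = nblocks; n-- > 0; ) {
		const uint8_t *prev = n ? src + (n - 1) * BLK : iv;
		uint8_t x[BLK];

		toy_decrypt(key, src + n * BLK, x);
		for (i = 0; i < BLK; i++)
			dst[n * BLK + i] = x[i] ^ prev[i];
	}
}
```

Because the decryption iterations do not depend on each other, a SIMD
implementation can process several blocks per call, which is
consistent with the CBC/CFB decryption numbers in the table below.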

The main algorithm implementation comes from SM4 AES-NI work by
libgcrypt and Markku-Juhani O. Saarinen at:
https://github.com/mjosaarinen/sm4ni

Finally, the patchset adds the configuration for the accelerated SM4
implementations and includes some bug fixes for the AESNI/AVX2
accelerated implementation.

Benchmark on an Intel i5-6200U at 2.30GHz, comparing three
implementations: the pure software sm4-generic, the aesni/avx
acceleration, and the aesni/avx2 acceleration. The data comes from
mode 218 and mode 518 of tcrypt. The columns are block sizes in bytes
and the values are throughput in Mb/s:

block-size  |    16      64     128     256    1024    1420    4096
sm4-generic
    ECB enc | 60.94   70.41   72.27   73.02   73.87   73.58   73.59
    ECB dec | 61.87   70.53   72.15   73.09   73.89   73.92   73.86
    CBC enc | 56.71   66.31   68.05   69.84   70.02   70.12   70.24
    CBC dec | 54.54   65.91   68.22   69.51   70.63   70.79   70.82
    CFB enc | 57.21   67.24   69.10   70.25   70.73   70.52   71.42
    CFB dec | 57.22   64.74   66.31   67.24   67.40   67.64   67.58
    CTR enc | 59.47   68.64   69.91   71.02   71.86   71.61   71.95
    CTR dec | 59.94   68.77   69.95   71.00   71.84   71.55   71.95
sm4-aesni-avx
    ECB enc | 44.95  177.35  292.06  316.98  339.48  322.27  330.59
    ECB dec | 45.28  178.66  292.31  317.52  339.59  322.52  331.16
    CBC enc | 57.75   67.68   69.72   70.60   71.48   71.63   71.74
    CBC dec | 44.32  176.83  284.32  307.24  328.61  312.61  325.82
    CFB enc | 57.81   67.64   69.63   70.55   71.40   71.35   71.70
    CFB dec | 43.14  167.78  282.03  307.20  328.35  318.24  325.95
    CTR enc | 42.35  163.32  279.11  302.93  320.86  310.56  317.93
    CTR dec | 42.39  162.81  278.49  302.37  321.11  310.33  318.37
sm4-aesni-avx2
    ECB enc | 45.19  177.41  292.42  316.12  339.90  322.53  330.54
    ECB dec | 44.83  178.90  291.45  317.31  339.85  322.55  331.07
    CBC enc | 57.66   67.62   69.73   70.55   71.58   71.66   71.77
    CBC dec | 44.34  176.86  286.10  501.68  559.58  483.87  527.46
    CFB enc | 57.43   67.60   69.61   70.52   71.43   71.28   71.65
    CFB dec | 43.12  167.75  268.09  499.33  558.35  490.36  524.73
    CTR enc | 42.42  163.39  256.17  493.95  552.45  481.58  517.19
    CTR dec | 42.49  163.11  256.36  493.34  552.62  481.49  516.83

The benchmark data shows that, at a block size of 1024 bytes, AVX2
acceleration is about 70% faster than AVX acceleration for CBC
decryption, CFB decryption, and CTR, and about 7.7 times as fast as
the pure software sm4-generic implementation.
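The ratios quoted above can be rechecked from the table with a small
helper. This is only an illustrative sketch; speedup() is a
hypothetical name, and the inputs are the 1024-byte column values
from the table.

```c
/* Ratio of two throughput figures given in the same unit (here Mb/s). */
static double speedup(double accelerated, double baseline)
{
	return accelerated / baseline;
}

/*
 * Values from the 1024-byte column:
 *   speedup(559.58, 328.61)  CBC dec, avx2 vs avx:     roughly 1.70
 *   speedup(552.45, 71.86)   CTR enc, avx2 vs generic: roughly 7.7
 */
```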

沈子俊 (13):
  crypto: tcrypt - Fix missing return value check
  crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm
  crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm
  crypto: sm4 - create SM4 library based on sm4 generic code
  crypto: arm64/sm4-ce - Make dependent on sm4 library instead of
    sm4-generic
  crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
  crypto: tcrypt - add the asynchronous speed test for SM4
  crypto: x86/sm4 - export reusable AESNI/AVX functions
  crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation
  Add the configuration for accelerated of SM4
  crypto: x86/sm4 - Fix frame pointer stack corruption
  crypto: sm4 - Do not change section of ck and sbox
  crypto: x86/sm4 - Fix invalid section entry size

 arch/arm64/crypto/Kconfig               |   2 +-
 arch/arm64/crypto/sm4-ce-glue.c         |  20 +-
 arch/x86/configs/openeuler_defconfig    |   2 +
 arch/x86/crypto/Makefile                |   6 +
 arch/x86/crypto/sm4-aesni-avx-asm_64.S  | 594 ++++++++++++++++++++++++
 arch/x86/crypto/sm4-aesni-avx2-asm_64.S | 501 ++++++++++++++++++++
 arch/x86/crypto/sm4-avx.h               |  24 +
 arch/x86/crypto/sm4_aesni_avx2_glue.c   | 169 +++++++
 arch/x86/crypto/sm4_aesni_avx_glue.c    | 487 +++++++++++++++++++
 crypto/Kconfig                          |  44 ++
 crypto/sm4_generic.c                    | 180 +------
 crypto/tcrypt.c                         |  99 +++-
 crypto/testmgr.c                        |  29 ++
 crypto/testmgr.h                        | 148 ++++++
 include/crypto/sm4.h                    |  25 +-
 lib/crypto/Kconfig                      |   3 +
 lib/crypto/Makefile                     |   3 +
 lib/crypto/sm4.c                        | 176 +++++++
 18 files changed, 2325 insertions(+), 187 deletions(-)
 create mode 100644 arch/x86/crypto/sm4-aesni-avx-asm_64.S
 create mode 100644 arch/x86/crypto/sm4-aesni-avx2-asm_64.S
 create mode 100644 arch/x86/crypto/sm4-avx.h
 create mode 100644 arch/x86/crypto/sm4_aesni_avx2_glue.c
 create mode 100644 arch/x86/crypto/sm4_aesni_avx_glue.c
 create mode 100644 lib/crypto/sm4.c

-- 
2.30.0

[PATCH OLK-5.10 00/13] Introduce AESNI/AVX and AESNI/AVX2 accelerated implementation for SM4 algorithm
