[PATCH 00/11] uadk: Fixing compression-related issues and functional enhancements
This patch set primarily fixes multiple issues related to compression in the uadk driver and adds some functional enhancements.

Main fixes include:
- Resolving the compression status issue when the storage buffer is not cleared
- Moving the tail packet append function to the hisi_comp.c file
- Removing redundant print information in v1/hisi_zip_udrv
- Fixing the hashjoin key alignment size issue
- Cleaning up wd_agg code
- Addressing some code parameter issues
- Fixing large number comparison issues
- Correcting the HPRE parameter comparison method
- Supporting empty final blocks in stream mode
- Fixing long timeout issues in synchronous mode when a device failure occurs

Chenghai Huang (4):
  uadk: clear the literal length in the ctx of all LZ77 algorithms
  uadk: fix the status of compression when store buffer is not yet cleared
  uadk: move tail packet appending function to hisi_comp.c
  uadk: delete redundant print messages for v1/hisi_zip_udrv

Longfang Liu (3):
  uadk: bugfix some code parameter issues.
  uadk: bugfix big number comparison issues
  uadk: bugfix HPRE parameter comparison method

Wenkai Lin (2):
  uadk: fix for hashjoin key align size
  uadk: clean code for wd_agg

ZongYu Wu (1):
  uadk: support empty final block for raw DEFLATE in stream mode

lizhi (1):
  uadk/v1: fix long timeout of asymmetric algorithm in sync mode during device failure

 drv/hisi_comp.c            | 111 +++++++++++++++++++++++++++++--------
 drv/hisi_dae_join_gather.c |   2 +-
 drv/hisi_hpre.c            |  23 +++++---
 drv/hisi_sec.c             |   2 +-
 include/drv/wd_comp_drv.h  |   2 +
 include/wd_internal.h      |   2 +
 libwd.map                  |   1 +
 v1/drv/hisi_zip_udrv.c     |  33 ++++-------
 v1/wd_dh.c                 |  12 ++--
 v1/wd_ecc.c                |  12 ++--
 v1/wd_rsa.c                |  14 +++--
 v1/wd_util.c               |  31 +++++++++++
 v1/wd_util.h               |   2 +
 wd.c                       |  19 +++++++
 wd_aead.c                  |   4 +-
 wd_agg.c                   |  34 ++++++------
 wd_cipher.c                |   2 +-
 wd_comp.c                  |  72 +-----------------------
 wd_join_gather.c           |   2 +-
 19 files changed, 218 insertions(+), 162 deletions(-)

--
2.33.0
From: Chenghai Huang <huangchenghai2@huawei.com>

All LZ77 algorithms need to clear the literal length in the context to
avoid coupling of literal data between data segments in the stream mode.

Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 drv/hisi_comp.c        | 7 +++----
 v1/drv/hisi_zip_udrv.c | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drv/hisi_comp.c b/drv/hisi_comp.c
index a6b2947..eaa8153 100644
--- a/drv/hisi_comp.c
+++ b/drv/hisi_comp.c
@@ -961,11 +961,10 @@ static int fill_buf_lz77_zstd(handle_t h_qp, struct hisi_zip_sqe *sqe,
 			else
 				memcpy(msg->ctx_buf + CTX_REPCODE2_OFFSET,
 				       msg->ctx_buf + CTX_REPCODE1_OFFSET, REPCODE_SIZE);
-
-			/* The literal length info of each bd needs to be cleared. */
-			memset(ctx_buf + CTX_HW_REPCODE_OFFSET + REPCODE_SIZE, 0,
-			       SEQ_LIT_LEN_SIZE);
 		}
+		/* The literal length info of each bd needs to be cleared. */
+		memset(ctx_buf + CTX_HW_REPCODE_OFFSET + REPCODE_SIZE, 0,
+		       SEQ_LIT_LEN_SIZE);
 	}
 
 	fill_buf_size_lz77_zstd(sqe, in_size, lits_size, seq_avail_out);
diff --git a/v1/drv/hisi_zip_udrv.c b/v1/drv/hisi_zip_udrv.c
index 903df1c..6e2c8d4 100644
--- a/v1/drv/hisi_zip_udrv.c
+++ b/v1/drv/hisi_zip_udrv.c
@@ -705,11 +705,10 @@ static void fill_zip_sqe_hw_info_lz77_zstd(void *ssqe, struct wcrypto_comp_msg *
 			else
 				memcpy(msg->ctx_buf + CTX_REPCODE2_OFFSET,
 				       msg->ctx_buf + CTX_REPCODE1_OFFSET, REPCODE_SIZE);
-
-			/* The literal length info of each bd needs to be cleared. */
-			memset(msg->ctx_buf + CTX_HW_REPCODE_OFFSET + CTX_BUFFER_OFFSET +
-			       REPCODE_SIZE, 0, SEQ_LIT_LEN_SIZE);
 		}
+		/* The literal length info of each bd needs to be cleared. */
+		memset(msg->ctx_buf + CTX_HW_REPCODE_OFFSET + CTX_BUFFER_OFFSET +
+		       REPCODE_SIZE, 0, SEQ_LIT_LEN_SIZE);
 	}
 
 	sqe->isize = msg->isize;
--
2.33.0
From: Chenghai Huang <huangchenghai2@huawei.com>

When tail data is being compressed and the data in the store buffer has
not yet been cleared, a nospace or again status needs to be returned to
notify the user to provide more output space.

Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 drv/hisi_comp.c        | 10 +++++-----
 v1/drv/hisi_zip_udrv.c |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drv/hisi_comp.c b/drv/hisi_comp.c
index eaa8153..8fac2d1 100644
--- a/drv/hisi_comp.c
+++ b/drv/hisi_comp.c
@@ -320,7 +320,7 @@ static int check_store_buf(struct wd_comp_msg *msg)
 	} else {
 		/* Still data need to be copied */
 		buf->output_offset += copy_len;
-		req->status = WD_SUCCESS;
+		req->status = WD_EAGAIN;
 	}
 
 	return 1;
@@ -347,10 +347,10 @@ static void copy_from_hw(struct wd_comp_msg *msg, struct hisi_comp_buf *buf)
 		 * The end flag is cached. It can be output only
 		 * after the data is completely copied to the output.
 		 */
-		if (req->status == WD_STREAM_END) {
-			buf->status = WD_STREAM_END;
-			req->status = WD_EAGAIN;
-		}
+		if (req->status == WD_STREAM_END)
+			buf->status = req->status;
+
+		req->status = WD_EAGAIN;
 	}
 }
 
diff --git a/v1/drv/hisi_zip_udrv.c b/v1/drv/hisi_zip_udrv.c
index 6e2c8d4..f6c5ca9 100644
--- a/v1/drv/hisi_zip_udrv.c
+++ b/v1/drv/hisi_zip_udrv.c
@@ -210,10 +210,10 @@ static void copy_from_buf(struct wcrypto_comp_msg *msg, struct hisi_zip_buf *buf
 		 * The end flag is cached. It can be output only
 		 * after the data is completely copied to the output.
 		 */
-		if (msg->status == WCRYPTO_DECOMP_END) {
-			buf->status = WCRYPTO_DECOMP_END;
-			msg->status = WCRYPTO_DECOMP_END_NOSPACE;
-		}
+		if (msg->status == WCRYPTO_DECOMP_END)
+			buf->status = msg->status;
+
+		msg->status = WCRYPTO_DECOMP_END_NOSPACE;
 	}
 }
 
--
2.33.0
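Note on the status change in this patch: the idea can be sketched outside the driver as a small drain routine. This is a simplified model, not the driver code. The `sketch_` names, the 64-byte buffer and the status enum are all hypothetical stand-ins for `struct hisi_comp_buf`, `WD_EAGAIN` and `WD_STREAM_END`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for WD_SUCCESS, WD_EAGAIN, WD_STREAM_END. */
enum sketch_status { SK_SUCCESS = 0, SK_EAGAIN = 1, SK_STREAM_END = 2 };

/* Hypothetical, simplified stand-in for the driver's store buffer. */
struct sketch_store_buf {
	unsigned char data[64];
	size_t pending_out;        /* bytes still waiting in the buffer */
	size_t output_offset;      /* read position inside data[] */
	enum sketch_status cached; /* final status held back until drained */
};

/*
 * Copy as much buffered data as fits into dst. While data remains,
 * report SK_EAGAIN so the caller returns with more output space; the
 * cached final status is released only once the buffer is empty.
 */
static enum sketch_status sketch_drain(struct sketch_store_buf *buf,
				       unsigned char *dst, size_t avail,
				       size_t *produced)
{
	size_t n = buf->pending_out < avail ? buf->pending_out : avail;

	memcpy(dst, buf->data + buf->output_offset, n);
	buf->pending_out -= n;
	buf->output_offset += n;
	*produced = n;

	if (buf->pending_out)
		return SK_EAGAIN;

	buf->output_offset = 0;
	return buf->cached;
}
```

With 10 buffered bytes and a 4-byte output window, a caller would see two "again" results followed by the cached end status on the third call.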
From: Chenghai Huang <huangchenghai2@huawei.com> The tail packet padding function depends on the driver. Therefore, this function is moved to the driver layer for implementation, so as to remove unnecessary processes at the algorithm layer. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> --- drv/hisi_comp.c | 89 ++++++++++++++++++++++++++++++++------- include/drv/wd_comp_drv.h | 2 + wd_comp.c | 72 +------------------------------ 3 files changed, 77 insertions(+), 86 deletions(-) diff --git a/drv/hisi_comp.c b/drv/hisi_comp.c index 8fac2d1..2386cf2 100644 --- a/drv/hisi_comp.c +++ b/drv/hisi_comp.c @@ -39,6 +39,8 @@ (((x) & 0x00ff0000) >> 8) | \ (((x) & 0xff000000) >> 24)) +#define cpu_to_be32(x) swab32(x) + #define STREAM_FLUSH_SHIFT 25 #define STREAM_POS_SHIFT 2 #define STREAM_MODE_SHIFT 1 @@ -110,6 +112,9 @@ #define CTX_HEAD_BIT_CNT_MASK 0xfC00 #define WIN_LEN_ALIGN(len) (((len) + 15) & ~(__u32)0x0F) +#define STORE_BLOCK_SIZE 5 +static const unsigned char store_block[STORE_BLOCK_SIZE] = {0x1, 0x00, 0x00, 0xff, 0xff}; + enum alg_type { HW_DEFLATE = 0x1, HW_ZLIB, @@ -143,8 +148,6 @@ enum lz77_compress_status { }; struct hisi_comp_buf { - /* Denoted whether the output is copied from the storage buffer */ - bool skip_hw; /* Denoted internal store buf */ __u8 dst[STORE_BUF_SIZE]; /* Denoted data size left in uadk */ @@ -310,7 +313,7 @@ static int check_store_buf(struct wd_comp_msg *msg) copy_len = copy_to_out(msg, buf, buf->pending_out); buf->pending_out -= copy_len; msg->produced = copy_len; - buf->skip_hw = true; + msg->skip_hw = true; if (!buf->pending_out) { /* All data copied to output */ @@ -369,6 +372,58 @@ static int check_enable_store_buf(struct wd_comp_msg *msg, __u32 out_size, int h return 0; } +static unsigned int bit_reverse(register unsigned int target) +{ + register unsigned int x = target; + + x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); + x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); + x = (((x & 0xf0f0f0f0) >> 4) | 
((x & 0x0f0f0f0f) << 4)); + x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); + + return ((x >> 16) | (x << 16)); +} + +/** + * append_store_block() - output an fixed store block when input + * a empty block as last stream block. And supplement the packet + * tail according to the protocol. + * @msg: The last msg which is empty. + */ +static int append_store_block(struct wd_comp_msg *msg) +{ + struct wd_comp_req *req = &msg->req; + __u32 checksum = msg->checksum; + __u32 isize = msg->isize; + + if (msg->alg_type == WD_ZLIB) { + if (unlikely(msg->avail_out < STORE_BLOCK_SIZE + sizeof(checksum))) + return -WD_EINVAL; + memcpy(req->dst, store_block, STORE_BLOCK_SIZE); + checksum = (__u32)cpu_to_be32(checksum); + /* if zlib, ADLER32 */ + memcpy(req->dst + STORE_BLOCK_SIZE, &checksum, sizeof(checksum)); + msg->produced = STORE_BLOCK_SIZE + sizeof(checksum); + } else if (msg->alg_type == WD_GZIP) { + if (unlikely(msg->avail_out < STORE_BLOCK_SIZE + + sizeof(checksum) + sizeof(isize))) + return -WD_EINVAL; + memcpy(req->dst, store_block, STORE_BLOCK_SIZE); + checksum = ~checksum; + checksum = bit_reverse(checksum); + /* if gzip, CRC32 and ISIZE */ + memcpy(req->dst + STORE_BLOCK_SIZE, &checksum, sizeof(checksum)); + memcpy(req->dst + STORE_BLOCK_SIZE + sizeof(checksum), + &isize, sizeof(isize)); + msg->produced = STORE_BLOCK_SIZE + sizeof(checksum) + sizeof(isize); + } + + req->status = 0; + msg->skip_hw = true; + + return 0; +} + static int get_sgl_from_pool(handle_t h_qp, struct comp_sgl *c_sgl, struct wd_mm_ops *mm_ops) { handle_t h_sgl_pool; @@ -1547,6 +1602,10 @@ static int hisi_zip_comp_send(struct wd_alg_driver *drv, handle_t ctx, void *com if (ret) return 0; + if (msg->stream_mode == WD_COMP_STATEFUL && msg->alg_type <= WD_GZIP && + msg->req.op_type == WD_DIR_COMPRESS && msg->req.last == 1 && msg->req.src_len == 0) + return append_store_block(msg); + hisi_set_msg_id(h_qp, &msg->tag); ret = fill_zip_comp_sqe(qp, msg, &sqe); if (unlikely(ret < 0)) { @@ 
-1731,16 +1790,13 @@ static int hisi_zip_comp_recv(struct wd_alg_driver *drv, handle_t ctx, void *com __u16 count = 0; int ret; - if (recv_msg->ctx_buf) { - buf = (struct hisi_comp_buf *)(recv_msg->ctx_buf + CTX_STOREBUF_OFFSET); - /* - * The output has been copied from the storage buffer, - * and no data need to be received. - */ - if (buf->skip_hw) { - buf->skip_hw = false; - return 0; - } + /* + * The output has been copied from the storage buffer, + * and no data need to be received. + */ + if (recv_msg->skip_hw) { + recv_msg->skip_hw = false; + return 0; } ret = hisi_qm_recv(h_qp, &sqe, 1, &count); @@ -1752,8 +1808,11 @@ static int hisi_zip_comp_recv(struct wd_alg_driver *drv, handle_t ctx, void *com return ret; /* There are data in buf, copy to output */ - if (buf && buf->pending_out) - copy_from_hw(recv_msg, buf); + if (recv_msg->ctx_buf) { + buf = (struct hisi_comp_buf *)(recv_msg->ctx_buf + CTX_STOREBUF_OFFSET); + if (buf->pending_out) + copy_from_hw(recv_msg, buf); + } return 0; } diff --git a/include/drv/wd_comp_drv.h b/include/drv/wd_comp_drv.h index 2311d79..068a168 100644 --- a/include/drv/wd_comp_drv.h +++ b/include/drv/wd_comp_drv.h @@ -66,6 +66,8 @@ struct wd_comp_msg { __u32 checksum; /* Request identifier */ __u32 tag; + /* Skip hardware reception */ + bool skip_hw; }; struct wd_comp_msg *wd_comp_get_msg(__u32 idx, __u32 tag); diff --git a/wd_comp.c b/wd_comp.c index c67b7f1..00445b2 100644 --- a/wd_comp.c +++ b/wd_comp.c @@ -20,14 +20,6 @@ #define WD_ZLIB_HEADER_SZ 2 #define WD_GZIP_HEADER_SZ 10 -#define swap_byte(x) \ - ((((x) & 0x000000ff) << 24) | \ - (((x) & 0x0000ff00) << 8) | \ - (((x) & 0x00ff0000) >> 8) | \ - (((x) & 0xff000000) >> 24)) - -#define cpu_to_be32(x) swap_byte(x) - static const char *wd_comp_alg_name[WD_COMP_ALG_MAX] = { "zlib", "gzip", "deflate", "lz77_zstd", "lz4", "lz77_only" }; @@ -789,70 +781,12 @@ int wd_do_comp_sync2(handle_t h_sess, struct wd_comp_req *req) return 0; } -static unsigned int bit_reverse(register 
unsigned int target) -{ - register unsigned int x = target; - - x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); - x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); - x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); - x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); - - return ((x >> 16) | (x << 16)); -} - -/** - * append_store_block() - output an fixed store block when input - * a empty block as last stream block. And supplement the packet - * tail according to the protocol. - * @sess: The session which request will be sent to. - * @req: The last request which is empty. - */ -static int append_store_block(struct wd_comp_sess *sess, - struct wd_comp_req *req) -{ - unsigned char store_block[5] = {0x1, 0x00, 0x00, 0xff, 0xff}; - int blocksize = ARRAY_SIZE(store_block); - __u32 checksum = sess->checksum; - __u32 isize = sess->isize; - - if (sess->alg_type == WD_ZLIB) { - if (unlikely(req->dst_len < blocksize + sizeof(checksum))) - return -WD_EINVAL; - memcpy(req->dst, store_block, blocksize); - req->dst_len = blocksize; - checksum = (__u32) cpu_to_be32(checksum); - /* if zlib, ADLER32 */ - memcpy(req->dst + blocksize, &checksum, sizeof(checksum)); - req->dst_len += sizeof(checksum); - } else if (sess->alg_type == WD_GZIP) { - if (unlikely(req->dst_len < blocksize + - sizeof(checksum) + sizeof(isize))) - return -WD_EINVAL; - memcpy(req->dst, store_block, blocksize); - req->dst_len = blocksize; - checksum = ~checksum; - checksum = bit_reverse(checksum); - /* if gzip, CRC32 and ISIZE */ - memcpy(req->dst + blocksize, &checksum, sizeof(checksum)); - memcpy(req->dst + blocksize + sizeof(checksum), - &isize, sizeof(isize)); - req->dst_len += sizeof(checksum); - req->dst_len += sizeof(isize); - } - - req->status = 0; - sess->stream_pos = WD_COMP_STREAM_NEW; - - return 0; -} - static void wd_do_comp_strm_end_check(struct wd_comp_sess *sess, struct wd_comp_req *req, __u32 src_len) { if (req->op_type == WD_DIR_COMPRESS && req->last == 1 && - 
req->src_len == src_len) + req->src_len == src_len && req->status == WD_SUCCESS) sess->stream_pos = WD_COMP_STREAM_NEW; else if (req->op_type == WD_DIR_DECOMPRESS && req->status == WD_STREAM_END) @@ -875,10 +809,6 @@ int wd_do_comp_strm(handle_t h_sess, struct wd_comp_req *req) return -WD_EINVAL; } - if (sess->alg_type <= WD_GZIP && req->op_type == WD_DIR_COMPRESS && - req->last == 1 && req->src_len == 0) - return append_store_block(sess, req); - fill_comp_msg(sess, &msg, req); msg.stream_pos = sess->stream_pos; msg.ctx_buf = sess->ctx_buf; -- 2.33.0
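Note on the tail packet: the bytes that append_store_block() emits for an empty last request follow the container formats. An empty final stored block is `01 00 00 ff ff`; zlib then appends the Adler-32 most significant byte first, while gzip appends the CRC-32 and ISIZE least significant byte first. The sketch below illustrates that layout only. The `sk_` names are hypothetical, and it assumes the checksum is already in its standard form, whereas the patch first inverts and bit-reverses the value for gzip.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_STORE_BLOCK_SIZE 5

/* Final stored block with no payload: BFINAL=1, BTYPE=00, LEN=0x0000, NLEN=0xffff. */
static const unsigned char sk_store_block[SK_STORE_BLOCK_SIZE] = {
	0x01, 0x00, 0x00, 0xff, 0xff
};

/* Hypothetical format selector; the driver uses msg->alg_type instead. */
enum sk_format { SK_ZLIB, SK_GZIP };

/*
 * Write the empty final block followed by the container trailer.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t sk_append_tail(enum sk_format fmt, uint32_t checksum,
			     uint32_t isize, unsigned char *dst, size_t avail)
{
	size_t need = SK_STORE_BLOCK_SIZE + (fmt == SK_GZIP ? 8 : 4);
	unsigned char *p = dst + SK_STORE_BLOCK_SIZE;
	int i;

	if (avail < need)
		return 0;

	memcpy(dst, sk_store_block, SK_STORE_BLOCK_SIZE);

	if (fmt == SK_ZLIB) {
		/* zlib trailer: Adler-32, most significant byte first */
		for (i = 0; i < 4; i++)
			p[i] = (unsigned char)(checksum >> (24 - 8 * i));
	} else {
		/* gzip trailer: CRC-32 then ISIZE, least significant byte first */
		for (i = 0; i < 4; i++) {
			p[i] = (unsigned char)(checksum >> (8 * i));
			p[4 + i] = (unsigned char)(isize >> (8 * i));
		}
	}

	return need;
}
```

Writing the bytes one at a time keeps the sketch independent of host endianness, which is a different choice from the memcpy-based code in the patch.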
From: Chenghai Huang <huangchenghai2@huawei.com>

An oversized output space does not affect the function, so the message
that was printed every time avail_out exceeded the maximum, and that
could flood the log, is deleted.

Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 v1/drv/hisi_zip_udrv.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/v1/drv/hisi_zip_udrv.c b/v1/drv/hisi_zip_udrv.c
index f6c5ca9..6761b9c 100644
--- a/v1/drv/hisi_zip_udrv.c
+++ b/v1/drv/hisi_zip_udrv.c
@@ -315,11 +315,9 @@ int qm_fill_zip_sqe(void *smsg, struct qm_queue_info *info, __u16 i)
 		WD_ERR("The in_len is out of range in_len(%u)!\n", msg->in_size);
 		return -WD_EINVAL;
 	}
-	if (unlikely(msg->data_fmt != WD_SGL_BUF && msg->avail_out > MAX_BUFFER_SIZE)) {
-		WD_ERR("warning: avail_out is out of range (%u), will set 8MB size max!\n",
-		       msg->avail_out);
+	if (unlikely(msg->data_fmt != WD_SGL_BUF && msg->avail_out > MAX_BUFFER_SIZE))
 		msg->avail_out = MAX_BUFFER_SIZE;
-	}
+
 	sqe->input_data_length = msg->in_size;
 	sqe->dest_avail_out = msg->avail_out;
 
@@ -500,11 +498,8 @@ static int fill_zip_buffer_size_deflate(void *ssqe, struct wcrypto_comp_msg *msg
 	}
 
 	if (unlikely(msg->data_fmt != WD_SGL_BUF &&
-		     msg->avail_out > MAX_BUFFER_SIZE)) {
-		WD_ERR("warning: avail_out is out of range (%u), will set 8MB size max!\n",
-		       msg->avail_out);
+		     msg->avail_out > MAX_BUFFER_SIZE))
 		msg->avail_out = MAX_BUFFER_SIZE;
-	}
 
 	sqe->input_data_length = msg->in_size;
 	sqe->dest_avail_out = msg->avail_out;
@@ -547,11 +542,8 @@ static int fill_zip_buffer_size_zstd(void *ssqe, struct wcrypto_comp_msg *msg)
 		/* fill the sequences output size */
 		sqe->dest_avail_out = zstd_out->seq_sz;
 	} else {
-		if (unlikely(msg->avail_out > MAX_BUFFER_SIZE)) {
-			WD_ERR("warning: avail_out is out of range (%u), will set 8MB size max!\n",
-			       msg->avail_out);
-			msg->avail_out = MAX_BUFFER_SIZE;
-		}
+		if (unlikely(msg->avail_out > MAX_BUFFER_SIZE + lit_size))
+			msg->avail_out = MAX_BUFFER_SIZE + lit_size;
 
 		/*
 		 * For lz77_zstd, the hardware need 784 Bytes buffer to output
--
2.33.0
From: lizhi <lizhi206@huawei.com> 1. Add max timeout control and adaptive backoff to prevent long usleep delays on device failure 2. Implement exponential backoff to reduce context switching overhead under high load. Signed-off-by: lizhi <lizhi206@huawei.com> --- v1/wd_dh.c | 12 +++++++----- v1/wd_ecc.c | 12 +++++++----- v1/wd_rsa.c | 14 ++++++++------ v1/wd_util.c | 31 +++++++++++++++++++++++++++++++ v1/wd_util.h | 2 ++ 5 files changed, 55 insertions(+), 16 deletions(-) diff --git a/v1/wd_dh.c b/v1/wd_dh.c index 12f7b19..e8ecf5d 100644 --- a/v1/wd_dh.c +++ b/v1/wd_dh.c @@ -363,6 +363,8 @@ int wcrypto_do_dh(void *ctx, struct wcrypto_dh_op_data *opdata, void *tag) struct wcrypto_dh_cookie *cookie; struct wcrypto_dh_msg *req; uint32_t rx_cnt = 0; + __u64 slept = 0; + bool is_timeout; int ret; ret = do_dh_prepare(opdata, &cookie, ctxt, &req, tag); @@ -383,14 +385,14 @@ int wcrypto_do_dh(void *ctx, struct wcrypto_dh_op_data *opdata, void *tag) if (ret > 0) { break; } else if (!ret) { - if (unlikely(rx_cnt++ >= DH_RECV_MAX_CNT)) { - WD_ERR("failed to receive: timeout!\n"); + is_timeout = wd_adaptive_backoff_sleep(balance, DH_BALANCE_THRHD, + rx_cnt, &slept); + if (unlikely(rx_cnt++ >= DH_RECV_MAX_CNT || is_timeout)) { + WD_ERR("dh recv timeout: rx_cnt = %u, slept = %llu us\n", + rx_cnt, slept); ret = -WD_ETIMEDOUT; goto fail_with_cookie; } - - if (balance > DH_BALANCE_THRHD) - usleep(1); } else { WD_ERR("do dh wd_recv err!\n"); goto fail_with_cookie; diff --git a/v1/wd_ecc.c b/v1/wd_ecc.c index bb65dfb..aec13e8 100644 --- a/v1/wd_ecc.c +++ b/v1/wd_ecc.c @@ -1558,6 +1558,8 @@ static int ecc_sync_recv(struct wcrypto_ecc_ctx *ctx, { struct wcrypto_ecc_msg *resp; __u32 rx_cnt = 0; + __u64 slept = 0; + bool is_timeout; int ret; resp = (void *)(uintptr_t)ctx->ctx_id; @@ -1567,13 +1569,13 @@ static int ecc_sync_recv(struct wcrypto_ecc_ctx *ctx, if (ret > 0) { break; } else if (!ret) { - if (rx_cnt++ >= ECC_RECV_MAX_CNT) { - WD_ERR("failed to recv: timeout!\n"); + is_timeout = 
wd_adaptive_backoff_sleep(balance, ECC_BALANCE_THRHD, + rx_cnt, &slept); + if (unlikely(rx_cnt++ >= ECC_RECV_MAX_CNT || is_timeout)) { + WD_ERR("ecc recv timeout: rx_cnt = %u, slept = %llu us\n", + rx_cnt, slept); return -WD_ETIMEDOUT; } - - if (balance > ECC_BALANCE_THRHD) - usleep(1); } else { WD_ERR("failed to recv: error = %d!\n", ret); return ret; diff --git a/v1/wd_rsa.c b/v1/wd_rsa.c index 1703dd3..08e6151 100644 --- a/v1/wd_rsa.c +++ b/v1/wd_rsa.c @@ -1031,7 +1031,9 @@ int wcrypto_do_rsa(void *ctx, struct wcrypto_rsa_op_data *opdata, void *tag) struct wcrypto_rsa_ctx *ctxt = ctx; struct wcrypto_rsa_cookie *cookie; struct wcrypto_rsa_msg *req; - uint32_t rx_cnt = 0; + __u32 rx_cnt = 0; + __u64 slept = 0; + bool is_timeout; int ret; ret = do_rsa_prepare(ctxt, opdata, &cookie, &req, tag); @@ -1051,14 +1053,14 @@ int wcrypto_do_rsa(void *ctx, struct wcrypto_rsa_op_data *opdata, void *tag) if (ret > 0) { break; } else if (!ret) { - if (unlikely(rx_cnt++ >= RSA_RECV_MAX_CNT)) { - WD_ERR("failed to recv: timeout!\n"); + is_timeout = wd_adaptive_backoff_sleep(balance, RSA_BALANCE_THRHD, + rx_cnt, &slept); + if (unlikely(rx_cnt++ >= RSA_RECV_MAX_CNT || is_timeout)) { + WD_ERR("rsa recv timeout: rx_cnt = %u, slept = %llu us\n", + rx_cnt, slept); ret = -WD_ETIMEDOUT; goto fail_with_cookie; } - - if (balance > RSA_BALANCE_THRHD) - usleep(1); } else { WD_ERR("do rsa wd_recv err!\n"); goto fail_with_cookie; diff --git a/v1/wd_util.c b/v1/wd_util.c index 0bc9d04..eae148b 100644 --- a/v1/wd_util.c +++ b/v1/wd_util.c @@ -24,6 +24,37 @@ #define BYTE_TO_BIT 8 #define LOCK_TRY_CNT (0x800000000U) +#define MAX_SLEEP_SINGLE_US 1000000u +#define MAX_SLEEP_TOTAL_US 60000000u + +#define EXPONENTIAL_BASE 1u +#define BACKOFF_DIV_SHIFT 3 +#define MAX_SHIFT_LIMIT 15 + +bool wd_adaptive_backoff_sleep(int balance, int threshold, + __u32 cnt, __u64 *slept) +{ + __u32 shift; + __u32 delay; + + if (likely(balance <= threshold)) + return false; + + shift = cnt >> BACKOFF_DIV_SHIFT; + if 
(unlikely(shift >= MAX_SHIFT_LIMIT)) + delay = MAX_SLEEP_SINGLE_US; + else + delay = EXPONENTIAL_BASE << shift; + + if (unlikely(*slept + delay > MAX_SLEEP_TOTAL_US)) + return true; + + usleep(delay); + *slept += delay; + + return false; +} + void wd_spinlock(struct wd_lock *lock) { int val = 0; diff --git a/v1/wd_util.h b/v1/wd_util.h index 9767be4..efe2ed9 100644 --- a/v1/wd_util.h +++ b/v1/wd_util.h @@ -395,6 +395,8 @@ static inline uint32_t wd_reg_read(void *reg_addr) return *((uint32_t *)reg_addr); } +bool wd_adaptive_backoff_sleep(int balance, int threshold, + __u32 cnt, __u64 *slept); void wd_spinlock(struct wd_lock *lock); void wd_unspinlock(struct wd_lock *lock); void wd_fair_init(struct wd_fair_lock *lock); -- 2.33.0
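Note on the backoff schedule: the delay chosen by wd_adaptive_backoff_sleep() can be examined without actually sleeping. The sketch below copies the constants from the patch but splits the logic into two hypothetical helpers (`sk_backoff_delay_us`, `sk_backoff_account`). It assumes the balance is above the threshold on every poll, so the early return in the original is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX_SLEEP_SINGLE_US 1000000u
#define SK_MAX_SLEEP_TOTAL_US  60000000u
#define SK_BACKOFF_DIV_SHIFT   3
#define SK_MAX_SHIFT_LIMIT     15

/*
 * Delay for the cnt-th empty poll: starts at 1 us, doubles every
 * 8 polls, and jumps to the 1 s single-sleep cap once the shift
 * reaches the limit.
 */
static uint32_t sk_backoff_delay_us(uint32_t cnt)
{
	uint32_t shift = cnt >> SK_BACKOFF_DIV_SHIFT;

	if (shift >= SK_MAX_SHIFT_LIMIT)
		return SK_MAX_SLEEP_SINGLE_US;

	return 1u << shift;
}

/*
 * Account for the next delay without sleeping. Returns true when the
 * delay would push the accumulated time past the total budget, which
 * is the condition the patch reports as a timeout.
 */
static bool sk_backoff_account(uint32_t cnt, uint64_t *slept)
{
	uint32_t delay = sk_backoff_delay_us(cnt);

	if (*slept + delay > SK_MAX_SLEEP_TOTAL_US)
		return true;

	*slept += delay;
	return false;
}
```

Under these constants and that assumption, the first 120 empty polls accumulate about 0.26 s of sleep, after which each poll sleeps 1 s until the 60 s budget is exhausted. The per-algorithm receive counters (DH_RECV_MAX_CNT and friends) may trip earlier; their values are not shown in this patch.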
From: Wenkai Lin <linwenkai6@hisilicon.com>

The key align size of hashjoin has changed; update it to match the
hardware.

Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
---
 drv/hisi_dae_join_gather.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drv/hisi_dae_join_gather.c b/drv/hisi_dae_join_gather.c
index 8c45f57..5a96d76 100644
--- a/drv/hisi_dae_join_gather.c
+++ b/drv/hisi_dae_join_gather.c
@@ -19,7 +19,7 @@
 #define PROBE_INDEX_ROW_SIZE	4
 
 /* align size */
-#define DAE_KEY_ALIGN_SIZE	4
+#define DAE_KEY_ALIGN_SIZE	8
 #define DAE_BREAKPOINT_SIZE	81920
 #define DAE_ADDR_INDEX_SHIFT	1
 
--
2.33.0
From: Wenkai Lin <linwenkai6@hisilicon.com> Some clean code to improve the readability of the wd_agg code. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> --- wd_agg.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/wd_agg.c b/wd_agg.c index 8c54d10..503bd49 100644 --- a/wd_agg.c +++ b/wd_agg.c @@ -163,25 +163,25 @@ static int check_col_data_info(enum wd_dae_data_type type, __u16 col_data_info) } static int get_col_data_type_size(enum wd_dae_data_type type, __u16 col_data_info, - __u64 *col, __u32 idx) + __u64 *size) { switch (type) { case WD_DAE_DATE: case WD_DAE_INT: - col[idx] = DAE_INT_SIZE; + *size = DAE_INT_SIZE; break; case WD_DAE_LONG: case WD_DAE_SHORT_DECIMAL: - col[idx] = DAE_LONG_SIZE; + *size = DAE_LONG_SIZE; break; case WD_DAE_LONG_DECIMAL: - col[idx] = DAE_LONG_DECIMAL_SIZE; + *size = DAE_LONG_DECIMAL_SIZE; break; case WD_DAE_CHAR: - col[idx] = col_data_info; + *size = col_data_info; break; case WD_DAE_VARCHAR: - col[idx] = 0; + *size = 0; break; default: return -WD_EINVAL; @@ -324,19 +324,19 @@ static int fill_agg_session(struct wd_agg_sess *sess, struct wd_agg_sess_setup * for (i = 0; i < setup->key_cols_num; i++) (void)get_col_data_type_size(key[i].input_data_type, key[i].col_data_info, - sess->key_conf.data_size, i); + &sess->key_conf.data_size[i]); sess->agg_conf.data_size = sess->key_conf.data_size + setup->key_cols_num; for (i = 0; i < setup->agg_cols_num; i++) (void)get_col_data_type_size(agg[i].input_data_type, agg[i].col_data_info, - sess->agg_conf.data_size, i); + &sess->agg_conf.data_size[i]); sess->agg_conf.out_data_size = sess->agg_conf.data_size + setup->agg_cols_num; for (i = 0, k = 0; i < setup->agg_cols_num; i++) for (j = 0; j < agg[i].col_alg_num; j++, k++) (void)get_col_data_type_size(agg[i].output_data_types[j], agg[i].col_data_info, - sess->agg_conf.out_data_size, k); + &sess->agg_conf.out_data_size[k]); sess->key_conf.cols_num = setup->key_cols_num; sess->agg_conf.cols_num = 
setup->agg_cols_num; @@ -1350,7 +1350,7 @@ static int wd_agg_set_keycol_size(struct wd_agg_sess *sess, struct wd_dae_col_ad int ret; for (i = 0; i < sess->key_conf.cols_num; i++) { - ret = set_col_size_inner(key + i, expt, row_count, sess->key_conf.data_size[i], + ret = set_col_size_inner(&key[i], expt, row_count, sess->key_conf.data_size[i], sess->key_conf.cols_info[i].input_data_type); if (unlikely(ret)) return ret; @@ -1368,7 +1368,7 @@ static int wd_agg_set_aggcol_size(struct wd_agg_sess *sess, struct wd_dae_col_ad for (i = 0, k = 0; i < sess->agg_conf.cols_num; i++) { for (j = 0; j < sess->agg_conf.cols_info[i].col_alg_num; j++, k++) { - ret = set_col_size_inner(agg + k, expt, row_count, + ret = set_col_size_inner(&agg[k], expt, row_count, sess->agg_conf.out_data_size[k], sess->agg_conf.cols_info[i].output_data_types[j]); if (unlikely(ret)) @@ -1377,8 +1377,8 @@ static int wd_agg_set_aggcol_size(struct wd_agg_sess *sess, struct wd_dae_col_ad } if (sess->agg_conf.is_count_all) { - (void)get_col_data_type_size(sess->agg_conf.count_all_data_type, 0, &data_size, 0); - ret = set_col_size_inner(agg + k, expt, row_count, data_size, + (void)get_col_data_type_size(sess->agg_conf.count_all_data_type, 0, &data_size); + ret = set_col_size_inner(&agg[k], expt, row_count, data_size, sess->agg_conf.count_all_data_type); if (unlikely(ret)) return ret; -- 2.33.0
From: Longfang Liu <liulongfang@huawei.com> Fix parameter validation during the coding process. Adopt constant-time programming techniques for handling sensitive data to enhance code resilience against attacks. Signed-off-by: Longfang Liu <liulongfang@huawei.com> --- drv/hisi_hpre.c | 6 +++--- drv/hisi_sec.c | 2 +- include/wd_internal.h | 2 ++ libwd.map | 1 + wd.c | 19 +++++++++++++++++++ wd_aead.c | 4 ++-- wd_agg.c | 8 +++++--- wd_cipher.c | 2 +- wd_join_gather.c | 2 +- 9 files changed, 35 insertions(+), 11 deletions(-) diff --git a/drv/hisi_hpre.c b/drv/hisi_hpre.c index 3c41826..83d8195 100644 --- a/drv/hisi_hpre.c +++ b/drv/hisi_hpre.c @@ -1176,7 +1176,7 @@ static bool less_than_latter(struct wd_dtb *d, struct wd_dtb *n) return true; shift = n->bsize - n->dsize; - ret = memcmp(d->data + shift, n->data + shift, n->dsize); + ret = memcmp_consttime(d->data + shift, n->data + shift, n->dsize); if (ret < 0) return true; else @@ -2434,7 +2434,7 @@ static void sm2_xor(struct wd_dtb *val1, struct wd_dtb *val2) static int is_equal(struct wd_dtb *src, struct wd_dtb *dst) { if (src->dsize == dst->dsize && - !memcmp(src->data, dst->data, src->dsize)) { + !memcmp_consttime(src->data, dst->data, src->dsize)) { return 0; } @@ -2913,7 +2913,7 @@ static void ecc_sess_eops_params_cfg(struct wd_alg_driver *drv, if (key_size != SECP256R1_KEY_SIZE) return; - ret = memcmp(data, cv->p.data, SECP256R1_PARAM_SIZE); + ret = memcmp_consttime(data, cv->p.data, SECP256R1_PARAM_SIZE); if (!ret) ecc_ctx->enable_hpcore = 1; } diff --git a/drv/hisi_sec.c b/drv/hisi_sec.c index c8b831c..aaaaa1d 100644 --- a/drv/hisi_sec.c +++ b/drv/hisi_sec.c @@ -2958,7 +2958,7 @@ static int gcm_do_soft_mac(struct wd_aead_msg *msg) msg->mac[i] = g[i] ^ ctr_r[i]; if (msg->op_type == WD_CIPHER_DECRYPTION_DIGEST) { - ret = memcmp(msg->mac, msg->dec_mac, msg->auth_bytes); + ret = memcmp_consttime(msg->mac, msg->dec_mac, msg->auth_bytes); if (ret) { msg->result = WD_IN_EPARA; WD_ERR("failed to do the gcm 
authentication!\n"); diff --git a/include/wd_internal.h b/include/wd_internal.h index d899555..4f19d3a 100644 --- a/include/wd_internal.h +++ b/include/wd_internal.h @@ -64,6 +64,8 @@ struct wd_datalist { struct wd_datalist *next; }; +int memcmp_consttime(const void *s1, const void *s2, size_t n); + #ifdef __cplusplus } #endif diff --git a/libwd.map b/libwd.map index 1267a8d..0635198 100644 --- a/libwd.map +++ b/libwd.map @@ -70,5 +70,6 @@ global: wd_get_free_num; wd_get_fail_num; wd_get_bufsize; + memcmp_consttime; local: *; }; diff --git a/wd.c b/wd.c index 7f21dc0..9cdc70f 100644 --- a/wd.c +++ b/wd.c @@ -56,6 +56,25 @@ static const char * const zip_dae_algs[] = { "gather", }; +/** + * Constant-time memory comparison function (primarily used for equality verification) + * It only supports equality/inequality results rather than full comparison results, + * the main goal is to provide timing-safe equality checks + */ +int memcmp_consttime(const void *s1, const void *s2, size_t n) +{ + const unsigned char *p1 = (const unsigned char *)s1; + const unsigned char *p2 = (const unsigned char *)s2; + unsigned char diff = 0; + size_t i; + + /* Constant-time byte-wise XOR accumulation */ + for (i = 0; i < n; i++) + diff |= p1[i] ^ p2[i]; + + return diff; +} + static int wd_check_ctx_type(handle_t h_ctx) { struct wd_ctx_h *ctx = (struct wd_ctx_h *)h_ctx; diff --git a/wd_aead.c b/wd_aead.c index 8467409..74d652b 100644 --- a/wd_aead.c +++ b/wd_aead.c @@ -191,7 +191,7 @@ int wd_aead_set_ckey(handle_t h_sess, const __u8 *key, __u16 key_len) struct wd_aead_sess *sess = (struct wd_aead_sess *)h_sess; int ret; - if (unlikely(!key || !sess)) { + if (unlikely(!key || !sess || !sess->ckey)) { WD_ERR("failed to check cipher key input param!\n"); return -WD_EINVAL; } @@ -230,7 +230,7 @@ int wd_aead_set_akey(handle_t h_sess, const __u8 *key, __u16 key_len) } sess->akey_bytes = key_len; - if (key_len) + if (key_len && sess->akey) memcpy(sess->akey, key, key_len); return 0; diff --git 
a/wd_agg.c b/wd_agg.c index 503bd49..bb834cb 100644 --- a/wd_agg.c +++ b/wd_agg.c @@ -308,6 +308,7 @@ static int fill_agg_session(struct wd_agg_sess *sess, struct wd_agg_sess_setup * sess->key_conf.cols_info = malloc(key_size); if (!sess->key_conf.cols_info) return -WD_ENOMEM; + sess->agg_conf.cols_info = malloc(agg_size); if (!sess->agg_conf.cols_info) goto out_key; @@ -1504,8 +1505,7 @@ int wd_agg_rehash_sync(handle_t h_sess, struct wd_agg_req *req) if (ret) { __atomic_store_n(&sess->state, WD_AGG_SESS_RESET, __ATOMIC_RELEASE); WD_ERR("failed to do agg rehash task!\n"); - free(cols); - return ret; + goto free_cols; } if (req->output_done) break; @@ -1513,8 +1513,10 @@ int wd_agg_rehash_sync(handle_t h_sess, struct wd_agg_req *req) } __atomic_store_n(&sess->state, WD_AGG_SESS_INPUT, __ATOMIC_RELEASE); + +free_cols: free(cols); - return WD_SUCCESS; + return ret; } struct wd_agg_msg *wd_agg_get_msg(__u32 idx, __u32 tag) diff --git a/wd_cipher.c b/wd_cipher.c index 58656dc..1c54d72 100644 --- a/wd_cipher.c +++ b/wd_cipher.c @@ -147,7 +147,7 @@ static bool is_des_weak_key(const __u8 *key) int i; for (i = 0; i < DES_WEAK_KEY_NUM; i++) { - if (memcmp(des_weak_keys[i], key, DES_KEY_SIZE) == 0) + if (memcmp_consttime(des_weak_keys[i], key, DES_KEY_SIZE) == 0) return true; } diff --git a/wd_join_gather.c b/wd_join_gather.c index 915c1b8..37e0022 100644 --- a/wd_join_gather.c +++ b/wd_join_gather.c @@ -217,7 +217,7 @@ static int check_key_cols_info(struct wd_join_gather_sess_setup *setup) return -WD_EINVAL; } - ret = memcmp(table->build_key_cols, table->probe_key_cols, + ret = memcmp_consttime(table->build_key_cols, table->probe_key_cols, table->build_key_cols_num * sizeof(struct wd_join_gather_col_info)); if (ret) { WD_ERR("invalid: build and probe table key infomation is not same!\n"); -- 2.33.0
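Note on the return value of memcmp_consttime(): because it ORs together the XOR of each byte pair, the result is zero for equal buffers and some positive value otherwise. It carries no ordering information, so callers should only ever test it against zero. The sketch below re-implements the same accumulation under a hypothetical name (`sk_ct_compare`) and adds an assumed equality wrapper to make that contract explicit.

```c
#include <stddef.h>

/*
 * Same XOR-accumulation idea as the patch's memcmp_consttime(),
 * renamed to mark it as a sketch. The loop always visits all n bytes,
 * so its running time does not depend on where the buffers differ.
 */
static int sk_ct_compare(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1;
	const unsigned char *p2 = s2;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < n; i++)
		diff |= p1[i] ^ p2[i];

	/* 0 when equal, 1..255 otherwise; never negative */
	return diff;
}

/* Hypothetical wrapper: returns 1 when the buffers are equal, 0 otherwise. */
static int sk_ct_equal(const void *s1, const void *s2, size_t n)
{
	return sk_ct_compare(s1, s2, n) == 0;
}
```

A test such as `ret < 0` on this kind of result can never be true, which matters for any call site that previously relied on memcmp()'s sign.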
From: Longfang Liu <liulongfang@huawei.com>

In the HPRE module's big number comparison, the previous implementation
used memcmp to compare big numbers of the same length. On little-endian
platforms, this approach compares data starting from the least
significant byte (LSB), which can lead to incorrect results, such as a
larger value in the lower bytes being misinterpreted as greater overall
even if the higher bytes are smaller.

To address this, the comparison logic must be modified to consistently
start from the most significant byte (MSB).

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drv/hisi_hpre.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drv/hisi_hpre.c b/drv/hisi_hpre.c
index 83d8195..07ebe92 100644
--- a/drv/hisi_hpre.c
+++ b/drv/hisi_hpre.c
@@ -1168,19 +1168,25 @@ static bool big_than_one(const char *data, __u32 data_sz)
 
 static bool less_than_latter(struct wd_dtb *d, struct wd_dtb *n)
 {
-	int ret, shift;
+	unsigned char *d_data, *n_data;
+	int shift, i;
 
 	if (d->dsize > n->dsize)
 		return false;
 	else if (d->dsize < n->dsize)
 		return true;
 
+	/* d->dsize == n->dsize */
 	shift = n->bsize - n->dsize;
-	ret = memcmp_consttime(d->data + shift, n->data + shift, n->dsize);
-	if (ret < 0)
-		return true;
-	else
-		return false;
+	d_data = d->data + shift;
+	n_data = n->data + shift;
+	for (i = d->dsize - 1; i >= 0; i--) {
+		if (d_data[i] < n_data[i])
+			return true;
+		else if (d_data[i] > n_data[i])
+			return false;
+	}
+	return false;
 }
 
 static int ecc_prepare_prikey(struct wd_ecc_key *key, void **data, int id)
--
2.33.0
From: Longfang Liu <liulongfang@huawei.com>

The HPRE driver must store and process data in big-endian (MSB) format
as specified by the chip design. Therefore, the comparison logic for
this data should also be described and implemented following the
big-endian convention.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drv/hisi_hpre.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drv/hisi_hpre.c b/drv/hisi_hpre.c
index 07ebe92..e114da3 100644
--- a/drv/hisi_hpre.c
+++ b/drv/hisi_hpre.c
@@ -1169,7 +1169,7 @@ static bool big_than_one(const char *data, __u32 data_sz)
 static bool less_than_latter(struct wd_dtb *d, struct wd_dtb *n)
 {
 	unsigned char *d_data, *n_data;
-	int shift, i;
+	__u32 shift, i;
 
 	if (d->dsize > n->dsize)
 		return false;
@@ -1180,7 +1180,8 @@ static bool less_than_latter(struct wd_dtb *d, struct wd_dtb *n)
 	shift = n->bsize - n->dsize;
 	d_data = d->data + shift;
 	n_data = n->data + shift;
-	for (i = d->dsize - 1; i >= 0; i--) {
+	/* Task parameter data must be stored in big-endian format in DDR */
+	for (i = 0; i < d->dsize; i++) {
 		if (d_data[i] < n_data[i])
 			return true;
 		else if (d_data[i] > n_data[i])
--
2.33.0
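Note on the comparison order: for two equal-length numbers stored big-endian, the byte at index 0 is the most significant, so the first differing byte found while walking upward from index 0 decides the result. A minimal standalone sketch follows; `sk_be_less_than` is a hypothetical helper, not the driver function. Like the loop in the patch, it returns at the first difference and is therefore not constant-time.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Compare two equal-length big-endian byte strings as unsigned
 * integers. Index 0 holds the most significant byte, so the first
 * differing byte decides the result.
 */
static bool sk_be_less_than(const unsigned char *a, const unsigned char *b,
			    size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (a[i] < b[i])
			return true;
		if (a[i] > b[i])
			return false;
	}

	/* All bytes equal: a is not less than b */
	return false;
}
```

For example, 0x01ff is less than 0x0200 even though its low byte is larger, which is the case a low-byte-first walk gets wrong.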
Add WD_DEFLATE branch in append_store_block() to output a 5-byte store
block when stream compression receives an empty last packet.

Signed-off-by: ZongYu Wu <wuzongyu1@huawei.com>
---
 drv/hisi_comp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drv/hisi_comp.c b/drv/hisi_comp.c
index 2386cf2..97bcf39 100644
--- a/drv/hisi_comp.c
+++ b/drv/hisi_comp.c
@@ -416,6 +416,11 @@ static int append_store_block(struct wd_comp_msg *msg)
 		memcpy(req->dst + STORE_BLOCK_SIZE + sizeof(checksum),
 		       &isize, sizeof(isize));
 		msg->produced = STORE_BLOCK_SIZE + sizeof(checksum) + sizeof(isize);
+	} else if (msg->alg_type == WD_DEFLATE) {
+		if (unlikely(msg->avail_out < STORE_BLOCK_SIZE))
+			return -WD_EINVAL;
+		memcpy(req->dst, store_block, STORE_BLOCK_SIZE);
+		msg->produced = STORE_BLOCK_SIZE;
 	}
 
 	req->status = 0;
--
2.33.0