The current uadk initialization process is:
1. Call wd_request_ctx() to request ctxs from devices.
2. Call wd_sched_rr_alloc() to create a sched (or another scheduler alloc function, if one exists).
3. Initialize the sched.
4. Call wd_<alg>_init() with ctx_config and sched.
The logic is reasonable. But in practice, the `wd_request_ctx()` and `wd_sched_rr_alloc()` steps are very tedious, which makes the interface difficult to use. One of the main reasons is that uadk exposes a lot of configuration in the scheduler in order to give users better performance. Because of this, the current uadk requires the user to arrange the division of hardware resources according to the device topology during initialization. Therefore, as a high-level interface, this scheme can provide customized configuration for users with advanced needs.
All algorithm initialization interfaces have the same input parameters and behavioral logic. The preparation for wd_<alg>_init() is essentially the configuration of `struct wd_ctx_config` and `struct wd_sched`. Therefore, the next step is to describe the user's requirements for these two inputs with a small set of easy-to-use parameters, while ensuring that the new interface, init2, provides the same functions as init. For ease of description, v1 refers to the existing interface, and v2 refers to the new layer of encapsulation.
At present, at least 4 parameters are required to meet user configuration requirements while the v1 interface functions remain unchanged.
@device_list: The available uacce device list. Users can get it with wd_get_accel_list().
@numa_bitmask: The bitmask provided by libnuma. Users can use this parameter to control which devices ctxs are requested from in the NUMA-binding scenario.
@ctx_nums: The number of ctxs requested for each NUMA node. Because users may need different numbers of each ctx type, a two-dimensional array is required as input.
@sched_type: The scheduling type the user wants to use.
What's more, some users want uadk to provide default values for the input parameters in performance-insensitive scenarios. C code has no way to.
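As a rough illustration of how the four inputs and the wish for defaults could fit together, here is a minimal sketch. Everything in it is hypothetical: the struct, the constants, and the helper are not uadk API, the bitmask is a plain integer standing in for the libnuma type, and the default values are assumptions chosen only for the example.

```c
#include <stddef.h>

/* Hypothetical names throughout: none of these are part of uadk. */
#define SKETCH_NUMA_MAX        4  /* assumed upper bound on NUMA nodes */
#define SKETCH_CTX_TYPES       2  /* assumed: e.g. sync and async ctxs */
#define SKETCH_DEFAULT_CTX_NUM 1  /* assumed default per node and type */

struct init2_params_sketch {
	const void *device_list;     /* stands in for the uacce device list */
	unsigned long numa_bitmask;  /* one bit per NUMA node; 0 means unset */
	unsigned int ctx_nums[SKETCH_NUMA_MAX][SKETCH_CTX_TYPES];
	int sched_type;              /* negative means unset */
};

/*
 * Fill every unset field with a default, so that a caller who does not
 * care about tuning can pass a mostly empty struct.
 */
static void init2_fill_defaults(struct init2_params_sketch *p)
{
	size_t node, type;
	int any = 0;

	if (!p->numa_bitmask)
		p->numa_bitmask = (1UL << SKETCH_NUMA_MAX) - 1; /* all nodes */
	if (p->sched_type < 0)
		p->sched_type = 0; /* assumed: 0 selects round-robin */

	/* Leave ctx_nums alone if the caller set any entry at all. */
	for (node = 0; node < SKETCH_NUMA_MAX; node++)
		for (type = 0; type < SKETCH_CTX_TYPES; type++)
			any |= p->ctx_nums[node][type] != 0;
	if (any)
		return;

	/* Otherwise request the default number on every selected node. */
	for (node = 0; node < SKETCH_NUMA_MAX; node++) {
		if (!(p->numa_bitmask & (1UL << node)))
			continue;
		for (type = 0; type < SKETCH_CTX_TYPES; type++)
			p->ctx_nums[node][type] = SKETCH_DEFAULT_CTX_NUM;
	}
}
```

A caller that wants all defaults would pass a struct with sched_type set to a negative value and everything else zeroed; a caller that sets even one ctx_nums entry keeps full control of the array.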
Changelog:
v1->v2:
 - Update the description of wd_<alg>_init in wd_design.md.
Yang Shen (6):
  uadk - support algorithms initialization reentry protect
  uadk/doc - update wd_alg_init support reentrancy
  uadk - support return error number as pointer
  uadk - mv some function to header file
  uadk/comp - add wd_comp_init2
  uadk/docs - support a simple interface for initialization
 Makefile.am             |   4 +-
 docs/wd_alg_init2.md    | 176 +++++++++++++++++++++++++++
 docs/wd_design.md       |   5 +-
 include/wd.h            |  54 ++++++++-
 include/wd_alg_common.h |  24 ++++
 include/wd_comp.h       |  28 +++++
 include/wd_util.h       |  57 +++++++++
 wd.c                    |  97 ++++++++++++---
 wd_aead.c               |  33 ++++--
 wd_cipher.c             |  35 ++++--
 wd_comp.c               | 257 ++++++++++++++++++++++++++++++++++++++--
 wd_dh.c                 |  34 ++++--
 wd_digest.c             |  33 ++++--
 wd_ecc.c                |  33 ++++--
 wd_rsa.c                |  33 ++++--
 wd_util.c               |  75 +++++++++++-
 16 files changed, 872 insertions(+), 106 deletions(-)
 create mode 100644 docs/wd_alg_init2.md
-- 2.24.0
'wd_<alg>_init()' is designed as non-reentrant, so add a status to protect against repeated or concurrent calls.
When 'wd_<alg>_init()' is called, it first reads the status. If the status is WD_UNINIT, it sets the status to WD_INITING, then changes it to WD_INIT on success or restores it to WD_UNINIT if something goes wrong. If the status is WD_INIT, it returns directly. If the status is WD_INITING, another thread is initializing, so it needs to wait for the result.
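The three-state protocol described above can be sketched in isolation as follows. This is a simplified stand-in, not the patch itself: the `*_sketch` and `demo_*` names are hypothetical, the GCC/Clang `__atomic` builtins are assumed to be available, and the acquire/release orderings are an assumption of this sketch (the patch passes __ATOMIC_RELAXED).

```c
#include <stdbool.h>

enum wd_status { WD_UNINIT, WD_INITING, WD_INIT };

/*
 * Returns true when the caller has moved the status from WD_UNINIT to
 * WD_INITING and must now run the initialization. Returns false when
 * initialization has already completed. While another thread holds
 * WD_INITING, the loop keeps retrying until that thread finishes.
 */
static bool try_init_sketch(enum wd_status *status)
{
	enum wd_status expected;
	bool claimed;

	do {
		expected = WD_UNINIT;
		claimed = __atomic_compare_exchange_n(status, &expected,
				WD_INITING, false,
				__ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
		if (expected == WD_INIT)
			return false;
	} while (!claimed);

	return true;
}

static void set_status_sketch(enum wd_status *status, enum wd_status value)
{
	__atomic_store_n(status, value, __ATOMIC_RELEASE);
}

/* Hypothetical module state; zero-initialized, so it starts as WD_UNINIT. */
static enum wd_status demo_status;
static int demo_init_calls;

/* Hypothetical init routine; 'fail' simulates an error during setup. */
static int demo_init(int fail)
{
	if (!try_init_sketch(&demo_status))
		return 0; /* already initialized, nothing to do */

	demo_init_calls++;
	if (fail) {
		/* Roll back so that a later call can try again. */
		set_status_sketch(&demo_status, WD_UNINIT);
		return -1;
	}

	set_status_sketch(&demo_status, WD_INIT);
	return 0;
}
```

Calling demo_init() again after a success returns 0 without repeating the setup, while a failed call leaves the status at WD_UNINIT so that the next call retries.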
Signed-off-by: Yang Shen shenyang39@huawei.com
---
 include/wd_util.h | 38 ++++++++++++++++++++++++++++++++++++++
 wd_aead.c         | 33 ++++++++++++++++++++++-----------
 wd_cipher.c       | 35 +++++++++++++++++++++++------------
 wd_comp.c         | 35 ++++++++++++++++++++++++-----------
 wd_dh.c           | 34 ++++++++++++++++++++++------------
 wd_digest.c       | 33 ++++++++++++++++++++++-----------
 wd_ecc.c          | 33 ++++++++++++++++++++++-----------
 wd_rsa.c          | 33 ++++++++++++++++++++++-----------
 wd_util.c         | 16 ++++++++++++++++
 9 files changed, 211 insertions(+), 79 deletions(-)
diff --git a/include/wd_util.h b/include/wd_util.h index 83a9684..3737f27 100644 --- a/include/wd_util.h +++ b/include/wd_util.h @@ -21,6 +21,12 @@ extern "C" { for (i = 0, config_numa = config->config_per_numa; \ i < config->numa_num; config_numa++, i++)
+enum wd_status { + WD_UNINIT, + WD_INITING, + WD_INIT, +}; + struct wd_async_msg_pool { struct msg_pool *pools; __u32 pool_num; @@ -356,6 +362,38 @@ int wd_handle_msg_sync(struct wd_msg_handle *msg_handle, handle_t ctx, */ int wd_init_param_check(struct wd_ctx_config *config, struct wd_sched *sched);
+/** + * wd_alg_try_init() - Check the algorithm status and set it as WD_INITING + * if need initialization. + * @status: algorithm initialization status. + * + * Return true if need initialization and false if initialized, otherwise will wait + * last initialization result. + */ +bool wd_alg_try_init(enum wd_status *status); + +/** + * wd_alg_set_init() - Set the algorithm status as WD_INIT. + * @status: algorithm initialization status. + */ +static inline void wd_alg_set_init(enum wd_status *status) +{ + enum wd_status setting = WD_INIT; + + __atomic_store(status, &setting, __ATOMIC_RELAXED); +} + +/** + * wd_alg_clear_init() - Set the algorithm status as WD_UNINIT. + * @status: algorithm initialization status. + */ +static inline void wd_alg_clear_init(enum wd_status *status) +{ + enum wd_status setting = WD_UNINIT; + + __atomic_store(status, &setting, __ATOMIC_RELAXED); +} + /** * wd_dfx_msg_cnt() - Message counter interface for ctx * @msg: Shared memory addr. diff --git a/wd_aead.c b/wd_aead.c index d43ace1..c00d8f9 100644 --- a/wd_aead.c +++ b/wd_aead.c @@ -28,6 +28,7 @@ static int g_aead_mac_len[WD_DIGEST_TYPE_MAX] = { };
struct wd_aead_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; struct wd_aead_driver *driver; @@ -389,24 +390,29 @@ static int aead_param_check(struct wd_aead_sess *sess, int wd_aead_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_aead_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_AEAD_EPOLL_EN", &wd_aead_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_aead_setting.config, config); if (ret) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_aead_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config;
/* set driver */ #ifdef WD_STATIC_DRV @@ -418,33 +424,37 @@ int wd_aead_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_aead_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* init ctx related resources in specific driver */ priv = calloc(1, wd_aead_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; } wd_aead_setting.priv = priv;
ret = wd_aead_setting.driver->init(&wd_aead_setting.config, priv); if (ret < 0) { WD_ERR("failed to init aead dirver!\n"); - goto out_init; + goto out_free_priv; }
+ wd_alg_set_init(&wd_aead_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_aead_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_aead_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_aead_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_aead_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_aead_setting.status); return ret; }
@@ -462,6 +472,7 @@ void wd_aead_uninit(void) wd_uninit_async_request_pool(&wd_aead_setting.pool); wd_clear_sched(&wd_aead_setting.sched); wd_clear_ctx_config(&wd_aead_setting.config); + wd_alg_clear_init(&wd_aead_setting.status); }
static void fill_request_msg(struct wd_aead_msg *msg, struct wd_aead_req *req, diff --git a/wd_cipher.c b/wd_cipher.c index 3d00598..ec9b3cc 100644 --- a/wd_cipher.c +++ b/wd_cipher.c @@ -42,6 +42,7 @@ static const unsigned char des_weak_keys[DES_WEAK_KEY_NUM][DES_KEY_SIZE] = { };
struct wd_cipher_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; void *sched_ctx; @@ -228,24 +229,29 @@ void wd_cipher_free_sess(handle_t h_sess) int wd_cipher_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_cipher_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_CIPHER_EPOLL_EN", &wd_cipher_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_cipher_setting.config, config); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_cipher_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config;
#ifdef WD_STATIC_DRV /* set driver */ @@ -257,33 +263,37 @@ int wd_cipher_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_cipher_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* init ctx related resources in specific driver */ priv = calloc(1, wd_cipher_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; } wd_cipher_setting.priv = priv;
ret = wd_cipher_setting.driver->init(&wd_cipher_setting.config, priv); if (ret < 0) { - WD_ERR("hisi sec init failed.\n"); - goto out_init; + WD_ERR("failed to do dirver init, ret = %d.\n", ret); + goto out_free_priv; }
+ wd_alg_set_init(&wd_cipher_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_cipher_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_cipher_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_cipher_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_cipher_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_cipher_setting.status); return ret; }
@@ -301,6 +311,7 @@ void wd_cipher_uninit(void) wd_uninit_async_request_pool(&wd_cipher_setting.pool); wd_clear_sched(&wd_cipher_setting.sched); wd_clear_ctx_config(&wd_cipher_setting.config); + wd_alg_clear_init(&wd_cipher_setting.status); }
static void fill_request_msg(struct wd_cipher_msg *msg, diff --git a/wd_comp.c b/wd_comp.c index eacebd3..44593a6 100644 --- a/wd_comp.c +++ b/wd_comp.c @@ -41,6 +41,7 @@ struct wd_comp_sess { };
struct wd_comp_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; struct wd_comp_driver *driver; @@ -81,24 +82,29 @@ void wd_comp_set_driver(struct wd_comp_driver *drv) int wd_comp_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_comp_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_COMP_EPOLL_EN", &wd_comp_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_comp_setting.config, config); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_comp_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config; /* * Fix me: ctx could be passed into wd_comp_set_static_drv to help to * choose static compiled vendor driver. For dynamic vendor driver, @@ -118,31 +124,36 @@ int wd_comp_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_comp_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* init ctx related resources in specific driver */ priv = calloc(1, wd_comp_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; } wd_comp_setting.priv = priv; ret = wd_comp_setting.driver->init(&wd_comp_setting.config, priv); if (ret < 0) { WD_ERR("failed to do driver init, ret = %d!\n", ret); - goto out_init; + goto out_free_priv; } + + wd_alg_set_init(&wd_comp_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_comp_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_comp_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_comp_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_comp_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_comp_setting.status); return ret; }
@@ -163,6 +174,8 @@ void wd_comp_uninit(void) /* unset config, sched, driver */ wd_clear_sched(&wd_comp_setting.sched); wd_clear_ctx_config(&wd_comp_setting.config); + + wd_alg_clear_init(&wd_comp_setting.status); }
struct wd_comp_msg *wd_comp_get_msg(__u32 idx, __u32 tag) diff --git a/wd_dh.c b/wd_dh.c index 461f04e..115d576 100644 --- a/wd_dh.c +++ b/wd_dh.c @@ -32,6 +32,7 @@ struct wd_dh_sess { };
static struct wd_dh_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; void *sched_ctx; @@ -78,24 +79,29 @@ void wd_dh_set_driver(struct wd_dh_driver *drv) int wd_dh_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_dh_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_DH_EPOLL_EN", &wd_dh_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_dh_setting.config, config); if (ret) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_dh_setting.sched, sched); if (ret) - goto out; + goto out_clear_ctx_config;
#ifdef WD_STATIC_DRV wd_dh_set_static_drv(); @@ -106,13 +112,13 @@ int wd_dh_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_dh_msg)); if (ret) - goto out_sched; + goto out_clear_sched;
/* initialize ctx related resources in specific driver */ priv = calloc(1, wd_dh_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; }
wd_dh_setting.priv = priv; @@ -120,21 +126,24 @@ int wd_dh_init(struct wd_ctx_config *config, struct wd_sched *sched) wd_dh_setting.driver->alg_name); if (ret < 0) { WD_ERR("failed to init dh driver, ret= %d!\n", ret); - goto out_init; + goto out_free_priv; }
+ wd_alg_set_init(&wd_dh_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_dh_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_dh_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_dh_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_dh_setting.config); - +out_clear_init: + wd_alg_clear_init(&wd_dh_setting.status); return ret; }
@@ -156,6 +165,7 @@ void wd_dh_uninit(void) /* unset config, sched, driver */ wd_clear_sched(&wd_dh_setting.sched); wd_clear_ctx_config(&wd_dh_setting.config); + wd_alg_clear_init(&wd_dh_setting.status); }
static int fill_dh_msg(struct wd_dh_msg *msg, struct wd_dh_req *req, diff --git a/wd_digest.c b/wd_digest.c index 43b4bc5..e4287dd 100644 --- a/wd_digest.c +++ b/wd_digest.c @@ -26,6 +26,7 @@ static int g_digest_mac_len[WD_DIGEST_TYPE_MAX] = { };
struct wd_digest_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; struct wd_digest_driver *driver; @@ -151,24 +152,29 @@ void wd_digest_free_sess(handle_t h_sess) int wd_digest_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_digest_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_DIGEST_EPOLL_EN", &wd_digest_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_digest_setting.config, config); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_digest_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config;
/* set driver */ #ifdef WD_STATIC_DRV @@ -180,33 +186,37 @@ int wd_digest_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_digest_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* init ctx related resources in specific driver */ priv = calloc(1, wd_digest_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; } wd_digest_setting.priv = priv;
ret = wd_digest_setting.driver->init(&wd_digest_setting.config, priv); if (ret < 0) { WD_ERR("failed to init digest dirver!\n"); - goto out_init; + goto out_free_priv; }
+ wd_alg_set_init(&wd_digest_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_digest_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_digest_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_digest_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_digest_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_digest_setting.status); return ret; }
@@ -225,6 +235,7 @@ void wd_digest_uninit(void)
wd_clear_sched(&wd_digest_setting.sched); wd_clear_ctx_config(&wd_digest_setting.config); + wd_alg_clear_init(&wd_digest_setting.status); }
static int digest_param_check(struct wd_digest_sess *sess, diff --git a/wd_ecc.c b/wd_ecc.c index 4cf287b..ea2f73b 100644 --- a/wd_ecc.c +++ b/wd_ecc.c @@ -64,6 +64,7 @@ struct wd_ecc_curve_list { };
static struct wd_ecc_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; void *sched_ctx; @@ -133,24 +134,29 @@ void wd_ecc_set_driver(struct wd_ecc_driver *drv) int wd_ecc_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_ecc_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_ECC_EPOLL_EN", &wd_ecc_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_ecc_setting.config, config); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_ecc_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config;
#ifdef WD_STATIC_DRV wd_ecc_set_static_drv(); @@ -161,13 +167,13 @@ int wd_ecc_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_ecc_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* initialize ctx related resources in specific driver */ priv = calloc(1, wd_ecc_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; }
wd_ecc_setting.priv = priv; @@ -175,20 +181,24 @@ int wd_ecc_init(struct wd_ctx_config *config, struct wd_sched *sched) wd_ecc_setting.driver->alg_name); if (ret < 0) { WD_ERR("failed to init ecc driver, ret = %d!\n", ret); - goto out_init; + goto out_free_priv; }
+ wd_alg_set_init(&wd_ecc_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_ecc_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_ecc_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_ecc_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_ecc_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_ecc_setting.status); return ret; }
@@ -210,6 +220,7 @@ void wd_ecc_uninit(void) /* unset config, sched, driver */ wd_clear_sched(&wd_ecc_setting.sched); wd_clear_ctx_config(&wd_ecc_setting.config); + wd_alg_clear_init(&wd_ecc_setting.status); }
static int trans_to_binpad(char *dst, const char *src, diff --git a/wd_rsa.c b/wd_rsa.c index e76da09..a6293bf 100644 --- a/wd_rsa.c +++ b/wd_rsa.c @@ -72,6 +72,7 @@ struct wd_rsa_sess { };
static struct wd_rsa_setting { + enum wd_status status; struct wd_ctx_config_internal config; struct wd_sched sched; void *sched_ctx; @@ -118,24 +119,29 @@ void wd_rsa_set_driver(struct wd_rsa_driver *drv) int wd_rsa_init(struct wd_ctx_config *config, struct wd_sched *sched) { void *priv; + bool flag; int ret;
+ flag = wd_alg_try_init(&wd_rsa_setting.status); + if (!flag) + return 0; + ret = wd_init_param_check(config, sched); if (ret) - return ret; + goto out_clear_init;
ret = wd_set_epoll_en("WD_RSA_EPOLL_EN", &wd_rsa_setting.config.epoll_en); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_ctx_config(&wd_rsa_setting.config, config); if (ret < 0) - return ret; + goto out_clear_init;
ret = wd_init_sched(&wd_rsa_setting.sched, sched); if (ret < 0) - goto out; + goto out_clear_ctx_config;
#ifdef WD_STATIC_DRV wd_rsa_set_static_drv(); @@ -146,13 +152,13 @@ int wd_rsa_init(struct wd_ctx_config *config, struct wd_sched *sched) config->ctx_num, WD_POOL_MAX_ENTRIES, sizeof(struct wd_rsa_msg)); if (ret < 0) - goto out_sched; + goto out_clear_sched;
/* initialize ctx related resources in specific driver */ priv = calloc(1, wd_rsa_setting.driver->drv_ctx_size); if (!priv) { ret = -WD_ENOMEM; - goto out_priv; + goto out_clear_pool; }
wd_rsa_setting.priv = priv; @@ -160,20 +166,24 @@ int wd_rsa_init(struct wd_ctx_config *config, struct wd_sched *sched) wd_rsa_setting.driver->alg_name); if (ret < 0) { WD_ERR("failed to init rsa driver, ret = %d!\n", ret); - goto out_init; + goto out_free_priv; }
+ wd_alg_set_init(&wd_rsa_setting.status); + return 0;
-out_init: +out_free_priv: free(priv); wd_rsa_setting.priv = NULL; -out_priv: +out_clear_pool: wd_uninit_async_request_pool(&wd_rsa_setting.pool); -out_sched: +out_clear_sched: wd_clear_sched(&wd_rsa_setting.sched); -out: +out_clear_ctx_config: wd_clear_ctx_config(&wd_rsa_setting.config); +out_clear_init: + wd_alg_clear_init(&wd_rsa_setting.status); return ret; }
@@ -195,6 +205,7 @@ void wd_rsa_uninit(void) /* unset config, sched, driver */ wd_clear_sched(&wd_rsa_setting.sched); wd_clear_ctx_config(&wd_rsa_setting.config); + wd_alg_clear_init(&wd_rsa_setting.status); }
static int fill_rsa_msg(struct wd_rsa_msg *msg, struct wd_rsa_req *req, diff --git a/wd_util.c b/wd_util.c index 04a2a5b..00dea74 100644 --- a/wd_util.c +++ b/wd_util.c @@ -1776,3 +1776,19 @@ int wd_init_param_check(struct wd_ctx_config *config, struct wd_sched *sched)
 	return 0;
 }
+
+bool wd_alg_try_init(enum wd_status *status)
+{
+	enum wd_status expected;
+	bool ret;
+
+	do {
+		expected = WD_UNINIT;
+		ret = __atomic_compare_exchange_n(status, &expected, WD_INITING, true,
+						  __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+		if (expected == WD_INIT)
+			return false;
+	} while (!ret);
+
+	return true;
+}
Now uadk supports multi-threaded concurrent and reentrant calls to the initialization interface.
Signed-off-by: Yang Shen shenyang39@huawei.com
---
 docs/wd_design.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/docs/wd_design.md b/docs/wd_design.md index ba5a5b9..3e5297e 100644 --- a/docs/wd_design.md +++ b/docs/wd_design.md @@ -81,6 +81,7 @@ | | |2) Change *user* layer to *sched* layer since | | | | sample_sched is moved from user space into UADK | | | | framework. | +| 1.4 | |1) Update *wd_alg_init* reentrancy. |
## Terminology @@ -493,7 +494,9 @@ device. Return 0 if it succeeds. And return error number if it fails.
In *wd_comp_init()*, context resources, user scheduler and vendor driver are -initialized. +initialized. This function supports multi-threaded concurrent calls and +reentrant. When one thread is initializing, other threads will wait for +completion.
***void wd_comp_uninit(void)***
Add a new pair of interfaces, 'WD_ERR_PTR()' and 'WD_PTR_ERR()', for returning an error value as a pointer.
Signed-off-by: Yang Shen shenyang39@huawei.com
---
 include/wd.h | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/wd.h b/include/wd.h
index b0580ba..a78ee38 100644
--- a/include/wd.h
+++ b/include/wd.h
@@ -91,11 +91,6 @@ typedef void (*wd_log)(const char *format, ...);
 #define WD_IS_ERR(h) ((uintptr_t)(h) > \
 		      (uintptr_t)(-1000))
 
-static inline void *WD_ERR_PTR(uintptr_t error)
-{
-	return (void *)error;
-}
-
 enum wcrypto_type {
 	WD_CIPHER,
 	WD_DIGEST,
@@ -185,6 +180,16 @@ static inline void wd_iowrite64(void *addr, uint64_t value)
 	*((volatile uint64_t *)addr) = value;
 }
 
+static inline void *WD_ERR_PTR(uintptr_t error)
+{
+	return (void *)error;
+}
+
+static inline long WD_PTR_ERR(const void *ptr)
+{
+	return (long)ptr;
+}
+
 /**
  * wd_request_ctx() - Request a communication context from a device.
  * @dev: Indicate one device.
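To show how the two helpers pair up with WD_IS_ERR(), here is a small self-contained sketch. The three definitions are copied from the diff above; alloc_buf_sketch() is a hypothetical caller invented for illustration and is not a uadk function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Copied from include/wd.h as shown in the patch. */
#define WD_IS_ERR(h) ((uintptr_t)(h) > (uintptr_t)(-1000))

static inline void *WD_ERR_PTR(uintptr_t error)
{
	return (void *)error;
}

static inline long WD_PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/*
 * Hypothetical function that returns either a valid buffer or a
 * negative errno value encoded in the pointer itself.
 */
static void *alloc_buf_sketch(size_t size)
{
	void *buf;

	if (!size)
		return WD_ERR_PTR((uintptr_t)-EINVAL);

	buf = malloc(size);
	if (!buf)
		return WD_ERR_PTR((uintptr_t)-ENOMEM);

	return buf;
}
```

The caller checks the result with WD_IS_ERR() first and only then decodes it with WD_PTR_ERR(); a valid pointer must never be passed through WD_PTR_ERR() as if it were an error code.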
On 2022/7/11 17:12, Yang Shen Wrote:
+static inline long WD_PTR_ERR(const void *ptr)
There are two consecutive spaces here. Thanks. Longfang
On 2022/8/10 11:28, liulongfang wrote:
On 2022/7/11 17:12, Yang Shen Wrote:
+static inline long WD_PTR_ERR(const void *ptr)
There are two consecutive spaces here. Thanks. Longfang
OK, I'll fix this for next version.
Thanks.
Since these two functions will be used in multiple files, move their declarations to the header file.
Signed-off-by: Yang Shen shenyang39@huawei.com
---
 include/wd.h | 15 +++++++++++++++
 wd.c         | 35 +++++++++++++++++------------------
 2 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/include/wd.h b/include/wd.h index a78ee38..4f3a32f 100644 --- a/include/wd.h +++ b/include/wd.h @@ -508,6 +508,21 @@ void wd_mempool_stats(handle_t mempool, struct wd_mempool_stats *stats); */ void wd_blockpool_stats(handle_t blkpool, struct wd_blockpool_stats *stats);
+/** + * wd_clone_dev() - clone a new uacce device. + * @dev: The source device. + * + * Return a pointer value if succeed, and NULL if fail. + */ +struct uacce_dev *wd_clone_dev(struct uacce_dev *dev); + +/** + * wd_add_to_list() - add a node to end of list. + * @head: The list head. + * @node: The node need to be add. + */ +void wd_add_to_list(struct uacce_dev_list *head, struct uacce_dev_list *node); + /** * wd_ctx_get_dev_name() - Get the device name about task. * @h_ctx: The handle of context. diff --git a/wd.c b/wd.c index b0c3dec..66a6df3 100644 --- a/wd.c +++ b/wd.c @@ -365,7 +365,13 @@ out: return strndup(name, len); }
-static struct uacce_dev *clone_uacce_dev(struct uacce_dev *dev) +static void wd_ctx_init_qfrs_offs(struct wd_ctx_h *ctx) +{ + memcpy(&ctx->qfrs_offs, &ctx->dev->qfrs_offs, + sizeof(ctx->qfrs_offs)); +} + +struct uacce_dev *wd_clone_dev(struct uacce_dev *dev) { struct uacce_dev *new;
@@ -378,10 +384,14 @@ static struct uacce_dev *clone_uacce_dev(struct uacce_dev *dev) return new; }
-static void wd_ctx_init_qfrs_offs(struct wd_ctx_h *ctx) +void wd_add_to_list(struct uacce_dev_list *head, struct uacce_dev_list *node) { - memcpy(&ctx->qfrs_offs, &ctx->dev->qfrs_offs, - sizeof(ctx->qfrs_offs)); + struct uacce_dev_list *tmp = head; + + while (tmp->next) + tmp = tmp->next; + + tmp->next = node; }
handle_t wd_request_ctx(struct uacce_dev *dev) @@ -416,7 +426,7 @@ handle_t wd_request_ctx(struct uacce_dev *dev) if (!ctx->drv_name) goto free_dev_name;
- ctx->dev = clone_uacce_dev(dev); + ctx->dev = wd_clone_dev(dev); if (!ctx->dev) goto free_drv_name;
@@ -656,17 +666,6 @@ static bool dev_has_alg(const char *dev_alg_name, const char *alg_name) return false; }
-static void add_uacce_dev_to_list(struct uacce_dev_list *head, - struct uacce_dev_list *node) -{ - struct uacce_dev_list *tmp = head; - - while (tmp->next) - tmp = tmp->next; - - tmp->next = node; -} - static int check_alg_name(const char *alg_name) { int i = 0; @@ -729,7 +728,7 @@ struct uacce_dev_list *wd_get_accel_list(const char *alg_name) if (!head) head = node; else - add_uacce_dev_to_list(head, node); + wd_add_to_list(head, node); }
closedir(wd_class); @@ -788,7 +787,7 @@ struct uacce_dev *wd_get_accel_dev(const char *alg_name) }
if (dev) - target = clone_uacce_dev(dev); + target = wd_clone_dev(dev);
wd_free_list_accels(head);
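One detail of the exported list helper is that it dereferences head unconditionally, so callers have to handle the empty list themselves, as wd_get_accel_list() does in the hunk above. A minimal sketch of that calling pattern, using a hypothetical node type in place of struct uacce_dev_list:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct uacce_dev_list. */
struct dev_node_sketch {
	int id;
	struct dev_node_sketch *next;
};

/* Same walk-to-tail logic as wd_add_to_list(); head must not be NULL. */
static void add_to_list_sketch(struct dev_node_sketch *head,
			       struct dev_node_sketch *node)
{
	struct dev_node_sketch *tmp = head;

	while (tmp->next)
		tmp = tmp->next;

	tmp->next = node;
}

/*
 * Hypothetical wrapper showing the caller-side pattern: the first node
 * becomes the head, every later node is appended at the tail.
 */
static struct dev_node_sketch *append_dev_sketch(struct dev_node_sketch *head,
						 struct dev_node_sketch *node)
{
	node->next = NULL;
	if (!head)
		return node;

	add_to_list_sketch(head, node);
	return head;
}
```

Each append walks the whole list, which is acceptable for the short device lists involved here but would call for a tail pointer if the lists were long.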
On 2022/7/11 17:12, Yang Shen wrote:
+void wd_add_to_list(struct uacce_dev_list *head, struct uacce_dev_list *node);
It is recommended to change the function name to wd_add_dev_to_list. Thanks Longfang
On 2022/8/10 11:32, liulongfang wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Since the two functions will be used in multiple files, move their
declarations to the header file.

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 include/wd.h | 15 +++++++++++++++
 wd.c         | 35 +++++++++++++++++------------------
 2 files changed, 32 insertions(+), 18 deletions(-)
+void wd_add_to_list(struct uacce_dev_list *head, struct uacce_dev_list *node);
It is recommended to change the function name to wd_add_dev_to_list.

Thanks,
Longfang
OK, I'll fix this for next version.
Thanks.
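For reference, the tail-append behaviour being exported here can be sketched in isolation. This is only an illustration under stated assumptions, not the uadk code: `struct dev_node` is a hypothetical stand-in for `struct uacce_dev_list`, `append_dev()` is a hypothetical caller, and the helper carries the name suggested in the review, `wd_add_dev_to_list`. As in `wd_get_accel_list()`, the caller handles the empty-list case itself, because the helper dereferences `head`.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct uacce_dev_list: payload plus next link. */
struct dev_node {
	int numa_id;
	struct dev_node *next;
};

/* Walk to the last node and link the new node after it; head must not be NULL. */
static void wd_add_dev_to_list(struct dev_node *head, struct dev_node *node)
{
	struct dev_node *tmp = head;

	while (tmp->next)
		tmp = tmp->next;

	tmp->next = node;
}

/* Caller-side pattern (hypothetical helper): the first node becomes the head. */
static struct dev_node *append_dev(struct dev_node *head, struct dev_node *node)
{
	node->next = NULL;
	if (!head)
		return node;

	wd_add_dev_to_list(head, node);
	return head;
}
```

Each append walks the whole list, which is acceptable for the short device lists involved here.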
For performance reasons, uadk leaves many configuration options to users.
This gives users great flexibility, but it also makes the current
initialization interface complex. Therefore, to help users adapt quickly,
a new set of interfaces is provided.

'wd_alg_init2()' completes all initialization steps. Four parameters
describe the user's configuration requirements:
@device_list: The available uacce device list. Users can get it with
wd_get_accel_list().
@numa_bitmask: The bitmask provided by libnuma. Users can use this
parameter to control which numa nodes' devices ctxs are requested from
in the NUMA-binding scenario.
@ctx_nums: The requested ctx number for each numa node. Because users may
have different requirements for different types of ctxs, a two-dimensional
array is needed as input.
@sched_type: The scheduling type the user wants to use.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 Makefile.am             |   4 +-
 include/wd.h            |  24 +++++
 include/wd_alg_common.h |  24 +++++
 include/wd_comp.h       |  28 +++++
 include/wd_util.h       |  19 ++++
 wd.c                    |  62 +++++++++++
 wd_comp.c               | 222 ++++++++++++++++++++++++++++++++++++++++
 wd_util.c               |  59 ++++++++++-
 8 files changed, 439 insertions(+), 3 deletions(-)
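
A note on how the "two-dimensional" @ctx_nums input adds up: each row is one ctx set (one per operation type) and the two columns are the sync and async counts. The sketch below mirrors the structures added in this patch with `uint32_t` in place of `__u32`; the helper names `ctxs_per_node()` and `ctxs_total()` are illustrative only and are not part of the patch. It assumes the same per-node layout is repeated on every selected numa node, which is how the patch sizes its ctx array.

```c
#include <stdint.h>

/* Local mirrors of the structures added by this patch (uint32_t for __u32). */
struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Sum of sync and async ctxs over the first 'end' ctx sets, per numa node. */
static uint32_t ctxs_per_node(const struct wd_ctx_params *cparams, uint32_t end)
{
	uint32_t count = 0;
	uint32_t i;

	for (i = 0; i < end; i++) {
		count += cparams->ctx_set_size[i].sync_ctx_num;
		count += cparams->ctx_set_size[i].async_ctx_num;
	}

	return count;
}

/* Total ctxs requested: the per-node amount repeated on every selected node. */
static uint32_t ctxs_total(const struct wd_ctx_params *cparams, uint32_t numa_count)
{
	return ctxs_per_node(cparams, cparams->ctx_set_num) * numa_count;
}
```

With two ctx sets of one sync and one async ctx each, every selected node needs 4 ctxs, so two nodes need 8.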
diff --git a/Makefile.am b/Makefile.am
index 05d6bc7..2a2c0a7 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -86,7 +86,7 @@ AM_CFLAGS += -DWD_NO_LOG

 libwd_la_LIBADD = $(libwd_la_OBJECTS) -lnuma

-libwd_comp_la_LIBADD = $(libwd_la_OBJECTS) -ldl
+libwd_comp_la_LIBADD = $(libwd_la_OBJECTS) -ldl -lnuma
 libwd_comp_la_DEPENDENCIES = libwd.la

 libhisi_zip_la_LIBADD = -ldl
@@ -103,7 +103,7 @@ else
 libwd_la_LDFLAGS=$(UADK_VERSION)
 libwd_la_LIBADD= -lnuma

-libwd_comp_la_LIBADD= -lwd -ldl
+libwd_comp_la_LIBADD= -lwd -ldl -lnuma
 libwd_comp_la_LDFLAGS=$(UADK_VERSION)
 libwd_comp_la_DEPENDENCIES= libwd.la

diff --git a/include/wd.h b/include/wd.h
index 4f3a32f..9893f43 100644
--- a/include/wd.h
+++ b/include/wd.h
@@ -348,6 +348,16 @@ int wd_get_avail_ctx(struct uacce_dev *dev);
  */
 struct uacce_dev_list *wd_get_accel_list(const char *alg_name);

+/**
+ * wd_find_dev_by_numa() - get device with max available ctx number from an
+ *			   device list according to numa id.
+ * @list: The device list.
+ * @numa_id: The numa_id.
+ *
+ * Return device if succeed and other error number if fail.
+ */
+struct uacce_dev *wd_find_dev_by_numa(struct uacce_dev_list *list, int numa_id);
+
 /**
  * wd_get_accel_dev() - Get device supporting the algorithm with
 			smallest numa distance to current numa node.
@@ -523,6 +533,20 @@ struct uacce_dev *wd_clone_dev(struct uacce_dev *dev);
  */
 void wd_add_to_list(struct uacce_dev_list *head, struct uacce_dev_list *node);

+/**
+ * wd_create_device_nodemask() - create a numa node mask of device list.
+ * @list: The devices list.
+ *
+ * Return a pointer value if succeed, and error number if fail.
+ */
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+
+/**
+ * wd_free_device_nodemask() - free a numa node mask.
+ * @bmp: A numa node mask.
+ */
+void wd_free_device_nodemask(struct bitmask *bmp);
+
 /**
  * wd_ctx_get_dev_name() - Get the device name about task.
  * @h_ctx: The handle of context.
diff --git a/include/wd_alg_common.h b/include/wd_alg_common.h
index c455dc3..f261830 100644
--- a/include/wd_alg_common.h
+++ b/include/wd_alg_common.h
@@ -63,6 +63,30 @@ struct wd_ctx_config {
 	void *priv;
 };

+/**
+ * struct wd_ctx_nums - Define the ctx sets numbers.
+ * @sync_ctx_num: The ctx numbers which are used for sync mode for each
+ * ctx sets.
+ * @async_ctx_num: The ctx numbers which are used for async mode for each
+ * ctx sets.
+ */
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+
+/**
+ * struct wd_ctx_params - Define the ctx sets params which are used for init
+ * algorithms.
+ * @ctx_set_num: Number of ctx sets to be created. Usually users can
+ * set it according to <alg>_op_type.
+ * @ctx_set_size: Each ctx sets numbers.
+ */
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+
 struct wd_ctx_internal {
 	handle_t ctx;
 	__u8 op_type;
diff --git a/include/wd_comp.h b/include/wd_comp.h
index e043a83..1d4f32c 100644
--- a/include/wd_comp.h
+++ b/include/wd_comp.h
@@ -7,6 +7,7 @@
 #ifndef __WD_COMP_H
 #define __WD_COMP_H

+#include <numa.h>
 #include "wd.h"
 #include "wd_alg_common.h"

@@ -113,6 +114,33 @@ int wd_comp_init(struct wd_ctx_config *config, struct wd_sched *sched);
  */
 void wd_comp_uninit(void);

+/**
+ * wd_comp_init2() - A simplify interface to initializate uadk
+ * compression/decompression. Users can use wd_get_accel_list() to
+ * get the usable device list with the algrithms. Users should provide
+ * a device numa node mask to show which numa devices will be
+ * selected. wd_create_device_nodemask() can create a node mask
+ * according the list. If all numa devices on the list are match
+ * the requirement, just use the return of it. Otherwise, users can
+ * use the function in libnuma to set the node mask.
+ * To make the initializate simpler, bmp and cparams support set NULL.
+ * And then the function will set them as default.
+ *
+ * @list: The device list.
+ * @bmp: Node mask of the required devices.
+ * @cparams: The ctx settings.
+ * @sched_type: The scheduler type.
+ *
+ * Return 0 if succeed and others if fail.
+ */
+int wd_comp_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		  struct wd_ctx_params *cparams, __u32 sched_type);
+
+/**
+ * wd_comp_uninit2() - Uninitialise ctx configuration and scheduler.
+ */
+void wd_comp_uninit2(void);
+
 struct wd_comp_sess_setup {
 	enum wd_comp_alg_type alg_type; /* Denoted by enum wd_comp_alg_type */
 	enum wd_comp_level comp_lv; /* Denoted by enum wd_comp_level */
diff --git a/include/wd_util.h b/include/wd_util.h
index 3737f27..4ee03ce 100644
--- a/include/wd_util.h
+++ b/include/wd_util.h
@@ -7,6 +7,7 @@
 #ifndef __WD_UTIL_H
 #define __WD_UTIL_H

+#include <numa.h>
 #include <stdbool.h>
 #include <sys/ipc.h>
 #include <sys/shm.h>
@@ -394,6 +395,24 @@ static inline void wd_alg_clear_init(enum wd_status *status)
 	__atomic_store(status, &setting, __ATOMIC_RELAXED);
 }

+/**
+ * wd_get_usable_list() - choose the devices according bitmask.
+ * @list: The device list.
+ * @bmp: The devices node mask.
+ *
+ * Return a list that meet user's requirement if succeed, and error number if fail.
+ */
+struct uacce_dev_list *wd_get_usable_list(struct uacce_dev_list *list, struct bitmask *bmp);
+
+/**
+ * wd_get_ctx_numbers() - count the ctx number for first to end.
+ * @cparams: the input ctx setting numbers.
+ * @end: the end index of cparams.
+ *
+ * Return the sum of top '@end' cparams ctx number.
+ */
+__u32 wd_get_ctx_numbers(struct wd_ctx_params cparams, int end);
+
 /**
  * wd_dfx_msg_cnt() - Message counter interface for ctx
  * @msg: Shared memory addr.
diff --git a/wd.c b/wd.c
index 66a6df3..21ddd62 100644
--- a/wd.c
+++ b/wd.c
@@ -741,6 +741,35 @@ free_list:
 	return NULL;
 }

+struct uacce_dev *wd_find_dev_by_numa(struct uacce_dev_list *list, int numa_id)
+{
+	struct uacce_dev *dev = WD_ERR_PTR(-WD_ENODEV);
+	struct uacce_dev_list *p = list;
+	int ctx_num, ctx_max = 0;
+
+	if (!list) {
+		WD_ERR("invalid: list is NULL!\n");
+		return WD_ERR_PTR(-WD_EINVAL);
+	}
+
+	while (p) {
+		if (numa_id != p->dev->numa_id) {
+			p = p->next;
+			continue;
+		}
+
+		ctx_num = wd_get_avail_ctx(p->dev);
+		if (ctx_num > ctx_max) {
+			dev = p->dev;
+			ctx_max = ctx_num;
+		}
+
+		p = p->next;
+	}
+
+	return dev;
+}
+
 void wd_free_list_accels(struct uacce_dev_list *list)
 {
 	struct uacce_dev_list *curr, *next;
@@ -807,6 +836,39 @@ int wd_ctx_set_io_cmd(handle_t h_ctx, unsigned long cmd, void *arg)
 	return ioctl(ctx->fd, cmd, arg);
 }

+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list)
+{
+	struct uacce_dev_list *p;
+	struct bitmask *bmp;
+
+	if (!list) {
+		WD_ERR("invalid: list is NULL!\n");
+		return WD_ERR_PTR(-WD_EINVAL);
+	}
+
+	bmp = numa_allocate_nodemask();
+	if (!bmp) {
+		WD_ERR("failed to alloc bitmask(%d)!\n", errno);
+		return WD_ERR_PTR(-WD_EINVAL);
+	}
+
+	p = list;
+	while (p) {
+		numa_bitmask_setbit(bmp, p->dev->numa_id);
+		p = p->next;
+	}
+
+	return bmp;
+}
+
+void wd_free_device_nodemask(struct bitmask *bmp)
+{
+	if (!bmp)
+		return;
+
+	numa_free_nodemask(bmp);
+}
+
 void wd_get_version(void)
 {
 	const char *wd_released_time = UADK_RELEASED_TIME;
diff --git a/wd_comp.c b/wd_comp.c
index 44593a6..cd3b4f3 100644
--- a/wd_comp.c
+++ b/wd_comp.c
@@ -14,6 +14,7 @@

 #include "config.h"
 #include "drv/wd_comp_drv.h"
+#include "wd_sched.h"
 #include "wd_util.h"
 #include "wd_comp.h"

@@ -21,6 +22,8 @@
 #define HW_CTX_SIZE (64 * 1024)
 #define STREAM_CHUNK (128 * 1024)

+#define SCHED_RR_NAME "sched_rr"
+
 #define swap_byte(x) \
 	((((x) & 0x000000ff) << 24) | \
 	(((x) & 0x0000ff00) << 8) | \
@@ -42,6 +45,7 @@ struct wd_comp_sess {

 struct wd_comp_setting {
 	enum wd_status status;
+	enum wd_status status2;
 	struct wd_ctx_config_internal config;
 	struct wd_sched sched;
 	struct wd_comp_driver *driver;
@@ -52,6 +56,19 @@ struct wd_comp_setting {

 struct wd_env_config wd_comp_env_config;

+static struct wd_ctx_config wd_comp_ctx;
+static struct wd_sched *wd_comp_sched;
+static int wd_comp_numa_count;
+
+static struct wd_ctx_nums wd_comp_ctx_num[] = {
+	{1, 1}, {1, 1}, {}
+};
+
+static struct wd_ctx_params wd_comp_cparams = {
+	.ctx_set_num = WD_DIR_MAX,
+	.ctx_set_size = wd_comp_ctx_num
+};
+
 #ifdef WD_STATIC_DRV
 static void wd_comp_set_static_drv(void)
 {
@@ -178,6 +195,209 @@ void wd_comp_uninit(void)
 	wd_alg_clear_init(&wd_comp_setting.status);
 }

+static int wd_comp_request_ctx(struct uacce_dev_list *list,
+			       struct wd_ctx_nums ctx_nums,
+			       int idx, int numa_id, int op_type)
+{
+	int ctx_set_size = ctx_nums.sync_ctx_num + ctx_nums.async_ctx_num;
+	struct uacce_dev *dev;
+	int i;
+
+	dev = wd_find_dev_by_numa(list, numa_id);
+	if (!dev)
+		return -WD_EBUSY;
+
+	for (i = idx; i < idx + ctx_set_size; i++) {
+		wd_comp_ctx.ctxs[i].ctx = wd_request_ctx(dev);
+		if (errno == WD_EBUSY) {
+			dev = wd_find_dev_by_numa(list, numa_id);
+			if (!dev)
+				return -WD_EBUSY;
+			i--;
+		}
+		wd_comp_ctx.ctxs[i].op_type = op_type;
+		wd_comp_ctx.ctxs[i].ctx_mode =
+			((i - idx) < ctx_nums.sync_ctx_num) ?
+			CTX_MODE_SYNC : CTX_MODE_ASYNC;
+	}
+
+	return 0;
+}
+
+static void wd_comp_release_ctx(void)
+{
+	int i;
+
+	for (i = 0; i < wd_comp_ctx.ctx_num; i++)
+		if (wd_comp_ctx.ctxs[i].ctx) {
+			wd_release_ctx(wd_comp_ctx.ctxs[i].ctx);
+			wd_comp_ctx.ctxs[i].ctx = 0;
+		}
+}
+
+static int wd_comp_instance_sched(struct wd_ctx_nums ctx_nums, int idx,
+				  int numa_id, int op_type)
+{
+	struct sched_params sparams;
+	int i, ret = 0;
+
+	for (i = 0; i < CTX_MODE_MAX; i++) {
+		sparams.numa_id = numa_id;
+		sparams.type = op_type;
+		sparams.mode = i;
+		sparams.begin = idx + ctx_nums.sync_ctx_num * i;
+		sparams.end = idx - 1 + ctx_nums.sync_ctx_num + ctx_nums.async_ctx_num * i;
+		if (sparams.begin > sparams.end)
+			continue;
+		ret = wd_sched_rr_instance(wd_comp_sched, &sparams);
+		if (ret)
+			goto out;
+	}
+
+out:
+	return ret;
+}
+
+static int __wd_comp_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+			   struct wd_ctx_params cparams)
+{
+	int ctx_set_num = cparams.ctx_set_num;
+	int max_node = numa_max_node() + 1;
+	struct wd_ctx_nums ctx_nums;
+	int i, j, ret;
+	int idx = 0;
+
+	for (i = 0; i < max_node; i++) {
+		if (!numa_bitmask_isbitset(bmp, i))
+			continue;
+		for (j = 0; j < ctx_set_num; j++) {
+			ctx_nums = cparams.ctx_set_size[j];
+			ret = wd_comp_request_ctx(list, ctx_nums, idx, i, j);
+			if (ret)
+				goto free_ctxs;
+			ret = wd_comp_instance_sched(ctx_nums, idx, i, j);
+			if (ret)
+				goto free_ctxs;
+			idx += (ctx_nums.sync_ctx_num + ctx_nums.async_ctx_num);
+		}
+	}
+
+	ret = wd_comp_init(&wd_comp_ctx, wd_comp_sched);
+	if (ret)
+		goto free_ctxs;
+
+	return 0;
+
+free_ctxs:
+	wd_comp_release_ctx();
+
+	return ret;
+}
+
+int wd_comp_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		  struct wd_ctx_params *cparams, __u32 sched_type)
+{
+	struct uacce_dev_list *used_list = NULL;
+	int ctx_set_num, ctx_set_size, ret;
+	struct bitmask *used_bmp;
+	bool flag;
+
+	flag = wd_alg_try_init(&wd_comp_setting.status2);
+	if (!flag)
+		return 0;
+
+	if (!list) {
+		WD_ERR("invalid: list is NULL!\n");
+		ret = -WD_EINVAL;
+		goto out_uninit;
+	}
+
+	if (!cparams)
+		cparams = &wd_comp_cparams;
+
+	ctx_set_num = cparams->ctx_set_num;
+	ctx_set_size = wd_get_ctx_numbers(*cparams, ctx_set_num);
+	if (!ctx_set_num || !ctx_set_size) {
+		WD_ERR("invalid: ctx_set_num is %d, ctx_set_size is %d!\n",
+		       ctx_set_num, ctx_set_size);
+		ret = -WD_EINVAL;
+		goto out_uninit;
+	}
+
+	if (!bmp) {
+		used_bmp = wd_create_device_nodemask(list);
+		if (WD_IS_ERR(bmp)) {
+			ret = WD_PTR_ERR(bmp);
+			goto out_uninit;
+		}
+	} else {
+		used_list = wd_get_usable_list(list, bmp);
+		if (WD_IS_ERR(used_list)) {
+			ret = WD_PTR_ERR(used_list);
+			WD_ERR("failed to get usable devices(%d)!\n", ret);
+			goto out_uninit;
+		}
+		used_bmp = wd_create_device_nodemask(used_list);
+	}
+
+	ret = numa_bitmask_weight(used_bmp);
+	if (!ret) {
+		WD_ERR("invalid: bmp is clear!\n");
+		goto out_freenodemask;
+	}
+	wd_comp_numa_count = ret;
+
+	wd_comp_ctx.ctx_num = ctx_set_size * wd_comp_numa_count;
+	wd_comp_ctx.ctxs = calloc(wd_comp_ctx.ctx_num, sizeof(struct wd_ctx));
+	if (!wd_comp_ctx.ctxs) {
+		ret = -WD_ENOMEM;
+		WD_ERR("failed to alloc ctxs!\n");
+		goto out_freenodemask;
+	}
+
+	wd_comp_sched = wd_sched_rr_alloc(sched_type, ctx_set_num,
+					  numa_max_node() + 1, wd_comp_poll_ctx);
+	if (!wd_comp_sched) {
+		ret = -WD_EINVAL;
+		goto out_freectxs;
+	}
+	wd_comp_sched->name = SCHED_RR_NAME;
+
+	ret = __wd_comp_init2(!used_list ? list : used_list, used_bmp, *cparams);
+	if (ret)
+		goto out_freesched;
+
+	wd_free_list_accels(used_list);
+	wd_free_device_nodemask(used_bmp);
+
+	wd_alg_set_init(&wd_comp_setting.status2);
+
+	return ret;
+
+out_freesched:
+	wd_sched_rr_release(wd_comp_sched);
+
+out_freectxs:
+	free(wd_comp_ctx.ctxs);
+
+out_freenodemask:
+	wd_free_device_nodemask(used_bmp);
+	wd_free_list_accels(used_list);
+
+out_uninit:
+	wd_alg_clear_init(&wd_comp_setting.status2);
+
+	return ret;
+}
+
+void wd_comp_uninit2(void)
+{
+	wd_comp_uninit();
+	wd_comp_release_ctx();
+	wd_sched_rr_release(wd_comp_sched);
+	wd_alg_clear_init(&wd_comp_setting.status2);
+}
+
 struct wd_comp_msg *wd_comp_get_msg(__u32 idx, __u32 tag)
 {
 	return wd_find_msg_in_pool(&wd_comp_setting.pool, idx, tag);
@@ -289,6 +509,7 @@ handle_t wd_comp_alloc_sess(struct wd_comp_sess_setup *setup)
 	sess->comp_lv = setup->comp_lv;
 	sess->win_sz = setup->win_sz;
 	sess->stream_pos = WD_COMP_STREAM_NEW;
+
 	/* Some simple scheduler don't need scheduling parameters */
 	sess->sched_key = (void *)wd_comp_setting.sched.sched_init(
 		wd_comp_setting.sched.h_sched_ctx, setup->sched_param);
@@ -318,6 +539,7 @@ void wd_comp_free_sess(handle_t h_sess)

 	if (sess->sched_key)
 		free(sess->sched_key);
+
 	free(sess);
 }

diff --git a/wd_util.c b/wd_util.c
index 00dea74..713261a 100644
--- a/wd_util.c
+++ b/wd_util.c
@@ -5,7 +5,6 @@
  */

 #define _GNU_SOURCE
-#include <numa.h>
 #include <pthread.h>
 #include <semaphore.h>
 #include <string.h>
@@ -1792,3 +1791,61 @@ bool wd_alg_try_init(enum wd_status *status)

 	return true;
 }
+
+struct uacce_dev_list *wd_get_usable_list(struct uacce_dev_list *list, struct bitmask *bmp)
+{
+	struct uacce_dev_list *p, *node, *result = NULL;
+	struct uacce_dev *dev;
+	int numa_id, ret;
+
+	p = list;
+	while (p) {
+		dev = p->dev;
+		numa_id = dev->numa_id;
+		ret = numa_bitmask_isbitset(bmp, numa_id);
+		if (!ret) {
+			p = p->next;
+			continue;
+		}
+
+		node = calloc(1, sizeof(*node));
+		if (!node) {
+			result = WD_ERR_PTR(-WD_ENOMEM);
+			goto out_free_list;
+		}
+
+		node->dev = wd_clone_dev(dev);
+		if (!node->dev) {
+			result = WD_ERR_PTR(-WD_ENOMEM);
+			goto out_free_node;
+		}
+
+		if (!result)
+			result = node;
+		else
+			wd_add_to_list(result, node);
+
+		p = p->next;
+	}
+
+	return result;
+
+out_free_node:
+	free(node);
+out_free_list:
+	wd_free_list_accels(result);
+	return result;
+}
+
+__u32 wd_get_ctx_numbers(struct wd_ctx_params cparams, int end)
+{
+	__u32 count = 0;
+	int i;
+
+	for (i = 0; i < end; i++) {
+		count += cparams.ctx_set_size[i].sync_ctx_num;
+		count += cparams.ctx_set_size[i].async_ctx_num;
+	}
+
+	return count;
+}
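
The begin/end arithmetic in wd_comp_instance_sched() is easier to check with concrete numbers. The sketch below copies only that arithmetic into a standalone helper; `struct ctx_range`, `mode_range()` and `range_is_empty()` are hypothetical names for illustration, and mode 0/1 is assumed to correspond to CTX_MODE_SYNC/CTX_MODE_ASYNC. Within one ctx set starting at `idx`, the sync ctxs come first and the async ctxs follow.

```c
/* Hypothetical mirror of the begin/end pair filled into struct sched_params. */
struct ctx_range {
	int begin;
	int end;
};

/*
 * Region for one mode inside a ctx set starting at idx.
 * mode 0 is sync, mode 1 is async (assumed to match CTX_MODE_SYNC/ASYNC).
 * The arithmetic is copied from wd_comp_instance_sched() in the patch.
 */
static struct ctx_range mode_range(int idx, int sync_num, int async_num, int mode)
{
	struct ctx_range r;

	r.begin = idx + sync_num * mode;
	r.end = idx - 1 + sync_num + async_num * mode;

	return r;
}

/* A region with begin > end is empty; the patch skips such regions. */
static int range_is_empty(struct ctx_range r)
{
	return r.begin > r.end;
}
```

For a set at idx 4 with 2 sync and 3 async ctxs, the sync region is [4, 5] and the async region is [6, 8]. When one of the two counts is zero, the corresponding region has begin > end and is not registered with the scheduler.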
Because wd_alg_init is complex to use, add the wd_alg_init2 interface for
users, and add the design document.

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 docs/wd_alg_init2.md | 176 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 docs/wd_alg_init2.md
diff --git a/docs/wd_alg_init2.md b/docs/wd_alg_init2.md
new file mode 100644
index 0000000..3fb570c
--- /dev/null
+++ b/docs/wd_alg_init2.md
@@ -0,0 +1,176 @@
+# wd_alg_init2
+
+## Preface
+
+The current uadk initialization process is:
+1. Call wd_request_ctx() to request ctxs from devices.
+2. Call wd_sched_rr_alloc() to create a sched (or another scheduler alloc function if one exists).
+3. Initialize the sched.
+4. Call wd_alg_init() with ctx_config and sched.
+
+```flow
+st=>start: Start
+o1=>operation: request ctxs
+o2=>operation: create uadk_sched and instance ctxs to sched region
+o3=>operation: call wd_alg_init
+e=>end
+st->o1->o2->o3->e
+```
+
+The logic is reasonable. But in practice, the `wd_request_ctx()` and
+`wd_sched_rr_alloc()` steps are very tedious, which makes the interface
+difficult for users. One of the main reasons for this is
+that uadk exposes many configuration options in the scheduler in order
+to provide users with better performance. Based on this consideration,
+the current uadk requires the user to arrange the division of hardware
+resources according to the device topology during initialization.
+Therefore, as a high-level interface, this scheme can provide customized
+configuration for users with deep needs.
+
+## wd_alg_init2
+
+### Design
+
+Is there any way to simplify these steps? Not currently. The uadk
+architecture manages hardware resources through a scheduler: once
+users have specified the hardware resources, they no longer see them
+directly, and all subsequent tasks are handled by the scheduler.
+The original intention of this design is to make the scenarios supported
+by uadk more flexible. Because different business scenarios have
+different resource requirements and task models, the best performance
+is obtained by letting the scheduler match resources to the tasks
+of each scenario.
+
+But we can try to provide a layer of encapsulation. The design
+intention of this layer is that users only need to specify the
+available resources and their requirements, and the interface
+completes the resource configuration internally. The complexity of
+the previous interface lies mainly in configuring the ctx and
+scheduler parameters, where users can easily misunderstand a
+parameter, make configuration errors, and introduce bugs.
+
+All algorithms have the same input parameters and initialization logic.
+
+```c
+struct wd_ctx_config {
+	__u32 ctx_num;
+	struct wd_ctx *ctxs;
+	void *priv;
+};
+
+struct wd_sched {
+	const char *name;
+	int sched_policy;
+	handle_t (*sched_init)(handle_t h_sched_ctx, void *sched_param);
+	__u32 (*pick_next_ctx)(handle_t h_sched_ctx, void *sched_key,
+			       const int sched_mode);
+	int (*poll_policy)(handle_t h_sched_ctx, __u32 expect, __u32 *count);
+	handle_t h_sched_ctx;
+};
+
+int wd_alg_init(struct wd_ctx_config *config, struct wd_sched *sched);
+```
+
+`wd_ctx_config` describes the requested ctxs; the attributes of each
+ctx are contained in its own structure. The scheduler uses these
+attributes to pick a ctx according to the request type. The main
+difficulty in this step is that users need to request ctxs from the
+appropriate device nodes according to their own business distribution.
+If the user does not consider the device distribution, tasks may
+cross chips or numa nodes, which will affect
+performance.
+
+`wd_sched` is the scheduler descriptor of the request. It creates
+the scheduling domains based on parameters passed by the user. The user
+needs to assign the requested ctxs to the scheduling domain that matches
+their attributes, so that uadk can select the appropriate ctxs for
+the issued task. The main difficulty in this step is that the user
+needs to initialize the correct scheduling domain according to the
+attributes of the ctxs requested earlier. However, ctxs have many
+attributes here, and they must be divided along multiple dimensions.
+If the parameters are not well understood, it is easy to make queue
+allocation errors, so that the wrong ctxs are scheduled when
+the task is finally issued, causing unexpected errors.
+
+Therefore, the next step is to use a limited set of easy-to-use
+input parameters to describe what users require of these two
+structures, ensuring that the new interface init2 provides the same
+functionality as init. For ease of description, v1 is used
+to refer to the existing interface, and v2 is used to refer to the
+layer of encapsulation.
+
+Let's clarify the following logic first: all uacce devices under a
+numa node can be regarded as the same. So although we request
+ctxs from devices, we manage ctxs according to numa nodes.
+That means that if users want the same performance on all cpus,
+the uadk configuration should be the same for all numa nodes.
+
+At present, at least 4 parameters are required to meet the user
+configuration requirements while keeping the v1 interface functions
+unchanged.
+
+@device_list: The available uacce device list. Users can get it with
+`wd_get_accel_list()`.
+
+@numa_bitmask: The bitmask provided by libnuma. Users can use this
+parameter to control which numa nodes' devices ctxs are requested from.
+This parameter is mainly useful when the application is bound to
+specific cpus. It can avoid resource waste or initialization failure
+caused by insufficient resources. Libnuma provides a complete set of
+bitmask operations, which can be found in numa.h.
+
+@ctx_nums: The requested ctx number for each numa node. Because users
+may have different requirements for different types of ctxs, a
+two-dimensional array is needed as input.
+
+@sched_type: The scheduling type the user wants to use.
+
+To sum up, wd_alg_init2 is as follows:
+
+```c
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+
+int wd_alg_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		 struct wd_ctx_params *cparams, __u32 sched_type);
+```
+
+Some may say that wd_alg_init2 is still complex, because three of its
+input parameters are structures. So the interface supports default values
+for some parameters. @bmp can be set to NULL, and then it will be
+initialized according to the device list. @cparams can be set to NULL,
+and then the default value in wd_alg.c is used. @list and @sched_type
+are required.
+
+What's more, uadk provides a new set of interfaces to get the numa node
+bitmask of a device list.
+
+```c
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+
+void wd_free_device_nodemask(struct bitmask *bmp);
+```
+
+## Demo
+
+The simplest user initialization process is:
+
+```c
+{
+	...
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2(list, NULL, NULL, sched_type);
+	wd_free_list_accels(list);
+	...
+}
+```
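
As a side note on the NULL default for @bmp: the node mask is then derived from the device list by setting one bit per numa node that owns a device, and the number of set bits decides how many times the per-node ctx layout is repeated. A minimal sketch of that derivation follows, assuming a plain 64-bit integer in place of libnuma's `struct bitmask`. `struct numa_dev`, `device_nodemask()` and `nodemask_weight()` are hypothetical names, not uadk or libnuma API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device list entry; only the numa id matters here. */
struct numa_dev {
	int numa_id;
	struct numa_dev *next;
};

/* Set one bit per numa node that has at least one device (nodes 0..63 only). */
static uint64_t device_nodemask(const struct numa_dev *list)
{
	uint64_t mask = 0;

	for (; list; list = list->next)
		if (list->numa_id >= 0 && list->numa_id < 64)
			mask |= UINT64_C(1) << list->numa_id;

	return mask;
}

/* Number of selected nodes, the counterpart of numa_bitmask_weight(). */
static unsigned int nodemask_weight(uint64_t mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each pass. */
	for (; mask; mask &= mask - 1)
		n++;

	return n;
}
```

Three devices on nodes 0, 2 and 2 give a mask with bits 0 and 2 set and a weight of 2, so the ctx layout would be instantiated on two nodes.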
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen shenyang39@huawei.com
docs/wd_alg_init2.md | 176 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 176 insertions(+) create mode 100644 docs/wd_alg_init2.md
diff --git a/docs/wd_alg_init2.md b/docs/wd_alg_init2.md new file mode 100644 index 0000000..3fb570c --- /dev/null +++ b/docs/wd_alg_init2.md @@ -0,0 +1,176 @@ +# wd_alg_init2
+## Preface
+The current uadk initialization process is: +1.Call wd_request_ctx() to request ctxs from devices. +2.Call wd_sched_rr_alloc() to create a sched(or some other scheduler alloc function if exits). +3.Initialize the sched. +4.Call wd_alg_init() with ctx_config and sched.
+```flow +st=>start: Start +o1=>operation: request ctxs +o2=>operation: create uadk_sched and instance ctxs to sched region +o3=>operation: call wd_alg_init +e=>end +st->o1->o2->o3->e +```
+Logic is reasonable. But in practice, the step of `wd_request_ctx()` +and `wd_sched_rr_alloc()` are very tedious. This makes it difficult +for users to use the interface. One of the main reasons for this is +that uadk has made a lot of configurations in the scheduler in order +to provide users with better performance. Based on this consideration, +the current uadk requires the user to arrange the division of hardware +resources according to the device topology during initialization. +Therefore, as a high-level interface, this scheme can provide customized +scheme configuration for users with deep needs.
+## wd_alg_init2
+### Design
+Is there any way to simplify these steps? Not currently. Because the +architecture model designed by uadk is to manage hardware resources +through a scheduler, users can no longer perceive after specifying +hardware resources, and all subsequent tasks are handled by the scheduler. +The original intention of this design is to make the scenarios supported +by uadk more flexible. Because the resource requirements of different +business scenarios are different from the task model of the business +itself, the best performance experience can be obtained through the +scheduler to match.
+But we can try to provide a layer of encapsulation. The original design +intention of this layer of encapsulation is that users only need to +specify available resources and requirements, and the configuration of +resources is completed internally by the interface. Because the previous +interface complexity mainly lies in the parameter configuration of CTX +and scheduler, it is easy for users to make configuration errors and +generate bugs because of their misunderstanding of parameters.
+All algorithms have the same input parameters and initialization logic.
+```c +struct wd_ctx_config {
- __u32 ctx_num;
- struct wd_ctx *ctxs;
- void *priv;
+};
+struct wd_sched {
- const char *name;
- int sched_policy;
- handle_t (*sched_init)(handle_t h_sched_ctx, void *sched_param);
- __u32 (*pick_next_ctx)(handle_t h_sched_ctx, void *sched_key,
const int sched_mode);
- int (*poll_policy)(handle_t h_sched_ctx, __u32 expect, __u32 *count);
- handle_t h_sched_ctx;
+};
+int wd_alg_init(struct wd_ctx_config *config, struct wd_sched *sched); +```
+`wd_ctx_config` is the requested ctxs descriptor, and the attributes +of ctxs are contained in their own structure. The attributes will be +used in scheduler for picking ctx according to request type. The main +difficulty in this step is that users need to apply for CTXs from the +appropriate device nodes according to their own business distribution. +If the user does not consider the appropriate device distribution, +it may lead to cross chip or cross numa node which will affect +performance.
+`wd_sched` is the scheduler descriptor of the request. It will create +the scheduling domain based parameters passed by the users. User needs +to allocate the ctxs applied to the scheduling domain that meets the +attribute, so that uadk can select the appropriate ctxs according to +the issued business. The main difficulty in this step is that the user +needs to initialize the correct scheduling domain according to the ctxs +attributes previously applied. However, there are many attributes of +ctxs here, which should be divided by multiple dimensions. If the +parameters are not understood enough, it is easy to make queue +allocation errors, resulting in the scheduling of the wrong ctxs when +the task is finally issued, and cause unexpected errors.
+Therefore, the next step is to describe the user's requirements for
+these two inputs with a small set of easy-to-use parameters, while
+ensuring that the new interface init2 offers the same functionality as
+init. For ease of description, v1 refers to the existing interface and
+v2 refers to the encapsulation layer.
+
+First, a premise: all uacce devices under one NUMA node can be regarded
+as equivalent. So although ctxs are requested from devices, they are
+managed per NUMA node. This means that if users want the same
+performance on every CPU, the uadk configuration should be the same for
+every NUMA node.
+
+At present, at least 4 parameters are needed to cover the user's
+configuration requirements while keeping the v1 interface unchanged.
+
+@device_list: The list of available uacce devices. Users can get it
+with `wd_get_accel_list()`.
+
+@numa_bitmask: A bitmask provided by libnuma. Users can use it to
+control which NUMA nodes' devices ctxs are requested from. It is mainly
+intended for the CPU-binding scenario, where it avoids wasting resources
+or failing initialization because of insufficient resources. libnuma
+provides a complete set of bitmask operations, declared in numa.h.
+
+@ctx_nums: The number of ctxs requested for each NUMA node. Because
+users may need different numbers of different ctx types, the input is a
+two-dimensional array.
+
+@sched_type: The scheduling type the user wants to use.
+
+To sum up, wd_alg_init2 is as follows:
+```c
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+
+int wd_alg_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		 struct wd_ctx_params *cparams, __u32 sched_type);
+```
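As an illustration of how the two structures describe a "two-dimensional" request, the sketch below mirrors them locally and sums the ctxs one NUMA node would need. It assumes that each entry of `ctx_set_size` is one set (for example one operation type) with its own sync and async counts. The patch does not spell this out, so treat it as an assumption. The helper `ctx_params_total` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirrors of the structures in the patch. __u32 is replaced by
 * uint32_t so the sketch builds without the uadk headers.
 */
struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Hypothetical helper: total ctxs one NUMA node needs for these params. */
static uint32_t ctx_params_total(const struct wd_ctx_params *p)
{
	uint32_t i, total = 0;

	assert(p != NULL);
	for (i = 0; i < p->ctx_set_num; i++)
		total += p->ctx_set_size[i].sync_ctx_num +
			 p->ctx_set_size[i].async_ctx_num;

	return total;
}
```

For two sets with `{2, 2}` and `{1, 3}`, the helper returns 8, which is the number of ctxs that would have to be available on each selected node.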
+Some may argue that wd_alg_init2 is still complex because three of its
+input parameters are structures. So the interface supports default
+values for some parameters. @bmp can be set to NULL, in which case it is
+initialized from the device list. @cparams can be set to NULL, in which
+case a default value defined in wd_alg.c is used. @list and @sched_type
+are required.
+
+In addition, uadk provides a new pair of interfaces to get the NUMA node
+bitmask of a device list.
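The default handling can be sketched as below. This is only an illustration under stated assumptions: the NUMA mask is modelled as a plain `uint64_t` instead of a libnuma `struct bitmask`, the default of one sync and one async ctx is invented (the patch only says the default lives in wd_alg.c), and `init2_resolve` and `init2_resolved` are hypothetical names. The required @sched_type has no default, so it is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Invented defaults for the sketch; the real ones live in wd_alg.c. */
static struct wd_ctx_nums default_nums[] = { { 1, 1 } };
static const struct wd_ctx_params default_params = { 1, default_nums };

struct init2_resolved {
	uint64_t node_mask;			/* nodes to request ctxs from */
	const struct wd_ctx_params *cparams;	/* ctx numbers to request */
};

/*
 * Check the required argument and fill in defaults for the optional
 * ones. list_mask stands for the node mask derived from the device list.
 */
static int init2_resolve(const void *list, uint64_t list_mask,
			 const uint64_t *bmp,
			 const struct wd_ctx_params *cparams,
			 struct init2_resolved *out)
{
	assert(out != NULL);
	if (!list)
		return -EINVAL;	/* @list is required */

	out->node_mask = bmp ? *bmp : list_mask;
	out->cparams = cparams ? cparams : &default_params;
	return 0;
}
```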
+```c
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+void wd_free_device_nodemask(struct bitmask *bmp);
+```
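A rough idea of what building such a mask involves is sketched here. It does not use the uadk or libnuma types: `struct dev_node` is a hypothetical stand-in for a device list entry, and the mask is a plain `uint64_t`. With libnuma, the same loop would presumably allocate the mask with `numa_allocate_nodemask()` and set bits with `numa_bitmask_setbit()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the device list. */
struct dev_node {
	int numa_id;		/* NUMA node of the device, or -1 if unknown */
	struct dev_node *next;
};

/* Set bit n for every NUMA node n that hosts at least one device. */
static uint64_t device_nodemask(const struct dev_node *list)
{
	uint64_t mask = 0;

	for (; list; list = list->next) {
		if (list->numa_id < 0 || list->numa_id >= 64)
			continue;	/* unknown or out-of-range node */
		mask |= UINT64_C(1) << list->numa_id;
	}

	return mask;
}
```

Two devices on node 0 and one on node 2 give the mask `0x5`. Passing that mask as @bmp would restrict ctx requests to exactly the nodes that have devices.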
+## Demo
+The simplest user initialization process is:
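+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);

This could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);

A few additional thoughts: if other implementations, such as instruction-based ones, are added later, uacce_dev should be moved inside the library so that users only see alg when requesting resources. uadk currently has a three-layer structure in user space. libwd is the lowest-level base interface, and the interfaces it provides are called by the udriver and alg layers. The alg layer's interfaces face the user. So wd_request_ctx() and similar functions are called by udriver, while wd_get_accel_list() and similar functions are called by alg.

There is also the scheduler, whose coupling is a bit awkward. We still need to see how to simplify it and move it inside.

+	wd_free_list_accel(list);
+	……
+}
+```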
On 2022/7/21 16:38, fanghao (A) wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
+## Demo
+The simplest user initialization process is:
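+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);

This could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);

The logic here comes from the fact that our wd_alg layer uses a global variable, wd_alg_setting, to maintain the resources of the whole algorithm layer. So using alg and using device_list are really two different approaches. Taking COMPRESS as an example, if zlib and gzip belong to different devices, device_list can support that scenario, while alg cannot: the user would have to uninit and then init again.

A few additional thoughts: if other implementations, such as instruction-based ones, are added later, uacce_dev should be moved inside the library so that users only see alg when requesting resources.

Having users see only alg is a reasonable design that makes the interface easier to understand and use. For now, though, it would require major changes to UADK. Starting from this question, here is a brief comparison of warpdriver, uadk (warpdriver 2.0), and crypto. Warpdriver and crypto both take this approach, but it is hard for uadk to do so.

From an implementation point of view, this is because in both warpdriver and crypto the relationship ctx(tfm) -> req is a one-to-many mapping, so the processing unit of a req (software, hardware, or instructions) is already fixed. uadk is different: it is a many-to-many mapping. A req faces a ctxs pool, and which ctx is picked is decided jointly by the req's attributes and the scheduler. Working backwards to the initialization interfaces of the three frameworks, warpdriver and crypto can initialize from an alg specified by the user, because a user who wants a different algorithm just requests a new ctx(tfm). uadk cannot. uadk creates a pool at initialization, and if an algorithm is not present at initialization, the user cannot find a ctx for it later.

From my own analysis of the original design (I may be misunderstanding it), the main purpose of the ctxs pool is to serve the uadk scheduler. I remember that shortly before the uadk design work we ran into a problem in big-data scenarios for the computing product line: many processes requested a ctx and then went to sleep, neither sending tasks nor releasing the queue. With a large number of processes on the server, the accelerator could not reach full bandwidth, and eventually could not be used at all. I am not sure whether this problem influenced the design of uadk, but for now uadk's approach fits this scenario better than warpdriver's. The price of this design is that, compared with warpdriver and crypto, it has an extra per-process initialization step, which cannot be completed with alg and needs device_list.

uadk currently has a three-layer structure in user space. libwd is the lowest-level base interface, and the interfaces it provides are called by the udriver and alg layers. The alg layer's interfaces face the user. So wd_request_ctx() and similar functions are called by udriver, while wd_get_accel_list() and similar functions are called by alg.

"wd_request_ctx() and similar functions are called by udriver": my understanding is that it is the other way round. udriver registers callbacks with libwd, and wd_request_ctx invokes those callbacks.

There is also the scheduler, whose coupling is a bit awkward. We still need to see how to simplify it and move it inside.

The coupling here is unavoidable. The design of the init2 interface still starts from init, and the current init logic is to build scheduler regions over the ctxs pool. So these interface parameters cannot be avoided.

+	wd_free_list_accel(list);
+	……
+}
+```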
On 2022/7/22 11:46, Yang Shen wrote:
On 2022/7/21 16:38, fanghao (A) wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
+## Demo
+The simplest user initialization process is:
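+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);

This could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);

The logic here comes from the fact that our wd_alg layer uses a global variable, wd_alg_setting, to maintain the resources of the whole algorithm layer. So using alg and using device_list are really two different approaches. Taking COMPRESS as an example, if zlib and gzip belong to different devices, device_list can support that scenario, while alg cannot: the user would have to uninit and then init again.

It looks like device_list cannot support two different devices either, since only one device's driver is statically bound. So fundamentally the static driver binding needs to change, and the next step is to support dynamic registration.

A few additional thoughts: if other implementations, such as instruction-based ones, are added later, uacce_dev should be moved inside the library so that users only see alg when requesting resources.

Having users see only alg is a reasonable design that makes the interface easier to understand and use. For now, though, it would require major changes to UADK. Starting from this question, here is a brief comparison of warpdriver, uadk (warpdriver 2.0), and crypto. Warpdriver and crypto both take this approach, but it is hard for uadk to do so.

From an implementation point of view, this is because in both warpdriver and crypto the relationship ctx(tfm) -> req is a one-to-many mapping, so the processing unit of a req (software, hardware, or instructions) is already fixed. uadk is different: it is a many-to-many mapping. A req faces a ctxs pool, and which ctx is picked is decided jointly by the req's attributes and the scheduler. Working backwards to the initialization interfaces of the three frameworks, warpdriver and crypto can initialize from an alg specified by the user, because a user who wants a different algorithm just requests a new ctx(tfm). uadk cannot. uadk creates a pool at initialization, and if an algorithm is not present at initialization, the user cannot find a ctx for it later.

From my own analysis of the original design (I may be misunderstanding it), the main purpose of the ctxs pool is to serve the uadk scheduler. I remember that shortly before the uadk design work we ran into a problem in big-data scenarios for the computing product line: many processes requested a ctx and then went to sleep, neither sending tasks nor releasing the queue. With a large number of processes on the server, the accelerator could not reach full bandwidth, and eventually could not be used at all. I am not sure whether this problem influenced the design of uadk, but for now uadk's approach fits this scenario better than warpdriver's. The price of this design is that, compared with warpdriver and crypto, it has an extra per-process initialization step, which cannot be completed with alg and needs device_list.

The aim was to reduce ctx consumption, but the scheduler currently requests ctxs in a balanced way, which actually wastes more ctxs. So this needs to be optimized. With balanced requests plus CPU binding, performance is not very good. And without CPU binding, dynamic scheduling that follows CPU migration also has to be supported.

If we do not make it that complicated and simply require CPU binding, then uadk could by default request ctxs from the nearest NUMA node for tasks, which feels simpler.

uadk currently has a three-layer structure in user space. libwd is the lowest-level base interface, and the interfaces it provides are called by the udriver and alg layers. The alg layer's interfaces face the user. So wd_request_ctx() and similar functions are called by udriver, while wd_get_accel_list() and similar functions are called by alg.

"wd_request_ctx() and similar functions are called by udriver": my understanding is that it is the other way round. udriver registers callbacks with libwd, and wd_request_ctx invokes those callbacks.

That reading probably comes from the fact that this interface can automatically search the devices under sys/uacce, so it is seen as a high-level interface. But it should be a very low-level interface, analogous to alloc_qp in kernel space, because libwd is a very thin wrapper over the devices that uacce exposes to user space.

There is also the scheduler, whose coupling is a bit awkward. We still need to see how to simplify it and move it inside.

The coupling here is unavoidable. The design of the init2 interface still starts from init, and the current init logic is to build scheduler regions over the ctxs pool. So these interface parameters cannot be avoided.

Actually, the original intent of doing scheduling inside uadk was to keep the app from having to be aware of ctxs and devices. Otherwise users would do the scheduling themselves and uadk would be spared a lot of work. Scheduling currently sits in the algorithm layer; could it be moved into the driver? That would be complicated, though.

With init2 the user only has to pass sched_type, so the current simplification is good enough.

As for future evolution, only resource initialization will be upgraded, so init1, init2, init3... may well coexist.

/lgtm

+	wd_free_list_accel(list);
+	……
+}
+```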
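The suggestion above, to request ctxs from the nearest NUMA node by default when the caller is bound to a CPU, could look roughly like the sketch below. It is not part of the patch. The node count `NODES`, the distance table and the function `nearest_dev_node` are hypothetical. In a real implementation the distances would presumably come from libnuma's `numa_distance()` and the caller's node from `numa_node_of_cpu()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODES 4	/* hypothetical node count for the sketch */

/*
 * Among the nodes set in dev_mask, return the one closest to cpu_node
 * according to dist, or -1 if no node with devices exists.
 */
static int nearest_dev_node(int cpu_node, uint64_t dev_mask,
			    const int (*dist)[NODES])
{
	int n, best = -1;

	assert(dist != NULL);
	if (cpu_node < 0 || cpu_node >= NODES)
		return -1;

	for (n = 0; n < NODES; n++) {
		if (!(dev_mask & (UINT64_C(1) << n)))
			continue;	/* no device on this node */
		if (best < 0 || dist[cpu_node][n] < dist[cpu_node][best])
			best = n;
	}

	return best;
}
```

The trade-off raised in the thread still applies: this only works well when the thread stays on its CPU, since a migrated thread would keep using ctxs from a node that is no longer the nearest.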
On 2022/8/10 15:29, fanghao (A) wrote:
On 2022/7/22 11:46, Yang Shen wrote:
On 2022/7/21 16:38, fanghao (A) wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
+## Demo
+The simplest user initialization process is:
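+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);

This could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);

The logic here comes from the fact that our wd_alg layer uses a global variable, wd_alg_setting, to maintain the resources of the whole algorithm layer. So using alg and using device_list are really two different approaches. Taking COMPRESS as an example, if zlib and gzip belong to different devices, device_list can support that scenario, while alg cannot: the user would have to uninit and then init again.

It looks like device_list cannot support two different devices either, since only one device's driver is statically bound. So fundamentally the static driver binding needs to change, and the next step is to support dynamic registration.

Two device_lists can certainly be supported. We can get the list of supported algorithms from the device attrs and then do ctx_alloc for the different algorithms. It is just that the current code does not support this yet. The device_list I mean here and drivers are two separate things: one driver may well support multiple devices.

Also, as I understand it, this has little to do with static versus dynamic binding. The key is to abstract the external interface. Here we expose both devices and drivers, and the two are coupled with each other, yet all kinds of situations can arise in real use. Our current model is derived entirely from the hisilicon hardware model.

One feasible option is to follow the crypto subsystem and expose only drivers, with everything device-related handled inside the drivers themselves. We could add a uadk_device that is initialized entirely by the drivers, instead of doing a pile of operations in uadk as we do now. In fact even uadk_device may not be necessary; keeping just ctx would do.

A few additional thoughts: if other implementations, such as instruction-based ones, are added later, uacce_dev should be moved inside the library so that users only see alg when requesting resources.

Having users see only alg is a reasonable design that makes the interface easier to understand and use. For now, though, it would require major changes to UADK. Starting from this question, here is a brief comparison of warpdriver, uadk (warpdriver 2.0), and crypto. Warpdriver and crypto both take this approach, but it is hard for uadk to do so.

From an implementation point of view, this is because in both warpdriver and crypto the relationship ctx(tfm) -> req is a one-to-many mapping, so the processing unit of a req (software, hardware, or instructions) is already fixed. uadk is different: it is a many-to-many mapping. A req faces a ctxs pool, and which ctx is picked is decided jointly by the req's attributes and the scheduler. Working backwards to the initialization interfaces of the three frameworks, warpdriver and crypto can initialize from an alg specified by the user, because a user who wants a different algorithm just requests a new ctx(tfm). uadk cannot. uadk creates a pool at initialization, and if an algorithm is not present at initialization, the user cannot find a ctx for it later.

From my own analysis of the original design (I may be misunderstanding it), the main purpose of the ctxs pool is to serve the uadk scheduler. I remember that shortly before the uadk design work we ran into a problem in big-data scenarios for the computing product line: many processes requested a ctx and then went to sleep, neither sending tasks nor releasing the queue. With a large number of processes on the server, the accelerator could not reach full bandwidth, and eventually could not be used at all. I am not sure whether this problem influenced the design of uadk, but for now uadk's approach fits this scenario better than warpdriver's. The price of this design is that, compared with warpdriver and crypto, it has an extra per-process initialization step, which cannot be completed with alg and needs device_list.

The aim was to reduce ctx consumption, but the scheduler currently requests ctxs in a balanced way, which actually wastes more ctxs. So this needs to be optimized. With balanced requests plus CPU binding, performance is not very good. And without CPU binding, dynamic scheduling that follows CPU migration also has to be supported.

If we do not make it that complicated and simply require CPU binding, then uadk could by default request ctxs from the nearest NUMA node for tasks, which feels simpler.

I do not quite understand what scenario "balanced requests" refers to here.

uadk currently has a three-layer structure in user space. libwd is the lowest-level base interface, and the interfaces it provides are called by the udriver and alg layers. The alg layer's interfaces face the user. So wd_request_ctx() and similar functions are called by udriver, while wd_get_accel_list() and similar functions are called by alg.

"wd_request_ctx() and similar functions are called by udriver": my understanding is that it is the other way round. udriver registers callbacks with libwd, and wd_request_ctx invokes those callbacks.

That reading probably comes from the fact that this interface can automatically search the devices under sys/uacce, so it is seen as a high-level interface. But it should be a very low-level interface, analogous to alloc_qp in kernel space, because libwd is a very thin wrapper over the devices that uacce exposes to user space.

There is also the scheduler, whose coupling is a bit awkward. We still need to see how to simplify it and move it inside.

The coupling here is unavoidable. The design of the init2 interface still starts from init, and the current init logic is to build scheduler regions over the ctxs pool. So these interface parameters cannot be avoided.

Actually, the original intent of doing scheduling inside uadk was to keep the app from having to be aware of ctxs and devices. Otherwise users would do the scheduling themselves and uadk would be spared a lot of work. Scheduling currently sits in the algorithm layer; could it be moved into the driver? That would be complicated, though.

With init2 the user only has to pass sched_type, so the current simplification is good enough.

As for future evolution, only resource initialization will be upgraded, so init1, init2, init3... may well coexist.

/lgtm

+	wd_free_list_accel(list);
+	……
+}
+```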
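The driver-centric model proposed in this reply, where users name an algorithm and drivers own everything device-related, can be sketched as a small registry. None of this exists in the patch: `alg_driver`, `driver_register` and `driver_find` are hypothetical names, and the `priority` field is borrowed as an idea from the kernel crypto subsystem rather than taken from uadk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS 8

/* Hypothetical driver descriptor: the driver owns its device handling. */
struct alg_driver {
	const char *alg_name;	/* algorithm served, e.g. "zlib" */
	int priority;		/* higher wins when several drivers match */
};

static const struct alg_driver *drivers[MAX_DRIVERS];
static size_t driver_num;

/* Register a driver at run time; returns 0 on success, -1 on failure. */
static int driver_register(const struct alg_driver *drv)
{
	if (!drv || !drv->alg_name || driver_num >= MAX_DRIVERS)
		return -1;

	drivers[driver_num++] = drv;
	return 0;
}

/* Find the highest-priority driver that serves alg_name, or NULL. */
static const struct alg_driver *driver_find(const char *alg_name)
{
	const struct alg_driver *best = NULL;
	size_t i;

	if (!alg_name)
		return NULL;

	for (i = 0; i < driver_num; i++) {
		if (strcmp(drivers[i]->alg_name, alg_name) != 0)
			continue;
		if (!best || drivers[i]->priority > best->priority)
			best = drivers[i];
	}

	return best;
}
```

With such a registry, zlib and gzip served by different devices would simply be two registered drivers, and an init call that takes only an algorithm name could look up the right one without the user handling a device list.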
On 2022/8/10 16:05, Yang Shen wrote:
On 2022/8/10 15:29, fanghao (A) wrote:
On 2022/7/22 11:46, Yang Shen wrote:
On 2022/7/21 16:38, fanghao (A) wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
docs/wd_alg_init2.md | 176 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 176 insertions(+) create mode 100644 docs/wd_alg_init2.md
diff --git a/docs/wd_alg_init2.md b/docs/wd_alg_init2.md new file mode 100644 index 0000000..3fb570c --- /dev/null +++ b/docs/wd_alg_init2.md @@ -0,0 +1,176 @@ +# wd_alg_init2
+## Preface
+The current uadk initialization process is: +1.Call wd_request_ctx() to request ctxs from devices. +2.Call wd_sched_rr_alloc() to create a sched(or some other scheduler alloc function if exits). +3.Initialize the sched. +4.Call wd_alg_init() with ctx_config and sched.
+```flow +st=>start: Start +o1=>operation: request ctxs +o2=>operation: create uadk_sched and instance ctxs to sched region +o3=>operation: call wd_alg_init +e=>end +st->o1->o2->o3->e +```
+Logic is reasonable. But in practice, the step of `wd_request_ctx()` +and `wd_sched_rr_alloc()` are very tedious. This makes it difficult +for users to use the interface. One of the main reasons for this is +that uadk has made a lot of configurations in the scheduler in order +to provide users with better performance. Based on this consideration, +the current uadk requires the user to arrange the division of hardware +resources according to the device topology during initialization. +Therefore, as a high-level interface, this scheme can provide customized +scheme configuration for users with deep needs.
+## wd_alg_init2
+### Design
+
+Is there any way to simplify these steps? Not currently. In the
+architecture model designed by uadk, hardware resources are managed
+through a scheduler: once the user has specified the hardware resources,
+they are no longer visible to the user, and all subsequent tasks are
+handled by the scheduler. The original intention of this design is to
+make the scenarios supported by uadk more flexible. Because resource
+requirements and task models differ between business scenarios, the best
+performance is obtained by letting the scheduler match them.
+
+But we can try to provide a layer of encapsulation. The intention of
+this layer is that users only need to specify the available resources
+and their requirements, and the resource configuration is completed
+internally by the interface. The complexity of the previous interface
+lies mainly in the parameter configuration of ctxs and the scheduler,
+where users can easily misunderstand the parameters, make configuration
+errors and introduce bugs.
+
+All algorithms have the same input parameters and initialization logic.
+
+```c
+struct wd_ctx_config {
+	__u32 ctx_num;
+	struct wd_ctx *ctxs;
+	void *priv;
+};
+
+struct wd_sched {
+	const char *name;
+	int sched_policy;
+	handle_t (*sched_init)(handle_t h_sched_ctx, void *sched_param);
+	__u32 (*pick_next_ctx)(handle_t h_sched_ctx, void *sched_key,
+			       const int sched_mode);
+	int (*poll_policy)(handle_t h_sched_ctx, __u32 expect, __u32 *count);
+	handle_t h_sched_ctx;
+};
+
+int wd_alg_init(struct wd_ctx_config *config, struct wd_sched *sched);
+```
+
+`wd_ctx_config` describes the requested ctxs. The attributes of each
+ctx are contained in its own structure and are used by the scheduler to
+pick a ctx according to the request type. The main difficulty in this
+step is that users need to request ctxs from the appropriate device
+nodes according to how their own workload is distributed. If the device
+distribution is not considered, requests may cross chips or numa nodes,
+which hurts performance.
+
+`wd_sched` describes the scheduler. It creates the scheduling domains
+based on the parameters passed by the user. The user needs to assign
+the requested ctxs to the scheduling domain that matches their
+attributes, so that uadk can select the appropriate ctxs for each issued
+task. The main difficulty in this step is that the user has to
+initialize the correct scheduling domains according to the attributes of
+the ctxs requested earlier. Ctxs have many attributes, which have to be
+divided along multiple dimensions. If the parameters are not well
+understood, it is easy to assign queues wrongly, so that the wrong ctxs
+are scheduled when a task is finally issued, causing unexpected errors.
+
+Therefore, the next step is to use a small set of easy-to-use input
+parameters to describe the user's requirements on these two inputs,
+while ensuring that the new interface init2 provides the same functions
+as init. For ease of description, v1 refers to the existing interface
+and v2 refers to the layer of encapsulation.
+
+Let's clarify the following logic first: all uacce devices under a
+numa node can be regarded as the same. So although we request ctxs
+from devices, we manage ctxs by numa node. That means that if users
+want the same performance on all cpus, the uadk configuration should be
+the same for all numa nodes.
+
+At present, at least 4 parameters are required to meet the user
+configuration requirements while the v1 interface functions remain
+unchanged.
+
+@device_list: The available uacce device list. Users can get it with
+`wd_get_accel_list()`.
+
+@numa_bitmask: The bitmask provided by libnuma. Users can use this
+parameter to control which numa nodes ctxs are requested from. It is
+mainly a convenience for the cpu-binding scenario, where it avoids
+wasting resources or failing initialization because of insufficient
+resources. Libnuma provides a complete set of operations on the
+bitmask, which can be found in numa.h.
+
+@ctx_nums: The requested ctx number for each numa node. Because users
+may need different numbers of different types of ctx, a two-dimensional
+array is needed as input.
+
+@sched_type: The scheduling type the user wants to use.
+
+To sum up, wd_alg_init2 is as follows:
+
+```c
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+
+int wd_alg_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		 struct wd_ctx_params *cparams, __u32 sched_type);
+```
+
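A note on how the two structures might be consumed. The following is only a sketch under stated assumptions, not the implementation in the patch: it assumes that every selected numa node receives the same `ctx_set_num` sets, and it replaces the libnuma `struct bitmask` with a plain 64-bit mask so that it compiles on its own. The `demo_` helpers are hypothetical, and `uint32_t` stands in for `__u32`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the structures proposed in the patch. */
struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Hypothetical helper: number of numa nodes selected in a 64-bit mask. */
static unsigned int demo_node_count(uint64_t node_mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; node_mask; node_mask &= node_mask - 1)
		n++;

	return n;
}

/*
 * Hypothetical helper: total ctxs requested, assuming each selected
 * node gets every set (sync + async) described by cparams.
 */
static uint64_t demo_total_ctxs(const struct wd_ctx_params *cparams,
				uint64_t node_mask)
{
	uint64_t per_node = 0;
	uint32_t i;

	if (!cparams || !cparams->ctx_set_size)
		return 0;

	for (i = 0; i < cparams->ctx_set_num; i++)
		per_node += (uint64_t)cparams->ctx_set_size[i].sync_ctx_num +
			    cparams->ctx_set_size[i].async_ctx_num;

	return per_node * demo_node_count(node_mask);
}
```

Under this reading, the same per-node numbers apply to every node in the mask, which matches the statement that the configuration should be identical on all numa nodes when uniform performance is wanted.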
+Some may say that wd_alg_init2 is still complex because three of its
+input parameters are structures. So the interface supports default
+values for some parameters. @bmp can be set to NULL, in which case it
+is initialized according to the device list. @cparams can be set to
+NULL, in which case the default value in wd_alg.c is used. @list and
+@sched_type are required.
+
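The NULL handling described here could look roughly like the sketch below. Everything prefixed with `demo_` is hypothetical, the default of one sync and one async ctx is made up for illustration (the patch only says a default exists in wd_alg.c), and a 64-bit mask again stands in for the libnuma bitmask.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Hypothetical default values, chosen only for this sketch. */
static struct wd_ctx_nums demo_default_nums[] = { { 1, 1 } };
static struct wd_ctx_params demo_default_params = { 1, demo_default_nums };

/* The effective settings after defaults have been filled in. */
struct demo_resolved {
	const struct wd_ctx_params *cparams;
	uint64_t node_mask;
};

/*
 * device_nodes: nodes that own at least one device in the list.
 * bmp:          user node mask, or NULL to follow the device list.
 * cparams:      user ctx numbers, or NULL to use the default.
 */
static int demo_resolve_params(uint64_t device_nodes, const uint64_t *bmp,
			       const struct wd_ctx_params *cparams,
			       struct demo_resolved *out)
{
	/* The device list is required, so an empty one is an error. */
	if (!out || !device_nodes)
		return -EINVAL;

	out->node_mask = bmp ? *bmp : device_nodes;
	if (!out->node_mask)
		return -EINVAL;

	out->cparams = cparams ? cparams : &demo_default_params;

	return 0;
}
```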
+What's more, uadk provides a new pair of interfaces to get the numa
+node bitmask of a device list.
+
+```c
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+
+void wd_free_device_nodemask(struct bitmask *bmp);
+```
+
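For readers unfamiliar with the idea, deriving a node mask from a device list amounts to walking the list and setting one bit per numa node. The sketch below uses hypothetical stand-in types (`demo_dev`, `demo_dev_list`) rather than the real `struct uacce_dev_list`, and a 64-bit integer rather than a libnuma bitmask, so it is an illustration of the technique only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a device and a singly linked device list. */
struct demo_dev {
	int numa_id;
};

struct demo_dev_list {
	struct demo_dev *dev;
	struct demo_dev_list *next;
};

/* Set bit N for every device that reports numa node N (0..63). */
static uint64_t demo_device_nodemask(const struct demo_dev_list *list)
{
	uint64_t mask = 0;

	for (; list; list = list->next) {
		if (!list->dev)
			continue;
		/* Skip devices without a usable node id. */
		if (list->dev->numa_id < 0 || list->dev->numa_id >= 64)
			continue;
		mask |= UINT64_C(1) << list->dev->numa_id;
	}

	return mask;
}
```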
+## Demo
+The simplest user initialization process is:
+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);
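This could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);
The logic here rests on the fact that the wd_alg layer uses one global variable, wd_alg_setting, to maintain all resources of the algorithm layer. So using alg and using device_list are really two different approaches. Take COMPRESS as an example: if zlib and gzip belong to different devices, device_list can support that scenario, while alg cannot, and the user would have to uninit and then init again.
It looks like device_list cannot support two different devices either, since only one device's driver is statically bound. So fundamentally the static driver binding has to change, and the next step is to support dynamic registration.
Two device_lists can be supported perfectly well. We can obtain the supported list from the device's attrs and then handle ctx_alloc for the different algorithms. Of course, the current code just does not support this yet. The device_list I am talking about and drivers are two separate things; one driver may well support multiple devices.
If you mean two identical devices, then it is certainly the same driver. But identical devices should not end up with one supporting zlib and the other gzip. These are all low-level implementation details. What matters is the interface abstraction.
I found an earlier write-up that works through this, see https://zhuanlan.zhihu.com/p/157973336
Also, as I understand it, this has little to do with static versus dynamic. The key is still to abstract the external interface. Here we see both devices and drivers, and the two are coupled, yet all kinds of situations can arise in actual use. Our current model starts entirely from the hisilicon hardware model.
One feasible option is to follow the crypto subsystem and expose only drivers. Everything about devices goes into the drivers to handle themselves. We could add a uadk_device that is initialized entirely by the drivers, rather than having uadk do a pile of operations as it does now. In fact even this uadk_device is not really necessary, just keeping the ctx would do.
One more thought: if we later extend to other implementations such as instructions, uacce_dev should be pulled inside. Users should only see alg when making requests.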
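Having users see only alg is a reasonable design that makes things easier to understand and use. But as things stand it would require fairly large changes to UADK. Starting from this question, here is a brief comparison of warpdriver, uadk (warpdriver 2.0) and crypto.
Warpdriver and crypto both take this approach, but it is hard for uadk to do so. From an implementation point of view, this is because in warpdriver and crypto the logic is a one-to-many mapping ctx(tfm) -> req, so the processing unit of a req (software, hardware or instruction) is already determined. uadk is different: it is a many-to-many mapping, where a req faces a ctxs pool, and which ctx is picked is decided jointly by the req's attributes and the scheduler.
Working backwards to the initialization interfaces of the three frameworks: warpdriver and crypto can initialize from an alg specified by the user, because a user who wants to change algorithm just requests a new ctx(tfm). uadk cannot. uadk creates a pool at initialization, and if an algorithm is not there at initialization, the user cannot find a ctx for that algorithm later.
Then, from my own reading of the original design (I may be misunderstanding it), the main purpose of the ctxs pool is to serve the uadk scheduler. I remember that shortly before the uadk design work we hit a problem: in big-data scenarios in the computing product line, many processes would request a ctx and then go to sleep, neither sending tasks nor releasing the queue. Since the number of processes on a server can be very high, the accelerator device could not reach full bandwidth and eventually could not be used at all. I am not sure whether this problem influenced the uadk design, but at present the uadk scheme fits this scenario better than warpdriver. The price of this design is that, compared with warpdriver and crypto, it has an additional one-time per-process initialization stage, which cannot be done through alg and needs device_list.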
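The aim is to reduce ctx consumption, but the current scheduler requests ctxs in a balanced way, which actually wastes more ctxs. So this needs to be optimized. With balanced requests plus core binding, performance is not very good. And without core binding, dynamic scheduling that follows cpu migration also has to be supported.
If we don't make it that complicated and simply require core binding, then uadk could by default request ctxs from the nearest numa node internally to run tasks, which feels simpler.
I don't quite understand what scenario "balanced requests" refers to here.
Every numa node requests sync and async, compression and decompression.
The uadk user space currently has a three-layer structure. libwd is the lowest, basic layer, and its interfaces are for the udriver and alg layers to call. The alg layer's interfaces face the user. So wd_request_ctx() and the like are called by udriver, and wd_get_accel_list() and the like are called by alg.
"wd_request_ctx() and the like are called by udriver": my understanding is that it is the other way round. udriver registers callbacks with libwd, and wd_request_ctx invokes the callbacks.
Understood this way, I guess it is because this interface can automatically search the devices under sys/uacce that it is regarded as a high-level interface. But it should be a very low-level interface, comparable to alloc_qp in the kernel, because libwd is a very thin wrapper around the devices that uacce exposes to user space.
There is also the scheduler, whose coupling is a bit awkward. We still need to see how to simplify it and pull it inside.
The coupling here cannot be avoided. The starting point of the init2 interface design is still init, and the current init logic is to build scheduler regions over the ctxs pool. So these interface parameters are unavoidable.
Actually, the original reason for scheduling inside uadk was to avoid exposing ctx and dev to the app. Otherwise users would do the scheduling themselves and uadk would be spared a lot of work. Could scheduling be moved from the algorithm layer into the driver? It would be complicated, though.
With the init2 simplification only sched_type needs to be passed, so the current simplification is already acceptable.
Looking at future evolution, only resource initialization would be upgraded, and normally init1, init2, init3... may coexist.
/lgtm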
+	wd_free_list_accel(list);
+	……
+}
+```
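The inline discussion describes a request facing a pool of ctxs, with the ctx chosen jointly by the request's attributes and the scheduler, and init building scheduler regions over that pool. A minimal round-robin sketch of that idea follows. It is an illustration only, not the uadk scheduler: the region key (numa node, sync/async mode), the fixed array sizes and all `demo_` names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_mode { DEMO_SYNC = 0, DEMO_ASYNC = 1, DEMO_MODE_NUM };

#define DEMO_NODE_NUM		2
#define DEMO_INVALID_POS	UINT32_MAX

/* One scheduling region: ctx pool indexes begin..end, both inclusive. */
struct demo_region {
	uint32_t begin;
	uint32_t end;
	uint32_t last;	/* index handed out by the previous pick */
	int valid;
};

struct demo_sched {
	struct demo_region region[DEMO_NODE_NUM][DEMO_MODE_NUM];
};

/* Bind pool indexes begin..end to the region keyed by (node, mode). */
static int demo_region_set(struct demo_sched *s, int node,
			   enum demo_mode mode, uint32_t begin, uint32_t end)
{
	struct demo_region *r;

	if (!s || node < 0 || node >= DEMO_NODE_NUM ||
	    (int)mode < 0 || mode >= DEMO_MODE_NUM || begin > end)
		return -1;

	r = &s->region[node][mode];
	r->begin = begin;
	r->end = end;
	r->last = end;	/* so that the first pick returns begin */
	r->valid = 1;

	return 0;
}

/* Pick the next ctx index for a request with the given attributes. */
static uint32_t demo_pick_next_ctx(struct demo_sched *s, int node,
				   enum demo_mode mode)
{
	struct demo_region *r;

	if (!s || node < 0 || node >= DEMO_NODE_NUM ||
	    (int)mode < 0 || mode >= DEMO_MODE_NUM)
		return DEMO_INVALID_POS;

	r = &s->region[node][mode];
	if (!r->valid)
		return DEMO_INVALID_POS;

	/* Round robin inside the region, wrapping at its end. */
	r->last = (r->last >= r->end) ? r->begin : r->last + 1;

	return r->last;
}
```

A request whose attributes map to a region that was never set up gets no ctx, which is the failure mode the patch text warns about when regions are configured incorrectly.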

On 2022/8/11 11:02, fanghao (A) wrote:
On 2022/8/10 16:05, Yang Shen wrote:
On 2022/8/10 15:29, fanghao (A) wrote:
On 2022/7/22 11:46, Yang Shen wrote:
On 2022/7/21 16:38, fanghao (A) wrote:
On 2022/7/11 17:12, Yang Shen wrote:
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for users. And add the design documents.
Signed-off-by: Yang Shen shenyang39@huawei.com

> If you mean two identical devices, then it is certainly the same driver. But identical devices should not end up with one supporting zlib and the other gzip. These are all low-level implementation details. What matters is the interface abstraction.
> I found an earlier write-up that works through this, see https://zhuanlan.zhihu.com/p/157973336

This was only an example. The point is that passing the algorithm directly implies a constraint: one device must support all comp algorithms.

> Every numa node requests sync and async, compression and decompression.

The configuration here is decided by the user, who may or may not set it. My interface only offers the user a way to do so. How many ctxs are used is up to the user, so there is no waste here.

On 2022/8/11 14:45, Yang Shen wrote:
> This was only an example. The point is that passing the algorithm directly implies a constraint: one device must support all comp algorithms.

In fact we could build a support list that maps algorithms to drivers. In the user-space driver, create a simple driver for each sub-algorithm. Its implementation interfaces can be reused (the current driver can be used directly). Each algorithm corresponds to one driver, and each driver carries its own name, which must match the name of the device created in kernel space (only the suffix may differ).
For each service request, first look up the user-space driver by algorithm name. Once the driver is found, look up the device by the driver name, and then request queues on that device.
That way the constraint goes away.
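
The two-step lookup proposed in the last reply (algorithm name to user-space driver, then driver name to device) can be sketched as follows. This is only an illustration of the suggestion, not existing uadk code: the table layout, the `demo_` names and the rule that a device name is the driver name followed by a `-` suffix are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical support list entry: one driver per sub-algorithm. */
struct demo_driver {
	const char *alg_name;
	const char *drv_name;
};

/* Hypothetical device entry, named "<drv_name>" or "<drv_name>-<suffix>". */
struct demo_device {
	const char *dev_name;
};

/* Step 1: find the user-space driver registered for an algorithm. */
static const struct demo_driver *
demo_find_driver(const struct demo_driver *drvs, size_t n, const char *alg)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(drvs[i].alg_name, alg))
			return &drvs[i];

	return NULL;
}

/* Step 2: find a device whose name matches the driver name. */
static const struct demo_device *
demo_find_device(const struct demo_device *devs, size_t n,
		 const struct demo_driver *drv)
{
	size_t len, i;

	if (!drv)
		return NULL;

	len = strlen(drv->drv_name);
	for (i = 0; i < n; i++) {
		const char *name = devs[i].dev_name;

		/* Same name, with only an optional "-suffix" allowed after it. */
		if (!strncmp(name, drv->drv_name, len) &&
		    (name[len] == '\0' || name[len] == '-'))
			return &devs[i];
	}

	return NULL;
}
```

Queues would then be requested on the device returned by the second step, so no single device has to support every algorithm.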