On 2022/7/11 17:12, Yang Shen wrote:
Because wd_alg_init is complex to use, add the wd_alg_init2 interface for users, and add its design document.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
 docs/wd_alg_init2.md | 176 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 docs/wd_alg_init2.md

diff --git a/docs/wd_alg_init2.md b/docs/wd_alg_init2.md
new file mode 100644
index 0000000..3fb570c
--- /dev/null
+++ b/docs/wd_alg_init2.md
@@ -0,0 +1,176 @@
+# wd_alg_init2
+## Preface
+The current uadk initialization process is:
+1. Call wd_request_ctx() to request ctxs from devices.
+2. Call wd_sched_rr_alloc() to create a sched (or another scheduler alloc function, if one exists).
+3. Initialize the sched.
+4. Call wd_alg_init() with ctx_config and sched.
+```flow
+st=>start: Start
+o1=>operation: request ctxs
+o2=>operation: create uadk_sched and instance ctxs to sched region
+o3=>operation: call wd_alg_init
+e=>end
+st->o1->o2->o3->e
+```
+The logic is reasonable. In practice, however, the `wd_request_ctx()` and `wd_sched_rr_alloc()` steps are very tedious, which makes the interface difficult to use. One of the main reasons is that uadk exposes many scheduler configuration options in order to give users better performance. As a result, the current uadk requires the user to divide hardware resources according to the device topology during initialization. As a high-level interface, this scheme therefore suits users with advanced needs who want a customized configuration.
+## wd_alg_init2
+### Design
+Is there any way to simplify these steps? Not currently. In the architecture model designed by uadk, hardware resources are managed through a scheduler: once users have specified the hardware resources, they no longer see them, and all subsequent tasks are handled by the scheduler. The original intention of this design is to make the scenarios supported by uadk more flexible. Because resource requirements and task models differ between business scenarios, matching them through the scheduler gives the best performance.
+But we can try to provide a layer of encapsulation. The intention of this layer is that users only need to specify the available resources and their requirements, and the interface completes the resource configuration internally. The complexity of the previous interface lies mainly in the parameter configuration of the ctxs and the scheduler, where users can easily misunderstand parameters, make configuration errors, and introduce bugs.
+All algorithms have the same input parameters and initialization logic.
+```c
+struct wd_ctx_config {
+	__u32 ctx_num;
+	struct wd_ctx *ctxs;
+	void *priv;
+};
+struct wd_sched {
+	const char *name;
+	int sched_policy;
+	handle_t (*sched_init)(handle_t h_sched_ctx, void *sched_param);
+	__u32 (*pick_next_ctx)(handle_t h_sched_ctx, void *sched_key,
+			       const int sched_mode);
+	int (*poll_policy)(handle_t h_sched_ctx, __u32 expect, __u32 *count);
+	handle_t h_sched_ctx;
+};
+int wd_alg_init(struct wd_ctx_config *config, struct wd_sched *sched);
+```
+`wd_ctx_config` is the descriptor of the requested ctxs; the attributes of the ctxs are contained in their own structure. The scheduler uses these attributes to pick a ctx according to the request type. The main difficulty in this step is that users need to request ctxs from the appropriate device nodes according to their own business distribution. If the user does not consider the device distribution, requests may cross chips or NUMA nodes, which affects performance.
+`wd_sched` is the scheduler descriptor. It creates the scheduling domain based on the parameters passed by the user. The user needs to assign the requested ctxs to the scheduling domain that matches their attributes, so that uadk can select the appropriate ctxs for each issued task. The main difficulty in this step is that the user needs to initialize the correct scheduling domain according to the attributes of the previously requested ctxs. The ctxs have many attributes, which must be divided along multiple dimensions. If the parameters are not well understood, it is easy to make queue allocation errors, so that the wrong ctxs are scheduled when a task is finally issued, causing unexpected errors.
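To make the "scheduling domain" idea concrete, here is a minimal sketch of a domain keyed by (NUMA node, mode, type) that hands out ctx indices round-robin. It is only an illustration under assumptions: `sched_domain`, `sched_region`, `domain_add_region()`, `domain_pick()` and the `MAX_*` limits are hypothetical names, not the uadk scheduler implementation.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not uadk definitions. */
enum ctx_mode { CTX_SYNC = 0, CTX_ASYNC = 1, CTX_MODE_MAX };

#define MAX_NUMA 4	/* assumed upper bound on NUMA nodes */
#define MAX_TYPE 2	/* assumed upper bound on operation types */

/* One region: a contiguous range [begin, end] of indices into the ctx array. */
struct sched_region {
	uint32_t begin;
	uint32_t end;
	uint32_t last;	/* index handed out most recently */
	int valid;
};

/* The domain holds one region per (numa, mode, type) key. */
struct sched_domain {
	struct sched_region region[MAX_NUMA][CTX_MODE_MAX][MAX_TYPE];
};

static int key_ok(int numa, int mode, int type)
{
	return numa >= 0 && numa < MAX_NUMA &&
	       mode >= 0 && mode < CTX_MODE_MAX &&
	       type >= 0 && type < MAX_TYPE;
}

/* Assign ctxs [begin, end] to one key. Returns 0 on success, -1 on bad input. */
static int domain_add_region(struct sched_domain *d, int numa, int mode,
			     int type, uint32_t begin, uint32_t end)
{
	struct sched_region *r;

	if (!key_ok(numa, mode, type) || begin > end)
		return -1;

	r = &d->region[numa][mode][type];
	r->begin = begin;
	r->end = end;
	r->last = end;	/* so that the first pick returns begin */
	r->valid = 1;
	return 0;
}

/* Round-robin pick; returns UINT32_MAX if the key has no region assigned. */
static uint32_t domain_pick(struct sched_domain *d, int numa, int mode, int type)
{
	struct sched_region *r;

	if (!key_ok(numa, mode, type))
		return UINT32_MAX;

	r = &d->region[numa][mode][type];
	if (!r->valid)
		return UINT32_MAX;

	r->last = (r->last >= r->end) ? r->begin : r->last + 1;
	return r->last;
}
```

A request for a key that was never given a region fails here with `UINT32_MAX`; this is the kind of mismatch between requested ctxs and the configured domain that the paragraph above warns about.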
+Therefore, the next thing to do is to use a limited set of easy-to-use input parameters to describe the users' requirements on these two input parameters, ensuring that the new init2 interface has the same functions as init. For ease of description, v1 refers to the existing interface and v2 refers to the layer of encapsulation.
+Let's clarify the following logic first: all uacce devices under a NUMA node can be regarded as the same. So although we request ctxs from devices, we manage ctxs according to NUMA nodes. That means that if users want the same performance on all CPUs, the uadk configuration should be the same for all NUMA nodes.
+At present, at least 4 parameters are required to meet the user configuration requirements while keeping the functionality of the v1 interface unchanged.
+@device_list: The list of available uacce devices. Users can get it with `wd_get_accel_list()`.
+@numa_bitmask: The bitmask provided by libnuma. Users can use this parameter to control which NUMA nodes ctxs are requested from. It is mainly a convenience for the CPU-binding scenario, where it avoids wasting resources or failing initialization because of insufficient resources. libnuma provides a complete set of operations on bitmasks, which can be found in numa.h.
+@ctx_nums: The number of ctxs requested for each NUMA node. Because users may need different numbers of ctxs for different ctx types, a two-dimensional array is needed as input.
+@sched_type: The scheduling type the user wants to use.
+To sum up, wd_alg_init2 is as follows:
+```c
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+int wd_alg_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+		 struct wd_ctx_params *cparams, __u32 sched_type);
+```
+One may object that wd_alg_init2 is still complex, because three of its input parameters are structures. The interface therefore supports default values for some parameters. @bmp can be set to NULL, in which case it is initialized according to the device list. @cparams can be set to NULL, in which case a default value defined in wd_alg.c is used. @list and @sched_type are required.
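As a rough sketch of the NULL-default behaviour and of how the per-node ctx counts add up, the following mirrors the two structures from the patch with `uint32_t` in place of `__u32`. The default values, the helpers `resolve_params()`, `ctxs_per_node()` and `ctxs_total()`, and the plain 64-bit node mask standing in for libnuma's `struct bitmask` are all assumptions made for illustration; the patch description does not give the real defaults in wd_alg.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors of the structures in the patch, with uint32_t standing in for __u32. */
struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;
	struct wd_ctx_nums *ctx_set_size;
};

/* Hypothetical defaults: one ctx set with 2 sync and 2 async ctxs. */
static struct wd_ctx_nums default_set[] = { { 2, 2 } };
static struct wd_ctx_params default_params = { 1, default_set };

/* A NULL cparams falls back to the defaults, as described for init2. */
static const struct wd_ctx_params *resolve_params(const struct wd_ctx_params *cparams)
{
	return cparams ? cparams : &default_params;
}

/* Ctxs requested on one NUMA node: sync + async, summed over all ctx sets. */
static uint32_t ctxs_per_node(const struct wd_ctx_params *p)
{
	uint32_t total = 0;
	uint32_t i;

	for (i = 0; i < p->ctx_set_num; i++)
		total += p->ctx_set_size[i].sync_ctx_num +
			 p->ctx_set_size[i].async_ctx_num;
	return total;
}

/* Total over every node set in node_mask, assuming the same config per node. */
static uint32_t ctxs_total(const struct wd_ctx_params *cparams, uint64_t node_mask)
{
	const struct wd_ctx_params *p = resolve_params(cparams);
	uint32_t nodes = 0;

	/* Count set bits by clearing the lowest set bit each iteration. */
	for (; node_mask; node_mask &= node_mask - 1)
		nodes++;

	return nodes * ctxs_per_node(p);
}
```

For example, with two ctx sets of {1 sync, 3 async} and {2 sync, 0 async} and nodes 0 and 2 selected, this sketch asks for 6 ctxs per node and 12 in total.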
+What's more, uadk provides a new set of interfaces to get the NUMA node bitmask of a device list.
+```c
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+void wd_free_device_nodemask(struct bitmask *bmp);
+```
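The idea behind deriving a node mask from a device list can be sketched as below. The structures are simplified stand-ins that keep only the fields used here, a 64-bit integer replaces libnuma's `struct bitmask` so the sketch has no library dependency, and `device_nodemask()` is a hypothetical helper rather than the function proposed in the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-ins: only the fields this sketch needs. */
struct uacce_dev {
	int numa_id;
};

struct uacce_dev_list {
	struct uacce_dev *dev;
	struct uacce_dev_list *next;
};

/*
 * Walk the list and set one bit per NUMA node that owns at least one device.
 * Devices with an unknown (negative) or out-of-range node are skipped here;
 * how the real interface treats them is not stated in the patch.
 */
static uint64_t device_nodemask(const struct uacce_dev_list *list)
{
	uint64_t mask = 0;

	for (; list; list = list->next) {
		int node;

		if (!list->dev)
			continue;

		node = list->dev->numa_id;
		if (node < 0 || node >= 64)
			continue;

		mask |= (uint64_t)1 << node;
	}
	return mask;
}
```

Several devices on the same node collapse into a single bit, which matches the earlier statement that all uacce devices under one NUMA node are treated as the same.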
+## Demo
+The simplest user initialization process is:
+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);
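
These two calls could be merged and simplified further: wd_<alg>_init2_(alg, mode_type, node_mask);

A few additional thoughts: if we later extend to other implementations, such as instruction-based ones, uacce_dev should be moved inside the library, so that users only see the algorithm when making a request. UADK user space currently has three layers. libwd is the lowest-level base interface, and the interfaces it provides are meant to be called by the udriver and alg layers. The interfaces of the alg layer face the user. So wd_request_ctx() and similar functions should be called by udriver, and wd_get_accel_list() and similar functions by the alg layer.

There is also the scheduler, whose coupling is a bit awkward. We still need to look at how to simplify it and move it inside the library.
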
+	wd_free_list_accel(list);
+	……
+}
+```