[Acc] Re: [PATCH v2 6/6] uadk/docs - support a simple interface for initialization

21 Jul 2022

On 2022/7/11 17:12, Yang Shen wrote:
...
Due to the complexity of wd_alg_init, add wd_alg_init2 interface for
users. And add the design documents.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
  docs/wd_alg_init2.md | 176 +++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 176 insertions(+)
  create mode 100644 docs/wd_alg_init2.md

diff --git a/docs/wd_alg_init2.md b/docs/wd_alg_init2.md
new file mode 100644
index 0000000..3fb570c
--- /dev/null
+++ b/docs/wd_alg_init2.md
@@ -0,0 +1,176 @@
+# wd_alg_init2
+
+## Preface
+
+The current uadk initialization process is:
+1. Call wd_request_ctx() to request ctxs from devices.
+2. Call wd_sched_rr_alloc() to create a sched (or another scheduler alloc function if one exists).
+3. Initialize the sched.
+4. Call wd_alg_init() with ctx_config and sched.
+
+```flow
+st=>start: Start
+o1=>operation: request ctxs
+o2=>operation: create uadk_sched and instance ctxs to sched region
+o3=>operation: call wd_alg_init
+e=>end
+st->o1->o2->o3->e
+```
+
+The logic is reasonable. But in practice, the `wd_request_ctx()` and
+`wd_sched_rr_alloc()` steps are very tedious, which makes the interface
+difficult to use. One of the main reasons is that uadk exposes a lot of
+configuration in the scheduler in order to give users better performance.
+Because of this, the current uadk requires the user to divide hardware
+resources according to the device topology during initialization.
+The existing scheme therefore remains useful as an advanced interface
+that offers customized configuration to users with demanding needs.
+
+## wd_alg_init2
+
+### Design
+
+Is there any way to simplify these steps? Not currently. The architecture
+model of uadk manages hardware resources through a scheduler: once users
+have specified the hardware resources, they no longer see them, and all
+subsequent tasks are handled by the scheduler. The original intention of
+this design is to make the scenarios supported by uadk more flexible.
+Different business scenarios have different resource requirements and
+task models, so the best performance is obtained by choosing a scheduler
+that matches them.
+
+But we can try to provide a layer of encapsulation. The original design
+intention of this layer of encapsulation is that users only need to
+specify available resources and requirements, and the configuration of
+resources is completed internally by the interface. Because the previous
+interface complexity mainly lies in the parameter configuration of CTX
+and scheduler, it is easy for users to make configuration errors and
+generate bugs because of their misunderstanding of parameters.
+
+All algorithms have the same input parameters and initialization logic.
+
+```c
+struct wd_ctx_config {
+	__u32 ctx_num;
+	struct wd_ctx *ctxs;
+	void *priv;
+};
+
+struct wd_sched {
+	const char *name;
+	int sched_policy;
+	handle_t (*sched_init)(handle_t h_sched_ctx, void *sched_param);
+	__u32 (*pick_next_ctx)(handle_t h_sched_ctx, void *sched_key,
+			       const int sched_mode);
+	int (*poll_policy)(handle_t h_sched_ctx, __u32 expect, __u32 *count);
+	handle_t h_sched_ctx;
+};
+
+int wd_alg_init(struct wd_ctx_config *config, struct wd_sched *sched);
+```
+
+`wd_ctx_config` is the requested ctxs descriptor, and the attributes
+of ctxs are contained in their own structure. The attributes will be
+used in scheduler for picking ctx according to request type. The main
+difficulty in this step is that users need to apply for CTXs from the
+appropriate device nodes according to their own business distribution.
+If the user does not consider the device distribution, ctxs may end up
+on another chip or another numa node, which will hurt performance.
+
+`wd_sched` is the scheduler descriptor of the request. It creates the
+scheduling domain based on parameters passed by the user. The user needs
+to assign the requested ctxs to the scheduling domain that matches their
+attributes, so that uadk can select the appropriate ctxs for each issued
+task. The main difficulty in this step is that the user needs to
+initialize the correct scheduling domain according to the attributes of
+the ctxs requested earlier. However, ctxs have many attributes, which
+must be divided along multiple dimensions. If the parameters are not well
+understood, it is easy to make queue allocation errors, so that the wrong
+ctxs are scheduled when the task is finally issued, causing unexpected
+errors.
+
+Therefore, the next thing to be done is to use limited and easy-to-use
+input parameters to describe users' requirements on the two input
+parameters, ensuring that the functions of the new interface init2
+are the same as those of init. For ease of description, v1 is used
+to refer to the existing interface, and v2 is used to refer to the
+layer of encapsulation.
+
+Let's clarify the following logic first: all uacce devices under a
+numa node can be regarded as the same. So although we request ctxs
+from devices, we manage ctxs according to numa nodes. That means if
+users want the same performance on all cpus, the uadk configuration
+should be the same for all numa nodes.
+
+At present, at least 4 parameters are required to meet the user
+configuration requirements while keeping the functionality of the v1
+interface unchanged.
+
+@device_list: The available uacce device list. Users can get it by
+`wd_get_accel_list()`.
+
+@numa_bitmask: The bitmask provided by libnuma. Users can use this
+parameter to control which NUMA nodes ctxs are requested from. It is
+mainly useful when the process is bound to specific cpus or NUMA nodes,
+where it avoids wasting resources or failing initialization because of
+insufficient resources. Libnuma provides a complete set of bitmask
+operations, which can be found in numa.h.
+
+@ctx_nums: The requested ctx number for each numa node. Because users
+may need different numbers of ctxs for different ctx types, a
+two-dimensional array is needed as input.
+
+@sched_type: Scheduling type the user wants to use.
+
+To sum up, wd_alg_init2 is defined as follows:
+
+```c
+struct wd_ctx_nums {
+	__u32 sync_ctx_num;
+	__u32 async_ctx_num;
+};
+
+struct wd_ctx_params {
+	__u32 ctx_set_num;
+	struct wd_ctx_nums *ctx_set_size;
+};
+
+int wd_alg_init2(struct uacce_dev_list *list, struct bitmask *bmp,
+                 struct wd_ctx_params *cparams, __u32 sched_type);
+```
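
As a reading aid for the two structs above, here is a minimal sketch of how the number of requested ctxs could be counted. It assumes that `ctx_set_size` is an array of `ctx_set_num` entries (one per ctx type) and that the same set is requested on every NUMA node, as the "@ctx_nums" description suggests. The helpers `ctxs_per_node()` and `ctxs_total()` are hypothetical and not part of the patch, and `uint32_t` stands in for `__u32` so the snippet builds without kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the structs in the patch, with uint32_t standing in for __u32. */
struct wd_ctx_nums {
	uint32_t sync_ctx_num;
	uint32_t async_ctx_num;
};

struct wd_ctx_params {
	uint32_t ctx_set_num;             /* entries in ctx_set_size */
	struct wd_ctx_nums *ctx_set_size; /* one entry per ctx type (assumed) */
};

/* Hypothetical helper: ctxs requested on a single NUMA node. */
static uint32_t ctxs_per_node(const struct wd_ctx_params *cparams)
{
	uint32_t total = 0;

	assert(cparams != NULL);
	for (uint32_t i = 0; i < cparams->ctx_set_num; i++)
		total += cparams->ctx_set_size[i].sync_ctx_num +
			 cparams->ctx_set_size[i].async_ctx_num;
	return total;
}

/* Hypothetical helper: ctxs requested across node_count NUMA nodes. */
static uint32_t ctxs_total(const struct wd_ctx_params *cparams,
			   uint32_t node_count)
{
	return ctxs_per_node(cparams) * node_count;
}
```

Under these assumptions, two ctx types sized {2 sync, 2 async} and {1 sync, 3 async} give 8 ctxs per node, so selecting four nodes in @bmp would request 32 ctxs in total.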
+
+Some may say that wd_alg_init2 is still complex because three of its
+input parameters are structures. So the interface supports default
+values for some parameters. @bmp can be set to NULL, in which case it
+is initialized according to the device list. @cparams can be set to
+NULL, in which case a default value defined in wd_alg.c is used. @list
+and @sched_type are required.
+
+What's more, uadk provides a new set of interfaces to get the NUMA node
+bitmask of a device list.
+
+```c
+struct bitmask *wd_create_device_nodemask(struct uacce_dev_list *list);
+
+void wd_free_device_nodemask(struct bitmask *bmp);
+```
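
To make the node mask idea concrete, here is a rough sketch of how such a mask could be derived from a device list, and how a NULL @bmp could fall back to it. This is not the patch's implementation. The `demo_*` types and functions are hypothetical, simplified stand-ins for `uacce_dev` / `uacce_dev_list`, a plain `uint64_t` replaces libnuma's `struct bitmask` so the snippet is self-contained, and intersecting a user mask with the device mask is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for uacce_dev / uacce_dev_list. */
struct demo_dev {
	int numa_id; /* NUMA node of the device, negative if unknown */
};

struct demo_dev_list {
	struct demo_dev *dev;
	struct demo_dev_list *next;
};

/* Set one bit per NUMA node that hosts at least one listed device. */
static uint64_t demo_device_nodemask(const struct demo_dev_list *list)
{
	uint64_t mask = 0;

	for (; list != NULL; list = list->next) {
		assert(list->dev != NULL);
		int node = list->dev->numa_id;

		/* Nodes outside 0..63 do not fit in this demo mask. */
		if (node >= 0 && node < 64)
			mask |= UINT64_C(1) << node;
	}
	return mask;
}

/*
 * A NULL user mask falls back to the mask derived from the device list.
 * A non-NULL mask is limited to nodes that actually have devices
 * (assumed behaviour).
 */
static uint64_t demo_effective_mask(const uint64_t *user_mask,
				    const struct demo_dev_list *list)
{
	uint64_t dev_mask = demo_device_nodemask(list);

	return user_mask ? (*user_mask & dev_mask) : dev_mask;
}
```

With devices on nodes 0 and 2, the derived mask has bits 0 and 2 set. A user mask selecting nodes 1 and 2 would then leave only node 2.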
+
+## Demo
+
+The simplest user initialization process is:
+
+```c
+{
+	……
+	struct uacce_dev_list *list;
+	int ret;
+
+	list = wd_get_accel_list(alg);
+	ret = wd_<alg>_init2_(list, NULL, NULL, sched_type);
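This could be merged and simplified further:
  wd_<alg>_init2_(alg, mode_type, node_mask);

Some additional thoughts:
If we later extend to other implementations, such as instruction-based ones, uacce_dev should be hidden internally. Users should only see the alg when requesting resources.
The uadk user space currently has a three-layer structure. libwd is the basic, lowest-level interface, and the interfaces it provides are called by the udriver and alg layers. The alg layer's interfaces are the ones exposed to users.
So wd_request_ctx() and similar functions are called by udriver, while wd_get_accel_list() and similar functions are called by alg.

There is also the scheduler, whose coupling is a bit awkward. We still need to look at how to simplify it and move it inside.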
...
+	wd_free_list_accel(list);
+	……
+}
+```

    

fanghao (A)