[PATCH] libhns: Bugfixes and one debug improvement

From: Xinghai Cen <cenxinghai@h-partners.com>

The last commit was found when I created an XRC SRQ in lock-free mode
but failed to destroy it because of the refcnt check added in the
previous commit.

The failure was because the PAD was acquired through ibv_srq->pd in
destroy_srq(), while ibv_srq->pd wasn't assigned when the SRQ was
created by ibv_create_srq_ex().

So let's assign ibv_srq->pd in the common ibv_icmd_create_srq(), so
that drivers can get the correct pd no matter which API the SRQ is
created by.

Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
 ...hns-Add-debug-log-for-lock-free-mode.patch | 59 +++++++++++
 ...s-Fix-ret-not-assigned-in-create-srq.patch | 58 +++++++++++
 ...efcnt-leaking-in-error-flow-of-creat.patch | 99 +++++++++++++++++++
 ...-freeing-pad-without-checking-refcnt.patch | 69 +++++++++++++
 ...-Assign-ibv-srq-pd-when-creating-SRQ.patch | 43 ++++++++
 rdma-core.spec                                | 13 ++-
 6 files changed, 340 insertions(+), 1 deletion(-)
 create mode 100644 0058-libhns-Add-debug-log-for-lock-free-mode.patch
 create mode 100644 0059-libhns-Fix-ret-not-assigned-in-create-srq.patch
 create mode 100644 0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch
 create mode 100644 0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch
 create mode 100644 0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch

diff --git a/0058-libhns-Add-debug-log-for-lock-free-mode.patch b/0058-libhns-Add-debug-log-for-lock-free-mode.patch
new file mode 100644
index 0000000..86bc0a5
--- /dev/null
+++ b/0058-libhns-Add-debug-log-for-lock-free-mode.patch
@@ -0,0 +1,59 @@
+From 20dc7f183603b936ba7a865fc8a6d115073b1e29 Mon Sep 17 00:00:00 2001
+From: Junxian Huang <huangjunxian6@hisilicon.com>
+Date: Thu, 24 Apr 2025 20:32:12 +0800
+Subject: [PATCH 58/62] libhns: Add debug log for lock-free mode
+
+mainline inclusion
+from mainline-v56.0-65
+commit fb96940fcf6f96185d407d57bcaf775ccf8f1762
+category: cleanup
+bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
+CVE: NA
+
+Reference:
+https://github.com/linux-rdma/rdma-core/pull/1599/commits/fb96940fcf6f96185d...
+
+---------------------------------------------------------------------
+
+Currently there is no way to observe whether the lock-free mode is
+configured from the driver's perspective. Add debug log for this.
+
+Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
+Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
+---
+ providers/hns/hns_roce_u_verbs.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
+index 5fe169e..3efc2f4 100644
+--- a/providers/hns/hns_roce_u_verbs.c
++++ b/providers/hns/hns_roce_u_verbs.c
+@@ -182,6 +182,7 @@ err:
+ struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
+ 				    struct ibv_parent_domain_init_attr *attr)
+ {
++	struct hns_roce_pd *protection_domain;
+ 	struct hns_roce_pad *pad;
+ 
+ 	if (ibv_check_alloc_parent_domain(attr))
+@@ -198,12 +199,16 @@ struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
+ 		return NULL;
+ 	}
+ 
++	protection_domain = to_hr_pd(attr->pd);
+ 	if (attr->td) {
+ 		pad->td = to_hr_td(attr->td);
+ 		atomic_fetch_add(&pad->td->refcount, 1);
++		verbs_debug(verbs_get_ctx(context),
++			    "set PAD(0x%x) to lock-free mode.\n",
++			    protection_domain->pdn);
+ 	}
+ 
+-	pad->pd.protection_domain = to_hr_pd(attr->pd);
++	pad->pd.protection_domain = protection_domain;
+ 	atomic_fetch_add(&pad->pd.protection_domain->refcount, 1);
+ 
+ 	atomic_init(&pad->pd.refcount, 1);
+-- 
+2.33.0
+
diff --git a/0059-libhns-Fix-ret-not-assigned-in-create-srq.patch b/0059-libhns-Fix-ret-not-assigned-in-create-srq.patch
new file mode 100644
index 0000000..10443d6
--- /dev/null
+++ b/0059-libhns-Fix-ret-not-assigned-in-create-srq.patch
@@ -0,0 +1,58 @@
+From cf284feddf6bb98c600061a3fd1f0095e46b540e Mon Sep 17 00:00:00 2001
+From: Junxian Huang <huangjunxian6@hisilicon.com>
+Date: Wed, 23 Apr 2025 16:55:14 +0800
+Subject: [PATCH 59/62] libhns: Fix ret not assigned in create srq()
+
+mainline inclusion
+from mainline-v56.0-65
+commit 2034b1860c5a8b0cc3879315259462c04e53a98d
+category: bugfix
+bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
+CVE: NA
+
+Reference:
+https://github.com/linux-rdma/rdma-core/pull/1599/commits/2034b1860c5a8b0cc3...
+
+---------------------------------------------------------------------
+
+Fix the problem that ret may not be assigned in the error flow
+of create_srq().
+
+Fixes: aa7bcf7f7e44 ("libhns: Add support for lock-free SRQ")
+Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
+Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
+---
+ providers/hns/hns_roce_u_verbs.c | 10 +++++++---
+ 1 file changed, 7 insertions(+), 3 deletions(-)
+
+diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
+index 3efc2f4..b26ac29 100644
+--- a/providers/hns/hns_roce_u_verbs.c
++++ b/providers/hns/hns_roce_u_verbs.c
+@@ -933,16 +933,20 @@ static struct ibv_srq *create_srq(struct ibv_context *context,
+ 	if (pad)
+ 		atomic_fetch_add(&pad->pd.refcount, 1);
+ 
+-	if (hns_roce_srq_spinlock_init(context, srq, init_attr))
++	ret = hns_roce_srq_spinlock_init(context, srq, init_attr);
++	if (ret)
+ 		goto err_free_srq;
+ 
+ 	set_srq_param(context, srq, init_attr);
+-	if (alloc_srq_buf(srq))
++	ret = alloc_srq_buf(srq);
++	if (ret)
+ 		goto err_destroy_lock;
+ 
+ 	srq->rdb = hns_roce_alloc_db(hr_ctx, HNS_ROCE_SRQ_TYPE_DB);
+-	if (!srq->rdb)
++	if (!srq->rdb) {
++		ret = ENOMEM;
+ 		goto err_srq_buf;
++	}
+ 
+ 	ret = exec_srq_create_cmd(context, srq, init_attr);
+ 	if (ret)
+-- 
+2.33.0
+
diff --git a/0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch b/0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch
new file mode 100644
index 0000000..95a1e69
--- /dev/null
+++ b/0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch
@@ -0,0 +1,99 @@
+From 2bc3cafa227528e2893dadfff7cf54cfee427e1a Mon Sep 17 00:00:00 2001
+From: Junxian Huang <huangjunxian6@hisilicon.com>
+Date: Wed, 23 Apr 2025 16:55:15 +0800
+Subject: [PATCH 60/62] libhns: Fix pad refcnt leaking in error flow of create
+ qp/cq/srq
+
+mainline inclusion
+from mainline-v56.0-65
+commit f877d6e610e438515e1535c9ec7a3a3ef37c58e0
+category: bugfix
+bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
+CVE: NA
+
+Reference:
+https://github.com/linux-rdma/rdma-core/pull/1599/commits/f877d6e610e438515e...
+
+---------------------------------------------------------------------
+
+Decrease pad refcnt by 1 in error flow of create qp/cq/srq.
+
+Fixes: f8b4f622b1c5 ("libhns: Add support for lock-free QP")
+Fixes: 95225025e24c ("libhns: Add support for lock-free CQ")
+Fixes: aa7bcf7f7e44 ("libhns: Add support for lock-free SRQ")
+Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
+Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
+---
+ providers/hns/hns_roce_u_verbs.c | 20 +++++++++++++-------
+ 1 file changed, 13 insertions(+), 7 deletions(-)
+
+diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
+index b26ac29..5a62bb2 100644
+--- a/providers/hns/hns_roce_u_verbs.c
++++ b/providers/hns/hns_roce_u_verbs.c
+@@ -445,12 +445,9 @@ static int verify_cq_create_attr(struct ibv_cq_init_attr_ex *attr,
+ 		return EOPNOTSUPP;
+ 	}
+ 
+-	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD) {
+-		if (!pad) {
+-			verbs_err(&context->ibv_ctx, "failed to check the pad of cq.\n");
+-			return EINVAL;
+-		}
+-		atomic_fetch_add(&pad->pd.refcount, 1);
++	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD && !pad) {
++		verbs_err(&context->ibv_ctx, "failed to check the pad of cq.\n");
++		return EINVAL;
+ 	}
+ 
+ 	attr->cqe = max_t(uint32_t, HNS_ROCE_MIN_CQE_NUM,
+@@ -556,6 +553,7 @@ static void hns_roce_uninit_cq_swc(struct hns_roce_cq *cq)
+ static struct ibv_cq_ex *create_cq(struct ibv_context *context,
+ 				   struct ibv_cq_init_attr_ex *attr)
+ {
++	struct hns_roce_pad *pad = to_hr_pad(attr->parent_domain);
+ 	struct hns_roce_context *hr_ctx = to_hr_ctx(context);
+ 	struct hns_roce_cq *cq;
+ 	int ret;
+@@ -570,8 +568,10 @@ static struct ibv_cq_ex *create_cq(struct ibv_context *context,
+ 		goto err;
+ 	}
+ 
+-	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD)
++	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD) {
+ 		cq->parent_domain = attr->parent_domain;
++		atomic_fetch_add(&pad->pd.refcount, 1);
++	}
+ 
+ 	ret = hns_roce_cq_spinlock_init(context, cq, attr);
+ 	if (ret)
+@@ -611,6 +611,8 @@ err_db:
+ err_buf:
+ 	hns_roce_spinlock_destroy(&cq->hr_lock);
+ err_lock:
++	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD)
++		atomic_fetch_sub(&pad->pd.refcount, 1);
+ 	free(cq);
+ err:
+ 	if (ret < 0)
+@@ -977,6 +979,8 @@ err_destroy_lock:
+ 	hns_roce_spinlock_destroy(&srq->hr_lock);
+ 
+ err_free_srq:
++	if (pad)
++		atomic_fetch_sub(&pad->pd.refcount, 1);
+ 	free(srq);
+ 
+ err:
+@@ -1872,6 +1876,8 @@ err_cmd:
+ err_buf:
+ 	hns_roce_qp_spinlock_destroy(qp);
+ err_spinlock:
++	if (pad)
++		atomic_fetch_sub(&pad->pd.refcount, 1);
+ 	free(qp);
+ err:
+ 	if (ret < 0)
+-- 
+2.33.0
+
diff --git a/0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch b/0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch
new file mode 100644
index 0000000..d37dee6
--- /dev/null
+++ b/0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch
@@ -0,0 +1,69 @@
+From 0db7ff07caa483da0fb2cfd7944d549a38b4c720 Mon Sep 17 00:00:00 2001
+From: Junxian Huang <huangjunxian6@hisilicon.com>
+Date: Wed, 23 Apr 2025 16:55:16 +0800
+Subject: [PATCH 61/62] libhns: Fix freeing pad without checking refcnt
+
+mainline inclusion
+from mainline-v56.0-65
+commit 234d135276ea8ef83633113e224e0cd735ebeca8
+category: bugfix
+bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
+CVE: NA
+
+Reference:
+https://github.com/linux-rdma/rdma-core/pull/1599/commits/234d135276ea8ef836...
+
+---------------------------------------------------------------------
+
+Currently pad refcnt will be added when creating qp/cq/srq, but it is
+not checked when freeing pad. Add a check to prevent freeing pad when
+it is still used by any qp/cq/srq.
+
+Fixes: 7b6b3dae328f ("libhns: Add support for thread domain and parent
+domain")
+Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
+Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
+---
+ providers/hns/hns_roce_u_verbs.c | 12 +++++++-----
+ 1 file changed, 7 insertions(+), 5 deletions(-)
+
+diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
+index 5a62bb2..8c37496 100644
+--- a/providers/hns/hns_roce_u_verbs.c
++++ b/providers/hns/hns_roce_u_verbs.c
+@@ -218,14 +218,18 @@ struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
+ 	return &pad->pd.ibv_pd;
+ }
+ 
+-static void hns_roce_free_pad(struct hns_roce_pad *pad)
++static int hns_roce_free_pad(struct hns_roce_pad *pad)
+ {
++	if (atomic_load(&pad->pd.refcount) > 1)
++		return EBUSY;
++
+ 	atomic_fetch_sub(&pad->pd.protection_domain->refcount, 1);
+ 
+ 	if (pad->td)
+ 		atomic_fetch_sub(&pad->td->refcount, 1);
+ 
+ 	free(pad);
++	return 0;
+ }
+ 
+ static int hns_roce_free_pd(struct hns_roce_pd *pd)
+@@ -248,10 +252,8 @@ int hns_roce_u_dealloc_pd(struct ibv_pd *ibv_pd)
+ 	struct hns_roce_pad *pad = to_hr_pad(ibv_pd);
+ 	struct hns_roce_pd *pd = to_hr_pd(ibv_pd);
+ 
+-	if (pad) {
+-		hns_roce_free_pad(pad);
+-		return 0;
+-	}
++	if (pad)
++		return hns_roce_free_pad(pad);
+ 
+ 	return hns_roce_free_pd(pd);
+ }
+-- 
+2.33.0
+
diff --git a/0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch b/0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch
new file mode 100644
index 0000000..e7d0395
--- /dev/null
+++ b/0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch
@@ -0,0 +1,43 @@
+From f8f9295695921fa796bb93c5ee7066e50221bbc3 Mon Sep 17 00:00:00 2001
+From: Junxian Huang <huangjunxian6@hisilicon.com>
+Date: Wed, 23 Apr 2025 16:55:17 +0800
+Subject: [PATCH 62/62] verbs: Assign ibv srq->pd when creating SRQ
+
+mainline inclusion
+from mainline-v56.0-65
+commit bf1e427141fde2651bab4860e77a432bb7e26094
+category: bugfix
+bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
+CVE: NA
+
+Reference:
+https://github.com/linux-rdma/rdma-core/pull/1599/commits/bf1e427141fde2651b...
+
+---------------------------------------------------------------------
+
+Some providers need to access ibv_srq->pd during SRQ destruction, but
+it may not be assigned currently when using ibv_create_srq_ex(). This
+may lead to some SRQ-related resource leaks. Assign ibv_srq->pd when
+creating SRQ to ensure pd can be obtained correctly.
+
+Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
+Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
+---
+ libibverbs/cmd_srq.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/libibverbs/cmd_srq.c b/libibverbs/cmd_srq.c
+index dfaaa6a..259ea0d 100644
+--- a/libibverbs/cmd_srq.c
++++ b/libibverbs/cmd_srq.c
+@@ -63,6 +63,7 @@ static int ibv_icmd_create_srq(struct ibv_pd *pd, struct verbs_srq *vsrq,
+ 	struct verbs_xrcd *vxrcd = NULL;
+ 	enum ibv_srq_type srq_type;
+ 
++	srq->pd = pd;
+ 	srq->context = pd->context;
+ 	pthread_mutex_init(&srq->mutex, NULL);
+ 	pthread_cond_init(&srq->cond, NULL);
+-- 
+2.33.0
+
diff --git a/rdma-core.spec b/rdma-core.spec
index e760f40..928fc75 100644
--- a/rdma-core.spec
+++ b/rdma-core.spec
@@ -1,6 +1,6 @@
 Name: rdma-core
 Version: 50.0
-Release: 27
+Release: 28
 Summary: RDMA core userspace libraries and daemons
 License: GPL-2.0-only OR BSD-2-Clause AND BSD-3-Clause
 Url: https://github.com/linux-rdma/rdma-core
@@ -63,6 +63,11 @@ patch54: 0054-libhns-Fix-wrong-max-inline-data-value.patch
 patch55: 0055-libhns-Fix-wrong-order-of-spin-unlock-in-modify-qp.patch
 patch56: 0056-libhns-Add-initial-support-for-HNS-LTTng-tracing.patch
 patch57: 0057-libhns-Add-tracepoint-for-HNS-RoCE-I-O.patch
+patch58: 0058-libhns-Add-debug-log-for-lock-free-mode.patch
+patch59: 0059-libhns-Fix-ret-not-assigned-in-create-srq.patch
+patch60: 0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch
+patch61: 0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch
+patch62: 0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch
 
 BuildRequires: binutils cmake >= 2.8.11 gcc libudev-devel pkgconfig pkgconfig(libnl-3.0)
 BuildRequires: pkgconfig(libnl-route-3.0) systemd systemd-devel
@@ -642,6 +647,12 @@ fi
 %doc %{_docdir}/%{name}-%{version}/70-persistent-ipoib.rules
 
 %changelog
+* Fri Apr 25 2025 Xinghai Cen <cenxinghai@h-partners.com> - 50.0-28
+- Type: bugfix
+- ID: NA
+- SUG: NA
+- DESC: Bugfixes and one debug improvement
+
 * Wed Apr 23 2025 Xinghai Cen <cenxinghai@h-partners.com> - 50.0-27
 - Type: feature
 - ID: NA
-- 
2.33.0
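
Note for reviewers: the refcount rule that patches 59 to 62 enforce can be sketched in isolation as below. This is only an illustration under stated assumptions, not libhns code: `demo_pad`, `demo_obj`, `demo_pad_alloc()`, `demo_pad_free()`, `demo_obj_create()`, `demo_obj_destroy()` and the `fail_setup` parameter are all hypothetical names made up for the example. The sketch shows a parent object whose refcount starts at 1, a create path that always sets `ret` and drops its reference on failure, and a free path that returns EBUSY while any child still holds a reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parent domain; not the libhns struct. */
struct demo_pad {
	atomic_int refcount;	/* starts at 1, +1 per attached object */
};

/* Hypothetical stand-in for a QP/CQ/SRQ attached to a pad. */
struct demo_obj {
	struct demo_pad *pad;	/* must be set at create time (cf. patch 62) */
};

static struct demo_pad *demo_pad_alloc(void)
{
	struct demo_pad *pad = calloc(1, sizeof(*pad));

	if (pad)
		atomic_init(&pad->refcount, 1);
	return pad;
}

/* Same idea as patch 61: refuse to free while still in use. */
static int demo_pad_free(struct demo_pad *pad)
{
	if (atomic_load(&pad->refcount) > 1)
		return EBUSY;

	free(pad);
	return 0;
}

/*
 * Same idea as patches 59 and 60: every failure assigns ret, and the
 * reference taken earlier is dropped again on the error path.
 * fail_setup is a hypothetical knob that forces the error flow.
 */
static int demo_obj_create(struct demo_pad *pad, int fail_setup,
			   struct demo_obj **out)
{
	struct demo_obj *obj;
	int ret;

	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return ENOMEM;

	obj->pad = pad;
	atomic_fetch_add(&pad->refcount, 1);

	ret = fail_setup ? EINVAL : 0;
	if (ret)
		goto err_put_pad;

	*out = obj;
	return 0;

err_put_pad:
	atomic_fetch_sub(&pad->refcount, 1);
	free(obj);
	return ret;
}

static void demo_obj_destroy(struct demo_obj *obj)
{
	/* The object must still hold its reference on the pad. */
	assert(atomic_load(&obj->pad->refcount) > 1);
	atomic_fetch_sub(&obj->pad->refcount, 1);
	free(obj);
}
```

In this sketch a failed `demo_obj_create()` leaves the pad refcount unchanged, and `demo_pad_free()` only succeeds after every object created on the pad has been destroyed. If `obj->pad` were left unassigned at create time, the destroy path could not drop its reference and the pad could never be freed, which is the kind of failure described in the cover letter above.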