New subject: [PATCH OLK-6.6] RDMA/hns: Fix mbox timing out by adding retry mechanism

10 Feb 2025

From: Junxian Huang <huangjunxian6@hisilicon.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBLA1C

---------------------------------------------------------------

For HIP08, the driver creates 8 free_mr_qp in free_mr_alloc_res(),
and then assigns qp->device for them in free_mr_modify_rsv_qp().
If an error occurs during this process, qp->device of some
free_mr_qp may not have been assigned yet. When the driver then
enters the error flow and destroys these free_mr_qp, it obtains
hr_dev from qp->device, which leads to the panic below:

Unable to handle kernel paging request at virtual address 0000000000001198
...
Call trace:
  hns_roce_v2_destroy_qp_common+0x18c/0x5f4 [hns_roce_hw_v2]
  hns_roce_v2_destroy_qp+0x254/0x490 [hns_roce_hw_v2]
  free_mr_exit+0x6c/0x120 [hns_roce_hw_v2]
  free_mr_init+0xcc/0xf0 [hns_roce_hw_v2]
  __hns_roce_hw_v2_init_instance+0x180/0x304 [hns_roce_hw_v2]
  hns_roce_hw_v2_reset_notify_init+0x170/0x21c [hns_roce_hw_v2]
  hns_roce_hw_v2_reset_notify+0x6c/0x1a0 [hns_roce_hw_v2]
  hclge_notify_roce_client+0x6c/0x160 [hclge]
  hclge_reset_rebuild+0x150/0x5c0 [hclge]
  hclge_reset+0x10c/0x140 [hclge]
  hclge_reset_subtask+0x80/0x104 [hclge]
  hclge_reset_service_task+0x168/0x3ac [hclge]
  hclge_service_task+0xb4/0x100 [hclge]

Fix this by assigning qp->device and qp->qp_type in
create_free_mr_qp(), right after the QP is created, instead of in
free_mr_modify_rsv_qp().

Fixes: e89e2d3b692b ("RDMA/hns: Fix gid idx issue caused by free mr")
Fixes: 6f5f556d3795 ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 9607f2eb45b7..5a7539f8c14d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -3055,6 +3055,8 @@ static struct hns_roce_qp *create_free_mr_qp(struct hns_roce_dev *hr_dev,
 		ibdev_err(ibdev, "failed to create qp for free mr.\n");
 		return NULL;
 	}
+	qp->device = ibdev;
+	qp->qp_type = IB_QPT_RC;
 
 	return to_hr_qp(qp);
 }
@@ -3138,8 +3140,6 @@ static int free_mr_modify_rsv_qp(struct hns_roce_dev *hr_dev,
 
 	hr_qp = to_hr_qp(&free_mr->rsv_qp[sl_num]->ibqp);
 	hr_qp->free_mr_en = 1;
-	hr_qp->ibqp.device = ibdev;
-	hr_qp->ibqp.qp_type = IB_QPT_RC;
 
 	mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS;
 	attr->qp_state = IB_QPS_INIT;
-- 
2.33.0
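
The ordering problem described in the commit message can be modelled in
user space. The following is only a minimal sketch, not driver code:
struct sketch_dev, struct sketch_qp and every sketch_* function are
hypothetical stand-ins invented for illustration, and the real
free_mr_modify_rsv_qp() does considerably more than this. It counts how
many QPs would still have a NULL device pointer when the error flow
destroys them, with the assignment done either at modify time (old
order) or at create time (order after this patch).

```c
#include <stddef.h>

#define SKETCH_NUM_QP 8

/* Hypothetical stand-ins; these are NOT struct ib_device / struct ib_qp. */
struct sketch_dev { int id; };
struct sketch_qp { struct sketch_dev *device; };

/* Create step: with assign_at_create set, device is filled in right away. */
static void sketch_create(struct sketch_qp *qp, struct sketch_dev *dev,
			  int assign_at_create)
{
	qp->device = assign_at_create ? dev : NULL;
}

/* Destroy step: safe only if the device pointer can be dereferenced. */
static int sketch_destroy_is_safe(const struct sketch_qp *qp)
{
	return qp->device != NULL;
}

/*
 * Create all QPs, run the modify step with a simulated failure at index
 * fail_at (pass SKETCH_NUM_QP for "no failure"), then walk the error
 * flow. Returns the number of QPs whose destroy would hit a NULL device.
 */
static int sketch_unsafe_destroys(int assign_at_create, int fail_at)
{
	struct sketch_dev dev = { .id = 0 };
	struct sketch_qp qps[SKETCH_NUM_QP];
	int i, unsafe = 0;

	for (i = 0; i < SKETCH_NUM_QP; i++)
		sketch_create(&qps[i], &dev, assign_at_create);

	for (i = 0; i < SKETCH_NUM_QP; i++) {
		/* Old order: the device pointer is only set in the modify step. */
		if (!assign_at_create)
			qps[i].device = &dev;
		/* Simulated failure: later QPs are never reached. */
		if (i == fail_at)
			break;
	}

	for (i = 0; i < SKETCH_NUM_QP; i++)
		if (!sketch_destroy_is_safe(&qps[i]))
			unsafe++;

	return unsafe;
}
```

In this model, a failure on any QP but the last leaves some QPs unsafe
to destroy in the old order and none in the new order, which is the
behaviour the two moved assignments are meant to guarantee.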

    

[PATCH OLK-5.10] RDMA/hns: Fix free_mr_qp not assigning qp->device in error flow
