From: Xinghai Cen cenxinghai@h-partners.com
RDMA/hns: Some patches are integrated into OLK-5.10
Junxian Huang (2):
  RDMA/hns: Fix bonding failure due to wrong reset_state
  RDMA/hns: Don't query *except on hip10*

wenglianfa (1):
  RDMA/hns: Fix creating GSI QP in non-extended SGE QP bank

 drivers/infiniband/hw/hns/hns_roce_bond.c   |  9 ++++++---
 drivers/infiniband/hw/hns/hns_roce_device.h |  2 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 16 ++++++----------
 3 files changed, 13 insertions(+), 14 deletions(-)
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBD4ID
----------------------------------------------------------------------
When the roce driver is removed during a reset, the roce reset flow may not be fully completed. This may leave the reset_state of the roce handle stored in the nic driver in an intermediate state, such as HNS_ROCE_STATE_RST_DOWN or HNS_ROCE_STATE_RST_UNINIT.
The reset_state is not cleared even if the roce driver is re-initialized. As a result, roce bonding, which currently relies on reset_state, fails in this case.
Replace the reset detection for bonding with nic APIs (.ae_dev_resetting() and .get_hw_reset_stat()), just like the reset detection elsewhere in roce driver.
Fixes: b927e3066992 ("RDMA/hns: Fix the concurrency error between bond and reset.")
Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com
Signed-off-by: Xinghai Cen cenxinghai@h-partners.com
---
 drivers/infiniband/hw/hns/hns_roce_bond.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c
index a4ac07f8fc96..5799db2c09ba 100644
--- a/drivers/infiniband/hw/hns/hns_roce_bond.c
+++ b/drivers/infiniband/hw/hns/hns_roce_bond.c
@@ -543,6 +543,7 @@ static void hns_roce_do_bond(struct hns_roce_bond_group *bond_grp)

 bool is_bond_slave_in_reset(struct hns_roce_bond_group *bond_grp)
 {
+	const struct hnae3_ae_ops *ops;
 	struct hnae3_handle *handle;
 	struct net_device *net_dev;
 	int i;
@@ -550,9 +551,11 @@ bool is_bond_slave_in_reset(struct hns_roce_bond_group *bond_grp)
 	for (i = 0; i < ROCE_BOND_FUNC_MAX; i++) {
 		net_dev = bond_grp->bond_func_info[i].net_dev;
 		handle = bond_grp->bond_func_info[i].handle;
-		if (net_dev && handle &&
-		    handle->rinfo.reset_state != HNS_ROCE_STATE_NON_RST &&
-		    handle->rinfo.reset_state != HNS_ROCE_STATE_RST_INITED)
+		if (!net_dev || !handle)
+			continue;
+		ops = handle->ae_algo->ops;
+		if (ops->ae_dev_resetting(handle) ||
+		    ops->get_hw_reset_stat(handle))
 			return true;
 	}
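For readers who want to experiment with the new check outside the kernel, the following is a minimal user-space sketch of the logic in the hunk above. Everything prefixed with model_ is a hypothetical stand-in and not a kernel type: model_ae_ops imitates only the two callbacks named in the commit message, and cached_reset_state merely plays the role of the stale rinfo.reset_state that the patch stops reading.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_handle;

/* Hypothetical stand-in for the two hnae3_ae_ops callbacks named in the patch. */
struct model_ae_ops {
	bool (*ae_dev_resetting)(struct model_handle *handle);
	bool (*get_hw_reset_stat)(struct model_handle *handle);
};

/* Hypothetical stand-in for struct hnae3_handle. */
struct model_handle {
	const struct model_ae_ops *ops;
	bool sw_resetting;      /* live software reset flag owned by the NIC driver */
	bool hw_resetting;      /* live hardware reset status */
	int cached_reset_state; /* stale RoCE-side state; never consulted below */
};

/* One bonding slot: a slave is present only if both pointers are set. */
struct model_bond_slot {
	void *net_dev;
	struct model_handle *handle;
};

static bool model_dev_resetting(struct model_handle *handle)
{
	return handle->sw_resetting;
}

static bool model_hw_reset_stat(struct model_handle *handle)
{
	return handle->hw_resetting;
}

static const struct model_ae_ops model_ops = {
	.ae_dev_resetting = model_dev_resetting,
	.get_hw_reset_stat = model_hw_reset_stat,
};

/* Return true if any populated slot reports a reset through the NIC callbacks. */
bool model_slave_in_reset(const struct model_bond_slot *slots, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct model_handle *handle = slots[i].handle;

		if (!slots[i].net_dev || !handle)
			continue; /* empty slot, nothing to check */
		if (handle->ops->ae_dev_resetting(handle) ||
		    handle->ops->get_hw_reset_stat(handle))
			return true;
	}
	return false;
}
```

In this model a handle whose cached_reset_state was left in an intermediate value still reports "not in reset" as long as both live flags are clear, which is the behaviour the commit message asks for.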
From: wenglianfa wenglianfa@huawei.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBD4ID
--------------------------------------------------------------------------
According to the IB protocol, the qp_num of the GSI QP must be set to 1. Therefore, the QP must be created in BANK 1.
Currently, only QPs in BANK 0 and BANK 6 can use extended SGEs, but the GSI QP in BANK 1 also needs to use extended SGEs. To fix this, change the restriction so that QPs in BANK 1 and BANK 6 can use extended SGEs.
Fixes: 1c16701634e4 ("RDMA/hns: Fix RoCEE hang when multiple QP banks use EXT_SGE")
Signed-off-by: wenglianfa wenglianfa@huawei.com
Signed-off-by: Xinghai Cen cenxinghai@h-partners.com
---
 drivers/infiniband/hw/hns/hns_roce_device.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index a8083da693bf..f12e56969fd3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -115,7 +115,7 @@
 #define VALID_CQ_BANK_MASK_DEFAULT 0xF
 #define VALID_CQ_BANK_MASK_LIMIT 0x9

-#define VALID_EXT_SGE_QP_BANK_MASK_LIMIT 0x41
+#define VALID_EXT_SGE_QP_BANK_MASK_LIMIT 0x42

 #define HNS_ROCE_MAX_CQ_COUNT 0xFFFF
 #define HNS_ROCE_MAX_CQ_PERIOD 0xFFFF
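The mask change above can be checked with a few lines of bit arithmetic. The sketch below assumes that a QP's bank is taken from the low 3 bits of its QPN (8 banks) and that bit N of the mask enables extended SGEs for bank N. The patch itself only shows the two mask values, so the helper names and MODEL_QP_BANK_NUM are illustrative and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_QP_BANK_NUM      8u    /* assumption: 8 QP banks */
#define OLD_EXT_SGE_BANK_MASK  0x41u /* bits 0 and 6: BANK 0 and BANK 6 */
#define NEW_EXT_SGE_BANK_MASK  0x42u /* bits 1 and 6: BANK 1 and BANK 6 */

/* Assumption: the bank index is the low 3 bits of the QP number. */
static inline unsigned int model_qp_bank_id(uint32_t qpn)
{
	return qpn & (MODEL_QP_BANK_NUM - 1u);
}

/* True if the bank that qpn hashes to has its bit set in mask. */
static inline bool model_bank_allows_ext_sge(uint32_t mask, uint32_t qpn)
{
	return ((mask >> model_qp_bank_id(qpn)) & 1u) != 0;
}
```

Under these assumptions the GSI QP (qp_num 1) lands in BANK 1, which the old mask 0x41 excludes and the new mask 0x42 includes. BANK 0 loses extended SGE support in exchange, so the number of enabled banks stays at two.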
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBD4ID
--------------------------------------------------------------------------
Commit 1470b68f21c3 ("RDMA/hns: Fix print after query hw id failed.") was meant to avoid error printing when HNS_ROCE_OPC_QUERY_HW_ID cmd is not supported, that is, when CMD_NOT_EXIST error is returned by FW.
But the check is incorrect: all error prints for HNS_ROCE_OPC_QUERY_HW_ID are now suppressed, even when the error code is not CMD_NOT_EXIST.
This cmd is for stars mode and cnp priority setting. Since these two features are supported only by HIP10, don't send this cmd on other platforms.
Fixes: 1470b68f21c3 ("RDMA/hns: Fix print after query hw id failed.")
Fixes: 867e1e95fe12 ("RDMA/hns: Support query HW ID from user space.")
Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com
Signed-off-by: Xinghai Cen cenxinghai@h-partners.com
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 13da17c913de..80f275243cfd 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1451,11 +1451,9 @@ static int __hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
 			if (likely(desc_ret == CMD_EXEC_SUCCESS))
 				continue;

-			if (desc->opcode != cpu_to_le16(HNS_ROCE_OPC_QUERY_HW_ID) &&
-			    desc_ret != CMD_NOT_EXIST)
-				dev_err_ratelimited(hr_dev->dev,
-						    "Cmdq IO error, opcode = 0x%x, return = 0x%x.\n",
-						    desc->opcode, desc_ret);
+			dev_err_ratelimited(hr_dev->dev,
+					    "Cmdq IO error, opcode = 0x%x, return = 0x%x.\n",
+					    desc->opcode, desc_ret);
 			ret = hns_roce_cmd_err_convert_errno(desc_ret);
 		}
 	} else {
@@ -1593,16 +1591,14 @@ static void hns_roce_cmq_query_hw_id(struct hns_roce_dev *hr_dev)
 	struct hns_roce_cmq_desc desc;
 	int ret;

-	if (hr_dev->is_vf)
+	if (hr_dev->is_vf || hr_dev->pci_dev->revision <= PCI_REVISION_ID_HIP09)
 		goto invalid_val;

 	hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_QUERY_HW_ID, true);
 	ret = hns_roce_cmq_send(hr_dev, &desc, 1);
 	if (ret) {
-		if (desc.retval != cpu_to_le16(CMD_NOT_EXIST))
-			ibdev_warn(&hr_dev->ib_dev,
-				   "failed to query hw id, ret = %d.\n", ret);
-
+		ibdev_warn(&hr_dev->ib_dev,
+			   "failed to query hw id, ret = %d.\n", ret);
 		goto invalid_val;
 	}
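To make the faulty condition removed from __hns_roce_cmq_send() concrete, the sketch below compares it with the condition the earlier commit presumably intended. The enum values are arbitrary placeholders rather than the driver's real opcode and return codes, and model_intended_should_print() is an interpretation of the commit message, not code from any tree.

```c
#include <stdbool.h>

/* Placeholder values; the real opcode and return codes live in the driver headers. */
enum model_opcode { MODEL_OPC_QUERY_HW_ID = 1, MODEL_OPC_OTHER = 2 };
enum model_retval { MODEL_CMD_NOT_EXIST = 2, MODEL_CMD_IO_ERROR = 3 };

/* Condition removed by the patch: print only if BOTH inequalities hold. */
static inline bool model_old_should_print(int opcode, int desc_ret)
{
	return opcode != MODEL_OPC_QUERY_HW_ID &&
	       desc_ret != MODEL_CMD_NOT_EXIST;
}

/*
 * Presumed intent (an assumption): stay silent only for the single case
 * "QUERY_HW_ID is not supported by firmware", print everything else.
 */
static inline bool model_intended_should_print(int opcode, int desc_ret)
{
	return !(opcode == MODEL_OPC_QUERY_HW_ID &&
		 desc_ret == MODEL_CMD_NOT_EXIST);
}
```

The two predicates differ in two cases: a QUERY_HW_ID command failing with any other error, and any other command failing with CMD_NOT_EXIST. The old condition silences both. The patch avoids the special case altogether by printing unconditionally and not sending the command on platforms up to HIP09.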
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/14305 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/J...