driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CK0O
--------------------------------------------------------------------------
read_poll_timeout() in MBX may sleep, and the probability of sleeping is higher during reset. In other words, it is not safe to use MBX in an atomic context.

To keep QPC setup atomic, DCA takes a lock around the QPC setup operation in the DCA ATTACH_MEM phase (i.e. post_send/post_recv). MBX is therefore called in an atomic context, which triggers the problem described above at reset.

Replace read_poll_timeout() with read_poll_timeout_atomic() so that MBX operations do not sleep in an atomic context.
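
For illustration only, the busy-wait polling pattern this change relies on can be sketched in user space roughly as follows. This is not the kernel macro: poll_gt_timeout_atomic(), busy_delay_us(), now_us(), fake_reset_cnt() and stuck_cnt() are hypothetical names, and clock() (processor time, which keeps advancing while spinning) stands in for the kernel's time source. The point is only that the loop spins between reads instead of sleeping.

```c
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Processor time in microseconds; it advances while we spin. */
static uint64_t now_us(void)
{
	return (uint64_t)clock() * 1000000ULL / (uint64_t)CLOCKS_PER_SEC;
}

/* Spin without yielding the CPU (user-space stand-in for a busy delay). */
static void busy_delay_us(uint64_t delay_us)
{
	uint64_t start = now_us();

	while (now_us() - start < delay_us)
		;
}

/*
 * Hypothetical helper: poll read_fn(arg) until the value exceeds
 * 'threshold' or 'timeout_us' elapses. It never sleeps between reads.
 * Returns 0 on success, -ETIMEDOUT otherwise; *val holds the last read.
 */
static int poll_gt_timeout_atomic(uint32_t (*read_fn)(void *), void *arg,
				  uint32_t threshold, uint32_t *val,
				  uint64_t delay_us, uint64_t timeout_us)
{
	uint64_t deadline = now_us() + timeout_us;

	for (;;) {
		*val = read_fn(arg);
		if (*val > threshold)
			return 0;
		if (now_us() > deadline) {
			/* One last read before giving up. */
			*val = read_fn(arg);
			break;
		}
		busy_delay_us(delay_us);
	}
	return *val > threshold ? 0 : -ETIMEDOUT;
}

/* Hypothetical reset counter that increases by one on every read. */
static uint32_t fake_reset_cnt(void *arg)
{
	uint32_t *cnt = arg;

	return ++(*cnt);
}

/* Hypothetical reset counter that never changes. */
static uint32_t stuck_cnt(void *arg)
{
	(void)arg;
	return 0;
}
```

Polling fake_reset_cnt() with a threshold of 3 succeeds once the counter passes it, while polling stuck_cnt() with a short timeout returns -ETIMEDOUT. In both cases the caller never gives up the CPU, which is the property an atomic context needs, at the cost of spinning for up to the full timeout.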

Fixes: 306b8c76257b ("RDMA/hns: Do not destroy QP resources in the hw resetting phase")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Juan Zhou <zhoujuan51@h-partners.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 36e707e48..f06079352 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1130,7 +1130,7 @@ static u32 hns_roce_v2_cmd_hw_resetting(struct hns_roce_dev *hr_dev,
 					 unsigned long reset_stage)
 {
 #define HW_RESET_TIMEOUT_US 1000000
-#define HW_RESET_SLEEP_US 1000
+#define HW_RESET_DELAY_US 1
 
 	struct hns_roce_v2_priv *priv = hr_dev->priv;
 	struct hnae3_handle *handle = priv->handle;
@@ -1149,8 +1149,8 @@ static u32 hns_roce_v2_cmd_hw_resetting(struct hns_roce_dev *hr_dev,
 	 */
 	hr_dev->dis_db = true;
 
-	ret = read_poll_timeout(ops->ae_dev_reset_cnt, val,
-				val > hr_dev->reset_cnt, HW_RESET_SLEEP_US,
+	ret = read_poll_timeout_atomic(ops->ae_dev_reset_cnt, val,
+				val > hr_dev->reset_cnt, HW_RESET_DELAY_US,
 				HW_RESET_TIMEOUT_US, false, handle);
 	if (!ret)
 		hr_dev->is_reset = true;