From: Yangyang Li liyangyang20@huawei.com
stable inclusion from stable-v5.10.85 commit 33f320c35d69374d62b8a3b8f594f4da3aae91b5 bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 52414e27d6b568120b087d1fbafbb4482b0ccaab upstream.
is_reset is used to indicate whether the hardware starts to reset. When hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet started to reset. If is_reset is set at this time, all mailbox operations of resource destroy actions will be intercepted by driver. When the driver cleans up resources, but the hardware is still accessed, the following errors will appear:
arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000003f arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0800 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043e arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0800 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000020880000436 arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50a0880 arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000043a arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000a50e0840 hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0) arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000 hns3 0000:35:00.0: received unknown or unhandled event of vector0 arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000350100000010 {34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
is_reset will be set correctly in check_aedev_reset_status(), so the setting in hns_roce_hw_v2_reset_notify_down() should be deleted.
Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting") Link: https://lore.kernel.org/r/20211123084809.37318-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li liyangyang20@huawei.com Signed-off-by: Wenpeng Liang liangwenpeng@huawei.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chen Jun chenjun102@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 038e78619e26..436952bffdd0 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -6385,10 +6385,8 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle) if (!hr_dev) return 0;
- hr_dev->is_reset = true; hr_dev->active = false; hr_dev->dis_db = true; - hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
return 0;