In a corner case of concurrent driver removal and driver reset, the
bonding resource is first released in hns_roce_hw_v2_exit() during
driver removal, and is then allocated again in
hns_roce_register_device() during driver reset. This leads to a memory
leak because the release point has already passed. It may also lead to
a kernel panic as below because of the leaked notifier callback:

 Call trace:
  0xffffa20fccc04978 (P)
  raw_notifier_call_chain+0x20/0x38
  call_netdevice_notifiers_info+0x60/0xb8
  netdev_lower_state_changed+0x4c/0xb8

Bonding resource allocation and release should occur only during
driver init and removal, so don't do the allocation during reset.

Fixes: b37ad2e290fc ("RDMA/hns: Initialize bonding resources")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
 drivers/infiniband/hw/hns/hns_roce_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index c17ff5347a01..a7308a3c586e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -795,6 +795,7 @@ static const struct ib_device_ops hns_roce_dev_restrack_ops = {
 
 static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 {
+	struct hns_roce_v2_priv *priv = hr_dev->priv;
 	struct hns_roce_ib_iboe *iboe = NULL;
 	struct device *dev = hr_dev->dev;
 	struct ib_device *ib_dev = NULL;
@@ -838,7 +839,8 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 
 	dma_set_max_seg_size(dev, SZ_2G);
 
-	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_BOND) {
+	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_BOND &&
+	    priv->handle->rinfo.reset_state != HNS_ROCE_STATE_RST_INIT) {
 		ret = hns_roce_alloc_bond_grp(hr_dev);
 		if (ret) {
 			dev_err(dev, "failed to alloc bond_grp for bus %u, ret = %d\n",
-- 
2.33.0