From: Juan Zhou <zhoujuan51@h-partners.com>
Some bugfixes for openEuler 22.03 SP4.
Chengchang Tang (3):
  RDMA/hns: Fix missing resetting notify
  RDMA/hns: Use mutex to protect ucontext
  RDMA/hns: Fix allocating POE channels after IB device registration
wenglianfa (1):
  RDMA/hns: Fix scc delay_work to execute after sysfs shutdown
 drivers/infiniband/core/ib_core_uverbs.c     | 85 ++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h          |  1 -
 drivers/infiniband/core/uverbs_main.c        | 64 ---------------
 drivers/infiniband/hw/hns/hns_roce_debugfs.c |  6 +-
 drivers/infiniband/hw/hns/hns_roce_device.h  |  2 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c   | 14 ++--
 drivers/infiniband/hw/hns/hns_roce_main.c    | 14 ++--
 drivers/infiniband/hw/hns/hns_roce_sysfs.c   |  5 ++
 include/rdma/ib_verbs.h                      |  2 +
 9 files changed, 110 insertions(+), 83 deletions(-)
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9QO9C
----------------------------------------------------------------------
Traffic injection must be stopped during a hardware reset. Currently, this is done by the user-space driver obtaining a notification from the kernel-space driver and then no longer ringing the doorbell. The notification is implemented through shared memory. In concurrent scenarios, this shared-memory mechanism needs barriers to be reliable, but barriers severely hurt performance.
This patch uses a new scheme to solve the problem. Before resetting, the kernel-space driver zaps all memory shared between the user-space and kernel-space drivers and points the affected VMAs at a zero page, so that user space can no longer reach any hardware address during the reset, and traffic injection stops.
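For readers unfamiliar with the mechanism, here is a condensed sketch (not part of this patch) modeled on the existing rdma_umap_fault() in uverbs_main.c: once zap_vma_ptes() has removed the PTEs, the next user-space access faults, and the handler backs the VMA with a zeroed page instead of a hardware address, so doorbell writes land harmlessly in plain memory. The example_* name is illustrative:

static vm_fault_t example_umap_fault(struct vm_fault *vmf)
{
	struct ib_uverbs_file *ufile = vmf->vma->vm_file->private_data;
	struct rdma_umap_priv *priv = vmf->vma->vm_private_data;
	vm_fault_t ret = 0;

	if (!priv)
		return VM_FAULT_SIGBUS;

	/* Read-only mappings can share the global zero page. */
	if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
		vmf->page = ZERO_PAGE(vmf->address);
		get_page(vmf->page);
		return 0;
	}

	/* Writable mappings get one zeroed page per ufile. */
	mutex_lock(&ufile->umap_lock);
	if (!ufile->disassociate_page)
		ufile->disassociate_page =
			alloc_pages(vmf->gfp_mask | __GFP_ZERO, 0);
	if (ufile->disassociate_page) {
		vmf->page = ufile->disassociate_page;
		get_page(vmf->page);
	} else {
		ret = VM_FAULT_SIGBUS;
	}
	mutex_unlock(&ufile->umap_lock);

	return ret;
}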
Fixes: e8b1fec497a0 ("RDMA/hns: Kernel notify usr space to stop ring db")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/infiniband/core/ib_core_uverbs.c   | 85 ++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h        |  1 -
 drivers/infiniband/core/uverbs_main.c      | 64 ----------------
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 14 ++--
 include/rdma/ib_verbs.h                    |  2 +
 5 files changed, 94 insertions(+), 72 deletions(-)
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b51bd7087a88..4e27389a75ad 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -5,6 +5,7 @@
  * Copyright 2019 Marvell. All rights reserved.
  */
 #include <linux/xarray.h>
+#include <linux/sched/mm.h>
 #include "uverbs.h"
 #include "core_priv.h"
@@ -365,3 +366,87 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
 					  U32_MAX);
 }
 EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
+
+void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
+{
+	struct rdma_umap_priv *priv, *next_priv;
+
+	lockdep_assert_held(&ufile->hw_destroy_rwsem);
+
+	while (1) {
+		struct mm_struct *mm = NULL;
+
+		/* Get an arbitrary mm pointer that hasn't been cleaned yet */
+		mutex_lock(&ufile->umap_lock);
+		while (!list_empty(&ufile->umaps)) {
+			int ret;
+
+			priv = list_first_entry(&ufile->umaps,
+						struct rdma_umap_priv, list);
+			mm = priv->vma->vm_mm;
+			ret = mmget_not_zero(mm);
+			if (!ret) {
+				list_del_init(&priv->list);
+				if (priv->entry) {
+					rdma_user_mmap_entry_put(priv->entry);
+					priv->entry = NULL;
+				}
+				mm = NULL;
+				continue;
+			}
+			break;
+		}
+		mutex_unlock(&ufile->umap_lock);
+		if (!mm)
+			return;
+
+		/*
+		 * The umap_lock is nested under mmap_lock since it used within
+		 * the vma_ops callbacks, so we have to clean the list one mm
+		 * at a time to get the lock ordering right. Typically there
+		 * will only be one mm, so no big deal.
+		 */
+		mmap_read_lock(mm);
+		mutex_lock(&ufile->umap_lock);
+		list_for_each_entry_safe(priv, next_priv, &ufile->umaps, list) {
+			struct vm_area_struct *vma = priv->vma;
+
+			if (vma->vm_mm != mm)
+				continue;
+			list_del_init(&priv->list);
+
+			zap_vma_ptes(vma, vma->vm_start,
+				     vma->vm_end - vma->vm_start);
+
+			if (priv->entry) {
+				rdma_user_mmap_entry_put(priv->entry);
+				priv->entry = NULL;
+			}
+		}
+		mutex_unlock(&ufile->umap_lock);
+		mmap_read_unlock(mm);
+		mmput(mm);
+	}
+}
+EXPORT_SYMBOL(uverbs_user_mmap_disassociate);
+
+/**
+ * rdma_user_mmap_disassociate() - disassociate the mmap from the ucontext.
+ *
+ * @ucontext: associated user context.
+ *
+ * This function should be called by drivers that need to disable mmap for
+ * some ucontexts.
+ */
+void rdma_user_mmap_disassociate(struct ib_ucontext *ucontext)
+{
+	struct ib_uverbs_file *ufile = ucontext->ufile;
+
+	/* Racing with uverbs_destroy_ufile_hw */
+	if (!down_read_trylock(&ufile->hw_destroy_rwsem))
+		return;
+
+	uverbs_user_mmap_disassociate(ufile);
+	up_read(&ufile->hw_destroy_rwsem);
+}
+EXPORT_SYMBOL(rdma_user_mmap_disassociate);
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index 33706dad6c0f..ad01fbd52c48 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -149,7 +149,6 @@ void uverbs_disassociate_api(struct uverbs_api *uapi);
 void uverbs_destroy_api(struct uverbs_api *uapi);
 void uapi_compute_bundle_size(struct uverbs_api_ioctl_method *method_elm,
 			      unsigned int num_attrs);
-void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile);
 
 extern const struct uapi_definition uverbs_def_obj_async_fd[];
 extern const struct uapi_definition uverbs_def_obj_counters[];
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index f2fcdb70d903..5a0a0936b9e1 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -45,7 +45,6 @@
 #include <linux/cdev.h>
 #include <linux/anon_inodes.h>
 #include <linux/slab.h>
-#include <linux/sched/mm.h>
 
 #include <linux/uaccess.h>
@@ -806,69 +805,6 @@ static const struct vm_operations_struct rdma_umap_ops = {
 	.fault = rdma_umap_fault,
 };
-void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
-{
-	struct rdma_umap_priv *priv, *next_priv;
-
-	lockdep_assert_held(&ufile->hw_destroy_rwsem);
-
-	while (1) {
-		struct mm_struct *mm = NULL;
-
-		/* Get an arbitrary mm pointer that hasn't been cleaned yet */
-		mutex_lock(&ufile->umap_lock);
-		while (!list_empty(&ufile->umaps)) {
-			int ret;
-
-			priv = list_first_entry(&ufile->umaps,
-						struct rdma_umap_priv, list);
-			mm = priv->vma->vm_mm;
-			ret = mmget_not_zero(mm);
-			if (!ret) {
-				list_del_init(&priv->list);
-				if (priv->entry) {
-					rdma_user_mmap_entry_put(priv->entry);
-					priv->entry = NULL;
-				}
-				mm = NULL;
-				continue;
-			}
-			break;
-		}
-		mutex_unlock(&ufile->umap_lock);
-		if (!mm)
-			return;
-
-		/*
-		 * The umap_lock is nested under mmap_lock since it used within
-		 * the vma_ops callbacks, so we have to clean the list one mm
-		 * at a time to get the lock ordering right. Typically there
-		 * will only be one mm, so no big deal.
-		 */
-		mmap_read_lock(mm);
-		mutex_lock(&ufile->umap_lock);
-		list_for_each_entry_safe (priv, next_priv, &ufile->umaps,
-					  list) {
-			struct vm_area_struct *vma = priv->vma;
-
-			if (vma->vm_mm != mm)
-				continue;
-			list_del_init(&priv->list);
-
-			zap_vma_ptes(vma, vma->vm_start,
-				     vma->vm_end - vma->vm_start);
-
-			if (priv->entry) {
-				rdma_user_mmap_entry_put(priv->entry);
-				priv->entry = NULL;
-			}
-		}
-		mutex_unlock(&ufile->umap_lock);
-		mmap_read_unlock(mm);
-		mmput(mm);
-	}
-}
-
 /*
  * ib_uverbs_open() does not need the BKL:
  *
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index dbdf3b4081b1..570675a3b541 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -41,6 +41,7 @@
 #include <rdma/ib_cache.h>
 #include <rdma/ib_umem.h>
 #include <rdma/uverbs_ioctl.h>
+#include <rdma/ib_verbs.h>
#include "hnae3.h" #include "hclge_main.h" @@ -7826,14 +7827,13 @@ int hns_roce_bond_uninit_client(struct hns_roce_bond_group *bond_grp,
 static void hns_roce_v2_reset_notify_user(struct hns_roce_dev *hr_dev)
 {
-	struct hns_roce_v2_reset_state *state;
-
-	state = (struct hns_roce_v2_reset_state *)hr_dev->reset_kaddr;
+	struct hns_roce_ucontext *uctx, *tmp;
 
-	state->reset_state = HNS_ROCE_IS_RESETTING;
-	state->hw_ready = 0;
-	/* Ensure reset state was flushed in memory */
-	wmb();
+	mutex_lock(&hr_dev->uctx_list_mutex);
+	list_for_each_entry_safe(uctx, tmp, &hr_dev->uctx_list, list) {
+		rdma_user_mmap_disassociate(&uctx->ibucontext);
+	}
+	mutex_unlock(&hr_dev->uctx_list_mutex);
 }
 
 static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5aa21b454afc..bc45cc211839 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2854,6 +2854,7 @@ int rdma_user_mmap_entry_insert_range(struct ib_ucontext *ucontext,
 				      struct rdma_user_mmap_entry *entry,
 				      size_t length, u32 min_pgoff,
 				      u32 max_pgoff);
+void rdma_user_mmap_disassociate(struct ib_ucontext *ucontext);
 
 static inline int
 rdma_user_mmap_entry_insert_exact(struct ib_ucontext *ucontext,
@@ -4601,6 +4602,7 @@ void rdma_roce_rescan_device(struct ib_device *ibdev);
 
 struct ib_ucontext *ib_uverbs_get_ucontext_file(struct ib_uverbs_file *ufile);
 
 int uverbs_destroy_def_handler(struct uverbs_attr_bundle *attrs);
+void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile);
 
 struct net_device *rdma_alloc_netdev(struct ib_device *device, u8 port_num,
 				     enum rdma_netdev_t type, const char *name,
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9QO9C
----------------------------------------------------------------------
The ucontext list is never used in atomic context, so there is no need for a spinlock to protect it. Switch to a mutex, which also allows sleeping while the list is held.
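For illustration, the snippet below is an annotated copy of the old code being removed in the debugfs hunk of this patch: under a spinlock, the walk had to drop the lock around the sleeping print call, leaving the list unprotected exactly where a concurrent dealloc_ucontext() could unlink and free the cached next entry:

	spin_lock(&hr_dev->uctx_list_lock);
	list_for_each_entry_safe(uctx, tmp, &hr_dev->uctx_list, list) {
		/* must drop the spinlock: the call below may sleep */
		spin_unlock(&hr_dev->uctx_list_lock);
		dca_print_pool_stats(&uctx->dca_ctx, uctx->pid, false, file);
		/* window: "tmp" may already be freed at this point */
		spin_lock(&hr_dev->uctx_list_lock);
	}
	spin_unlock(&hr_dev->uctx_list_lock);

With a mutex, the whole walk stays locked, as the hunk below shows.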
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_debugfs.c |  6 ++----
 drivers/infiniband/hw/hns/hns_roce_device.h  |  2 +-
 drivers/infiniband/hw/hns/hns_roce_main.c    | 12 +++++-----
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_debugfs.c b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
index 9d3075ee0d22..c8294a836ee3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_debugfs.c
+++ b/drivers/infiniband/hw/hns/hns_roce_debugfs.c
@@ -287,13 +287,11 @@ static void dca_stats_dev_pool_in_seqfile(struct hns_roce_dev *hr_dev,
 	/* Write kernel DCA pool stats */
 	dca_print_pool_stats(&hr_dev->dca_ctx, 0, true, file);
 	/* Write user DCA pool stats */
-	spin_lock(&hr_dev->uctx_list_lock);
+	mutex_lock(&hr_dev->uctx_list_mutex);
 	list_for_each_entry_safe(uctx, tmp, &hr_dev->uctx_list, list) {
-		spin_unlock(&hr_dev->uctx_list_lock);
 		dca_print_pool_stats(&uctx->dca_ctx, uctx->pid, false, file);
-		spin_lock(&hr_dev->uctx_list_lock);
 	}
-	spin_unlock(&hr_dev->uctx_list_lock);
+	mutex_unlock(&hr_dev->uctx_list_mutex);
 }
 
 struct dca_qp_stats {
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 9a3c79dc6759..884b642f48e6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -1126,7 +1126,7 @@ struct hns_roce_dev {
 	struct hns_roce_dev_debugfs dbgfs; /* debugfs for this dev */
 
 	struct list_head uctx_list; /* list of all uctx on this dev */
-	spinlock_t uctx_list_lock; /* protect @uctx_list */
+	struct mutex uctx_list_mutex; /* protect @uctx_list */
 
 	struct hns_roce_uar priv_uar;
 	const char *irq_names[HNS_ROCE_MAX_IRQ_NUM];
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index a1b4f84675bc..4a76f44a5c98 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -611,9 +611,9 @@ static int hns_roce_alloc_ucontext(struct ib_ucontext *uctx,
 	if (ret)
 		goto error_fail_copy_to_udata;
 
-	spin_lock(&hr_dev->uctx_list_lock);
+	mutex_lock(&hr_dev->uctx_list_mutex);
 	list_add(&context->list, &hr_dev->uctx_list);
-	spin_unlock(&hr_dev->uctx_list_lock);
+	mutex_unlock(&hr_dev->uctx_list_mutex);
 
 	hns_roce_register_uctx_debugfs(hr_dev, context);
@@ -640,9 +640,9 @@ static void hns_roce_dealloc_ucontext(struct ib_ucontext *ibcontext)
 	struct hns_roce_ucontext *context = to_hr_ucontext(ibcontext);
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibcontext->device);
 
-	spin_lock(&hr_dev->uctx_list_lock);
+	mutex_lock(&hr_dev->uctx_list_mutex);
 	list_del(&context->list);
-	spin_unlock(&hr_dev->uctx_list_lock);
+	mutex_unlock(&hr_dev->uctx_list_mutex);
 
 	hns_roce_unregister_uctx_debugfs(context);
@@ -1299,6 +1299,7 @@ static void hns_roce_teardown_hca(struct hns_roce_dev *hr_dev)
 	hns_roce_cleanup_dca(hr_dev);
 	hns_roce_cleanup_bitmap(hr_dev);
+	mutex_destroy(&hr_dev->uctx_list_mutex);
 }
 
 /**
@@ -1319,7 +1320,7 @@ static int hns_roce_setup_hca(struct hns_roce_dev *hr_dev)
 	spin_lock_init(&hr_dev->dip_list_lock);
 
 	INIT_LIST_HEAD(&hr_dev->uctx_list);
-	spin_lock_init(&hr_dev->uctx_list_lock);
+	mutex_init(&hr_dev->uctx_list_mutex);
 
 	INIT_LIST_HEAD(&hr_dev->mtr_unfree_list);
 	spin_lock_init(&hr_dev->mtr_unfree_list_lock);
@@ -1367,6 +1368,7 @@ static int hns_roce_setup_hca(struct hns_roce_dev *hr_dev)
 err_uar_table_free:
 	ida_destroy(&hr_dev->uar_ida.ida);
+	mutex_destroy(&hr_dev->uctx_list_mutex);
 	return ret;
 }
From: wenglianfa <wenglianfa@huawei.com>
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9QO9C
----------------------------------------------------------------------
After sysfs is removed, scc delay_work may continue to run, causing a use-after-free (UAF). To fix it, call cancel_delayed_work_sync() to ensure that scc delay_work has either been canceled or finished executing before teardown continues.
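A minimal sketch of the guarantee relied on here (the example_* names are illustrative, not from the driver): cancel_delayed_work_sync() returns only once the work item is neither pending nor running, so tearing down the state it references afterwards cannot race with it:

#include <linux/kobject.h>
#include <linux/workqueue.h>

struct example_port {
	struct kobject kobj;
	struct delayed_work cfg_dwork;	/* stands in for scc_cfg_dwork */
};

static void example_unregister(struct example_port *port)
{
	/* Cancels a pending run or waits for an in-flight one to finish. */
	cancel_delayed_work_sync(&port->cfg_dwork);
	/* The work can no longer touch *port, so releasing it is UAF-safe. */
	kobject_put(&port->kobj);
}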
Fixes: 523f34d81ea7 ("RDMA/hns: Support congestion control algorithm parameter configuration")
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_sysfs.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/infiniband/hw/hns/hns_roce_sysfs.c b/drivers/infiniband/hw/hns/hns_roce_sysfs.c
index b6a9369d8147..a51423b94dc1 100644
--- a/drivers/infiniband/hw/hns/hns_roce_sysfs.c
+++ b/drivers/infiniband/hw/hns/hns_roce_sysfs.c
@@ -429,10 +429,15 @@ int hns_roce_create_port_files(struct ib_device *ibdev, u8 port_num,
 static void hns_roce_unregister_port_sysfs(struct hns_roce_dev *hr_dev,
 					   u8 port_num)
 {
+	struct hns_roce_scc_param *scc_param;
 	struct hns_roce_port *pdata;
+	int i;
 
 	pdata = &hr_dev->port_data[port_num];
 	sysfs_remove_groups(&pdata->kobj, hns_attr_port_groups);
+	scc_param = pdata->scc_param;
+	for (i = 0; i < HNS_ROCE_SCC_ALGO_TOTAL; i++)
+		cancel_delayed_work_sync(&scc_param[i].scc_cfg_dwork);
 	kobject_put(&pdata->kobj);
 }
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9QO9C
----------------------------------------------------------------------
Currently, POE channels are allocated after IB device registration. Other modules that try to register POE in the window between the IB device being registered and the POE channels being allocated will therefore fail. Allocate the POE channels before registering the IB device.
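The ordering rule, sketched with the two calls from this driver (error handling abbreviated; see the hunk below for the actual change): anything another module may look up as soon as the IB device is published must already exist:

	/*
	 * Old (broken) order:               New (fixed) order:
	 *   hns_roce_register_device()        hns_roce_register_poe_ch()
	 *   <- peer registers POE: fails      hns_roce_register_device()
	 *   hns_roce_register_poe_ch()        <- peer registers POE: works
	 */
	hns_roce_register_poe_ch(hr_dev);	/* channels ready first */
	ret = hns_roce_register_device(hr_dev);	/* device becomes visible */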
Fixes: 85b26a97adfe ("RDMA/hns: Support STARS mode QP")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 4a76f44a5c98..f639f52e6386 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -1528,11 +1528,11 @@ int hns_roce_init(struct hns_roce_dev *hr_dev)
 		}
 	}
 
+	hns_roce_register_poe_ch(hr_dev);
 	ret = hns_roce_register_device(hr_dev);
 	if (ret)
 		goto error_failed_register_device;
 
-	hns_roce_register_poe_ch(hr_dev);
 	hns_roce_register_debugfs(hr_dev);
 
 	return 0;