Per the Downstream Port Containment Related Enhancements ECN [1],
Table 4-6 ("Interpretation of _OSC Control Field Returned Value"),
for bit 7 of the _OSC control return value:
"Firmware sets this bit to 1 to grant the OS control over PCI Express
Downstream Port Containment configuration."
"If control of this feature was requested and denied,
or was not requested, the firmware returns this bit set to 0."
We store bit 7 of the _OSC control return value in host->native_dpc and
check it before enabling the DPC service, since the firmware may not
grant the control.
[1] Downstream Port Containment Related Enhancements ECN,
Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
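For illustration, here is a minimal stand-alone sketch of the bit-7 check
(the OSC_PCI_EXPRESS_DPC_CONTROL name and value are assumptions made for
this sketch, not quoted from the spec or the kernel headers):

#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of the _OSC control return value (assumed mask for this sketch) */
#define OSC_PCI_EXPRESS_DPC_CONTROL	(1U << 7)

static bool host_native_dpc(uint32_t osc_control_ret)
{
	/*
	 * Firmware sets bit 7 only when OS control over DPC was requested
	 * and granted; if denied or never requested, the bit stays 0.
	 */
	return osc_control_ret & OSC_PCI_EXPRESS_DPC_CONTROL;
}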
Signed-off-by: Yicong Yang <yangyicong(a)hisilicon.com>
---
Changes since v1:
- use correct reference for _OSC control return value
drivers/pci/pcie/portdrv_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed664..7445d03 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -253,7 +253,8 @@ static int get_port_device_capability(struct pci_dev *dev)
*/
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
pci_aer_available() &&
- (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+ (pcie_ports_dpc_native ||
+ ((services & PCIE_PORT_SERVICE_AER) && host->native_dpc)))
services |= PCIE_PORT_SERVICE_DPC;
if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
--
2.8.1
On an ARM64 system with an SMMUv3 implementation that fully supports the
Broadcast TLB Maintenance (BTM) feature as part of the Distributed
Virtual Memory (DVM) protocol, the CPU TLB invalidate instructions are
received by the SMMUv3. This is very useful when the SMMUv3 shares the
page tables with the CPU (e.g. the guest SVA use case). For this to work,
the SMMU must use the same VMID that is allocated by KVM to configure
the stage 2 translations. At present, KVM VMID allocations are recycled
on rollover and may change as a result. This creates issues if we
have to share the KVM VMID with the SMMU.
Please see the discussion here,
https://lore.kernel.org/linux-iommu/20200522101755.GA3453945@myrica/
This series proposes a way to share the VMID between KVM and the IOMMU
driver by:
1. Splitting the KVM VMID space into two equal halves based on the
   command line option "kvm-arm.pinned_vmid_enable".
2. The first half of the VMID space follows the normal recycle-on-rollover
   policy.
3. The second half of the VMID space doesn't roll over and is used to
   allocate pinned VMIDs (see the sketch after this list).
4. Providing a helper function to retrieve the KVM instance associated
   with a device (if it is part of a vfio group).
5. Introducing generic interfaces to get/put pinned KVM VMIDs.
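For illustration, a minimal stand-alone sketch of the split (the 16-bit
VMID width, the constants and the helper names are assumptions for this
sketch, not the proposed kernel interfaces):

#include <stdbool.h>
#include <stdint.h>

#define VMID_BITS		16
#define VMID_FIRST_PINNED	(1U << (VMID_BITS - 1))	/* upper half starts here */

static uint16_t next_rollover_vmid;		/* lower half, wraps on rollover */
static bool pinned_in_use[VMID_FIRST_PINNED];	/* upper half, never recycled */

static uint16_t alloc_rollover_vmid(void)
{
	/* Lower half: values may be recycled when the generation rolls over */
	next_rollover_vmid = (next_rollover_vmid + 1) % VMID_FIRST_PINNED;
	return next_rollover_vmid;
}

static int alloc_pinned_vmid(uint16_t *vmid)
{
	/* Upper half: once handed out, a VMID stays stable until released */
	for (uint32_t i = 0; i < VMID_FIRST_PINNED; i++) {
		if (!pinned_in_use[i]) {
			pinned_in_use[i] = true;
			*vmid = VMID_FIRST_PINNED + i;
			return 0;
		}
	}
	return -1;	/* pinned space exhausted */
}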
Open Items:
1. I couldn't figure out a way to determine whether a platform actually
   fully supports DVM/BTM or not. I'm not sure we can decide based on the
   SMMUv3 BTM feature bit alone. Perhaps we can get it from firmware
   via IORT?
2. The current splitting of the VMID space is only one way to do this and
   probably not the best. Maybe we can follow the pinned-ASID method used
   in the SVA code. Suggestions are welcome here.
3. The detach_pasid_table() interface is not very clear to me, as the
   current QEMU prototype is not using it. This requires fixing on my side.
This is based on Jean-Philippe's SVA series [1] and Eric's SMMUv3 dual-stage
support series [2].
The branch with the whole vSVA + BTM solution is here:
https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva-btm-r…
This is lightly tested on a HiSilicon D06 platform with the uacce/zip dev
test tool:
./zip_sva_per -k tlb
Thanks,
Shameer
1. https://github.com/Linaro/linux-kernel-uadk/commits/uacce-devel-5.10
2. https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.auger@redha…
Shameer Kolothum (5):
vfio: Add a helper to retrieve kvm instance from a dev
KVM: Add generic infrastructure to support pinned VMIDs
KVM: ARM64: Add support for pinned VMIDs
iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
KVM: arm64: Make sure pinned vmid is released on VM exit
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 116 +++++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++-
drivers/vfio/vfio.c | 12 ++
include/linux/kvm_host.h | 17 +++
include/linux/vfio.h | 1 +
virt/kvm/Kconfig | 2 +
virt/kvm/kvm_main.c | 25 +++++
9 files changed, 220 insertions(+), 5 deletions(-)
--
2.17.1
The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
cluster has 4 CPUs. All clusters share the L3 cache data while each cluster
has its own local L3 tag. In addition, the CPUs within a cluster share some
internal system bus. This means the cache is much more affine inside one
cluster than across clusters.
+-----------------------------+    +--------+
| cluster 0: CPU0 CPU1        |----| L3 tag |--+
|            CPU2 CPU3        |    +--------+  |
+-----------------------------+                |
+-----------------------------+    +--------+  |
| cluster 1: CPU4 CPU5        |----| L3 tag |--+    +---------+
|            CPU6 CPU7        |    +--------+  +----| L3 data |
+-----------------------------+                |    +---------+
              ...                     ...      |
+-----------------------------+    +--------+  |
| cluster 5: CPU20 CPU21      |----| L3 tag |--+
|            CPU22 CPU23      |    +--------+
+-----------------------------+
The following small program shows the performance impact of running it
within one cluster versus across two clusters:
#include <pthread.h>

struct foo {
	int x;
	int y;
} f;

void *thread1_fun(void *param)
{
	int s = 0;

	/* Repeatedly read f.x, which shares a cache line with f.y */
	for (int i = 0; i < 0xfffffff; i++)
		s += f.x;

	return NULL;
}

void *thread2_fun(void *param)
{
	/* Repeatedly write f.y, bouncing the shared cache line */
	for (int i = 0; i < 0xfffffff; i++)
		f.y++;

	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid1, tid2;

	pthread_create(&tid1, NULL, thread1_fun, NULL);
	pthread_create(&tid2, NULL, thread2_fun, NULL);
	pthread_join(tid1, NULL);
	pthread_join(tid2, NULL);

	return 0;
}
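(The program was built with something like "gcc -O0 -pthread", with
optimization disabled so that the loads and stores are not optimized
away; the exact build flags are an assumption here.)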
While running this program in one cluster, it takes:
$ time taskset -c 0,1 ./a.out
real 0m0.832s
user 0m1.649s
sys 0m0.004s
In contrast, it takes much more time if we run the same program
across two clusters:
$ time taskset -c 0,4 ./a.out
real 0m1.133s
user 0m1.960s
sys 0m0.000s
0.832/1.133 = 73%: running within one cluster takes only 73% of the time
needed across clusters, which is a significant difference.
Hackbench running on 4 CPUs within a single cluster versus 4 CPUs across
different clusters also shows a large contrast:
* inside a cluster:
root@ubuntu:~# taskset -c 0,1,2,3 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 4.285
* across clusters:
root@ubuntu:~# taskset -c 0,4,8,12 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 5.524
The scores are 4.285 vs. 5.524; a shorter time means better performance.
All this testing implies that we should let the Linux scheduler use
this topology to make better load-balancing and WAKE_AFFINE decisions.
However, the current scheduler has no notion of clusters.
This patchset first exposes the cluster topology, then adds a sched
domain for clusters. While it is named "cluster", architectures and
machines can define the exact meaning of a cluster as long as they have
some resources shared below the LLC and can leverage the affinity of
those resources to achieve better scheduling performance.
-v3:
- rebased against 5.11-rc2
- addressed comments from Valentin Schneider, Peter Zijlstra,
  Vincent Guittot, Mel Gorman and others:
  * moved the scheduler changes from arm64 to a common place for all
    architectures.
  * added the SD_SHARE_CLS_RESOURCES sd_flag specifying the sched_domain
    from which select_idle_cpu() should begin to scan.
  * removed the redundant select_idle_cluster() function since all the
    code is in select_idle_cpu() now; this also avoids scanning cluster
    CPUs twice, as the v2 code did.
  * redid the hackbench tests within one NUMA node after the above changes.
Valentin suggested that select_idle_cpu() could begin to scan from the
domain with SD_SHARE_PKG_RESOURCES. Changing it like that might be too
aggressive and limit the spreading of tasks. Thus, this patchset lets
architectures and machines decide where to start by adding a new
SD_SHARE_CLS_RESOURCES flag, as sketched below.
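For illustration, a small self-contained sketch of the idea (the struct,
the flag values and the helper name are assumptions for this sketch; in
the real code the walk is over the per-CPU sched_domain hierarchy):

#include <stddef.h>

#define SD_SHARE_PKG_RESOURCES	0x1	/* CPUs share the LLC */
#define SD_SHARE_CLS_RESOURCES	0x2	/* CPUs share cluster resources */

struct sched_domain_lvl {
	unsigned int flags;
	struct sched_domain_lvl *parent;	/* next wider topology level */
};

/*
 * Return the level an idle-CPU scan should start from: the cluster
 * level when one exists, otherwise fall back to the LLC level.
 */
static struct sched_domain_lvl *
scan_start_level(struct sched_domain_lvl *lowest)
{
	struct sched_domain_lvl *lvl, *fallback = NULL;

	for (lvl = lowest; lvl; lvl = lvl->parent) {
		if (lvl->flags & SD_SHARE_CLS_RESOURCES)
			return lvl;
		if (!fallback && (lvl->flags & SD_SHARE_PKG_RESOURCES))
			fallback = lvl;
	}
	return fallback;
}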
Barry Song (1):
scheduler: add scheduler level for clusters
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die.
Documentation/admin-guide/cputopology.rst | 26 +++++++++++---
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 ++
drivers/acpi/pptt.c | 60 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 14 ++++++++
drivers/base/topology.c | 10 ++++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/sd_flags.h | 9 +++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
kernel/sched/fair.c | 27 ++++++++++----
kernel/sched/topology.c | 6 ++++
13 files changed, 181 insertions(+), 10 deletions(-)
--
2.7.4
CPU cache corrected errors are occasionally detected on a few of our
ARM64 hardware boards. Though this is rare, the possibility of CPU
cache errors occurring frequently cannot be ruled out. Detecting
failures early, by monitoring the corrected cache errors for frequent
occurrences and taking preventive action, could prevent more serious
hardware faults.
On Intel architectures, cache corrected errors are reported and the
affected cores are offlined in an architecture-specific way:
http://www.mcelog.org/cache.html
However, for firmware-first error reporting, specifically on the ARM64
architecture, there is no provision for reporting the corrected cache
error count to user space and taking preventive action such as
offlining the affected cores.
For this purpose, it was suggested to create a CPU EDAC device for the
CPU caches that reports the cache error count for firmware-first error
reporting.
A user-space application could monitor the recorded corrected error
count for early hardware failure detection and take preventive action,
such as offlining the corresponding CPU core(s), as sketched below.
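For illustration, a minimal user-space sketch of such a monitor (the EDAC
sysfs path for the corrected error count and the threshold are assumptions
for this sketch; only the CPU online attribute is a standard path):

#include <stdio.h>

#define CE_THRESHOLD	100	/* example threshold, not a recommended value */

static long read_long(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	int cpu = 0;	/* monitor CPU0 as an example */
	char cnt_path[128], online_path[128];

	/* Hypothetical sysfs attribute exposing the per-CPU cache CE count */
	snprintf(cnt_path, sizeof(cnt_path),
		 "/sys/devices/system/edac/cpu/cpu%d/ce_count", cpu);
	snprintf(online_path, sizeof(online_path),
		 "/sys/devices/system/cpu/cpu%d/online", cpu);

	if (read_long(cnt_path) > CE_THRESHOLD) {
		FILE *f = fopen(online_path, "w");

		if (f) {
			fputs("0", f);	/* offline the affected core */
			fclose(f);
		}
	}
	return 0;
}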
Changes:
RFC V1 -> RFC V2:
1. Addressed feedback from Boris:
   1.1 Added the rationale for this patch.
   1.2 Changed "CPU errors" to "CPU cache errors" in drivers/edac/Kconfig.
   1.3 Changed the EDAC cache list to percpu variables.
   1.4 Made the configuration depend on ARM64.
   1.5 Moved the discovery of cacheinfo to ghes_scan_system().
2. Changes in the descriptions.
Shiju Jose (2):
EDAC/ghes: Add EDAC device for reporting the CPU cache errors
ACPI / APEI: Add reporting ARM64 CPU cache corrected error count
Documentation/ABI/testing/sysfs-devices-edac | 15 ++
drivers/acpi/apei/ghes.c | 76 +++++++-
drivers/edac/Kconfig | 12 ++
drivers/edac/ghes_edac.c | 186 +++++++++++++++++++
include/acpi/ghes.h | 27 +++
include/linux/cper.h | 4 +
6 files changed, 316 insertions(+), 4 deletions(-)
--
2.17.1
BATCHED_UNMAP_TLB_FLUSH is used on x86 to do batched TLB shootdown by
sending one IPI to flush all TLB entries after unmapping pages, rather
than sending an IPI to flush each individual entry.
On arm64, TLB shootdown is done by hardware. Flush instructions are
inner-shareable. Local flushes are limited to boot time (one per CPU)
and to when a task gets a new ASID.
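For illustration, a simplified sketch of what such a hardware-broadcast
invalidate looks like (this is not the kernel's flush_tlb_* code; the
operand encoding and barriers are reduced to the essentials and it only
builds for arm64):

static inline void tlbi_va_inner_shareable(unsigned long addr)
{
	/*
	 * The "is" (inner-shareable) TLBI variants are broadcast by the
	 * hardware to all CPUs in the inner-shareable domain, so no IPI
	 * is needed to invalidate remote TLBs.
	 */
	asm volatile("dsb ishst\n"
		     "tlbi vaae1is, %0\n"
		     "dsb ish\n"
		     "isb"
		     : : "r" (addr >> 12) : "memory");
}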
So marking this feature as "TODO" is not appropriate, and ".." isn't
right either. This patch therefore adds an "N/A" entry for features
which are not needed on some architectures.
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Signed-off-by: Barry Song <song.bao.hua(a)hisilicon.com>
---
Documentation/features/arch-support.txt | 1 +
Documentation/features/vm/TLB/arch-support.txt | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/features/arch-support.txt b/Documentation/features/arch-support.txt
index d22a1095e661..118ae031840b 100644
--- a/Documentation/features/arch-support.txt
+++ b/Documentation/features/arch-support.txt
@@ -8,4 +8,5 @@ The meaning of entries in the tables is:
| ok | # feature supported by the architecture
|TODO| # feature not yet supported by the architecture
| .. | # feature cannot be supported by the hardware
+ | N/A| # feature doesn't apply to the architecture
diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt
index 30f75a79ce01..0d070f9f98d8 100644
--- a/Documentation/features/vm/TLB/arch-support.txt
+++ b/Documentation/features/vm/TLB/arch-support.txt
@@ -9,7 +9,7 @@
| alpha: | TODO |
| arc: | TODO |
| arm: | TODO |
- | arm64: | TODO |
+ | arm64: | N/A |
| c6x: | .. |
| csky: | TODO |
| h8300: | .. |
--
2.25.1
On 21/12/2020 13:04, Jiahui Cen wrote:
>> On 21/12/2020 03:24, Jiahui Cen wrote:
>>> Hi John,
>>>
>>> On 2020/12/18 18:40, John Garry wrote:
>>>> On 18/12/2020 06:23, Jiahui Cen wrote:
>>>>> Since the [start, end) is a half-open interval, a range with the end equal
>>>>> to the start of another range should not be considered as overlapped.
>>>>>
>>>>> Signed-off-by: Jiahui Cen <cenjiahui(a)huawei.com>
>>>>> ---
>>>>> lib/logic_pio.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/logic_pio.c b/lib/logic_pio.c
>>>>> index f32fe481b492..445d611f1dc1 100644
>>>>> --- a/lib/logic_pio.c
>>>>> +++ b/lib/logic_pio.c
>>>>> @@ -57,7 +57,7 @@ int logic_pio_register_range(struct logic_pio_hwaddr *new_range)
>>>>> new_range->flags == LOGIC_PIO_CPU_MMIO) {
>>>>> /* for MMIO ranges we need to check for overlap */
>>>>> if (start >= range->hw_start + range->size ||
>>>>> - end < range->hw_start) {
>>>>> + end <= range->hw_start) {
>>>> It looks like your change is correct, but should not really have an impact in practice since:
>>>> a. BIOSes generally list ascending IO port CPU addresses
>>>> b. there is space between IO port CPU address regions
>>>>
>>>> Have you seen a problem here?
>>>>
>>> No serious problem. I found it while working on adding support for the
>>> PCI expander bridge for Arm in QEMU. I found that the IO window of some
>>> extended root buses could not be registered when I inserted the extended
>>> buses' _CRS info into the DSDT table in the x86 way, which does not sort
>>> the buses.
>>>
>>> Though root buses should be sorted in QEMU, would it be better to accept
>>> those non-ascending IO windows?
>>>
>> ok, so it seems that you have seen a real problem, and this issue is not just detected by code analysis.
>>
>>> BTW, for b, it seems there is no space between IO windows of different
>>> root buses generated by EDK2. Or maybe I missed something obvious.
>> I don't know about that. Anyway, your change looks ok.
>>
>> Reviewed-by: John Garry <john.garry(a)huawei.com>
>>
>> BTW, for your virt env, will there be a requirement to unregister PCI MMIO ranges? Currently we don't see that in the non-virt world.
>>
> Thanks for your review.
>
> And currently there is no such requirement in my virt env.
>
I am not sure what happened to this patch, but I plan on sending some
patches in this area soon - do you want me to include this one?
Thanks,
John
Hi Thomas Monjalon, Ferruh Yigit and others,
I'm analyzing multi-process support in EAL, and I have some questions I'd
like to ask you.
Firstly, after rte_eal_init() is executed, the master and slave processes
start successfully, and traffic is continuously sent using the tester. If
you run kill -9 to stop the slave process, then restart that process and
start receiving and sending packets, how can we ensure that the EAL
resources of the killed slave process are cleaned up?
Second, how should the remove function be invoked to clear the probe
resources of the slave process after the slave process exits?
Finally, I found that the rte_eal_cleanup() call did not unregister the
mp action after the process exited.
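To make the scenario concrete, here is a minimal sketch of the slave
process flow in question (only the standard rte_eal_init() and
rte_eal_cleanup() calls are shown; the packet path is omitted):

#include <rte_eal.h>

int main(int argc, char **argv)
{
	/* The slave (secondary) process attaches to the shared EAL state */
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* ... receive and send packets ... */

	/*
	 * This only runs on a clean exit; a "kill -9" never reaches this
	 * point, so the mp action and other per-process EAL resources are
	 * left behind.
	 */
	rte_eal_cleanup();
	return 0;
}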
I look forward to your response.
Thanks
Lijun Ou