[PATCH v2] PCI/DPC: Check host->native_dpc before enabling the DPC service
by Yicong Yang
Per Downstream Port Containment Related Enhancements ECN[1]
Table 4-6, Interpretation of _OSC Control Field Returned Value,
for bit 7 of _OSC control return value:
"Firmware sets this bit to 1 to grant the OS control over PCI Express
Downstream Port Containment configuration."
"If control of this feature was requested and denied,
or was not requested, the firmware returns this bit set to 0."
Bit 7 of the _OSC control return value is stored in host->native_dpc;
check it before enabling the DPC service, as the firmware may not have
granted control.
[1] Downstream Port Containment Related Enhancements ECN,
Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
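For context, below is a minimal sketch of how bit 7 of the returned _OSC
control field would end up in host->native_dpc during host-bridge setup.
The macro name and the helper are illustrative assumptions for this sketch,
not part of this patch.

#include <linux/pci.h>

/* Bit 7 of the _OSC control field, per the DPC ECN; the define name here
 * is an assumption made for this sketch. */
#define OSC_PCI_EXPRESS_DPC_CONTROL	0x0080

/* Illustrative only: after _OSC negotiation the firmware has either granted
 * the OS control over DPC or not; the flag checked by this patch records
 * that result on the host bridge. */
static void record_dpc_ownership(struct pci_host_bridge *host,
				 u32 osc_control_set)
{
	host->native_dpc = !!(osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL);
}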
Signed-off-by: Yicong Yang <yangyicong(a)hisilicon.com>
---
Change since v1:
- use correct reference for _OSC control return value
drivers/pci/pcie/portdrv_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed664..7445d03 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -253,7 +253,8 @@ static int get_port_device_capability(struct pci_dev *dev)
*/
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
pci_aer_available() &&
- (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+ (pcie_ports_dpc_native ||
+ ((services & PCIE_PORT_SERVICE_AER) && host->native_dpc)))
services |= PCIE_PORT_SERVICE_DPC;
if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
--
2.8.1
[RFC PATCH 0/5] KVM/ARM64 Add support for pinned VMIDs
by Shameer Kolothum
On an ARM64 system with an SMMUv3 implementation that fully supports
the Broadcast TLB Maintenance (BTM) feature as part of the Distributed
Virtual Memory (DVM) protocol, CPU TLB invalidate instructions are also
received by the SMMUv3. This is very useful when the SMMUv3 shares page
tables with the CPU (e.g. the guest SVA use case). For this to work,
the SMMU must use the same VMID that KVM allocates to configure the
stage 2 translations. At present, KVM VMID allocations are recycled on
rollover and may change as a result. This creates issues if we have to
share the KVM VMID with the SMMU.
Please see the discussion here,
https://lore.kernel.org/linux-iommu/20200522101755.GA3453945@myrica/
This series proposes a way to share the VMID between KVM and the IOMMU
driver by the following steps (a conceptual sketch of the split
allocation is shown after the list):
1. Splitting the KVM VMID space into two equal halves based on the
command line option "kvm-arm.pinned_vmid_enable".
2. First half of the VMID space follows the normal recycle on rollover
policy.
3. Second half of the VMID space doesn't roll over and is used to
allocate pinned VMIDs.
4. Provides a helper function to retrieve the KVM instance associated
with a device (if it is part of a VFIO group).
5. Introduces generic interfaces to get/put pinned KVM VMIDs.
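The sketch below illustrates the split described in items 1-3 as a
standalone user-space program. The VMID width, the half sizes and all names
are simplifications for illustration only; the real allocator also has to
handle generations, TLB invalidation and locking.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VMID_BITS	16			/* 8 or 16 on real hardware */
#define VMID_COUNT	(1U << VMID_BITS)
#define PINNED_FIRST	(VMID_COUNT / 2)	/* second half is pinned */

static uint32_t next_rolling = 1;		/* VMID 0 kept reserved */
static bool pinned_used[VMID_COUNT / 2];

/* First half: recycled on rollover.  A real implementation would bump a
 * generation counter and invalidate TLBs at the rollover point. */
static uint32_t alloc_rolling_vmid(void)
{
	uint32_t vmid = next_rolling++;

	if (next_rolling >= PINNED_FIRST)
		next_rolling = 1;
	return vmid;
}

/* Second half: never recycled, so an SMMU sharing the stage 2 tables can
 * keep programming the same VMID for the lifetime of the VM. */
static int alloc_pinned_vmid(void)
{
	for (uint32_t i = 0; i < VMID_COUNT / 2; i++) {
		if (!pinned_used[i]) {
			pinned_used[i] = true;
			return (int)(PINNED_FIRST + i);
		}
	}
	return -1;				/* pinned space exhausted */
}

static void release_pinned_vmid(uint32_t vmid)
{
	pinned_used[vmid - PINNED_FIRST] = false;
}

int main(void)
{
	int pinned = alloc_pinned_vmid();

	printf("rolling VMID: %u, pinned VMID: %d\n",
	       alloc_rolling_vmid(), pinned);
	release_pinned_vmid((uint32_t)pinned);
	return 0;
}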
Open Items:
1. I couldn't figure out a way to determine whether a platform actually
fully supports DVM/BTM or not. Not sure we can take a call based on
SMMUv3 BTM feature bit alone. Probably we can get it from firmware
via IORT?
2. The current splitting of VMID space is only one way to do this and
probably not the best. Maybe we can follow the pinned ASID method used
in SVA code. Suggestions welcome here.
3. The detach_pasid_table() interface is not very clear to me as the current
Qemu prototype is not using that. This requires fixing from my side.
This is based on Jean-Philippe's SVA series[1] and Eric's SMMUv3 dual-stage
support series[2].
The branch with the whole vSVA + BTM solution is here,
https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva-btm...
This is lightly tested on a HiSilicon D06 platform with the uacce/zip dev test tool:
./zip_sva_per -k tlb
Thanks,
Shameer
1. https://github.com/Linaro/linux-kernel-uadk/commits/uacce-devel-5.10
2. https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.auger@red...
Shameer Kolothum (5):
vfio: Add a helper to retrieve kvm instance from a dev
KVM: Add generic infrastructure to support pinned VMIDs
KVM: ARM64: Add support for pinned VMIDs
iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
KVM: arm64: Make sure pinned vmid is released on VM exit
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 116 +++++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++-
drivers/vfio/vfio.c | 12 ++
include/linux/kvm_host.h | 17 +++
include/linux/vfio.h | 1 +
virt/kvm/Kconfig | 2 +
virt/kvm/kvm_main.c | 25 +++++
9 files changed, 220 insertions(+), 5 deletions(-)
--
2.17.1
[RFC PATCH v3 0/2] scheduler: expose the topology of clusters and add cluster scheduler
by Barry Song
The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and
each cluster has 4 CPUs. All clusters share the L3 cache data, while each
cluster has a local L3 tag. In addition, the CPUs within a cluster share
some internal system bus. This means the cache is much more affine inside
one cluster than across clusters.
+-----------------------------------+          +----------+
| +------+  +------+                |          |          |
| | CPU0 |  | CPU1 |   cluster 0    |          |          |
| +------+  +------+   +--------+   |          |          |
|                      | L3 tag +---+----------+          |
| +------+  +------+   +--------+   |          |          |
| | CPU2 |  | CPU3 |                |          |          |
| +------+  +------+                |          |          |
+-----------------------------------+          |    L3    |
+-----------------------------------+          |   data   |
| +------+  +------+                |          |          |
| | CPU4 |  | CPU5 |   cluster 1    |          |          |
| +------+  +------+   +--------+   |          |          |
|                      | L3 tag +---+----------+          |
| +------+  +------+   +--------+   |          |          |
| | CPU6 |  | CPU7 |                |          |          |
| +------+  +------+                |          |          |
+-----------------------------------+          +----------+
(The remaining four clusters in each NUMA node have the same layout:
a private L3 tag per cluster, with the L3 data shared by all clusters.)
Through the following small program, you can see the performance impact of
running it in one cluster and across two clusters:
#include <pthread.h>

struct foo {
    int x;
    int y;
} f;

/* f.x and f.y sit in the same cache line, so the reader below and the
 * writer in thread2_fun keep bouncing that line between their CPUs; how
 * expensive the bouncing is depends on whether the two CPUs share a
 * cluster. */
void *thread1_fun(void *param)
{
    int s = 0;

    for (int i = 0; i < 0xfffffff; i++)
        s += f.x;
    return NULL;
}

void *thread2_fun(void *param)
{
    int s = 0;

    for (int i = 0; i < 0xfffffff; i++)
        f.y++;
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t tid1, tid2;

    pthread_create(&tid1, NULL, thread1_fun, NULL);
    pthread_create(&tid2, NULL, thread2_fun, NULL);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}
While running this program in one cluster, it takes:
$ time taskset -c 0,1 ./a.out
real 0m0.832s
user 0m1.649s
sys 0m0.004s
By contrast, it takes much more time if we run the same program
across two clusters:
$ time taskset -c 0,4 ./a.out
real 0m1.133s
user 0m1.960s
sys 0m0.000s
0.832/1.133 = 73%, which is a huge difference.
Also, hackbench running on 4 CPUs within a single cluster versus 4 CPUs
spread across different clusters shows a large contrast:
* inside a cluster:
root@ubuntu:~# taskset -c 0,1,2,3 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 4.285
* across clusters:
root@ubuntu:~# taskset -c 0,4,8,12 hackbench -p -T -l 20000 -g 1
Running in threaded mode with 1 groups using 40 file descriptors each
(== 40 tasks)
Each sender will pass 20000 messages of 100 bytes
Time: 5.524
The scores are 4.285 vs. 5.524; a shorter time means better performance.
All of this testing implies that we should let the Linux scheduler use
this topology to make better load-balancing and WAKE_AFFINE decisions.
However, the current scheduler has no idea of clusters.
This patchset first exposes the cluster topology, then adds a sched
domain for clusters. While it is named "cluster", architectures and
machines can define the exact meaning of cluster as long as they have
some resources shared below the LLC and can leverage the affinity of
those resources to achieve better scheduling performance.
-v3:
- rebased against 5.11-rc2
- addressed the comments from Valentin Schneider, Peter Zijlstra,
Vincent Guittot, Mel Gorman and others:
* moved the scheduler changes from arm64 to the common place for all
architectures.
* added the SD_SHARE_CLS_RESOURCES sd_flag, specifying the sched_domain
from which select_idle_cpu() should begin to scan
* removed the redundant select_idle_cluster() function since all the
code is in select_idle_cpu() now; this also avoids scanning the cluster
CPUs twice, as the v2 code did
* redid the hackbench test within one NUMA node after the above changes
Valentin suggested that select_idle_cpu() could begin to scan from the
domain with SD_SHARE_PKG_RESOURCES. Changing it like that might be too
aggressive and limit the spreading of tasks. Thus, this patchset lets
architectures and machines decide where to start by adding the new
SD_SHARE_CLS_RESOURCES flag.
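For orientation, here is a sketch of how a cluster level could slot into the
scheduler's default topology table together with the new flag.
cpu_clustergroup_mask(), cpu_cluster_flags() and CONFIG_SCHED_CLUSTER are
assumed names for what the series introduces; the real hook-up lives in the
patches themselves.

#include <linux/sched/topology.h>
#include <linux/topology.h>

#ifdef CONFIG_SCHED_CLUSTER
static inline int cpu_cluster_flags(void)
{
	return SD_SHARE_CLS_RESOURCES;
}
#endif

/* Illustrative topology table: the cluster level sits between SMT and MC,
 * so CPUs that share the L3 tag / internal bus are grouped below the LLC
 * domain and select_idle_cpu() can start scanning from this level. */
static struct sched_domain_topology_level example_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_CLUSTER
	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};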
Barry Song (1):
scheduler: add scheduler level for clusters
Jonathan Cameron (1):
topology: Represent clusters of CPUs within a die.
Documentation/admin-guide/cputopology.rst | 26 +++++++++++---
arch/arm64/Kconfig | 7 ++++
arch/arm64/kernel/topology.c | 2 ++
drivers/acpi/pptt.c | 60 +++++++++++++++++++++++++++++++
drivers/base/arch_topology.c | 14 ++++++++
drivers/base/topology.c | 10 ++++++
include/linux/acpi.h | 5 +++
include/linux/arch_topology.h | 5 +++
include/linux/sched/sd_flags.h | 9 +++++
include/linux/sched/topology.h | 7 ++++
include/linux/topology.h | 13 +++++++
kernel/sched/fair.c | 27 ++++++++++----
kernel/sched/topology.c | 6 ++++
13 files changed, 181 insertions(+), 10 deletions(-)
--
2.7.4
[PATCH 0/2] meson build fixes for hns3
by Lijun Ou
This series fixes the meson build for the Kunpeng 920 and
Kunpeng 930 boards.
Chengchang Tang (2):
config/arm: fix Hisilicon kunpeng920 SoC build
config/arm: fix Hisilicon kunpeng930 Soc build
config/arm/arm64_kunpeng920_linux_gcc | 19 +++++++++++++++
config/arm/arm64_kunpeng930_linux_gcc | 19 +++++++++++++++
config/arm/meson.build | 27 ++++++++++++++++++++++
.../linux_gsg/cross_build_dpdk_for_arm64.rst | 5 ++++
4 files changed, 70 insertions(+)
create mode 100644 config/arm/arm64_kunpeng920_linux_gcc
create mode 100644 config/arm/arm64_kunpeng930_linux_gcc
--
2.7.4
[PATCH] drivers/perf: Simplify the SMMUv3 PMU event attributes
by Qi Liu
For each PMU event, there is currently an SMMU_EVENT_ATTR(xx, XX)
definition plus a matching &smmu_event_attr_xx.attr.attr entry in the
event list. Let's redefine SMMU_EVENT_ATTR so the attributes can be
declared directly inside smmu_pmu_events[], simplifying the list.
Signed-off-by: Qi Liu <liuqi115(a)huawei.com>
---
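As an aside, here is a minimal user-space illustration of the
compound-literal idiom the new macro relies on: taking the address of a
file-scope compound literal yields an object with static storage duration,
so its address is a valid static initializer. The struct and macro below are
stand-ins invented for this sketch, not the kernel's perf types.

#include <stdio.h>

struct event_attr {				/* stand-in, not the kernel type */
	const char *name;
	unsigned long id;
};

/* Same idiom as the new SMMU_EVENT_ATTR: no separately named variable is
 * required per event. */
#define EVENT_ATTR(n, config) \
	(&(struct event_attr){ .name = #n, .id = (config) })

static struct event_attr *events[] = {
	EVENT_ATTR(cycles, 0),
	EVENT_ATTR(transaction, 1),
	NULL,
};

int main(void)
{
	for (struct event_attr **e = events; *e; e++)
		printf("%s: event=0x%02lx\n", (*e)->name, (*e)->id);
	return 0;
}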
drivers/perf/arm_smmuv3_pmu.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 8ff7a67..4c8cb1b 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -509,27 +509,21 @@ static ssize_t smmu_pmu_event_show(struct device *dev,
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}
-#define SMMU_EVENT_ATTR(name, config) \
- PMU_EVENT_ATTR(name, smmu_event_attr_##name, \
- config, smmu_pmu_event_show)
-SMMU_EVENT_ATTR(cycles, 0);
-SMMU_EVENT_ATTR(transaction, 1);
-SMMU_EVENT_ATTR(tlb_miss, 2);
-SMMU_EVENT_ATTR(config_cache_miss, 3);
-SMMU_EVENT_ATTR(trans_table_walk_access, 4);
-SMMU_EVENT_ATTR(config_struct_access, 5);
-SMMU_EVENT_ATTR(pcie_ats_trans_rq, 6);
-SMMU_EVENT_ATTR(pcie_ats_trans_passed, 7);
+#define SMMU_EVENT_ATTR(name, config) \
+ (&((struct perf_pmu_events_attr) { \
+ .attr = __ATTR(name, 0444, smmu_pmu_event_show, NULL), \
+ .id = config, \
+ }).attr.attr)
static struct attribute *smmu_pmu_events[] = {
- &smmu_event_attr_cycles.attr.attr,
- &smmu_event_attr_transaction.attr.attr,
- &smmu_event_attr_tlb_miss.attr.attr,
- &smmu_event_attr_config_cache_miss.attr.attr,
- &smmu_event_attr_trans_table_walk_access.attr.attr,
- &smmu_event_attr_config_struct_access.attr.attr,
- &smmu_event_attr_pcie_ats_trans_rq.attr.attr,
- &smmu_event_attr_pcie_ats_trans_passed.attr.attr,
+ SMMU_EVENT_ATTR(cycles, 0),
+ SMMU_EVENT_ATTR(transaction, 1),
+ SMMU_EVENT_ATTR(tlb_miss, 2),
+ SMMU_EVENT_ATTR(config_cache_miss, 3),
+ SMMU_EVENT_ATTR(trans_table_walk_access, 4),
+ SMMU_EVENT_ATTR(config_struct_access, 5),
+ SMMU_EVENT_ATTR(pcie_ats_trans_rq, 6),
+ SMMU_EVENT_ATTR(pcie_ats_trans_passed, 7),
NULL
};
--
2.8.1
[PATCH 1/2] config/arm: fix Hisilicon kunpeng920 SoC build
by Lijun Ou
From: Chengchang Tang <tangchengchang(a)huawei.com>
Because commit '9ca2f16' has been merged, the current hns3
PMD can no longer be compiled directly for the kunpeng920
server board. Therefore, we need to fix the meson build.
Besides that, add a kunpeng920 SoC meson cross-compile target.
Fixes: 9ca2f16faa7f ("config/arm: isolate generic build")
Signed-off-by: Chengchang Tang <tangchengchang(a)huawei.com>
Signed-off-by: Lijun Ou <oulijun(a)huawei.com>
---
config/arm/arm64_kunpeng920_linux_gcc | 19 +++++++++++++++++++
config/arm/meson.build | 20 ++++++++++++++++++++
doc/guides/linux_gsg/cross_build_dpdk_for_arm64.rst | 4 ++++
3 files changed, 43 insertions(+)
create mode 100644 config/arm/arm64_kunpeng920_linux_gcc
diff --git a/config/arm/arm64_kunpeng920_linux_gcc b/config/arm/arm64_kunpeng920_linux_gcc
new file mode 100644
index 0000000..3eeb2e9
--- /dev/null
+++ b/config/arm/arm64_kunpeng920_linux_gcc
@@ -0,0 +1,19 @@
+[binaries]
+c = 'aarch64-linux-gnu-gcc'
+cpp = 'aarch64-linux-gnu-cpp'
+ar = 'aarch64-linux-gnu-gcc-ar'
+strip = 'aarch64-linux-gnu-strip'
+pkgconfig = 'aarch64-linux-gnu-pkg-config'
+pcap-config = ''
+
+[host_machine]
+system = 'linux'
+cpu_family = 'aarch64'
+cpu = 'armv8-a'
+endian = 'little'
+
+[properties]
+implementer_id = '0x48'
+part_number = '0xd01'
+max_lcores = 128
+max_numa_nodes = 4
diff --git a/config/arm/meson.build b/config/arm/meson.build
index f948768..9b87f5a 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -133,6 +133,25 @@ implementer_cavium = {
}
}
+implementer_hisilicon = {
+ 'description': 'Hisilicon',
+ 'flags': [
+ ['RTE_USE_C11_MEM_MODEL', true],
+ ['RTE_CACHE_LINE_SIZE', 128],
+ ['RTE_MAX_NUMA_NODES', 4]
+ ],
+ 'part_number_config': {
+ '0xd01': {
+ 'machine_args': ['-march=armv8.2-a+crypto',
+ '-mtune=tsv110'],
+ 'flag': [['RTE_MACHINE', '"kunpeng920"'],
+ ['RTE_MAX_LCORE', 128],
+ ['RTE_ARM_FEATURE_ATOMICS', true]
+ ]
+ }
+ }
+}
+
implementer_ampere = {
'description': 'Ampere Computing',
'flags': [
@@ -176,6 +195,7 @@ implementers = {
'generic': implementer_generic,
'0x41': implementer_arm,
'0x43': implementer_cavium,
+ '0x48': implementer_hisilicon,
'0x50': implementer_ampere,
'0x56': implementer_marvell,
'dpaa': implementer_dpaa
diff --git a/doc/guides/linux_gsg/cross_build_dpdk_for_arm64.rst b/doc/guides/linux_gsg/cross_build_dpdk_for_arm64.rst
index faaf24b..afe4f8e 100644
--- a/doc/guides/linux_gsg/cross_build_dpdk_for_arm64.rst
+++ b/doc/guides/linux_gsg/cross_build_dpdk_for_arm64.rst
@@ -197,6 +197,7 @@ you may use various combinations of implementer/part number::
'generic': Generic armv8
'0x41': Arm
'0x43': Cavium
+ '0x48': Hisilicon
'0x50': Ampere Computing
'0x56': Marvell ARMADA
'dpaa': NXP DPAA
@@ -219,6 +220,9 @@ you may use various combinations of implementer/part number::
'0xaf': thunderx2t99
'0xb2': octeontx2
+ Supported part_numbers for 0x48:
+ '0xd01': kunpeng920
+
Supported part_numbers for 0x50:
'0x0': emag
--
2.7.4
[PATCH v2 0/2] EDAC/ghes: Add EDAC device for reporting the CPU cache error count
by Shiju Jose
CPU cache corrected errors are occasionally detected on a few of our
ARM64 hardware boards. Though rare, the possibility of CPU cache errors
occurring frequently cannot be ruled out. Detecting failure early, by
monitoring the corrected cache errors for frequent occurrence and
taking preventive action, could prevent more serious hardware faults.
On Intel architectures, cache corrected errors are reported and the
affected cores are offlined in an architecture-specific way:
http://www.mcelog.org/cache.html
However, for firmware-first error reporting, specifically on the ARM64
architecture, there is no provision for reporting the cache corrected
error count to user space and taking preventive action such as
offlining the affected cores.
For this purpose, it was suggested to create a CPU EDAC device for the
CPU caches that reports the cache corrected error count under
firmware-first error reporting.
A user-space application could monitor the recorded corrected error
count for early hardware failure detection and could take preventive
action, such as offlining the corresponding CPU core(s).
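To illustrate the intended user-space flow, here is a minimal sketch of a
monitor that polls a corrected-error count and offlines a core once a
threshold is crossed. The error-count sysfs path is a hypothetical
placeholder rather than the attribute added by this series, and the
threshold/policy is entirely up to the administrator.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sysfs attribute exposing the corrected error count for a
 * CPU cache; the real path is defined by the series' ABI document. */
#define ERR_CNT_PATH "/sys/devices/system/edac/cpu/cpu%d/ce_count"
#define ONLINE_PATH  "/sys/devices/system/cpu/cpu%d/online"

static long read_count(int cpu)
{
	char path[128];
	long count = -1;
	FILE *f;

	snprintf(path, sizeof(path), ERR_CNT_PATH, cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &count) != 1)
		count = -1;
	fclose(f);
	return count;
}

static void offline_cpu(int cpu)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), ONLINE_PATH, cpu);
	f = fopen(path, "w");
	if (f) {
		fputs("0", f);		/* preventive action: offline the core */
		fclose(f);
	}
}

int main(void)
{
	const long threshold = 10;	/* example policy only */
	int cpu = 1;			/* monitor a single core for brevity */

	for (;;) {
		if (read_count(cpu) > threshold) {
			offline_cpu(cpu);
			break;
		}
		sleep(60);
	}
	return 0;
}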
Changes:
RFC V1 -> RFC V2:
1. Addressed feedback from Boris:
1.1. Added reason of this patch.
1.2. Changed CPU errors to CPU cache errors in the drivers/edac/Kconfig
1.3 Changed EDAC cache list to percpu variables.
1.4 Changed configuration depends on ARM64.
1.5. Moved discovery of cacheinfo to ghes_scan_system().
2. Changes in the descriptions.
Shiju Jose (2):
EDAC/ghes: Add EDAC device for reporting the CPU cache errors
ACPI / APEI: Add reporting ARM64 CPU cache corrected error count
Documentation/ABI/testing/sysfs-devices-edac | 15 ++
drivers/acpi/apei/ghes.c | 76 +++++++-
drivers/edac/Kconfig | 12 ++
drivers/edac/ghes_edac.c | 186 +++++++++++++++++++
include/acpi/ghes.h | 27 +++
include/linux/cper.h | 4 +
6 files changed, 316 insertions(+), 4 deletions(-)
--
2.17.1
[PATCH] Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't apply to ARM64
by Barry Song
BATCHED_UNMAP_TLB_FLUSH is used on x86 to do batched TLB shootdown by
sending one IPI to flush all TLB entries after unmapping pages, rather
than sending an IPI to flush each individual entry.
On arm64, TLB shootdown is done by hardware. Flush instructions are
inner-shareable. Local flushes are limited to boot time (one per CPU)
and to when a task gets a new ASID.
So marking this feature as "TODO" is not accurate, and ".." isn't a
good fit either. This patch adds an "N/A" entry for this kind of
feature, which is not needed on some architectures.
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Signed-off-by: Barry Song <song.bao.hua(a)hisilicon.com>
---
Documentation/features/arch-support.txt | 1 +
Documentation/features/vm/TLB/arch-support.txt | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/features/arch-support.txt b/Documentation/features/arch-support.txt
index d22a1095e661..118ae031840b 100644
--- a/Documentation/features/arch-support.txt
+++ b/Documentation/features/arch-support.txt
@@ -8,4 +8,5 @@ The meaning of entries in the tables is:
| ok | # feature supported by the architecture
|TODO| # feature not yet supported by the architecture
| .. | # feature cannot be supported by the hardware
+ | N/A| # feature doesn't apply to the architecture
diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt
index 30f75a79ce01..0d070f9f98d8 100644
--- a/Documentation/features/vm/TLB/arch-support.txt
+++ b/Documentation/features/vm/TLB/arch-support.txt
@@ -9,7 +9,7 @@
| alpha: | TODO |
| arc: | TODO |
| arm: | TODO |
- | arm64: | TODO |
+ | arm64: | N/A |
| c6x: | .. |
| csky: | TODO |
| h8300: | .. |
--
2.25.1
Re: [PATCH] lib/logic_pio: Fix overlap check for pio registery
by John Garry
On 21/12/2020 13:04, Jiahui Cen wrote:
>> On 21/12/2020 03:24, Jiahui Cen wrote:
>>> Hi John,
>>>
>>> On 2020/12/18 18:40, John Garry wrote:
>>>> On 18/12/2020 06:23, Jiahui Cen wrote:
>>>>> Since the [start, end) is a half-open interval, a range with the end equal
>>>>> to the start of another range should not be considered as overlapped.
>>>>>
>>>>> Signed-off-by: Jiahui Cen<cenjiahui(a)huawei.com>
>>>>> ---
>>>>> lib/logic_pio.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/logic_pio.c b/lib/logic_pio.c
>>>>> index f32fe481b492..445d611f1dc1 100644
>>>>> --- a/lib/logic_pio.c
>>>>> +++ b/lib/logic_pio.c
>>>>> @@ -57,7 +57,7 @@ int logic_pio_register_range(struct logic_pio_hwaddr *new_range)
>>>>> new_range->flags == LOGIC_PIO_CPU_MMIO) {
>>>>> /* for MMIO ranges we need to check for overlap */
>>>>> if (start >= range->hw_start + range->size ||
>>>>> - end < range->hw_start) {
>>>>> + end <= range->hw_start) {
>>>> It looks like your change is correct, but should not really have an impact in practice since:
>>>> a: BIOSes generally list ascending IO port CPU addresses
>>>> b. there is space between IO port CPU address regions
>>>>
>>>> Have you seen a problem here?
>>>>
>>> No serious problem. I found it just when I was working on adding support of
>>> pci expander bridge for Arm in QEMU. I found the IO window of some extended
>>> root bus could not be registered when I inserted the extended buses' _CRS
>>> info into DSDT table in the x86 way, which does not sort the buses.
>>>
>>> Though root buses should be sorted in QEMU, would it be better to accept
>>> those non-ascending IO windows?
>>>
>> ok, so it seems that you have seen a real problem, and this issue is not just detected by code analysis.
>>
>>> BTW, for b, it seems to be no space between IO windows of different root buses
>>> generated by EDK2. Or maybe I missed something obvious.
>> I don't know about that. Anyway, your change looks ok.
>>
>> Reviewed-by: John Garry<john.garry(a)huawei.com>
>>
>> BTW, for your virt env, will there be requirement to unregister PCI MMIO ranges? Currently we don't see that in non-virt world.
>>
> Thanks for your review.
>
> And currently there is no such a requirement in my virt env.
>
I am not sure what happened to this patch, but I plan on sending some
patches in this area soon - do you want me to include this one?
Thanks,
John
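For reference, the half-open overlap rule discussed in the patch above can
be checked in isolation: two ranges [a_start, a_end) and [b_start, b_end)
overlap only if each starts before the other ends, so touching endpoints
are allowed. A standalone sketch:

#include <stdbool.h>
#include <stdio.h>

/* Two half-open ranges overlap only if each one starts before the other
 * ends; an end equal to the other's start is not an overlap. */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

int main(void)
{
	/* end == start of the next range: no overlap, may be registered */
	printf("%d\n", ranges_overlap(0x0, 0x1000, 0x1000, 0x2000)); /* 0 */
	/* one byte of real overlap */
	printf("%d\n", ranges_overlap(0x0, 0x1001, 0x1000, 0x2000)); /* 1 */
	return 0;
}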
[PATCH for-next] RDMA/hns: Use new SQ doorbell register for HIP09
by Weihang Li
From: Lang Cheng <chenglang(a)huawei.com>
HIP09 uses a new address space to map the SQ doorbell registers: the
doorbell of each QP is isolated in its own 64KB region, which can improve
performance in concurrency scenarios.
Signed-off-by: Lang Cheng <chenglang(a)huawei.com>
Signed-off-by: Weihang Li <liweihang(a)huawei.com>
---
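As a side note, here is a minimal user-space sketch of the per-QP doorbell
addressing this patch switches to on HIP09: each QP gets its own 64KB
window (per the commit message), so the register address is just a fixed
stride from the mapped base. The base value below is a made-up placeholder.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 64KB per QP, as described in the commit message. */
#define DWQE_STRIDE	0x10000ULL

static uint64_t qp_db_addr(uint64_t mem_base, uint32_t qpn)
{
	return mem_base + (uint64_t)qpn * DWQE_STRIDE;
}

int main(void)
{
	uint64_t base = 0x200000000ULL;	/* hypothetical mapped base */

	printf("QP 0 doorbell: 0x%" PRIx64 "\n", qp_db_addr(base, 0));
	printf("QP 1 doorbell: 0x%" PRIx64 "\n", qp_db_addr(base, 1));
	return 0;
}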
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 3 +-
drivers/infiniband/hw/hns/hns_roce_qp.c | 49 ++++++++++++++++--------------
2 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index c3934ab..18394e5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -681,8 +681,7 @@ static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp,
roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_M,
V2_RC_SEND_WQE_BYTE_4_WQE_INDEX_S, qp->sq.head);
- hns_roce_write512(hr_dev, wqe, hr_dev->mem_base +
- HNS_ROCE_DWQE_SIZE * qp->ibqp.qp_num);
+ hns_roce_write512(hr_dev, wqe, qp->sq.db_reg_l);
}
static int hns_roce_v2_post_send(struct ib_qp *ibqp,
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 8af411f..2318e3e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -840,9 +840,14 @@ static int alloc_qp_db(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
resp->cap_flags |= HNS_ROCE_QP_CAP_RQ_RECORD_DB;
}
} else {
- /* QP doorbell register address */
- hr_qp->sq.db_reg_l = hr_dev->reg_base + hr_dev->sdb_offset +
- DB_REG_OFFSET * hr_dev->priv_uar.index;
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP09)
+ hr_qp->sq.db_reg_l = hr_dev->mem_base +
+ HNS_ROCE_DWQE_SIZE * hr_qp->qpn;
+ else
+ hr_qp->sq.db_reg_l =
+ hr_dev->reg_base + hr_dev->sdb_offset +
+ DB_REG_OFFSET * hr_dev->priv_uar.index;
+
hr_qp->rq.db_reg_l = hr_dev->reg_base + hr_dev->odb_offset +
DB_REG_OFFSET * hr_dev->priv_uar.index;
@@ -1011,36 +1016,36 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
}
}
- ret = alloc_qp_db(hr_dev, hr_qp, init_attr, udata, &ucmd, &resp);
- if (ret) {
- ibdev_err(ibdev, "failed to alloc QP doorbell, ret = %d.\n",
- ret);
- goto err_wrid;
- }
-
ret = alloc_qp_buf(hr_dev, hr_qp, init_attr, udata, ucmd.buf_addr);
if (ret) {
ibdev_err(ibdev, "failed to alloc QP buffer, ret = %d.\n", ret);
- goto err_db;
+ goto err_buf;
}
ret = alloc_qpn(hr_dev, hr_qp);
if (ret) {
ibdev_err(ibdev, "failed to alloc QPN, ret = %d.\n", ret);
- goto err_buf;
+ goto err_qpn;
+ }
+
+ ret = alloc_qp_db(hr_dev, hr_qp, init_attr, udata, &ucmd, &resp);
+ if (ret) {
+ ibdev_err(ibdev, "failed to alloc QP doorbell, ret = %d.\n",
+ ret);
+ goto err_db;
}
ret = alloc_qpc(hr_dev, hr_qp);
if (ret) {
ibdev_err(ibdev, "failed to alloc QP context, ret = %d.\n",
ret);
- goto err_qpn;
+ goto err_qpc;
}
ret = hns_roce_qp_store(hr_dev, hr_qp, init_attr);
if (ret) {
ibdev_err(ibdev, "failed to store QP, ret = %d.\n", ret);
- goto err_qpc;
+ goto err_store;
}
if (udata) {
@@ -1055,7 +1060,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL) {
ret = hr_dev->hw->qp_flow_control_init(hr_dev, hr_qp);
if (ret)
- goto err_store;
+ goto err_flow_ctrl;
}
hr_qp->ibqp.qp_num = hr_qp->qpn;
@@ -1065,17 +1070,17 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
return 0;
-err_store:
+err_flow_ctrl:
hns_roce_qp_remove(hr_dev, hr_qp);
-err_qpc:
+err_store:
free_qpc(hr_dev, hr_qp);
-err_qpn:
+err_qpc:
+ free_qp_db(hr_dev, hr_qp, udata);
+err_db:
free_qpn(hr_dev, hr_qp);
-err_buf:
+err_qpn:
free_qp_buf(hr_dev, hr_qp);
-err_db:
- free_qp_db(hr_dev, hr_qp, udata);
-err_wrid:
+err_buf:
free_kernel_wrid(hr_qp);
return ret;
}
--
2.8.1