-----Original Message-----
From: Auger Eric [mailto:eric.auger@redhat.com]
Sent: 21 February 2021 18:21
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org; joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com; alex.williamson@redhat.com
Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org; zhangfei.gao@gmail.com; vivek.gautam@arm.com; jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com; nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; linuxarm@openeuler.org
Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

Hi Shameer,

On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
Hi Eric,
-----Original Message-----
From: Eric Auger [mailto:eric.auger@redhat.com]
Sent: 18 November 2020 11:22
To: eric.auger.pro@gmail.com; eric.auger@redhat.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org; joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com; alex.williamson@redhat.com
Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org; zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com; nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
This series brings the IOMMU part of HW nested paging support in the SMMUv3. The VFIO part is submitted separately.
The IOMMU API is extended to support 2 new API functionalities:
- pass the guest stage 1 configuration
- pass stage 1 MSI bindings
Then those capabilities get implemented in the SMMUv3 driver.
The virtualizer passes information through the VFIO user API which cascades them to the iommu subsystem. This allows the guest to own stage 1 tables and context descriptors (so-called PASID table) while the host owns stage 2 tables and main configuration structures (STE).
I am seeing an issue when running testpmd in the guest with this series. I have two different setups; testpmd works fine with the first one but not with the second.
1) The guest doesn't have a kernel driver built in for the pass-through device.
root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
        Subsystem: Huawei Technologies Co., Ltd. Device 0000
        Flags: fast devsel
        Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
        Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
        Capabilities: [b0] Power Management version 3
        Capabilities: [100] Access Control Services
        Capabilities: [300] Transaction Processing Hints

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 8E:A6:8C:43:43:45
Checking link statuses...
Done
testpmd>
2) The guest has a kernel driver built in for the pass-through device.
root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
        Subsystem: Huawei Technologies Co., Ltd. Device 0000
        Flags: bus master, fast devsel, latency 0
        Memory at 8000100000 (64-bit, prefetchable) [size=64K]
        Memory at 8000000000 (64-bit, prefetchable) [size=1M]
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
        Capabilities: [b0] Power Management version 3
        Capabilities: [100] Access Control Services
        Capabilities: [300] Transaction Processing Hints
        Kernel driver in use: hns3

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
hns3vf_init_vf(): Failed to fetch configuration: -62
hns3vf_dev_init(): Failed to init vf: -62
EAL: Releasing pci mapped resource for 0000:00:02.0
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
EAL: Requested device 0000:00:02.0 cannot be used
EAL: Bus (pci) probe failed.
EAL: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
testpmd>
And in this case, the host SMMU reports a translation fault:

[ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
[ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
[ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
[ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
Tested with an Intel 82599 card (ixgbevf) as well, but got the same error.
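For readers following the log, the four 64-bit words of the event record can be picked apart by hand. The snippet below is only a sketch: the field positions (event ID in bits [7:0] and StreamID in bits [63:32] of the first word, faulting input address in the third word) are my reading of the SMMUv3 event record layout as used by the Linux driver, and the helper names are made up for illustration.

```python
# Event record words exactly as printed in the host log (four 64-bit values).
EVENT = [
    0x00007D1200000010,
    0x000012000000007C,
    0x00000000FFFEF040,
    0x00000000FFFEF000,
]

F_TRANSLATION = 0x10  # event ID reported for a translation fault


def bits(value, hi, lo):
    """Return the inclusive bit range value[hi:lo] as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_event(evt):
    """Extract the fields assumed above from a 4-word event record."""
    return {
        "event_id": bits(evt[0], 7, 0),     # assumed: event type
        "stream_id": bits(evt[0], 63, 32),  # assumed: StreamID of the device
        "input_addr": evt[2],               # assumed: faulting input address
    }


decoded = decode_event(EVENT)
page = decoded["input_addr"] & ~0xFFF       # 4K page containing the fault
offset = decoded["input_addr"] & 0xFFF      # offset inside that page

print(f"event 0x{decoded['event_id']:x}, sid 0x{decoded['stream_id']:x}, "
      f"addr 0x{decoded['input_addr']:x} (page 0x{page:x} + 0x{offset:x})")
```

Under those assumptions the faulting access is to offset 0x40 inside the page at 0xfffef000.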
So this should be fixed in the next release. The problem came from the fact that the MSI giova was not duly unregistered. When vfio is not in use on the guest side, the guest kernel allocates giovas for MSIs @fffef000 (40 is the ITS translater offset ;-)). When passthrough is in use, the iova is allocated @0x8000000. As the fffef000 MSI giova was not properly unregistered, the host kernel used it, despite it having been unmapped by the guest kernel, hence the translation fault. So the fix is to unregister the MSI in the VFIO QEMU code when MSI-X is disabled. So to me this is a QEMU integration issue.
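To make the sequence concrete, here is a toy replay of it. This is not the QEMU or kernel code: the class, the function, and in particular the rule that the host picks the oldest binding it still knows about are all assumptions made purely for illustration.

```python
PAGE_MASK = ~0xFFF
GITS_TRANSLATER_OFFSET = 0x40  # doorbell register offset within the ITS page


class GuestStage1:
    """Toy model of guest-owned stage 1: just a set of mapped gIOVA pages."""

    def __init__(self):
        self.pages = set()

    def map(self, giova):
        self.pages.add(giova & PAGE_MASK)

    def unmap(self, giova):
        self.pages.discard(giova & PAGE_MASK)

    def is_mapped(self, addr):
        return (addr & PAGE_MASK) in self.pages


def run(unregister_on_msix_disable):
    """Replay the sequence described above; return (doorbell, faulted)."""
    stage1 = GuestStage1()
    host_bindings = []  # MSI gIOVAs registered with the host, oldest first

    # Guest kernel driver (hns3) is bound: MSI doorbell mapped at 0xfffef000.
    stage1.map(0xFFFEF000)
    host_bindings.append(0xFFFEF000)

    # Driver is unbound: the guest unmaps the doorbell and MSI-X is disabled.
    stage1.unmap(0xFFFEF000)
    if unregister_on_msix_disable:
        host_bindings.remove(0xFFFEF000)

    # vfio-pci takes over in the guest: new doorbell gIOVA at 0x8000000.
    stage1.map(0x8000000)
    host_bindings.append(0x8000000)

    # Assumption: the host uses the oldest binding it still knows about.
    doorbell = host_bindings[0] + GITS_TRANSLATER_OFFSET
    return doorbell, not stage1.is_mapped(doorbell)


for fixed in (False, True):
    addr, faulted = run(unregister_on_msix_disable=fixed)
    print(f"unregister={fixed}: doorbell 0x{addr:x}, fault={faulted}")
```

Without the unregister step the model writes to 0xfffef040, which the guest no longer maps; with it, the write goes to the 0x8000000 page and translates.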
Super! I was focusing on the TLBI side and was slightly worried it was somehow related to our specific hardware. That's a relief :).
Thanks, Shameer