On Thu, 13 May 2021 17:52:56 +0000 Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com wrote:
Hi Alex,
-----Original Message----- From: Alex Williamson [mailto:alex.williamson@redhat.com] Sent: 13 May 2021 18:04 To: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com Cc: Jason Gunthorpe jgg@nvidia.com; liulongfang liulongfang@huawei.com; cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org Subject: [Linuxarm] Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
On Thu, 13 May 2021 15:49:25 +0000 Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com wrote:
-----Original Message----- From: Jason Gunthorpe [mailto:jgg@nvidia.com] Sent: 13 May 2021 14:44 To: liulongfang liulongfang@huawei.com Cc: Alex Williamson alex.williamson@redhat.com; cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org Subject: [Linuxarm] Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
On Thu, May 13, 2021 at 10:08:28AM +0800, liulongfang wrote:
On 2021/5/12 20:10, Jason Gunthorpe wrote:
On Wed, May 12, 2021 at 04:39:43PM +0800, liulongfang wrote:
> Therefore, this method of limiting the length of the BAR > configuration space can prevent unsafe operations of the memory.
The issue is DMA controlled by the guest accessing the secure BAR area, not the guest CPU.
Jason .
This secure BAR area is not presented to the Guest, which makes it impossible for the Guest to obtain the secure BAR area when establishing the DMA mapping of the configuration space. If the DMA controller accesses the secure BAR area, the access will be blocked by the SMMU.
There are scenarios where this is not true.
At a minimum the mdev driver should refuse to work in those cases.
Hi,
I think the idea here is not a generic solution, but a quirk for this specific dev.
Something like,
--- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -866,7 +866,12 @@ static long vfio_pci_ioctl(struct vfio_device
*core_vdev,
break; case VFIO_PCI_BAR0_REGION_INDEX ...
VFIO_PCI_BAR5_REGION_INDEX:
info.offset =
VFIO_PCI_INDEX_TO_OFFSET(info.index);
info.size = pci_resource_len(pdev, info.index);
if (check_hisi_acc_quirk(pdev, info))
info.size = new_size;// BAR is limited
without migration region.
else
info.size = pci_resource_len(pdev,
info.index);
if (!info.size) { info.flags = 0; break;
Is this an acceptable/workable solution here?
As Jason says, this only restricts CPU access to the BAR, the issue is DMA access. As the hardware vendor you may be able to guarantee that a DMA transaction generated by the device targeting the remainder of the BAR will always go upstream, but can you guarantee the routing between the device and the SMMU? For instance if this device can be implemented as a plugin card, then it can be installed into a downstream port that may not support ACS. That downstream port may implement request redirection allowing the transaction to reflect back to the device without IOMMU translation. At that point the userspace driver can target the kernel driver half of the BAR and potentially expose a security risk. Thanks,
The ACC devices on this platform are not pluggable devices. They are exposed as integrated endpoint devices. So I am not sure the above concern is valid in this case.
I had a look at the userspace driver approach you suggested. But unfortunately the migration state change for the vf has to check some of the pf registers for confirming the state. So even if we move the implementation to Qemu, we still may have to use the migration uAPI to access the pf device registers.
Since the devices we are concerned here are all integrated endpoints and if the above quirk is an acceptable one, then we can use the uAPI as done in this series without overly complicating things here.
If you expect this device to appear only as an integrated endpoint, then I think Jason's suggestion above is correct. Your driver that supports migration can refuse to load for devices there the topology is other than expected and you're effectively guaranteeing DMA isolation of the user and in-kernel drivers by hardware DMA semantics and topology.
Requiring access to the PF to support the migration protocol also suggests that an in-kernel driver to support migration is our best option. Thanks,
Alex
-----Original Message----- From: Alex Williamson [mailto:alex.williamson@redhat.com] Sent: 13 May 2021 19:23 To: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com Cc: Jason Gunthorpe jgg@nvidia.com; liulongfang liulongfang@huawei.com; cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org Subject: Re: [Linuxarm] Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
On Thu, 13 May 2021 17:52:56 +0000 Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com wrote:
Hi Alex,
-----Original Message----- From: Alex Williamson [mailto:alex.williamson@redhat.com] Sent: 13 May 2021 18:04 To: Shameerali Kolothum Thodi
shameerali.kolothum.thodi@huawei.com
Cc: Jason Gunthorpe jgg@nvidia.com; liulongfang liulongfang@huawei.com; cohuck@redhat.com; linux-kernel@vger.kernel.org; linuxarm@openeuler.org Subject: [Linuxarm] Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio
On Thu, 13 May 2021 15:49:25 +0000 Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com wrote:
-----Original Message----- From: Jason Gunthorpe [mailto:jgg@nvidia.com] Sent: 13 May 2021 14:44 To: liulongfang liulongfang@huawei.com Cc: Alex Williamson alex.williamson@redhat.com;
cohuck@redhat.com;
linux-kernel@vger.kernel.org; linuxarm@openeuler.org Subject: [Linuxarm] Re: [RFC PATCH 2/3] vfio/hisilicon: register the
driver to
vfio
On Thu, May 13, 2021 at 10:08:28AM +0800, liulongfang wrote:
On 2021/5/12 20:10, Jason Gunthorpe wrote: > On Wed, May 12, 2021 at 04:39:43PM +0800, liulongfang wrote: > >> Therefore, this method of limiting the length of the BAR >> configuration space can prevent unsafe operations of the memory. > > The issue is DMA controlled by the guest accessing the secure BAR > area, not the guest CPU. > > Jason > . > This secure BAR area is not presented to the Guest, which makes it impossible for the Guest to obtain the secure BAR area when establishing the DMA mapping of the configuration space. If the DMA controller accesses the secure BAR area, the access will be blocked by the SMMU.
There are scenarios where this is not true.
At a minimum the mdev driver should refuse to work in those cases.
Hi,
I think the idea here is not a generic solution, but a quirk for this specific
dev.
Something like,
--- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -866,7 +866,12 @@ static long vfio_pci_ioctl(struct vfio_device
*core_vdev,
break; case VFIO_PCI_BAR0_REGION_INDEX ...
VFIO_PCI_BAR5_REGION_INDEX:
info.offset =
VFIO_PCI_INDEX_TO_OFFSET(info.index);
info.size = pci_resource_len(pdev, info.index);
if (check_hisi_acc_quirk(pdev, info))
info.size = new_size;// BAR is
limited
without migration region.
else
info.size = pci_resource_len(pdev,
info.index);
if (!info.size) { info.flags = 0; break;
Is this an acceptable/workable solution here?
As Jason says, this only restricts CPU access to the BAR, the issue is DMA access. As the hardware vendor you may be able to guarantee that a DMA transaction generated by the device targeting the remainder of the BAR will always go upstream, but can you guarantee the routing between the device and the SMMU? For instance if this device can be implemented as a plugin card, then it can be installed into a downstream port that may not support ACS. That downstream port may implement request redirection allowing the transaction to reflect back to the device without IOMMU translation. At that point the userspace driver can target the kernel driver half of the BAR and potentially expose a security risk. Thanks,
The ACC devices on this platform are not pluggable devices. They are
exposed
as integrated endpoint devices. So I am not sure the above concern is valid in
this
case.
I had a look at the userspace driver approach you suggested. But
unfortunately
the migration state change for the vf has to check some of the pf registers for confirming the state. So even if we move the implementation to Qemu, we still may have to use the migration uAPI to access the pf device registers.
Since the devices we are concerned here are all integrated endpoints and if
the
above quirk is an acceptable one, then we can use the uAPI as done in this series without overly complicating things here.
If you expect this device to appear only as an integrated endpoint, then I think Jason's suggestion above is correct. Your driver that supports migration can refuse to load for devices there the topology is other than expected and you're effectively guaranteeing DMA isolation of the user and in-kernel drivers by hardware DMA semantics and topology.
Ok. Will take a look at the recommendation.
Requiring access to the PF to support the migration protocol also suggests that an in-kernel driver to support migration is our best option.
Thanks, Shameer
On Thu, May 13, 2021 at 12:22:32PM -0600, Alex Williamson wrote:
If you expect this device to appear only as an integrated endpoint, then I think Jason's suggestion above is correct. Your driver that supports migration can refuse to load for devices there the topology is other than expected and you're effectively guaranteeing DMA isolation of the user and in-kernel drivers by hardware DMA semantics and topology.
Some kind of VFIO api 'vfio_is_this_pci_dev_is_safe_to_share()' seems appropriate here.
I saw Intel doing the same thing in one of their VDPA driver proposals.
Jason