On an ARM64 system with an SMMUv3 implementation that fully supports the Broadcast TLB Maintenance (BTM) feature, as part of the Distributed Virtual Memory (DVM) protocol, the CPU TLB invalidate instructions are received by the SMMUv3. This is very useful when the SMMUv3 shares the page tables with the CPU (e.g. the guest SVA use case). For this to work, the SMMU must use the same VMID that is allocated by KVM to configure the stage 2 translations.

At present KVM VMID allocations are recycled on rollover and may change as a result. This creates issues if we have to share the KVM VMID with the SMMU. Please see the discussion here,
https://lore.kernel.org/linux-iommu/20200522101755.GA3453945@myrica/
This series proposes a way to share the VMID between KVM and the IOMMU driver by:

1. Splitting the KVM VMID space into two equal halves based on the command line option "kvm-arm.pinned_vmid_enable" (a rough numeric sketch follows this list).
2. The first half of the VMID space follows the normal recycle-on-rollover policy.
3. The second half of the VMID space doesn't roll over and is used to allocate pinned VMIDs.
4. Providing a helper function to retrieve the KVM instance associated with a device (if it is part of a vfio group).
5. Introducing generic interfaces to get/put pinned KVM VMIDs.
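As a rough, illustrative sketch of the resulting split (assuming 16-bit VMIDs; the actual setup is done by kvm_arm_pinned_vmid_init() in patch 3):

/* Illustration only: how the VMID space splits when pinned VMIDs are enabled. */
unsigned int vmid_bits = 16;                  /* kvm_get_vmid_bits() on a VMID16 system */
u32 rollover_limit = 1 << (vmid_bits - 1);    /* normal VMIDs: 0x0000 - 0x7fff, recycled on rollover */
u32 pinned_start   = 1 << (vmid_bits - 1);    /* pinned VMIDs: 0x8000 ... */
u32 pinned_end     = (1 << vmid_bits) - 1;    /* ... 0xffff, allocated from an IDA, never recycled */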
Open Items:
1. I couldn't figure out a way to determine whether a platform actually fully supports DVM/BTM or not. I am not sure we can take that call based on the SMMUv3 BTM feature bit alone. Perhaps we can get it from firmware via IORT?
2. The current splitting of the VMID space is only one way to do this and probably not the best. Maybe we can follow the pinned ASID method used in the SVA code. Suggestions are welcome here.
3. The detach_pasid_table() interface is not very clear to me, as the current QEMU prototype is not using it. This requires fixing on my side.

This is based on Jean-Philippe's SVA series [1] and Eric's SMMUv3 dual-stage support series [2].
The branch with the whole vSVA + BTM solution is here, https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva-btm-rf...
This is lightly tested on a HiSilicon D06 platform with the uacce/zip dev test tool:
./zip_sva_per -k tlb
Thanks, Shameer
1. https://github.com/Linaro/linux-kernel-uadk/commits/uacce-devel-5.10
2. https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.auger@redhat...
Shameer Kolothum (5):
  vfio: Add a helper to retrieve kvm instance from a dev
  KVM: Add generic infrastructure to support pinned VMIDs
  KVM: ARM64: Add support for pinned VMIDs
  iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
  KVM: arm64: Make sure pinned vmid is released on VM exit
 arch/arm64/include/asm/kvm_host.h           |   2 +
 arch/arm64/kvm/Kconfig                      |   1 +
 arch/arm64/kvm/arm.c                        | 116 +++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  49 ++++++++-
 drivers/vfio/vfio.c                         |  12 ++
 include/linux/kvm_host.h                    |  17 +++
 include/linux/vfio.h                        |   1 +
 virt/kvm/Kconfig                            |   2 +
 virt/kvm/kvm_main.c                         |  25 +++++
 9 files changed, 220 insertions(+), 5 deletions(-)
A device that belongs to a vfio_group may have a KVM instance associated with it. Add a helper to retrieve it.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio.c  | 12 ++++++++++++
 include/linux/vfio.h |  1 +
 2 files changed, 13 insertions(+)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2151bc7f87ab..d1968e4bf9f4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -876,6 +876,18 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
+struct kvm *vfio_kvm_get_from_dev(struct device *dev)
+{
+        struct vfio_group *group;
+
+        group = vfio_group_get_from_dev(dev);
+        if (!group)
+                return NULL;
+
+        return group->kvm;
+}
+EXPORT_SYMBOL_GPL(vfio_kvm_get_from_dev);
+
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
                                                      char *buf)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 38d3c6a8dc7e..8716298fee27 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -56,6 +56,7 @@ extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
+extern struct kvm *vfio_kvm_get_from_dev(struct device *dev);
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
Provide generic helper functions to get/put pinned VMIDs if the arch supports it.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 include/linux/kvm_host.h | 17 +++++++++++++++++
 virt/kvm/Kconfig         |  2 ++
 virt/kvm/kvm_main.c      | 25 +++++++++++++++++++++++++
 3 files changed, 44 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..25856db74a08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -836,6 +836,8 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
+int kvm_pinned_vmid_get(struct device *dev);
+int kvm_pinned_vmid_put(struct device *dev);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
@@ -1478,4 +1480,19 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
+#ifdef CONFIG_HAVE_KVM_PINNED_VMID
+int kvm_arch_pinned_vmid_get(struct kvm *kvm);
+int kvm_arch_pinned_vmid_put(struct kvm *kvm);
+#else
+static inline int kvm_arch_pinned_vmid_get(struct kvm *kvm)
+{
+        return -EINVAL;
+}
+
+static inline int kvm_arch_pinned_vmid_put(struct kvm *kvm)
+{
+        return -EINVAL;
+}
+#endif
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1c37ccd5d402..bb55616c5616 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -63,3 +63,5 @@ config HAVE_KVM_NO_POLL
 config KVM_XFER_TO_GUEST_WORK
        bool
+config HAVE_KVM_PINNED_VMID
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..632d391f0e34 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -51,6 +51,7 @@
 #include <linux/io.h>
 #include <linux/lockdep.h>
 #include <linux/kthread.h>
+#include <linux/vfio.h>
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -2849,6 +2850,30 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
+int kvm_pinned_vmid_get(struct device *dev)
+{
+        struct kvm *kvm;
+
+        kvm = vfio_kvm_get_from_dev(dev);
+        if (!kvm)
+                return -EINVAL;
+
+        return kvm_arch_pinned_vmid_get(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_get);
+
+int kvm_pinned_vmid_put(struct device *dev)
+{
+        struct kvm *kvm;
+
+        kvm = vfio_kvm_get_from_dev(dev);
+        if (!kvm)
+                return -EINVAL;
+
+        return kvm_arch_pinned_vmid_put(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_put);
+
 #ifndef CONFIG_S390
 /*
  * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
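As a usage illustration (hypothetical caller, not part of this series): a driver that holds a struct device belonging to a VFIO group could pin and release the VM's VMID like this:

#include <linux/kvm_host.h>

/* Hypothetical example: pin the VMID of the VM that owns @dev. */
static int example_pin_vm_vmid(struct device *dev, u16 *out_vmid)
{
        int vmid = kvm_pinned_vmid_get(dev);

        if (vmid < 0)
                return vmid;    /* no KVM instance attached, or arch support missing */

        *out_vmid = (u16)vmid;
        return 0;
}

/* Drop the pin when the consumer is done with the stage 2 configuration. */
static void example_unpin_vm_vmid(struct device *dev)
{
        kvm_pinned_vmid_put(dev);
}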
On an ARM64 system with an SMMUv3 implementation that fully supports the Broadcast TLB Maintenance (BTM) feature, the CPU TLB invalidate instructions are received by the SMMU. This is very useful when the SMMU shares the page tables with the CPU (e.g. the guest SVA use case). For this to work, the SMMU must use the same VMID that is allocated by KVM to configure the stage 2 translations.
At present KVM VMID allocations are recycled on rollover and may change as a result. This will create issues if we have to share the KVM VMID with the SMMU. Hence, we split the KVM VMID space into two: the first half follows the normal recycle-on-rollover policy, while the second half of the VMID space is used to allocate pinned VMIDs. This feature is enabled based on the command line option "kvm-arm.pinned_vmid_enable".
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |   2 +
 arch/arm64/kvm/Kconfig            |   1 +
 arch/arm64/kvm/arm.c              | 104 +++++++++++++++++++++++++++++-
 3 files changed, 106 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0cd9f0f75c13..db6441c6a580 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <linux/refcount.h>
 #include <asm/thread_info.h>
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -65,6 +66,7 @@ struct kvm_vmid {
        /* The VMID generation used for the virt. memory system */
        u64    vmid_gen;
        u32    vmid;
+       refcount_t pinned;
 };
 struct kvm_s2_mmu {
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 043756db8f6e..c5c52953e842 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -40,6 +40,7 @@ menuconfig KVM
        select HAVE_KVM_VCPU_RUN_PID_CHANGE
        select TASKSTATS
        select TASK_DELAY_ACCT
+       select HAVE_KVM_PINNED_VMID
        help
          Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c0ffb019ca8b..8955968be49f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -56,6 +56,19 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
+static bool kvm_pinned_vmid_enable;
+
+static int __init early_pinned_vmid_enable(char *buf)
+{
+        return strtobool(buf, &kvm_pinned_vmid_enable);
+}
+
+early_param("kvm-arm.pinned_vmid_enable", early_pinned_vmid_enable);
+
+static DEFINE_IDA(kvm_pinned_vmids);
+static u32 kvm_pinned_vmid_start;
+static u32 kvm_pinned_vmid_end;
+
 static bool vgic_present;
 static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
@@ -475,6 +488,10 @@ void force_vm_exit(const cpumask_t *mask)
 static bool need_new_vmid_gen(struct kvm_vmid *vmid)
 {
        u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+
+       if (refcount_read(&vmid->pinned))
+               return false;
+
        smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
        return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen);
 }
@@ -485,6 +502,8 @@ static bool need_new_vmid_gen(struct kvm_vmid *vmid)
  */
 static void update_vmid(struct kvm_vmid *vmid)
 {
+       unsigned int vmid_bits;
+
        if (!need_new_vmid_gen(vmid))
                return;
@@ -521,7 +540,12 @@ static void update_vmid(struct kvm_vmid *vmid)
        vmid->vmid = kvm_next_vmid;
        kvm_next_vmid++;
-       kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1;
+       if (kvm_pinned_vmid_enable)
+               vmid_bits = kvm_get_vmid_bits() - 1;
+       else
+               vmid_bits = kvm_get_vmid_bits();
+
+       kvm_next_vmid &= (1 << vmid_bits) - 1;
        smp_wmb();
        WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen));
@@ -569,6 +593,71 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
        return ret;
 }
+int kvm_arch_pinned_vmid_get(struct kvm *kvm)
+{
+        struct kvm_vcpu *vcpu;
+        struct kvm_vmid *kvm_vmid;
+        int ret;
+
+        if (!kvm_pinned_vmid_enable || !atomic_read(&kvm->online_vcpus))
+                return -EINVAL;
+
+        vcpu = kvm_get_vcpu(kvm, 0);
+        if (!vcpu)
+                return -EINVAL;
+
+        kvm_vmid = &vcpu->arch.hw_mmu->vmid;
+
+        spin_lock(&kvm_vmid_lock);
+
+        if (refcount_inc_not_zero(&kvm_vmid->pinned)) {
+                spin_unlock(&kvm_vmid_lock);
+                return kvm_vmid->vmid;
+        }
+
+        ret = ida_alloc_range(&kvm_pinned_vmids, kvm_pinned_vmid_start,
+                              kvm_pinned_vmid_end, GFP_KERNEL);
+        if (ret < 0) {
+                spin_unlock(&kvm_vmid_lock);
+                return ret;
+        }
+
+        force_vm_exit(cpu_all_mask);
+        kvm_call_hyp(__kvm_flush_vm_context);
+
+        kvm_vmid->vmid = (u32)ret;
+        refcount_set(&kvm_vmid->pinned, 1);
+        spin_unlock(&kvm_vmid_lock);
+
+        return ret;
+}
+
+int kvm_arch_pinned_vmid_put(struct kvm *kvm)
+{
+        struct kvm_vcpu *vcpu;
+        struct kvm_vmid *kvm_vmid;
+
+        if (!kvm_pinned_vmid_enable)
+                return -EINVAL;
+
+        vcpu = kvm_get_vcpu(kvm, 0);
+        if (!vcpu)
+                return -EINVAL;
+
+        kvm_vmid = &vcpu->arch.hw_mmu->vmid;
+
+        spin_lock(&kvm_vmid_lock);
+
+        if (!refcount_read(&kvm_vmid->pinned))
+                goto out;
+
+        if (refcount_dec_and_test(&kvm_vmid->pinned))
+                ida_free(&kvm_pinned_vmids, kvm_vmid->vmid);
+out:
+        spin_unlock(&kvm_vmid_lock);
+        return 0;
+}
+
 bool kvm_arch_intc_initialized(struct kvm *kvm)
 {
        return vgic_initialized(kvm);
@@ -1680,6 +1769,16 @@ static void check_kvm_target_cpu(void *ret)
        *(int *)ret = kvm_target_cpu();
 }
+static void kvm_arm_pinned_vmid_init(void)
+{
+        unsigned int vmid_bits = kvm_get_vmid_bits();
+
+        kvm_pinned_vmid_start = (1 << (vmid_bits - 1));
+        kvm_pinned_vmid_end = (1 << vmid_bits) - 1;
+
+        kvm_info("Pinned VMID[0x%x - 0x%x] enabled\n",
+                 kvm_pinned_vmid_start, kvm_pinned_vmid_end);
+}
+
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 {
        struct kvm_vcpu *vcpu;
@@ -1790,6 +1889,9 @@ int kvm_arch_init(void *opaque)
        else
                kvm_info("Hyp mode initialized successfully\n");
+       if (kvm_pinned_vmid_enable)
+               kvm_arm_pinned_vmid_init();
+
        return 0;
out_hyp:
Hi Shameer,
[+Will]
On Mon, 22 Feb 2021 15:53:36 +0000, Shameer Kolothum shameerali.kolothum.thodi@huawei.com wrote:
On an ARM64 system with an SMMUv3 implementation that fully supports the Broadcast TLB Maintenance (BTM) feature, the CPU TLB invalidate instructions are received by the SMMU. This is very useful when the SMMU shares the page tables with the CPU (e.g. the guest SVA use case). For this to work, the SMMU must use the same VMID that is allocated by KVM to configure the stage 2 translations.
At present KVM VMID allocations are recycled on rollover and may change as a result. This will create issues if we have to share the KVM VMID with the SMMU. Hence, we split the KVM VMID space into two: the first half follows the normal recycle-on-rollover policy, while the second half of the VMID space is used to allocate pinned VMIDs. This feature is enabled based on the command line option "kvm-arm.pinned_vmid_enable".
I think this is the wrong approach. Instead of shoving the notion of pinned VMID into the current allocator, which really isn't designed for this, it'd be a lot better if we aligned the KVM VMID allocator with the ASID allocator, which already has support for pinning and is in general much more efficient.
Julien Grall worked on such a series[1] a long while ago, which got stalled because of the 32bit KVM port. Since we don't have this burden anymore, I'd rather you look in that direction instead of wasting half of the VMID space on potentially pinned VMIDs.
Thanks,
M.
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20190724162534.7...
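For reference, the arm64 ASID allocator gained pinning support for SVA by refcounting contexts and keeping pinned ASIDs out of the rollover path; a purely hypothetical sketch of the equivalent VMID-side interface (not Julien's actual series, names and fields made up for illustration) might look like:

/* Hypothetical interface only. */
struct kvm_vmid {
        atomic64_t      id;        /* generation + VMID, as in the ASID allocator */
        refcount_t      pinned;    /* non-zero: VMID must survive rollover        */
};

unsigned long kvm_arm_vmid_get(struct kvm_vmid *vmid);  /* fast path on vCPU entry   */
unsigned long kvm_arm_vmid_pin(struct kvm_vmid *vmid);  /* returns a stable VMID     */
void kvm_arm_vmid_unpin(struct kvm_vmid *vmid);         /* allows recycling again    */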
Hi Marc,
-----Original Message-----
From: Marc Zyngier [mailto:maz@kernel.org]
Sent: 09 March 2021 10:33
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org; iommu@lists.linux-foundation.org; kvmarm@lists.cs.columbia.edu; alex.williamson@redhat.com; jean-philippe@linaro.org; eric.auger@redhat.com; zhangfei.gao@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; linuxarm@openeuler.org; Will Deacon <will@kernel.org>
Subject: Re: [RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs
Hi Shameer,
[+Will]
On Mon, 22 Feb 2021 15:53:36 +0000, Shameer Kolothum shameerali.kolothum.thodi@huawei.com wrote:
On an ARM64 system with an SMMUv3 implementation that fully supports the Broadcast TLB Maintenance (BTM) feature, the CPU TLB invalidate instructions are received by the SMMU. This is very useful when the SMMU shares the page tables with the CPU (e.g. the guest SVA use case). For this to work, the SMMU must use the same VMID that is allocated by KVM to configure the stage 2 translations.
At present KVM VMID allocations are recycled on rollover and may change as a result. This will create issues if we have to share the KVM VMID with the SMMU. Hence, we split the KVM VMID space into two: the first half follows the normal recycle-on-rollover policy, while the second half of the VMID space is used to allocate pinned VMIDs. This feature is enabled based on the command line option "kvm-arm.pinned_vmid_enable".
I think this is the wrong approach. Instead of shoving the notion of pinned VMID into the current allocator, which really isn't designed for this, it'd be a lot better if we aligned the KVM VMID allocator with the ASID allocator, which already has support for pinning and is in general much more efficient.
Ok. Agree that this is not efficient, but was easy to prototype something :)
Julien Grall worked on such a series[1] a long while ago, which got stalled because of the 32bit KVM port. Since we don't have this burden anymore, I'd rather you look in that direction instead of wasting half of the VMID space on potentially pinned VMIDs.
Sure. I will check that and work on it.
Thanks, Shameer
Thanks,
M.
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20190724162534.7390-1-julien.grall@arm.com/
--
Without deviation from the norm, progress is not possible.
If the SMMU supports BTM and the device belongs to a NESTED domain with a shared PASID table, we need to use the VMID allocated by KVM for the stage 2 configuration. Hence, request a pinned VMID from KVM.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++++++++++++++-
 1 file changed, 47 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 26bf7da1bcd0..04f83f7c8319 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -28,6 +28,7 @@
 #include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
+#include <linux/kvm_host.h>
#include <linux/amba/bus.h>
@@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned long *map, int idx)
        clear_bit(idx, map);
 }
+static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
+        if (!master)
+                return -EINVAL;
+
+        return kvm_pinned_vmid_get(master->dev);
+}
+
+static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
+        if (!master)
+                return -EINVAL;
+
+        if (smmu_domain->s2_cfg.vmid)
+                return kvm_pinned_vmid_put(master->dev);
+
+        return 0;
+}
+
 static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
                mutex_unlock(&arm_smmu_asid_lock);
        }
        if (s2_cfg->set) {
-               if (s2_cfg->vmid)
-                       arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               if (s2_cfg->vmid) {
+                       if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
+                           smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+                               arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               }
        }
        kfree(smmu_domain);
@@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
            !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
                goto out;
+       if (smmu->features & ARM_SMMU_FEAT_BTM) {
+               ret = arm_smmu_pinned_vmid_get(smmu_domain);
+               if (ret < 0)
+                       goto out;
+
+               if (smmu_domain->s2_cfg.vmid)
+                       arm_smmu_bitmap_free(smmu->vmid_map, smmu_domain->s2_cfg.vmid);
+
+               smmu_domain->s2_cfg.vmid = (u16)ret;
+       }
+
        smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
        smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
@@ -3221,6 +3263,7 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+       struct arm_smmu_device *smmu = smmu_domain->smmu;
        struct arm_smmu_master *master;
        unsigned long flags;
@@ -3237,6 +3280,8 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
                arm_smmu_install_ste_for_dev(master);
        spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+       if (smmu->features & ARM_SMMU_FEAT_BTM)
+               arm_smmu_pinned_vmid_put(smmu_domain);
+
 unlock:
        mutex_unlock(&smmu_domain->init_mutex);
 }
Hi Shameer,
On Mon, Feb 22, 2021 at 03:53:37PM +0000, Shameer Kolothum wrote:
If the SMMU supports BTM and the device belongs to a NESTED domain with a shared PASID table, we need to use the VMID allocated by KVM for the stage 2 configuration. Hence, request a pinned VMID from KVM.
Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++++++++++++++-
 1 file changed, 47 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 26bf7da1bcd0..04f83f7c8319 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -28,6 +28,7 @@
 #include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
+#include <linux/kvm_host.h>
#include <linux/amba/bus.h>
@@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned long *map, int idx)
        clear_bit(idx, map);
 }
+static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
This probably needs to hold devices_lock while using master.
+        if (!master)
+                return -EINVAL;
+
+        return kvm_pinned_vmid_get(master->dev);
+}
+
+static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
+        if (!master)
+                return -EINVAL;
+
+        if (smmu_domain->s2_cfg.vmid)
+                return kvm_pinned_vmid_put(master->dev);
+
+        return 0;
+}
 static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
                mutex_unlock(&arm_smmu_asid_lock);
        }
        if (s2_cfg->set) {
-               if (s2_cfg->vmid)
-                       arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               if (s2_cfg->vmid) {
+                       if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
+                           smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+                               arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               }
        }
kfree(smmu_domain);
@@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
            !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
                goto out;
+       if (smmu->features & ARM_SMMU_FEAT_BTM) {
+               ret = arm_smmu_pinned_vmid_get(smmu_domain);
+               if (ret < 0)
+                       goto out;
+
+               if (smmu_domain->s2_cfg.vmid)
+                       arm_smmu_bitmap_free(smmu->vmid_map, smmu_domain->s2_cfg.vmid);
+
+               smmu_domain->s2_cfg.vmid = (u16)ret;
That will require a TLB invalidation on the old VMID, once the STE is rewritten.
More generally I think this pinned VMID set conflicts with that of stage-2-only domains (which is the default state until a guest attaches a PASID table). Say you have one guest using DOMAIN_NESTED without PASID table, just DMA to IPA using VMID 0x8000. Now another guest attaches a PASID table and obtains the same VMID from KVM. The stage-2 translation might use TLB entries from the other guest, no? They'll both create stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
It's tempting to allocate all VMIDs through KVM instead, but that will force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break existing users of that extension (though I'm not sure there are any). Instead we might need to restrict the SMMU VMID bitmap to match the private VMID set in KVM.
Besides we probably want to restrict this feature to systems supporting VMID16 on both SMMU and CPUs, or at least check that they are compatible.
+       }
        smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
        smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
@@ -3221,6 +3263,7 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+       struct arm_smmu_device *smmu = smmu_domain->smmu;
        struct arm_smmu_master *master;
        unsigned long flags;
@@ -3237,6 +3280,8 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
                arm_smmu_install_ste_for_dev(master);
        spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+       if (smmu->features & ARM_SMMU_FEAT_BTM)
+               arm_smmu_pinned_vmid_put(smmu_domain);
Aliasing here as well: the VMID is still live but can be reallocated by KVM and another domain might obtain it.
Thanks, Jean
 unlock:
        mutex_unlock(&smmu_domain->init_mutex);
 }
--
2.17.1
Hi Jean,
-----Original Message-----
From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
Sent: 04 March 2021 17:11
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org; iommu@lists.linux-foundation.org; kvmarm@lists.cs.columbia.edu; maz@kernel.org; alex.williamson@redhat.com; eric.auger@redhat.com; zhangfei.gao@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; linuxarm@openeuler.org
Subject: Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
Hi Shameer,
On Mon, Feb 22, 2021 at 03:53:37PM +0000, Shameer Kolothum wrote:
If the SMMU supports BTM and the device belongs to a NESTED domain with a shared PASID table, we need to use the VMID allocated by KVM for the stage 2 configuration. Hence, request a pinned VMID from KVM.
Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++++++++++++++-
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 26bf7da1bcd0..04f83f7c8319 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -28,6 +28,7 @@
 #include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
+#include <linux/kvm_host.h>
#include <linux/amba/bus.h>
@@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned long *map, int idx)
        clear_bit(idx, map);
 }
+static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
This probably needs to hold devices_lock while using master.
Ok.
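A possible shape for that, as a rough sketch (taking devices_lock only around the list walk, since kvm_pinned_vmid_get() can sleep; error handling kept minimal):

static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
{
        struct arm_smmu_master *master;
        struct device *dev = NULL;
        unsigned long flags;

        /* Hold devices_lock so the master list can't change under us. */
        spin_lock_irqsave(&smmu_domain->devices_lock, flags);
        master = list_first_entry_or_null(&smmu_domain->devices,
                                          struct arm_smmu_master, domain_head);
        if (master)
                dev = master->dev;
        spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

        if (!dev)
                return -EINVAL;

        return kvm_pinned_vmid_get(dev);
}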
+        if (!master)
+                return -EINVAL;
+
+        return kvm_pinned_vmid_get(master->dev);
+}
+
+static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain *smmu_domain)
+{
+        struct arm_smmu_master *master;
+
+        master = list_first_entry_or_null(&smmu_domain->devices,
+                                          struct arm_smmu_master, domain_head);
+        if (!master)
+                return -EINVAL;
+
+        if (smmu_domain->s2_cfg.vmid)
+                return kvm_pinned_vmid_put(master->dev);
+
+        return 0;
+}
 static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
                mutex_unlock(&arm_smmu_asid_lock);
        }
        if (s2_cfg->set) {
-               if (s2_cfg->vmid)
-                       arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               if (s2_cfg->vmid) {
+                       if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
+                           smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+                               arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+               }
        }
kfree(smmu_domain);
@@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
            !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
                goto out;
+       if (smmu->features & ARM_SMMU_FEAT_BTM) {
+               ret = arm_smmu_pinned_vmid_get(smmu_domain);
+               if (ret < 0)
+                       goto out;
+
+               if (smmu_domain->s2_cfg.vmid)
+                       arm_smmu_bitmap_free(smmu->vmid_map, smmu_domain->s2_cfg.vmid);
+
+               smmu_domain->s2_cfg.vmid = (u16)ret;
That will require a TLB invalidation on the old VMID, once the STE is rewritten.
True. Will add that.
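Something along these lines might do it, as a sketch (reusing the S12 VMALL invalidation the driver already issues from arm_smmu_tlb_inv_context(); old_vmid is a hypothetical local holding the VMID being replaced):

        /*
         * Sketch: after the STE is rewritten, flush stage-2 TLB entries
         * still tagged with the old, now-released VMID.
         */
        struct arm_smmu_cmdq_ent cmd = {
                .opcode         = CMDQ_OP_TLBI_S12_VMALL,
                .tlbi.vmid      = old_vmid,
        };

        arm_smmu_cmdq_issue_cmd(smmu, &cmd);
        arm_smmu_cmdq_issue_sync(smmu);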
More generally I think this pinned VMID set conflicts with that of stage-2-only domains (which is the default state until a guest attaches a PASID table). Say you have one guest using DOMAIN_NESTED without PASID table, just DMA to IPA using VMID 0x8000. Now another guest attaches a PASID table and obtains the same VMID from KVM. The stage-2 translation might use TLB entries from the other guest, no? They'll both create stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
It's tempting to allocate all VMIDs through KVM instead, but that will force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break existing users of that extension (though I'm not sure there are any). Instead we might need to restrict the SMMU VMID bitmap to match the private VMID set in KVM.
Right, that is indeed a problem. I will take a look at this suggestion.
Besides we probably want to restrict this feature to systems supporting VMID16 on both SMMU and CPUs, or at least check that they are compatible.
Yes. Ideally I would like to detect that in the KVM code and enable/disable the VMID splitting based on that. But I am yet to figure out an easy way to do that in KVM.
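A sketch of what the two checks could look like, assuming the open question is only how to plumb the SMMU side into the KVM decision:

        /* KVM side: only enable the pinned/split scheme with 16-bit VMIDs. */
        if (kvm_get_vmid_bits() < 16)
                kvm_pinned_vmid_enable = false;

        /*
         * SMMU side (arm-smmu-v3): refuse to share VMIDs with KVM unless the
         * SMMU also implements VMID16.
         */
        if (!(smmu->features & ARM_SMMU_FEAT_VMID16))
                return -ENXIO;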
+       }
        smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
        smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
@@ -3221,6 +3263,7 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+       struct arm_smmu_device *smmu = smmu_domain->smmu;
        struct arm_smmu_master *master;
        unsigned long flags;
@@ -3237,6 +3280,8 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
                arm_smmu_install_ste_for_dev(master);
        spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+       if (smmu->features & ARM_SMMU_FEAT_BTM)
+               arm_smmu_pinned_vmid_put(smmu_domain);
Aliasing here as well: the VMID is still live but can be reallocated by KVM and another domain might obtain it.
Ok. Got it.
Thanks for the review, Shameer
Thanks, Jean
 unlock:
        mutex_unlock(&smmu_domain->init_mutex);
 }
--
2.17.1
Hi Jean,
-----Original Message-----
From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
Sent: 04 March 2021 17:11
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org; iommu@lists.linux-foundation.org; kvmarm@lists.cs.columbia.edu; maz@kernel.org; alex.williamson@redhat.com; eric.auger@redhat.com; zhangfei.gao@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; linuxarm@openeuler.org
Subject: Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
[...]
        kfree(smmu_domain);
@@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
            !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
                goto out;

+       if (smmu->features & ARM_SMMU_FEAT_BTM) {
+               ret = arm_smmu_pinned_vmid_get(smmu_domain);
+               if (ret < 0)
+                       goto out;
+
+               if (smmu_domain->s2_cfg.vmid)
+                       arm_smmu_bitmap_free(smmu->vmid_map, smmu_domain->s2_cfg.vmid);
+
+               smmu_domain->s2_cfg.vmid = (u16)ret;
That will require a TLB invalidation on the old VMID, once the STE is rewritten.
More generally I think this pinned VMID set conflicts with that of stage-2-only domains (which is the default state until a guest attaches a PASID table). Say you have one guest using DOMAIN_NESTED without PASID table, just DMA to IPA using VMID 0x8000. Now another guest attaches a PASID table and obtains the same VMID from KVM. The stage-2 translation might use TLB entries from the other guest, no? They'll both create stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
Now that we are trying to align the KVM VMID allocation algorithm similar to that of the ASID allocator [1], I attempted to use that for the SMMU pinned VMID allocation. But the issue you have mentioned above is still valid.
And as a solution what I have tried now is to follow what pinned ASID does in SVA:
 - Use an xarray for private VMIDs
 - Get a pinned VMID from KVM for DOMAIN_NESTED with a PASID table
 - If the new pinned VMID is in use by a private domain, then update the private VMID (a VMID update to a live STE).
This seems to work, but I still need to run more tests with it.
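A rough sketch of that bookkeeping (arm_smmu_vmid_xa and arm_smmu_update_private_vmid() are hypothetical names, mirroring how the SVA code tracks ASIDs in an xarray):

static DEFINE_XARRAY_ALLOC1(arm_smmu_vmid_xa);  /* private (non-pinned) VMID owners */

/* Sketch: claim a KVM-pinned VMID, migrating a private user off it first. */
static int arm_smmu_claim_pinned_vmid(struct arm_smmu_domain *smmu_domain, u16 vmid)
{
        struct arm_smmu_domain *old;

        old = xa_load(&arm_smmu_vmid_xa, vmid);
        if (old && old != smmu_domain) {
                /*
                 * A stage-2-only domain already uses this VMID: give it a new
                 * private VMID and update its live STEs before we take it.
                 */
                if (arm_smmu_update_private_vmid(old))
                        return -EBUSY;
        }

        return xa_err(xa_store(&arm_smmu_vmid_xa, vmid, smmu_domain, GFP_KERNEL));
}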
It's tempting to allocate all VMIDs through KVM instead, but that will force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break existing users of that extension (though I'm not sure there are any). Instead we might need to restrict the SMMU VMID bitmap to match the private VMID set in KVM.
Another solution I have in mind is to make the new KVM VMID allocator common between SMMUv3 and KVM. This will help to avoid all the private and shared VMID splitting, and there is no need for live updates to the STE VMID. One possible drawback is a smaller number of available KVM VMIDs, but with a 16-bit VMID space I am not sure how much of a concern that is.
Please let me know your thoughts.
Thanks, Shameer
[1]. https://lore.kernel.org/kvmarm/20210616155606.2806-1-shameerali.kolothum.tho...
Hi Shameer,
On Wed, Jul 21, 2021 at 08:54:00AM +0000, Shameerali Kolothum Thodi wrote:
More generally I think this pinned VMID set conflicts with that of stage-2-only domains (which is the default state until a guest attaches a PASID table). Say you have one guest using DOMAIN_NESTED without PASID table, just DMA to IPA using VMID 0x8000. Now another guest attaches a PASID table and obtains the same VMID from KVM. The stage-2 translation might use TLB entries from the other guest, no? They'll both create stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
Now that we are trying to align the KVM VMID allocation algorithm similar to that of the ASID allocator [1], I attempted to use that for the SMMU pinned VMID allocation. But the issue you have mentioned above is still valid.
And as a solution what I have tried now is to follow what pinned ASID does in SVA:
 - Use an xarray for private VMIDs
 - Get a pinned VMID from KVM for DOMAIN_NESTED with a PASID table
 - If the new pinned VMID is in use by a private domain, then update the private VMID (a VMID update to a live STE).
This seems to work, but still need to run more tests with this though.
It's tempting to allocate all VMIDs through KVM instead, but that will force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break existing users of that extension (though I'm not sure there are any). Instead we might need to restrict the SMMU VMID bitmap to match the private VMID set in KVM.
Another solution I have in mind is, make the new KVM VMID allocator common between SMMUv3 and KVM. This will help to avoid all the private and shared VMID splitting, also no need for live updates to STE VMID. One possible drawback is less number of available KVM VMIDs but with 16 bit VMID space I am not sure how much that is a concern.
Yes I think that works too. In practice there shouldn't be many VMIDs on the SMMU side, the feature's only enabled when a user wants to assign devices with nesting translation (unlike ASIDs where each device in the system gets a private ASID by default).
Note that you still need to pin all VMIDs used by the SMMU, otherwise you'll have to update the STE after rollover.
The problem we have with VFIO_TYPE1_NESTING_IOMMU might be solved by the upcoming deprecation of VFIO_*_IOMMU [2]. We need a specific sequence from userspace:
1. Attach the VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD)
2. Create the nesting IOMMU domain and attach the group to it (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU becomes IOMMU_IOASID_ALLOC, VFIO_DEVICE_ATTACH_IOASID)
Currently QEMU does 2 then 1, which would cause the SMMU to allocate a separate VMID. If we wanted to extend VFIO_TYPE1_NESTING_IOMMU with PASID tables we'd need to mandate 1-2 and may break existing users. In the new design we can require from the start that creating a nesting IOMMU container through /dev/iommu *must* come with a KVM context; that way we're sure to reuse the existing VMID.
Thanks, Jean
[2] https://lore.kernel.org/linux-iommu/BN9PR11MB5433B1E4AE5B0480369F97178C189@B...
-----Original Message-----
From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
Sent: 22 July 2021 17:46
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org; iommu@lists.linux-foundation.org; kvmarm@lists.cs.columbia.edu; maz@kernel.org; alex.williamson@redhat.com; eric.auger@redhat.com; zhangfei.gao@linaro.org; Jonathan Cameron <jonathan.cameron@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; linuxarm@openeuler.org; Linuxarm <linuxarm@huawei.com>
Subject: [Linuxarm] Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
Hi Shameer,
On Wed, Jul 21, 2021 at 08:54:00AM +0000, Shameerali Kolothum Thodi wrote:
More generally I think this pinned VMID set conflicts with that of stage-2-only domains (which is the default state until a guest attaches a PASID table). Say you have one guest using DOMAIN_NESTED without PASID table, just DMA to IPA using VMID 0x8000. Now another guest attaches a PASID table and obtains the same VMID from KVM. The stage-2 translation might use TLB entries from the other guest, no? They'll both create stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
Now that we are trying to align the KVM VMID allocation algorithm similar to that of the ASID allocator [1], I attempted to use that for the SMMU pinned VMID allocation. But the issue you have mentioned above is still valid.
And as a solution what I have tried now is to follow what pinned ASID does in SVA:
 - Use an xarray for private VMIDs
 - Get a pinned VMID from KVM for DOMAIN_NESTED with a PASID table
 - If the new pinned VMID is in use by a private domain, then update the private VMID (a VMID update to a live STE).
This seems to work, but still need to run more tests with this though.
It's tempting to allocate all VMIDs through KVM instead, but that will force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break existing users of that extension (though I'm not sure there are any). Instead we might need to restrict the SMMU VMID bitmap to match the private VMID set in KVM.
Another solution I have in mind is to make the new KVM VMID allocator common between SMMUv3 and KVM. This will help to avoid all the private and shared VMID splitting, and there is no need for live updates to the STE VMID. One possible drawback is a smaller number of available KVM VMIDs, but with a 16-bit VMID space I am not sure how much of a concern that is.
Yes I think that works too. In practice there shouldn't be many VMIDs on the SMMU side, the feature's only enabled when a user wants to assign devices with nesting translation (unlike ASIDs where each device in the system gets a private ASID by default).
Ok. What about implementations that support only stage 2? Do we need a private VMID allocator for those, or can we use the same common KVM VMID allocator?
Note that you still need to pin all VMIDs used by the SMMU, otherwise you'll have to update the STE after rollover.
Sure.
The problem we have with VFIO_TYPE1_NESTING_IOMMU might be solved by the upcoming deprecation of VFIO_*_IOMMU [2]. We need a specific sequence from userspace:
1. Attach the VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD)
2. Create the nesting IOMMU domain and attach the group to it (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU becomes IOMMU_IOASID_ALLOC, VFIO_DEVICE_ATTACH_IOASID)
Currently QEMU does 2 then 1, which would cause the SMMU to allocate a separate VMID.
Yes. I have observed this with my current implementation. I have a check to see whether the private S2 config VMID belongs to the same domain's s2_cfg, and in that case skip the live update to the STE VMID.
If we wanted to extend VFIO_TYPE1_NESTING_IOMMU with PASID tables we'd need to mandate 1-2 and may break existing users. In the new design we can require from the start that creating a nesting IOMMU container through /dev/iommu *must* come with a KVM context, that way we're sure to reuse the existing VMID.
Ok. That helps.
Thanks, Shameer
Since the pinned VMID space is not recycled, we need to make sure that we release the VMID back into the pool when we are done with it.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/kvm/arm.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8955968be49f..d9900ffb88f4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -181,8 +181,16 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
        kvm_vgic_destroy(kvm);
        for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-               if (kvm->vcpus[i]) {
-                       kvm_vcpu_destroy(kvm->vcpus[i]);
+               struct kvm_vcpu *vcpu = kvm->vcpus[i];
+
+               if (vcpu) {
+                       struct kvm_vmid *vmid = &vcpu->arch.hw_mmu->vmid;
+
+                       if (refcount_read(&vmid->pinned)) {
+                               ida_free(&kvm_pinned_vmids, vmid->vmid);
+                               refcount_set(&vmid->pinned, 0);
+                       }
+                       kvm_vcpu_destroy(vcpu);
                        kvm->vcpus[i] = NULL;
                }
        }