vSVA patches
Jason Gunthorpe (58):
  iommu/arm-smmu-v3: Make STE programming independent of the callers
  iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions
  iommu/arm-smmu-v3: Build the whole STE in arm_smmu_make_s2_domain_ste()
  iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  iommu/arm-smmu-v3: Compute the STE only once for each master
  iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()
  iommu/arm-smmu-v3: Put writing the context descriptor in the right order
  iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats()
  iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  iommu/arm-smmu-v3: Check that the RID domain is S1 in SVA
  iommu/arm-smmu-v3: Add a global static IDENTITY domain
  iommu/arm-smmu-v3: Add a global static BLOCKED domain
  iommu/arm-smmu-v3: Use the identity/blocked domain during release
  iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to finalize
  iommu/arm-smmu-v3: Convert to domain_alloc_paging()
  iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V
  iommu/arm-smmu-v3: Do not allow a SVA domain to be set on the wrong PASID
  iommu/arm-smmu-v3: Do not ATC invalidate the entire domain
  iommu/arm-smmu-v3: Add a type for the CD entry
  iommu/arm-smmu-v3: Add an ops indirection to the STE code
  iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  iommu/arm-smmu-v3: Consolidate clearing a CD table entry
  iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  iommu/arm-smmu-v3: Move allocation of the cdtable into arm_smmu_get_cd_ptr()
  iommu/arm-smmu-v3: Allocate the CD table entry in advance
  iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  iommu/arm-smmu-v3: Start building a generic PASID layer
  iommu/arm-smmu-v3: Make smmu_domain->devices into an allocated list
  iommu/arm-smmu-v3: Make changing domains be hitless for ATS
  iommu/arm-smmu-v3: Add ssid to struct arm_smmu_master_domain
  iommu/arm-smmu-v3: Keep track of valid CD entries in the cd_table
  iommu/arm-smmu-v3: Thread SSID through the arm_smmu_attach_*() interface
  iommu/arm-smmu-v3: Make SVA allocate a normal arm_smmu_domain
  iommu/arm-smmu-v3: Keep track of arm_smmu_master_domain for SVA
  iommu: Add ops->domain_alloc_sva()
  iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain
  iommu/arm-smmu-v3: Consolidate freeing the ASID/VMID
  iommu/arm-smmu-v3: Move the arm_smmu_asid_xa to per-smmu like vmid
  iommu/arm-smmu-v3: Bring back SVA BTM support
  iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is used
  iommu/arm-smmu-v3: Allow a PASID to be set when RID is IDENTITY/BLOCKED
  iommu/arm-smmu-v3: Allow setting a S1 domain to a PASID
  iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
  iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
  iommu/arm-smmu-v3: Do not use devm for the cd table allocations
  iommu/arm-smmu-v3: Remove arm_smmu_ctx_desc & arm_smmu_s2_cfg
  iommu/arm-smmu-v3: Consolidate cache tag allocation
  iommu/arm-smmu-v3: Replace arm_smmu_tlb_inv_asid with arm_smmu_tlb_inv_all_s1
  iommu/arm-smmu-v3: Create arm_smmu_tlb_inv_range_s1()
  iommu/arm-smmu-v3: Make arm_smmu_tlb_inv_range_s2()
  iommu/arm-smmu-v3: Add the limit logic to arm_smmu_tlb_inv_range_s2()
  vfio: Remove VFIO_TYPE1_NESTING_IOMMU
  iommu/arm-smmu-v3: Implement IOMMU_HWPT_ALLOC_NEST_PARENT
  iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  iommu/arm-smmu-v3: Use the new rb tree helpers
Liu Yi L (2):
  vfio/pci: Emulate PASID/PRI capability for VFs
  vfio/pci: Expose PCIe PASID capability to userspace
Lu Baolu (8):
  iommu: Add iopf domain attach/detach/replace interface
  iommu/sva: Use iopf domain attach/detach interface
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd: IOPF-capable hw page table attach/detach/replace
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test
Nicolin Chen (7):
  iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info
  iommu/arm-smmu-v3: Implement arm_smmu_get_msi_mapping_domain
  iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL
  iommu/arm-smmu-v3: Add a helper queue_ncmds
  iommu/arm-smmu-v3: Link user SID and physical SID
  iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  Hack: iommu/arm-smmu-v3: Enable ATS for identity domains
Robin Murphy (1):
  iommu/dma: Support MSIs through nested domains
Shameer Kolothum (1):
  vfio/pci: Fix for wrong CAP IDs
 drivers/iommu/Kconfig                          |   12 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile         |    2 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c    |  658 +++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c   |  423 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c    | 2305 ++++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h    |  183 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c          |   16 -
 drivers/iommu/dma-iommu.c                      |   18 +-
 drivers/iommu/io-pgfault.c                     |  215 +-
 drivers/iommu/iommu-sva.c                      |   63 +-
 drivers/iommu/iommu.c                          |   10 -
 drivers/iommu/iommufd/Makefile                 |    1 +
 drivers/iommu/iommufd/device.c                 |   16 +-
 drivers/iommu/iommufd/fault.c                  |  391 +++
 drivers/iommu/iommufd/hw_pagetable.c           |   36 +-
 drivers/iommu/iommufd/iommufd_private.h        |   41 +
 drivers/iommu/iommufd/iommufd_test.h           |    8 +
 drivers/iommu/iommufd/main.c                   |    6 +
 drivers/iommu/iommufd/selftest.c               |   63 +
 drivers/iommu/iommufd/vfio_compat.c            |    7 +-
 drivers/vfio/pci/vfio_pci_config.c             |  325 ++-
 drivers/vfio/vfio_iommu_type1.c                |   12 +-
 include/linux/iommu.h                          |   50 +-
 include/uapi/linux/iommufd.h                   |  166 ++
 include/uapi/linux/vfio.h                      |    2 +-
 tools/testing/selftests/iommu/iommufd.c        |   17 +
 .../selftests/iommu/iommufd_fail_nth.c         |    2 +-
 tools/testing/selftests/iommu/iommufd_utils.h  |   83 +-
 28 files changed, 3925 insertions(+), 1206 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
 create mode 100644 drivers/iommu/iommufd/fault.c
From: Lu Baolu <baolu.lu@linux.intel.com>
There is a slight difference between iopf domains and non-iopf domains. With the latter, references to a domain occur only between attach and detach; with the former, due to asynchronous iopf handling paths, references to the domain may still occur after detach, which leads to potential use-after-free (UAF) issues.
Introduce an iopf-specific domain attach/detach/replace interface in which the caller provides an attach cookie. The cookie can only be freed after all outstanding iopf groups have been handled and the domain has been detached from the RID or PASID; its presence indicates that a domain is attached to the RID or PASID and will not be released until then.

The cookie data structure also includes a private field for storing a caller-specific pointer that is passed back to the caller's page fault handler. This field provides flexibility for various uses. For example, IOMMUFD could use it to store the iommufd_device pointer, so that it can easily retrieve the dev_id of the device that triggered the fault.
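As an illustration only, here is a minimal caller-side sketch of the intended usage; my_domain_attach(), my_cookie_release() and the my_ctx pointer are hypothetical names, not part of this patch:

#include <linux/iommu.h>
#include <linux/slab.h>

/* Hypothetical release callback; invoked only after the domain is
 * detached and all outstanding iopf groups have been handled. */
static void my_cookie_release(struct iopf_attach_cookie *cookie)
{
	kfree(cookie);
}

static int my_domain_attach(struct iommu_domain *domain, struct device *dev,
			    ioasid_t pasid, void *my_ctx)
{
	struct iopf_attach_cookie *cookie;
	int ret;

	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
	if (!cookie)
		return -ENOMEM;

	cookie->private = my_ctx;	/* passed back to the fault handler */
	cookie->release = my_cookie_release;

	ret = iopf_domain_attach(domain, dev, pasid, cookie);
	if (ret)
		kfree(cookie);		/* never registered; free directly */
	return ret;
}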
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/io-pgfault.c | 158 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      |  36 +++++++++
 2 files changed, 194 insertions(+)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 06d78fcc79fd..cd28fa70fd23 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -39,6 +39,103 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
 		kfree_rcu(fault_param, rcu);
 }
 
+/* Get the domain attachment cookie for pasid of a device. */
+static struct iopf_attach_cookie __maybe_unused *
+iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return ERR_PTR(-ENODEV);
+
+	xa_lock(&iopf_param->pasid_cookie);
+	curr = xa_load(&iopf_param->pasid_cookie, pasid);
+	if (curr && !refcount_inc_not_zero(&curr->users))
+		curr = ERR_PTR(-EINVAL);
+	xa_unlock(&iopf_param->pasid_cookie);
+
+	iopf_put_dev_fault_param(iopf_param);
+
+	return curr;
+}
+
+/* Put the domain attachment cookie. */
+static void iopf_pasid_cookie_put(struct iopf_attach_cookie *cookie)
+{
+	if (cookie && refcount_dec_and_test(&cookie->users))
+		cookie->release(cookie);
+}
+
+/*
+ * Set the domain attachment cookie for pasid of a device. Return 0 on
+ * success, or error number on failure.
+ */
+static int iopf_pasid_cookie_set(struct iommu_domain *domain, struct device *dev,
+				 ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return -ENODEV;
+
+	refcount_set(&cookie->users, 1);
+	cookie->dev = dev;
+	cookie->pasid = pasid;
+	cookie->domain = domain;
+
+	curr = xa_cmpxchg(&iopf_param->pasid_cookie, pasid, NULL, cookie, GFP_KERNEL);
+	iopf_put_dev_fault_param(iopf_param);
+
+	return curr ? xa_err(curr) : 0;
+}
+
+/* Clear the domain attachment cookie for pasid of a device. */
+static void iopf_pasid_cookie_clear(struct device *dev, ioasid_t pasid)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (WARN_ON(!iopf_param))
+		return;
+
+	curr = xa_erase(&iopf_param->pasid_cookie, pasid);
+	/* paired with iopf_pasid_cookie_set/replace() */
+	iopf_pasid_cookie_put(curr);
+
+	iopf_put_dev_fault_param(iopf_param);
+}
+
+/*
+ * Replace the domain attachment cookie for pasid of a device.
+ */
+static int iopf_pasid_cookie_replace(struct iommu_domain *domain, struct device *dev,
+				     ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (cookie) {
+		refcount_set(&cookie->users, 1);
+		cookie->dev = dev;
+		cookie->pasid = pasid;
+		cookie->domain = domain;
+	}
+
+	curr = xa_store(&iopf_param->pasid_cookie, pasid, cookie, GFP_KERNEL);
+	if (xa_err(curr)) {
+		/* drop the fault_param reference taken above on error */
+		iopf_put_dev_fault_param(iopf_param);
+		return xa_err(curr);
+	}
+
+	/* paired with iopf_pasid_cookie_set/replace() */
+	iopf_pasid_cookie_put(curr);
+
+	iopf_put_dev_fault_param(iopf_param);
+
+	return 0;
+}
+
 static void __iopf_free_group(struct iopf_group *group)
 {
 	struct iopf_fault *iopf, *next;
@@ -354,6 +451,7 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
 	mutex_init(&fault_param->lock);
 	INIT_LIST_HEAD(&fault_param->faults);
 	INIT_LIST_HEAD(&fault_param->partial);
+	xa_init(&fault_param->pasid_cookie);
 	fault_param->dev = dev;
 	refcount_set(&fault_param->users, 1);
 	list_add(&fault_param->queue_list, &queue->devices);
@@ -491,3 +589,63 @@ void iopf_queue_free(struct iopf_queue *queue)
 	kfree(queue);
 }
 EXPORT_SYMBOL_GPL(iopf_queue_free);
+
+int iopf_domain_attach(struct iommu_domain *domain, struct device *dev,
+		       ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	int ret;
+
+	if (!domain->iopf_handler)
+		return -EINVAL;
+
+	if (pasid == IOMMU_NO_PASID)
+		ret = iommu_attach_group(domain, dev->iommu_group);
+	else
+		ret = iommu_attach_device_pasid(domain, dev, pasid);
+	if (ret)
+		return ret;
+
+	ret = iopf_pasid_cookie_set(domain, dev, pasid, cookie);
+	if (ret) {
+		if (pasid == IOMMU_NO_PASID)
+			iommu_detach_group(domain, dev->iommu_group);
+		else
+			iommu_detach_device_pasid(domain, dev, pasid);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iopf_domain_attach);
+
+void iopf_domain_detach(struct iommu_domain *domain, struct device *dev, ioasid_t pasid)
+{
+	iopf_pasid_cookie_clear(dev, pasid);
+
+	if (pasid == IOMMU_NO_PASID)
+		iommu_detach_group(domain, dev->iommu_group);
+	else
+		iommu_detach_device_pasid(domain, dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iopf_domain_detach);
+
+int iopf_domain_replace(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_domain *old_domain = iommu_get_domain_for_dev(dev);
+	int ret;
+
+	if (!old_domain || pasid != IOMMU_NO_PASID ||
+	    (!old_domain->iopf_handler && !domain->iopf_handler))
+		return -EINVAL;
+
+	ret = iommu_group_replace_domain(dev->iommu_group, domain);
+	if (ret)
+		return ret;
+
+	ret = iopf_pasid_cookie_replace(domain, dev, pasid, cookie);
+	if (ret)
+		iommu_group_replace_domain(dev->iommu_group, old_domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iopf_domain_replace, IOMMUFD_INTERNAL);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 98b0c455aec8..2432b98d71ff 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -116,6 +116,16 @@ struct iommu_page_response {
 	u32	code;
 };
 
+struct iopf_attach_cookie {
+	struct iommu_domain *domain;
+	struct device *dev;
+	unsigned int pasid;
+	refcount_t users;
+
+	void *private;
+	void (*release)(struct iopf_attach_cookie *cookie);
+};
+
 struct iopf_fault {
 	struct iommu_fault fault;
 	/* node for pending lists */
@@ -716,6 +726,7 @@ struct iommu_fault_param {
 	struct device *dev;
 	struct iopf_queue *queue;
 	struct list_head queue_list;
+	struct xarray pasid_cookie;
 
 	struct list_head partial;
 	struct list_head faults;
@@ -1616,6 +1627,12 @@ void iopf_free_group(struct iopf_group *group);
 void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
 void iopf_group_response(struct iopf_group *group,
 			 enum iommu_page_response_code status);
+int iopf_domain_attach(struct iommu_domain *domain, struct device *dev,
+		       ioasid_t pasid, struct iopf_attach_cookie *cookie);
+void iopf_domain_detach(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid);
+int iopf_domain_replace(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid, struct iopf_attach_cookie *cookie);
 #else
 static inline int
 iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
@@ -1660,5 +1677,24 @@ static inline void iopf_group_response(struct iopf_group *group,
 				       enum iommu_page_response_code status)
 {
 }
+
+static inline int iopf_domain_attach(struct iommu_domain *domain,
+				     struct device *dev, ioasid_t pasid,
+				     struct iopf_attach_cookie *cookie)
+{
+	return -ENODEV;
+}
+
+static inline void iopf_domain_detach(struct iommu_domain *domain,
+				      struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline int iopf_domain_replace(struct iommu_domain *domain,
+				      struct device *dev, ioasid_t pasid,
+				      struct iopf_attach_cookie *cookie)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_IOPF */
 #endif /* __LINUX_IOMMU_H */
From: Lu Baolu <baolu.lu@linux.intel.com>
The iommu SVA implementation relies on iopf handling. Allocate an attachment cookie and use the iopf domain attach/detach interface. The SVA domain is then guaranteed to be released only after all outstanding page faults have been handled.

In the fault delivery path, the attachment cookie is retrieved instead of the domain. This ensures that a page fault is forwarded only if an iopf-capable domain is attached, and that the domain is only released after all outstanding faults are handled.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

conflict: drivers/iommu/iommu-sva.c

Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/io-pgfault.c | 59 +++++++++++++++++++-------------------
 drivers/iommu/iommu-sva.c  | 47 ++++++++++++++++++++++++------
 include/linux/iommu.h      |  2 +-
 3 files changed, 68 insertions(+), 40 deletions(-)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index cd28fa70fd23..9a8074d8abf0 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -40,7 +40,7 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
 }
 
 /* Get the domain attachment cookie for pasid of a device. */
-static struct iopf_attach_cookie __maybe_unused *
+static struct iopf_attach_cookie *
 iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid)
 {
 	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
@@ -147,6 +147,7 @@ static void __iopf_free_group(struct iopf_group *group)
 
 	/* Pair with iommu_report_device_fault(). */
 	iopf_put_dev_fault_param(group->fault_param);
+	iopf_pasid_cookie_put(group->cookie);
 }
 
 void iopf_free_group(struct iopf_group *group)
@@ -156,30 +157,6 @@ void iopf_free_group(struct iopf_group *group)
 }
 EXPORT_SYMBOL_GPL(iopf_free_group);
 
-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
-						struct iommu_fault *fault)
-{
-	struct iommu_domain *domain;
-
-	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
-		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
-		if (IS_ERR(domain))
-			domain = NULL;
-	} else {
-		domain = iommu_get_domain_for_dev(dev);
-	}
-
-	if (!domain || !domain->iopf_handler) {
-		dev_warn_ratelimited(dev,
-				     "iopf (pasid %d) without domain attached or handler installed\n",
-				     fault->prm.pasid);
-
-		return NULL;
-	}
-
-	return domain;
-}
-
 /* Non-last request of a group. Postpone until the last one. */
 static int report_partial_fault(struct iommu_fault_param *fault_param,
 				struct iommu_fault *fault)
@@ -199,10 +176,20 @@ static int report_partial_fault(struct iommu_fault_param *fault_param,
 	return 0;
 }
 
+static ioasid_t fault_to_pasid(struct iommu_fault *fault)
+{
+	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
+		return fault->prm.pasid;
+
+	return IOMMU_NO_PASID;
+}
+
 static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
 					   struct iopf_fault *evt,
 					   struct iopf_group *abort_group)
 {
+	ioasid_t pasid = fault_to_pasid(&evt->fault);
+	struct iopf_attach_cookie *cookie;
 	struct iopf_fault *iopf, *next;
 	struct iopf_group *group;
 
@@ -215,7 +202,23 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
 		group = abort_group;
 	}
 
+	cookie = iopf_pasid_cookie_get(iopf_param->dev, pasid);
+	if (!cookie && pasid != IOMMU_NO_PASID)
+		cookie = iopf_pasid_cookie_get(iopf_param->dev, IOMMU_NO_PASID);
+	if (IS_ERR(cookie) || !cookie) {
+		/*
+		 * The PASID of this device was not attached by an iopf-capable
+		 * domain. Ask the caller to abort handling of this fault.
+		 * Otherwise, the reference count will be switched to the new
+		 * iopf group and will be released in iopf_free_group().
+		 */
+		if (group != abort_group)
+			kfree(group);
+		group = abort_group;
+		cookie = NULL;
+	}
+
 	group->fault_param = iopf_param;
+	group->cookie = cookie;
 	group->last_fault.fault = evt->fault;
 	INIT_LIST_HEAD(&group->faults);
 	INIT_LIST_HEAD(&group->pending_node);
@@ -303,15 +306,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 	if (group == &abort_group)
 		goto err_abort;
 
-	group->domain = get_domain_for_iopf(dev, fault);
-	if (!group->domain)
-		goto err_abort;
-
 	/*
 	 * On success iopf_handler must call iopf_group_response() and
 	 * iopf_free_group()
 	 */
-	if (group->domain->iopf_handler(group))
+	if (group->cookie->domain->iopf_handler(group))
 		goto err_abort;
 
 	return;
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 640acc804e8c..0b3369d78389 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -51,6 +51,39 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct device *dev)
 	return iommu_mm;
 }
 
+static void release_attach_cookie(struct iopf_attach_cookie *cookie)
+{
+	struct iommu_domain *domain = cookie->domain;
+
+	mutex_lock(&iommu_sva_lock);
+	if (--domain->users == 0) {
+		list_del(&domain->next);
+		iommu_domain_free(domain);
+	}
+	mutex_unlock(&iommu_sva_lock);
+
+	kfree(cookie);
+}
+
+static int sva_attach_device_pasid(struct iommu_domain *domain,
+				   struct device *dev, ioasid_t pasid)
+{
+	struct iopf_attach_cookie *cookie;
+	int ret;
+
+	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+	if (!cookie)
+		return -ENOMEM;
+
+	cookie->release = release_attach_cookie;
+
+	ret = iopf_domain_attach(domain, dev, pasid, cookie);
+	if (ret)
+		kfree(cookie);
+
+	return ret;
+}
+
 /**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
@@ -99,7 +132,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Search for an existing domain. */
 	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+		ret = sva_attach_device_pasid(domain, dev, iommu_mm->pasid);
 		if (!ret) {
 			domain->users++;
 			goto out;
@@ -113,7 +146,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_free_handle;
 	}
 
-	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+	ret = sva_attach_device_pasid(domain, dev, iommu_mm->pasid);
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
@@ -157,13 +190,9 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 		return;
 	}
 	list_del(&handle->handle_item);
-
-	iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
-	if (--domain->users == 0) {
-		list_del(&domain->next);
-		iommu_domain_free(domain);
-	}
 	mutex_unlock(&iommu_sva_lock);
+
+	iopf_domain_detach(domain, dev, iommu_mm->pasid);
 	kfree(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
@@ -259,7 +287,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
 		if (status != IOMMU_PAGE_RESP_SUCCESS)
 			break;
 
-		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+		status = iommu_sva_handle_mm(&iopf->fault,
+					     group->cookie->domain->mm);
 	}
 
 	iopf_group_response(group, status);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2432b98d71ff..51298f3c530c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -138,9 +138,9 @@ struct iopf_group {
 	/* list node for iommu_fault_param::faults */
 	struct list_head pending_node;
 	struct work_struct work;
-	struct iommu_domain *domain;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
+	struct iopf_attach_cookie *cookie;
 };
 
/**
From: Lu Baolu <baolu.lu@linux.intel.com>
Each iommu_hwpt_pgfault represents a fault message that userspace can retrieve. Multiple iommu_hwpt_pgfaults may be put in one iopf group, with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only on the last iommu_hwpt_pgfault of the group.

An iommu_hwpt_page_response is the response message that userspace sends to the kernel after it finishes handling a group of fault messages. The @dev_id, @pasid, and @grpid fields in the message identify an outstanding iopf group for a device. The @addr field, which must match the fault address of the last fault in the group, is used by the kernel as a sanity check.
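For illustration, a rough userspace sketch of the read/respond flow; fault_fd refers to the fault object file added later in this series, and error handling and multi-group reads are elided:

#include <unistd.h>
#include <linux/iommufd.h>

/* Read one whole iopf group and acknowledge it. The kernel returns only
 * complete groups, so a buffer large enough for several messages is used. */
static void handle_one_group(int fault_fd)
{
	struct iommu_hwpt_pgfault faults[32];
	struct iommu_hwpt_page_response resp = {};
	ssize_t got = read(fault_fd, faults, sizeof(faults));
	int i, n, last = 0;

	if (got <= 0)
		return;
	n = got / sizeof(faults[0]);

	for (i = 0; i < n; i++) {
		/* ... make faults[i].addr resident for faults[i].pasid ... */
		if (faults[i].flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE)
			last = i;
	}

	resp.size = sizeof(resp);
	resp.dev_id = faults[last].dev_id;
	resp.pasid = faults[last].pasid;
	resp.grpid = faults[last].grpid;
	resp.code = 0;			/* 0: Successful */
	resp.addr = faults[last].addr;	/* must match the group's last fault */
	write(fault_fd, &resp, sizeof(resp));
}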
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/uapi/linux/iommufd.h | 67 ++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..d59e839ae49e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,71 @@ struct iommu_hwpt_invalidate {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ *                                   valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+	IOMMU_PGFAULT_FLAGS_PASID_VALID	= (1 << 0),
+	IOMMU_PGFAULT_FLAGS_LAST_PAGE	= (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission
+ * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission
+ */
+enum iommu_hwpt_pgfault_perm {
+	IOMMU_PGFAULT_PERM_READ		= (1 << 0),
+	IOMMU_PGFAULT_PERM_WRITE	= (1 << 1),
+	IOMMU_PGFAULT_PERM_EXEC		= (1 << 2),
+	IOMMU_PGFAULT_PERM_PRIV		= (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: page address
+ */
+struct iommu_hwpt_pgfault {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 perm;
+	__u64 addr;
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: response code. The supported codes include:
+ *        0: Successful; 1: Response Failure; 2: Invalid Request.
+ * @addr: The fault address. Must match the addr field of the
+ *        last iommu_hwpt_pgfault of a reported iopf group.
+ */
+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u64 addr;
+};
 #endif
From: Lu Baolu <baolu.lu@linux.intel.com>
An iommufd fault object provides an interface for delivering I/O page faults to user space. These objects are created and destroyed by user space, and they can be associated with or dissociated from hardware page table objects during page table allocation or destruction.
User space interacts with the fault object through a file interface. This interface offers a straightforward and efficient way for user space to handle page faults: fault messages are read sequentially from the file, and responses are written back to the same file. The file supports polling, and user space applications can additionally use io_uring to improve read and write efficiency.
A fault object can be associated with any iopf-capable iommufd_hw_pgtable during the pgtable's allocation. All I/O page faults triggered by devices when accessing the I/O addresses of an iommufd_hw_pgtable are routed through the fault object to user space. Similarly, user space's responses to these page faults are routed back to the iommu device driver through the same fault object.
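A minimal event loop over the fault fd might look like the following sketch; handle_one_group() stands in for whatever group handler the application provides (such as the one sketched in the previous patch):

#include <poll.h>

static void fault_event_loop(int fault_fd)
{
	struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };

	for (;;) {
		if (poll(&pfd, 1, -1) <= 0)
			break;			/* error or signal */
		if (pfd.revents & POLLIN)
			handle_one_group(fault_fd);
	}
}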
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/iommufd/Makefile          |   1 +
 drivers/iommu/iommufd/device.c          |   1 +
 drivers/iommu/iommufd/fault.c           | 255 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  23 +++
 drivers/iommu/iommufd/main.c            |   6 +
 include/linux/iommu.h                   |   2 +
 include/uapi/linux/iommufd.h            |  18 ++
 7 files changed, 306 insertions(+)
 create mode 100644 drivers/iommu/iommufd/fault.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..b94a74366eed 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
+	fault.o \
 	vfio_compat.o
 
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..d70913ee8fdf 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	refcount_inc(&idev->obj.users);
 	/* igroup refcount moves into iommufd_device */
 	idev->igroup = igroup;
+	xa_init(&idev->faults);
 
 	/*
 	 * If the caller fails after this success it must call
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..9844a85feeb2
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "iommufd_private.h"
+
+static int device_add_fault(struct iopf_group *group)
+{
+	struct iommufd_device *idev = group->cookie->private;
+	void *curr;
+
+	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
+			  NULL, group, GFP_KERNEL);
+
+	return curr ? xa_err(curr) : 0;
+}
+
+static void device_remove_fault(struct iopf_group *group)
+{
+	struct iommufd_device *idev = group->cookie->private;
+
+	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
+		 NULL, GFP_KERNEL);
+}
+
+static struct iopf_group *device_get_fault(struct iommufd_device *idev,
+					   unsigned long grpid)
+{
+	return xa_load(&idev->faults, grpid);
+}
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+	struct iopf_group *group, *next;
+
+	mutex_lock(&fault->mutex);
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	list_for_each_entry_safe(group, next, &fault->response, node) {
+		list_del(&group->node);
+		device_remove_fault(group);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	mutex_unlock(&fault->mutex);
+
+	mutex_destroy(&fault->mutex);
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+					  struct iommu_hwpt_pgfault *hwpt_fault,
+					  struct iommufd_device *idev)
+{
+	hwpt_fault->size = sizeof(*hwpt_fault);
+	hwpt_fault->flags = fault->prm.flags;
+	hwpt_fault->dev_id = idev->obj.id;
+	hwpt_fault->pasid = fault->prm.pasid;
+	hwpt_fault->grpid = fault->prm.grpid;
+	hwpt_fault->perm = fault->prm.perm;
+	hwpt_fault->addr = fault->prm.addr;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+				       size_t count, loff_t *ppos)
+{
+	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_pgfault data;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	struct iopf_fault *iopf;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % fault_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (!list_empty(&fault->deliver) && count > done) {
+		group = list_first_entry(&fault->deliver,
+					 struct iopf_group, node);
+
+		/* Only deliver whole groups to the user buffer */
+		if (list_count_nodes(&group->faults) * fault_size > count - done)
+			break;
+
+		idev = (struct iommufd_device *)group->cookie->private;
+		list_for_each_entry(iopf, &group->faults, list) {
+			iommufd_compose_fault_message(&iopf->fault, &data, idev);
+			if (copy_to_user(buf + done, &data, fault_size)) {
+				rc = -EFAULT;
+				goto err_unlock;
+			}
+			done += fault_size;
+		}
+
+		rc = device_add_fault(group);
+		if (rc)
+			goto err_unlock;
+
+		list_move_tail(&group->node, &fault->response);
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done;
+err_unlock:
+	mutex_unlock(&fault->mutex);
+	return rc;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	size_t response_size = sizeof(struct iommu_hwpt_page_response);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_page_response response;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % response_size)
+		return -ESPIPE;
+
+	while (!list_empty(&fault->response) && count > done) {
+		rc = copy_from_user(&response, buf + done, response_size);
+		if (rc)
+			break;
+
+		idev = container_of(iommufd_get_object(fault->ictx,
+						       response.dev_id,
+						       IOMMUFD_OBJ_DEVICE),
+				    struct iommufd_device, obj);
+		if (IS_ERR(idev))
+			break;
+
+		group = device_get_fault(idev, response.grpid);
+		if (!group ||
+		    response.addr != group->last_fault.fault.prm.addr ||
+		    ((group->last_fault.fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
+		     response.pasid != group->last_fault.fault.prm.pasid)) {
+			iommufd_put_object(fault->ictx, &idev->obj);
+			break;
+		}
+
+		iopf_group_response(group, response.code);
+
+		mutex_lock(&fault->mutex);
+		list_del(&group->node);
+		mutex_unlock(&fault->mutex);
+
+		device_remove_fault(group);
+		iopf_free_group(group);
+		done += response_size;
+
+		iommufd_put_object(fault->ictx, &idev->obj);
+	}
+
+	return done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+					struct poll_table_struct *wait)
+{
+	struct iommufd_fault *fault = filep->private_data;
+	__poll_t pollflags = 0;
+
+	poll_wait(filep, &fault->wait_queue, wait);
+	mutex_lock(&fault->mutex);
+	if (!list_empty(&fault->deliver))
+		pollflags = EPOLLIN | EPOLLRDNORM;
+	mutex_unlock(&fault->mutex);
+
+	return pollflags;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+	.owner		= THIS_MODULE,
+	.open		= nonseekable_open,
+	.read		= iommufd_fault_fops_read,
+	.write		= iommufd_fault_fops_write,
+	.poll		= iommufd_fault_fops_poll,
+	.llseek		= no_llseek,
+};
+
+static int get_fault_fd(struct iommufd_fault *fault)
+{
+	struct file *filep;
+	int fdno;
+
+	fdno = get_unused_fd_flags(O_CLOEXEC);
+	if (fdno < 0)
+		return fdno;
+
+	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+				   fault, O_RDWR);
+	if (IS_ERR(filep)) {
+		put_unused_fd(fdno);
+		return PTR_ERR(filep);
+	}
+
+	fd_install(fdno, filep);
+
+	return fdno;
+}
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_fault_alloc *cmd = ucmd->cmd;
+	struct iommufd_fault *fault;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+	if (IS_ERR(fault))
+		return PTR_ERR(fault);
+
+	fault->ictx = ucmd->ictx;
+	INIT_LIST_HEAD(&fault->deliver);
+	INIT_LIST_HEAD(&fault->response);
+	mutex_init(&fault->mutex);
+	init_waitqueue_head(&fault->wait_queue);
+
+	rc = get_fault_fd(fault);
+	if (rc < 0)
+		goto out_abort;
+
+	cmd->out_fault_id = fault->obj.id;
+	cmd->out_fault_fd = rc;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+	return 0;
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..52d83e888bd0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_HWPT_NESTED,
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -395,6 +396,8 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	/* outstanding faults awaiting response indexed by fault group id */
+	struct xarray faults;
 };
 
 static inline struct iommufd_device *
@@ -426,6 +429,26 @@ void iopt_remove_access(struct io_pagetable *iopt,
 			u32 iopt_access_list_id);
 void iommufd_access_destroy_object(struct iommufd_object *obj);
 
+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+
+	/* The lists of outstanding faults protected by below mutex. */
+	struct mutex mutex;
+	struct list_head deliver;
+	struct list_head response;
+
+	struct wait_queue_head wait_queue;
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..792db077d33e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -332,6 +332,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_fault_alloc fault;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_FAULT_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+		 out_fault_fd),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 		.destroy = iommufd_hwpt_nested_destroy,
 		.abort = iommufd_hwpt_nested_abort,
 	},
+	[IOMMUFD_OBJ_FAULT] = {
+		.destroy = iommufd_fault_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 51298f3c530c..989a287a2109 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -141,6 +141,8 @@ struct iopf_group {
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
 	struct iopf_attach_cookie *cookie;
+	/* Used by handler provider to hook the group on its own lists. */
+	struct list_head node;
 };
 
 /**
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index d59e839ae49e..c32d62b02306 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
 	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
 	IOMMUFD_CMD_HWPT_INVALIDATE,
+	IOMMUFD_CMD_FAULT_ALLOC,
 };
 
 /**
@@ -759,4 +760,21 @@ struct iommu_hwpt_page_response {
 	__u32 code;
 	__u64 addr;
 };
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 out_fault_id;
+	__u32 out_fault_fd;
+};
+#define IOMMU_FAULT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_ALLOC)
 #endif
From: Lu Baolu <baolu.lu@linux.intel.com>
When allocating a user iommufd_hw_pagetable, userspace is allowed to associate a fault object with the hw_pagetable by specifying the fault object ID in the page table allocation data and setting the IOMMU_HWPT_FAULT_ID_VALID flag bit.

On successful return from the hwpt allocation, the user can retrieve and respond to page faults by reading from and writing to the file interface of the fault object.

Once a fault object has been associated with a hwpt, the hwpt is iopf-capable, indicated by fault_capable being set to true. Attaching, detaching, or replacing an iopf-capable hwpt on an RID or PASID differs from the non-iopf-capable case; the implementation is introduced in the next patch.
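For illustration, a sketch of such an allocation from userspace; iommufd, dev_id, ioas_id and fault_id are assumed to be valid IDs obtained elsewhere, and a paging hwpt is shown (a nested allocation would additionally pass driver-specific data):

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_iopf_hwpt(int iommufd, __u32 dev_id, __u32 ioas_id,
			   __u32 fault_id, __u32 *out_hwpt_id)
{
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.flags = IOMMU_HWPT_FAULT_ID_VALID,
		.dev_id = dev_id,
		.pt_id = ioas_id,
		.data_type = IOMMU_HWPT_DATA_NONE,
		.fault_id = fault_id,	/* from IOMMU_FAULT_ALLOC */
	};

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	*out_hwpt_id = cmd.out_hwpt_id;	/* an iopf-capable hwpt */
	return 0;
}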
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/iommufd/fault.c           | 14 ++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    | 36 +++++++++++++++++++------
 drivers/iommu/iommufd/iommufd_private.h | 11 ++++++++
 include/uapi/linux/iommufd.h            |  6 +++++
 4 files changed, 59 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 9844a85feeb2..e752d1c49dde 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -253,3 +253,17 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 
 	return rc;
 }
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+	struct iommufd_hw_pagetable *hwpt = group->cookie->domain->fault_data;
+	struct iommufd_fault *fault = hwpt->fault;
+
+	mutex_lock(&fault->mutex);
+	list_add_tail(&group->node, &fault->deliver);
+	mutex_unlock(&fault->mutex);
+
+	wake_up_interruptible(&fault->wait_queue);
+
+	return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..95ea943f5be5 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+	if (hwpt->domain)
+		iommu_domain_free(hwpt->domain);
+
+	if (hwpt->fault)
+		iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
+}
+
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 					 hwpt_paging->common.domain);
 	}
 
-	if (hwpt_paging->common.domain)
-		iommu_domain_free(hwpt_paging->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_paging->common);
 	refcount_dec(&hwpt_paging->ioas->obj.users);
 }
 
@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 	struct iommufd_hwpt_nested *hwpt_nested =
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);
 
-	if (hwpt_nested->common.domain)
-		iommu_domain_free(hwpt_nested->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_nested->common);
 	refcount_dec(&hwpt_nested->parent->common.obj.users);
 }
 
@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags || !user_data->len || !ops->domain_alloc_user)
+	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	    !user_data->len || !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent)
 		return ERR_PTR(-EINVAL);
@@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
 
-	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+	hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
 					      parent->common.domain, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
@@ -308,6 +314,20 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+		struct iommufd_fault *fault;
+
+		fault = iommufd_get_fault(ucmd, cmd->fault_id);
+		if (IS_ERR(fault)) {
+			rc = PTR_ERR(fault);
+			goto out_hwpt;
+		}
+		hwpt->fault = fault;
+		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+		hwpt->domain->fault_data = hwpt;
+		hwpt->fault_capable = true;
+	}
+
 	cmd->out_hwpt_id = hwpt->obj.id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 52d83e888bd0..2780bed0c6b1 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -293,6 +293,8 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
+	struct iommufd_fault *fault;
+	bool fault_capable : 1;
 };
 
 struct iommufd_hwpt_paging {
@@ -446,8 +448,17 @@ struct iommufd_fault {
 	struct wait_queue_head wait_queue;
 };
 
+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_FAULT),
+			    struct iommufd_fault, obj);
+}
+
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c32d62b02306..7481cdd57027 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
  *                                the parent HWPT in a nesting configuration.
  * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
  *                                   enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ *                             valid.
  */
 enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
 };
 
 /**
@@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
  * @__reserved: Must be 0
  * @data_type: One of enum iommu_hwpt_data_type
  * @data_len: Length of the type specific data
+ * @fault_id: The ID of an IOMMUFD_FAULT object. Valid only if the
+ *            IOMMU_HWPT_FAULT_ID_VALID flag is set.
  * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
@@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
 	__u32 __reserved;
 	__u32 data_type;
 	__u32 data_len;
+	__u32 fault_id;
 	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
From: Lu Baolu <baolu.lu@linux.intel.com>
The iopf-capable hw page table attach/detach/replace paths should use the iommu iopf-specific interfaces. The pointer to the iommufd_device is stored in the private field of the attachment cookie so that it can be easily retrieved in the fault handling paths. The references to the iommufd_device and iommufd_hw_pagetable objects are held until the cookie is released, which happens after the hw_pagetable is detached from the device and all outstanding iopf groups have been responded to. This guarantees that both the device and hw_pagetable remain valid until the domain is detached and all outstanding faults are handled.
The iopf-capable hw page tables can only be attached to devices that support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an iopf-capable hw_pagetable to the device, the IOPF feature is enabled on the device. Similarly, after the last iopf-capable hwpt is detached from the device, the IOPF feature is disabled on the device.
The current implementation allows a replacement between iopf-capable and non-iopf-capable hw page tables. This matches the nested translation use case, where a parent domain is attached by default and can then be replaced with a nested user domain with iopf support.
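For illustration, a sketch of that replacement flow from userspace, using the VFIO cdev attach ioctl as one possible frontend; this is not part of this patch, and the IDs are assumed valid:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Attach the non-iopf parent first, then replace it with the
 * iopf-capable nested hwpt; attaching a new pt_id to an already
 * attached device performs a replace. */
static int attach_then_replace(int device_fd, __u32 parent_hwpt_id,
			       __u32 iopf_hwpt_id)
{
	struct vfio_device_attach_iommufd_pt attach = {
		.argsz = sizeof(attach),
		.pt_id = parent_hwpt_id,
	};

	if (ioctl(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach))
		return -1;

	attach.pt_id = iopf_hwpt_id;
	return ioctl(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach);
}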
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/iommufd/device.c          |  15 ++-
 drivers/iommu/iommufd/fault.c           | 122 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |   7 ++
 3 files changed, 141 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d70913ee8fdf..c4737e876ebc 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -377,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault_capable)
+			rc = iommufd_fault_domain_attach_dev(hwpt, idev);
+		else
+			rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -403,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault_capable)
+			iommufd_fault_domain_detach_dev(hwpt, idev);
+		else
+			iommu_detach_group(hwpt->domain, idev->igroup->group);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_is_paging(hwpt))
@@ -498,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 		goto err_unlock;
 	}
 
-	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+	if (old_hwpt->fault_capable || hwpt->fault_capable)
+		rc = iommufd_fault_domain_replace_dev(hwpt, idev);
+	else
+		rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
 	if (rc)
 		goto err_unresv;
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index e752d1c49dde..a4a49f3cd4c2 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -267,3 +267,125 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
 
 	return 0;
 }
+
+static void release_attach_cookie(struct iopf_attach_cookie *cookie)
+{
+	struct iommufd_hw_pagetable *hwpt = cookie->domain->fault_data;
+	struct iommufd_device *idev = cookie->private;
+
+	refcount_dec(&idev->obj.users);
+	refcount_dec(&hwpt->obj.users);
+	kfree(cookie);
+}
+
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+	int ret;
+
+	if (idev->iopf_enabled)
+		return 0;
+
+	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	if (ret)
+		return ret;
+
+	idev->iopf_enabled = true;
+
+	return 0;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+	if (!idev->iopf_enabled)
+		return;
+
+	iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	idev->iopf_enabled = false;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev)
+{
+	struct iopf_attach_cookie *cookie;
+	int ret;
+
+	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+	if (!cookie)
+		return -ENOMEM;
+
+	refcount_inc(&hwpt->obj.users);
+	refcount_inc(&idev->obj.users);
+	cookie->release = release_attach_cookie;
+	cookie->private = idev;
+	/* preset so release_attach_cookie() is safe on the error paths */
+	cookie->domain = hwpt->domain;
+
+	if (!idev->iopf_enabled) {
+		ret = iommufd_fault_iopf_enable(idev);
+		if (ret)
+			goto out_put_cookie;
+	}
+
+	ret = iopf_domain_attach(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
+	if (ret)
+		goto out_disable_iopf;
+
+	return 0;
+out_disable_iopf:
+	iommufd_fault_iopf_disable(idev);
+out_put_cookie:
+	release_attach_cookie(cookie);
+
+	return ret;
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	iopf_domain_detach(hwpt->domain, idev->dev, IOMMU_NO_PASID);
+	iommufd_fault_iopf_disable(idev);
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	bool iopf_enabled_originally = idev->iopf_enabled;
+	struct iopf_attach_cookie *cookie = NULL;
+	int ret;
+
+	if (hwpt->fault_capable) {
+		cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+		if (!cookie)
+			return -ENOMEM;
+
+		refcount_inc(&hwpt->obj.users);
+		refcount_inc(&idev->obj.users);
+		cookie->release = release_attach_cookie;
+		cookie->private = idev;
+		/* preset so release_attach_cookie() is safe on the error paths */
+		cookie->domain = hwpt->domain;
+
+		if (!idev->iopf_enabled) {
+			ret = iommufd_fault_iopf_enable(idev);
+			if (ret) {
+				release_attach_cookie(cookie);
+				return ret;
+			}
+		}
+	}
+
+	ret = iopf_domain_replace(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
+	if (ret)
+		goto out_put_cookie;
+
+	if (iopf_enabled_originally && !hwpt->fault_capable)
+		iommufd_fault_iopf_disable(idev);
+
+	return 0;
+out_put_cookie:
+	if (hwpt->fault_capable)
+		release_attach_cookie(cookie);
+	if (iopf_enabled_originally && !idev->iopf_enabled)
+		iommufd_fault_iopf_enable(idev);
+	else if (!iopf_enabled_originally && idev->iopf_enabled)
+		iommufd_fault_iopf_disable(idev);
+
+	return ret;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2780bed0c6b1..9844a1289c01 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -398,6 +398,7 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	bool iopf_enabled;
 	/* outstanding faults awaiting response indexed by fault group id */
 	struct xarray faults;
 };
@@ -459,6 +460,12 @@ iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
 int iommufd_fault_iopf_handler(struct iopf_group *group);
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
From: Lu Baolu <baolu.lu@linux.intel.com>
Extend the selftest mock device to support generating and responding to an IOPF. Also add an ioctl interface that allows userspace applications to trigger an IOPF on the mock device. This lets userspace applications test IOMMUFD's handling of IOPFs without relying on any real hardware.
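For illustration, a sketch of triggering a fault from userspace; this assumes the selftest-style _IOMMU_TEST_CMD() helper and an open iommufd, and the pasid/grpid/addr values are arbitrary:

#include <sys/ioctl.h>
#include "iommufd_utils.h"	/* selftest-only helpers */

static int trigger_mock_iopf(int iommufd, __u32 dev_id)
{
	struct iommu_test_cmd cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
		.trigger_iopf = {
			.dev_id = dev_id,
			.pasid = 0x1,
			.grpid = 0x2,
			.perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
			.addr = 0xdeadbeef,
		},
	};

	return ioctl(iommufd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &cmd);
}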
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  8 ++++
 drivers/iommu/iommufd/selftest.c     | 63 ++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e854d3f67205..acbbba1c6671 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
 	IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
 	IOMMU_TEST_OP_DIRTY,
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
+	IOMMU_TEST_OP_TRIGGER_IOPF,
 };
 
 enum {
@@ -127,6 +128,13 @@ struct iommu_test_cmd {
 			__u32 id;
 			__u32 iotlb;
 		} check_iotlb;
+		struct {
+			__u32 dev_id;
+			__u32 pasid;
+			__u32 grpid;
+			__u32 perm;
+			__u64 addr;
+		} trigger_iopf;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 7a2199470f31..53a3be9047a5 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -504,6 +504,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static struct iopf_queue *mock_iommu_iopf_queue;
+
 static struct iommu_device mock_iommu_device = {
 };
 
@@ -514,6 +516,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
 	return &mock_iommu_device;
 }
 
+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+				      struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	/*
 	 * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -529,6 +554,9 @@ static const struct iommu_ops mock_ops = {
 	.capable = mock_domain_capable,
 	.device_group = generic_device_group,
 	.probe_device = mock_probe_device,
+	.page_response = mock_domain_page_response,
+	.dev_enable_feat = mock_dev_enable_feat,
+	.dev_disable_feat = mock_dev_disable_feat,
 	.default_domain_ops = &(struct iommu_domain_ops){
 		.free = mock_domain_free,
@@ -1375,6 +1403,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
 	return rc;
 }
 
+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct iopf_fault event = { };
+	struct iommufd_device *idev;
+
+	idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = cmd->trigger_iopf.addr;
+	event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+	event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+	event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+	iommu_report_device_fault(idev->dev, &event);
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+
+	return 0;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1450,6 +1503,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 					  cmd->dirty.page_size,
 					  u64_to_user_ptr(cmd->dirty.uptr),
 					  cmd->dirty.flags);
+	case IOMMU_TEST_OP_TRIGGER_IOPF:
+		return iommufd_test_trigger_iopf(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1491,6 +1546,9 @@ int __init iommufd_test_init(void)
 					  &iommufd_mock_bus_type.nb);
 	if (rc)
 		goto err_sysfs;
+
+	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
 	return 0;
 
 err_sysfs:
@@ -1506,6 +1564,11 @@ int __init iommufd_test_init(void)
 
 void iommufd_test_exit(void)
 {
+	if (mock_iommu_iopf_queue) {
+		iopf_queue_free(mock_iommu_iopf_queue);
+		mock_iommu_iopf_queue = NULL;
+	}
+
 	iommu_device_sysfs_remove(&mock_iommu_device);
 	iommu_device_unregister_bus(&mock_iommu_device,
 				    &iommufd_mock_bus_type.bus,
From: Lu Baolu <baolu.lu@linux.intel.com>
Extend the selftest tool to add coverage of IOPF handling. This includes the following tests (a condensed flow is sketched after the list):

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.
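A condensed sketch of the flow, using the selftest helpers added below; error-checking macros and setup are omitted:

test_ioctl_fault_alloc(&fault_id, &fault_fd);
test_cmd_hwpt_alloc_iopf(device_id, parent_hwpt_id, fault_id,
			 IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
			 IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
test_cmd_mock_domain_replace(stdev_id, iopf_hwpt_id);	/* attach */
test_cmd_trigger_iopf(device_id, fault_fd);		/* fault + respond */
test_cmd_mock_domain_replace(stdev_id, parent_hwpt_id);	/* detach */
test_ioctl_destroy(iopf_hwpt_id);
test_ioctl_destroy(fault_id);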
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 17 ++++
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++++++++++++--
 3 files changed, 96 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index edf1c99c9936..17cc2f672284 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) uint32_t parent_hwpt_id = 0; uint32_t parent_hwpt_id_not_work = 0; uint32_t test_hwpt_id = 0; + uint32_t iopf_hwpt_id; + uint32_t fault_id; + uint32_t fault_fd;
if (self->device_id) { /* Negative tests */ @@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) sizeof(data));
/* Allocate two nested hwpts sharing one common parent hwpt */ + test_ioctl_fault_alloc(&fault_id, &fault_fd); test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0, &nested_hwpt_id[0], IOMMU_HWPT_DATA_SELFTEST, &data, @@ -334,6 +338,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) &nested_hwpt_id[1], IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id, + IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], IOMMU_TEST_IOTLB_DEFAULT); test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], @@ -504,14 +512,23 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) _test_ioctl_destroy(self->fd, nested_hwpt_id[1])); test_ioctl_destroy(nested_hwpt_id[0]);
+ /* Switch from nested_hwpt_id[1] to iopf_hwpt_id */ + test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, iopf_hwpt_id)); + /* Trigger an IOPF on the device */ + test_cmd_trigger_iopf(self->device_id, fault_fd); + /* Detach from nested_hwpt_id[1] and destroy it */ test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id); test_ioctl_destroy(nested_hwpt_id[1]); + test_ioctl_destroy(iopf_hwpt_id);
/* Detach from the parent hw_pagetable and destroy it */ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); test_ioctl_destroy(parent_hwpt_id); test_ioctl_destroy(parent_hwpt_id_not_work); + test_ioctl_destroy(fault_id); } else { test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0, &parent_hwpt_id); diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index f590417cd67a..c5d5e69452b0 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL)) return -1;
- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id, + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id, IOMMU_HWPT_DATA_NONE, 0, 0)) return -1;
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 8d2b46b2114d..9be62e5781e7 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \ pt_id, NULL))
-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, +static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id, __u32 flags, __u32 *hwpt_id, __u32 data_type, void *data, size_t data_len) { @@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, .data_type = data_type, .data_len = data_len, .data_uptr = (uint64_t)data, + .fault_id = ft_id, }; int ret;
@@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, }
#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \ 0)) #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \ - self->fd, device_id, pt_id, flags, \ + self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len)) #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len) \ EXPECT_ERRNO(_errno, \ - _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len))
+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id, \ + data_type, data, data_len) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \ + flags, hwpt_id, data_type, data, \ + data_len)) + #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \ ({ \ struct iommu_test_cmd test_cmd = { \ @@ -684,3 +691,69 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
#define test_cmd_get_hw_capabilities(device_id, caps, mask) \ ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps)) + +static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd) +{ + struct iommu_fault_alloc cmd = { + .size = sizeof(cmd), + }; + int ret; + + ret = ioctl(fd, IOMMU_FAULT_ALLOC, &cmd); + if (ret) + return ret; + *fault_id = cmd.out_fault_id; + *fault_fd = cmd.out_fault_fd; + return 0; +} + +#define test_ioctl_fault_alloc(fault_id, fault_fd) \ + ({ \ + ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \ + fault_fd)); \ + ASSERT_NE(0, *(fault_id)); \ + ASSERT_NE(0, *(fault_fd)); \ + }) + +static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd) +{ + struct iommu_test_cmd trigger_iopf_cmd = { + .size = sizeof(trigger_iopf_cmd), + .op = IOMMU_TEST_OP_TRIGGER_IOPF, + .trigger_iopf = { + .dev_id = device_id, + .pasid = 0x1, + .grpid = 0x2, + .perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE, + .addr = 0xdeadbeaf, + }, + }; + struct iommu_hwpt_page_response response = { + .size = sizeof(struct iommu_hwpt_page_response), + .dev_id = device_id, + .pasid = 0x1, + .grpid = 0x2, + .code = 0, + .addr = 0xdeadbeaf, + }; + struct iommu_hwpt_pgfault fault = {}; + ssize_t bytes; + int ret; + + ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd); + if (ret) + return ret; + + bytes = read(fault_fd, &fault, sizeof(fault)); + if (bytes < 0) + return bytes; + + bytes = write(fault_fd, &response, sizeof(response)); + if (bytes < 0) + return bytes; + + return 0; +} + +#define test_cmd_trigger_iopf(device_id, fault_fd) \ + ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
From: Jason Gunthorpe jgg@nvidia.com
As the comment in arm_smmu_write_strtab_ent() explains, this routine has been limited to only work correctly in certain scenarios that the caller must ensure. Generally the caller must put the STE into ABORT or BYPASS before attempting to program it to something else.
The iommu core APIs would ideally expect the driver to do a hitless change of iommu_domain in a number of cases:
- RESV_DIRECT support wants IDENTITY -> DMA -> IDENTITY to be hitless for the RESV ranges
- PASID upgrade has IDENTITY on the RID with no PASID, then a PASID paging domain is installed. The RID should not be impacted
- NESTING -> NESTING for carrying all the above hitless cases in a VM into the hypervisor. To comprehensively emulate the HW in a VM we should assume the VM OS is running logic like this and expecting hitless updates to be relayed to real HW.
For CD updates arm_smmu_write_ctx_desc() has a similar comment explaining how limited it is, and the driver does have a need for hitless CD updates:
- SMMUv3 BTM S1 ASID re-label
- SVA mm release should change the CD to answer not-present to all requests without allowing logging (EPD0)
The next patches/series are going to start removing some of this logic from the callers, and will add more complex state combinations than exist today. At the end everything that can be hitless will be hitless, including all of the above.
Introduce arm_smmu_write_ste() which will run through the multi-qword programming sequence to avoid creating an incoherent 'torn' STE in the HW caches. It automatically detects which of two algorithms to use:
1) The disruptive V=0 update described in the spec which disrupts the
   entry and does three syncs to make the change:
   - Write V=0 to QWORD 0
   - Write the entire STE except QWORD 0
   - Write QWORD 0

2) A hitless update algorithm that follows the same rationale the driver
   already uses. It is safe to change IGNORED bits that the HW doesn't
   use:
   - Write the target value into all currently unused bits
   - Write a single QWORD, this makes the new STE live atomically
   - Ensure now unused bits are 0
The detection of which path to use and the implementation of the hitless update rely on a "used bitmask" describing what bits the HW is actually using based on the V/CFG/etc bits. This flows from the spec language, typically indicated as IGNORED.
Knowing which bits the HW is using we can update the bits it does not use and then compute how many QWORDS need to be changed. If only one qword needs to be updated the hitless algorithm is possible.
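To make the decision concrete, the following standalone C model reproduces the qword-diff computation (illustrative only, not driver code: the real implementation works on struct arm_smmu_ste with little-endian qwords, and NUM_QWORDS and the sample masks here are made up):

#include <stdint.h>
#include <stdio.h>

#define NUM_QWORDS 8

/* Returns a bitmask with bit i set if qword i still differs in a used
 * bit after all currently-unused bits are flipped to their target
 * values. Zero or a single set bit means a hitless update is possible. */
static unsigned int qword_diff(const uint64_t *cur, const uint64_t *cur_used,
			       const uint64_t *tgt, const uint64_t *tgt_used)
{
	unsigned int diff = 0;

	for (unsigned int i = 0; i < NUM_QWORDS; i++) {
		uint64_t unused_update = (cur[i] & cur_used[i]) |
					 (tgt[i] & ~cur_used[i]);
		if ((unused_update & tgt_used[i]) != tgt[i])
			diff |= 1u << i;
	}
	return diff;
}

int main(void)
{
	/* Only qword 0 changes a used bit: expect a hitless update */
	uint64_t cur[NUM_QWORDS] = { 0x1 }, cur_used[NUM_QWORDS] = { 0x7 };
	uint64_t tgt[NUM_QWORDS] = { 0x5 }, tgt_used[NUM_QWORDS] = { 0x7 };
	unsigned int diff = qword_diff(cur, cur_used, tgt, tgt_used);

	if (!diff)
		puts("no used bit changes: nothing to do");
	else if (!(diff & (diff - 1)))
		puts("one qword differs: hitless single-qword update");
	else
		puts("multiple qwords differ: disruptive V=0 update");
	return 0;
}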
Later patches will include CD updates in this mechanism so make the implementation generic using a struct arm_smmu_entry_writer and struct arm_smmu_entry_writer_ops to abstract the differences between STE and CD to be plugged in.
At this point it generates the same sequence of updates as the current code, except that zeroing the VMID on entry to BYPASS/ABORT will do an extra sync (this seems to be an existing bug).
Going forward this will use a V=0 transition instead of cycling through ABORT if a hitfull change is required. This seems more appropriate as ABORT will fail DMAs without any logging, but dropping a DMA due to transient V=0 is probably signaling a bug, so the C_BAD_STE is valuable.
Add STRTAB_STE_1_SHCFG_INCOMING to s2_cfg; the old code edited the STE in place and subtly inherited the value of data[1] from abort/bypass.
Signed-off-by: Michael Shavit mshavit@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 275 +++++++++++++++----- 1 file changed, 211 insertions(+), 64 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 3d3e0c935023..2ee1387ab5a0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -106,6 +106,9 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
+static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, + ioasid_t sid); + static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { ARM_SMMU_EVTQ_IRQ_CFG0, @@ -1182,6 +1185,199 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); }
+/* + * Based on the value of ent report which bits of the STE the HW will access. It + * would be nice if this was complete according to the spec, but minimally it + * has to capture the bits this driver uses. + */ +static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, + struct arm_smmu_ste *used_bits) +{ + unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0])); + + used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V); + if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V))) + return; + + used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG); + + /* S1 translates */ + if (cfg & BIT(0)) { + used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | + STRTAB_STE_0_S1CTXPTR_MASK | + STRTAB_STE_0_S1CDMAX); + used_bits->data[1] |= + cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | + STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | + STRTAB_STE_1_EATS); + used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + } + + /* S2 translates */ + if (cfg & BIT(1)) { + used_bits->data[1] |= + cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); + used_bits->data[2] |= + cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | + STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | + STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R); + used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); + } + + if (cfg == STRTAB_STE_0_CFG_BYPASS) + used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); +} + +/* + * Figure out if we can do a hitless update of entry to become target. Returns a + * bit mask where 1 indicates that qword needs to be set disruptively. + * unused_update is an intermediate value of entry that has unused bits set to + * their new values. + */ +static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target, + struct arm_smmu_ste *unused_update) +{ + struct arm_smmu_ste target_used = {}; + struct arm_smmu_ste cur_used = {}; + u8 used_qword_diff = 0; + unsigned int i; + + arm_smmu_get_ste_used(entry, &cur_used); + arm_smmu_get_ste_used(target, &target_used); + + for (i = 0; i != ARRAY_SIZE(target_used.data); i++) { + /* + * Check that masks are up to date, the make functions are not + * allowed to set a bit to 1 if the used function doesn't say it + * is used. + */ + WARN_ON_ONCE(target->data[i] & ~target_used.data[i]); + + /* Bits can change because they are not currently being used */ + unused_update->data[i] = (entry->data[i] & cur_used.data[i]) | + (target->data[i] & ~cur_used.data[i]); + /* + * Each bit indicates that a used bit in a qword needs to be + * changed after unused_update is applied. + */ + if ((unused_update->data[i] & target_used.data[i]) != + target->data[i]) + used_qword_diff |= 1 << i; + } + return used_qword_diff; +} + +static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, + struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target, unsigned int start, + unsigned int len) +{ + bool changed = false; + unsigned int i; + + for (i = start; len != 0; len--, i++) { + if (entry->data[i] != target->data[i]) { + WRITE_ONCE(entry->data[i], target->data[i]); + changed = true; + } + } + + if (changed) + arm_smmu_sync_ste_for_sid(smmu, sid); + return changed; +} + +/* + * Update the STE/CD to the target configuration. The transition from the + * current entry to the target entry takes place over multiple steps that + * attempts to make the transition hitless if possible. This function takes care + * not to create a situation where the HW can perceive a corrupted entry. 
HW is + * only required to have a 64 bit atomicity with stores from the CPU, while + * entries are many 64 bit values big. + * + * The difference between the current value and the target value is analyzed to + * determine which of three updates are required - disruptive, hitless or no + * change. + * + * In the most general disruptive case we can make any update in three steps: + * - Disrupting the entry (V=0) + * - Fill now unused qwords, execpt qword 0 which contains V + * - Make qword 0 have the final value and valid (V=1) with a single 64 + * bit store + * + * However this disrupts the HW while it is happening. There are several + * interesting cases where a STE/CD can be updated without disturbing the HW + * because only a small number of bits are changing (S1DSS, CONFIG, etc) or + * because the used bits don't intersect. We can detect this by calculating how + * many 64 bit values need update after adjusting the unused bits and skip the + * V=0 process. This relies on the IGNORED behavior described in the + * specification. + */ +static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, + struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target) +{ + unsigned int num_entry_qwords = ARRAY_SIZE(target->data); + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_ste unused_update; + u8 used_qword_diff; + + used_qword_diff = + arm_smmu_entry_qword_diff(entry, target, &unused_update); + if (hweight8(used_qword_diff) == 1) { + /* + * Only one qword needs its used bits to be changed. This is a + * hitless update, update all bits the current STE is ignoring + * to their new values, then update a single "critical qword" to + * change the STE and finally 0 out any bits that are now unused + * in the target configuration. + */ + unsigned int critical_qword_index = ffs(used_qword_diff) - 1; + + /* + * Skip writing unused bits in the critical qword since we'll be + * writing it in the next step anyways. This can save a sync + * when the only change is in that qword. + */ + unused_update.data[critical_qword_index] = + entry->data[critical_qword_index]; + entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords); + entry_set(smmu, sid, entry, target, critical_qword_index, 1); + entry_set(smmu, sid, entry, target, 0, num_entry_qwords); + } else if (used_qword_diff) { + /* + * At least two qwords need their inuse bits to be changed. This + * requires a breaking update, zero the V bit, write all qwords + * but 0, then set qword 0 + */ + unused_update.data[0] = entry->data[0] & (~STRTAB_STE_0_V); + entry_set(smmu, sid, entry, &unused_update, 0, 1); + entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); + entry_set(smmu, sid, entry, target, 0, 1); + } else { + /* + * No inuse bit changed. Sanity check that all unused bits are 0 + * in the entry. The target was already sanity checked by + * compute_qword_diff(). 
+ */ + WARN_ON_ONCE( + entry_set(smmu, sid, entry, target, 0, num_entry_qwords)); + } + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { + struct arm_smmu_cmdq_ent + prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + } }; + + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + } +} + static void arm_smmu_sync_cd(struct arm_smmu_master *master, int ssid, bool leaf) { @@ -1475,34 +1671,12 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { - /* - * This is hideously complicated, but we only really care about - * three cases at the moment: - * - * 1. Invalid (all zero) -> bypass/fault (init) - * 2. Bypass/fault -> translation/bypass (attach) - * 3. Translation/bypass -> bypass/fault (detach) - * - * Given that we can't update the STE atomically and the SMMU - * doesn't read the thing in a defined order, that leaves us - * with the following maintenance requirements: - * - * 1. Update Config, return (init time STEs aren't live) - * 2. Write everything apart from dword 0, sync, write dword 0, sync - * 3. Update Config, sync - */ - u64 val = le64_to_cpu(dst->data[0]); - bool ste_live = false; + u64 val; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = NULL; struct arm_smmu_s2_cfg *s2_cfg = NULL; struct arm_smmu_domain *smmu_domain = master->domain; - struct arm_smmu_cmdq_ent prefetch_cmd = { - .opcode = CMDQ_OP_PREFETCH_CFG, - .prefetch = { - .sid = sid, - }, - }; + struct arm_smmu_ste target = {};
if (smmu_domain) { switch (smmu_domain->stage) { @@ -1517,22 +1691,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, } }
- if (val & STRTAB_STE_0_V) { - switch (FIELD_GET(STRTAB_STE_0_CFG, val)) { - case STRTAB_STE_0_CFG_BYPASS: - break; - case STRTAB_STE_0_CFG_S1_TRANS: - case STRTAB_STE_0_CFG_S2_TRANS: - ste_live = true; - break; - case STRTAB_STE_0_CFG_ABORT: - BUG_ON(!disable_bypass); - break; - default: - BUG(); /* STE corruption */ - } - } - /* Nuke the existing STE_0 value, as we're going to rewrite it */ val = STRTAB_STE_0_V;
@@ -1543,16 +1701,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, else val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
- dst->data[0] = cpu_to_le64(val); - dst->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + target.data[0] = cpu_to_le64(val); + target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); - dst->data[2] = 0; /* Nuke the VMID */ - /* - * The SMMU can perform negative caching, so we must sync - * the STE regardless of whether the old value was live. - */ - if (smmu) - arm_smmu_sync_ste_for_sid(smmu, sid); + target.data[2] = 0; /* Nuke the VMID */ + arm_smmu_write_ste(master, sid, dst, &target); return; }
@@ -1560,8 +1713,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
- BUG_ON(ste_live); - dst->data[1] = cpu_to_le64( + target.data[1] = cpu_to_le64( FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | @@ -1570,7 +1722,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
if (smmu->features & ARM_SMMU_FEAT_STALLS && !master->stall_enabled) - dst->data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); + target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | @@ -1579,8 +1731,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, }
if (s2_cfg) { - BUG_ON(ste_live); - dst->data[2] = cpu_to_le64( + target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + target.data[2] = cpu_to_le64( FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | #ifdef __BIG_ENDIAN @@ -1589,23 +1742,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2R);
- dst->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); + target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS); }
if (master->ats_enabled) - dst->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, + target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, STRTAB_STE_1_EATS_TRANS));
- arm_smmu_sync_ste_for_sid(smmu, sid); - /* See comment in arm_smmu_write_ctx_desc() */ - WRITE_ONCE(dst->data[0], cpu_to_le64(val)); - arm_smmu_sync_ste_for_sid(smmu, sid); - - /* It's likely that we'll want to use the new STE soon */ - if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) - arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + target.data[0] = cpu_to_le64(val); + arm_smmu_write_ste(master, sid, dst, &target); }
static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
From: Jason Gunthorpe jgg@nvidia.com
This allows writing the flow of arm_smmu_write_strtab_ent() around abort and bypass domains more naturally.
Note that the core code no longer supplies NULL domains, though there is still a flow in the driver that ends up in arm_smmu_write_strtab_ent() with a NULL domain. A later patch will remove it.
Remove the duplicate calculation of the STE in arm_smmu_init_bypass_stes() and remove the force parameter. arm_smmu_rmr_install_bypass_ste() can now simply invoke arm_smmu_make_bypass_ste() directly.
Rename arm_smmu_init_bypass_stes() to arm_smmu_init_initial_stes() to better reflect its purpose.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/2-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 97 ++++++++++++--------- 1 file changed, 55 insertions(+), 42 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2ee1387ab5a0..b1f719e3089f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1668,6 +1668,24 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); }
+static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) +{ + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); +} + +static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) +{ + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS)); + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); +} + static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { @@ -1678,37 +1696,31 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_domain *smmu_domain = master->domain; struct arm_smmu_ste target = {};
- if (smmu_domain) { - switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: - cd_table = &master->cd_table; - break; - case ARM_SMMU_DOMAIN_S2: - s2_cfg = &smmu_domain->s2_cfg; - break; - default: - break; - } - } - - /* Nuke the existing STE_0 value, as we're going to rewrite it */ - val = STRTAB_STE_0_V; - - /* Bypass/fault */ - if (!smmu_domain || !(cd_table || s2_cfg)) { - if (!smmu_domain && disable_bypass) - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); + if (!smmu_domain) { + if (disable_bypass) + arm_smmu_make_abort_ste(&target); else - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS); + arm_smmu_make_bypass_ste(&target); + arm_smmu_write_ste(master, sid, dst, &target); + return; + }
- target.data[0] = cpu_to_le64(val); - target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, - STRTAB_STE_1_SHCFG_INCOMING)); - target.data[2] = 0; /* Nuke the VMID */ + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: + cd_table = &master->cd_table; + break; + case ARM_SMMU_DOMAIN_S2: + s2_cfg = &smmu_domain->s2_cfg; + break; + case ARM_SMMU_DOMAIN_BYPASS: + arm_smmu_make_bypass_ste(&target); arm_smmu_write_ste(master, sid, dst, &target); return; }
+ /* Nuke the existing STE_0 value, as we're going to rewrite it */ + val = STRTAB_STE_0_V; + if (cd_table) { u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1; @@ -1755,22 +1767,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, arm_smmu_write_ste(master, sid, dst, &target); }
-static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab, - unsigned int nent, bool force) +/* + * This can safely directly manipulate the STE memory without a sync sequence + * because the STE table has not been installed in the SMMU yet. + */ +static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, + unsigned int nent) { unsigned int i; - u64 val = STRTAB_STE_0_V; - - if (disable_bypass && !force) - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); - else - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
for (i = 0; i < nent; ++i) { - strtab->data[0] = cpu_to_le64(val); - strtab->data[1] = cpu_to_le64(FIELD_PREP( - STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); - strtab->data[2] = 0; + if (disable_bypass) + arm_smmu_make_abort_ste(strtab); + else + arm_smmu_make_bypass_ste(strtab); strtab++; } } @@ -1798,7 +1808,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false); + arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); arm_smmu_write_strtab_l1_desc(strtab, desc); return 0; } @@ -3715,7 +3725,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); cfg->strtab_base_cfg = reg;
- arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false); + arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); return 0; }
@@ -4805,7 +4815,6 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
list_for_each_entry(e, &rmr_list, list) { - struct arm_smmu_ste *step; struct iommu_iort_rmr_data *rmr; int ret, i;
@@ -4818,8 +4827,12 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) continue; }
- step = arm_smmu_get_step_for_sid(smmu, rmr->sids[i]); - arm_smmu_init_bypass_stes(step, 1, true); + /* + * STE table is not programmed to HW, see + * arm_smmu_initial_bypass_stes() + */ + arm_smmu_make_bypass_ste( + arm_smmu_get_step_for_sid(smmu, rmr->sids[i])); } }
From: Jason Gunthorpe jgg@nvidia.com
This is preparation to move the STE calculation higher up in the call chain and remove arm_smmu_write_strtab_ent(). These new functions will be called directly from attach_dev.
Reviewed-by: Moritz Fischer mdf@kernel.org Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 138 ++++++++++++-------- 1 file changed, 83 insertions(+), 55 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b1f719e3089f..5b4fd80feb29 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1686,13 +1686,89 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); }
+static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master) +{ + struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; + struct arm_smmu_device *smmu = master->smmu; + + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | + FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt) | + (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | + FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax)); + + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | + FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | + ((smmu->features & ARM_SMMU_FEAT_STALLS && + !master->stall_enabled) ? + STRTAB_STE_1_S1STALLD : + 0) | + FIELD_PREP(STRTAB_STE_1_EATS, + master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + + if (smmu->features & ARM_SMMU_FEAT_E2H) { + /* + * To support BTM the streamworld needs to match the + * configuration of the CPU so that the ASID broadcasts are + * properly matched. This means either S/NS-EL2-E2H (hypervisor) + * or NS-EL1 (guest). Since an SVA domain can be installed in a + * PASID this should always use a BTM compatible configuration + * if the HW supports it. + */ + target->data[1] |= cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2)); + } else { + target->data[1] |= cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1)); + + /* + * VMID 0 is reserved for stage-2 bypass EL1 STEs, see + * arm_smmu_domain_alloc_id() + */ + target->data[2] = + cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0)); + } +} + +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; + + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS)); + + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_EATS, + master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) | + FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + + target->data[2] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | + FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | + STRTAB_STE_2_S2AA64 | +#ifdef __BIG_ENDIAN + STRTAB_STE_2_S2ENDI | +#endif + STRTAB_STE_2_S2PTW | + STRTAB_STE_2_S2R); + + target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); +} + static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { - u64 val; - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ctx_desc_cfg *cd_table = NULL; - struct arm_smmu_s2_cfg *s2_cfg = NULL; struct arm_smmu_domain *smmu_domain = master->domain; struct arm_smmu_ste target = {};
@@ -1707,63 +1783,15 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: - cd_table = &master->cd_table; + arm_smmu_make_cdtable_ste(&target, master); break; case ARM_SMMU_DOMAIN_S2: - s2_cfg = &smmu_domain->s2_cfg; + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); break; case ARM_SMMU_DOMAIN_BYPASS: arm_smmu_make_bypass_ste(&target); - arm_smmu_write_ste(master, sid, dst, &target); - return; - } - - /* Nuke the existing STE_0 value, as we're going to rewrite it */ - val = STRTAB_STE_0_V; - - if (cd_table) { - u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? - STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1; - - target.data[1] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | - FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | - FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | - FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | - FIELD_PREP(STRTAB_STE_1_STRW, strw)); - - if (smmu->features & ARM_SMMU_FEAT_STALLS && - !master->stall_enabled) - target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); - - val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | - FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | - FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax) | - FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt); - } - - if (s2_cfg) { - target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, - STRTAB_STE_1_SHCFG_INCOMING)); - target.data[2] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | - FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | -#ifdef __BIG_ENDIAN - STRTAB_STE_2_S2ENDI | -#endif - STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 | - STRTAB_STE_2_S2R); - - target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); - - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS); + break; } - - if (master->ats_enabled) - target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, - STRTAB_STE_1_EATS_TRANS)); - - target.data[0] = cpu_to_le64(val); arm_smmu_write_ste(master, sid, dst, &target); }
From: Jason Gunthorpe jgg@nvidia.com
Half the code was living in arm_smmu_domain_finalise_s2(), just move it here and take the values directly from the pgtbl_ops instead of storing copies.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 ++++++++++++--------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 -- 2 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5b4fd80feb29..6aaafbe3afe2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1741,6 +1741,11 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; + const struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; + typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr = + &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; + u64 vtcr_val;
memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1753,9 +1758,16 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
+ vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); target->data[2] = cpu_to_le64( FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | - FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | + FIELD_PREP(STRTAB_STE_2_VTCR, vtcr_val) | STRTAB_STE_2_S2AA64 | #ifdef __BIG_ENDIAN STRTAB_STE_2_S2ENDI | @@ -1763,7 +1775,8 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
- target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); + target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr & + STRTAB_STE_3_S2TTB_MASK); }
static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, @@ -2508,7 +2521,6 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, int vmid; struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; - typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
/* Reserve VMID 0 for stage-2 bypass STEs */ vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1, @@ -2516,16 +2528,7 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, if (vmid < 0) return vmid;
- vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; cfg->vmid = (u16)vmid; - cfg->vttbr = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; - cfg->vtcr = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); return 0; }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 859d21729882..a7f61b853e4c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -656,8 +656,6 @@ struct arm_smmu_ctx_desc_cfg {
struct arm_smmu_s2_cfg { u16 vmid; - u64 vttbr; - u64 vtcr; };
struct arm_smmu_strtab_cfg {
From: Jason Gunthorpe jgg@nvidia.com
The BTM support wants to be able to change the ASID of any smmu_domain. When it goes to do this it holds the arm_smmu_asid_lock and iterates over the target domain's devices list.
During attach of a S1 domain we must ensure that the devices list and CD are in sync, otherwise we could miss CD updates or a parallel CD update could push an out-of-date CD.
This is pretty complicated, and almost works today because arm_smmu_detach_dev() removes the master from the linked list before working on the CD entries, preventing parallel update of the CD.
However, it does have an issue where the CD can remain programmed while the domain appears to be unattached. arm_smmu_share_asid() will then not clear any CD entries and will install its own CD entry with the same ASID concurrently. This creates a small race window where the IOMMU can see two ASIDs pointing to different translations:
   CPU0                                   CPU1
   arm_smmu_attach_dev()
     arm_smmu_detach_dev()
      spin_lock_irqsave(&smmu_domain->devices_lock, flags);
      list_del(&master->domain_head);
      spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

                                          arm_smmu_mmu_notifier_get()
                                           arm_smmu_alloc_shared_cd()
                                            arm_smmu_share_asid():
                                             // Does nothing due to list_del above
                                             arm_smmu_update_ctx_desc_devices()
                                            arm_smmu_tlb_inv_asid()
                                           arm_smmu_write_ctx_desc()
                                            ** Now the ASID is in two CDs with
                                               different translation

      arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
Solve this by wrapping most of the attach flow in the arm_smmu_asid_lock. This locks more than is strictly needed, in preparation for the next patch which will reorganize the order of the linked list, STE and CD changes.
Move arm_smmu_detach_dev() until after we have initialized the domain so the lock can be held for less time.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6aaafbe3afe2..08dcdceab3ba 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2800,8 +2800,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -EBUSY; }
- arm_smmu_detach_dev(master); - mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { @@ -2816,6 +2814,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) if (ret) return ret;
+ /* + * Prevent arm_smmu_share_asid() from trying to change the ASID + * of either the old or new domain while we are working on it. + * This allows the STE and the smmu_domain->devices list to + * be inconsistent during this routine. + */ + mutex_lock(&arm_smmu_asid_lock); + + arm_smmu_detach_dev(master); + master->domain = smmu_domain;
/* @@ -2841,13 +2849,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } }
- /* - * Prevent SVA from concurrently modifying the CD or writing to - * the CD entry - */ - mutex_lock(&arm_smmu_asid_lock); ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - mutex_unlock(&arm_smmu_asid_lock); if (ret) { master->domain = NULL; goto out_list_del; @@ -2857,13 +2859,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_install_ste_for_dev(master);
arm_smmu_enable_ats(master); - return 0; + goto out_unlock;
out_list_del: spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_del(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+out_unlock: + mutex_unlock(&arm_smmu_asid_lock); return ret; }
From: Jason Gunthorpe jgg@nvidia.com
Currently arm_smmu_install_ste_for_dev() iterates over every SID and computes from scratch an identical STE. Every SID should have the same STE contents. Turn this inside out so that the STE is supplied by the caller and arm_smmu_install_ste_for_dev() simply installs it to every SID.
This is possible now that the STE generation does not inform what sequence should be used to program it.
This allows splitting the STE calculation up according to the call site, which following patches will make use of, and removes the confusing NULL domain special case that only supported arm_smmu_detach_dev().
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/6-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 57 ++++++++------------- 1 file changed, 22 insertions(+), 35 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 08dcdceab3ba..0648d1ab3e3a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1779,35 +1779,6 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, STRTAB_STE_3_S2TTB_MASK); }
-static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, - struct arm_smmu_ste *dst) -{ - struct arm_smmu_domain *smmu_domain = master->domain; - struct arm_smmu_ste target = {}; - - if (!smmu_domain) { - if (disable_bypass) - arm_smmu_make_abort_ste(&target); - else - arm_smmu_make_bypass_ste(&target); - arm_smmu_write_ste(master, sid, dst, &target); - return; - } - - switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: - arm_smmu_make_cdtable_ste(&target, master); - break; - case ARM_SMMU_DOMAIN_S2: - arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); - break; - case ARM_SMMU_DOMAIN_BYPASS: - arm_smmu_make_bypass_ste(&target); - break; - } - arm_smmu_write_ste(master, sid, dst, &target); -} - /* * This can safely directly manipulate the STE memory without a sync sequence * because the STE table has not been installed in the SMMU yet. @@ -2627,7 +2598,8 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) } }
-static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) +static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, + const struct arm_smmu_ste *target) { int i, j; struct arm_smmu_device *smmu = master->smmu; @@ -2644,7 +2616,7 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) if (j < i) continue;
- arm_smmu_write_strtab_ent(master, sid, step); + arm_smmu_write_ste(master, sid, step, target); } }
@@ -2751,6 +2723,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) static void arm_smmu_detach_dev(struct arm_smmu_master *master) { unsigned long flags; + struct arm_smmu_ste target; struct arm_smmu_domain *smmu_domain = master->domain;
if (!smmu_domain) @@ -2764,7 +2737,11 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - arm_smmu_install_ste_for_dev(master); + if (disable_bypass) + arm_smmu_make_abort_ste(&target); + else + arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); /* * Clearing the CD entry isn't strictly required to detach the domain * since the table is uninstalled anyway, but it helps avoid confusion @@ -2779,6 +2756,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret = 0; unsigned long flags; + struct arm_smmu_ste target; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); @@ -2840,7 +2818,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) list_add(&master->domain_head, &smmu_domain->devices); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); if (ret) { @@ -2854,9 +2833,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->domain = NULL; goto out_list_del; } - }
- arm_smmu_install_ste_for_dev(master); + arm_smmu_make_cdtable_ste(&target, master); + break; + case ARM_SMMU_DOMAIN_S2: + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + break; + case ARM_SMMU_DOMAIN_BYPASS: + arm_smmu_make_bypass_ste(&target); + break; + } + arm_smmu_install_ste_for_dev(master, &target);
arm_smmu_enable_ats(master); goto out_unlock;
From: Jason Gunthorpe jgg@nvidia.com
This was needed because the STE code required the STE to be in ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE code can automatically handle all transitions we can remove this step from the attach_dev flow.
A few small bugs exist because of this:
1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false then there will be a moment where the STE points at BYPASS. Since this can be done by VFIO/IOMMUFD it is a small security race.
2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT regions will temporarily become BLOCKED. We'd like drivers to work in a way that allows IOMMU_RESV_DIRECT to be continuously functional during these transitions.
Make arm_smmu_release_device() put the STE back to the correct ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was ignored on this path.
As noted before the reordering of the linked list/STE/CD changes is OK against concurrent arm_smmu_share_asid() because of the arm_smmu_asid_lock.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0648d1ab3e3a..4792600019d1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2723,7 +2723,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) static void arm_smmu_detach_dev(struct arm_smmu_master *master) { unsigned long flags; - struct arm_smmu_ste target; struct arm_smmu_domain *smmu_domain = master->domain;
if (!smmu_domain) @@ -2737,11 +2736,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - if (disable_bypass) - arm_smmu_make_abort_ste(&target); - else - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); /* * Clearing the CD entry isn't strictly required to detach the domain * since the table is uninstalled anyway, but it helps avoid confusion @@ -3108,9 +3102,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) static void arm_smmu_release_device(struct device *dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_ste target;
if (WARN_ON(arm_smmu_master_sva_enabled(master))) iopf_queue_remove_device(master->smmu->evtq.iopf, dev); + + /* Put the STE back to what arm_smmu_init_strtab() sets */ + if (disable_bypass && !dev->iommu->require_direct) + arm_smmu_make_abort_ste(&target); + else + arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); + arm_smmu_detach_dev(master); arm_smmu_disable_pasid(master); arm_smmu_remove_master(master);
From: Jason Gunthorpe jgg@nvidia.com
Get closer to the IOMMU API ideal that changes between domains can be hitless. The ordering for the CD table entry is not entirely clean from this perspective.
When switching away from a STE with a CD table programmed in it we should write the new STE first, then clear any old data in the CD entry.
If we are programming a CD table for the first time to a STE then the CD entry should be programmed before the STE is loaded.
If we are replacing a CD table entry when the STE already points at the CD entry then we just need to do the make/break sequence.
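The three orderings can be summarized by a small decision helper (a toy userspace model of the rules above, not code from the patch):

#include <stdbool.h>
#include <stdio.h>

/* Pick the safe sequence given whether the old and new STEs point at a
 * CD table, following the rules described in the commit message. */
static const char *cd_ste_sequence(bool old_has_cd_table, bool new_has_cd_table)
{
	if (old_has_cd_table && !new_has_cd_table)
		return "write the new STE first, then clear the old CD entry";
	if (!old_has_cd_table && new_has_cd_table)
		return "program the CD entry first, then load the STE";
	if (old_has_cd_table && new_has_cd_table)
		return "STE already points at the table: make/break the CD entry";
	return "no CD table involved: just write the STE";
}

int main(void)
{
	printf("S1 -> S2/BYPASS: %s\n", cd_ste_sequence(true, false));
	printf("S2/BYPASS -> S1: %s\n", cd_ste_sequence(false, true));
	printf("S1 -> S1:        %s\n", cd_ste_sequence(true, true));
	return 0;
}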
Lift this code out of arm_smmu_detach_dev() so it can all be sequenced properly. The only other caller is arm_smmu_release_device() and it is going to free the cdtable anyhow, so it doesn't matter what is in it.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/8-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++------- 1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4792600019d1..3c36550ff050 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2736,14 +2736,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - /* - * Clearing the CD entry isn't strictly required to detach the domain - * since the table is uninstalled anyway, but it helps avoid confusion - * in the call to arm_smmu_write_ctx_desc on the next attach (which - * expects the entry to be empty). - */ - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) @@ -2820,6 +2812,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->domain = NULL; goto out_list_del; } + } else { + /* + * arm_smmu_write_ctx_desc() relies on the entry being + * invalid to work, clear any existing entry. + */ + ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); + if (ret) { + master->domain = NULL; + goto out_list_del; + } }
ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); @@ -2829,15 +2832,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
arm_smmu_make_cdtable_ste(&target, master); + arm_smmu_install_ste_for_dev(master, &target); break; case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + arm_smmu_install_ste_for_dev(master, &target); + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); break; case ARM_SMMU_DOMAIN_BYPASS: arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); break; } - arm_smmu_install_ste_for_dev(master, &target);
arm_smmu_enable_ats(master); goto out_unlock;
From: Jason Gunthorpe jgg@nvidia.com
The caller already has the domain, just pass it in. A following patch will remove master->domain.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/9-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 3c36550ff050..157be241db1b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2635,12 +2635,12 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master) return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); }
-static void arm_smmu_enable_ats(struct arm_smmu_master *master) +static void arm_smmu_enable_ats(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) { size_t stu; struct pci_dev *pdev; struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_domain *smmu_domain = master->domain;
/* Don't enable ATS at the endpoint if it's not enabled in the STE */ if (!master->ats_enabled) @@ -2656,10 +2656,9 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
-static void arm_smmu_disable_ats(struct arm_smmu_master *master) +static void arm_smmu_disable_ats(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_domain *smmu_domain = master->domain; - if (!master->ats_enabled) return;
@@ -2728,7 +2727,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) if (!smmu_domain) return;
- arm_smmu_disable_ats(master); + arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_del(&master->domain_head); @@ -2850,7 +2849,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) break; }
- arm_smmu_enable_ats(master); + arm_smmu_enable_ats(master, smmu_domain); goto out_unlock;
out_list_del:
From: Jason Gunthorpe jgg@nvidia.com
Introducing global statics, which are of type struct iommu_domain rather than struct arm_smmu_domain, makes it difficult to retain arm_smmu_master->domain, as it can no longer point to an IDENTITY or BLOCKED domain.
The only place that uses the value is arm_smmu_detach_dev(). Change things to work like other drivers and call iommu_get_domain_for_dev() to obtain the current domain.
The master->domain is subtly protecting master->domain_head from being used while the master is not actually on a domain's device list: only PAGING domains set master->domain, and only PAGING domains use master->domain_head. To keep it simple, keep master->domain_head initialized so that the list_del() logic simply does nothing for attached non-PAGING domains.
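What makes this safe is the list_del_init() idiom: the kernel's plain list_del() poisons the removed entry's pointers, so it must never run on a head that was never list_add()'ed or that was already deleted. Keeping the head self-linked from probe onward turns a redundant detach into a no-op. A minimal standalone sketch of the idiom (a toy re-implementation for illustration, not the kernel's <linux/list.h>):

/* Toy doubly-linked list, just enough to show the self-linked no-op. */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/*
 * Unlink and immediately re-initialize. The kernel's plain list_del()
 * would instead write LIST_POISON values, making a second deletion crash.
 */
static void list_del_init(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	INIT_LIST_HEAD(e);
}

int main(void)
{
	struct list_head devices, domain_head;

	INIT_LIST_HEAD(&devices);
	INIT_LIST_HEAD(&domain_head);	/* as done once in probe_device */

	list_del_init(&domain_head);	/* detach before any attach: no-op */

	list_add(&domain_head, &devices);	/* attach to a PAGING domain */
	list_del_init(&domain_head);		/* detach: safe to repeat */
	list_del_init(&domain_head);		/* still a no-op */

	printf("devices list empty: %d\n", devices.next == &devices);
	return 0;
}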
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/10-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++------------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - 2 files changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 157be241db1b..7fc29d826600 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2721,19 +2721,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static void arm_smmu_detach_dev(struct arm_smmu_master *master) { + struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_domain *smmu_domain; unsigned long flags; - struct arm_smmu_domain *smmu_domain = master->domain;
- if (!smmu_domain) + if (!domain) return;
+ smmu_domain = to_smmu_domain(domain); arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del(&master->domain_head); + list_del_init(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- master->domain = NULL; master->ats_enabled = false; }
@@ -2787,8 +2788,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
arm_smmu_detach_dev(master);
- master->domain = smmu_domain; - /* * The SMMU does not support enabling ATS with bypass. When the STE is * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and @@ -2807,10 +2806,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) case ARM_SMMU_DOMAIN_S1: if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - } } else { /* * arm_smmu_write_ctx_desc() relies on the entry being @@ -2818,17 +2815,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) */ ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - } }
ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - }
arm_smmu_make_cdtable_ste(&target, master); arm_smmu_install_ste_for_dev(master, &target); @@ -2854,7 +2847,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
out_list_del: spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del(&master->domain_head); + list_del_init(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
out_unlock: @@ -3074,6 +3067,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) master->dev = dev; master->smmu = smmu; INIT_LIST_HEAD(&master->bonds); + INIT_LIST_HEAD(&master->domain_head); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index a7f61b853e4c..f13557a84395 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -760,7 +760,6 @@ struct arm_smmu_stream { struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; - struct arm_smmu_domain *domain; struct list_head domain_head; struct arm_smmu_stream *streams; /* Locked by the iommu core using the group mutex */
From: Jason Gunthorpe jgg@nvidia.com
The SVA code only works if the RID domain is an S1 domain and has already installed the cdtable.
Originally the check for this was in arm_smmu_sva_bind(), but when the op was removed the test didn't get carried over to the new arm_smmu_sva_set_dev_pasid().
Without the test, wrong usage will usually hit a WARN_ON() in arm_smmu_write_ctx_desc() due to a missing context table.
However, the next patches will change things so that an IDENTITY domain is not a struct arm_smmu_domain, and the wrong cast would then corrupt memory.
Fail in arm_smmu_sva_set_dev_pasid() if the STE does not have an S1, which is a proxy for the STE having a pointer to the CD table. Write it in a way that will be compatible with the next patches.
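Concretely, to_smmu_domain() is a container_of() downcast; applied to a bare struct iommu_domain that is not embedded in an arm_smmu_domain, it manufactures a pointer into whatever memory happens to sit before the object. A toy demonstration of the hazard (made-up struct layouts, illustration only):

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct iommu_domain { unsigned int type; };

/* Stand-in for arm_smmu_domain: fields live around the embedded domain */
struct big_domain {
	int stage;
	struct iommu_domain domain;
};

int main(void)
{
	struct big_domain paging = { .stage = 1 };
	struct iommu_domain bare = { .type = 0 };	/* e.g. a global IDENTITY domain */

	/* Valid: the iommu_domain really is embedded in a big_domain */
	printf("stage=%d\n",
	       container_of(&paging.domain, struct big_domain, domain)->stage);

	/*
	 * Bogus: 'bare' has no container. The computed pointer precedes the
	 * object, so reading bad->stage would be out-of-bounds. A type check
	 * before the downcast is what prevents this.
	 */
	struct big_domain *bad = container_of(&bare, struct big_domain, domain);
	(void)bad;
	return 0;
}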
Fixes: 386fa64fd52b ("arm-smmu-v3/sva: Add SVA domain support") Reported-by: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com Closes: https://lore.kernel.org/linux-iommu/2a828e481416405fb3a4cceb9e075a59@huawei.... Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/11-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index fd93039038a6..53c06c85b0e9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -365,7 +365,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_domain *smmu_domain; + + if (!(domain->type & __IOMMU_DOMAIN_PAGING)) + return -ENODEV; + smmu_domain = to_smmu_domain(domain); + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -ENODEV;
if (!master || !master->sva_enabled) return -ENODEV;
From: Jason Gunthorpe jgg@nvidia.com
Move to the new static global for identity domains. Move all of the logic out of arm_smmu_attach_dev() into an identity-only function.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/12-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 82 +++++++++++++++------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - 2 files changed, 58 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 7fc29d826600..a0f28bd0724f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2405,8 +2405,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) return arm_smmu_sva_domain_alloc();
if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA && - type != IOMMU_DOMAIN_IDENTITY) + type != IOMMU_DOMAIN_DMA) return NULL;
/* @@ -2515,11 +2514,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu;
- if (domain->type == IOMMU_DOMAIN_IDENTITY) { - smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS; - return 0; - } - /* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) smmu_domain->stage = ARM_SMMU_DOMAIN_S2; @@ -2725,7 +2719,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) struct arm_smmu_domain *smmu_domain; unsigned long flags;
- if (!domain) + if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) return;
smmu_domain = to_smmu_domain(domain); @@ -2788,15 +2782,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
arm_smmu_detach_dev(master);
- /* - * The SMMU does not support enabling ATS with bypass. When the STE is - * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and - * Translated transactions are denied as though ATS is disabled for the - * stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and - * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). - */ - if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS) - master->ats_enabled = arm_smmu_ats_supported(master); + master->ats_enabled = arm_smmu_ats_supported(master);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_add(&master->domain_head, &smmu_domain->devices); @@ -2833,13 +2819,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); break; - case ARM_SMMU_DOMAIN_BYPASS: - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); - break; }
arm_smmu_enable_ats(master, smmu_domain); @@ -2855,6 +2834,60 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return ret; }
+static int arm_smmu_attach_dev_ste(struct device *dev, + struct arm_smmu_ste *ste) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + if (arm_smmu_master_sva_enabled(master)) + return -EBUSY; + + /* + * Do not allow any ASID to be changed while are working on the STE, + * otherwise we could miss invalidations. + */ + mutex_lock(&arm_smmu_asid_lock); + + /* + * The SMMU does not support enabling ATS with bypass/abort. When the + * STE is in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests + * and Translated transactions are denied as though ATS is disabled for + * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and + * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). + */ + arm_smmu_detach_dev(master); + + arm_smmu_install_ste_for_dev(master, ste); + mutex_unlock(&arm_smmu_asid_lock); + + /* + * This has to be done after removing the master from the + * arm_smmu_domain->devices to avoid races updating the same context + * descriptor from arm_smmu_share_asid(). + */ + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); + return 0; +} + +static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_ste ste; + + arm_smmu_make_bypass_ste(&ste); + return arm_smmu_attach_dev_ste(dev, &ste); +} + +static const struct iommu_domain_ops arm_smmu_identity_ops = { + .attach_dev = arm_smmu_attach_dev_identity, +}; + +static struct iommu_domain arm_smmu_identity_domain = { + .type = IOMMU_DOMAIN_IDENTITY, + .ops = &arm_smmu_identity_ops, +}; + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3471,6 +3504,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) }
static struct iommu_ops arm_smmu_ops = { + .identity_domain = &arm_smmu_identity_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f13557a84395..5d4974c2cd82 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -777,7 +777,6 @@ struct arm_smmu_master { enum arm_smmu_domain_stage { ARM_SMMU_DOMAIN_S1 = 0, ARM_SMMU_DOMAIN_S2, - ARM_SMMU_DOMAIN_BYPASS, };
struct arm_smmu_domain {
From: Jason Gunthorpe jgg@nvidia.com
Using the same design as the IDENTITY domain, install an STRTAB_STE_0_CFG_ABORT STE.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/13-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a0f28bd0724f..ef1ce5acbcbc 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2888,6 +2888,24 @@ static struct iommu_domain arm_smmu_identity_domain = { .ops = &arm_smmu_identity_ops, };
+static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_ste ste; + + arm_smmu_make_abort_ste(&ste); + return arm_smmu_attach_dev_ste(dev, &ste); +} + +static const struct iommu_domain_ops arm_smmu_blocked_ops = { + .attach_dev = arm_smmu_attach_dev_blocked, +}; + +static struct iommu_domain arm_smmu_blocked_domain = { + .type = IOMMU_DOMAIN_BLOCKED, + .ops = &arm_smmu_blocked_ops, +}; + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3505,6 +3523,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, + .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device,
From: Jason Gunthorpe jgg@nvidia.com
Consolidate some more code by having release call arm_smmu_attach_dev_identity/blocked() instead of open coding this.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/14-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ef1ce5acbcbc..17b375e2b389 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3157,19 +3157,16 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) static void arm_smmu_release_device(struct device *dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct arm_smmu_ste target;
if (WARN_ON(arm_smmu_master_sva_enabled(master))) iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
/* Put the STE back to what arm_smmu_init_strtab() sets */ if (disable_bypass && !dev->iommu->require_direct) - arm_smmu_make_abort_ste(&target); + arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev); else - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); + arm_smmu_attach_dev_identity(&arm_smmu_identity_domain, dev);
- arm_smmu_detach_dev(master); arm_smmu_disable_pasid(master); arm_smmu_remove_master(master); if (master->cd_table.cdtab)
From: Jason Gunthorpe jgg@nvidia.com
Instead of putting container_of() casts in the internals, use the proper type in this call chain. This makes it easier to check that the two global static domains are not leaking into call chains they should not reach.
Passing the smmu spares the only caller from having to set it and unset it in the error path.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/15-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 +++++++++++---------- 1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 17b375e2b389..a3267f069830 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -147,6 +147,9 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { { 0, NULL}, };
+static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_device *smmu); + static void parse_driver_options(struct arm_smmu_device *smmu) { int i = 0; @@ -2447,12 +2450,12 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) kfree(smmu_domain); }
-static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, +static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg *pgtbl_cfg) { int ret; u32 asid; - struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
@@ -2485,11 +2488,11 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, return ret; }
-static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, +static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg *pgtbl_cfg) { int vmid; - struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
/* Reserve VMID 0 for stage-2 bypass STEs */ @@ -2502,17 +2505,17 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, return 0; }
-static int arm_smmu_domain_finalise(struct iommu_domain *domain) +static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_device *smmu) { int ret; unsigned long ias, oas; enum io_pgtable_fmt fmt; struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; - int (*finalise_stage_fn)(struct arm_smmu_domain *, - struct io_pgtable_cfg *); - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - struct arm_smmu_device *smmu = smmu_domain->smmu; + int (*finalise_stage_fn)(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg);
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2559,17 +2562,18 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) if (!pgtbl_ops) return -ENOMEM;
- domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; - domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; - domain->geometry.force_aperture = true; + smmu_domain->domain.pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; + smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; + smmu_domain->domain.geometry.force_aperture = true;
- ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg); + ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg); if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); return ret; }
smmu_domain->pgtbl_ops = pgtbl_ops; + smmu_domain->smmu = smmu; return 0; }
@@ -2761,10 +2765,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { - smmu_domain->smmu = smmu; - ret = arm_smmu_domain_finalise(domain); - if (ret) - smmu_domain->smmu = NULL; + ret = arm_smmu_domain_finalise(smmu_domain, smmu); } else if (smmu_domain->smmu != smmu) ret = -EINVAL;
From: Jason Gunthorpe jgg@nvidia.com
Now that the BLOCKED and IDENTITY behaviors are managed with their own domains, change to the domain_alloc_paging() op.
For now SVA keeps using the old interface; eventually it will get its own op that can pass in the device and mm_struct, which will let us have a sane lifetime for the mmu_notifier.
Call arm_smmu_domain_finalise() early if dev is available.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/16-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a3267f069830..6b1379e46ad1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2402,14 +2402,15 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) { - struct arm_smmu_domain *smmu_domain;
if (type == IOMMU_DOMAIN_SVA) return arm_smmu_sva_domain_alloc(); + return ERR_PTR(-EOPNOTSUPP); +}
- if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA) - return NULL; +static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) +{ + struct arm_smmu_domain *smmu_domain;
/* * Allocate the domain and initialise some of its data structures. @@ -2418,13 +2419,23 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) */ smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); if (!smmu_domain) - return NULL; + return ERR_PTR(-ENOMEM);
mutex_init(&smmu_domain->init_mutex); INIT_LIST_HEAD(&smmu_domain->devices); spin_lock_init(&smmu_domain->devices_lock); INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+ if (dev) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + int ret; + + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + if (ret) { + kfree(smmu_domain); + return ERR_PTR(ret); + } + } return &smmu_domain->domain; }
@@ -3524,6 +3535,7 @@ static struct iommu_ops arm_smmu_ops = { .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, + .domain_alloc_paging = arm_smmu_domain_alloc_paging, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group,
From: Jason Gunthorpe jgg@nvidia.com
STRTAB_STE_0_V is a CPU-endian value; it needs a cpu_to_le64() conversion for sparse to be clean.
The missing annotation was a mistake introduced by splitting the ops out from the STE writer.
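For background on why sparse complains: __le64 is a __bitwise-restricted type, so combining it with a plain CPU-endian constant mixes types. A minimal compilable sketch of the mechanism (toy re-definitions; the real kernel cpu_to_le64() also byte-swaps on big-endian hosts, and clear_v_bit() is a made-up helper):

#include <stdint.h>

#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef uint64_t __bitwise __le64;

#define STRTAB_STE_0_V	(1ULL << 0)

/* Identity on little-endian; the kernel version byte-swaps on big-endian. */
static inline __le64 cpu_to_le64(uint64_t x)
{
	return (__force __le64)x;
}

/* Made-up helper: strip the valid bit from qword 0 of an STE. */
static inline __le64 clear_v_bit(__le64 qword0)
{
	/* return qword0 & ~STRTAB_STE_0_V;     <- sparse: bitwise vs plain */
	return qword0 & cpu_to_le64(~STRTAB_STE_0_V);	/* both __le64: clean */
}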
Fixes: 7da51af9125c ("iommu/arm-smmu-v3: Make STE programming independent of the callers") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202403011441.5WqGrYjp-lkp@intel.com/ Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6b1379e46ad1..e2c37072c919 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1355,7 +1355,8 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * requires a breaking update, zero the V bit, write all qwords * but 0, then set qword 0 */ - unused_update.data[0] = entry->data[0] & (~STRTAB_STE_0_V); + unused_update.data[0] = entry->data[0] & + cpu_to_le64(~STRTAB_STE_0_V); entry_set(smmu, sid, entry, &unused_update, 0, 1); entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); entry_set(smmu, sid, entry, target, 0, 1);
From: Jason Gunthorpe jgg@nvidia.com
The SVA code is wired to assume that the SVA domain is programmed onto mm->pasid. The current core code always does this, so it is fine.
Add a check for clarity.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 53c06c85b0e9..15ba13078a10 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -571,6 +571,9 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, int ret = 0; struct mm_struct *mm = domain->mm;
+ if (mm_get_enqcmd_pasid(mm) != id) + return -EINVAL; + mutex_lock(&sva_lock); ret = __arm_smmu_sva_bind(dev, id, mm); mutex_unlock(&sva_lock);
From: Jason Gunthorpe jgg@nvidia.com
At this point we know which master we are going to change the PCI config on; it is the only device we need to invalidate. Switch arm_smmu_atc_inv_domain() for arm_smmu_atc_inv_master().
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e2c37072c919..32ed241bbf2d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2661,7 +2661,10 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master, pdev = to_pci_dev(master->dev);
atomic_inc(&smmu_domain->nr_ats_masters); - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, 0, 0); + /* + * ATC invalidation of PASID 0 causes the entire ATC to be flushed. + */ + arm_smmu_atc_inv_master(master); if (pci_enable_ats(pdev, stu)) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
From: Jason Gunthorpe jgg@nvidia.com
Instead of passing a naked __le64 * around to represent a CD table entry, wrap it in a "struct arm_smmu_cd" with an array of the correct size. This makes it much clearer which functions will comprise the "CD API".
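A short illustration of what the wrapper buys (toy definitions, not the driver's): distinct struct types let the compiler reject an STE pointer where a CD pointer is expected, which a naked __le64 * cannot.

#include <stdint.h>

typedef uint64_t __le64;	/* stand-in for the kernel type */

#define CTXDESC_CD_DWORDS	8
#define STRTAB_STE_DWORDS	8

struct arm_smmu_cd  { __le64 data[CTXDESC_CD_DWORDS]; };
struct arm_smmu_ste { __le64 data[STRTAB_STE_DWORDS]; };

static void write_cd(struct arm_smmu_cd *cd) { cd->data[0] = 0; }

int main(void)
{
	struct arm_smmu_cd cd = { { 0 } };
	struct arm_smmu_ste ste = { { 0 } };

	write_cd(&cd);	/* ok */
	/* write_cd(&ste);   error: incompatible pointer type         */
	/* With a naked __le64 *, both would silently be accepted.    */
	(void)ste;
	return 0;
}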
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with 5b658bc56069b: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 +++++++++++--------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 ++++++- 2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 32ed241bbf2d..506f8c229246 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1433,7 +1433,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) +static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { __le64 *l1ptr; unsigned int idx; @@ -1442,7 +1443,8 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) - return cd_table->cdtab + ssid * CTXDESC_CD_DWORDS; + return (struct arm_smmu_cd *)(cd_table->cdtab + + ssid * CTXDESC_CD_DWORDS);
idx = ssid >> CTXDESC_SPLIT; l1_desc = &cd_table->l1_desc[idx]; @@ -1456,7 +1458,7 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) arm_smmu_sync_cd(master, ssid, false); } idx = ssid & (CTXDESC_L2_ENTRIES - 1); - return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; + return &l1_desc->l2ptr[idx]; }
int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, @@ -1475,7 +1477,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, */ u64 val; bool cd_live; - __le64 *cdptr; + struct arm_smmu_cd *cdptr; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
@@ -1486,7 +1488,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (!cdptr) return -ENOMEM;
- val = le64_to_cpu(cdptr[0]); + val = le64_to_cpu(cdptr->data[0]); cd_live = !!(val & CTXDESC_CD_0_V);
if (!cd) { /* (5) */ @@ -1505,9 +1507,9 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, } else { /* (1) and (2) */ u64 tcr = cd->tcr;
- cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - cdptr[2] = 0; - cdptr[3] = cpu_to_le64(cd->mair); + cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); + cdptr->data[2] = 0; + cdptr->data[3] = cpu_to_le64(cd->mair);
if (!(smmu->features & ARM_SMMU_FEAT_HD)) tcr &= ~CTXDESC_CD_0_TCR_HD; @@ -1544,7 +1546,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, * field within an aligned 64-bit span of a structure can be altered * without first making the structure invalid. */ - WRITE_ONCE(cdptr[0], cpu_to_le64(val)); + WRITE_ONCE(cdptr->data[0], cpu_to_le64(val)); arm_smmu_sync_cd(master, ssid, true); return 0; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 5d4974c2cd82..2f71b10044dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -312,6 +312,11 @@ struct arm_smmu_ste { #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
#define CTXDESC_CD_DWORDS 8 + +struct arm_smmu_cd { + __le64 data[CTXDESC_CD_DWORDS]; +}; + #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0) #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6) #define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8) @@ -638,7 +643,7 @@ struct arm_smmu_ctx_desc { };
struct arm_smmu_l1_ctx_desc { - __le64 *l2ptr; + struct arm_smmu_cd *l2ptr; dma_addr_t l2ptr_dma; };
From: Jason Gunthorpe jgg@nvidia.com
Prepare to put the CD code into the same mechanism. Add an ops indirection around all the STE-specific code and make the worker functions independent of the entry content being processed.
get_used and sync ops are provided to hook the correct code.
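In miniature, the indirection looks like the sketch below (illustration only: the real worker performs the careful hitless-update sequence using get_used(), rather than the blind copy shown here):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct entry_writer;

struct entry_writer_ops {
	unsigned int num_qwords;
	void (*get_used)(const uint64_t *entry, uint64_t *used);
	void (*sync)(struct entry_writer *w);
};

struct entry_writer {
	const struct entry_writer_ops *ops;
};

/* Type-agnostic worker, in the spirit of arm_smmu_write_entry(). */
static void write_entry(struct entry_writer *w, uint64_t *entry,
			const uint64_t *target)
{
	bool changed = false;

	for (unsigned int i = 0; i < w->ops->num_qwords; i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = true;
		}
	}
	if (changed)
		w->ops->sync(w);	/* entry-type specific invalidation */
}

/* STE flavour of the ops */
static void ste_get_used(const uint64_t *entry, uint64_t *used)
{
	(void)entry;
	used[0] = ~0ULL;	/* pretend every bit of qword 0 is inspected */
}

static void ste_sync(struct entry_writer *w)
{
	(void)w;
	puts("CMDQ_OP_CFGI_STE + CMD_SYNC");	/* stand-in for the CMDQ */
}

static const struct entry_writer_ops ste_ops = {
	.num_qwords = 8,
	.get_used = ste_get_used,	/* consulted by the real worker */
	.sync = ste_sync,
};

int main(void)
{
	struct entry_writer w = { .ops = &ste_ops };
	uint64_t cur[8] = { 0 }, target[8] = { [0] = 0x1 };

	write_entry(&w, cur, target);
	return 0;
}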
Signed-off-by: Michael Shavit mshavit@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 172 ++++++++++++-------- 1 file changed, 104 insertions(+), 68 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 506f8c229246..b406ad5e9975 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -106,8 +106,20 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, - ioasid_t sid); +struct arm_smmu_entry_writer_ops; +struct arm_smmu_entry_writer { + const struct arm_smmu_entry_writer_ops *ops; + struct arm_smmu_master *master; +}; + +struct arm_smmu_entry_writer_ops { + unsigned int num_entry_qwords; + __le64 v_bit; + void (*get_used)(const __le64 *entry, __le64 *used); + void (*sync)(struct arm_smmu_entry_writer *writer); +}; + +#define NUM_ENTRY_QWORDS (sizeof(struct arm_smmu_ste) / sizeof(u64))
static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { @@ -1193,43 +1205,42 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) * would be nice if this was complete according to the spec, but minimally it * has to capture the bits this driver uses. */ -static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, - struct arm_smmu_ste *used_bits) +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) { - unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0])); + unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
- used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V); - if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V))) + used_bits[0] = cpu_to_le64(STRTAB_STE_0_V); + if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V))) return;
- used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG); + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
/* S1 translates */ if (cfg & BIT(0)) { - used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | - STRTAB_STE_0_S1CTXPTR_MASK | - STRTAB_STE_0_S1CDMAX); - used_bits->data[1] |= + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | + STRTAB_STE_0_S1CTXPTR_MASK | + STRTAB_STE_0_S1CDMAX); + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | STRTAB_STE_1_EATS); - used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); }
/* S2 translates */ if (cfg & BIT(1)) { - used_bits->data[1] |= + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); - used_bits->data[2] |= + used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R); - used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); + used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); }
if (cfg == STRTAB_STE_0_CFG_BYPASS) - used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); }
/* @@ -1238,57 +1249,55 @@ static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, * unused_update is an intermediate value of entry that has unused bits set to * their new values. */ -static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target, - struct arm_smmu_ste *unused_update) +static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, + const __le64 *entry, const __le64 *target, + __le64 *unused_update) { - struct arm_smmu_ste target_used = {}; - struct arm_smmu_ste cur_used = {}; + __le64 target_used[NUM_ENTRY_QWORDS] = {}; + __le64 cur_used[NUM_ENTRY_QWORDS] = {}; u8 used_qword_diff = 0; unsigned int i;
- arm_smmu_get_ste_used(entry, &cur_used); - arm_smmu_get_ste_used(target, &target_used); + writer->ops->get_used(entry, cur_used); + writer->ops->get_used(target, target_used);
- for (i = 0; i != ARRAY_SIZE(target_used.data); i++) { + for (i = 0; i != writer->ops->num_entry_qwords; i++) { /* * Check that masks are up to date, the make functions are not * allowed to set a bit to 1 if the used function doesn't say it * is used. */ - WARN_ON_ONCE(target->data[i] & ~target_used.data[i]); + WARN_ON_ONCE(target[i] & ~target_used[i]);
/* Bits can change because they are not currently being used */ - unused_update->data[i] = (entry->data[i] & cur_used.data[i]) | - (target->data[i] & ~cur_used.data[i]); + unused_update[i] = (entry[i] & cur_used[i]) | + (target[i] & ~cur_used[i]); /* * Each bit indicates that a used bit in a qword needs to be * changed after unused_update is applied. */ - if ((unused_update->data[i] & target_used.data[i]) != - target->data[i]) + if ((unused_update[i] & target_used[i]) != target[i]) used_qword_diff |= 1 << i; } return used_qword_diff; }
-static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, - struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target, unsigned int start, +static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry, + const __le64 *target, unsigned int start, unsigned int len) { bool changed = false; unsigned int i;
for (i = start; len != 0; len--, i++) { - if (entry->data[i] != target->data[i]) { - WRITE_ONCE(entry->data[i], target->data[i]); + if (entry[i] != target[i]) { + WRITE_ONCE(entry[i], target[i]); changed = true; } }
if (changed) - arm_smmu_sync_ste_for_sid(smmu, sid); + writer->ops->sync(writer); return changed; }
@@ -1318,17 +1327,15 @@ static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, * V=0 process. This relies on the IGNORED behavior described in the * specification. */ -static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, - struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target) +static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, + __le64 *entry, const __le64 *target) { - unsigned int num_entry_qwords = ARRAY_SIZE(target->data); - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ste unused_update; + unsigned int num_entry_qwords = writer->ops->num_entry_qwords; + __le64 unused_update[NUM_ENTRY_QWORDS]; u8 used_qword_diff;
used_qword_diff = - arm_smmu_entry_qword_diff(entry, target, &unused_update); + arm_smmu_entry_qword_diff(writer, entry, target, unused_update); if (hweight8(used_qword_diff) == 1) { /* * Only one qword needs its used bits to be changed. This is a @@ -1344,22 +1351,21 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * writing it in the next step anyways. This can save a sync * when the only change is in that qword. */ - unused_update.data[critical_qword_index] = - entry->data[critical_qword_index]; - entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords); - entry_set(smmu, sid, entry, target, critical_qword_index, 1); - entry_set(smmu, sid, entry, target, 0, num_entry_qwords); + unused_update[critical_qword_index] = + entry[critical_qword_index]; + entry_set(writer, entry, unused_update, 0, num_entry_qwords); + entry_set(writer, entry, target, critical_qword_index, 1); + entry_set(writer, entry, target, 0, num_entry_qwords); } else if (used_qword_diff) { /* * At least two qwords need their inuse bits to be changed. This * requires a breaking update, zero the V bit, write all qwords * but 0, then set qword 0 */ - unused_update.data[0] = entry->data[0] & - cpu_to_le64(~STRTAB_STE_0_V); - entry_set(smmu, sid, entry, &unused_update, 0, 1); - entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); - entry_set(smmu, sid, entry, target, 0, 1); + unused_update[0] = entry[0] & (~writer->ops->v_bit); + entry_set(writer, entry, unused_update, 0, 1); + entry_set(writer, entry, target, 1, num_entry_qwords - 1); + entry_set(writer, entry, target, 0, 1); } else { /* * No inuse bit changed. Sanity check that all unused bits are 0 @@ -1367,18 +1373,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * compute_qword_diff(). */ WARN_ON_ONCE( - entry_set(smmu, sid, entry, target, 0, num_entry_qwords)); - } - - /* It's likely that we'll want to use the new STE soon */ - if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { - struct arm_smmu_cmdq_ent - prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, - .prefetch = { - .sid = sid, - } }; - - arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + entry_set(writer, entry, target, 0, num_entry_qwords)); } }
@@ -1661,17 +1656,58 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) +struct arm_smmu_ste_writer { + struct arm_smmu_entry_writer writer; + u32 sid; +}; + +static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer) { + struct arm_smmu_ste_writer *ste_writer = + container_of(writer, struct arm_smmu_ste_writer, writer); struct arm_smmu_cmdq_ent cmd = { .opcode = CMDQ_OP_CFGI_STE, .cfgi = { - .sid = sid, + .sid = ste_writer->sid, .leaf = true, }, };
- arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd); +} + +static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = { + .sync = arm_smmu_ste_writer_sync_entry, + .get_used = arm_smmu_get_ste_used, + .v_bit = cpu_to_le64(STRTAB_STE_0_V), + .num_entry_qwords = sizeof(struct arm_smmu_ste) / sizeof(u64), +}; + +static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, + struct arm_smmu_ste *ste, + const struct arm_smmu_ste *target) +{ + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_ste_writer ste_writer = { + .writer = { + .ops = &arm_smmu_ste_writer_ops, + .master = master, + }, + .sid = sid, + }; + + arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data); + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { + struct arm_smmu_cmdq_ent + prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + } }; + + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + } }
static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
From: Jason Gunthorpe jgg@nvidia.com
CD table entries and STEs have the same essential programming sequence, just with different types and sizes.
Have arm_smmu_write_ctx_desc() generate a target CD and call arm_smmu_write_entry() to do the programming. Because the target CD is generated by modifying the existing CD, this alone is not enough to free the CD callers of the ordering requirements.
The following patches will make the rest of the CD flow mirror the STE flow with precise CD contents generated in all cases.
Until all of the CD generation is moved to the new method, the logic can't ensure that the CD always conforms to the used-bits requirements. Add a temporary no_used_check flag to disable the assertions.
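The used-bits rule being relaxed is easiest to see in a toy version of the qword-diff computation (a pretend two-qword entry with a fake used-bits rule, not the driver's code):

/* Toy version of the classification that decides between a hitless
 * update and a V=0 breaking update. */
#include <stdio.h>
#include <stdint.h>

#define NUM_QWORDS 2

/* Pretend the HW only reads the low byte of each qword when V=1. */
static void get_used(const uint64_t *entry, uint64_t *used)
{
	for (int i = 0; i < NUM_QWORDS; i++)
		used[i] = (entry[0] & 1) ? 0xffULL : 0;
	used[0] |= 1;	/* the V bit is always inspected */
}

static unsigned int qword_diff(const uint64_t *entry, const uint64_t *target,
			       uint64_t *unused_update)
{
	uint64_t cur_used[NUM_QWORDS], tgt_used[NUM_QWORDS];
	unsigned int diff = 0;

	get_used(entry, cur_used);
	get_used(target, tgt_used);

	for (int i = 0; i < NUM_QWORDS; i++) {
		/* Flip the bits the HW ignores right now... */
		unused_update[i] = (entry[i] & cur_used[i]) |
				   (target[i] & ~cur_used[i]);
		/* ...then see whether any *used* bits still must change. */
		if ((unused_update[i] & tgt_used[i]) != (target[i] & tgt_used[i]))
			diff |= 1u << i;
	}
	return diff;
}

int main(void)
{
	uint64_t entry[NUM_QWORDS]  = { 0x11, 0x22 };	/* V=1, some config    */
	uint64_t target[NUM_QWORDS] = { 0x31, 0x22 };	/* change only qword 0 */
	uint64_t scratch[NUM_QWORDS];

	/* 0 qwords -> unused-only update; 1 -> hitless critical-qword
	 * write; >=2 -> must break the entry with V=0 first. */
	printf("qwords needing a used-bit change: %d\n",
	       __builtin_popcount(qword_diff(entry, target, scratch)));
	return 0;
}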
Signed-off-by: Michael Shavit mshavit@google.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 100 +++++++++++++++----- 1 file changed, 74 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b406ad5e9975..4e575749a0d6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -115,11 +115,14 @@ struct arm_smmu_entry_writer { struct arm_smmu_entry_writer_ops { unsigned int num_entry_qwords; __le64 v_bit; + bool no_used_check; void (*get_used)(const __le64 *entry, __le64 *used); void (*sync)(struct arm_smmu_entry_writer *writer); };
-#define NUM_ENTRY_QWORDS (sizeof(struct arm_smmu_ste) / sizeof(u64)) +#define NUM_ENTRY_QWORDS \ + (max(sizeof(struct arm_smmu_ste), sizeof(struct arm_smmu_cd)) / \ + sizeof(u64))
static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { @@ -1267,7 +1270,8 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, * allowed to set a bit to 1 if the used function doesn't say it * is used. */ - WARN_ON_ONCE(target[i] & ~target_used[i]); + if (!writer->ops->no_used_check) + WARN_ON_ONCE(target[i] & ~target_used[i]);
/* Bits can change because they are not currently being used */ unused_update[i] = (entry[i] & cur_used[i]) | @@ -1276,7 +1280,8 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, * Each bit indicates that a used bit in a qword needs to be * changed after unused_update is applied. */ - if ((unused_update[i] & target_used[i]) != target[i]) + if ((unused_update[i] & target_used[i]) != + (target[i] & target_used[i])) used_qword_diff |= 1 << i; } return used_qword_diff; @@ -1372,8 +1377,11 @@ static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, * in the entry. The target was already sanity checked by * compute_qword_diff(). */ - WARN_ON_ONCE( - entry_set(writer, entry, target, 0, num_entry_qwords)); + if (writer->ops->no_used_check) + entry_set(writer, entry, target, 0, num_entry_qwords); + else + WARN_ON_ONCE(entry_set(writer, entry, target, 0, + num_entry_qwords)); } }
@@ -1456,6 +1464,59 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, return &l1_desc->l2ptr[idx]; }
+struct arm_smmu_cd_writer { + struct arm_smmu_entry_writer writer; + unsigned int ssid; +}; + +static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) +{ + used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V); + if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V))) + return; + memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd)); + + /* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */ + if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) { + used_bits[0] &= ~cpu_to_le64( + CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | + CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | + CTXDESC_CD_0_TCR_SH0); + used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); + } +} + +static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer) +{ + struct arm_smmu_cd_writer *cd_writer = + container_of(writer, struct arm_smmu_cd_writer, writer); + + arm_smmu_sync_cd(writer->master, cd_writer->ssid, true); +} + +static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { + .sync = arm_smmu_cd_writer_sync_entry, + .get_used = arm_smmu_get_cd_used, + .v_bit = cpu_to_le64(CTXDESC_CD_0_V), + .no_used_check = true, + .num_entry_qwords = sizeof(struct arm_smmu_cd) / sizeof(u64), +}; + +static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target) +{ + struct arm_smmu_cd_writer cd_writer = { + .writer = { + .ops = &arm_smmu_cd_writer_ops, + .master = master, + }, + .ssid = ssid, + }; + + arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); +} + int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, struct arm_smmu_ctx_desc *cd) { @@ -1473,16 +1534,20 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, u64 val; bool cd_live; - struct arm_smmu_cd *cdptr; + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr = &target; + struct arm_smmu_cd *cd_table_entry; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) return -E2BIG;
- cdptr = arm_smmu_get_cd_ptr(master, ssid); - if (!cdptr) + cd_table_entry = arm_smmu_get_cd_ptr(master, ssid); + if (!cd_table_entry) return -ENOMEM;
+ target = *cd_table_entry; val = le64_to_cpu(cdptr->data[0]); cd_live = !!(val & CTXDESC_CD_0_V);
@@ -1511,13 +1576,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (!(smmu->features & ARM_SMMU_FEAT_HA)) tcr &= ~CTXDESC_CD_0_TCR_HA;
- /* - * STE may be live, and the SMMU might read dwords of this CD in any - * order. Ensure that it observes valid values before reading - * V=1. - */ - arm_smmu_sync_cd(master, ssid, true); - val = tcr | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | @@ -1531,18 +1589,8 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (cd_table->stall_enabled) val |= CTXDESC_CD_0_S; } - - /* - * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3 - * "Configuration structures and configuration invalidation completion" - * - * The size of single-copy atomic reads made by the SMMU is - * IMPLEMENTATION DEFINED but must be at least 64 bits. Any single - * field within an aligned 64-bit span of a structure can be altered - * without first making the structure invalid. - */ - WRITE_ONCE(cdptr->data[0], cpu_to_le64(val)); - arm_smmu_sync_cd(master, ssid, true); + cdptr->data[0] = cpu_to_le64(val); + arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target); return 0; }
From: Jason Gunthorpe jgg@nvidia.com
A cleared entry is all zeros. Make arm_smmu_clear_cd() do this sequence.
If we are clearing an entry and for some reason it is not already allocated in the CD table then something has gone wrong.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++++++++++++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ 3 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 15ba13078a10..3d247c33b7be 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -548,7 +548,7 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
mutex_lock(&sva_lock);
- arm_smmu_write_ctx_desc(master, id, NULL); + arm_smmu_clear_cd(master, id);
list_for_each_entry(t, &master->bonds, list) { if (t->mm == mm) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4e575749a0d6..5280d0e71314 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1517,6 +1517,19 @@ static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); }
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) +{ + struct arm_smmu_cd target = {}; + struct arm_smmu_cd *cdptr; + + if (!master->cd_table.cdtab) + return; + cdptr = arm_smmu_get_cd_ptr(master, ssid); + if (WARN_ON(!cdptr)) + return; + arm_smmu_write_cd_entry(master, ssid, cdptr, &target); +} + int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, struct arm_smmu_ctx_desc *cd) { @@ -2917,9 +2930,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); arm_smmu_install_ste_for_dev(master, &target); - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); + arm_smmu_clear_cd(master, IOMMU_NO_PASID); break; }
@@ -2967,8 +2978,7 @@ static int arm_smmu_attach_dev_ste(struct device *dev, * arm_smmu_domain->devices to avoid races updating the same context * descriptor from arm_smmu_share_asid(). */ - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); + arm_smmu_clear_cd(master, IOMMU_NO_PASID); return 0; }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 2f71b10044dd..3708532f8a51 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -814,6 +814,8 @@ extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; extern struct arm_smmu_ctx_desc quiet_cd;
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); + int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, struct arm_smmu_ctx_desc *cd); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
From: Jason Gunthorpe jgg@nvidia.com
Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain, and reorganize all the places programming S1 domain CD table entries to call it.
Split arm_smmu_update_s1_domain_cd_entry() from arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call chain separate from the unrelated SVA path.
arm_smmu_update_s1_domain_cd_entry() only works on S1 domains attached to RIDs and refreshes all their CDs.
Remove the forced clear of the CD during S1 domain attach; arm_smmu_write_cd_entry() will do this automatically if necessary.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with 5b658bc56069b: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 25 ++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 ++++++++++++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 +++ 3 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 3d247c33b7be..e90aa1dd71fd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -53,6 +53,29 @@ static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); }
+static void +arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_master *master; + struct arm_smmu_cd target_cd; + unsigned long flags; + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + struct arm_smmu_cd *cdptr; + + /* S1 domains only support RID attachment right now */ + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + if (WARN_ON(!cdptr)) + continue; + + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + &target_cd); + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); +} + /* * Check if the CPU ASID is available on the SMMU side. If a private context * descriptor is using it, try to replace it. @@ -96,7 +119,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) * be some overlap between use of both ASIDs, until we invalidate the * TLB. */ - arm_smmu_update_ctx_desc_devices(smmu_domain, IOMMU_NO_PASID, cd); + arm_smmu_update_s1_domain_cd_entry(smmu_domain);
/* Invalidate TLB entries previously associated with that context */ arm_smmu_tlb_inv_asid(smmu, asid); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5280d0e71314..a893eeaac184 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1436,8 +1436,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, - u32 ssid) +struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { __le64 *l1ptr; unsigned int idx; @@ -1502,9 +1502,9 @@ static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { .num_entry_qwords = sizeof(struct arm_smmu_cd) / sizeof(u64), };
-static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, - struct arm_smmu_cd *cdptr, - const struct arm_smmu_cd *target) +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target) { struct arm_smmu_cd_writer cd_writer = { .writer = { @@ -1517,6 +1517,37 @@ static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); }
+void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; + + memset(target, 0, sizeof(*target)); + + target->data[0] = cpu_to_le64( + cd->tcr | +#ifdef __BIG_ENDIAN + CTXDESC_CD_0_ENDI | +#endif + CTXDESC_CD_0_V | + CTXDESC_CD_0_AA64 | + (master->stall_enabled ? CTXDESC_CD_0_S : 0) | + CTXDESC_CD_0_R | + CTXDESC_CD_0_A | + CTXDESC_CD_0_ASET | + FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) + ); + + if (!(master->smmu->features & ARM_SMMU_FEAT_HD)) + target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (!(master->smmu->features & ARM_SMMU_FEAT_HA)) + target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HA); + + target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); + target->data[3] = cpu_to_le64(cd->mair); +} + void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) { struct arm_smmu_cd target = {}; @@ -2904,29 +2935,29 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: + case ARM_SMMU_DOMAIN_S1: { + struct arm_smmu_cd target_cd; + struct arm_smmu_cd *cdptr; + if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); if (ret) goto out_list_del; - } else { - /* - * arm_smmu_write_ctx_desc() relies on the entry being - * invalid to work, clear any existing entry. - */ - ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); - if (ret) - goto out_list_del; }
- ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - if (ret) + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + if (!cdptr) { + ret = -ENOMEM; goto out_list_del; + }
+ arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + &target_cd); arm_smmu_make_cdtable_ste(&target, master); arm_smmu_install_ste_for_dev(master, &target); break; + } case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); arm_smmu_install_ste_for_dev(master, &target); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3708532f8a51..0fb18cc8cbeb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -815,6 +815,14 @@ extern struct mutex arm_smmu_asid_lock; extern struct arm_smmu_ctx_desc quiet_cd;
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); +struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid); +void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain); +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target);
int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, struct arm_smmu_ctx_desc *cd);
From: Jason Gunthorpe jgg@nvidia.com
No reason to force callers to do two steps. Make arm_smmu_get_cd_ptr() able to return an entry in all cases except OOM.
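For illustration, the caller pattern collapses to a single call that allocates the table on first use (a minimal sketch based on the hunk below):

	struct arm_smmu_cd *cdptr;

	/* Allocates the CD table on first use; fails only on OOM */
	cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
	if (!cdptr)
		return -ENOMEM;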
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a893eeaac184..9ee8f6f420c1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -164,6 +164,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct arm_smmu_device *smmu); +static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
static void parse_driver_options(struct arm_smmu_device *smmu) { @@ -1445,6 +1446,11 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+ if (!master->cd_table.cdtab) { + if (arm_smmu_alloc_cd_tables(master)) + return NULL; + } + if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) return (struct arm_smmu_cd *)(cd_table->cdtab + ssid * CTXDESC_CD_DWORDS); @@ -2939,12 +2945,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct arm_smmu_cd target_cd; struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab) { - ret = arm_smmu_alloc_cd_tables(master); - if (ret) - goto out_list_del; - } - cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); if (!cdptr) { ret = -ENOMEM;
From: Jason Gunthorpe jgg@nvidia.com
To avoid arm_smmu_attach_dev() having to undo its changes to the smmu_domain->devices list, acquire the cdptr earlier so that error does not need to be handled after the list has been modified.
Now there is a clear break in arm_smmu_attach_dev() where all the prep-work has been done non-disruptively and we commit to making the HW change, which cannot fail.
This completes transforming arm_smmu_attach_dev() so that it does not disturb the HW if it fails.
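A simplified sketch of the resulting shape of arm_smmu_attach_dev(), condensed from the diff below:

	/* Fallible prep-work, before any list or HW change */
	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
		if (!cdptr)
			return -ENOMEM;
	}

	mutex_lock(&arm_smmu_asid_lock);
	/* Commit point: list updates and STE/CD programming cannot fail */
	...
	mutex_unlock(&arm_smmu_asid_lock);
	return 0;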
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 +++++++-------------- 1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9ee8f6f420c1..723d41ae8105 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2896,6 +2896,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_master *master; + struct arm_smmu_cd *cdptr;
if (!fwspec) return -ENOENT; @@ -2924,6 +2925,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) if (ret) return ret;
+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + if (!cdptr) + return -ENOMEM; + } + /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2943,13 +2950,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: { struct arm_smmu_cd target_cd; - struct arm_smmu_cd *cdptr; - - cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); - if (!cdptr) { - ret = -ENOMEM; - goto out_list_del; - }
arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, @@ -2966,16 +2966,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
arm_smmu_enable_ats(master, smmu_domain); - goto out_unlock; - -out_list_del: - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del_init(&master->domain_head); - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); - -out_unlock: mutex_unlock(&arm_smmu_asid_lock); - return ret; + return 0; }
static int arm_smmu_attach_dev_ste(struct device *dev,
From: Jason Gunthorpe jgg@nvidia.com
Pull all the calculations for building the CD table entry for an mm_struct into arm_smmu_make_sva_cd().
Call it in the two places installing the SVA CD table entry.
Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove the function.
Remove arm_smmu_write_ctx_desc() since all callers are gone.
Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates the same value.
The behavior of quiet_cd changes slightly: the old implementation edited the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD entry. This version generates a full CD entry with a zero TTB0 and relies on arm_smmu_write_cd_entry() to install it hitlessly.
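A minimal sketch of the two flavours arm_smmu_make_sva_cd() produces (names taken from the patch below, error handling omitted):

	struct arm_smmu_cd target;

	/* Normal SVA entry: TTB0 points at mm->pgd */
	arm_smmu_make_sva_cd(&target, master, mm, asid);

	/* Quiesced entry on mm release: EPD0 set, TTB0 zero and ignored by HW */
	arm_smmu_make_sva_cd(&target, master, NULL, asid);
	arm_smmu_write_cd_entry(master, pasid, cdptr, &target);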
Remove no_used_check, since this was the flow that could result in a CD inconsistent with its used bits.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with 5b658bc56069b: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 163 +++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 103 +---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 - 3 files changed, 110 insertions(+), 161 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index e90aa1dd71fd..5b135ad19252 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -34,25 +34,6 @@ struct arm_smmu_bond {
static DEFINE_MUTEX(sva_lock);
-/* - * Write the CD to the CD tables for all masters that this domain is attached - * to. Note that this is only used to update existing CD entries in the target - * CD table, for which it's assumed that arm_smmu_write_ctx_desc can't fail. - */ -static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain, - int ssid, - struct arm_smmu_ctx_desc *cd) -{ - struct arm_smmu_master *master; - unsigned long flags; - - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { - arm_smmu_write_ctx_desc(master, ssid, cd); - } - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); -} - static void arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { @@ -128,11 +109,91 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) return NULL; }
+static u64 page_size_to_cd(void) +{ + static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K || + PAGE_SIZE == SZ_64K); + if (PAGE_SIZE == SZ_64K) + return ARM_LPAE_TCR_TG0_64K; + if (PAGE_SIZE == SZ_16K) + return ARM_LPAE_TCR_TG0_16K; + return ARM_LPAE_TCR_TG0_4K; +} + +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct mm_struct *mm, u16 asid) +{ + u64 par; + + memset(target, 0, sizeof(*target)); + + par = cpuid_feature_extract_unsigned_field( + read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1), + ID_AA64MMFR0_EL1_PARANGE_SHIFT); + + target->data[0] = cpu_to_le64( + CTXDESC_CD_0_TCR_EPD1 | +#ifdef __BIG_ENDIAN + CTXDESC_CD_0_ENDI | +#endif + CTXDESC_CD_0_V | + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) | + CTXDESC_CD_0_AA64 | + (master->stall_enabled ? CTXDESC_CD_0_S : 0) | + CTXDESC_CD_0_R | + CTXDESC_CD_0_A | + CTXDESC_CD_0_ASET | + FIELD_PREP(CTXDESC_CD_0_ASID, asid)); + + if (master->smmu->features & ARM_SMMU_FEAT_HD) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (master->smmu->features & ARM_SMMU_FEAT_HA) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA); + + /* + * If no MM is passed then this creates a SVA entry that faults + * everything. arm_smmu_write_cd_entry() can hitlessly go between these + * two entries types since TTB0 is ignored by HW when EPD0 is set. + */ + if (mm) { + target->data[0] |= cpu_to_le64( + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, + 64ULL - vabits_actual) | + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) | + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, + ARM_LPAE_TCR_RGN_WBWA) | + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, + ARM_LPAE_TCR_RGN_WBWA) | + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS)); + + target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) & + CTXDESC_CD_1_TTB0_MASK); + } else { + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0); + + /* + * Disable stall and immediately generate an abort if stall + * disable is permitted. This speeds up cleanup for an unclean + * exit if the device is still doing a lot of DMA. + */ + if (master->stall_enabled && + !(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) + target->data[0] &= + cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R)); + } + + /* + * MAIR value is pretty much constant and global, so we can just get it + * from the current CPU register + */ + target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); +} + static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) { u16 asid; int err = 0; - u64 tcr, par, reg; struct arm_smmu_ctx_desc *cd; struct arm_smmu_ctx_desc *ret = NULL;
@@ -166,41 +227,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) if (err) goto out_free_asid;
- /* HA and HD will be filtered out later if not supported by the SMMU */ - tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) | - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) | - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) | - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) | - CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; - - switch (PAGE_SIZE) { - case SZ_4K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K); - break; - case SZ_16K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K); - break; - case SZ_64K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K); - break; - default: - WARN_ON(1); - err = -EINVAL; - goto out_free_asid; - } - - reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); - par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_EL1_PARANGE_SHIFT); - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par); - - cd->ttbr = virt_to_phys(mm->pgd); - cd->tcr = tcr; - /* - * MAIR value is pretty much constant and global, so we can just get it - * from the current CPU register - */ - cd->mair = read_sysreg(mair_el1); cd->asid = asid; cd->mm = mm;
@@ -278,6 +304,8 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_master *master; + unsigned long flags;
mutex_lock(&sva_lock); if (smmu_mn->cleared) { @@ -289,8 +317,19 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events, * but disable translation. */ - arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm), - &quiet_cd); + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr; + + cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + if (WARN_ON(!cdptr)) + continue; + arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid); + arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr, + &target); + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); @@ -385,6 +424,8 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, struct mm_struct *mm) { int ret; + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr; struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); @@ -411,9 +452,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, goto err_free_bond; }
- ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd); - if (ret) + cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + if (!cdptr) { + ret = -ENOMEM; goto err_put_notifier; + } + arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); + arm_smmu_write_cd_entry(master, pasid, cdptr, &target);
list_add(&bond->list, &master->bonds); return 0; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 723d41ae8105..ad37423f7323 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -115,7 +115,6 @@ struct arm_smmu_entry_writer { struct arm_smmu_entry_writer_ops { unsigned int num_entry_qwords; __le64 v_bit; - bool no_used_check; void (*get_used)(const __le64 *entry, __le64 *used); void (*sync)(struct arm_smmu_entry_writer *writer); }; @@ -150,12 +149,6 @@ struct arm_smmu_option_prop { DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa); DEFINE_MUTEX(arm_smmu_asid_lock);
-/* - * Special value used by SVA when a process dies, to quiesce a CD without - * disabling it. - */ -struct arm_smmu_ctx_desc quiet_cd = { 0 }; - static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, @@ -1271,8 +1264,7 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, * allowed to set a bit to 1 if the used function doesn't say it * is used. */ - if (!writer->ops->no_used_check) - WARN_ON_ONCE(target[i] & ~target_used[i]); + WARN_ON_ONCE(target[i] & ~target_used[i]);
/* Bits can change because they are not currently being used */ unused_update[i] = (entry[i] & cur_used[i]) | @@ -1281,8 +1273,7 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, * Each bit indicates that a used bit in a qword needs to be * changed after unused_update is applied. */ - if ((unused_update[i] & target_used[i]) != - (target[i] & target_used[i])) + if ((unused_update[i] & target_used[i]) != target[i]) used_qword_diff |= 1 << i; } return used_qword_diff; @@ -1378,11 +1369,8 @@ static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, * in the entry. The target was already sanity checked by * compute_qword_diff(). */ - if (writer->ops->no_used_check) - entry_set(writer, entry, target, 0, num_entry_qwords); - else - WARN_ON_ONCE(entry_set(writer, entry, target, 0, - num_entry_qwords)); + WARN_ON_ONCE( + entry_set(writer, entry, target, 0, num_entry_qwords)); } }
@@ -1433,7 +1421,7 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
- /* See comment in arm_smmu_write_ctx_desc() */ + /* The HW has 64 bit atomicity with stores to the L2 CD table */ WRITE_ONCE(*dst, cpu_to_le64(val)); }
@@ -1504,7 +1492,6 @@ static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { .sync = arm_smmu_cd_writer_sync_entry, .get_used = arm_smmu_get_cd_used, .v_bit = cpu_to_le64(CTXDESC_CD_0_V), - .no_used_check = true, .num_entry_qwords = sizeof(struct arm_smmu_cd) / sizeof(u64), };
@@ -1567,83 +1554,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) arm_smmu_write_cd_entry(master, ssid, cdptr, &target); }
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, - struct arm_smmu_ctx_desc *cd) -{ - /* - * This function handles the following cases: - * - * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0). - * (2) Install a secondary CD, for SID+SSID traffic. - * (3) Update ASID of a CD. Atomically write the first 64 bits of the - * CD, then invalidate the old entry and mappings. - * (4) Quiesce the context without clearing the valid bit. Disable - * translation, and ignore any translation fault. - * (5) Remove a secondary CD. - */ - u64 val; - bool cd_live; - struct arm_smmu_cd *cdptr; - struct arm_smmu_cd target; - struct arm_smmu_cd *cdptr = ⌖ - struct arm_smmu_cd *cd_table_entry; - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; - - if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) - return -E2BIG; - - cd_table_entry = arm_smmu_get_cd_ptr(master, ssid); - if (!cd_table_entry) - return -ENOMEM; - - target = *cd_table_entry; - val = le64_to_cpu(cdptr->data[0]); - cd_live = !!(val & CTXDESC_CD_0_V); - - if (!cd) { /* (5) */ - val = 0; - } else if (cd == &quiet_cd) { /* (4) */ - if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) - val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); - val |= CTXDESC_CD_0_TCR_EPD0; - } else if (cd_live) { /* (3) */ - val &= ~CTXDESC_CD_0_ASID; - val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid); - /* - * Until CD+TLB invalidation, both ASIDs may be used for tagging - * this substream's traffic - */ - } else { /* (1) and (2) */ - u64 tcr = cd->tcr; - - cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - cdptr->data[2] = 0; - cdptr->data[3] = cpu_to_le64(cd->mair); - - if (!(smmu->features & ARM_SMMU_FEAT_HD)) - tcr &= ~CTXDESC_CD_0_TCR_HD; - if (!(smmu->features & ARM_SMMU_FEAT_HA)) - tcr &= ~CTXDESC_CD_0_TCR_HA; - - val = tcr | -#ifdef __BIG_ENDIAN - CTXDESC_CD_0_ENDI | -#endif - CTXDESC_CD_0_R | CTXDESC_CD_0_A | - (cd->mm ? 0 : CTXDESC_CD_0_ASET) | - CTXDESC_CD_0_AA64 | - FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) | - CTXDESC_CD_0_V; - - if (cd_table->stall_enabled) - val |= CTXDESC_CD_0_S; - } - cdptr->data[0] = cpu_to_le64(val); - arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target); - return 0; -} - static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) { int ret; @@ -1652,7 +1562,6 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- cd_table->stall_enabled = master->stall_enabled; cd_table->s1cdmax = master->ssid_bits; max_contexts = 1 << cd_table->s1cdmax;
@@ -1750,7 +1659,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span); val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
- /* See comment in arm_smmu_write_ctx_desc() */ + /* The HW has 64 bit atomicity with stores to the L2 STE table */ WRITE_ONCE(*dst, cpu_to_le64(val)); }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 0fb18cc8cbeb..4374703eae52 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -655,8 +655,6 @@ struct arm_smmu_ctx_desc_cfg { u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; - /* Whether CD entries in this table have the stall bit set. */ - u8 stall_enabled:1; };
struct arm_smmu_s2_cfg { @@ -812,7 +810,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; -extern struct arm_smmu_ctx_desc quiet_cd;
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, @@ -824,8 +821,6 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target);
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, - struct arm_smmu_ctx_desc *cd); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf,
From: Jason Gunthorpe jgg@nvidia.com
Half the code was living in arm_smmu_domain_finalise_s1(); just move it here and take the values directly from the pgtbl_ops instead of storing copies.
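For reference, the CD build now reads the live io_pgtable configuration instead of cached copies; a sketch of the access pattern used in the diff below:

	const struct io_pgtable_cfg *pgtbl_cfg =
		&io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg;

	/* e.g. TTBR and MAIR come straight from the S1 page table config */
	u64 ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
	u64 mair = pgtbl_cfg->arm_lpae_s1_cfg.mair;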
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with 5b658bc56069b: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 55 +++++++++------------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 -- 2 files changed, 22 insertions(+), 36 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ad37423f7323..493f26d0e14a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1515,15 +1515,25 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; + const struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = + &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
memset(target, 0, sizeof(*target));
target->data[0] = cpu_to_le64( - cd->tcr | + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | + CTXDESC_CD_0_TCR_EPD1 | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | #endif CTXDESC_CD_0_V | + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | CTXDESC_CD_0_AA64 | (master->stall_enabled ? CTXDESC_CD_0_S : 0) | CTXDESC_CD_0_R | @@ -1532,13 +1542,14 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) );
- if (!(master->smmu->features & ARM_SMMU_FEAT_HD)) - target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HD); - if (!(master->smmu->features & ARM_SMMU_FEAT_HA)) - target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HA); + if (master->smmu->features & ARM_SMMU_FEAT_HD) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (master->smmu->features & ARM_SMMU_FEAT_HA) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA);
- target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - target->data[3] = cpu_to_le64(cd->mair); + target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr & + CTXDESC_CD_1_TTB0_MASK); + target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair); }
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) @@ -2508,13 +2519,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) }
static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg) + struct arm_smmu_domain *smmu_domain) { int ret; u32 asid; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; - typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
refcount_set(&cd->refs, 1);
@@ -2522,32 +2531,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, mutex_lock(&arm_smmu_asid_lock); ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); - if (ret) - goto out_unlock; - cd->asid = (u16)asid; - cd->ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; - cd->tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | - FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | - FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | - CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; - cd->mair = pgtbl_cfg->arm_lpae_s1_cfg.mair; - - mutex_unlock(&arm_smmu_asid_lock); - return 0; - -out_unlock: mutex_unlock(&arm_smmu_asid_lock); return ret; }
static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg) + struct arm_smmu_domain *smmu_domain) { int vmid; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; @@ -2571,8 +2561,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; int (*finalise_stage_fn)(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg); + struct arm_smmu_domain *smmu_domain);
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2623,7 +2612,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; smmu_domain->domain.geometry.force_aperture = true;
- ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg); + ret = finalise_stage_fn(smmu, smmu_domain); if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); return ret; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 4374703eae52..ac24fb480faa 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -634,9 +634,6 @@ struct arm_smmu_strtab_l1_desc {
struct arm_smmu_ctx_desc { u16 asid; - u64 ttbr; - u64 tcr; - u64 mair;
refcount_t refs; struct mm_struct *mm;
From: Jason Gunthorpe jgg@nvidia.com
Add arm_smmu_set_pasid()/arm_smmu_remove_pasid(), which are to be used by callers that have already constructed the arm_smmu_cd they wish to program.
These functions will encapsulate the logic to set up a CD entry that is shared by the SVA and S1 domain cases.
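A hedged usage sketch of the pair, mirroring the SVA caller in the diff below (simplified, error handling elided):

	struct arm_smmu_cd target;

	/* The caller constructs the CD it wants programmed... */
	arm_smmu_make_sva_cd(&target, master, mm, asid);

	/* ...then installs and later removes it through the PASID layer */
	ret = arm_smmu_set_pasid(master, smmu_domain, pasid, &target);
	...
	arm_smmu_remove_pasid(master, smmu_domain, pasid);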
Prior patches have already moved most of this logic up into __arm_smmu_sva_bind(); move it to its final home.
This does not yet relieve the SVA restrictions:
 - The RID domain is an S1 domain
 - The programmed PASID is the mm_get_enqcmd_pasid()
 - Nothing changes while SVA is running (sva_enabled)
Also, SVA invalidation will still iterate over the S1 domain's master list.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 35 +++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++++ 3 files changed, 51 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 5b135ad19252..257c8af09017 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -420,12 +420,10 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) arm_smmu_free_shared_cd(cd); }
-static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, - struct mm_struct *mm) +static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, + struct arm_smmu_cd *target) { int ret; - struct arm_smmu_cd target; - struct arm_smmu_cd *cdptr; struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); @@ -452,19 +450,10 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, goto err_free_bond; }
- cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); - if (!cdptr) { - ret = -ENOMEM; - goto err_put_notifier; - } - arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); - arm_smmu_write_cd_entry(master, pasid, cdptr, &target); - + arm_smmu_make_sva_cd(target, master, mm, bond->smmu_mn->cd->asid); list_add(&bond->list, &master->bonds); return 0;
-err_put_notifier: - arm_smmu_mmu_notifier_put(bond->smmu_mn); err_free_bond: kfree(bond); return ret; @@ -614,10 +603,9 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct arm_smmu_bond *bond = NULL, *t; struct arm_smmu_master *master = dev_iommu_priv_get(dev);
- mutex_lock(&sva_lock); - - arm_smmu_clear_cd(master, id); + arm_smmu_remove_pasid(master, to_smmu_domain(domain), id);
+ mutex_lock(&sva_lock); list_for_each_entry(t, &master->bonds, list) { if (t->mm == mm) { bond = t; @@ -636,17 +624,26 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); int ret = 0; struct mm_struct *mm = domain->mm; + struct arm_smmu_cd target;
if (mm_get_enqcmd_pasid(mm) != id) return -EINVAL;
+ if (!arm_smmu_get_cd_ptr(master, id)) + return -ENOMEM; + mutex_lock(&sva_lock); - ret = __arm_smmu_sva_bind(dev, id, mm); + ret = __arm_smmu_sva_bind(dev, mm, &target); mutex_unlock(&sva_lock); + if (ret) + return ret;
- return ret; + /* This cannot fail since we preallocated the cdptr */ + arm_smmu_set_pasid(master, to_smmu_domain(domain), id, &target); + return 0; }
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 493f26d0e14a..b145fa82fef9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2868,6 +2868,35 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
+static bool arm_smmu_is_s1_domain(struct iommu_domain *domain) +{ + if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) + return false; + return to_smmu_domain(domain)->stage == ARM_SMMU_DOMAIN_S1; +} + +int arm_smmu_set_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid, + const struct arm_smmu_cd *cd) +{ + struct arm_smmu_cd *cdptr; + + if (!arm_smmu_is_s1_domain(iommu_get_domain_for_dev(master->dev))) + return -ENODEV; + + cdptr = arm_smmu_get_cd_ptr(master, pasid); + if (!cdptr) + return -ENOMEM; + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); + return 0; +} + +void arm_smmu_remove_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid) +{ + arm_smmu_clear_cd(master, pasid); +} + static int arm_smmu_attach_dev_ste(struct device *dev, struct arm_smmu_ste *ste) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index ac24fb480faa..030f0b6602c6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -818,6 +818,12 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target);
+int arm_smmu_set_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid, + const struct arm_smmu_cd *cd); +void arm_smmu_remove_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid); + void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf,
From: Jason Gunthorpe jgg@nvidia.com
The next patch will need to store the same master twice (with different SSIDs), so allocate memory for each list element.
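The new list element, as added in the header hunk below:

	struct arm_smmu_master_domain {
		struct list_head devices_elm;	/* entry in smmu_domain->devices */
		struct arm_smmu_master *master;
	};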
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 ++++++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++- 3 files changed, 47 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 257c8af09017..a7ae12f0292d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -37,12 +37,13 @@ static DEFINE_MUTEX(sva_lock); static void arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain; struct arm_smmu_cd target_cd; unsigned long flags;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { + struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd *cdptr;
/* S1 domains only support RID attachment right now */ @@ -304,7 +305,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); struct arm_smmu_domain *smmu_domain = smmu_mn->domain; - struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain; unsigned long flags;
mutex_lock(&sva_lock); @@ -318,7 +319,9 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) * but disable translation. */ spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd target; struct arm_smmu_cd *cdptr;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b145fa82fef9..0f83d95974c6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2230,9 +2230,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, unsigned long iova, size_t size) { int i, ret; + struct arm_smmu_master_domain *master_domain; unsigned long flags; struct arm_smmu_cmdq_ent cmd; - struct arm_smmu_master *master; struct arm_smmu_cmdq_batch cmds;
if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) @@ -2261,7 +2261,10 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
arm_smmu_preempt_disable(smmu_domain->smmu); spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + struct arm_smmu_master *master = master_domain->master; + if (!master->ats_enabled) continue;
@@ -2766,9 +2769,26 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) pci_disable_pasid(pdev); }
+static struct arm_smmu_master_domain * +arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_master *master) +{ + struct arm_smmu_master_domain *master_domain; + + lockdep_assert_held(&smmu_domain->devices_lock); + + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + if (master_domain->master == master) + return master_domain; + } + return NULL; +} + static void arm_smmu_detach_dev(struct arm_smmu_master *master) { struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_master_domain *master_domain; struct arm_smmu_domain *smmu_domain; unsigned long flags;
@@ -2779,7 +2799,11 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del_init(&master->domain_head); + master_domain = arm_smmu_find_master_domain(smmu_domain, master); + if (master_domain) { + list_del(&master_domain->devices_elm); + kfree(master_domain); + } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
master->ats_enabled = false; @@ -2793,6 +2817,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master_domain *master_domain; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr;
@@ -2829,6 +2854,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENOMEM; }
+ master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); + if (!master_domain) + return -ENOMEM; + master_domain->master = master; + /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2842,7 +2872,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->ats_enabled = arm_smmu_ats_supported(master);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_add(&master->domain_head, &smmu_domain->devices); + list_add(&master_domain->devices_elm, &smmu_domain->devices); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
switch (smmu_domain->stage) { @@ -3180,7 +3210,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) master->dev = dev; master->smmu = smmu; INIT_LIST_HEAD(&master->bonds); - INIT_LIST_HEAD(&master->domain_head); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 030f0b6602c6..8e6b89305805 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -760,7 +760,6 @@ struct arm_smmu_stream { struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; - struct list_head domain_head; struct arm_smmu_stream *streams; /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; @@ -794,12 +793,18 @@ struct arm_smmu_domain {
struct iommu_domain domain;
+ /* List of struct arm_smmu_master_domain */ struct list_head devices; spinlock_t devices_lock;
struct list_head mmu_notifiers; };
+struct arm_smmu_master_domain { + struct list_head devices_elm; + struct arm_smmu_master *master; +}; + static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain);
From: Jason Gunthorpe jgg@nvidia.com
The core code allows the domain to be changed on the fly without a forced stop in BLOCKED/IDENTITY. In this flow the driver should just continually maintain the ATS with no change while the STE is updated.
ATS relies on a linked list, smmu_domain->devices, to keep track of which masters have the domain programmed, but this list is also used by arm_smmu_share_asid(), unrelated to ATS.
Create two new functions to encapsulate this combined logic:

 arm_smmu_attach_prepare()
 <caller generates and sets the STE>
 arm_smmu_attach_commit()
The two functions sequence both enabling and disabling of ATS across the STE store. Have every update of the STE use this sequence.
Installing an S1/S2 domain always enables ATS if the PCIe device supports it.
The enable flow is now ordered differently to allow it to be hitless:
1) Add the master to the new smmu_domain->devices list
2) Program the STE
3) Enable ATS at PCIe
4) Remove the master from the old smmu_domain
This flow ensures that invalidations to either domain will generate an ATC invalidation to the device while the STE is being switched. Thus we don't need to turn off the ATS anymore for correctness.
The disable flow is the reverse:
1) Disable ATS at PCIe
2) Program the STE
3) Invalidate the ATC
4) Remove the master from the old smmu_domain
Move the nr_ats_masters adjustments to be close to the list manipulations. It is a count of the number of ATS-enabled masters currently in the list. The adjustment is made strictly before and after the STE/CD are revised, and done under the list's spin_lock.
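A condensed sketch of how every STE update is now sequenced (simplified from the diff below):

	struct attach_state state = {};

	ret = arm_smmu_attach_prepare(master, domain, &state);
	if (ret)
		return ret;

	/* Build the STE with the EATS setting decided during prepare */
	arm_smmu_make_cdtable_ste(&target, master, state.want_ats);
	arm_smmu_install_ste_for_dev(master, &target);

	/* Sync the ATC and drop the master from the old domain's list */
	arm_smmu_attach_commit(master, &state);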
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 198 ++++++++++++++------ 1 file changed, 140 insertions(+), 58 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0f83d95974c6..3896bd2ba9c1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1747,7 +1747,8 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) }
static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master) + struct arm_smmu_master *master, + bool ats_enabled) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1770,7 +1771,7 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, STRTAB_STE_1_S1STALLD : 0) | FIELD_PREP(STRTAB_STE_1_EATS, - master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
if (smmu->features & ARM_SMMU_FEAT_E2H) { /* @@ -1798,7 +1799,8 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) + struct arm_smmu_domain *smmu_domain, + bool ats_enabled) { struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; const struct io_pgtable_cfg *pgtbl_cfg = @@ -1814,7 +1816,7 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
target->data[1] = cpu_to_le64( FIELD_PREP(STRTAB_STE_1_EATS, - master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) | + ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) | FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
@@ -2682,22 +2684,16 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master) return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); }
-static void arm_smmu_enable_ats(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) +static void arm_smmu_enable_ats(struct arm_smmu_master *master) { size_t stu; struct pci_dev *pdev; struct arm_smmu_device *smmu = master->smmu;
- /* Don't enable ATS at the endpoint if it's not enabled in the STE */ - if (!master->ats_enabled) - return; - /* Smallest Translation Unit: log2 of the smallest supported granule */ stu = __ffs(smmu->pgsize_bitmap); pdev = to_pci_dev(master->dev);
- atomic_inc(&smmu_domain->nr_ats_masters); /* * ATC invalidation of PASID 0 causes the entire ATC to be flushed. */ @@ -2706,22 +2702,6 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master, dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
-static void arm_smmu_disable_ats(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) -{ - if (!master->ats_enabled) - return; - - pci_disable_ats(to_pci_dev(master->dev)); - /* - * Ensure ATS is disabled at the endpoint before we issue the - * ATC invalidation via the SMMU. - */ - wmb(); - arm_smmu_atc_inv_master(master); - atomic_dec(&smmu_domain->nr_ats_masters); -} - static int arm_smmu_enable_pasid(struct arm_smmu_master *master) { int ret; @@ -2785,39 +2765,145 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, return NULL; }
-static void arm_smmu_detach_dev(struct arm_smmu_master *master) +/* + * If the domain uses the smmu_domain->devices list return the arm_smmu_domain + * structure, otherwise NULL. These domains track attached devices so they can + * issue invalidations. + */ +static struct arm_smmu_domain * +to_smmu_domain_devices(struct iommu_domain *domain) +{ + /* The domain can be NULL only when processing the first attach */ + if (!domain) + return NULL; + if (domain->type & __IOMMU_DOMAIN_PAGING) + return to_smmu_domain(domain); + return NULL; +} + +static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, + struct iommu_domain *domain) { - struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; - struct arm_smmu_domain *smmu_domain; unsigned long flags;
- if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) + if (!smmu_domain) return;
- smmu_domain = to_smmu_domain(domain); - arm_smmu_disable_ats(master, smmu_domain); - spin_lock_irqsave(&smmu_domain->devices_lock, flags); master_domain = arm_smmu_find_master_domain(smmu_domain, master); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); + if (master->ats_enabled) + atomic_dec(&smmu_domain->nr_ats_masters); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); +} + +struct attach_state { + bool want_ats; + bool disable_ats; +}; + +/* + * Prepare to attach a domain to a master. If disable_ats is not set this will + * turn on ATS if supported. smmu_domain can be NULL if the domain being + * attached does not have a page table and does not require invalidation + * tracking. + */ +static int arm_smmu_attach_prepare(struct arm_smmu_master *master, + struct iommu_domain *domain, + struct attach_state *state) +{ + struct arm_smmu_domain *smmu_domain = + to_smmu_domain_devices(domain); + struct arm_smmu_master_domain *master_domain; + unsigned long flags; + + /* + * arm_smmu_share_asid() must not see two domains pointing to the same + * arm_smmu_master_domain contents otherwise it could randomly write one + * or the other to the CD. + */ + lockdep_assert_held(&arm_smmu_asid_lock); + + state->want_ats = !state->disable_ats && arm_smmu_ats_supported(master); + + if (smmu_domain) { + master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); + if (!master_domain) + return -ENOMEM; + master_domain->master = master; + + /* + * During prepare we want the current smmu_domain and new + * smmu_domain to be in the devices list before we change any + * HW. This ensures that both domains will send ATS + * invalidations to the master until we are done. + * + * It is tempting to make this list only track masters that are + * using ATS, but arm_smmu_share_asid() also uses this to change + * the ASID of a domain, unrelated to ATS. + * + * Notice if we are re-attaching the same domain then the list + * will have two identical entries and commit will remove only + * one of them. + */ + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + if (state->want_ats) + atomic_inc(&smmu_domain->nr_ats_masters); + list_add(&master_domain->devices_elm, &smmu_domain->devices); + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + } + + if (!state->want_ats && master->ats_enabled) { + pci_disable_ats(to_pci_dev(master->dev)); + /* + * This is probably overkill, but the config write for disabling + * ATS should complete before the STE is configured to generate + * UR to avoid AER noise. + */ + wmb(); + } + return 0; +} + +/* + * Commit is done after the STE/CD are configured with the EATS setting. It + * completes synchronizing the PCI device's ATC and finishes manipulating the + * smmu_domain->devices list. + */ +static void arm_smmu_attach_commit(struct arm_smmu_master *master, + struct attach_state *state) +{ + lockdep_assert_held(&arm_smmu_asid_lock);
- master->ats_enabled = false; + if (state->want_ats && !master->ats_enabled) { + arm_smmu_enable_ats(master); + } else if (master->ats_enabled) { + /* + * The translation has changed, flush the ATC. At this point the + * SMMU is translating for the new domain and both the old&new + * domain will issue invalidations. + */ + arm_smmu_atc_inv_master(master); + } + master->ats_enabled = state->want_ats; + + arm_smmu_remove_master_domain(master, + iommu_get_domain_for_dev(master->dev)); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret = 0; - unsigned long flags; struct arm_smmu_ste target; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - struct arm_smmu_master_domain *master_domain; + struct attach_state state = {}; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr;
@@ -2854,11 +2940,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENOMEM; }
- master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); - if (!master_domain) - return -ENOMEM; - master_domain->master = master; - /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2867,13 +2948,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) */ mutex_lock(&arm_smmu_asid_lock);
- arm_smmu_detach_dev(master); - - master->ats_enabled = arm_smmu_ats_supported(master); - - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_add(&master_domain->devices_elm, &smmu_domain->devices); - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + ret = arm_smmu_attach_prepare(master, domain, &state); + if (ret) { + mutex_unlock(&arm_smmu_asid_lock); + return ret; + }
switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: { @@ -2882,18 +2961,19 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, &target_cd); - arm_smmu_make_cdtable_ste(&target, master); + arm_smmu_make_cdtable_ste(&target, master, state.want_ats); arm_smmu_install_ste_for_dev(master, &target); break; } case ARM_SMMU_DOMAIN_S2: - arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain, + state.want_ats); arm_smmu_install_ste_for_dev(master, &target); arm_smmu_clear_cd(master, IOMMU_NO_PASID); break; }
- arm_smmu_enable_ats(master, smmu_domain); + arm_smmu_attach_commit(master, &state); mutex_unlock(&arm_smmu_asid_lock); return 0; } @@ -2927,10 +3007,11 @@ void arm_smmu_remove_pasid(struct arm_smmu_master *master, arm_smmu_clear_cd(master, pasid); }
-static int arm_smmu_attach_dev_ste(struct device *dev, - struct arm_smmu_ste *ste) +static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, + struct device *dev, struct arm_smmu_ste *ste) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct attach_state state = {};
if (arm_smmu_master_sva_enabled(master)) return -EBUSY; @@ -2948,9 +3029,10 @@ static int arm_smmu_attach_dev_ste(struct device *dev, * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). */ - arm_smmu_detach_dev(master); - + state.disable_ats = true; + arm_smmu_attach_prepare(master, domain, &state); arm_smmu_install_ste_for_dev(master, ste); + arm_smmu_attach_commit(master, &state); mutex_unlock(&arm_smmu_asid_lock);
/* @@ -2968,7 +3050,7 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_bypass_ste(&ste); - return arm_smmu_attach_dev_ste(dev, &ste); + return arm_smmu_attach_dev_ste(domain, dev, &ste); }
static const struct iommu_domain_ops arm_smmu_identity_ops = { @@ -2986,7 +3068,7 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_abort_ste(&ste); - return arm_smmu_attach_dev_ste(dev, &ste); + return arm_smmu_attach_dev_ste(domain, dev, &ste); }
static const struct iommu_domain_ops arm_smmu_blocked_ops = {
From: Jason Gunthorpe jgg@nvidia.com
Prepare to allow an S1 domain to be attached to a PASID as well. Keep track of the SSID the domain is using on each master in the arm_smmu_master_domain.
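With the SSID recorded per list element, the ATC invalidation path can target the right substream; a sketch of the dispatch in the diff below:

	/* Non-zero ssid: SVA is co-opting the S1 domain's device list */
	if (ssid != IOMMU_NO_PASID)
		arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
	else
		arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);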
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 15 ++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++++++++---- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++- 3 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index a7ae12f0292d..36166126833e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -46,13 +46,12 @@ arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd *cdptr;
- /* S1 domains only support RID attachment right now */ - cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + cdptr = arm_smmu_get_cd_ptr(master, master_domain->ssid); if (WARN_ON(!cdptr)) continue;
arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); - arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + arm_smmu_write_cd_entry(master, master_domain->ssid, cdptr, &target_cd); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); @@ -297,8 +296,8 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, smmu_domain); }
- arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), start, - size); + arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), start, + size); }
static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) @@ -335,7 +334,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); - arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); + arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0);
smmu_mn->cleared = true; mutex_unlock(&sva_lock); @@ -414,8 +413,8 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) */ if (!smmu_mn->cleared) { arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid); - arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, - 0); + arm_smmu_atc_inv_domain_sva(smmu_domain, + mm_get_enqcmd_pasid(mm), 0, 0); }
/* Frees smmu_mn */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 3896bd2ba9c1..5b2161d2fb20 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2228,8 +2228,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) return ret; }
-int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, - unsigned long iova, size_t size) +static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size) { int i, ret; struct arm_smmu_master_domain *master_domain; @@ -2257,8 +2257,6 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, if (!atomic_read(&smmu_domain->nr_ats_masters)) return 0;
- arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); - cmds.num = 0;
arm_smmu_preempt_disable(smmu_domain->smmu); @@ -2270,6 +2268,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, if (!master->ats_enabled) continue;
+ /* + * Non-zero ssid means SVA is co-opting the S1 domain to issue + * invalidations for SVA PASIDs. + */ + if (ssid != IOMMU_NO_PASID) + arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); + else + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, + &cmd); + for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd); @@ -2283,6 +2291,19 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, return ret; }
+static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size) +{ + return __arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, + size); +} + +int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size) +{ + return __arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size); +} + /* IO_PGTABLE API */ static void arm_smmu_tlb_inv_context(void *cookie) { @@ -2304,7 +2325,7 @@ static void arm_smmu_tlb_inv_context(void *cookie) cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); } - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, 0, 0); + arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, @@ -2404,7 +2425,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, * Unfortunately, this can't be leaf-only since we may have * zapped an entire table. */ - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, size); + arm_smmu_atc_inv_domain(smmu_domain, iova, size); }
void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, @@ -2751,7 +2772,8 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static struct arm_smmu_master_domain * arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, - struct arm_smmu_master *master) + struct arm_smmu_master *master, + ioasid_t ssid) { struct arm_smmu_master_domain *master_domain;
@@ -2759,7 +2781,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { - if (master_domain->master == master) + if (master_domain->master == master && + master_domain->ssid == ssid) return master_domain; } return NULL; @@ -2792,7 +2815,8 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, return;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, + IOMMU_NO_PASID); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 8e6b89305805..f116893957a0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -803,6 +803,7 @@ struct arm_smmu_domain { struct arm_smmu_master_domain { struct list_head devices_elm; struct arm_smmu_master *master; + u16 ssid; };
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -834,8 +835,8 @@ void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain); bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd); -int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, - unsigned long iova, size_t size); +int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size);
#ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
From: Jason Gunthorpe <jgg@nvidia.com>
We no longer need master->sva_enabled to control which attaches are allowed.
Instead, keep track inside the cd_table of how many valid CD entries exist, and whether the RID has a valid entry.
Replace all the attach-focused master->sva_enabled tests with a check of whether the CD table has valid entries. If there are any valid entries then the CD table must currently be programmed into the STE.
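As an illustration (a reader's sketch of the invariant, not part of the patch), the counters evolve like this and gate whole-STE attaches:

	/*
	 * Hypothetical sequence showing the bookkeeping:
	 *
	 *   attach RID S1 domain   -> used_ssids = 1, used_sid = true
	 *   set SVA PASID 5        -> used_ssids = 2, used_sid = true
	 *   remove PASID 5         -> used_ssids = 1, used_sid = true
	 *
	 * While some SSID other than the RID entry is valid, a whole-STE
	 * attach must be refused, as in the hunks below:
	 */
	if (arm_smmu_ssids_in_use(&master->cd_table))
		return -EBUSY;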
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  5 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++---------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 10 +++++++
 3 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 36166126833e..d7f3b2785fc1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -437,9 +437,6 @@ static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) return -ENODEV;
- if (!master || !master->sva_enabled) - return -ENODEV; - bond = kzalloc(sizeof(*bond), GFP_KERNEL); if (!bond) return -ENOMEM; @@ -631,7 +628,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, struct mm_struct *mm = domain->mm; struct arm_smmu_cd target;
- if (mm_get_enqcmd_pasid(mm) != id) + if (mm_get_enqcmd_pasid(mm) != id || !master->cd_table.used_sid) return -EINVAL;
if (!arm_smmu_get_cd_ptr(master, id)) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5b2161d2fb20..19e5e4d43c21 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1499,6 +1499,8 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target) { + bool target_valid = target->data[0] & cpu_to_le64(CTXDESC_CD_0_V); + bool cur_valid = cdptr->data[0] & cpu_to_le64(CTXDESC_CD_0_V); struct arm_smmu_cd_writer cd_writer = { .writer = { .ops = &arm_smmu_cd_writer_ops, @@ -1507,6 +1509,15 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, .ssid = ssid, };
+ if (cur_valid != target_valid) { + if (cur_valid) + master->cd_table.used_ssids--; + else + master->cd_table.used_ssids++; + } + if (ssid == IOMMU_NO_PASID) + master->cd_table.used_sid = target_valid; + arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); }
@@ -2937,16 +2948,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master = dev_iommu_priv_get(dev); smmu = master->smmu;
- /* - * Checking that SVA is disabled ensures that this device isn't bound to - * any mm, and can be safely detached from its old domain. Bonds cannot - * be removed concurrently since we're holding the group mutex. - */ - if (arm_smmu_master_sva_enabled(master)) { - dev_err(dev, "cannot attach - SVA enabled\n"); - return -EBUSY; - } - mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { @@ -2962,7 +2963,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); if (!cdptr) return -ENOMEM; - } + } else if (arm_smmu_ssids_in_use(&master->cd_table)) + return -EBUSY;
/* * Prevent arm_smmu_share_asid() from trying to change the ASID @@ -3037,7 +3039,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct attach_state state = {};
- if (arm_smmu_master_sva_enabled(master)) + if (arm_smmu_ssids_in_use(&master->cd_table)) return -EBUSY;
/* diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f116893957a0..35f8d79713fa 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -649,11 +649,21 @@ struct arm_smmu_ctx_desc_cfg { dma_addr_t cdtab_dma; struct arm_smmu_l1_ctx_desc *l1_desc; unsigned int num_l1_ents; + unsigned int used_ssids; + bool used_sid; u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; };
+/* True if the cd table has SSIDS > 0 in use. */ +static inline bool arm_smmu_ssids_in_use(struct arm_smmu_ctx_desc_cfg *cd_table) +{ + if (cd_table->used_sid) + return cd_table->used_ssids > 1; + return cd_table->used_ssids; +} + struct arm_smmu_s2_cfg { u16 vmid; };
From: Jason Gunthorpe <jgg@nvidia.com>
Allow creating and managing arm_smmu_master_domains with a non-zero SSID through the arm_smmu_attach_*() family of functions. This triggers ATC invalidation for the correct SSID in PASID cases and tracks the per-attachment SSID in struct arm_smmu_master_domain.
Generalize arm_smmu_attach_remove() to be able to remove SSIDs as well, by ensuring the ATC for the PASID is flushed properly.
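As a sketch of the threading (drawn from the hunks below, not new code), the two attach entry points now seed the same attach_state with different SSIDs, which prepare/commit record in the arm_smmu_master_domain and use to target ATC invalidations:

	/* RID attach path: operates on the default substream */
	struct attach_state rid_state = {.ssid = IOMMU_NO_PASID};

	/* PASID attach path (wired up by a later patch in this series) */
	struct attach_state pasid_state = {.ssid = pasid};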
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 +++++++++++++--------
 1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 19e5e4d43c21..af32e04689a1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2219,13 +2219,14 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size, cmd->atc.size = log2_span; }
-static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) +static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, + ioasid_t ssid) { int i, ret; struct arm_smmu_cmdq_ent cmd; struct arm_smmu_cmdq_batch cmds;
- arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); + arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
cmds.num = 0; arm_smmu_preempt_disable(master->smmu); @@ -2729,7 +2730,7 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master) /* * ATC invalidation of PASID 0 causes the entire ATC to be flushed. */ - arm_smmu_atc_inv_master(master); + arm_smmu_atc_inv_master(master, IOMMU_NO_PASID); if (pci_enable_ats(pdev, stu)) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); } @@ -2816,7 +2817,8 @@ to_smmu_domain_devices(struct iommu_domain *domain) }
static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, - struct iommu_domain *domain) + struct iommu_domain *domain, + ioasid_t ssid) { struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; @@ -2826,8 +2828,7 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, return;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master, - IOMMU_NO_PASID); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); @@ -2840,6 +2841,7 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, struct attach_state { bool want_ats; bool disable_ats; + ioasid_t ssid; };
/* @@ -2871,6 +2873,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_master *master, if (!master_domain) return -ENOMEM; master_domain->master = master; + master_domain->ssid = state->ssid;
/* * During prepare we want the current smmu_domain and new @@ -2923,12 +2926,15 @@ static void arm_smmu_attach_commit(struct arm_smmu_master *master, * SMMU is translating for the new domain and both the old&new * domain will issue invalidations. */ - arm_smmu_atc_inv_master(master); + if (state->want_ats) + arm_smmu_atc_inv_master(master, state->ssid); + else + arm_smmu_atc_inv_master(master, IOMMU_NO_PASID); } master->ats_enabled = state->want_ats;
- arm_smmu_remove_master_domain(master, - iommu_get_domain_for_dev(master->dev)); + arm_smmu_remove_master_domain( + master, iommu_get_domain_for_dev(master->dev), state->ssid); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) @@ -2938,7 +2944,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - struct attach_state state = {}; + struct attach_state state = {.ssid = IOMMU_NO_PASID}; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr;
@@ -3037,7 +3043,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, struct device *dev, struct arm_smmu_ste *ste) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct attach_state state = {}; + struct attach_state state = {.ssid = IOMMU_NO_PASID};
if (arm_smmu_ssids_in_use(&master->cd_table)) return -EBUSY;
From: Jason Gunthorpe <jgg@nvidia.com>
Currently the SVA domain is a naked struct iommu_domain; allocate a struct arm_smmu_domain instead.
This is necessary to be able to use the struct arm_smmu_master_domain mechanism.
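The reason a naked struct iommu_domain breaks that mechanism: to_smmu_domain() recovers the containing structure with container_of(), which is only valid when the iommu_domain is actually embedded in an arm_smmu_domain. An abridged sketch of the embedding (fields elided):

	struct arm_smmu_domain {
		struct arm_smmu_device *smmu;
		/* ... devices list, cd/s2_cfg, locks ... */
		struct iommu_domain domain;	/* handle the core code sees */
	};

	static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
	{
		return container_of(dom, struct arm_smmu_domain, domain);
	}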
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 19 +++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++-
 3 files changed, 31 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index d7f3b2785fc1..c464059e2a5f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -647,7 +647,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) { - kfree(domain); + kfree(to_smmu_domain(domain)); }
static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { @@ -655,14 +655,17 @@ static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { .free = arm_smmu_sva_domain_free };
-struct iommu_domain *arm_smmu_sva_domain_alloc(void) +struct iommu_domain *arm_smmu_sva_domain_alloc(unsigned type) { - struct iommu_domain *domain; + struct arm_smmu_domain *smmu_domain;
- domain = kzalloc(sizeof(*domain), GFP_KERNEL); - if (!domain) - return NULL; - domain->ops = &arm_smmu_sva_domain_ops; + if (type != IOMMU_DOMAIN_SVA) + return ERR_PTR(-EOPNOTSUPP); + + smmu_domain = arm_smmu_domain_alloc(); + if (IS_ERR(smmu_domain)) + return ERR_CAST(smmu_domain); + smmu_domain->domain.ops = &arm_smmu_sva_domain_ops;
- return domain; + return &smmu_domain->domain; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index af32e04689a1..cb55b19cab53 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2495,31 +2495,35 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) } }
-static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) +struct arm_smmu_domain *arm_smmu_domain_alloc(void) { + struct arm_smmu_domain *smmu_domain; + + smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); + if (!smmu_domain) + return ERR_PTR(-ENOMEM); + + mutex_init(&smmu_domain->init_mutex); + INIT_LIST_HEAD(&smmu_domain->devices); + spin_lock_init(&smmu_domain->devices_lock); + INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
- if (type == IOMMU_DOMAIN_SVA) - return arm_smmu_sva_domain_alloc(); - return ERR_PTR(-EOPNOTSUPP); + return smmu_domain; }
static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) { struct arm_smmu_domain *smmu_domain;
+ smmu_domain = arm_smmu_domain_alloc(); + if (IS_ERR(smmu_domain)) + return ERR_CAST(smmu_domain); + /* * Allocate the domain and initialise some of its data structures. * We can't really do anything meaningful until we've added a * master. */ - smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); - if (!smmu_domain) - return ERR_PTR(-ENOMEM); - - mutex_init(&smmu_domain->init_mutex); - INIT_LIST_HEAD(&smmu_domain->devices); - spin_lock_init(&smmu_domain->devices_lock); - INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
if (dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); @@ -3727,7 +3731,7 @@ static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, - .domain_alloc = arm_smmu_domain_alloc, + .domain_alloc = arm_smmu_sva_domain_alloc, .domain_alloc_paging = arm_smmu_domain_alloc_paging, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 35f8d79713fa..302f635d9652 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -824,6 +824,8 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock;
+struct arm_smmu_domain *arm_smmu_domain_alloc(void); + void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid); @@ -856,7 +858,7 @@ int arm_smmu_master_enable_sva(struct arm_smmu_master *master); int arm_smmu_master_disable_sva(struct arm_smmu_master *master); bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); void arm_smmu_sva_notifier_synchronize(void); -struct iommu_domain *arm_smmu_sva_domain_alloc(void); +struct iommu_domain *arm_smmu_sva_domain_alloc(unsigned int type); void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */
From: Jason Gunthorpe <jgg@nvidia.com>
Currently the smmu_domain->devices list is unused for SVA domains. Fill it in with the SSID and master of every arm_smmu_set_pasid() call, using the same logic as the RID attach.
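A sketch of the resulting arm_smmu_set_pasid() flow (the real code is in the hunk below):

	/*
	 *   mutex_lock(&arm_smmu_asid_lock);
	 *   arm_smmu_attach_prepare(master, &smmu_domain->domain, &state);
	 *       // allocates arm_smmu_master_domain {master, ssid = pasid}
	 *       // and links it into smmu_domain->devices
	 *   arm_smmu_write_cd_entry(master, pasid, cdptr, cd);
	 *   arm_smmu_attach_commit(master, &state);
	 *   mutex_unlock(&arm_smmu_asid_lock);
	 */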
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index cb55b19cab53..b4a10eda40a2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2815,7 +2815,8 @@ to_smmu_domain_devices(struct iommu_domain *domain) /* The domain can be NULL only when processing the first attach */ if (!domain) return NULL; - if (domain->type & __IOMMU_DOMAIN_PAGING) + if ((domain->type & __IOMMU_DOMAIN_PAGING) || + domain->type == IOMMU_DOMAIN_SVA) return to_smmu_domain(domain); return NULL; } @@ -3025,7 +3026,9 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, const struct arm_smmu_cd *cd) { + struct attach_state state = {.ssid = pasid}; struct arm_smmu_cd *cdptr; + int ret;
if (!arm_smmu_is_s1_domain(iommu_get_domain_for_dev(master->dev))) return -ENODEV; @@ -3033,14 +3036,30 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, cdptr = arm_smmu_get_cd_ptr(master, pasid); if (!cdptr) return -ENOMEM; + + mutex_lock(&arm_smmu_asid_lock); + ret = arm_smmu_attach_prepare(master, &smmu_domain->domain, &state); + if (ret) + goto out_unlock; + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); - return 0; + + arm_smmu_attach_commit(master, &state); + +out_unlock: + mutex_unlock(&arm_smmu_asid_lock); + return ret; }
void arm_smmu_remove_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid) { + mutex_lock(&arm_smmu_asid_lock); arm_smmu_clear_cd(master, pasid); + if (master->ats_enabled) + arm_smmu_atc_inv_master(master, pasid); + arm_smmu_remove_master_domain(master, &smmu_domain->domain, pasid); + mutex_unlock(&arm_smmu_asid_lock); }
static int arm_smmu_attach_dev_ste(struct iommu_domain *domain,
From: Jason Gunthorpe <jgg@nvidia.com>
Make a new op that receives the device and the mm_struct that the SVA domain should be created for. Unlike domain_alloc_paging(), the dev argument is never NULL here.
This allows drivers to fully initialize the SVA domain and allocate the mmu_notifier during allocation. It allows the notifier lifetime to follow the lifetime of the iommu_domain.
Since we have only one call site, upgrade the new op to return ERR_PTR instead of NULL.
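For readers unfamiliar with the convention, ERR_PTR lets the caller propagate a precise errno instead of collapsing every failure to -ENOMEM; a sketch of the pattern used in the iommu-sva.c hunk below:

	domain = ops->domain_alloc_sva(dev, mm);
	if (IS_ERR(domain))
		return PTR_ERR(domain);	/* e.g. -EOPNOTSUPP rather than a bare NULL */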
Change SMMUv3 to use the new op.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

conflict with 0bafc3701d033: drivers/iommu/iommu.c
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 +++++++----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c     |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  6 +++++-
 drivers/iommu/iommu-sva.c                       | 16 +++++++++++-----
 include/linux/iommu.h                           |  3 +++
 5 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index c464059e2a5f..621d49e9c6d2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -655,17 +655,20 @@ static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { .free = arm_smmu_sva_domain_free };
-struct iommu_domain *arm_smmu_sva_domain_alloc(unsigned type) +struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, + struct mm_struct *mm) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_domain *smmu_domain;
- if (type != IOMMU_DOMAIN_SVA) - return ERR_PTR(-EOPNOTSUPP); - smmu_domain = arm_smmu_domain_alloc(); if (IS_ERR(smmu_domain)) return ERR_CAST(smmu_domain); + + smmu_domain->domain.type = IOMMU_DOMAIN_SVA; smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; + smmu_domain->smmu = smmu;
return &smmu_domain->domain; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b4a10eda40a2..fe172b1690de 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3750,8 +3750,8 @@ static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, - .domain_alloc = arm_smmu_sva_domain_alloc, .domain_alloc_paging = arm_smmu_domain_alloc_paging, + .domain_alloc_sva = arm_smmu_sva_domain_alloc, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 302f635d9652..277cc0de91b6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -858,7 +858,8 @@ int arm_smmu_master_enable_sva(struct arm_smmu_master *master); int arm_smmu_master_disable_sva(struct arm_smmu_master *master); bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); void arm_smmu_sva_notifier_synchronize(void); -struct iommu_domain *arm_smmu_sva_domain_alloc(unsigned int type); +struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, + struct mm_struct *mm); void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */ @@ -904,5 +905,8 @@ static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, ioasid_t id) { } + +#define arm_smmu_sva_domain_alloc NULL + #endif /* CONFIG_ARM_SMMU_V3_SVA */ #endif /* _ARM_SMMU_V3_H */ diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 0b3369d78389..7ed57667d6ef 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -141,8 +141,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
/* Allocate a new domain and set it on device pasid. */ domain = iommu_sva_domain_alloc(dev, mm); - if (!domain) { - ret = -ENOMEM; + if (IS_ERR(domain)) { + ret = PTR_ERR(domain); goto out_free_handle; }
@@ -312,9 +312,15 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, const struct iommu_ops *ops = dev_iommu_ops(dev); struct iommu_domain *domain;
- domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); - if (!domain) - return NULL; + if (ops->domain_alloc_sva) { + domain = ops->domain_alloc_sva(dev, mm); + if (IS_ERR(domain)) + return domain; + } else { + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); + if (!domain) + return ERR_PTR(-ENOMEM); + }
domain->type = IOMMU_DOMAIN_SVA; mmgrab(mm); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 989a287a2109..b4613378b5d8 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -531,6 +531,7 @@ static inline int __iommu_copy_struct_from_user_array( * Upon failure, ERR_PTR must be returned. * @domain_alloc_paging: Allocate an iommu_domain that can be used for * UNMANAGED, DMA, and DMA_FQ domain types. + * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -571,6 +572,8 @@ struct iommu_ops { struct device *dev, u32 flags, struct iommu_domain *parent, const struct iommu_user_data *user_data); struct iommu_domain *(*domain_alloc_paging)(struct device *dev); + struct iommu_domain *(*domain_alloc_sva)(struct device *dev, + struct mm_struct *mm);
struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev);
From: Jason Gunthorpe <jgg@nvidia.com>
This removes all the notifier de-duplication logic in the driver and relies on the core code to de-duplicate and allocate only one SVA domain per mm per smmu instance. This naturally gives a 1:1 relationship between SVA domain and mmu notifier.
Remove all of the previous mmu_notifier, bond, shared cd, and cd refcount logic entirely.
For the purpose of organizing the patches, lightly remove BTM support; the next patches will add it back. BTM is a performance optimization, so this is a bisection-friendly, functionally invisible change. Note that the current code never even enables ARM_SMMU_FEAT_BTM, so none of this code is ever used.
The bond/shared_cd/BTM/ASID allocator code is tightly wound together, and changing it all at once would make this patch too big. The core issue is that having a single SVA domain per SMMU instance conflicts with the design of the global ASID table that BTM currently needs, as we would end up having to assign multiple SVA domains to the same ASID.
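A sketch of the resulting 1:1 lifetime (based on the hunks below; the notifier is now embedded directly in the SVA domain):

	/*
	 *   arm_smmu_sva_domain_alloc()
	 *       xa_alloc() an ASID, then
	 *       mmu_notifier_register(&smmu_domain->mmu_notifier, mm);
	 *
	 *   arm_smmu_sva_domain_free()
	 *       invalidate and release the ASID, then
	 *       mmu_notifier_put(&smmu_domain->mmu_notifier);  // queues SRCU work
	 *
	 *   arm_smmu_mmu_notifier_free()                        // SRCU callback
	 *       kfree(container_of(mn, struct arm_smmu_domain, mmu_notifier));
	 */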
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 389 ++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  81 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
 3 files changed, 101 insertions(+), 383 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 621d49e9c6d2..1925403c35b5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -12,29 +12,9 @@ #include "arm-smmu-v3.h" #include "../../io-pgtable-arm.h"
-struct arm_smmu_mmu_notifier { - struct mmu_notifier mn; - struct arm_smmu_ctx_desc *cd; - bool cleared; - refcount_t refs; - struct list_head list; - struct arm_smmu_domain *domain; -}; - -#define mn_to_smmu(mn) container_of(mn, struct arm_smmu_mmu_notifier, mn) - -struct arm_smmu_bond { - struct mm_struct *mm; - struct arm_smmu_mmu_notifier *smmu_mn; - struct list_head list; -}; - -#define sva_to_bond(handle) \ - container_of(handle, struct arm_smmu_bond, sva) - static DEFINE_MUTEX(sva_lock);
-static void +static void __maybe_unused arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { struct arm_smmu_master_domain *master_domain; @@ -57,58 +37,6 @@ arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); }
-/* - * Check if the CPU ASID is available on the SMMU side. If a private context - * descriptor is using it, try to replace it. - */ -static struct arm_smmu_ctx_desc * -arm_smmu_share_asid(struct mm_struct *mm, u16 asid) -{ - int ret; - u32 new_asid; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_device *smmu; - struct arm_smmu_domain *smmu_domain; - - cd = xa_load(&arm_smmu_asid_xa, asid); - if (!cd) - return NULL; - - if (cd->mm) { - if (WARN_ON(cd->mm != mm)) - return ERR_PTR(-EINVAL); - /* All devices bound to this mm use the same cd struct. */ - refcount_inc(&cd->refs); - return cd; - } - - smmu_domain = container_of(cd, struct arm_smmu_domain, cd); - smmu = smmu_domain->smmu; - - ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); - if (ret) - return ERR_PTR(-ENOSPC); - /* - * Race with unmap: TLB invalidations will start targeting the new ASID, - * which isn't assigned yet. We'll do an invalidate-all on the old ASID - * later, so it doesn't matter. - */ - cd->asid = new_asid; - /* - * Update ASID and invalidate CD in all associated masters. There will - * be some overlap between use of both ASIDs, until we invalidate the - * TLB. - */ - arm_smmu_update_s1_domain_cd_entry(smmu_domain); - - /* Invalidate TLB entries previously associated with that context */ - arm_smmu_tlb_inv_asid(smmu, asid); - - xa_erase(&arm_smmu_asid_xa, asid); - return NULL; -} - static u64 page_size_to_cd(void) { static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K || @@ -122,7 +50,8 @@ static u64 page_size_to_cd(void)
static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, - struct mm_struct *mm, u16 asid) + struct mm_struct *mm, u16 asid, + bool btm_invalidation) { u64 par;
@@ -143,7 +72,7 @@ static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, (master->stall_enabled ? CTXDESC_CD_0_S : 0) | CTXDESC_CD_0_R | CTXDESC_CD_0_A | - CTXDESC_CD_0_ASET | + (btm_invalidation ? 0 : CTXDESC_CD_0_ASET) | FIELD_PREP(CTXDESC_CD_0_ASID, asid));
if (master->smmu->features & ARM_SMMU_FEAT_HD) @@ -190,69 +119,6 @@ static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); }
-static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) -{ - u16 asid; - int err = 0; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_ctx_desc *ret = NULL; - - /* Don't free the mm until we release the ASID */ - mmgrab(mm); - - asid = arm64_mm_context_get(mm); - if (!asid) { - err = -ESRCH; - goto out_drop_mm; - } - - cd = kzalloc(sizeof(*cd), GFP_KERNEL); - if (!cd) { - err = -ENOMEM; - goto out_put_context; - } - - refcount_set(&cd->refs, 1); - - mutex_lock(&arm_smmu_asid_lock); - ret = arm_smmu_share_asid(mm, asid); - if (ret) { - mutex_unlock(&arm_smmu_asid_lock); - goto out_free_cd; - } - - err = xa_insert(&arm_smmu_asid_xa, asid, cd, GFP_KERNEL); - mutex_unlock(&arm_smmu_asid_lock); - - if (err) - goto out_free_asid; - - cd->asid = asid; - cd->mm = mm; - - return cd; - -out_free_asid: - arm_smmu_free_asid(cd); -out_free_cd: - kfree(cd); -out_put_context: - arm64_mm_context_put(mm); -out_drop_mm: - mmdrop(mm); - return err < 0 ? ERR_PTR(err) : ret; -} - -static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd) -{ - if (arm_smmu_free_asid(cd)) { - /* Unpin ASID */ - arm64_mm_context_put(cd->mm); - mmdrop(cd->mm); - kfree(cd); - } -} - /* * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this * is used as a threshold to replace per-page TLBI commands to issue in the @@ -267,8 +133,8 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, unsigned long start, unsigned long end) { - struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_domain *smmu_domain = + container_of(mn, struct arm_smmu_domain, mmu_notifier); size_t size;
/* @@ -285,34 +151,27 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, size = 0; }
- if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) { + if (!smmu_domain->btm_invalidation) { if (!size) arm_smmu_tlb_inv_asid(smmu_domain->smmu, - smmu_mn->cd->asid); + smmu_domain->cd.asid); else arm_smmu_tlb_inv_range_asid(start, size, - smmu_mn->cd->asid, + smmu_domain->cd.asid, PAGE_SIZE, false, smmu_domain); }
- arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), start, - size); + arm_smmu_atc_inv_domain(smmu_domain, start, size); }
static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { - struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_domain *smmu_domain = + container_of(mn, struct arm_smmu_domain, mmu_notifier); struct arm_smmu_master_domain *master_domain; unsigned long flags;
- mutex_lock(&sva_lock); - if (smmu_mn->cleared) { - mutex_unlock(&sva_lock); - return; - } - /* * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events, * but disable translation. @@ -324,25 +183,27 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) struct arm_smmu_cd target; struct arm_smmu_cd *cdptr;
- cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + cdptr = arm_smmu_get_cd_ptr(master, master_domain->ssid); if (WARN_ON(!cdptr)) continue; - arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid); - arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr, + arm_smmu_make_sva_cd(&target, master, NULL, + smmu_domain->cd.asid, + smmu_domain->btm_invalidation); + arm_smmu_write_cd_entry(master, master_domain->ssid, cdptr, &target); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); - arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); - - smmu_mn->cleared = true; - mutex_unlock(&sva_lock); + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
static void arm_smmu_mmu_notifier_free(struct mmu_notifier *mn) { - kfree(mn_to_smmu(mn)); + struct arm_smmu_domain *smmu_domain = + container_of(mn, struct arm_smmu_domain, mmu_notifier); + + kfree(smmu_domain); }
static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = { @@ -351,113 +212,6 @@ static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = { .free_notifier = arm_smmu_mmu_notifier_free, };
-/* Allocate or get existing MMU notifier for this {domain, mm} pair */ -static struct arm_smmu_mmu_notifier * -arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain, - struct mm_struct *mm) -{ - int ret; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_mmu_notifier *smmu_mn; - - list_for_each_entry(smmu_mn, &smmu_domain->mmu_notifiers, list) { - if (smmu_mn->mn.mm == mm) { - refcount_inc(&smmu_mn->refs); - return smmu_mn; - } - } - - cd = arm_smmu_alloc_shared_cd(mm); - if (IS_ERR(cd)) - return ERR_CAST(cd); - - smmu_mn = kzalloc(sizeof(*smmu_mn), GFP_KERNEL); - if (!smmu_mn) { - ret = -ENOMEM; - goto err_free_cd; - } - - refcount_set(&smmu_mn->refs, 1); - smmu_mn->cd = cd; - smmu_mn->domain = smmu_domain; - smmu_mn->mn.ops = &arm_smmu_mmu_notifier_ops; - - ret = mmu_notifier_register(&smmu_mn->mn, mm); - if (ret) { - kfree(smmu_mn); - goto err_free_cd; - } - - list_add(&smmu_mn->list, &smmu_domain->mmu_notifiers); - return smmu_mn; - -err_free_cd: - arm_smmu_free_shared_cd(cd); - return ERR_PTR(ret); -} - -static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) -{ - struct mm_struct *mm = smmu_mn->mn.mm; - struct arm_smmu_ctx_desc *cd = smmu_mn->cd; - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; - - if (!refcount_dec_and_test(&smmu_mn->refs)) - return; - - list_del(&smmu_mn->list); - - /* - * If we went through clear(), we've already invalidated, and no - * new TLB entry can have been formed. - */ - if (!smmu_mn->cleared) { - arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid); - arm_smmu_atc_inv_domain_sva(smmu_domain, - mm_get_enqcmd_pasid(mm), 0, 0); - } - - /* Frees smmu_mn */ - mmu_notifier_put(&smmu_mn->mn); - arm_smmu_free_shared_cd(cd); -} - -static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, - struct arm_smmu_cd *target) -{ - int ret; - struct arm_smmu_bond *bond; - struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct arm_smmu_domain *smmu_domain; - - if (!(domain->type & __IOMMU_DOMAIN_PAGING)) - return -ENODEV; - smmu_domain = to_smmu_domain(domain); - if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) - return -ENODEV; - - bond = kzalloc(sizeof(*bond), GFP_KERNEL); - if (!bond) - return -ENOMEM; - - bond->mm = mm; - - bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm); - if (IS_ERR(bond->smmu_mn)) { - ret = PTR_ERR(bond->smmu_mn); - goto err_free_bond; - } - - arm_smmu_make_sva_cd(target, master, mm, bond->smmu_mn->cd->asid); - list_add(&bond->list, &master->bonds); - return 0; - -err_free_bond: - kfree(bond); - return ret; -} - bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) { unsigned long reg, fld; @@ -574,11 +328,6 @@ int arm_smmu_master_enable_sva(struct arm_smmu_master *master) int arm_smmu_master_disable_sva(struct arm_smmu_master *master) { mutex_lock(&sva_lock); - if (!list_empty(&master->bonds)) { - dev_err(master->dev, "cannot disable SVA, device is bound\n"); - mutex_unlock(&sva_lock); - return -EBUSY; - } arm_smmu_master_sva_disable_iopf(master); master->sva_enabled = false; mutex_unlock(&sva_lock); @@ -595,59 +344,54 @@ void arm_smmu_sva_notifier_synchronize(void) mmu_notifier_synchronize(); }
-void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t id) -{ - struct mm_struct *mm = domain->mm; - struct arm_smmu_bond *bond = NULL, *t; - struct arm_smmu_master *master = dev_iommu_priv_get(dev); - - arm_smmu_remove_pasid(master, to_smmu_domain(domain), id); - - mutex_lock(&sva_lock); - list_for_each_entry(t, &master->bonds, list) { - if (t->mm == mm) { - bond = t; - break; - } - } - - if (!WARN_ON(!bond)) { - list_del(&bond->list); - arm_smmu_mmu_notifier_put(bond->smmu_mn); - kfree(bond); - } - mutex_unlock(&sva_lock); -} - static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id) { + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_master *master = dev_iommu_priv_get(dev); - int ret = 0; - struct mm_struct *mm = domain->mm; struct arm_smmu_cd target; + int ret;
- if (mm_get_enqcmd_pasid(mm) != id || !master->cd_table.used_sid) + /* Prevent arm_smmu_mm_release from being called while we are attaching */ + if (!mmget_not_zero(domain->mm)) return -EINVAL;
- if (!arm_smmu_get_cd_ptr(master, id)) - return -ENOMEM; + /* + * This does not need the arm_smmu_asid_lock because SVA domains never + * get reassigned + */ + arm_smmu_make_sva_cd(&target, master, smmu_domain->domain.mm, + smmu_domain->cd.asid, + smmu_domain->btm_invalidation);
- mutex_lock(&sva_lock); - ret = __arm_smmu_sva_bind(dev, mm, &target); - mutex_unlock(&sva_lock); - if (ret) - return ret; + ret = arm_smmu_set_pasid(master, smmu_domain, id, &target);
- /* This cannot fail since we preallocated the cdptr */ - arm_smmu_set_pasid(master, to_smmu_domain(domain), id, &target); - return 0; + mmput(domain->mm); + return ret; }
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) { - kfree(to_smmu_domain(domain)); + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + /* + * Ensure the ASID is empty in the iommu cache before allowing reuse. + */ + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + + /* + * Notice that the arm_smmu_mm_arch_invalidate_secondary_tlbs op can + * still be called/running at this point. We allow the ASID to be + * reused, and if there is a race then it just suffers harmless + * unnecessary invalidation. + */ + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); + + /* + * Actual free is defered to the SRCU callback + * arm_smmu_mmu_notifier_free() + */ + mmu_notifier_put(&smmu_domain->mmu_notifier); }
static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { @@ -661,6 +405,8 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_domain *smmu_domain; + u32 asid; + int ret;
smmu_domain = arm_smmu_domain_alloc(); if (IS_ERR(smmu_domain)) @@ -670,5 +416,22 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; smmu_domain->smmu = smmu;
+ ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); + if (ret) + goto err_free; + + smmu_domain->cd.asid = asid; + smmu_domain->mmu_notifier.ops = &arm_smmu_mmu_notifier_ops; + ret = mmu_notifier_register(&smmu_domain->mmu_notifier, mm); + if (ret) + goto err_asid; + return &smmu_domain->domain; + +err_asid: + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); +err_free: + kfree(smmu_domain); + return ERR_PTR(ret); } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index fe172b1690de..94e379242979 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1656,22 +1656,6 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) cd_table->cdtab = NULL; }
-bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd) -{ - bool free; - struct arm_smmu_ctx_desc *old_cd; - - if (!cd->asid) - return false; - - free = refcount_dec_and_test(&cd->refs); - if (free) { - old_cd = xa_erase(&arm_smmu_asid_xa, cd->asid); - WARN_ON(old_cd != cd); - } - return free; -} - /* Stream table manipulation functions */ static void arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) @@ -2240,8 +2224,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, return ret; }
-static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size) +int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size) { int i, ret; struct arm_smmu_master_domain *master_domain; @@ -2280,15 +2264,7 @@ static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, if (!master->ats_enabled) continue;
- /* - * Non-zero ssid means SVA is co-opting the S1 domain to issue - * invalidations for SVA PASIDs. - */ - if (ssid != IOMMU_NO_PASID) - arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); - else - arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, - &cmd); + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);
for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; @@ -2303,19 +2279,6 @@ static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, return ret; }
-static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, - unsigned long iova, size_t size) -{ - return __arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, - size); -} - -int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size) -{ - return __arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size); -} - /* IO_PGTABLE API */ static void arm_smmu_tlb_inv_context(void *cookie) { @@ -2506,7 +2469,6 @@ struct arm_smmu_domain *arm_smmu_domain_alloc(void) mutex_init(&smmu_domain->init_mutex); INIT_LIST_HEAD(&smmu_domain->devices); spin_lock_init(&smmu_domain->devices_lock); - INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
return smmu_domain; } @@ -2549,7 +2511,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { /* Prevent SVA from touching the CD while we're freeing it */ mutex_lock(&arm_smmu_asid_lock); - arm_smmu_free_asid(&smmu_domain->cd); + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); mutex_unlock(&arm_smmu_asid_lock); } else { struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; @@ -2567,11 +2529,9 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, u32 asid; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
- refcount_set(&cd->refs, 1); - /* Prevent SVA from modifying the ASID until it is written to the CD */ mutex_lock(&arm_smmu_asid_lock); - ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd, + ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); cd->asid = (u16)asid; mutex_unlock(&arm_smmu_asid_lock); @@ -3030,7 +2990,11 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_cd *cdptr; int ret;
- if (!arm_smmu_is_s1_domain(iommu_get_domain_for_dev(master->dev))) + if (smmu_domain->smmu != master->smmu) + return -EINVAL; + + if (!arm_smmu_is_s1_domain(iommu_get_domain_for_dev(master->dev)) || + !master->cd_table.used_sid) return -ENODEV;
cdptr = arm_smmu_get_cd_ptr(master, pasid); @@ -3051,9 +3015,18 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, return ret; }
-void arm_smmu_remove_pasid(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain, ioasid_t pasid) +static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *smmu_domain; + struct iommu_domain *domain; + + domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); + if (WARN_ON(IS_ERR(domain)) || !domain) + return; + + smmu_domain = to_smmu_domain(domain); + mutex_lock(&arm_smmu_asid_lock); arm_smmu_clear_cd(master, pasid); if (master->ats_enabled) @@ -3346,7 +3319,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
master->dev = dev; master->smmu = smmu; - INIT_LIST_HEAD(&master->bonds); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); @@ -3735,17 +3707,6 @@ static int arm_smmu_def_domain_type(struct device *dev) return ret; }
-static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) -{ - struct iommu_domain *domain; - - domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); - if (WARN_ON(IS_ERR(domain)) || !domain) - return; - - arm_smmu_sva_remove_dev_pasid(domain, dev, pasid); -} - static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 277cc0de91b6..1d645d17e867 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -634,9 +634,6 @@ struct arm_smmu_strtab_l1_desc {
struct arm_smmu_ctx_desc { u16 asid; - - refcount_t refs; - struct mm_struct *mm; };
struct arm_smmu_l1_ctx_desc { @@ -778,7 +775,6 @@ struct arm_smmu_master { bool stall_enabled; bool sva_enabled; bool iopf_enabled; - struct list_head bonds; unsigned int ssid_bits; };
@@ -807,7 +803,8 @@ struct arm_smmu_domain { struct list_head devices; spinlock_t devices_lock;
- struct list_head mmu_notifiers; + struct mmu_notifier mmu_notifier; + bool btm_invalidation; };
struct arm_smmu_master_domain { @@ -846,9 +843,8 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain); -bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd); -int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size); +int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size);
#ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); @@ -860,8 +856,6 @@ bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); void arm_smmu_sva_notifier_synchronize(void); struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct mm_struct *mm); -void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */ static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) {
From: Jason Gunthorpe <jgg@nvidia.com>
The SMMUv3 IOTLB is tagged with a VMID/ASID cache tag. Any time the underlying translation is changed, these entries need to be invalidated. At boot time the IOTLB starts out empty and all cache tags are available for allocation.
When a tag is taken out of the allocator the code assumes the IOTLB does not reference it, and immediately programs it into a STE/CD. If the cache is still referencing the tag then it will hold stale data and the IOMMU will become incoherent.
Thus, whenever an ASID/VMID is freed back to the allocator we need to know that the IOTLB doesn't have any references to it. The SVA code correctly had an invalidation here, but the paging code did not.
Consolidate freeing the VMID/ASID to one place and consistently flush both ID types before returning to their allocators.
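In sketch form, the rule that the new arm_smmu_domain_free_id() (below) enforces: an ID goes back to its allocator only once the IOTLB can no longer hold entries tagged with it.

	/*
	 *   S1/SVA ASID:  arm_smmu_tlb_inv_asid(smmu, asid);
	 *                 xa_erase(&arm_smmu_asid_xa, asid);
	 *
	 *   S2 VMID:      CMDQ_OP_TLBI_S12_VMALL + sync;
	 *                 ida_free(&smmu->vmid_map, vmid);
	 */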
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  9 ++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 34 ++++++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 3 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 1925403c35b5..d942e25cf5a9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -374,18 +374,13 @@ static void arm_smmu_sva_domain_free(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
- /* - * Ensure the ASID is empty in the iommu cache before allowing reuse. - */ - arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); - /* * Notice that the arm_smmu_mm_arch_invalidate_secondary_tlbs op can * still be called/running at this point. We allow the ASID to be * reused, and if there is a race then it just suffers harmless * unnecessary invalidation. */ - xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); + arm_smmu_domain_free_id(smmu_domain);
/* * Actual free is defered to the SRCU callback @@ -430,7 +425,7 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, return &smmu_domain->domain;
err_asid: - xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); + arm_smmu_domain_free_id(smmu_domain); err_free: kfree(smmu_domain); return ERR_PTR(ret); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 94e379242979..f1cbb6ba3eff 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2500,25 +2500,41 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) return &smmu_domain->domain; }
-static void arm_smmu_domain_free(struct iommu_domain *domain) +/* + * Return the domain's ASID or VMID back to the allocator. All IDs in the + * allocator do not have an IOTLB entries referencing them. + */ +void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu;
- free_io_pgtable_ops(smmu_domain->pgtbl_ops); + if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1 || + smmu_domain->domain.type == IOMMU_DOMAIN_SVA) && + smmu_domain->cd.asid) { + arm_smmu_tlb_inv_asid(smmu, smmu_domain->cd.asid);
- /* Free the ASID or VMID */ - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { /* Prevent SVA from touching the CD while we're freeing it */ mutex_lock(&arm_smmu_asid_lock); xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); mutex_unlock(&arm_smmu_asid_lock); - } else { - struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; - if (cfg->vmid) - ida_free(&smmu->vmid_map, cfg->vmid); + } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 && + smmu_domain->s2_cfg.vmid) { + struct arm_smmu_cmdq_ent cmd = { + .opcode = CMDQ_OP_TLBI_S12_VMALL, + .tlbi.vmid = smmu_domain->s2_cfg.vmid + }; + + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + ida_free(&smmu->vmid_map, smmu_domain->s2_cfg.vmid); } +}
+static void arm_smmu_domain_free(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + free_io_pgtable_ops(smmu_domain->pgtbl_ops); + arm_smmu_domain_free_id(smmu_domain); kfree(smmu_domain); }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 1d645d17e867..dbfc63f303c3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -839,6 +839,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, void arm_smmu_remove_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid);
+void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf,
From: Jason Gunthorpe <jgg@nvidia.com>
The SVA BTM and shared CD code were the only things keeping this as a global array. Now that they are out of the way, we can move it to be per-smmu.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 +++++++++----------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 +--
 3 files changed, 22 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index d942e25cf5a9..f49ba81c25f0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -411,7 +411,7 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; smmu_domain->smmu = smmu;
- ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, + ret = xa_alloc(&smmu->asid_map, &asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); if (ret) goto err_free; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f1cbb6ba3eff..6df98d8d7aeb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -146,9 +146,6 @@ struct arm_smmu_option_prop { const char *prop; };
-DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa); -DEFINE_MUTEX(arm_smmu_asid_lock); - static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, @@ -2514,9 +2511,9 @@ void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain) arm_smmu_tlb_inv_asid(smmu, smmu_domain->cd.asid);
/* Prevent SVA from touching the CD while we're freeing it */ - mutex_lock(&arm_smmu_asid_lock); - xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); - mutex_unlock(&arm_smmu_asid_lock); + mutex_lock(&smmu->asid_lock); + xa_erase(&smmu->asid_map, smmu_domain->cd.asid); + mutex_unlock(&smmu->asid_lock); } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 && smmu_domain->s2_cfg.vmid) { struct arm_smmu_cmdq_ent cmd = { @@ -2546,11 +2543,11 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
/* Prevent SVA from modifying the ASID until it is written to the CD */ - mutex_lock(&arm_smmu_asid_lock); - ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, + mutex_lock(&smmu->asid_lock); + ret = xa_alloc(&smmu->asid_map, &asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); cd->asid = (u16)asid; - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&smmu->asid_lock); return ret; }
@@ -2845,7 +2842,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_master *master, * arm_smmu_master_domain contents otherwise it could randomly write one * or the other to the CD. */ - lockdep_assert_held(&arm_smmu_asid_lock); + lockdep_assert_held(&master->smmu->asid_lock);
state->want_ats = !state->disable_ats && arm_smmu_ats_supported(master);
@@ -2897,7 +2894,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_master *master, static void arm_smmu_attach_commit(struct arm_smmu_master *master, struct attach_state *state) { - lockdep_assert_held(&arm_smmu_asid_lock); + lockdep_assert_held(&master->smmu->asid_lock);
if (state->want_ats && !master->ats_enabled) { arm_smmu_enable_ats(master); @@ -2959,11 +2956,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) * This allows the STE and the smmu_domain->devices list to * be inconsistent during this routine. */ - mutex_lock(&arm_smmu_asid_lock); + mutex_lock(&smmu->asid_lock);
ret = arm_smmu_attach_prepare(master, domain, &state); if (ret) { - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&smmu->asid_lock); return ret; }
@@ -2987,7 +2984,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
arm_smmu_attach_commit(master, &state); - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&smmu->asid_lock); return 0; }
@@ -3017,7 +3014,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, if (!cdptr) return -ENOMEM;
- mutex_lock(&arm_smmu_asid_lock); + mutex_lock(&master->smmu->asid_lock); ret = arm_smmu_attach_prepare(master, &smmu_domain->domain, &state); if (ret) goto out_unlock; @@ -3027,7 +3024,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, arm_smmu_attach_commit(master, &state);
out_unlock: - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&master->smmu->asid_lock); return ret; }
@@ -3043,12 +3040,12 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
smmu_domain = to_smmu_domain(domain);
- mutex_lock(&arm_smmu_asid_lock); + mutex_lock(&master->smmu->asid_lock); arm_smmu_clear_cd(master, pasid); if (master->ats_enabled) arm_smmu_atc_inv_master(master, pasid); arm_smmu_remove_master_domain(master, &smmu_domain->domain, pasid); - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&master->smmu->asid_lock); }
static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, @@ -3064,7 +3061,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, * Do not allow any ASID to be changed while are working on the STE, * otherwise we could miss invalidations. */ - mutex_lock(&arm_smmu_asid_lock); + mutex_lock(&master->smmu->asid_lock);
/* * The SMMU does not support enabling ATS with bypass/abort. When the @@ -3077,7 +3074,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, arm_smmu_attach_prepare(master, domain, &state); arm_smmu_install_ste_for_dev(master, ste); arm_smmu_attach_commit(master, &state); - mutex_unlock(&arm_smmu_asid_lock); + mutex_unlock(&master->smmu->asid_lock);
/* * This has to be done after removing the master from the @@ -4033,6 +4030,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu) smmu->strtab_cfg.strtab_base = reg;
ida_init(&smmu->vmid_map); + xa_init_flags(&smmu->asid_map, XA_FLAGS_ALLOC1); + mutex_init(&smmu->asid_lock);
return 0; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index dbfc63f303c3..3992047fcb71 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -738,6 +738,8 @@ struct arm_smmu_device {
#define ARM_SMMU_MAX_ASIDS (1 << 16) unsigned int asid_bits; + struct xarray asid_map; + struct mutex asid_lock;
#define ARM_SMMU_MAX_VMIDS (1 << 16) unsigned int vmid_bits; @@ -818,9 +820,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) return container_of(dom, struct arm_smmu_domain, domain); }
-extern struct xarray arm_smmu_asid_xa; -extern struct mutex arm_smmu_asid_lock; - struct arm_smmu_domain *arm_smmu_domain_alloc(void);
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid);
From: Jason Gunthorpe jgg@nvidia.com
BTM is a feature where CPU TLB invalidations can be forwarded to the IOMMU so that they also invalidate the IOTLB. For this to work the CPU and IOMMU ASIDs must be the same.

Retain the prior SVA design here: keep the IOMMU's ASID allocator private to the SMMU and force SVA domains to use an ASID that matches the CPU ASID.

This requires changing the ASID assigned to an S1 domain if it happens to overlap with the required CPU ASID. We hold on to the CPU ASID for as long as the SVA iommu_domain exists, so a conflict between SVA domains is not possible.

With the ASID allocator now per-SMMU we no longer have the problem that two iommu_domains on different SMMU instances would need to share one CPU ASID entry in a global xarray.

Use the same ASID move algorithm for S1 domains as before, with some streamlining around how the xarray is used. Do not synchronize the ASIDs if BTM is not supported; just leave the BTM features off everywhere.

Audit all the places that touch cd->asid and think carefully about how the locking interacts with the move algorithm changing cd->asid. Use the xarray's internal locking during xa_alloc() instead of double locking. Add a note that concurrent S1 invalidation doesn't fully work.

Note that this is all still dead code; ARM_SMMU_FEAT_BTM is never set.
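To make the flow concrete, here is a minimal sketch of the ASID-sharing decision, condensed from arm_smmu_share_asid() in the diff below. Error unwinding, the SVA-conflict WARN, and the btm_invalidation bookkeeping are omitted, so read it as an outline of the logic rather than the actual function:

static int share_asid_sketch(struct arm_smmu_device *smmu,
			     struct arm_smmu_domain *smmu_domain,
			     struct mm_struct *mm)
{
	struct arm_smmu_domain *old_s1_domain;

	if (!(smmu->features & ARM_SMMU_FEAT_BTM)) {
		/* No BTM: any free IOMMU ASID will do */
		return xa_alloc(&smmu->asid_map, &smmu_domain->cd.asid,
				smmu_domain,
				XA_LIMIT(1, (1 << smmu->asid_bits) - 1),
				GFP_KERNEL);
	}

	/* BTM: the IOMMU ASID must equal the CPU ASID */
	smmu_domain->cd.asid = arm64_mm_context_get(mm);

	mutex_lock(&smmu->asid_lock);
	old_s1_domain = xa_store(&smmu->asid_map, smmu_domain->cd.asid,
				 smmu_domain, GFP_KERNEL);
	if (old_s1_domain) {
		/* An S1 domain already owns this ASID: move it away */
		arm_smmu_realloc_s1_domain_asid(smmu, old_s1_domain);
	}
	mutex_unlock(&smmu->asid_lock);
	return 0;
}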
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 133 ++++++++++++++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- 3 files changed, 129 insertions(+), 21 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index f49ba81c25f0..e35ae178a7b1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -14,12 +14,33 @@
static DEFINE_MUTEX(sva_lock);
-static void __maybe_unused -arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) +static int arm_smmu_realloc_s1_domain_asid(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain) { struct arm_smmu_master_domain *master_domain; + u32 old_asid = smmu_domain->cd.asid; struct arm_smmu_cd target_cd; unsigned long flags; + int ret; + + lockdep_assert_held(&smmu->asid_lock); + + /* + * FIXME: The unmap and invalidation path doesn't take any locks but + * this is not fully safe. Since updating the CD tables is not atomic + * there is always a hole where invalidating only one ASID of two active + * ASIDs during unmap will cause the IOTLB to become stale. + * + * This approach is to hopefully shift the racing CPUs to the new ASID + * before we start programming the HW. This increases the chance that + * racing IOPTE changes will pick up an invalidation for the new ASID + * and we achieve eventual consistency. For the brief period where the + * old ASID is still in the CD entries it will become incoherent. + */ + ret = xa_alloc(&smmu->asid_map, &smmu_domain->cd.asid, smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); + if (ret) + return ret;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { @@ -35,6 +56,10 @@ arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) &target_cd); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + /* Clean the ASID we are about to assign to a new translation */ + arm_smmu_tlb_inv_asid(smmu, old_asid); + return 0; }
static u64 page_size_to_cd(void) @@ -152,12 +177,12 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, }
if (!smmu_domain->btm_invalidation) { + ioasid_t asid = READ_ONCE(smmu_domain->cd.asid); + if (!size) - arm_smmu_tlb_inv_asid(smmu_domain->smmu, - smmu_domain->cd.asid); + arm_smmu_tlb_inv_asid(smmu_domain->smmu, asid); else - arm_smmu_tlb_inv_range_asid(start, size, - smmu_domain->cd.asid, + arm_smmu_tlb_inv_range_asid(start, size, asid, PAGE_SIZE, false, smmu_domain); } @@ -186,6 +211,8 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) cdptr = arm_smmu_get_cd_ptr(master, master_domain->ssid); if (WARN_ON(!cdptr)) continue; + + /* An SVA ASID never changes, no asid_lock required */ arm_smmu_make_sva_cd(&target, master, NULL, smmu_domain->cd.asid, smmu_domain->btm_invalidation); @@ -381,6 +408,8 @@ static void arm_smmu_sva_domain_free(struct iommu_domain *domain) * unnecessary invalidation. */ arm_smmu_domain_free_id(smmu_domain); + if (smmu_domain->btm_invalidation) + arm64_mm_context_put(domain->mm);
/* * Actual free is defered to the SRCU callback @@ -394,13 +423,97 @@ static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { .free = arm_smmu_sva_domain_free };
+static int arm_smmu_share_asid(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, + struct mm_struct *mm) +{ + struct arm_smmu_domain *old_s1_domain; + int ret; + + /* + * Notice that BTM is never currently enabled, this is all dead code. + * The specification cautions: + * + * Note: Arm expects that SMMU stage 2 address spaces are generally + * shared with their respective PE virtual machine stage 2 + * configuration. If broadcast invalidation is required to be avoided + * for a particular SMMU stage 2 address space, Arm recommends that a + * hypervisor configures the STE with a VMID that is not allocated for + * virtual machine use on the PEs + * + * However, in Linux, both KVM and SMMU think they own the VMID pool. + * Unfortunately the ARM design is problematic for Linux as we do not + * currently share the S2 table with KVM. This creates a situation where + * the S2 needs to have the same VMID as KVM in order to allow the guest + * to use BTM, however we must still invalidate the S2 directly since it + * is a different radix tree. What Linux would like is something like + * ASET for the STE to disable BTM only for the S2. + * + * Arguably in a system with BTM the driver should prefer to use a S1 + * table in all cases execpt when explicitly asked to create a nesting + * parent. Then it should use the VMID of KVM to enable BTM in the + * guest. We cannot optimize away the resulting double invalidation of + * the S2 :( Or we simply ignore BTM entirely as we are doing now. + */ + if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) + return xa_alloc(&smmu->asid_map, &smmu_domain->cd.asid, + smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), + GFP_KERNEL); + + /* At this point the caller ensures we have a mmget() */ + smmu_domain->cd.asid = arm64_mm_context_get(mm); + + mutex_lock(&smmu->asid_lock); + old_s1_domain = xa_store(&smmu->asid_map, smmu_domain->cd.asid, + smmu_domain, GFP_KERNEL); + if (xa_err(old_s1_domain)) { + ret = xa_err(old_s1_domain); + goto out_put_asid; + } + + /* + * In BTM mode the CPU ASID and the IOMMU ASID have to be the same. + * Unfortunately we run separate allocators for this and the IOMMU + * ASID can already have been assigned to a S1 domain. SVA domains + * always align to their CPU ASIDs. In this case we change + * the S1 domain's ASID, update the CD entry and flush the caches. + * + * This is a bit tricky, all the places writing to a S1 CD, reading the + * S1 ASID, or doing xa_erase must hold the asid_lock or xa_lock to + * avoid IOTLB incoherence. + */ + if (old_s1_domain) { + if (WARN_ON(old_s1_domain->domain.type == IOMMU_DOMAIN_SVA)) { + ret = -EINVAL; + goto out_restore_s1; + } + ret = arm_smmu_realloc_s1_domain_asid(smmu, old_s1_domain); + if (ret) + goto out_restore_s1; + } + + smmu_domain->btm_invalidation = true; + + ret = 0; + goto out_unlock; + +out_restore_s1: + xa_store(&smmu->asid_map, smmu_domain->cd.asid, old_s1_domain, + GFP_KERNEL); +out_put_asid: + arm64_mm_context_put(mm); +out_unlock: + mutex_unlock(&smmu->asid_lock); + return ret; +} + struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct mm_struct *mm) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_domain *smmu_domain; - u32 asid; int ret;
smmu_domain = arm_smmu_domain_alloc(); @@ -411,12 +524,10 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; smmu_domain->smmu = smmu;
- ret = xa_alloc(&smmu->asid_map, &asid, smmu_domain, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); + ret = arm_smmu_share_asid(smmu, smmu_domain, mm); if (ret) goto err_free;
- smmu_domain->cd.asid = asid; smmu_domain->mmu_notifier.ops = &arm_smmu_mmu_notifier_ops; ret = mmu_notifier_register(&smmu_domain->mmu_notifier, mm); if (ret) @@ -426,6 +537,8 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev,
err_asid: arm_smmu_domain_free_id(smmu_domain); + if (smmu_domain->btm_invalidation) + arm64_mm_context_put(mm); err_free: kfree(smmu_domain); return ERR_PTR(ret); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6df98d8d7aeb..fdf5ad269769 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1528,6 +1528,8 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
+ lockdep_assert_held(&master->smmu->asid_lock); + memset(target, 0, sizeof(*target));
target->data[0] = cpu_to_le64( @@ -2291,7 +2293,7 @@ static void arm_smmu_tlb_inv_context(void *cookie) * careful, 007. */ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { - arm_smmu_tlb_inv_asid(smmu, smmu_domain->cd.asid); + arm_smmu_tlb_inv_asid(smmu, READ_ONCE(smmu_domain->cd.asid)); } else { cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; @@ -2538,17 +2540,10 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain) { - int ret; - u32 asid; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
- /* Prevent SVA from modifying the ASID until it is written to the CD */ - mutex_lock(&smmu->asid_lock); - ret = xa_alloc(&smmu->asid_map, &asid, smmu_domain, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); - cd->asid = (u16)asid; - mutex_unlock(&smmu->asid_lock); - return ret; + return xa_alloc(&smmu->asid_map, &cd->asid, smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); }
static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3992047fcb71..1cd4161c329e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -633,7 +633,7 @@ struct arm_smmu_strtab_l1_desc { };
struct arm_smmu_ctx_desc { - u16 asid; + u32 asid; };
struct arm_smmu_l1_ctx_desc {
From: Jason Gunthorpe jgg@nvidia.com
The HW supports this; use the S1DSS bits to configure the behavior of SSID=0, which is the RID's translation.

If SSIDs are currently in use in the CD table then just update the S1DSS bits in the STE, remove the master_domain, and leave ATS alone.

For iommufd the driver design has a small problem: all the unused CD table entries are set with V=0, which will generate an event if VFIO userspace tries to use the CD entry. This patch extends that problem to the RID as well if PASID is in use.

For BLOCKED with PASIDs in use, the F_STREAM_DISABLED (STRTAB_STE_1_S1DSS_TERMINATE) event is generated on untagged traffic, and a substream CD table entry with V=0 (a removed PASID) will generate C_BAD_CD. Arguably there is no advantage to using S1DSS over CD entry 0 with V=0.
As we don't yet support PASID in iommufd this is a problem to resolve later, possibly by using EPD0 for unused CD table entries instead of V=0, and not using S1DSS for BLOCKED.
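As a rough illustration of the S1DSS selection, the RID domain type maps onto the field like this (sid_domain_type here is a stand-in for the type of the domain attached to the RID; the real logic is spread across arm_smmu_attach_dev_identity/blocked() in the diff below):

	unsigned int s1dss;

	switch (sid_domain_type) {
	case IOMMU_DOMAIN_IDENTITY:
		s1dss = STRTAB_STE_1_S1DSS_BYPASS;    /* SSID=0 bypasses S1 */
		break;
	case IOMMU_DOMAIN_BLOCKED:
		s1dss = STRTAB_STE_1_S1DSS_TERMINATE; /* SSID=0 traffic aborts */
		break;
	default:
		s1dss = STRTAB_STE_1_S1DSS_SSID0;     /* SSID=0 uses CD entry 0 */
		break;
	}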
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++------ 1 file changed, 47 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index fdf5ad269769..1f4688f77a3e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1220,6 +1220,14 @@ static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | STRTAB_STE_1_EATS); used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + + /* + * See 13.5 Summary of attribute/permission configuration fields + * for the SHCFG behavior. + */ + if (FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) == + STRTAB_STE_1_S1DSS_BYPASS) + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); }
/* S2 translates */ @@ -1742,7 +1750,7 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, - bool ats_enabled) + bool ats_enabled, unsigned int s1dss) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1756,7 +1764,7 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
target->data[1] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | + FIELD_PREP(STRTAB_STE_1_S1DSS, s1dss) | FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | @@ -1767,6 +1775,10 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_1_EATS, ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+ if (s1dss == STRTAB_STE_1_S1DSS_BYPASS) + target->data[1] |= cpu_to_le64(FIELD_PREP( + STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); + if (smmu->features & ARM_SMMU_FEAT_E2H) { /* * To support BTM the streamworld needs to match the @@ -2966,7 +2978,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, &target_cd); - arm_smmu_make_cdtable_ste(&target, master, state.want_ats); + arm_smmu_make_cdtable_ste(&target, master, state.want_ats, + STRTAB_STE_1_S1DSS_SSID0); arm_smmu_install_ste_for_dev(master, &target); break; } @@ -3043,15 +3056,14 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) mutex_unlock(&master->smmu->asid_lock); }
-static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, - struct device *dev, struct arm_smmu_ste *ste) +static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, + struct device *dev, + struct arm_smmu_ste *ste, + unsigned int s1dss) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct attach_state state = {.ssid = IOMMU_NO_PASID};
- if (arm_smmu_ssids_in_use(&master->cd_table)) - return -EBUSY; - /* * Do not allow any ASID to be changed while are working on the STE, * otherwise we could miss invalidations. @@ -3059,14 +3071,29 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, mutex_lock(&master->smmu->asid_lock);
/* - * The SMMU does not support enabling ATS with bypass/abort. When the - * STE is in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests - * and Translated transactions are denied as though ATS is disabled for - * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and - * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). + * If the CD table is not in use we can use the provided STE, otherwise + * we use a cdtable STE with the provided S1DSS. */ - state.disable_ats = true; - arm_smmu_attach_prepare(master, domain, &state); + if (arm_smmu_ssids_in_use(&master->cd_table)) { + /* + * If a CD table has to be present then we need to run with ATS + * on even though the RID will fail ATS queries with UR. This is + * because we have no idea what the PASID's need. + */ + arm_smmu_attach_prepare(master, domain, &state); + arm_smmu_make_cdtable_ste(ste, master, state.want_ats, s1dss); + } else { + /* + * The SMMU does not support enabling ATS with bypass/abort. + * When the STE is in bypass (STE.Config[2:0] == 0b100), ATS + * Translation Requests and Translated transactions are denied + * as though ATS is disabled for the stream (STE.EATS == 0b00), + * causing F_BAD_ATS_TREQ and F_TRANSL_FORBIDDEN events + * (IHI0070Ea 5.2 Stream Table Entry). + */ + state.disable_ats = true; + arm_smmu_attach_prepare(master, domain, &state); + } arm_smmu_install_ste_for_dev(master, ste); arm_smmu_attach_commit(master, &state); mutex_unlock(&master->smmu->asid_lock); @@ -3077,7 +3104,6 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, * descriptor from arm_smmu_share_asid(). */ arm_smmu_clear_cd(master, IOMMU_NO_PASID); - return 0; }
static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, @@ -3086,7 +3112,8 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_bypass_ste(&ste); - return arm_smmu_attach_dev_ste(domain, dev, &ste); + arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS); + return 0; }
static const struct iommu_domain_ops arm_smmu_identity_ops = { @@ -3104,7 +3131,9 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_abort_ste(&ste); - return arm_smmu_attach_dev_ste(domain, dev, &ste); + arm_smmu_attach_dev_ste(domain, dev, &ste, + STRTAB_STE_1_S1DSS_TERMINATE); + return 0; }
static const struct iommu_domain_ops arm_smmu_blocked_ops = {
From: Jason Gunthorpe jgg@nvidia.com
If the STE doesn't point to the CD table we can upgrade it by reprogramming the STE with the appropriate S1DSS. We may also need to turn on ATS at the same time.
Keep track of whether the installed STE points at the cd_table, and of its ATS state, so that this path can be triggered.
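The upgrade trigger then reduces to a couple of comparisons; this fragment is lifted, slightly simplified, from arm_smmu_update_ste() in the diff below:

	/* Already a cdtable STE with the right EATS setting: nothing to do */
	if (master->cd_table.in_ste && master->ste_ats_enabled == want_ats)
		return;

	/* Otherwise rebuild the STE as a cdtable one; hitless per the
	 * STE programming rules established earlier in the series. */
	arm_smmu_make_cdtable_ste(&ste, master, want_ats, s1dss);
	arm_smmu_install_ste_for_dev(master, &ste);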
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 ++++++++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++- 2 files changed, 52 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 1f4688f77a3e..65f02d6a1091 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2670,6 +2670,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, int i, j; struct arm_smmu_device *smmu = master->smmu;
+ master->cd_table.in_ste = + FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(target->data[0])) == + STRTAB_STE_0_CFG_S1_TRANS; + master->ste_ats_enabled = + FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(target->data[1])) == + STRTAB_STE_1_EATS_TRANS; + for (i = 0; i < master->num_streams; ++i) { u32 sid = master->streams[i].id; struct arm_smmu_ste *step = @@ -2996,27 +3003,47 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
-static bool arm_smmu_is_s1_domain(struct iommu_domain *domain) +static void arm_smmu_update_ste(struct arm_smmu_master *master, + struct iommu_domain *sid_domain, + bool want_ats) { - if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) - return false; - return to_smmu_domain(domain)->stage == ARM_SMMU_DOMAIN_S1; + unsigned int s1dss = STRTAB_STE_1_S1DSS_TERMINATE; + struct arm_smmu_ste ste; + + if (master->cd_table.in_ste && master->ste_ats_enabled == want_ats) + return; + + if (sid_domain->type == IOMMU_DOMAIN_IDENTITY) + s1dss = STRTAB_STE_1_S1DSS_BYPASS; + else + WARN_ON(sid_domain->type != IOMMU_DOMAIN_BLOCKED); + + /* + * Change the STE into a cdtable one with SID IDENTITY/BLOCKED behavior + * using s1dss if necessary. The cd_table is already installed then + * the S1DSS is correct and this will just update the EATS. Otherwise + * it installs the entire thing. This will be hitless. + */ + arm_smmu_make_cdtable_ste(&ste, master, want_ats, s1dss); + arm_smmu_install_ste_for_dev(master, &ste); }
int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, const struct arm_smmu_cd *cd) { + struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); struct attach_state state = {.ssid = pasid}; struct arm_smmu_cd *cdptr; int ret;
- if (smmu_domain->smmu != master->smmu) + if (smmu_domain->smmu != master->smmu || pasid == IOMMU_NO_PASID) return -EINVAL;
- if (!arm_smmu_is_s1_domain(iommu_get_domain_for_dev(master->dev)) || - !master->cd_table.used_sid) - return -ENODEV; + if (!master->cd_table.in_ste && + sid_domain->type != IOMMU_DOMAIN_IDENTITY && + sid_domain->type != IOMMU_DOMAIN_BLOCKED) + return -EINVAL;
cdptr = arm_smmu_get_cd_ptr(master, pasid); if (!cdptr) @@ -3028,6 +3055,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, goto out_unlock;
arm_smmu_write_cd_entry(master, pasid, cdptr, cd); + arm_smmu_update_ste(master, sid_domain, state.want_ats);
arm_smmu_attach_commit(master, &state);
@@ -3041,6 +3069,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_domain *smmu_domain; struct iommu_domain *domain; + bool last_ssid = master->cd_table.used_ssids == 1;
domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); if (WARN_ON(IS_ERR(domain)) || !domain) @@ -3054,6 +3083,17 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) arm_smmu_atc_inv_master(master, pasid); arm_smmu_remove_master_domain(master, &smmu_domain->domain, pasid); mutex_unlock(&master->smmu->asid_lock); + + /* + * When the last user of the CD table goes away downgrade the STE back + * to a non-cd_table one. + */ + if (last_ssid && !master->cd_table.used_sid) { + struct iommu_domain *sid_domain = + iommu_get_domain_for_dev(master->dev); + + sid_domain->ops->attach_dev(sid_domain, master->dev); + } }
static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 1cd4161c329e..ed5f241c4613 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -647,7 +647,8 @@ struct arm_smmu_ctx_desc_cfg { struct arm_smmu_l1_ctx_desc *l1_desc; unsigned int num_l1_ents; unsigned int used_ssids; - bool used_sid; + u8 used_sid; + u8 in_ste; u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; @@ -773,7 +774,8 @@ struct arm_smmu_master { /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; unsigned int num_streams; - bool ats_enabled; + bool ats_enabled : 1; + bool ste_ats_enabled : 1; bool stall_enabled; bool sva_enabled; bool iopf_enabled;
From: Jason Gunthorpe jgg@nvidia.com
The SVA cleanup made the SSID logic entirely general, so all we need to do is call it with the correct CD table entry for an S1 domain.

This is slightly tricky because of the ASID and how the locking works; the simple fix is to just update the ASID once we hold the right locks.
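The fixup itself is two lines, excerpted from arm_smmu_set_pasid() in the diff below: the CD image is built before asid_lock is taken, so its ASID field is rewritten once the lock guarantees cd.asid can no longer change underneath us:

	cd->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_ASID);
	cd->data[0] |= cpu_to_le64(
		FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->cd.asid));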
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 45 +++++++++++++++++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- 2 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 65f02d6a1091..1b32f5cbf90c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1536,8 +1536,6 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
- lockdep_assert_held(&master->smmu->asid_lock); - memset(target, 0, sizeof(*target));
target->data[0] = cpu_to_le64( @@ -3003,6 +3001,36 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
+static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_cd target_cd; + int ret = 0; + + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) + ret = arm_smmu_domain_finalise(smmu_domain, smmu); + else if (smmu_domain->smmu != smmu) + ret = -EINVAL; + mutex_unlock(&smmu_domain->init_mutex); + if (ret) + return ret; + + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -EINVAL; + + /* + * We can read cd.asid outside the lock because arm_smmu_set_pasid() + * will fix it + */ + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + return arm_smmu_set_pasid(master, to_smmu_domain(domain), id, + &target_cd); +} + static void arm_smmu_update_ste(struct arm_smmu_master *master, struct iommu_domain *sid_domain, bool want_ats) @@ -3030,7 +3058,7 @@ static void arm_smmu_update_ste(struct arm_smmu_master *master,
int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, - const struct arm_smmu_cd *cd) + struct arm_smmu_cd *cd) { struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); struct attach_state state = {.ssid = pasid}; @@ -3054,6 +3082,14 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, if (ret) goto out_unlock;
+ /* + * We don't want to obtain to the asid_lock too early, so fix up the + * caller set ASID under the lock in case it changed. + */ + cd->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_ASID); + cd->data[0] |= cpu_to_le64( + FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->cd.asid)); + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); arm_smmu_update_ste(master, sid_domain, state.want_ats);
@@ -3071,7 +3107,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) struct iommu_domain *domain; bool last_ssid = master->cd_table.used_ssids == 1;
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); + domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); if (WARN_ON(IS_ERR(domain)) || !domain) return;
@@ -3804,6 +3840,7 @@ static struct iommu_ops arm_smmu_ops = { .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = arm_smmu_attach_dev, + .set_dev_pasid = arm_smmu_s1_set_dev_pasid, .map_pages = arm_smmu_map_pages, .unmap_pages = arm_smmu_unmap_pages, .flush_iotlb_all = arm_smmu_flush_iotlb_all, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index ed5f241c4613..4481ba7f7ab7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -836,7 +836,7 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, - const struct arm_smmu_cd *cd); + struct arm_smmu_cd *cd); void arm_smmu_remove_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid);
From: Jason Gunthorpe jgg@nvidia.com
The strtab pointer in struct arm_smmu_strtab_cfg is being used as both an array of STEs and an array of L1 descriptors.
Give the two usages different names and correct types.
Remove STRTAB_STE_DWORDS as most usages were indexing or sizing an array of struct arm_smmu_ste.
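The resulting shape, excerpted from the header diff below, lets both users index with ordinary typed pointer arithmetic instead of manual dword scaling:

	union {
		struct arm_smmu_ste *linear; /* linear stream table */
		__le64 *l1_desc;             /* 2-level: L1 descriptor array */
	} strtab;

	/* e.g. from the .c diff below: */
	step = &cfg->strtab.linear[sid];
	desc = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];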
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 21 +++++++++------------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++++---- 2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 1b32f5cbf90c..a0e306c2dc9d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1873,8 +1873,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) if (desc->l2ptr) return 0;
- size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3); - strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS]; + size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste); + strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
desc->span = STRTAB_SPLIT + 1; desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma, @@ -2657,8 +2657,7 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) return &cfg->l1_desc[idx1].l2ptr[idx2]; } else { /* Simple linear lookup */ - return (struct arm_smmu_ste *)&cfg - ->strtab[sid * STRTAB_STE_DWORDS]; + return &cfg->strtab.linear[sid]; } }
@@ -3981,17 +3980,15 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu) { unsigned int i; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - void *strtab = smmu->strtab_cfg.strtab;
cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents, sizeof(*cfg->l1_desc), GFP_KERNEL); if (!cfg->l1_desc) return -ENOMEM;
- for (i = 0; i < cfg->num_l1_ents; ++i) { - arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]); - strtab += STRTAB_L1_DESC_DWORDS << 3; - } + for (i = 0; i < cfg->num_l1_ents; ++i) + arm_smmu_write_strtab_l1_desc( + &smmu->strtab_cfg.strtab.l1_desc[i], &cfg->l1_desc[i]);
return 0; } @@ -4067,7 +4064,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) l1size); return -ENOMEM; } - cfg->strtab = strtab; + cfg->strtab.l1_desc = strtab;
/* Configure strtab_base_cfg for 2 levels */ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL); @@ -4091,7 +4088,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) u32 size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3); + size = (1 << smmu->sid_bits) * sizeof(cfg->strtab.linear[0]); strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma, GFP_KERNEL); if (!strtab) { @@ -4100,7 +4097,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) size); return -ENOMEM; } - cfg->strtab = strtab; + cfg->strtab.linear = strtab; cfg->num_l1_ents = 1 << smmu->sid_bits;
/* Configure strtab_base_cfg for a linear table covering all SIDs */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 4481ba7f7ab7..0fd5d74f1fdd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -235,10 +235,8 @@ #define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0) #define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
-#define STRTAB_STE_DWORDS 8 - struct arm_smmu_ste { - __le64 data[STRTAB_STE_DWORDS]; + __le64 data[8]; };
#define STRTAB_STE_0_V (1UL << 0) @@ -667,7 +665,10 @@ struct arm_smmu_s2_cfg { };
struct arm_smmu_strtab_cfg { - __le64 *strtab; + union { + struct arm_smmu_ste *linear; + __le64 *l1_desc; + } strtab; dma_addr_t strtab_dma; struct arm_smmu_strtab_l1_desc *l1_desc; unsigned int num_l1_ents;
From: Jason Gunthorpe jgg@nvidia.com
The cdtab pointer in struct arm_smmu_ctx_desc_cfg is being used as both an array of CDs and an array of L1 descriptors.
Give the two usages different names and correct types.
Remove CTXDESC_CD_DWORDS as most usages were indexing or sizing an array of struct arm_smmu_cd.
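The same pattern as the strtab split, excerpted from the header diff below; CD lookups become typed indexing:

	union {
		struct arm_smmu_cd *linear;  /* linear CD table */
		__le64 *l1_desc;             /* 2-level: L1 descriptor array */
	} cdtab;

	/* e.g. from the .c diff below: */
	cdptr = cd_table->cdtab.linear + ssid;
	l1ptr = cd_table->cdtab.l1_desc + (ssid >> CTXDESC_SPLIT);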
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 58 ++++++++++----------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 17 ++++-- 2 files changed, 40 insertions(+), 35 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a0e306c2dc9d..0db7f59b9ed6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1408,7 +1408,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master, static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, struct arm_smmu_l1_ctx_desc *l1_desc) { - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l1_desc->l2ptr_dma, GFP_KERNEL); @@ -1439,14 +1439,13 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (!master->cd_table.cdtab) { + if (!arm_smmu_cdtab_allocated(&master->cd_table)) { if (arm_smmu_alloc_cd_tables(master)) return NULL; }
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) - return (struct arm_smmu_cd *)(cd_table->cdtab + - ssid * CTXDESC_CD_DWORDS); + return cd_table->cdtab.linear + ssid;
idx = ssid >> CTXDESC_SPLIT; l1_desc = &cd_table->l1_desc[idx]; @@ -1454,7 +1453,7 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) return NULL;
- l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; + l1ptr = cd_table->cdtab.l1_desc + idx; arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); /* An invalid L1CD can be cached */ arm_smmu_sync_cd(master, ssid, false); @@ -1573,7 +1572,7 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) struct arm_smmu_cd target = {}; struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab) + if (!arm_smmu_cdtab_allocated(&master->cd_table)) return; cdptr = arm_smmu_get_cd_ptr(master, ssid); if (WARN_ON(!cdptr)) @@ -1583,8 +1582,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) { - int ret; - size_t l1size; size_t max_contexts; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; @@ -1597,8 +1594,14 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR; cd_table->num_l1_ents = max_contexts;
- l1size = max_contexts * (CTXDESC_CD_DWORDS << 3); + cd_table->cdtab.linear = dmam_alloc_coherent( + smmu->dev, max_contexts * sizeof(struct arm_smmu_cd), + &cd_table->cdtab_dma, GFP_KERNEL); + if (!cd_table->cdtab.linear) + return -ENOMEM; } else { + size_t l1size; + cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2; cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts, CTXDESC_L2_ENTRIES); @@ -1610,24 +1613,15 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + cd_table->cdtab.l1_desc = dmam_alloc_coherent( + smmu->dev, l1size, &cd_table->cdtab_dma, GFP_KERNEL); + if (!cd_table->cdtab.l1_desc) { + devm_kfree(smmu->dev, cd_table->l1_desc); + cd_table->l1_desc = NULL; + return -ENOMEM; + } } - - cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cd_table->cdtab_dma, - GFP_KERNEL); - if (!cd_table->cdtab) { - dev_warn(smmu->dev, "failed to allocate context descriptor\n"); - ret = -ENOMEM; - goto err_free_l1; - } - return 0; - -err_free_l1: - if (cd_table->l1_desc) { - devm_kfree(smmu->dev, cd_table->l1_desc); - cd_table->l1_desc = NULL; - } - return ret; }
static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) @@ -1638,7 +1632,7 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->l1_desc) { - size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
for (i = 0; i < cd_table->num_l1_ents; i++) { if (!cd_table->l1_desc[i].l2ptr) @@ -1652,13 +1646,17 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) cd_table->l1_desc = NULL;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc, + cd_table->cdtab_dma); } else { - l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3); + dmam_free_coherent(smmu->dev, + cd_table->num_l1_ents * + sizeof(struct arm_smmu_cd), + cd_table->cdtab.linear, cd_table->cdtab_dma); }
- dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma); cd_table->cdtab_dma = 0; - cd_table->cdtab = NULL; + cd_table->cdtab.linear = NULL; }
/* Stream table manipulation functions */ @@ -3481,7 +3479,7 @@ static void arm_smmu_release_device(struct device *dev)
arm_smmu_disable_pasid(master); arm_smmu_remove_master(master); - if (master->cd_table.cdtab) + if (arm_smmu_cdtab_allocated(&master->cd_table)) arm_smmu_free_cd_tables(master); kfree(master); } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 0fd5d74f1fdd..518c5f07a09b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -309,10 +309,8 @@ struct arm_smmu_ste { #define CTXDESC_L1_DESC_V (1UL << 0) #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
-#define CTXDESC_CD_DWORDS 8 - struct arm_smmu_cd { - __le64 data[CTXDESC_CD_DWORDS]; + __le64 data[8]; };
#define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0) @@ -345,7 +343,7 @@ struct arm_smmu_cd { * When the SMMU only supports linear context descriptor tables, pick a * reasonable size limit (64kB). */ -#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3)) +#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / sizeof(struct arm_smmu_cd))
/* Command queue */ #define CMDQ_ENT_SZ_SHIFT 4 @@ -640,7 +638,10 @@ struct arm_smmu_l1_ctx_desc { };
struct arm_smmu_ctx_desc_cfg { - __le64 *cdtab; + union { + struct arm_smmu_cd *linear; + __le64 *l1_desc; + } cdtab; dma_addr_t cdtab_dma; struct arm_smmu_l1_ctx_desc *l1_desc; unsigned int num_l1_ents; @@ -652,6 +653,12 @@ struct arm_smmu_ctx_desc_cfg { u8 s1cdmax; };
+static inline bool +arm_smmu_cdtab_allocated(struct arm_smmu_ctx_desc_cfg *cd_table) +{ + return cd_table->cdtab.linear; +} + /* True if the cd table has SSIDS > 0 in use. */ static inline bool arm_smmu_ssids_in_use(struct arm_smmu_ctx_desc_cfg *cd_table) {
From: Jason Gunthorpe jgg@nvidia.com
The master->cd_table is entirely contained within struct arm_smmu_master, which is guaranteed to be freed by the core code via arm_smmu_release_device().

There is no reason to use devm here; arm_smmu_free_cd_tables() is reliably called to free the CD-related memory. Remove it and save some memory.
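The change is mechanical, as the diff below shows; the saving comes from dropping the per-allocation devres record that the managed variants keep so they can auto-free at device teardown:

	/* Before: devres tracks the buffer for automatic release */
	l2ptr = dmam_alloc_coherent(smmu->dev, size, &dma, GFP_KERNEL);

	/* After: plain allocation, paired with an explicit
	 * dma_free_coherent() in arm_smmu_free_cd_tables() */
	l2ptr = dma_alloc_coherent(smmu->dev, size, &dma, GFP_KERNEL);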
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 44 +++++++++------------ 1 file changed, 19 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0db7f59b9ed6..69b2eaf22cba 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1410,13 +1410,10 @@ static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, { size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
- l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, - &l1_desc->l2ptr_dma, GFP_KERNEL); - if (!l1_desc->l2ptr) { - dev_warn(smmu->dev, - "failed to allocate context descriptor table\n"); + l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size, + &l1_desc->l2ptr_dma, GFP_KERNEL); + if (!l1_desc->l2ptr) return -ENOMEM; - } return 0; }
@@ -1594,7 +1591,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR; cd_table->num_l1_ents = max_contexts;
- cd_table->cdtab.linear = dmam_alloc_coherent( + cd_table->cdtab.linear = dma_alloc_coherent( smmu->dev, max_contexts * sizeof(struct arm_smmu_cd), &cd_table->cdtab_dma, GFP_KERNEL); if (!cd_table->cdtab.linear) @@ -1606,17 +1603,17 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts, CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = devm_kcalloc(smmu->dev, cd_table->num_l1_ents, - sizeof(*cd_table->l1_desc), - GFP_KERNEL); + cd_table->l1_desc = kcalloc(cd_table->num_l1_ents, + sizeof(*cd_table->l1_desc), + GFP_KERNEL); if (!cd_table->l1_desc) return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); - cd_table->cdtab.l1_desc = dmam_alloc_coherent( + cd_table->cdtab.l1_desc = dma_alloc_coherent( smmu->dev, l1size, &cd_table->cdtab_dma, GFP_KERNEL); if (!cd_table->cdtab.l1_desc) { - devm_kfree(smmu->dev, cd_table->l1_desc); + kfree(cd_table->l1_desc); cd_table->l1_desc = NULL; return -ENOMEM; } @@ -1638,25 +1635,22 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) if (!cd_table->l1_desc[i].l2ptr) continue;
- dmam_free_coherent(smmu->dev, size, - cd_table->l1_desc[i].l2ptr, - cd_table->l1_desc[i].l2ptr_dma); + dma_free_coherent(smmu->dev, size, + cd_table->l1_desc[i].l2ptr, + cd_table->l1_desc[i].l2ptr_dma); } - devm_kfree(smmu->dev, cd_table->l1_desc); + kfree(cd_table->l1_desc); cd_table->l1_desc = NULL;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); - dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc, - cd_table->cdtab_dma); + dma_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc, + cd_table->cdtab_dma); } else { - dmam_free_coherent(smmu->dev, - cd_table->num_l1_ents * - sizeof(struct arm_smmu_cd), - cd_table->cdtab.linear, cd_table->cdtab_dma); + dma_free_coherent(smmu->dev, + cd_table->num_l1_ents * + sizeof(struct arm_smmu_cd), + cd_table->cdtab.linear, cd_table->cdtab_dma); } - - cd_table->cdtab_dma = 0; - cd_table->cdtab.linear = NULL; }
/* Stream table manipulation functions */
From: Jason Gunthorpe jgg@nvidia.com
Previous patches emptied these structs out; they now only hold the two different kinds of IOTLB cache tag (vmid/asid).

Since the cache tag is now fully a property of the domain, and its lifecycle is tied to the domain, remove the structs and inline the tag into struct arm_smmu_domain.
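The end result in struct arm_smmu_domain, from the header diff below, is an anonymous union, since a domain only ever carries one kind of tag:

	union {
		u32 asid; /* S1 and SVA domains */
		u16 vmid; /* S2 domains */
	};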
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 20 +++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 37 ++++++++----------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 12 +----- 3 files changed, 28 insertions(+), 41 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index e35ae178a7b1..b434e63f6556 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -18,7 +18,7 @@ static int arm_smmu_realloc_s1_domain_asid(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_master_domain *master_domain; - u32 old_asid = smmu_domain->cd.asid; + u32 old_asid = smmu_domain->asid; struct arm_smmu_cd target_cd; unsigned long flags; int ret; @@ -37,7 +37,7 @@ static int arm_smmu_realloc_s1_domain_asid(struct arm_smmu_device *smmu, * and we achieve eventual consistency. For the brief period where the * old ASID is still in the CD entries it will become incoherent. */ - ret = xa_alloc(&smmu->asid_map, &smmu_domain->cd.asid, smmu_domain, + ret = xa_alloc(&smmu->asid_map, &smmu_domain->asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); if (ret) return ret; @@ -177,7 +177,7 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, }
if (!smmu_domain->btm_invalidation) { - ioasid_t asid = READ_ONCE(smmu_domain->cd.asid); + ioasid_t asid = READ_ONCE(smmu_domain->asid);
if (!size) arm_smmu_tlb_inv_asid(smmu_domain->smmu, asid); @@ -214,14 +214,14 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
/* An SVA ASID never changes, no asid_lock required */ arm_smmu_make_sva_cd(&target, master, NULL, - smmu_domain->cd.asid, + smmu_domain->asid, smmu_domain->btm_invalidation); arm_smmu_write_cd_entry(master, master_domain->ssid, cdptr, &target); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->asid); arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
@@ -388,7 +388,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, * get reassigned */ arm_smmu_make_sva_cd(&target, master, smmu_domain->domain.mm, - smmu_domain->cd.asid, + smmu_domain->asid, smmu_domain->btm_invalidation);
ret = arm_smmu_set_pasid(master, smmu_domain, id, &target); @@ -456,16 +456,16 @@ static int arm_smmu_share_asid(struct arm_smmu_device *smmu, * the S2 :( Or we simply ignore BTM entirely as we are doing now. */ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) - return xa_alloc(&smmu->asid_map, &smmu_domain->cd.asid, + return xa_alloc(&smmu->asid_map, &smmu_domain->asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
/* At this point the caller ensures we have a mmget() */ - smmu_domain->cd.asid = arm64_mm_context_get(mm); + smmu_domain->asid = arm64_mm_context_get(mm);
mutex_lock(&smmu->asid_lock); - old_s1_domain = xa_store(&smmu->asid_map, smmu_domain->cd.asid, + old_s1_domain = xa_store(&smmu->asid_map, smmu_domain->asid, smmu_domain, GFP_KERNEL); if (xa_err(old_s1_domain)) { ret = xa_err(old_s1_domain); @@ -499,7 +499,7 @@ static int arm_smmu_share_asid(struct arm_smmu_device *smmu, goto out_unlock;
out_restore_s1: - xa_store(&smmu->asid_map, smmu_domain->cd.asid, old_s1_domain, + xa_store(&smmu->asid_map, smmu_domain->asid, old_s1_domain, GFP_KERNEL); out_put_asid: arm64_mm_context_put(mm); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 69b2eaf22cba..960b070dedfd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1526,7 +1526,6 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; const struct io_pgtable_cfg *pgtbl_cfg = &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = @@ -1551,7 +1550,7 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET | - FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) + FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->asid) );
if (master->smmu->features & ARM_SMMU_FEAT_HD) @@ -1798,7 +1797,6 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_domain *smmu_domain, bool ats_enabled) { - struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; const struct io_pgtable_cfg *pgtbl_cfg = &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr = @@ -1824,7 +1822,7 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); target->data[2] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | + FIELD_PREP(STRTAB_STE_2_S2VMID, smmu_domain->vmid) | FIELD_PREP(STRTAB_STE_2_VTCR, vtcr_val) | STRTAB_STE_2_S2AA64 | #ifdef __BIG_ENDIAN @@ -2295,10 +2293,10 @@ static void arm_smmu_tlb_inv_context(void *cookie) * careful, 007. */ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { - arm_smmu_tlb_inv_asid(smmu, READ_ONCE(smmu_domain->cd.asid)); + arm_smmu_tlb_inv_asid(smmu, READ_ONCE(smmu_domain->asid)); } else { cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; - cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; + cmd.tlbi.vmid = smmu_domain->vmid; arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); } arm_smmu_atc_inv_domain(smmu_domain, 0, 0); @@ -2390,10 +2388,10 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { cmd.opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA; - cmd.tlbi.asid = smmu_domain->cd.asid; + cmd.tlbi.asid = smmu_domain->asid; } else { cmd.opcode = CMDQ_OP_TLBI_S2_IPA; - cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; + cmd.tlbi.vmid = smmu_domain->vmid; } __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
@@ -2511,22 +2509,22 @@ void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain)
if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1 || smmu_domain->domain.type == IOMMU_DOMAIN_SVA) && - smmu_domain->cd.asid) { - arm_smmu_tlb_inv_asid(smmu, smmu_domain->cd.asid); + smmu_domain->asid) { + arm_smmu_tlb_inv_asid(smmu, smmu_domain->asid);
/* Prevent SVA from touching the CD while we're freeing it */ mutex_lock(&smmu->asid_lock); - xa_erase(&smmu->asid_map, smmu_domain->cd.asid); + xa_erase(&smmu->asid_map, smmu_domain->asid); mutex_unlock(&smmu->asid_lock); } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 && - smmu_domain->s2_cfg.vmid) { + smmu_domain->vmid) { struct arm_smmu_cmdq_ent cmd = { .opcode = CMDQ_OP_TLBI_S12_VMALL, - .tlbi.vmid = smmu_domain->s2_cfg.vmid + .tlbi.vmid = smmu_domain->vmid };
arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); - ida_free(&smmu->vmid_map, smmu_domain->s2_cfg.vmid); + ida_free(&smmu->vmid_map, smmu_domain->vmid); } }
@@ -2542,9 +2540,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; - - return xa_alloc(&smmu->asid_map, &cd->asid, smmu_domain, + return xa_alloc(&smmu->asid_map, &smmu_domain->asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); }
@@ -2552,7 +2548,6 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain) { int vmid; - struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
/* Reserve VMID 0 for stage-2 bypass STEs */ vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1, @@ -2560,7 +2555,7 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, if (vmid < 0) return vmid;
- cfg->vmid = (u16)vmid; + smmu_domain->vmid = (u16)vmid; return 0; }
@@ -3078,8 +3073,8 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, * caller set ASID under the lock in case it changed. */ cd->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_ASID); - cd->data[0] |= cpu_to_le64( - FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->cd.asid)); + cd->data[0] |= + cpu_to_le64(FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->asid));
arm_smmu_write_cd_entry(master, pasid, cdptr, cd); arm_smmu_update_ste(master, sid_domain, state.want_ats); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 518c5f07a09b..309d5685aa16 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -628,10 +628,6 @@ struct arm_smmu_strtab_l1_desc { dma_addr_t l2ptr_dma; };
-struct arm_smmu_ctx_desc { - u32 asid; -}; - struct arm_smmu_l1_ctx_desc { struct arm_smmu_cd *l2ptr; dma_addr_t l2ptr_dma; @@ -667,10 +663,6 @@ static inline bool arm_smmu_ssids_in_use(struct arm_smmu_ctx_desc_cfg *cd_table) return cd_table->used_ssids; }
-struct arm_smmu_s2_cfg { - u16 vmid; -}; - struct arm_smmu_strtab_cfg { union { struct arm_smmu_ste *linear; @@ -805,8 +797,8 @@ struct arm_smmu_domain {
enum arm_smmu_domain_stage stage; union { - struct arm_smmu_ctx_desc cd; - struct arm_smmu_s2_cfg s2_cfg; + u32 asid; + u16 vmid; };
struct iommu_domain domain;
From: Jason Gunthorpe jgg@nvidia.com
Add arm_smmu_domain_alloc_id(), the counterpart to the existing arm_smmu_domain_free_id(). Move all the xa_alloc()/ida calls there. Replace the function pointer in arm_smmu_domain_finalise() and call the new helper from arm_smmu_share_asid().
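Allocation and release of the cache tag are now a symmetric pair, with the stage/type check centralized in one place (see arm_smmu_domain_alloc_id() in the diff below):

	/* Picks an ASID for S1/SVA domains, a VMID for S2 domains */
	ret = arm_smmu_domain_alloc_id(smmu, smmu_domain);
	if (ret)
		return ret;

	/* ... and on teardown the tag goes back the same way */
	arm_smmu_domain_free_id(smmu_domain);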
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 52 +++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 + 3 files changed, 28 insertions(+), 31 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index b434e63f6556..6252c85b5480 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -456,10 +456,7 @@ static int arm_smmu_share_asid(struct arm_smmu_device *smmu, * the S2 :( Or we simply ignore BTM entirely as we are doing now. */ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) - return xa_alloc(&smmu->asid_map, &smmu_domain->asid, - smmu_domain, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), - GFP_KERNEL); + return arm_smmu_domain_alloc_id(smmu, smmu_domain);
/* At this point the caller ensures we have a mmget() */ smmu_domain->asid = arm64_mm_context_get(mm); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 960b070dedfd..70d60845e03b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2499,6 +2499,30 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) return &smmu_domain->domain; }
+int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain) +{ + if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1 || + smmu_domain->domain.type == IOMMU_DOMAIN_SVA)) { + return xa_alloc(&smmu->asid_map, &smmu_domain->asid, + smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), + GFP_KERNEL); + } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) { + int vmid; + + /* Reserve VMID 0 for stage-2 bypass STEs */ + vmid = ida_alloc_range(&smmu->vmid_map, 1, + (1 << smmu->vmid_bits) - 1, GFP_KERNEL); + if (vmid < 0) + return vmid; + smmu_domain->vmid = vmid; + return 0; + } + WARN_ON(true); + return -EINVAL; +} + /* * Return the domain's ASID or VMID back to the allocator. All IDs in the * allocator do not have an IOTLB entries referencing them. @@ -2537,28 +2561,6 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) kfree(smmu_domain); }
-static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain) -{ - return xa_alloc(&smmu->asid_map, &smmu_domain->asid, smmu_domain, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); -} - -static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain) -{ - int vmid; - - /* Reserve VMID 0 for stage-2 bypass STEs */ - vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1, - GFP_KERNEL); - if (vmid < 0) - return vmid; - - smmu_domain->vmid = (u16)vmid; - return 0; -} - static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct arm_smmu_device *smmu) { @@ -2567,8 +2569,6 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, enum io_pgtable_fmt fmt; struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; - int (*finalise_stage_fn)(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain);
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2582,13 +2582,11 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, ias = min_t(unsigned long, ias, VA_BITS); oas = smmu->ias; fmt = ARM_64_LPAE_S1; - finalise_stage_fn = arm_smmu_domain_finalise_s1; break; case ARM_SMMU_DOMAIN_S2: ias = smmu->ias; oas = smmu->oas; fmt = ARM_64_LPAE_S2; - finalise_stage_fn = arm_smmu_domain_finalise_s2; break; default: return -EINVAL; @@ -2619,7 +2617,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; smmu_domain->domain.geometry.force_aperture = true;
- ret = finalise_stage_fn(smmu, smmu_domain); + ret = arm_smmu_domain_alloc_id(smmu, smmu_domain); if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); return ret; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 309d5685aa16..e4eef09ca1b3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -840,6 +840,8 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, void arm_smmu_remove_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid);
+int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain); void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
From: Jason Gunthorpe jgg@nvidia.com
The replacement API takes in a struct arm_smmu_domain that must be an S1 domain and invalidates the entire thing.
This isolates the callers from more of the invalidation machinery.
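A minimal before/after view of a call site (both lines taken from the hunks below):

	/* Before: the caller had to dig the ASID out of the domain itself */
	arm_smmu_tlb_inv_asid(smmu_domain->smmu, READ_ONCE(smmu_domain->asid));

	/* After: the domain is enough; the helper reads the ASID internally */
	arm_smmu_tlb_inv_all_s1(smmu_domain);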
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 +++++------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++---- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- 3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 6252c85b5480..a330c78fd9de 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -18,7 +18,6 @@ static int arm_smmu_realloc_s1_domain_asid(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_master_domain *master_domain; - u32 old_asid = smmu_domain->asid; struct arm_smmu_cd target_cd; unsigned long flags; int ret; @@ -56,9 +55,6 @@ static int arm_smmu_realloc_s1_domain_asid(struct arm_smmu_device *smmu, &target_cd); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); - - /* Clean the ASID we are about to assign to a new translation */ - arm_smmu_tlb_inv_asid(smmu, old_asid); return 0; }
@@ -180,7 +176,7 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, ioasid_t asid = READ_ONCE(smmu_domain->asid);
if (!size) - arm_smmu_tlb_inv_asid(smmu_domain->smmu, asid); + arm_smmu_tlb_inv_all_s1(smmu_domain); else arm_smmu_tlb_inv_range_asid(start, size, asid, PAGE_SIZE, false, @@ -221,7 +217,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->asid); + arm_smmu_tlb_inv_all_s1(smmu_domain); arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
@@ -488,6 +484,9 @@ static int arm_smmu_share_asid(struct arm_smmu_device *smmu, ret = arm_smmu_realloc_s1_domain_asid(smmu, old_s1_domain); if (ret) goto out_restore_s1; + + /* Clean the ASID since it was just recovered before using it */ + arm_smmu_tlb_inv_all_s1(smmu_domain); }
smmu_domain->btm_invalidation = true; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 70d60845e03b..588cf7e8448d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1183,12 +1183,13 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused }
/* Context descriptor manipulation functions */ -void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) +void arm_smmu_tlb_inv_all_s1(struct arm_smmu_domain *smmu_domain) { + struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cmdq_ent cmd = { .opcode = smmu->features & ARM_SMMU_FEAT_E2H ? CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID, - .tlbi.asid = asid, + .tlbi.asid = READ_ONCE(smmu_domain->asid), };
arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); @@ -2293,7 +2294,7 @@ static void arm_smmu_tlb_inv_context(void *cookie) * careful, 007. */ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { - arm_smmu_tlb_inv_asid(smmu, READ_ONCE(smmu_domain->asid)); + arm_smmu_tlb_inv_all_s1(smmu_domain); } else { cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; cmd.tlbi.vmid = smmu_domain->vmid; @@ -2534,7 +2535,7 @@ void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain) if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1 || smmu_domain->domain.type == IOMMU_DOMAIN_SVA) && smmu_domain->asid) { - arm_smmu_tlb_inv_asid(smmu, smmu_domain->asid); + arm_smmu_tlb_inv_all_s1(smmu_domain);
/* Prevent SVA from touching the CD while we're freeing it */ mutex_lock(&smmu->asid_lock); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index e4eef09ca1b3..9d1d441c55cf 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -843,7 +843,7 @@ void arm_smmu_remove_pasid(struct arm_smmu_master *master, int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain); void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain); -void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); +void arm_smmu_tlb_inv_all_s1(struct arm_smmu_domain *smmu_domain); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain);
From: Jason Gunthorpe jgg@nvidia.com
This function automatically selects the right set of invalidation operations, reusing the limiting logic that SVA had. If the limit is hit then a full S1 invalidation is done instead of trying to do a range.
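As a worked example of the threshold in arm_smmu_inv_range_too_big(): with a 4KiB granule, max_ops = 1 << (12 - 3) = 512, so on an SMMU without ARM_SMMU_FEAT_RANGE_INV any range of 512 * 4KiB = 2MiB or more collapses into a single ASID-wide invalidation instead of hundreds of per-page commands.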
Implement arm_smmu_tlb_inv_all_s1() in terms of calling this new function with a zero size, which the range helper treats as an invalidate-all.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 32 ++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 52 +++++++++++++------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 11 ++-- 3 files changed, 48 insertions(+), 47 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index a330c78fd9de..0334b0a8b5b9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -140,15 +140,6 @@ static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); }
-/* - * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this - * is used as a threshold to replace per-page TLBI commands to issue in the - * command queue with an address-space TLBI command, when SMMU w/o a range - * invalidation feature handles too many per-page TLBI commands, which will - * otherwise result in a soft lockup. - */ -#define CMDQ_MAX_TLBI_OPS (1 << (PAGE_SHIFT - 3)) - static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, @@ -164,25 +155,12 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, * range. So do a simple translation here by calculating size correctly. */ size = end - start; - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV)) { - if (size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE) - size = 0; - } else { - if (size == ULONG_MAX) - size = 0; - } - - if (!smmu_domain->btm_invalidation) { - ioasid_t asid = READ_ONCE(smmu_domain->asid); - - if (!size) - arm_smmu_tlb_inv_all_s1(smmu_domain); - else - arm_smmu_tlb_inv_range_asid(start, size, asid, - PAGE_SIZE, false, - smmu_domain); - } + if (size == ULONG_MAX) + size = 0;
+ if (!smmu_domain->btm_invalidation) + arm_smmu_tlb_inv_range_s1(smmu_domain, start, size, PAGE_SIZE, + false); arm_smmu_atc_inv_domain(smmu_domain, start, size); }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 588cf7e8448d..df258c44131f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1183,17 +1183,6 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused }
/* Context descriptor manipulation functions */ -void arm_smmu_tlb_inv_all_s1(struct arm_smmu_domain *smmu_domain) -{ - struct arm_smmu_device *smmu = smmu_domain->smmu; - struct arm_smmu_cmdq_ent cmd = { - .opcode = smmu->features & ARM_SMMU_FEAT_E2H ? - CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID, - .tlbi.asid = READ_ONCE(smmu_domain->asid), - }; - - arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); -}
/* * Based on the value of ent report which bits of the STE the HW will access. It @@ -2376,6 +2365,29 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, arm_smmu_preempt_enable(smmu); }
+static bool arm_smmu_inv_range_too_big(struct arm_smmu_device *smmu, + size_t size, size_t granule) +{ + unsigned int max_ops; + + /* 0 size means invalidate all */ + if (!size || size == SIZE_MAX) + return true; + + if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) + return false; + + /* + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, + * this is used as a threshold to replace per-page TLBI commands to + * issue in the command queue with an address-space TLBI command, when + * SMMU w/o a range invalidation feature handles too many per-page TLBI + * commands, which will otherwise result in a soft lockup. + */ + max_ops = 1 << (ilog2(granule) - 3); + return size >= max_ops * granule; +} + static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain) @@ -2403,20 +2415,28 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, arm_smmu_atc_inv_domain(smmu_domain, iova, size); }
-void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, - size_t granule, bool leaf, - struct arm_smmu_domain *smmu_domain) +void arm_smmu_tlb_inv_range_s1(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size, + size_t granule, bool leaf) { struct arm_smmu_cmdq_ent cmd = { .opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA, .tlbi = { - .asid = asid, + .asid = READ_ONCE(smmu_domain->asid), .leaf = leaf, }, };
- __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); + if (arm_smmu_inv_range_too_big(smmu_domain->smmu, size, granule)) { + cmd.opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? + CMDQ_OP_TLBI_EL2_ASID : + CMDQ_OP_TLBI_NH_ASID; + arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd); + } else { + __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, + smmu_domain); + } }
static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 9d1d441c55cf..8315a2f4e661 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -843,10 +843,13 @@ void arm_smmu_remove_pasid(struct arm_smmu_master *master, int arm_smmu_domain_alloc_id(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain); void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain); -void arm_smmu_tlb_inv_all_s1(struct arm_smmu_domain *smmu_domain); -void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, - size_t granule, bool leaf, - struct arm_smmu_domain *smmu_domain); +void arm_smmu_tlb_inv_range_s1(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size, size_t granule, + bool leaf); +static inline void arm_smmu_tlb_inv_all_s1(struct arm_smmu_domain *smmu_domain) +{ + arm_smmu_tlb_inv_range_s1(smmu_domain, 0, 0, PAGE_SIZE, false); +} int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, unsigned long iova, size_t size);
From: Jason Gunthorpe jgg@nvidia.com
This is the mirror of arm_smmu_tlb_inv_range_s1(); it is made by taking the S1 functionality out of arm_smmu_tlb_inv_range_domain() and renaming what remains.
This pushes the S1/S2 decision directly up into the functions that are installed as function pointers, setting the stage for having S1/S2-specific ops and removing the s1/s2 test entirely.
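A sketch of the end state this enables (hypothetical, not part of this patch): the io-pgtable cookie could be wired to stage-specific flush ops at finalise time so the hot path never tests the stage:

static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size,
				     size_t granule, void *cookie)
{
	struct arm_smmu_domain *smmu_domain = cookie;

	arm_smmu_tlb_inv_range_s1(smmu_domain, iova, size, granule, false);
	arm_smmu_atc_inv_domain(smmu_domain, iova, size);
}

static const struct iommu_flush_ops arm_smmu_flush_ops_s1 = {
	.tlb_flush_walk	= arm_smmu_tlb_inv_walk_s1,
	/* .tlb_flush_all / .tlb_add_page would be split the same way */
};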
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with errata: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 ++++++++++++--------- 1 file changed, 32 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index df258c44131f..4332401facb1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2388,9 +2388,9 @@ static bool arm_smmu_inv_range_too_big(struct arm_smmu_device *smmu, return size >= max_ops * granule; }
-static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, - size_t granule, bool leaf, - struct arm_smmu_domain *smmu_domain) +static void arm_smmu_tlb_inv_range_s2(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size, + size_t granule, bool leaf) { struct arm_smmu_cmdq_ent cmd = { .tlbi = { @@ -2398,21 +2398,9 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, }, };
- if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { - cmd.opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? - CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA; - cmd.tlbi.asid = smmu_domain->asid; - } else { - cmd.opcode = CMDQ_OP_TLBI_S2_IPA; - cmd.tlbi.vmid = smmu_domain->vmid; - } + cmd.opcode = CMDQ_OP_TLBI_S2_IPA; + cmd.tlbi.vmid = smmu_domain->vmid; __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); - - /* - * Unfortunately, this can't be leaf-only since we may have - * zapped an entire table. - */ - arm_smmu_atc_inv_domain(smmu_domain, iova, size); }
void arm_smmu_tlb_inv_range_s1(struct arm_smmu_domain *smmu_domain, @@ -2452,7 +2440,15 @@ static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather, static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, size_t granule, void *cookie) { - arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie); + struct arm_smmu_domain *smmu_domain = cookie; + + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + arm_smmu_tlb_inv_range_s1(smmu_domain, iova, size, granule, + false); + else + arm_smmu_tlb_inv_range_s2(smmu_domain, iova, size, granule, + false); + arm_smmu_atc_inv_domain(smmu_domain, iova, size); }
static const struct iommu_flush_ops arm_smmu_flush_ops = { @@ -3263,13 +3259,24 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain, struct iommu_iotlb_gather *gather) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + size_t size = gather->end - gather->start + 1;
if (!gather->pgsize) return;
- arm_smmu_tlb_inv_range_domain(gather->start, - gather->end - gather->start + 1, - gather->pgsize, true, smmu_domain); + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + arm_smmu_tlb_inv_range_s1(smmu_domain, gather->start, size, + gather->pgsize, true); + + else + arm_smmu_tlb_inv_range_s2(smmu_domain, gather->start, size, + gather->pgsize, true); + + /* + * Unfortunately, this can't be leaf-only since we may have + * zapped an entire table. + */ + arm_smmu_atc_inv_domain(smmu_domain, gather->start, size); }
#ifdef CONFIG_HISILICON_ERRATUM_162100602 @@ -3285,7 +3292,10 @@ static int arm_smmu_iotlb_sync_map(struct iommu_domain *domain, granule_size = 1 << __ffs(smmu_domain->domain.pgsize_bitmap);
/* Add a SYNC command to sync io-pgtable to avoid errors in pgtable prefetch */ - arm_smmu_tlb_inv_range_domain(iova, granule_size, granule_size, true, smmu_domain); + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + arm_smmu_tlb_inv_range_s1(smmu_domain, iova, granule_size, granule_size, true); + else + arm_smmu_tlb_inv_range_s2(smmu_domain, iova, granule_size, granule_size, true);
return 0; }
From: Jason Gunthorpe jgg@nvidia.com
Same design as the S1 case: issue a full VMID invalidation if the range is too big.
Consolidate all the full invalidations to use arm_smmu_tlb_inv_all_s2(), mirroring arm_smmu_tlb_inv_all_s1().
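The S1 worked example carries over: with a 64KiB granule, max_ops = 1 << (16 - 3) = 8192, so any S2 range of 8192 * 64KiB = 512MiB or more (or a zero size) is issued as a single CMDQ_OP_TLBI_S12_VMALL.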
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 36 +++++++++++---------- 1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4332401facb1..f53d3fef92d7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -155,6 +155,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct arm_smmu_device *smmu); static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master); +static void arm_smmu_tlb_inv_all_s2(struct arm_smmu_domain *smmu_domain);
static void parse_driver_options(struct arm_smmu_device *smmu) { @@ -2272,8 +2273,6 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, static void arm_smmu_tlb_inv_context(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - struct arm_smmu_device *smmu = smmu_domain->smmu; - struct arm_smmu_cmdq_ent cmd;
/* * NOTE: when io-pgtable is in non-strict mode, we may get here with @@ -2282,13 +2281,10 @@ static void arm_smmu_tlb_inv_context(void *cookie) * insertion to guarantee those are observed before the TLBI. Do be * careful, 007. */ - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) arm_smmu_tlb_inv_all_s1(smmu_domain); - } else { - cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; - cmd.tlbi.vmid = smmu_domain->vmid; - arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); - } + else + arm_smmu_tlb_inv_all_s2(smmu_domain); arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
@@ -2393,14 +2389,25 @@ static void arm_smmu_tlb_inv_range_s2(struct arm_smmu_domain *smmu_domain, size_t granule, bool leaf) { struct arm_smmu_cmdq_ent cmd = { + .opcode = CMDQ_OP_TLBI_S2_IPA, .tlbi = { + .vmid = smmu_domain->vmid, .leaf = leaf, }, };
- cmd.opcode = CMDQ_OP_TLBI_S2_IPA; - cmd.tlbi.vmid = smmu_domain->vmid; - __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); + if (arm_smmu_inv_range_too_big(smmu_domain->smmu, size, granule)) { + cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; + arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd); + } else { + __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, + smmu_domain); + } +} + +static void arm_smmu_tlb_inv_all_s2(struct arm_smmu_domain *smmu_domain) +{ + arm_smmu_tlb_inv_range_s2(smmu_domain, 0, 0, PAGE_SIZE, false); }
void arm_smmu_tlb_inv_range_s1(struct arm_smmu_domain *smmu_domain, @@ -2559,12 +2566,7 @@ void arm_smmu_domain_free_id(struct arm_smmu_domain *smmu_domain) mutex_unlock(&smmu->asid_lock); } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 && smmu_domain->vmid) { - struct arm_smmu_cmdq_ent cmd = { - .opcode = CMDQ_OP_TLBI_S12_VMALL, - .tlbi.vmid = smmu_domain->vmid - }; - - arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + arm_smmu_tlb_inv_all_s2(smmu_domain); ida_free(&smmu->vmid_map, smmu_domain->vmid); } }
From: Jason Gunthorpe jgg@nvidia.com
This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the usual stage 1 default); however, this choice has no significant visible impact on the VFIO user. Further, QEMU never implemented this and no other userspace user is known.
The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system"; however, the rest of the API to set the guest table pointer for the stage 1 and to manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code.
Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation.
Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op.
Just in case there is some userspace still using this, continue to treat requesting it as a NOP, but do not advertise support any more.
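A sketch of the userspace-visible behaviour after this patch, for a program built against the old header (fd setup elided):

	/* Probing now reports the extension as unsupported (returns 0)... */
	ioctl(container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU);

	/* ...yet blindly requesting it still succeeds, behaving as TYPE1v2 */
	ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);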
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with 9950bcacec7b1: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ---------------- drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 ---------------- drivers/iommu/iommu.c | 10 ---------- drivers/iommu/iommufd/vfio_compat.c | 7 +------ drivers/vfio/vfio_iommu_type1.c | 12 +----------- include/linux/iommu.h | 3 --- include/uapi/linux/vfio.h | 2 +- 7 files changed, 3 insertions(+), 63 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f53d3fef92d7..336fb3968cb1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3521,21 +3521,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; }
-static int arm_smmu_enable_nesting(struct iommu_domain *domain) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - int ret = 0; - - mutex_lock(&smmu_domain->init_mutex); - if (smmu_domain->smmu) - ret = -EPERM; - else - smmu_domain->stage = ARM_SMMU_DOMAIN_S2; - mutex_unlock(&smmu_domain->init_mutex); - - return ret; -} - #ifdef CONFIG_ARM_SMMU_V3_HTTU static int arm_smmu_split_block(struct iommu_domain *domain, unsigned long iova, size_t size) @@ -3866,7 +3851,6 @@ static struct iommu_ops arm_smmu_ops = { .iotlb_sync_map = arm_smmu_iotlb_sync_map, #endif .iova_to_phys = arm_smmu_iova_to_phys, - .enable_nesting = arm_smmu_enable_nesting, #ifdef CONFIG_ARM_SMMU_V3_HTTU .support_dirty_log = arm_smmu_support_dirty_log, .switch_dirty_log = arm_smmu_switch_dirty_log, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2b57ae0e2166..29a68dacd56b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1538,21 +1538,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; }
-static int arm_smmu_enable_nesting(struct iommu_domain *domain) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - int ret = 0; - - mutex_lock(&smmu_domain->init_mutex); - if (smmu_domain->smmu) - ret = -EPERM; - else - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED; - mutex_unlock(&smmu_domain->init_mutex); - - return ret; -} - static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks) { @@ -1635,7 +1620,6 @@ static struct iommu_ops arm_smmu_ops = { .flush_iotlb_all = arm_smmu_flush_iotlb_all, .iotlb_sync = arm_smmu_iotlb_sync, .iova_to_phys = arm_smmu_iova_to_phys, - .enable_nesting = arm_smmu_enable_nesting, .set_pgtable_quirks = arm_smmu_set_pgtable_quirks, .free = arm_smmu_domain_free, } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index cf1646026d2d..ac7a4dc8a6d3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2678,16 +2678,6 @@ static int __init iommu_init(void) } core_initcall(iommu_init);
-int iommu_enable_nesting(struct iommu_domain *domain) -{ - if (domain->type != IOMMU_DOMAIN_UNMANAGED) - return -EINVAL; - if (!domain->ops->enable_nesting) - return -EINVAL; - return domain->ops->enable_nesting(domain); -} -EXPORT_SYMBOL_GPL(iommu_enable_nesting); - int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirk) { diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c index a3ad5f0b6c59..514aacd64009 100644 --- a/drivers/iommu/iommufd/vfio_compat.c +++ b/drivers/iommu/iommufd/vfio_compat.c @@ -291,12 +291,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx, case VFIO_DMA_CC_IOMMU: return iommufd_vfio_cc_iommu(ictx);
- /* - * This is obsolete, and to be removed from VFIO. It was an incomplete - * idea that got merged. - * https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.... - */ - case VFIO_TYPE1_NESTING_IOMMU: + case __VFIO_RESERVED_TYPE1_NESTING_IOMMU: return 0;
/* diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 9c4adf11dbbe..0b8e0571ecf3 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -73,7 +73,6 @@ struct vfio_iommu { uint64_t num_non_pinned_groups; uint64_t num_non_hwdbm_domains; bool v2; - bool nesting; bool dirty_page_tracking; struct list_head emulated_iommu_groups; bool dirty_log_get_no_clear; @@ -2448,12 +2447,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, if (!domain->domain) goto out_free_domain;
- if (iommu->nesting) { - ret = iommu_enable_nesting(domain->domain); - if (ret) - goto out_domain; - } - ret = iommu_attach_group(domain->domain, group->iommu_group); if (ret) goto out_domain; @@ -2801,9 +2794,7 @@ static void *vfio_iommu_type1_open(unsigned long arg) switch (arg) { case VFIO_TYPE1_IOMMU: break; - case VFIO_TYPE1_NESTING_IOMMU: - iommu->nesting = true; - fallthrough; + case __VFIO_RESERVED_TYPE1_NESTING_IOMMU: case VFIO_TYPE1v2_IOMMU: iommu->v2 = true; break; @@ -2898,7 +2889,6 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, switch (arg) { case VFIO_TYPE1_IOMMU: case VFIO_TYPE1v2_IOMMU: - case VFIO_TYPE1_NESTING_IOMMU: case VFIO_UNMAP_ALL: return 1; case VFIO_UPDATE_VADDR: diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b4613378b5d8..438b1ed02c6c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -639,7 +639,6 @@ struct iommu_ops { * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform * specific mechanisms. - * @enable_nesting: Enable nesting * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @support_dirty_log: Check whether domain supports dirty log tracking * @switch_dirty_log: Perform actions to start|stop dirty log tracking @@ -671,7 +670,6 @@ struct iommu_domain_ops { dma_addr_t iova);
bool (*enforce_cache_coherency)(struct iommu_domain *domain); - int (*enable_nesting)(struct iommu_domain *domain); int (*set_pgtable_quirks)(struct iommu_domain *domain, unsigned long quirks);
@@ -855,7 +853,6 @@ extern void iommu_group_put(struct iommu_group *group); extern int iommu_group_id(struct iommu_group *group); extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
-int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 2b78d03c0c15..4c5acb573bbf 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -35,7 +35,7 @@ #define VFIO_EEH 5
/* Two-stage IOMMU */ -#define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */ +#define __VFIO_RESERVED_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
#define VFIO_SPAPR_TCE_v2_IOMMU 7
From: Nicolin Chen nicolinc@nvidia.com
For the virtualization use case, the IDR/IIDR/AIDR values of the actual SMMU instance need to be available to the VMM so it can construct an appropriate vSMMUv3 that reflects the correct HW capabilities.
For userspace page tables these values are required to constrain the valid values within the CD table and the IOPTEs.
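A hedged userspace sketch of how a VMM might consume this (build_vsmmu() is hypothetical; struct iommu_hw_info layout per the iommufd uAPI this series builds on):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int example_probe_vsmmu(int iommufd, uint32_t dev_id)
{
	struct iommu_hw_info_arm_smmuv3 info = {};
	struct iommu_hw_info cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		.data_len = sizeof(info),
		.data_uptr = (uintptr_t)&info,
	};

	if (ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd) ||
	    cmd.out_data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
		return -1;

	/* Forward only bits the VMM can emulate, e.g. mask BTM out of idr[0] */
	return build_vsmmu(info.idr, info.iidr, info.aidr);
}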
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ include/uapi/linux/iommufd.h | 35 +++++++++++++++++++++ 3 files changed, 62 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 336fb3968cb1..0a59d7981a72 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -26,6 +26,7 @@ #include <linux/pci.h> #include <linux/pci-ats.h> #include <linux/platform_device.h> +#include <uapi/linux/iommufd.h>
#include "arm-smmu-v3.h" #include "../../dma-iommu.h" @@ -2481,6 +2482,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) } }
+static void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct iommu_hw_info_arm_smmuv3 *info; + u32 __iomem *base_idr; + unsigned int i; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + base_idr = master->smmu->base + ARM_SMMU_IDR0; + for (i = 0; i <= 5; i++) + info->idr[i] = readl_relaxed(base_idr + i); + info->iidr = readl_relaxed(master->smmu->base + ARM_SMMU_IIDR); + info->aidr = readl_relaxed(master->smmu->base + ARM_SMMU_AIDR); + + *length = sizeof(*info); + *type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3; + + return info; +} + struct arm_smmu_domain *arm_smmu_domain_alloc(void) { struct arm_smmu_domain *smmu_domain; @@ -3826,6 +3850,7 @@ static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, + .hw_info = arm_smmu_hw_info, .domain_alloc_paging = arm_smmu_domain_alloc_paging, .domain_alloc_sva = arm_smmu_sva_domain_alloc, .probe_device = arm_smmu_probe_device, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 8315a2f4e661..393e8466b66c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -84,6 +84,8 @@ #define IIDR_REVISION GENMASK(15, 12) #define IIDR_IMPLEMENTER GENMASK(11, 0)
+#define ARM_SMMU_AIDR 0x1C + #define ARM_SMMU_CR0 0x20 #define CR0_ATSCHK (1 << 4) #define CR0_CMDQEN (1 << 3) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 7481cdd57027..be16f6f36474 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -482,15 +482,50 @@ struct iommu_hw_info_vtd { __aligned_u64 ecap_reg; };
+/** + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) + * + * @flags: Must be set to 0 + * @__reserved: Must be 0 + * @idr: Implemented features for ARM SMMU Non-secure programming interface + * @iidr: Information about the implementation and implementer of ARM SMMU, + * and architecture version supported + * @aidr: ARM SMMU architecture version + * + * For the details of @idr, @iidr and @aidr, please refer to the chapters + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. + * + * User space should read the underlying ARM SMMUv3 hardware information for + * the list of supported features. + * + * Note that these values reflect the raw HW capability, without any insight if + * any required kernel driver support is present. Bits may be set indicating the + * HW has functionality that is lacking kernel software support, such as BTM. If + * a VMM is using this information to construct emulated copies of these + * registers it should only forward bits that it knows it can support. + * + * In future, presence of required kernel support will be indicated in flags. + */ +struct iommu_hw_info_arm_smmuv3 { + __u32 flags; + __u32 __reserved; + __u32 idr[6]; + __u32 iidr; + __u32 aidr; +}; + /** * enum iommu_hw_info_type - IOMMU Hardware Info Types * @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that do not report hardware * info * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type + * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type */ enum iommu_hw_info_type { IOMMU_HW_INFO_TYPE_NONE, IOMMU_HW_INFO_TYPE_INTEL_VTD, + IOMMU_HW_INFO_TYPE_ARM_SMMUV3, };
/**
From: Jason Gunthorpe jgg@nvidia.com
This functionally replaces what VFIO_TYPE1_NESTING_IOMMU was trying to do: it ensures that the allocated iommu_domain will be an S2 domain suitable to act as the parent of an iommufd IOMMU_DOMAIN_NESTED.
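A hedged sketch of the allocation from the iommufd side (dev_id/ioas_id assumed to be already bound; struct layout per the upstream iommufd uAPI):

	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,
	};

	if (!ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		s2_hwpt_id = cmd.out_hwpt_id; /* forced to be an S2 arm_smmu_domain */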
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
conflict with bbaaa756ad25c: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 44 ++++++++++++++++++++- 1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0a59d7981a72..12e18ebc277a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -100,6 +100,8 @@ static int arm_smmu_bypass_dev_domain_type(struct device *dev) } #endif
+static struct iommu_ops arm_smmu_ops; + enum arm_smmu_msi_index { EVTQ_MSI_INDEX, GERROR_MSI_INDEX, @@ -3248,6 +3250,45 @@ static struct iommu_domain arm_smmu_blocked_domain = { .ops = &arm_smmu_blocked_ops, };
+static struct iommu_domain * +arm_smmu_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_NEST_PARENT; + struct arm_smmu_domain *smmu_domain; + int ret; + + if (parent || (flags & ~PAGING_FLAGS)) + return ERR_PTR(-EOPNOTSUPP); + if (user_data) + return ERR_PTR(-EINVAL); + + smmu_domain = arm_smmu_domain_alloc(); + if (!smmu_domain) + return ERR_PTR(-ENOMEM); + + if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) { + if (!(master->smmu->features & ARM_SMMU_FEAT_TRANS_S2)) { + ret = -EOPNOTSUPP; + goto err_free; + } + smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + } + + smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; + smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops; + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + if (ret) + goto err_free; + return &smmu_domain->domain; + +err_free: + kfree(smmu_domain); + return ERR_PTR(ret); +} + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3451,8 +3492,6 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master) kfree(master->streams); }
-static struct iommu_ops arm_smmu_ops; - static struct iommu_device *arm_smmu_probe_device(struct device *dev) { int ret; @@ -3852,6 +3891,7 @@ static struct iommu_ops arm_smmu_ops = { .capable = arm_smmu_capable, .hw_info = arm_smmu_hw_info, .domain_alloc_paging = arm_smmu_domain_alloc_paging, + .domain_alloc_user = arm_smmu_domain_alloc_user, .domain_alloc_sva = arm_smmu_sva_domain_alloc, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device,
From: Jason Gunthorpe jgg@nvidia.com
For SMMUv3, an IOMMU_DOMAIN_NESTED is composed of an S2 iommu_domain acting as the parent and a user-provided STE fragment that defines the CD table and related data, with addresses translated by the S2 iommu_domain.
The kernel only permits userspace to control certain allowed bits of the STE that are safe for user control.
IOTLB maintenance is a bit subtle here: the S1 implicitly includes the S2 translation, but there is no way of knowing which S1 entries refer to a given range of the S2. There is no option but to discard all the S1 entries from the IOTLB and thus flush the entire ATC.
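A hedged sketch of the VMM side of the allocation (guest_ste[] is assumed to hold the guest's little-endian STE words; dev_id and s2_hwpt_id come from the earlier nesting-parent allocation):

	struct iommu_hwpt_arm_smmuv3 data = {
		/* The kernel rejects with -EIO anything outside *_NESTING_ALLOWED */
		.ste = { guest_ste[0], guest_ste[1] },
	};
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		.pt_id = s2_hwpt_id,
		.data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
		.data_len = sizeof(data),
		.data_uptr = (uintptr_t)&data,
	};
	int rc = ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd); /* cmd.out_hwpt_id on success */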
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 151 +++++++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 22 ++- include/uapi/linux/iommufd.h | 18 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 12e18ebc277a..79292a57804c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1829,6 +1829,29 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, STRTAB_STE_3_S2TTB_MASK); }
+static void arm_smmu_make_nested_domain_ste( + struct arm_smmu_ste *target, struct arm_smmu_master *master, + struct arm_smmu_nested_domain *nested_domain, bool ats_enabled) +{ + /* + * Userspace can request a non-valid STE through the nesting interface. + * We relay that into a non-valid physical STE with the intention that + * C_BAD_STE for this SID can be delivered to userspace. + */ + if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) { + memset(target, 0, sizeof(*target)); + return; + } + + arm_smmu_make_s2_domain_ste(target, master, nested_domain->s2_parent, + ats_enabled); + + target->data[0] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_0_CFG, + STRTAB_STE_0_CFG_NESTED)) | + nested_domain->ste[0]; + target->data[1] |= nested_domain->ste[1]; +} + /* * This can safely directly manipulate the STE memory without a sync sequence * because the STE table has not been installed in the SMMU yet. @@ -2257,7 +2280,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, if (!master->ats_enabled) continue;
- arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd); + if (master_domain->nested_parent) { + /* + * If a S2 used as a nesting parent is changed we have + * no option but to completely flush the ATC. + */ + arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); + } else { + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, + &cmd); + }
for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; @@ -2399,7 +2431,8 @@ static void arm_smmu_tlb_inv_range_s2(struct arm_smmu_domain *smmu_domain, }, };
- if (arm_smmu_inv_range_too_big(smmu_domain->smmu, size, granule)) { + if (smmu_domain->nesting_parent || + arm_smmu_inv_range_too_big(smmu_domain->smmu, size, granule)) { cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd); } else { @@ -2832,6 +2865,9 @@ to_smmu_domain_devices(struct iommu_domain *domain) if ((domain->type & __IOMMU_DOMAIN_PAGING) || domain->type == IOMMU_DOMAIN_SVA) return to_smmu_domain(domain); + if (domain->type == IOMMU_DOMAIN_NESTED) + return container_of(domain, struct arm_smmu_nested_domain, + domain)->s2_parent; return NULL; }
@@ -2893,6 +2929,8 @@ static int arm_smmu_attach_prepare(struct arm_smmu_master *master, return -ENOMEM; master_domain->master = master; master_domain->ssid = state->ssid; + master_domain->nested_parent = domain->type == + IOMMU_DOMAIN_NESTED;
/* * During prepare we want the current smmu_domain and new @@ -3250,6 +3288,108 @@ static struct iommu_domain arm_smmu_blocked_domain = { .ops = &arm_smmu_blocked_ops, };
+static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_nested_domain *nested_domain = + container_of(domain, struct arm_smmu_nested_domain, domain); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct attach_state state = {.ssid = IOMMU_NO_PASID}; + struct arm_smmu_ste ste; + int ret; + + if (arm_smmu_ssids_in_use(&master->cd_table) || + nested_domain->s2_parent->smmu != master->smmu) + return -EINVAL; + + mutex_lock(&master->smmu->asid_lock); + /* + * The VM has to control the actual ATS state at the PCI device because + * we forward the invalidations directly from the VM. If the VM doesn't + * think ATS is on it will not generate ATC flushes and the ATC will + * become incoherent. Since we can't access the actual virtual PCI ATS + * config bit here base this off the EATS value in the STE. If the EATS + * is set then the VM must generate ATC flushes. + */ + state.disable_ats = !nested_domain->enable_ats; + ret = arm_smmu_attach_prepare(master, domain, &state); + if (ret) { + mutex_unlock(&master->smmu->asid_lock); + return ret; + } + arm_smmu_make_nested_domain_ste(&ste, master, nested_domain, + state.want_ats); + arm_smmu_install_ste_for_dev(master, &ste); + arm_smmu_attach_commit(master, &state); + mutex_unlock(&master->smmu->asid_lock); + return 0; +} + +static void arm_smmu_domain_nested_free(struct iommu_domain *domain) +{ + kfree(container_of(domain, struct arm_smmu_nested_domain, domain)); +} + +static const struct iommu_domain_ops arm_smmu_nested_ops = { + .attach_dev = arm_smmu_attach_dev_nested, + .free = arm_smmu_domain_nested_free, +}; + +static struct iommu_domain * +arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_nested_domain *nested_domain; + struct arm_smmu_domain *smmu_parent; + struct iommu_hwpt_arm_smmuv3 arg; + unsigned int eats; + int ret; + + if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING)) + return ERR_PTR(-EOPNOTSUPP); + + ret = iommu_copy_struct_from_user(&arg, user_data, + IOMMU_HWPT_DATA_ARM_SMMUV3, ste); + if (ret) + return ERR_PTR(ret); + + if (flags || !(master->smmu->features & ARM_SMMU_FEAT_TRANS_S1)) + return ERR_PTR(-EOPNOTSUPP); + + if (!(parent->type & __IOMMU_DOMAIN_PAGING)) + return ERR_PTR(-EINVAL); + + smmu_parent = to_smmu_domain(parent); + if (smmu_parent->stage != ARM_SMMU_DOMAIN_S2 || + smmu_parent->smmu != master->smmu) + return ERR_PTR(-EINVAL); + + /* EIO is reserved for invalid STE data. 
*/ + if ((arg.ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) || + (arg.ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED)) + return ERR_PTR(-EIO); + + /* Only Full ATS or ATS UR is supported */ + eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg.ste[1])); + if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS) + return ERR_PTR(-EIO); + + nested_domain = kzalloc(sizeof(*nested_domain), GFP_KERNEL_ACCOUNT); + if (!nested_domain) + return ERR_PTR(-ENOMEM); + + nested_domain->domain.type = IOMMU_DOMAIN_NESTED; + nested_domain->domain.ops = &arm_smmu_nested_ops; + nested_domain->s2_parent = smmu_parent; + nested_domain->enable_ats = eats == STRTAB_STE_1_EATS_TRANS; + nested_domain->ste[0] = arg.ste[0]; + nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS); + + return &nested_domain->domain; +} + static struct iommu_domain * arm_smmu_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, @@ -3260,7 +3400,11 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, struct arm_smmu_domain *smmu_domain; int ret;
- if (parent || (flags & ~PAGING_FLAGS)) + if (parent) + return arm_smmu_domain_alloc_nesting(dev, flags, parent, + user_data); + + if (flags & ~PAGING_FLAGS) return ERR_PTR(-EOPNOTSUPP); if (user_data) return ERR_PTR(-EINVAL); @@ -3275,6 +3419,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, goto err_free; } smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + smmu_domain->nesting_parent = true; }
smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 393e8466b66c..9e6e4a47b2d2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -247,6 +247,7 @@ struct arm_smmu_ste { #define STRTAB_STE_0_CFG_BYPASS 4 #define STRTAB_STE_0_CFG_S1_TRANS 5 #define STRTAB_STE_0_CFG_S2_TRANS 6 +#define STRTAB_STE_0_CFG_NESTED 7
#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4) #define STRTAB_STE_0_S1FMT_LINEAR 0 @@ -297,6 +298,15 @@ struct arm_smmu_ste {
#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4)
+/* These bits can be controlled by userspace for STRTAB_STE_0_CFG_NESTED */ +#define STRTAB_STE_0_NESTING_ALLOWED \ + cpu_to_le64(STRTAB_STE_0_V | STRTAB_STE_0_S1FMT | \ + STRTAB_STE_0_S1CTXPTR_MASK | STRTAB_STE_0_S1CDMAX) +#define STRTAB_STE_1_NESTING_ALLOWED \ + cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \ + STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \ + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS) + /* * Context descriptors. * @@ -810,13 +820,23 @@ struct arm_smmu_domain { spinlock_t devices_lock;
struct mmu_notifier mmu_notifier; - bool btm_invalidation; + bool btm_invalidation : 1; + bool nesting_parent : 1; +}; + +struct arm_smmu_nested_domain { + struct iommu_domain domain; + struct arm_smmu_domain *s2_parent; + u8 enable_ats : 1; + + __le64 ste[2]; };
struct arm_smmu_master_domain { struct list_head devices_elm; struct arm_smmu_master *master; u16 ssid; + u8 nested_parent; };
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index be16f6f36474..79b7d1558e11 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -394,14 +394,32 @@ struct iommu_hwpt_vtd_s1 { __u32 __reserved; };
+/** + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 Context Descriptor Table info + * (IOMMU_HWPT_DATA_ARM_SMMUV3) + * + * @ste: The first two double words of the user space Stream Table Entry for + * a user stage-1 Context Descriptor Table. Must be little-endian. + * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec) + * - word-0: V, S1Fmt, S1ContextPtr, S1CDMax + * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD + * + * -EIO will be returned if @ste is not legal or contains any non-allowed field. + */ +struct iommu_hwpt_arm_smmuv3 { + __aligned_le64 ste[2]; +}; + /** * enum iommu_hwpt_data_type - IOMMU HWPT Data Type * @IOMMU_HWPT_DATA_NONE: no data * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table + * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table */ enum iommu_hwpt_data_type { IOMMU_HWPT_DATA_NONE, IOMMU_HWPT_DATA_VTD_S1, + IOMMU_HWPT_DATA_ARM_SMMUV3, };
/**
From: Jason Gunthorpe jgg@nvidia.com
Add tests for some of the more common STE update operations that we expect to see, as well as some artificial STE updates to test the edges of arm_smmu_write_entry. These also serve as a record of which common operations are expected to be hitless, and how many syncs they require.
arm_smmu_write_entry implements a generic algorithm that updates an STE/CD to any other arbitrary STE/CD configuration. The update requires a sequence of write+sync operations, with some invariants that must hold true after each sync. arm_smmu_write_entry lends itself well to unit-testing since the function's interaction with the STE/CD is already abstracted by input callbacks that we can hook to introspect the sequence of operations. We can use these hooks to guarantee that invariants are held throughout the entire update operation.
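Assuming an arm64 cross toolchain is available, the suite should be runnable through the KUnit wrapper with something like (exact kconfig needs may vary by tree):

	./tools/testing/kunit/kunit.py run --arch=arm64 \
		--kconfig_add CONFIG_ARM_SMMU_V3=y \
		--kconfig_add CONFIG_ARM_SMMU_V3_KUNIT_TEST=y \
		'arm-smmu-v3-kunit-test'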
Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com Signed-off-by: Michael Shavit mshavit@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/Kconfig | 12 +- drivers/iommu/arm/arm-smmu-v3/Makefile | 2 + .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 423 ++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 37 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 28 ++ 5 files changed, 476 insertions(+), 26 deletions(-) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index a611519528d7..0d96e414b5b9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -397,9 +397,9 @@ config ARM_SMMU_V3 Say Y here if your system includes an IOMMU device implementing the ARM SMMUv3 architecture.
+if ARM_SMMU_V3 config ARM_SMMU_V3_SVA bool "Shared Virtual Addressing support for the ARM SMMUv3" - depends on ARM_SMMU_V3 select IOMMU_SVA select IOMMU_IOPF select MMU_NOTIFIER @@ -435,6 +435,16 @@ config ARM_SMMU_V3_ECMDQ
If not sure, say no.
+config ARM_SMMU_V3_KUNIT_TEST + tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS + depends on KUNIT + default KUNIT_ALL_TESTS + help + Enable this option to unit-test arm-smmu-v3 driver functions. + + If unsure, say N. +endif + config S390_IOMMU def_bool y if S390 && PCI depends on S390 && PCI diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 54feb1ecccad..014a997753a8 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -3,3 +3,5 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o arm_smmu_v3-objs-y += arm-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) + +obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c new file mode 100644 index 000000000000..dcca393a8450 --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -0,0 +1,423 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2024 Google LLC. + */ +#include <kunit/test.h> +#include <linux/io-pgtable.h> + +#include "arm-smmu-v3.h" + +struct arm_smmu_test_writer { + struct arm_smmu_entry_writer writer; + struct kunit *test; + const __le64 *init_entry; + const __le64 *target_entry; + __le64 *entry; + + bool invalid_entry_written; + unsigned int num_syncs; +}; + +static struct arm_smmu_ste bypass_ste; +static struct arm_smmu_ste abort_ste; + +static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry, + const __le64 *used_bits, + const __le64 *target, + unsigned int length) +{ + bool differs = false; + unsigned int i; + + for (i = 0; i < length; i++) { + if ((entry[i] & used_bits[i]) != target[i]) + differs = true; + } + return differs; +} + +static void +arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer) +{ + struct arm_smmu_test_writer *test_writer = + container_of(writer, struct arm_smmu_test_writer, writer); + __le64 *entry_used_bits; + unsigned int num_entry_qwords = writer->ops->num_entry_qwords; + + entry_used_bits = kunit_kzalloc( + test_writer->test, sizeof(*entry_used_bits) * num_entry_qwords, + GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits); + + pr_debug("STE value is now set to: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, + test_writer->entry, + num_entry_qwords * sizeof(*test_writer->entry), + false); + + test_writer->num_syncs += 1; + if (!(test_writer->entry[0] & writer->ops->v_bit)) { + test_writer->invalid_entry_written = true; + } else { + /* + * At any stage in a hitless transition, the entry must be + * equivalent to either the initial entry or the target entry + * when only considering the bits used by the current + * configuration. 
+ */ + writer->ops->get_used(test_writer->entry, entry_used_bits); + KUNIT_EXPECT_FALSE( + test_writer->test, + arm_smmu_entry_differs_in_used_bits( + test_writer->entry, entry_used_bits, + test_writer->init_entry, num_entry_qwords) && + arm_smmu_entry_differs_in_used_bits( + test_writer->entry, entry_used_bits, + test_writer->target_entry, + num_entry_qwords)); + } +} + +static void +arm_smmu_v3_test_ste_debug_print_used_bits(struct arm_smmu_entry_writer *writer, + const struct arm_smmu_ste *ste) +{ + struct arm_smmu_ste used_bits = {}; + + arm_smmu_get_ste_used(ste->data, used_bits.data); + pr_debug("STE used bits: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, used_bits.data, + sizeof(used_bits), false); +} + +static const struct arm_smmu_entry_writer_ops test_ops = { + .v_bit = cpu_to_le64(STRTAB_STE_0_V), + .num_entry_qwords = sizeof(struct arm_smmu_ste) / sizeof(u64), + .sync = arm_smmu_test_writer_record_syncs, + .get_used = arm_smmu_get_ste_used, +}; + +static void arm_smmu_v3_test_ste_expect_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, int num_syncs_expected, bool hitless) +{ + struct arm_smmu_ste cur_copy = *cur; + struct arm_smmu_test_writer test_writer = { + .writer = { + .ops = &test_ops, + }, + .test = test, + .init_entry = cur->data, + .target_entry = target->data, + .entry = cur_copy.data, + .num_syncs = 0, + .invalid_entry_written = false, + + }; + + pr_debug("STE initial value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data, + sizeof(cur_copy), false); + arm_smmu_v3_test_ste_debug_print_used_bits(&test_writer.writer, cur); + pr_debug("STE target value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, target->data, + sizeof(cur_copy), false); + arm_smmu_v3_test_ste_debug_print_used_bits(&test_writer.writer, target); + + arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data); + + KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless); + KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected); + KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); +} + +static void arm_smmu_v3_test_ste_expect_non_hitless_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, int num_syncs_expected) +{ + arm_smmu_v3_test_ste_expect_transition(test, cur, target, + num_syncs_expected, false); +} + +static void arm_smmu_v3_test_ste_expect_hitless_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, int num_syncs_expected) +{ + arm_smmu_v3_test_ste_expect_transition(test, cur, target, + num_syncs_expected, true); +} + +static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0; + +static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, + unsigned int s1dss, + const dma_addr_t dma_addr) +{ + struct arm_smmu_device smmu = { .features = ARM_SMMU_FEAT_STALLS }; + struct arm_smmu_master master = { + .cd_table.cdtab_dma = dma_addr, + .cd_table.s1cdmax = 0xFF, + .cd_table.s1fmt = STRTAB_STE_0_S1FMT_64K_L2, + .smmu = &smmu, + }; + + arm_smmu_make_cdtable_ste(ste, &master, true, s1dss); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) +{ + /* + * Bypass STEs has used bits in the first two Qwords, while abort STEs + * only have used bits in the first QWord. 
Transitioning from bypass to + * abort requires two syncs: the first to set the first qword and make + * the STE into an abort, the second to clean up the second qword. + */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &abort_ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_abort_to_bypass(struct kunit *test) +{ + /* + * Transitioning from abort to bypass also requires two syncs: the first + * to set the second qword data required by the bypass STE, and the + * second to set the first qword and switch to bypass. + */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &abort_ste, &bypass_ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &abort_ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &abort_ste, &ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &bypass_ste, + /*num_syncs_expected=*/3); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &ste, + /*num_syncs_expected=*/3); +} + +static void arm_smmu_v3_write_ste_test_cdtable_s1dss_change(struct kunit *test) +{ + struct arm_smmu_ste ste; + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + + /* + * Flipping s1dss on a CD table STE only involves changes to the second + * qword of an STE and can be done in a single write. 
+ */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &s1dss_bypass, + /*num_syncs_expected=*/1); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s1dss_bypass, &ste, + /*num_syncs_expected=*/1); +} + +static void +arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass(struct kunit *test) +{ + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s1dss_bypass, &bypass_ste, + /*num_syncs_expected=*/2); +} + +static void +arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass(struct kunit *test) +{ + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &s1dss_bypass, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, + bool ats_enabled) +{ + struct arm_smmu_device smmu = { .features = ARM_SMMU_FEAT_STALLS }; + struct arm_smmu_master master = { + .smmu = &smmu, + }; + struct io_pgtable io_pgtable = {}; + struct arm_smmu_domain smmu_domain = { + .pgtbl_ops = &io_pgtable.ops, + }; + + io_pgtable.cfg.arm_lpae_s2_cfg.vttbr = 0xdaedbeefdeadbeefULL; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.ps = 1; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tg = 2; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sh = 3; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.orgn = 1; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.irgn = 2; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4; + + arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain, ats_enabled); +} + +static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &abort_ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_abort_to_s2(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &abort_ste, &ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_s2_to_bypass(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &bypass_ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &ste, + /*num_syncs_expected=*/2); +} + +static void arm_smmu_v3_write_ste_test_s1_to_s2(struct kunit *test) +{ + struct arm_smmu_ste s1_ste; + struct arm_smmu_ste s2_ste; + + arm_smmu_test_make_cdtable_ste(&s1_ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_s2_ste(&s2_ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s1_ste, &s2_ste, + /*num_syncs_expected=*/3); +} + +static void arm_smmu_v3_write_ste_test_s2_to_s1(struct kunit *test) +{ + struct arm_smmu_ste s1_ste; + struct arm_smmu_ste s2_ste; + + arm_smmu_test_make_cdtable_ste(&s1_ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_s2_ste(&s2_ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s2_ste, &s1_ste, + /*num_syncs_expected=*/3); +} + +static void 
arm_smmu_v3_write_ste_test_non_hitless(struct kunit *test) +{ + struct arm_smmu_ste ste; + struct arm_smmu_ste ste_2; + + /* + * Although no flow resembles this in practice, one way to force an STE + * update to be non-hitless is to change its CD table pointer as well as + * s1 dss field in the same update. + */ + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste_2, STRTAB_STE_1_S1DSS_BYPASS, + 0x4B4B4b4B4B); + arm_smmu_v3_test_ste_expect_non_hitless_transition( + test, &ste, &ste_2, + /*num_syncs_expected=*/3); +} + +static struct kunit_case arm_smmu_v3_test_cases[] = { + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_s1dss_change), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s1_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_s1), + KUNIT_CASE(arm_smmu_v3_write_ste_test_non_hitless), + {}, +}; + +static int arm_smmu_v3_test_suite_init(struct kunit_suite *test) +{ + arm_smmu_make_bypass_ste(&bypass_ste); + arm_smmu_make_abort_ste(&abort_ste); + return 0; +} + +static struct kunit_suite arm_smmu_v3_test_module = { + .name = "arm-smmu-v3-kunit-test", + .suite_init = arm_smmu_v3_test_suite_init, + .test_cases = arm_smmu_v3_test_cases, +}; +kunit_test_suites(&arm_smmu_v3_test_module); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 79292a57804c..e5405d95979e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -109,19 +109,6 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
-struct arm_smmu_entry_writer_ops; -struct arm_smmu_entry_writer { - const struct arm_smmu_entry_writer_ops *ops; - struct arm_smmu_master *master; -}; - -struct arm_smmu_entry_writer_ops { - unsigned int num_entry_qwords; - __le64 v_bit; - void (*get_used)(const __le64 *entry, __le64 *used); - void (*sync)(struct arm_smmu_entry_writer *writer); -}; - #define NUM_ENTRY_QWORDS \ (max(sizeof(struct arm_smmu_ste), sizeof(struct arm_smmu_cd)) / \ sizeof(u64)) @@ -1193,7 +1180,7 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused * would be nice if this was complete according to the spec, but minimally it * has to capture the bits this driver uses. */ -static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) { unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
@@ -1323,8 +1310,8 @@ static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry, * V=0 process. This relies on the IGNORED behavior described in the * specification. */ -static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, - __le64 *entry, const __le64 *target) +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry, + const __le64 *target) { unsigned int num_entry_qwords = writer->ops->num_entry_qwords; __le64 unused_update[NUM_ENTRY_QWORDS]; @@ -1713,7 +1700,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, } }
-static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1721,7 +1708,7 @@ static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); }
-static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) +void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1731,9 +1718,9 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); }
-static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, - bool ats_enabled, unsigned int s1dss) +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, bool ats_enabled, + unsigned int s1dss) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1786,10 +1773,10 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, } }
-static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain, - bool ats_enabled) +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, + bool ats_enabled) { const struct io_pgtable_cfg *pgtbl_cfg = &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 9e6e4a47b2d2..35154a4d68d1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -839,6 +839,34 @@ struct arm_smmu_master_domain { u8 nested_parent; };
+/* The following are exposed for testing purposes. */ +struct arm_smmu_entry_writer_ops; +struct arm_smmu_entry_writer { + const struct arm_smmu_entry_writer_ops *ops; + struct arm_smmu_master *master; +}; + +struct arm_smmu_entry_writer_ops { + unsigned int num_entry_qwords; + __le64 v_bit; + void (*get_used)(const __le64 *entry, __le64 *used); + void (*sync)(struct arm_smmu_entry_writer *writer); +}; + +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits); +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur, + const __le64 *target); + +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); +void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target); +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, bool ats_enabled, + unsigned int s1dss); +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, + bool ats_enabled); + static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain);
From: Jason Gunthorpe jgg@nvidia.com
Since v5.12 the rbtree has gained some simplifying helpers aimed at letting rbtree users write less convoluted boilerplate code. Instead, the caller provides a single comparison function and the helpers generate the previously open-coded logic.
Update smmu->streams to use rb_find_add() and rb_find().
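For reference, the pattern these helpers enable looks like this on a toy structure (an illustrative sketch, not part of the patch; the demo_* names are made up):

#include <linux/rbtree.h>

struct demo_node {
	struct rb_node node;
	u32 key;
};

/* Compare a search key (lhs) against a tree node (rhs). */
static int demo_cmp_key(const void *lhs, const struct rb_node *rhs)
{
	const u32 *key = lhs;
	const struct demo_node *dn = rb_entry(rhs, struct demo_node, node);

	if (*key < dn->key)
		return -1;
	if (*key > dn->key)
		return 1;
	return 0;
}

/* Compare two tree nodes by reusing the key comparator. */
static int demo_cmp_node(struct rb_node *lhs, const struct rb_node *rhs)
{
	return demo_cmp_key(&rb_entry(lhs, struct demo_node, node)->key, rhs);
}

/* rb_find_add() returns the colliding node, or NULL after inserting. */
static struct demo_node *demo_insert(struct rb_root *root, struct demo_node *new)
{
	struct rb_node *dup = rb_find_add(&new->node, root, demo_cmp_node);

	return dup ? rb_entry(dup, struct demo_node, node) : NULL;
}

/* rb_find() walks the tree using only the key comparator. */
static struct demo_node *demo_lookup(struct rb_root *root, u32 key)
{
	struct rb_node *node = rb_find(&key, root, demo_cmp_key);

	return node ? rb_entry(node, struct demo_node, node) : NULL;
}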
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 68 ++++++++++----------- 1 file changed, 31 insertions(+), 37 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e5405d95979e..afec6d45fbf2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1885,26 +1885,37 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return 0; }
+static int arm_smmu_streams_cmp_key(const void *lhs, const struct rb_node *rhs) +{ + struct arm_smmu_stream *stream_rhs = + rb_entry(rhs, struct arm_smmu_stream, node); + const u32 *sid_lhs = lhs; + + if (*sid_lhs < stream_rhs->id) + return -1; + if (*sid_lhs > stream_rhs->id) + return 1; + return 0; +} + +static int arm_smmu_streams_cmp_node(struct rb_node *lhs, + const struct rb_node *rhs) +{ + return arm_smmu_streams_cmp_key( + &rb_entry(lhs, struct arm_smmu_stream, node)->id, rhs); +} + static struct arm_smmu_master * arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid) { struct rb_node *node; - struct arm_smmu_stream *stream;
lockdep_assert_held(&smmu->streams_mutex);
- node = smmu->streams.rb_node; - while (node) { - stream = rb_entry(node, struct arm_smmu_stream, node); - if (stream->id < sid) - node = node->rb_right; - else if (stream->id > sid) - node = node->rb_left; - else - return stream->master; - } - - return NULL; + node = rb_find(&sid, &smmu->streams, arm_smmu_streams_cmp_key); + if (!node) + return NULL; + return rb_entry(node, struct arm_smmu_stream, node)->master; }
/* IRQ and event handlers */ @@ -3550,8 +3561,6 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu, { int i; int ret = 0; - struct arm_smmu_stream *new_stream, *cur_stream; - struct rb_node **new_node, *parent_node = NULL; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams), @@ -3562,9 +3571,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
mutex_lock(&smmu->streams_mutex); for (i = 0; i < fwspec->num_ids; i++) { + struct arm_smmu_stream *new_stream = &master->streams[i]; u32 sid = fwspec->ids[i];
- new_stream = &master->streams[i]; new_stream->id = sid; new_stream->master = master;
@@ -3573,28 +3582,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu, break;
/* Insert into SID tree */ - new_node = &(smmu->streams.rb_node); - while (*new_node) { - cur_stream = rb_entry(*new_node, struct arm_smmu_stream, - node); - parent_node = *new_node; - if (cur_stream->id > new_stream->id) { - new_node = &((*new_node)->rb_left); - } else if (cur_stream->id < new_stream->id) { - new_node = &((*new_node)->rb_right); - } else { - dev_warn(master->dev, - "stream %u already in tree\n", - cur_stream->id); - ret = -EINVAL; - break; - } - } - if (ret) + if (rb_find_add(&new_stream->node, &smmu->streams, + arm_smmu_streams_cmp_node)) { + dev_warn(master->dev, "stream %u already in tree\n", + sid); + ret = -EINVAL; break; - - rb_link_node(&new_stream->node, parent_node, new_node); - rb_insert_color(&new_stream->node, &smmu->streams); + } }
if (ret) {
From: Robin Murphy robin.murphy@arm.com
Currently, iommu-dma is the only place outside of IOMMUFD and drivers which might need to be aware of the stage 2 domain encapsulated within a nested domain. This would be in the legacy-VFIO-style case where we're using host-managed MSIs with an identity mapping at stage 1, where it is the underlying stage 2 domain which owns an MSI cookie and holds the corresponding dynamic mappings. Hook up the new op to resolve what we need from a nested domain.
Signed-off-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/dma-iommu.c | 18 ++++++++++++++++-- include/linux/iommu.h | 4 ++++ 2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 037fcf826407..f6d904cdcf97 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1796,6 +1796,20 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; }
+/* + * Nested domains may not have an MSI cookie or accept mappings, but they may + * be related to a domain which does, so we let them tell us what they need. + */ +static struct iommu_domain *iommu_dma_get_msi_mapping_domain(struct device *dev) +{ + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + + if (domain && domain->type == IOMMU_DOMAIN_NESTED && + domain->ops->get_msi_mapping_domain) + domain = domain->ops->get_msi_mapping_domain(domain); + return domain; +} + /** * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain * @desc: MSI descriptor, will store the MSI page @@ -1806,7 +1820,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { struct device *dev = msi_desc_to_dev(desc); - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev); struct iommu_dma_msi_page *msi_page; static DEFINE_MUTEX(msi_prepare_lock); /* see below */
@@ -1839,7 +1853,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) { struct device *dev = msi_desc_to_dev(desc); - const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev); const struct iommu_dma_msi_page *msi_page;
msi_page = msi_desc_get_iommu_cookie(desc); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 438b1ed02c6c..dd6b261eccde 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -645,6 +645,8 @@ struct iommu_ops { * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap * @free: Release the domain after use. + * @get_msi_mapping_domain: Return the related iommu_domain that should hold the + * MSI cookie and accept mapping(s). */ struct iommu_domain_ops { int (*attach_dev)(struct iommu_domain *domain, struct device *dev); @@ -689,6 +691,8 @@ struct iommu_domain_ops { unsigned long *bitmap, unsigned long base_iova, unsigned long bitmap_pgshift); void (*free)(struct iommu_domain *domain); + struct iommu_domain * + (*get_msi_mapping_domain)(struct iommu_domain *domain); };
/**
From: Nicolin Chen nicolinc@nvidia.com
In a 1-stage translation setup, a device is attached to a paging domain. In a 2-stage translation setup, a device is attached to a nested domain, which does not itself hold the mappings for the MSI page; it only has an s2_parent pointer to the paging domain that does.
Add arm_smmu_get_msi_mapping_domain in arm_smmu_nested_ops to return the correct paging domain.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index afec6d45fbf2..41439d817fae 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3328,9 +3328,19 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain) kfree(container_of(domain, struct arm_smmu_nested_domain, domain)); }
+static struct iommu_domain * +arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain) +{ + struct arm_smmu_nested_domain *nested_domain = + container_of(domain, struct arm_smmu_nested_domain, domain); + + return &nested_domain->s2_parent->domain; +} + static const struct iommu_domain_ops arm_smmu_nested_ops = { .attach_dev = arm_smmu_attach_dev_nested, .free = arm_smmu_domain_nested_free, + .get_msi_mapping_domain = arm_smmu_get_msi_mapping_domain, };
static struct iommu_domain *
From: Nicolin Chen nicolinc@nvidia.com
With a nested translation setup, a stage-1 Context Descriptor table can be managed by a guest OS in user space. So the kernel driver must not assume that the guest will only run device drivers that never issue TLBI_NH_VAA and TLBI_NH_ALL commands.
Add them to arm_smmu_cmdq_build_cmd(), to prepare for supporting these two TLBI invalidation requests from the guest.
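As an illustrative usage sketch (not part of the patch): once arm_smmu_cmdq_build_cmd() understands the opcode, a caller inside the driver could build a TLBI_NH_VAA command through the existing arm_smmu_cmdq_ent machinery roughly as below. The vmid/iova values are caller-supplied, and this assumes the driver's static arm_smmu_cmdq_issue_cmd_with_sync() helper is visible at the call site.

/* Invalidate one leaf VA, for all ASIDs under the given VMID. */
static void demo_tlbi_nh_vaa(struct arm_smmu_device *smmu, u16 vmid, u64 iova)
{
	struct arm_smmu_cmdq_ent cmd = {
		.opcode	= CMDQ_OP_TLBI_NH_VAA,
		.tlbi	= {
			.vmid	= vmid,
			.addr	= iova,
			.leaf	= true,
		},
	};

	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
}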
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 +++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 41439d817fae..00c2654f3253 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -350,8 +350,17 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31); break; case CMDQ_OP_TLBI_NH_VA: - cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); fallthrough; + case CMDQ_OP_TLBI_NH_VAA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg); + cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK; + break; case CMDQ_OP_TLBI_EL2_VA: cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num); cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale); @@ -373,6 +382,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) case CMDQ_OP_TLBI_NH_ASID: cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); fallthrough; + case CMDQ_OP_TLBI_NH_ALL: case CMDQ_OP_TLBI_S12_VMALL: cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); break; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 35154a4d68d1..6856dde2273d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -508,8 +508,10 @@ struct arm_smmu_cmdq_ent { }; } cfgi;
+ #define CMDQ_OP_TLBI_NH_ALL 0x10 #define CMDQ_OP_TLBI_NH_ASID 0x11 #define CMDQ_OP_TLBI_NH_VA 0x12 + #define CMDQ_OP_TLBI_NH_VAA 0x13 #define CMDQ_OP_TLBI_EL2_ALL 0x20 #define CMDQ_OP_TLBI_EL2_ASID 0x21 #define CMDQ_OP_TLBI_EL2_VA 0x22
From: Nicolin Chen nicolinc@nvidia.com
Add a queue_ncmds() helper to return the number of pending commands in the queue. The following patch will use it to kcalloc memory for the user CMDQ.
Most of this helper's logic was already open-coded in the existing queue_has_space(), so simplify the latter on top of it.
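To make the wrap handling concrete, here is a standalone rendering of the arithmetic (illustration only, with the Q_IDX()/Q_WRP() extraction already done; demo_ncmds is a made-up name):

/* Assume an 8-entry queue, i.e. max_n_shift == 3. */
static unsigned int demo_ncmds(unsigned int prod_idx, bool prod_wrp,
			       unsigned int cons_idx, bool cons_wrp)
{
	if (prod_wrp == cons_wrp)
		return prod_idx - cons_idx;	/* e.g. prod=5, cons=2 -> 3 */
	/* Producer has wrapped once more: e.g. prod=1, cons=6 -> 8 - 6 + 1 = 3 */
	return 8 - cons_idx + prod_idx;
}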
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 00c2654f3253..0c858ab70d30 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -162,17 +162,22 @@ static void parse_driver_options(struct arm_smmu_device *smmu) }
/* Low-level queue manipulation functions */ -static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n) +static int queue_ncmds(struct arm_smmu_ll_queue *q) { - u32 space, prod, cons; + u32 prod, cons;
prod = Q_IDX(q, q->prod); cons = Q_IDX(q, q->cons);
if (Q_WRP(q, q->prod) == Q_WRP(q, q->cons)) - space = (1 << q->max_n_shift) - (prod - cons); + return prod - cons; else - space = cons - prod; + return (1 << q->max_n_shift) - cons + prod; +} + +static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n) +{ + u32 space = (1 << q->max_n_shift) - queue_ncmds(q);
return space >= n; }
From: Nicolin Chen nicolinc@nvidia.com
In the SMMUv3 case, a nested_domain is per user Stream Table Entry. So user space must provide the user Stream ID together with the user Stream Table Entry in the user data, to link the user Stream ID to the corresponding physical Stream ID. The link will be used in the following patch to replace user Stream IDs in cache invalidation commands with physical Stream IDs.
Add a streams_user xarray to log all the links between user Stream IDs and physical Stream IDs. Add sid_user to struct arm_smmu_nested_domain, so its free op can unlink the user Stream ID and clean it up. Also, store the same ID as id_user in struct arm_smmu_stream to verify the ledger, since the master pointer is no longer available in arm_smmu_domain_nested_free().
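The fixed-limit xa_alloc() idiom used for this ledger either claims exactly the requested ID or fails if it is already taken; a minimal sketch (the demo_* name is illustrative):

/*
 * XA_LIMIT(sid, sid) pins the allocation to a single ID, so xa_alloc()
 * returns -EBUSY if another nested domain already claimed this user SID.
 */
static int demo_claim_user_sid(struct xarray *xa, u32 sid, void *entry)
{
	u32 id = sid;

	return xa_alloc(xa, &id, entry, XA_LIMIT(sid, sid), GFP_KERNEL_ACCOUNT);
}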
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 ++++++++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ++ include/uapi/linux/iommufd.h | 2 ++ 3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0c858ab70d30..4ddb6d48dde7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3340,7 +3340,30 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
 {
-	kfree(container_of(domain, struct arm_smmu_nested_domain, domain));
+	struct arm_smmu_nested_domain *nested_domain =
+		container_of(domain, struct arm_smmu_nested_domain, domain);
+	u32 id_user = nested_domain->sid_user;
+
+	if (!WARN_ON(!id_user)) {
+		struct arm_smmu_device *smmu = nested_domain->s2_parent->smmu;
+		struct arm_smmu_stream *stream;
+
+		xa_lock(&smmu->streams_user);
+		stream = __xa_erase(&smmu->streams_user, id_user);
+		/*
+		 * A mismatched id_user is unlikely to happen, but restore the
+		 * stream back to the xarray for its actual owner to carry on.
+		 */
+		if (unlikely(stream->id_user != id_user)) {
+			WARN_ON(__xa_alloc(&smmu->streams_user, &id_user,
+					   stream, XA_LIMIT(id_user, id_user),
+					   GFP_KERNEL_ACCOUNT));
+		} else {
+			stream->id_user = 0;
+		}
+		xa_unlock(&smmu->streams_user);
+	}
+
+	kfree(nested_domain);
 }
static struct iommu_domain * @@ -3373,6 +3396,9 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING)) return ERR_PTR(-EOPNOTSUPP);
+ if (master->streams[0].id_user) + return ERR_PTR(-EEXIST); + ret = iommu_copy_struct_from_user(&arg, user_data, IOMMU_HWPT_DATA_ARM_SMMUV3, ste); if (ret) @@ -3403,12 +3429,22 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags, if (!nested_domain) return ERR_PTR(-ENOMEM);
+ ret = xa_alloc(&smmu_parent->smmu->streams_user, &arg.sid, + &master->streams[0], XA_LIMIT(arg.sid, arg.sid), + GFP_KERNEL_ACCOUNT); + if (ret) { + kfree(nested_domain); + return ERR_PTR(ret); + } + master->streams[0].id_user = arg.sid; + nested_domain->domain.type = IOMMU_DOMAIN_NESTED; nested_domain->domain.ops = &arm_smmu_nested_ops; nested_domain->s2_parent = smmu_parent; nested_domain->enable_ats = eats == STRTAB_STE_1_EATS_TRANS; nested_domain->ste[0] = arg.ste[0]; nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS); + nested_domain->sid_user = arg.sid;
return &nested_domain->domain; } @@ -4358,6 +4394,7 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
mutex_init(&smmu->streams_mutex); smmu->streams = RB_ROOT; + xa_init_flags(&smmu->streams_user, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
ret = arm_smmu_init_queues(smmu); if (ret) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 6856dde2273d..c68779303b0d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -772,10 +772,12 @@ struct arm_smmu_device { struct mutex streams_mutex;
bool bypass; + struct xarray streams_user; };
struct arm_smmu_stream { u32 id; + u32 id_user; struct arm_smmu_master *master; struct rb_node node; }; @@ -832,6 +834,7 @@ struct arm_smmu_nested_domain { u8 enable_ats : 1;
__le64 ste[2]; + u32 sid_user; };
struct arm_smmu_master_domain { diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 79b7d1558e11..cf13ab623b4a 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -403,11 +403,13 @@ struct iommu_hwpt_vtd_s1 { * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec) * - word-0: V, S1Fmt, S1ContextPtr, S1CDMax * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD + * @sid: The user space Stream ID to index the user Stream Table Entry @ste * * -EIO will be returned if @ste is not legal or contains any non-allowed field. */ struct iommu_hwpt_arm_smmuv3 { __aligned_le64 ste[2]; + __u32 sid; };
/**
From: Nicolin Chen nicolinc@nvidia.com
Add an arm_smmu_cache_invalidate_user() function for user space to invalidate TLB entries and Context Descriptors, since a stage-1 IO page table entry or a Context Descriptor table entry changed by user space may still be cached by the hardware.
Add struct iommu_hwpt_arm_smmuv3_invalidate defining the format of all the entries in an invalidation request array. This format is simply the native format of a 128-bit TLB/cache invalidation command. Each command is scanned against the supported command list, and its SID and VMID fields are fixed up by replacing the user values with the correct physical ones.
Also, a guest CMDQ might not be prepared by a Linux kernel driver, so the handler must take care of command sequences that could be impacted by a hardware erratum.
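The SID/VMID fixup boils down to masking out the user-provided field and re-encoding the physical value; a sketch of the idea for the VMID case (illustrative, not the patch's code):

/* cmd[] is host-endian here, after le64_to_cpu(), as in the fixup path. */
static void demo_fix_vmid(u64 *cmd, u16 phys_vmid)
{
	u16 user_vmid = FIELD_GET(CMDQ_TLBI_0_VMID, cmd[0]);

	pr_debug("replacing user VMID %u with %u\n", user_vmid, phys_vmid);
	cmd[0] &= ~CMDQ_TLBI_0_VMID;
	cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, phys_vmid);
}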
Co-developed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Eric Auger eric.auger@redhat.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 149 ++++++++++++++++++++ include/uapi/linux/iommufd.h | 20 +++ 2 files changed, 169 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 4ddb6d48dde7..ef0fe6d07e86 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3375,10 +3375,159 @@ arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain) return &nested_domain->s2_parent->domain; }
+static int arm_smmu_fix_user_cmd(struct arm_smmu_nested_domain *nested_domain, + u64 *cmd) +{ + struct arm_smmu_device *smmu = nested_domain->s2_parent->smmu; + struct arm_smmu_stream *stream; + + cmd[0] = le64_to_cpu(cmd[0]); + cmd[1] = le64_to_cpu(cmd[1]); + + switch (*cmd & CMDQ_0_OP) { + case CMDQ_OP_TLBI_NSNH_ALL: + *cmd &= ~CMDQ_0_OP; + *cmd |= CMDQ_OP_TLBI_NH_ALL; + fallthrough; + case CMDQ_OP_TLBI_NH_VA: + case CMDQ_OP_TLBI_NH_VAA: + case CMDQ_OP_TLBI_NH_ALL: + case CMDQ_OP_TLBI_NH_ASID: + *cmd &= ~CMDQ_TLBI_0_VMID; + *cmd |= FIELD_PREP(CMDQ_TLBI_0_VMID, + nested_domain->s2_parent->vmid); + break; + case CMDQ_OP_ATC_INV: + case CMDQ_OP_CFGI_CD: + case CMDQ_OP_CFGI_CD_ALL: + xa_lock(&smmu->streams_user); + stream = xa_load(&smmu->streams_user, + FIELD_GET(CMDQ_CFGI_0_SID, *cmd)); + xa_unlock(&smmu->streams_user); + if (!stream) + return -ENODEV; + *cmd &= ~CMDQ_CFGI_0_SID; + *cmd |= FIELD_PREP(CMDQ_CFGI_0_SID, stream->id); + break; + default: + return -EINVAL; + } + pr_debug("Fixed user CMD: %016llx : %016llx\n", cmd[1], cmd[0]); + + return 0; +} + +/* + * A helper function to detect if the current cmd hits Arm erratum before being + * added to a batch that might already contain a leaf TLBI or a CFGI command. + * + * If there is a hit, it will update the corresponding boolean flag, by assuming + * that the caller must issue the batch without this cmd. So, the two flags then + * will reflect the status of the new batch that contains the current cmd only. + */ +static bool arm_smmu_cmdq_hit_errata(struct arm_smmu_device *smmu, u64 *cmd, + bool *batch_has_leaf, bool *batch_has_cfgi) +{ + bool cmd_has_leaf; + + if (!(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) + return false; + + switch (*cmd & CMDQ_0_OP) { + case CMDQ_OP_CFGI_CD: + case CMDQ_OP_CFGI_CD_ALL: + *batch_has_cfgi = true; + break; + case CMDQ_OP_TLBI_NH_VA: + case CMDQ_OP_TLBI_NH_VAA: + cmd_has_leaf = FIELD_GET(CMDQ_TLBI_1_LEAF, cmd[1]); + /* Erratum 2812531 -- a non-leaf tlbi following a leaf tlbi */ + if (!cmd_has_leaf && *batch_has_leaf) { + *batch_has_leaf = false; + return true; + } + if (cmd_has_leaf) + *batch_has_leaf = true; + fallthrough; + case CMDQ_OP_TLBI_NH_ALL: + case CMDQ_OP_TLBI_NH_ASID: + /* Erratum 2812531 -- a tlbi following a cfgi */ + if (*batch_has_cfgi) { + *batch_has_cfgi = false; + return true; + } + break; + } + + return false; +} + +static int arm_smmu_cache_invalidate_user(struct iommu_domain *domain, + struct iommu_user_data_array *array) +{ + struct arm_smmu_nested_domain *nested_domain = + container_of(domain, struct arm_smmu_nested_domain, domain); + struct arm_smmu_device *smmu = nested_domain->s2_parent->smmu; + bool has_leaf = false, has_cfgi = false; + int data_idx, n = 0, ret; + u64 *cmds; + + if (!smmu) + return -EINVAL; + if (!array->entry_num) + return -EINVAL; + + cmds = kcalloc(array->entry_num, sizeof(*cmds) * 2, GFP_KERNEL); + if (!cmds) + return -ENOMEM; + + for (data_idx = 0; data_idx < array->entry_num; data_idx++) { + struct iommu_hwpt_arm_smmuv3_invalidate *inv = + (struct iommu_hwpt_arm_smmuv3_invalidate *)&cmds[n * 2]; + + ret = iommu_copy_struct_from_user_array(inv, array, + IOMMU_HWPT_DATA_ARM_SMMUV3, + data_idx, cmd); + if (ret) + goto out; + + ret = arm_smmu_fix_user_cmd(nested_domain, inv->cmd); + if (ret) + goto out; + + if (arm_smmu_cmdq_hit_errata(smmu, inv->cmd, &has_leaf, &has_cfgi)) { + /* WAR is to issue the batch prior, with a CMD_SYNC */ + ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmds, n, true); + if (ret) + goto out; + /* Then move the cmd to the head for a 
new batch */ + cmds[1] = cmds[n * 2 + 1]; + cmds[0] = cmds[n * 2]; + n = 1; + continue; + } + + if (++n == CMDQ_BATCH_ENTRIES - 1) { + ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmds, n, true); + if (ret) + goto out; + n = 0; + } + } + + if (n) + ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmds, n, true); +out: + array->entry_num = data_idx; + kfree(cmds); + return ret; +} + static const struct iommu_domain_ops arm_smmu_nested_ops = { .attach_dev = arm_smmu_attach_dev_nested, .free = arm_smmu_domain_nested_free, .get_msi_mapping_domain = arm_smmu_get_msi_mapping_domain, + .cache_invalidate_user = arm_smmu_cache_invalidate_user, };
static struct iommu_domain * diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index cf13ab623b4a..aa57a6195d89 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -721,6 +721,26 @@ struct iommu_hwpt_vtd_s1_invalidate { __u32 __reserved; };
+/**
+ * struct iommu_hwpt_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation
+ *                                           (IOMMU_HWPT_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ *       Must be little-endian.
+ *
+ * Supported command list:
+ *     CMDQ_OP_TLBI_NSNH_ALL
+ *     CMDQ_OP_TLBI_NH_VA
+ *     CMDQ_OP_TLBI_NH_VAA
+ *     CMDQ_OP_TLBI_NH_ALL
+ *     CMDQ_OP_TLBI_NH_ASID
+ *     CMDQ_OP_ATC_INV
+ *     CMDQ_OP_CFGI_CD
+ *     CMDQ_OP_CFGI_CD_ALL
+ */
+struct iommu_hwpt_arm_smmuv3_invalidate {
+	__aligned_u64 cmd[2];
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
From: Nicolin Chen nicolinc@nvidia.com
We should likely figure out something else to enable ATS.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
Conflicts: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ef0fe6d07e86..2bbf5304198b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3265,10 +3265,15 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct device *dev) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_ste ste;
arm_smmu_make_bypass_ste(&ste); arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS); + if (dev_is_pci(master->dev) && master->ssid_bits && !master->ats_enabled) { + master->ats_enabled = true; + arm_smmu_enable_ats(master); + } return 0; }
From: Liu Yi L yi.l.liu@intel.com
Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the PF PASID configuration is shared by its VFs. VFs must not implement their own PASID Capability.
Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If the PF implements PRI, it is shared by the VFs.
On bare metal, this has already been fixed. The efforts related to PASID/PRI are: https://lkml.org/lkml/2019/9/5/996 https://lkml.org/lkml/2019/9/5/995
This patch adds emulated PASID/PRI capabilities for VFs when assigned to VMs via the vfio-pci driver. This is required for enabling vSVA on pass-through VFs, as VFs have no PASID/PRI capability structure in their configuration space.
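For context, locating the PF capability that the emulation mirrors uses only standard PCI core helpers; a minimal sketch for the PASID case (the demo_* name is illustrative):

/* Read the PF's PASID capability register on behalf of a VF. */
static int demo_read_pf_pasid_cap(struct pci_dev *vf_pdev, u16 *cap_reg)
{
	struct pci_dev *pf = pci_physfn(vf_pdev);
	int pos = pci_find_ext_capability(pf, PCI_EXT_CAP_ID_PASID);

	if (!pos)
		return -ENODEV;	/* PF implements no PASID capability */

	return pci_read_config_word(pf, pos + PCI_PASID_CAP, cap_reg);
}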
Cc: Kevin Tian kevin.tian@intel.com CC: Jacob Pan jacob.jun.pan@linux.intel.com Cc: Alex Williamson alex.williamson@redhat.com Cc: Eric Auger eric.auger@redhat.com Cc: Jean-Philippe Brucker jean-philippe@linaro.org Signed-off-by: Liu Yi L yi.l.liu@intel.com [Shameer- Rebased] Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/vfio/pci/vfio_pci_config.c | 325 ++++++++++++++++++++++++++++- 1 file changed, 323 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index 7e2e62ab0869..cfa15c033081 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1606,11 +1606,304 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev) return 0; }
+static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_core_device *vdev, + int offset, uint8_t *data, int size) +{ + int ret = 0, data_offset = 0; + + while (size) { + int filled; + + if (size >= 4 && !(offset % 4)) { + __le32 *dwordp = (__le32 *)&vdev->vconfig[offset]; + u32 dword; + + memcpy(&dword, data + data_offset, 4); + *dwordp = cpu_to_le32(dword); + filled = 4; + } else if (size >= 2 && !(offset % 2)) { + __le16 *wordp = (__le16 *)&vdev->vconfig[offset]; + u16 word; + + memcpy(&word, data + data_offset, 2); + *wordp = cpu_to_le16(word); + filled = 2; + } else { + u8 *byte = &vdev->vconfig[offset]; + + memcpy(byte, data + data_offset, 1); + filled = 1; + } + + offset += filled; + data_offset += filled; + size -= filled; + } + + return ret; +} + +static int vfio_pci_get_ecap_content(struct pci_dev *pdev, + int cap, int cap_len, u8 *content) +{ + int pos, offset, len = cap_len, ret = 0; + + pos = pci_find_ext_capability(pdev, cap); + if (!pos) + return -EINVAL; + + offset = 0; + while (len) { + int fetched; + + if (len >= 4 && !(pos % 4)) { + u32 *dwordp = (u32 *) (content + offset); + u32 dword; + __le32 *dwptr = (__le32 *) &dword; + + ret = pci_read_config_dword(pdev, pos, &dword); + if (ret) + return ret; + *dwordp = le32_to_cpu(*dwptr); + fetched = 4; + } else if (len >= 2 && !(pos % 2)) { + u16 *wordp = (u16 *) (content + offset); + u16 word; + __le16 *wptr = (__le16 *) &word; + + ret = pci_read_config_word(pdev, pos, &word); + if (ret) + return ret; + *wordp = le16_to_cpu(*wptr); + fetched = 2; + } else { + u8 *byte = (u8 *) (content + offset); + + ret = pci_read_config_byte(pdev, pos, byte); + if (ret) + return ret; + fetched = 1; + } + + pos += fetched; + offset += fetched; + len -= fetched; + } + + return ret; +} + +struct vfio_pci_pasid_cap_data { + u32 id:16; + u32 version:4; + u32 next:12; + union { + u16 cap_reg_val; + struct { + u16 rsv1:1; + u16 execs:1; + u16 prvs:1; + u16 rsv2:5; + u16 pasid_bits:5; + u16 rsv3:3; + }; + } cap_reg; + union { + u16 control_reg_val; + struct { + u16 paside:1; + u16 exece:1; + u16 prve:1; + u16 rsv4:13; + }; + } control_reg; +}; + +static int vfio_pci_add_pasid_cap(struct vfio_pci_core_device *vdev, + struct pci_dev *pdev, + u16 epos, u16 *next, __le32 **prevp) +{ + u8 *map = vdev->pci_config_map; + int ecap = PCI_EXT_CAP_ID_PASID; + int len = pci_ext_cap_length[ecap]; + struct vfio_pci_pasid_cap_data pasid_cap; + struct vfio_pci_pasid_cap_data vpasid_cap; + int ret; + + /* + * If no cap filled in this function, should make sure the next + * pointer points to current epos. + */ + *next = epos; + + if (!len) { + pr_info("%s: VF %s hiding PASID capability\n", + __func__, dev_name(&vdev->pdev->dev)); + ret = 0; + goto out; + } + + /* Add PASID capability*/ + ret = vfio_pci_get_ecap_content(pdev, ecap, + len, (u8 *)&pasid_cap); + if (ret) + goto out; + + if (!pasid_cap.control_reg.paside) { + pr_debug("%s: its PF's PASID capability is not enabled\n", + dev_name(&vdev->pdev->dev)); + ret = 0; + goto out; + } + + memcpy(&vpasid_cap, &pasid_cap, len); + + vpasid_cap.id = 0x18; + vpasid_cap.next = 0; + /* clear the control reg for guest */ + memset(&vpasid_cap.control_reg, 0x0, + sizeof(vpasid_cap.control_reg)); + + memset(map + epos, vpasid_cap.id, len); + ret = vfio_fill_custom_vconfig_bytes(vdev, epos, + (u8 *)&vpasid_cap, len); + if (!ret) { + /* + * Successfully filled in PASID cap, update + * the next offset in previous cap header, + * and also update caller about the offset + * of next cap if any. 
+ */ + u32 val = epos; + **prevp &= cpu_to_le32(~(0xffcU << 20)); + **prevp |= cpu_to_le32(val << 20); + *prevp = (__le32 *)&vdev->vconfig[epos]; + *next = epos + len; + } + +out: + return ret; +} + +struct vfio_pci_pri_cap_data { + u32 id:16; + u32 version:4; + u32 next:12; + union { + u16 control_reg_val; + struct { + u16 enable:1; + u16 reset:1; + u16 rsv1:14; + }; + } control_reg; + union { + u16 status_reg_val; + struct { + u16 rf:1; + u16 uprgi:1; + u16 rsv2:6; + u16 stop:1; + u16 rsv3:6; + u16 pasid_required:1; + }; + } status_reg; + u32 prq_capacity; + u32 prq_quota; +}; + +static int vfio_pci_add_pri_cap(struct vfio_pci_core_device *vdev, + struct pci_dev *pdev, + u16 epos, u16 *next, __le32 **prevp) +{ + u8 *map = vdev->pci_config_map; + int ecap = PCI_EXT_CAP_ID_PRI; + int len = pci_ext_cap_length[ecap]; + struct vfio_pci_pri_cap_data pri_cap; + struct vfio_pci_pri_cap_data vpri_cap; + int ret; + + /* + * If no cap filled in this function, should make sure the next + * pointer points to current epos. + */ + *next = epos; + + if (!len) { + pr_info("%s: VF %s hiding PRI capability\n", + __func__, dev_name(&vdev->pdev->dev)); + ret = 0; + goto out; + } + + /* Add PASID capability*/ + ret = vfio_pci_get_ecap_content(pdev, ecap, + len, (u8 *)&pri_cap); + if (ret) + goto out; + + if (!pri_cap.control_reg.enable) { + pr_debug("%s: its PF's PRI capability is not enabled\n", + dev_name(&vdev->pdev->dev)); + ret = 0; + goto out; + } + + memcpy(&vpri_cap, &pri_cap, len); + + vpri_cap.id = 0x19; + vpri_cap.next = 0; + /* clear the control reg for guest */ + memset(&vpri_cap.control_reg, 0x0, + sizeof(vpri_cap.control_reg)); + + memset(map + epos, vpri_cap.id, len); + ret = vfio_fill_custom_vconfig_bytes(vdev, epos, + (u8 *)&vpri_cap, len); + if (!ret) { + /* + * Successfully filled in PASID cap, update + * the next offset in previous cap header, + * and also update caller about the offset + * of next cap if any. + */ + u32 val = epos; + **prevp &= cpu_to_le32(~(0xffcU << 20)); + **prevp |= cpu_to_le32(val << 20); + *prevp = (__le32 *)&vdev->vconfig[epos]; + *next = epos + len; + } + +out: + return ret; +} + +static int vfio_pci_add_emulated_cap_for_vf(struct vfio_pci_core_device *vdev, + struct pci_dev *pdev, u16 start_epos, __le32 *prev) +{ + __le32 *__prev = prev; + u16 epos = start_epos, epos_next = start_epos; + int ret = 0; + + /* Add PASID capability*/ + ret = vfio_pci_add_pasid_cap(vdev, pdev, epos, + &epos_next, &__prev); + if (ret) + return ret; + + /* Add PRI capability */ + epos = epos_next; + ret = vfio_pci_add_pri_cap(vdev, pdev, epos, + &epos_next, &__prev); + + return ret; +} + static int vfio_ecap_init(struct vfio_pci_core_device *vdev) { struct pci_dev *pdev = vdev->pdev; u8 *map = vdev->pci_config_map; - u16 epos; + u16 epos, epos_max; __le32 *prev = NULL; int loops, ret, ecaps = 0;
@@ -1618,6 +1911,7 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev) return 0;
epos = PCI_CFG_SPACE_SIZE; + epos_max = PCI_CFG_SPACE_SIZE;
loops = (pdev->cfg_size - PCI_CFG_SPACE_SIZE) / PCI_CAP_SIZEOF;
@@ -1642,6 +1936,9 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev) } }
+ if (epos_max <= (epos + len)) + epos_max = epos + len; + if (!len) { pci_dbg(pdev, "%s: hiding ecap %#x@%#x\n", __func__, ecap, epos); @@ -1701,6 +1998,18 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev) if (!ecaps) *(u32 *)&vdev->vconfig[PCI_CFG_SPACE_SIZE] = 0;
+#ifdef CONFIG_PCI_ATS + if (pdev->is_virtfn) { + struct pci_dev *physfn = pdev->physfn; + + ret = vfio_pci_add_emulated_cap_for_vf(vdev, + physfn, epos_max, prev); + if (ret) + pr_info("%s, failed to add special caps for VF %s\n", + __func__, dev_name(&vdev->pdev->dev)); + } +#endif + return 0; }
@@ -1859,6 +2168,17 @@ static size_t vfio_pci_cap_remaining_dword(struct vfio_pci_core_device *vdev, return i; }
+static bool vfio_pci_need_virt_perm(struct pci_dev *pdev, u8 cap_id) +{ +#ifdef CONFIG_PCI_ATS + return (pdev->is_virtfn && + (cap_id == PCI_EXT_CAP_ID_PASID || + cap_id == PCI_EXT_CAP_ID_PRI)); +#else + return false; +#endif +} + static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { @@ -1892,7 +2212,8 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user if (cap_id == PCI_CAP_ID_INVALID) { perm = &unassigned_perms; cap_start = *ppos; - } else if (cap_id == PCI_CAP_ID_INVALID_VIRT) { + } else if (cap_id == PCI_CAP_ID_INVALID_VIRT || + vfio_pci_need_virt_perm(pdev, cap_id)) { perm = &virt_perms; cap_start = *ppos; } else {
From: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
This was already spotted on the mailing list, and will be rectified by Liu Yi L in the next version of his series.
Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/vfio/pci/vfio_pci_config.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index cfa15c033081..d80a4d4aea43 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1758,7 +1758,6 @@ static int vfio_pci_add_pasid_cap(struct vfio_pci_core_device *vdev,
memcpy(&vpasid_cap, &pasid_cap, len);
- vpasid_cap.id = 0x18; vpasid_cap.next = 0; /* clear the control reg for guest */ memset(&vpasid_cap.control_reg, 0x0, @@ -1851,7 +1850,6 @@ static int vfio_pci_add_pri_cap(struct vfio_pci_core_device *vdev,
memcpy(&vpri_cap, &pri_cap, len);
- vpri_cap.id = 0x19; vpri_cap.next = 0; /* clear the control reg for guest */ memset(&vpri_cap.control_reg, 0x0,
From: Liu Yi L yi.l.liu@intel.com
This patch exposes the PCIe PASID capability to user space and indicates where to emulate this capability if user space wants to further expose it to a VM.
This patch only exposes the PASID capability for devices which have the PCIe PASID extended structure in their configuration space. For VFs, user space is still unable to see this capability, since the SR-IOV spec forbids a VF from implementing the PASID capability extended structure. That remains a TODO for the future. Related discussion can be found at the link below:
https://lore.kernel.org/kvm/20200407095801.648b1371@w520.home/
v7 -> v8: *) refine the commit message and the subject.
v5 -> v6: *) add review-by from Eric Auger.
v1 -> v2: *) added in v2, but it was sent in a separate patchseries before
Cc: Kevin Tian kevin.tian@intel.com CC: Jacob Pan jacob.jun.pan@linux.intel.com Cc: Alex Williamson alex.williamson@redhat.com Cc: Eric Auger eric.auger@redhat.com Cc: Jean-Philippe Brucker jean-philippe@linaro.org Cc: Joerg Roedel joro@8bytes.org Cc: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Liu Yi L yi.l.liu@intel.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/vfio/pci/vfio_pci_config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index d80a4d4aea43..333d74b8938a 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = { [PCI_EXT_CAP_ID_LTR] = PCI_EXT_CAP_LTR_SIZEOF, [PCI_EXT_CAP_ID_SECPCI] = 0, /* not yet */ [PCI_EXT_CAP_ID_PMUX] = 0, /* not yet */ - [PCI_EXT_CAP_ID_PASID] = 0, /* not yet */ + [PCI_EXT_CAP_ID_PASID] = PCI_EXT_CAP_PASID_SIZEOF, /* not yet */ [PCI_EXT_CAP_ID_DVSEC] = 0xFF, };
Feedback: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5048 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/Y...