From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 7da51af9125c624318c8099de13c5ddefd47e9e8
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As the comment in arm_smmu_write_strtab_ent() explains, this routine has been limited to only work correctly in certain scenarios that the caller must ensure. Generally the caller must put the STE into ABORT or BYPASS before attempting to program it to something else.
The iommu core APIs would ideally expect the driver to do a hitless change of iommu_domain in a number of cases:
- RESV_DIRECT support wants IDENTITY -> DMA -> IDENTITY to be hitless for the RESV ranges
- PASID upgrade has IDENTITY on the RID with no PASID, then a PASID paging domain is installed. The RID should not be impacted
- PASID downgrade has IDENTITY on the RID and all PASIDs removed. The RID should not be impacted
- RID does PAGING -> BLOCKING with active PASIDs, the PASIDs should not be impacted
- NESTING -> NESTING for carrying all the above hitless cases in a VM into the hypervisor. To comprehensively emulate the HW in a VM we should assume the VM OS is running logic like this and expecting hitless updates to be relayed to real HW.
For CD updates arm_smmu_write_ctx_desc() has a similar comment explaining how limited it is, and the driver does have a need for hitless CD updates:
- SMMUv3 BTM S1 ASID re-label
- SVA mm release should change the CD to answer not-present to all requests without allowing logging (EPD0)
The next patches/series are going to start removing some of this logic from the callers, and add more complex state combinations than exist today. At the end everything that can be hitless will be hitless, including all of the above.
Introduce arm_smmu_write_ste() which will run through the multi-qword programming sequence to avoid creating an incoherent 'torn' STE in the HW caches. It automatically detects which of two algorithms to use:
1) The disruptive V=0 update described in the spec, which disrupts the entry and does three syncs to make the change (a small stand-alone sketch of this sequence follows the list):
   - Write V=0 to QWORD 0
   - Write the entire STE except QWORD 0
   - Write QWORD 0
2) A hitless update algorithm that follows the same rationale the driver already uses. It is safe to change IGNORED bits that HW doesn't use:
   - Write the target value into all currently unused bits
   - Write a single QWORD; this makes the new STE live atomically
   - Ensure now unused bits are 0
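To make the disruptive path concrete, here is a minimal stand-alone sketch under stated assumptions: write_qword() and sync_entry() are hypothetical stand-ins for the 64-bit store into the live table and for arm_smmu_sync_ste_for_sid(), and the VALID bit position and qword count are invented for the demo rather than taken from the real STE layout.

#include <stdint.h>
#include <stdio.h>

#define NUM_QWORDS  8
#define VALID_BIT   (1ULL << 0)    /* invented bit position for the demo */

/* Hypothetical stand-in for a single 64-bit store into the live table */
static void write_qword(uint64_t *live, int i, uint64_t val)
{
    live[i] = val;
}

/* Hypothetical stand-in for arm_smmu_sync_ste_for_sid() */
static void sync_entry(void)
{
    printf("sync\n");
}

static void disruptive_update(uint64_t *live, const uint64_t *target)
{
    int i;

    /* 1) Disrupt the entry: clear V so the HW stops using the stale STE */
    write_qword(live, 0, live[0] & ~VALID_BIT);
    sync_entry();

    /* 2) Fill every qword except qword 0, which still carries V=0 */
    for (i = 1; i < NUM_QWORDS; i++)
        write_qword(live, i, target[i]);
    sync_entry();

    /* 3) A single 64-bit store of qword 0 makes the new entry valid */
    write_qword(live, 0, target[0]);
    sync_entry();
}

int main(void)
{
    uint64_t live[NUM_QWORDS] = { VALID_BIT };
    uint64_t target[NUM_QWORDS] = { VALID_BIT | 2, 0x1234 };

    disruptive_update(live, target);
    return 0;
}

Each sync makes the SMMU observe one step before the next begins, so it can never see a half-written valid entry; the cost is the window where the entry is invalid.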
The detection of which path to use and the implementation of the hitless update rely on a "used bitmask" describing what bits the HW is actually using based on the V/CFG/etc bits. This flows from the spec language, typically indicated as IGNORED.
Knowing which bits the HW is using we can update the bits it does not use and then compute how many QWORDS need to be changed. If only one qword needs to be updated the hitless algorithm is possible.
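A minimal stand-alone sketch of the used-bitmask idea, assuming a made-up two-qword entry format: get_used() and qword_diff() only mimic the roles of arm_smmu_get_ste_used() and arm_smmu_entry_qword_diff() and are not the driver code.

#include <stdint.h>
#include <stdio.h>

#define NUM_QWORDS  2
#define F_VALID     (1ULL << 0)
#define F_CFG_TRANS (1ULL << 1)    /* hypothetical "translate" config bit */

/* Which bits the (imaginary) HW reads, given the entry's own config */
static void get_used(const uint64_t ent[NUM_QWORDS], uint64_t used[NUM_QWORDS])
{
    used[0] = F_VALID;
    used[1] = 0;
    if (!(ent[0] & F_VALID))
        return;                    /* invalid entry: HW only looks at V */
    used[0] |= F_CFG_TRANS;
    if (ent[0] & F_CFG_TRANS)
        used[1] = ~0ULL;           /* translating: HW reads all of qword 1 */
}

/* Returns a bitmask of qwords whose *used* bits must change */
static unsigned int qword_diff(const uint64_t cur[NUM_QWORDS],
                               const uint64_t tgt[NUM_QWORDS])
{
    uint64_t cur_used[NUM_QWORDS], tgt_used[NUM_QWORDS];
    unsigned int diff = 0;
    int i;

    get_used(cur, cur_used);
    get_used(tgt, tgt_used);
    for (i = 0; i < NUM_QWORDS; i++) {
        /* Bits unused by the current config can be set up in advance */
        uint64_t after_unused = (cur[i] & cur_used[i]) |
                                (tgt[i] & ~cur_used[i]);
        /* The driver instead WARNs that tgt never sets bits outside
         * tgt_used, which makes this masking equivalent. */
        if ((after_unused & tgt_used[i]) != (tgt[i] & tgt_used[i]))
            diff |= 1u << i;
    }
    return diff;
}

int main(void)
{
    uint64_t bypass[NUM_QWORDS] = { F_VALID, 0 };
    uint64_t trans[NUM_QWORDS]  = { F_VALID | F_CFG_TRANS, 0x1234 };

    printf("qword diff mask: %#x\n", qword_diff(bypass, trans));
    return 0;
}

Here the bypass -> translate change reports a diff mask of 0x1: qword 1 can be filled in while the current config ignores it, and a single store to qword 0 then flips the entry, which is the hitless path; a mask with two or more bits set would force the disruptive V=0 sequence sketched above.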
Later patches will include CD updates in this mechanism so make the implementation generic using a struct arm_smmu_entry_writer and struct arm_smmu_entry_writer_ops to abstract the differences between STE and CD to be plugged in.
At this point it generates the same sequence of updates as the current code, except that zeroing the VMID on entry to BYPASS/ABORT will do an extra sync (this seems to be an existing bug).
Going forward this will use a V=0 transition instead of cycling through ABORT if a non-hitless change is required. This seems more appropriate as ABORT will fail DMAs without any logging, but dropping a DMA due to a transient V=0 is probably a sign of a bug, so the C_BAD_STE event is valuable.
Add STRTAB_STE_1_SHCFG_INCOMING to the s2_cfg path; the old code edited the STE in place and subtly inherited the value of data[1] from abort/bypass.
Signed-off-by: Michael Shavit mshavit@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/1-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 275 +++++++++++++++-----
 1 file changed, 211 insertions(+), 64 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 230368d05892..21cb2e47f96e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -110,6 +110,9 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
+static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, + ioasid_t sid); + static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { ARM_SMMU_EVTQ_IRQ_CFG0, @@ -1190,6 +1193,199 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); }
+/* + * Based on the value of ent report which bits of the STE the HW will access. It + * would be nice if this was complete according to the spec, but minimally it + * has to capture the bits this driver uses. + */ +static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, + struct arm_smmu_ste *used_bits) +{ + unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0])); + + used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V); + if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V))) + return; + + used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG); + + /* S1 translates */ + if (cfg & BIT(0)) { + used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | + STRTAB_STE_0_S1CTXPTR_MASK | + STRTAB_STE_0_S1CDMAX); + used_bits->data[1] |= + cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | + STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | + STRTAB_STE_1_EATS); + used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + } + + /* S2 translates */ + if (cfg & BIT(1)) { + used_bits->data[1] |= + cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); + used_bits->data[2] |= + cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | + STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | + STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R); + used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); + } + + if (cfg == STRTAB_STE_0_CFG_BYPASS) + used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); +} + +/* + * Figure out if we can do a hitless update of entry to become target. Returns a + * bit mask where 1 indicates that qword needs to be set disruptively. + * unused_update is an intermediate value of entry that has unused bits set to + * their new values. + */ +static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target, + struct arm_smmu_ste *unused_update) +{ + struct arm_smmu_ste target_used = {}; + struct arm_smmu_ste cur_used = {}; + u8 used_qword_diff = 0; + unsigned int i; + + arm_smmu_get_ste_used(entry, &cur_used); + arm_smmu_get_ste_used(target, &target_used); + + for (i = 0; i != ARRAY_SIZE(target_used.data); i++) { + /* + * Check that masks are up to date, the make functions are not + * allowed to set a bit to 1 if the used function doesn't say it + * is used. + */ + WARN_ON_ONCE(target->data[i] & ~target_used.data[i]); + + /* Bits can change because they are not currently being used */ + unused_update->data[i] = (entry->data[i] & cur_used.data[i]) | + (target->data[i] & ~cur_used.data[i]); + /* + * Each bit indicates that a used bit in a qword needs to be + * changed after unused_update is applied. + */ + if ((unused_update->data[i] & target_used.data[i]) != + target->data[i]) + used_qword_diff |= 1 << i; + } + return used_qword_diff; +} + +static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, + struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target, unsigned int start, + unsigned int len) +{ + bool changed = false; + unsigned int i; + + for (i = start; len != 0; len--, i++) { + if (entry->data[i] != target->data[i]) { + WRITE_ONCE(entry->data[i], target->data[i]); + changed = true; + } + } + + if (changed) + arm_smmu_sync_ste_for_sid(smmu, sid); + return changed; +} + +/* + * Update the STE/CD to the target configuration. The transition from the + * current entry to the target entry takes place over multiple steps that + * attempts to make the transition hitless if possible. This function takes care + * not to create a situation where the HW can perceive a corrupted entry. 
HW is + * only required to have a 64 bit atomicity with stores from the CPU, while + * entries are many 64 bit values big. + * + * The difference between the current value and the target value is analyzed to + * determine which of three updates are required - disruptive, hitless or no + * change. + * + * In the most general disruptive case we can make any update in three steps: + * - Disrupting the entry (V=0) + * - Fill now unused qwords, execpt qword 0 which contains V + * - Make qword 0 have the final value and valid (V=1) with a single 64 + * bit store + * + * However this disrupts the HW while it is happening. There are several + * interesting cases where a STE/CD can be updated without disturbing the HW + * because only a small number of bits are changing (S1DSS, CONFIG, etc) or + * because the used bits don't intersect. We can detect this by calculating how + * many 64 bit values need update after adjusting the unused bits and skip the + * V=0 process. This relies on the IGNORED behavior described in the + * specification. + */ +static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, + struct arm_smmu_ste *entry, + const struct arm_smmu_ste *target) +{ + unsigned int num_entry_qwords = ARRAY_SIZE(target->data); + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_ste unused_update; + u8 used_qword_diff; + + used_qword_diff = + arm_smmu_entry_qword_diff(entry, target, &unused_update); + if (hweight8(used_qword_diff) == 1) { + /* + * Only one qword needs its used bits to be changed. This is a + * hitless update, update all bits the current STE is ignoring + * to their new values, then update a single "critical qword" to + * change the STE and finally 0 out any bits that are now unused + * in the target configuration. + */ + unsigned int critical_qword_index = ffs(used_qword_diff) - 1; + + /* + * Skip writing unused bits in the critical qword since we'll be + * writing it in the next step anyways. This can save a sync + * when the only change is in that qword. + */ + unused_update.data[critical_qword_index] = + entry->data[critical_qword_index]; + entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords); + entry_set(smmu, sid, entry, target, critical_qword_index, 1); + entry_set(smmu, sid, entry, target, 0, num_entry_qwords); + } else if (used_qword_diff) { + /* + * At least two qwords need their inuse bits to be changed. This + * requires a breaking update, zero the V bit, write all qwords + * but 0, then set qword 0 + */ + unused_update.data[0] = entry->data[0] & (~STRTAB_STE_0_V); + entry_set(smmu, sid, entry, &unused_update, 0, 1); + entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); + entry_set(smmu, sid, entry, target, 0, 1); + } else { + /* + * No inuse bit changed. Sanity check that all unused bits are 0 + * in the entry. The target was already sanity checked by + * compute_qword_diff(). 
+ */ + WARN_ON_ONCE( + entry_set(smmu, sid, entry, target, 0, num_entry_qwords)); + } + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { + struct arm_smmu_cmdq_ent + prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + } }; + + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + } +} + static void arm_smmu_sync_cd(struct arm_smmu_master *master, int ssid, bool leaf) { @@ -1483,34 +1679,12 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { - /* - * This is hideously complicated, but we only really care about - * three cases at the moment: - * - * 1. Invalid (all zero) -> bypass/fault (init) - * 2. Bypass/fault -> translation/bypass (attach) - * 3. Translation/bypass -> bypass/fault (detach) - * - * Given that we can't update the STE atomically and the SMMU - * doesn't read the thing in a defined order, that leaves us - * with the following maintenance requirements: - * - * 1. Update Config, return (init time STEs aren't live) - * 2. Write everything apart from dword 0, sync, write dword 0, sync - * 3. Update Config, sync - */ - u64 val = le64_to_cpu(dst->data[0]); - bool ste_live = false; + u64 val; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = NULL; struct arm_smmu_s2_cfg *s2_cfg = NULL; struct arm_smmu_domain *smmu_domain = master->domain; - struct arm_smmu_cmdq_ent prefetch_cmd = { - .opcode = CMDQ_OP_PREFETCH_CFG, - .prefetch = { - .sid = sid, - }, - }; + struct arm_smmu_ste target = {};
if (smmu_domain) { switch (smmu_domain->stage) { @@ -1525,22 +1699,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, } }
- if (val & STRTAB_STE_0_V) { - switch (FIELD_GET(STRTAB_STE_0_CFG, val)) { - case STRTAB_STE_0_CFG_BYPASS: - break; - case STRTAB_STE_0_CFG_S1_TRANS: - case STRTAB_STE_0_CFG_S2_TRANS: - ste_live = true; - break; - case STRTAB_STE_0_CFG_ABORT: - BUG_ON(!disable_bypass); - break; - default: - BUG(); /* STE corruption */ - } - } - /* Nuke the existing STE_0 value, as we're going to rewrite it */ val = STRTAB_STE_0_V;
@@ -1551,16 +1709,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, else val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
- dst->data[0] = cpu_to_le64(val); - dst->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + target.data[0] = cpu_to_le64(val); + target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); - dst->data[2] = 0; /* Nuke the VMID */ - /* - * The SMMU can perform negative caching, so we must sync - * the STE regardless of whether the old value was live. - */ - if (smmu) - arm_smmu_sync_ste_for_sid(smmu, sid); + target.data[2] = 0; /* Nuke the VMID */ + arm_smmu_write_ste(master, sid, dst, &target); return; }
@@ -1568,8 +1721,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
- BUG_ON(ste_live); - dst->data[1] = cpu_to_le64( + target.data[1] = cpu_to_le64( FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | @@ -1578,7 +1730,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
if (smmu->features & ARM_SMMU_FEAT_STALLS && !master->stall_enabled) - dst->data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); + target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | @@ -1587,8 +1739,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, }
if (s2_cfg) { - BUG_ON(ste_live); - dst->data[2] = cpu_to_le64( + target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + target.data[2] = cpu_to_le64( FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | #ifdef __BIG_ENDIAN @@ -1597,23 +1750,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2R);
- dst->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); + target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS); }
if (master->ats_enabled) - dst->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, + target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, STRTAB_STE_1_EATS_TRANS));
- arm_smmu_sync_ste_for_sid(smmu, sid); - /* See comment in arm_smmu_write_ctx_desc() */ - WRITE_ONCE(dst->data[0], cpu_to_le64(val)); - arm_smmu_sync_ste_for_sid(smmu, sid); - - /* It's likely that we'll want to use the new STE soon */ - if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) - arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + target.data[0] = cpu_to_le64(val); + arm_smmu_write_ste(master, sid, dst, &target); }
static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 7686aa5f8d61388eeaa64730363ffc8df20a481d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This allows writing the flow of arm_smmu_write_strtab_ent() around abort and bypass domains more naturally.
Note that the core code no longer supplies NULL domains, though there is still a flow in the driver that ends up in arm_smmu_write_strtab_ent() with NULL. A later patch will remove it.
Remove the duplicate calculation of the STE in arm_smmu_init_bypass_stes() and remove the force parameter. arm_smmu_rmr_install_bypass_ste() can now simply invoke arm_smmu_make_bypass_ste() directly.
Rename arm_smmu_init_bypass_stes() to arm_smmu_init_initial_stes() to better reflect its purpose.
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/2-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 97 ++++++++++++---------
 1 file changed, 55 insertions(+), 42 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 21cb2e47f96e..fdd0b737ab99 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1676,6 +1676,24 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); }
+static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) +{ + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); +} + +static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) +{ + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS)); + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); +} + static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { @@ -1686,37 +1704,31 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_domain *smmu_domain = master->domain; struct arm_smmu_ste target = {};
- if (smmu_domain) { - switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: - cd_table = &master->cd_table; - break; - case ARM_SMMU_DOMAIN_S2: - s2_cfg = &smmu_domain->s2_cfg; - break; - default: - break; - } - } - - /* Nuke the existing STE_0 value, as we're going to rewrite it */ - val = STRTAB_STE_0_V; - - /* Bypass/fault */ - if (!smmu_domain || !(cd_table || s2_cfg)) { - if (!smmu_domain && disable_bypass) - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); + if (!smmu_domain) { + if (disable_bypass) + arm_smmu_make_abort_ste(&target); else - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS); + arm_smmu_make_bypass_ste(&target); + arm_smmu_write_ste(master, sid, dst, &target); + return; + }
- target.data[0] = cpu_to_le64(val); - target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, - STRTAB_STE_1_SHCFG_INCOMING)); - target.data[2] = 0; /* Nuke the VMID */ + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: + cd_table = &master->cd_table; + break; + case ARM_SMMU_DOMAIN_S2: + s2_cfg = &smmu_domain->s2_cfg; + break; + case ARM_SMMU_DOMAIN_BYPASS: + arm_smmu_make_bypass_ste(&target); arm_smmu_write_ste(master, sid, dst, &target); return; }
+ /* Nuke the existing STE_0 value, as we're going to rewrite it */ + val = STRTAB_STE_0_V; + if (cd_table) { u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1; @@ -1763,22 +1775,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, arm_smmu_write_ste(master, sid, dst, &target); }
-static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab, - unsigned int nent, bool force) +/* + * This can safely directly manipulate the STE memory without a sync sequence + * because the STE table has not been installed in the SMMU yet. + */ +static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, + unsigned int nent) { unsigned int i; - u64 val = STRTAB_STE_0_V; - - if (disable_bypass && !force) - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); - else - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
for (i = 0; i < nent; ++i) { - strtab->data[0] = cpu_to_le64(val); - strtab->data[1] = cpu_to_le64(FIELD_PREP( - STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); - strtab->data[2] = 0; + if (disable_bypass) + arm_smmu_make_abort_ste(strtab); + else + arm_smmu_make_bypass_ste(strtab); strtab++; } } @@ -1806,7 +1816,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false); + arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); arm_smmu_write_strtab_l1_desc(strtab, desc); return 0; } @@ -3738,7 +3748,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); cfg->strtab_base_cfg = reg;
- arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false); + arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); return 0; }
@@ -4912,7 +4922,6 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
list_for_each_entry(e, &rmr_list, list) { - struct arm_smmu_ste *step; struct iommu_iort_rmr_data *rmr; int ret, i;
@@ -4925,8 +4934,12 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) continue; }
- step = arm_smmu_get_step_for_sid(smmu, rmr->sids[i]); - arm_smmu_init_bypass_stes(step, 1, true); + /* + * STE table is not programmed to HW, see + * arm_smmu_initial_bypass_stes() + */ + arm_smmu_make_bypass_ste( + arm_smmu_get_step_for_sid(smmu, rmr->sids[i])); } }
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit efe15df08727d483bd247ff905a828f0de955de6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This is preparation to move the STE calculation higher up into the call chain and remove arm_smmu_write_strtab_ent(). These new functions will be called directly from attach_dev.
Reviewed-by: Moritz Fischer mdf@kernel.org
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/3-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 138 ++++++++++++--------
 1 file changed, 83 insertions(+), 55 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index fdd0b737ab99..ebd97c1fb07a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1694,13 +1694,89 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); }
+static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master) +{ + struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; + struct arm_smmu_device *smmu = master->smmu; + + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | + FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt) | + (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | + FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax)); + + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | + FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | + ((smmu->features & ARM_SMMU_FEAT_STALLS && + !master->stall_enabled) ? + STRTAB_STE_1_S1STALLD : + 0) | + FIELD_PREP(STRTAB_STE_1_EATS, + master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + + if (smmu->features & ARM_SMMU_FEAT_E2H) { + /* + * To support BTM the streamworld needs to match the + * configuration of the CPU so that the ASID broadcasts are + * properly matched. This means either S/NS-EL2-E2H (hypervisor) + * or NS-EL1 (guest). Since an SVA domain can be installed in a + * PASID this should always use a BTM compatible configuration + * if the HW supports it. + */ + target->data[1] |= cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2)); + } else { + target->data[1] |= cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1)); + + /* + * VMID 0 is reserved for stage-2 bypass EL1 STEs, see + * arm_smmu_domain_alloc_id() + */ + target->data[2] = + cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0)); + } +} + +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; + + memset(target, 0, sizeof(*target)); + target->data[0] = cpu_to_le64( + STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS)); + + target->data[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_EATS, + master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) | + FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + + target->data[2] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | + FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | + STRTAB_STE_2_S2AA64 | +#ifdef __BIG_ENDIAN + STRTAB_STE_2_S2ENDI | +#endif + STRTAB_STE_2_S2PTW | + STRTAB_STE_2_S2R); + + target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); +} + static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, struct arm_smmu_ste *dst) { - u64 val; - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ctx_desc_cfg *cd_table = NULL; - struct arm_smmu_s2_cfg *s2_cfg = NULL; struct arm_smmu_domain *smmu_domain = master->domain; struct arm_smmu_ste target = {};
@@ -1715,63 +1791,15 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: - cd_table = &master->cd_table; + arm_smmu_make_cdtable_ste(&target, master); break; case ARM_SMMU_DOMAIN_S2: - s2_cfg = &smmu_domain->s2_cfg; + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); break; case ARM_SMMU_DOMAIN_BYPASS: arm_smmu_make_bypass_ste(&target); - arm_smmu_write_ste(master, sid, dst, &target); - return; - } - - /* Nuke the existing STE_0 value, as we're going to rewrite it */ - val = STRTAB_STE_0_V; - - if (cd_table) { - u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? - STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1; - - target.data[1] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | - FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | - FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | - FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | - FIELD_PREP(STRTAB_STE_1_STRW, strw)); - - if (smmu->features & ARM_SMMU_FEAT_STALLS && - !master->stall_enabled) - target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); - - val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | - FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | - FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax) | - FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt); - } - - if (s2_cfg) { - target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, - STRTAB_STE_1_SHCFG_INCOMING)); - target.data[2] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | - FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | -#ifdef __BIG_ENDIAN - STRTAB_STE_2_S2ENDI | -#endif - STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 | - STRTAB_STE_2_S2R); - - target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); - - val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS); + break; } - - if (master->ats_enabled) - target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, - STRTAB_STE_1_EATS_TRANS)); - - target.data[0] = cpu_to_le64(val); arm_smmu_write_ste(master, sid, dst, &target); }
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 71b0aa10b18dd589a899449457e5ab7c1da00d01
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Half of this code was living in arm_smmu_domain_finalise_s2(); just move it here and take the values directly from the pgtbl_ops instead of storing copies.
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/4-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 ++++++++++++---------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
 2 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ebd97c1fb07a..ba66017a4c07 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1749,6 +1749,11 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; + const struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; + typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr = + &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; + u64 vtcr_val;
memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1761,9 +1766,16 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
+ vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); target->data[2] = cpu_to_le64( FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | - FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | + FIELD_PREP(STRTAB_STE_2_VTCR, vtcr_val) | STRTAB_STE_2_S2AA64 | #ifdef __BIG_ENDIAN STRTAB_STE_2_S2ENDI | @@ -1771,7 +1783,8 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
- target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); + target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr & + STRTAB_STE_3_S2TTB_MASK); }
static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, @@ -2524,7 +2537,6 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, int vmid; struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; - typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
/* Reserve VMID 0 for stage-2 bypass STEs */ vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1, @@ -2532,16 +2544,7 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, if (vmid < 0) return vmid;
- vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; cfg->vmid = (u16)vmid; - cfg->vttbr = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; - cfg->vtcr = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | - FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); return 0; }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 93bb08bd7404..d8dbebb69f89 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -661,8 +661,6 @@ struct arm_smmu_ctx_desc_cfg {
struct arm_smmu_s2_cfg { u16 vmid; - u64 vttbr; - u64 vtcr; };
struct arm_smmu_strtab_cfg {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 9f7c68911579bc15c57d227d021ccd253da2b635
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The BTM support wants to be able to change the ASID of any smmu_domain. When it goes to do this it holds the arm_smmu_asid_lock and iterates over the target domain's devices list.
During attach of a S1 domain we must ensure that the devices list and CD are in sync, otherwise we could miss CD updates or a parallel CD update could push an out of date CD.
This is pretty complicated, and almost works today because arm_smmu_detach_dev() removes the master from the linked list before working on the CD entries, preventing parallel update of the CD.
However, it does have an issue where the CD can remain programmed while the domain appears to be unattached. arm_smmu_share_asid() will then not clear any CD entries and will install its own CD entry with the same ASID concurrently. This creates a small race window where the IOMMU can see two ASIDs pointing to different translations.
   CPU0                               CPU1
   arm_smmu_attach_dev()
      arm_smmu_detach_dev()
        spin_lock_irqsave(&smmu_domain->devices_lock, flags);
        list_del(&master->domain_head);
        spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

                                      arm_smmu_mmu_notifier_get()
                                       arm_smmu_alloc_shared_cd()
                                        arm_smmu_share_asid():
                                          // Does nothing due to list_del above
                                          arm_smmu_update_ctx_desc_devices()
                                          arm_smmu_tlb_inv_asid()
                                        arm_smmu_write_ctx_desc()
                                          ** Now the ASID is in two CDs with different translation

        arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
Solve this by wrapping most of the attach flow in the arm_smmu_asid_lock. This locks more than strictly needed to prepare for the next patch which will reorganize the order of the linked list, STE and CD changes.
Move arm_smmu_detach_dev() until after we have initialized the domain so the lock can be held for less time.
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/5-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++---
 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ba66017a4c07..39d3ff232c4d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2820,8 +2820,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -EBUSY; }
- arm_smmu_detach_dev(master); - mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { @@ -2836,6 +2834,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) if (ret) return ret;
+ /* + * Prevent arm_smmu_share_asid() from trying to change the ASID + * of either the old or new domain while we are working on it. + * This allows the STE and the smmu_domain->devices list to + * be inconsistent during this routine. + */ + mutex_lock(&arm_smmu_asid_lock); + + arm_smmu_detach_dev(master); + master->domain = smmu_domain;
/* @@ -2861,13 +2869,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } }
- /* - * Prevent SVA from concurrently modifying the CD or writing to - * the CD entry - */ - mutex_lock(&arm_smmu_asid_lock); ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - mutex_unlock(&arm_smmu_asid_lock); if (ret) { master->domain = NULL; goto out_list_del; @@ -2877,13 +2879,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_install_ste_for_dev(master);
arm_smmu_enable_ats(master); - return 0; + goto out_unlock;
out_list_del: spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_del(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+out_unlock: + mutex_unlock(&arm_smmu_asid_lock); return ret; }
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 65547275d76965c3106fbcd4a9244242eb88224c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently arm_smmu_install_ste_for_dev() iterates over every SID and computes from scratch an identical STE. Every SID should have the same STE contents. Turn this inside out so that the STE is supplied by the caller and arm_smmu_install_ste_for_dev() simply installs it to every SID.
This is possible now that generating the STE no longer determines the sequence used to program it.
This allows splitting the STE calculation up according to the call site, which following patches will make use of, and removes the confusing NULL domain special case that only supported arm_smmu_detach_dev().
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/6-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 57 ++++++++-------------
 1 file changed, 22 insertions(+), 35 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 39d3ff232c4d..9dbe8bf06257 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1787,35 +1787,6 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, STRTAB_STE_3_S2TTB_MASK); }
-static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, - struct arm_smmu_ste *dst) -{ - struct arm_smmu_domain *smmu_domain = master->domain; - struct arm_smmu_ste target = {}; - - if (!smmu_domain) { - if (disable_bypass) - arm_smmu_make_abort_ste(&target); - else - arm_smmu_make_bypass_ste(&target); - arm_smmu_write_ste(master, sid, dst, &target); - return; - } - - switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: - arm_smmu_make_cdtable_ste(&target, master); - break; - case ARM_SMMU_DOMAIN_S2: - arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); - break; - case ARM_SMMU_DOMAIN_BYPASS: - arm_smmu_make_bypass_ste(&target); - break; - } - arm_smmu_write_ste(master, sid, dst, &target); -} - /* * This can safely directly manipulate the STE memory without a sync sequence * because the STE table has not been installed in the SMMU yet. @@ -2647,7 +2618,8 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) } }
-static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) +static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, + const struct arm_smmu_ste *target) { int i, j; struct arm_smmu_device *smmu = master->smmu; @@ -2664,7 +2636,7 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) if (j < i) continue;
- arm_smmu_write_strtab_ent(master, sid, step); + arm_smmu_write_ste(master, sid, step, target); } }
@@ -2771,6 +2743,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) static void arm_smmu_detach_dev(struct arm_smmu_master *master) { unsigned long flags; + struct arm_smmu_ste target; struct arm_smmu_domain *smmu_domain = master->domain;
if (!smmu_domain) @@ -2784,7 +2757,11 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - arm_smmu_install_ste_for_dev(master); + if (disable_bypass) + arm_smmu_make_abort_ste(&target); + else + arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); /* * Clearing the CD entry isn't strictly required to detach the domain * since the table is uninstalled anyway, but it helps avoid confusion @@ -2799,6 +2776,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret = 0; unsigned long flags; + struct arm_smmu_ste target; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); @@ -2860,7 +2838,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) list_add(&master->domain_head, &smmu_domain->devices); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); if (ret) { @@ -2874,9 +2853,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->domain = NULL; goto out_list_del; } - }
- arm_smmu_install_ste_for_dev(master); + arm_smmu_make_cdtable_ste(&target, master); + break; + case ARM_SMMU_DOMAIN_S2: + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + break; + case ARM_SMMU_DOMAIN_BYPASS: + arm_smmu_make_bypass_ste(&target); + break; + } + arm_smmu_install_ste_for_dev(master, &target);
arm_smmu_enable_ats(master); goto out_unlock;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 8c73c32c83ce7f3c31864cb044abd3daefacd996
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This was needed because the STE code required the STE to be in ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE code can automatically handle all transitions we can remove this step from the attach_dev flow.
A few small bugs exist because of this:
1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false then there will be a moment where the STE points at BYPASS. Since this can be done by VFIO/IOMMUFD it is a small security race.
2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT regions will temporarily become BLOCKED. We'd like drivers to work in a way that allows IOMMU_RESV_DIRECT to be continuously functional during these transitions.
Make arm_smmu_release_device() put the STE back to the correct ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was ignored on this path.

As noted before, the reordering of the linked list/STE/CD changes is OK against concurrent arm_smmu_share_asid() because of the arm_smmu_asid_lock.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/7-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9dbe8bf06257..d939febe615c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2743,7 +2743,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) static void arm_smmu_detach_dev(struct arm_smmu_master *master) { unsigned long flags; - struct arm_smmu_ste target; struct arm_smmu_domain *smmu_domain = master->domain;
if (!smmu_domain) @@ -2757,11 +2756,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - if (disable_bypass) - arm_smmu_make_abort_ste(&target); - else - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); /* * Clearing the CD entry isn't strictly required to detach the domain * since the table is uninstalled anyway, but it helps avoid confusion @@ -3131,9 +3125,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) static void arm_smmu_release_device(struct device *dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_ste target;
if (WARN_ON(arm_smmu_master_sva_enabled(master))) iopf_queue_remove_device(master->smmu->evtq.iopf, dev); + + /* Put the STE back to what arm_smmu_init_strtab() sets */ + if (disable_bypass && !dev->iommu->require_direct) + arm_smmu_make_abort_ste(&target); + else + arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); + arm_smmu_detach_dev(master); arm_smmu_disable_pasid(master); arm_smmu_remove_master(master);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit d2e053d73247b68144c7f44d002ebf56acaf2d48
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Get closer to the IOMMU API ideal that changes between domains can be hitless. The ordering for the CD table entry is not entirely clean from this perspective.
When switching away from a STE with a CD table programmed in it we should write the new STE first, then clear any old data in the CD entry.
If we are programming a CD table for the first time to a STE then the CD entry should be programmed before the STE is loaded.
If we are replacing a CD table entry when the STE already points at the CD entry then we just need to do the make/break sequence.
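A hypothetical ordering sketch of these three cases; install_ste(), write_cd() and clear_cd() are invented stand-ins for installing the STE, writing the RID CD and clearing it, not the driver's own functions.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real install/write/clear operations */
static void install_ste(const char *kind) { printf("install %s STE\n", kind); }
static void write_cd(void)                { printf("write RID CD\n"); }
static void clear_cd(void)                { printf("clear RID CD\n"); }

enum new_domain { NEW_S1, NEW_S2, NEW_BYPASS };

static void attach_order(enum new_domain kind, bool cd_table_programmed)
{
    switch (kind) {
    case NEW_S1:
        /*
         * The CD must be valid before an STE pointing at the CD table is
         * loaded; if an old CD is already programmed, make/break it first
         * so the write helper sees an empty entry.
         */
        if (cd_table_programmed)
            clear_cd();
        write_cd();
        install_ste("cdtable");
        break;
    case NEW_S2:
    case NEW_BYPASS:
        /* Switch the STE away first, then scrub the stale CD */
        install_ste(kind == NEW_S2 ? "S2" : "bypass");
        if (cd_table_programmed)
            clear_cd();
        break;
    }
}

int main(void)
{
    attach_order(NEW_S1, true);
    attach_order(NEW_BYPASS, true);
    return 0;
}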
Lift this code out of arm_smmu_detach_dev() so it can all be sequenced properly. The only other caller is arm_smmu_release_device() and it is going to free the cdtable anyhow, so it doesn't matter what is in it.
Reviewed-by: Michael Shavit mshavit@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/8-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index d939febe615c..e306cbc2fe94 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2756,14 +2756,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
master->domain = NULL; master->ats_enabled = false; - /* - * Clearing the CD entry isn't strictly required to detach the domain - * since the table is uninstalled anyway, but it helps avoid confusion - * in the call to arm_smmu_write_ctx_desc on the next attach (which - * expects the entry to be empty). - */ - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) @@ -2840,6 +2832,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->domain = NULL; goto out_list_del; } + } else { + /* + * arm_smmu_write_ctx_desc() relies on the entry being + * invalid to work, clear any existing entry. + */ + ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); + if (ret) { + master->domain = NULL; + goto out_list_del; + } }
ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); @@ -2849,15 +2852,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
arm_smmu_make_cdtable_ste(&target, master); + arm_smmu_install_ste_for_dev(master, &target); break; case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + arm_smmu_install_ste_for_dev(master, &target); + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); break; case ARM_SMMU_DOMAIN_BYPASS: arm_smmu_make_bypass_ste(&target); + arm_smmu_install_ste_for_dev(master, &target); + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, + NULL); break; } - arm_smmu_install_ste_for_dev(master, &target);
arm_smmu_enable_ats(master); goto out_unlock;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit d550ddc5b789f258cb5ce3bfe74af6d5383589b5
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The caller already has the domain, just pass it in. A following patch will remove master->domain.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/9-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia....
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e306cbc2fe94..d91bb0cbc046 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2655,12 +2655,12 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master) return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); }
-static void arm_smmu_enable_ats(struct arm_smmu_master *master) +static void arm_smmu_enable_ats(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) { size_t stu; struct pci_dev *pdev; struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_domain *smmu_domain = master->domain;
/* Don't enable ATS at the endpoint if it's not enabled in the STE */ if (!master->ats_enabled) @@ -2676,10 +2676,9 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
-static void arm_smmu_disable_ats(struct arm_smmu_master *master) +static void arm_smmu_disable_ats(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_domain *smmu_domain = master->domain; - if (!master->ats_enabled) return;
@@ -2748,7 +2747,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) if (!smmu_domain) return;
- arm_smmu_disable_ats(master); + arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_del(&master->domain_head); @@ -2870,7 +2869,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) break; }
- arm_smmu_enable_ats(master); + arm_smmu_enable_ats(master, smmu_domain); goto out_unlock;
out_list_del:
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion
from mainline-v6.9-rc1
commit 1b50017d39f650d78a0066734d6fe05920a8c9e8
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Introducing global statics, which are of type struct iommu_domain rather than struct arm_smmu_domain, makes it difficult to retain arm_smmu_master->domain, as it can no longer point to an IDENTITY or BLOCKED domain.
The only place that uses the value is arm_smmu_detach_dev(). Change things to work like other drivers and call iommu_get_domain_for_dev() to obtain the current domain.
The master->domain pointer subtly protects master->domain_head against being used while the master is not on a list, as only PAGING domains set master->domain and only PAGING domains use master->domain_head. To keep things simple, keep master->domain_head initialized so that the list_del() logic simply does nothing for attached non-PAGING domains.
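The following stand-alone illustration re-implements the kernel's list_add()/list_del_init() semantics in user space (it is not the driver code) to show why an initialized-but-unattached domain_head makes the detach path's list removal a harmless no-op.

#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert "new" right after "head", matching the kernel's list_add() */
static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

/* Unlink and re-point the node at itself, matching list_del_init() */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);    /* safe to delete again later */
}

int main(void)
{
    struct list_head devices, node;

    INIT_LIST_HEAD(&devices);
    INIT_LIST_HEAD(&node);

    list_del_init(&node);     /* never added: no-op, no corruption */
    list_add(&node, &devices);
    list_del_init(&node);     /* normal removal */
    list_del_init(&node);     /* double removal: still a no-op */

    printf("devices list empty: %d\n", devices.next == &devices);
    return 0;
}

Because list_del_init() leaves the node pointing back at itself, calling it on a node that was never added, or was already removed, touches only that node and leaves the devices list intact.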
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
Tested-by: Nicolin Chen nicolinc@nvidia.com
Tested-by: Moritz Fischer moritzf@google.com
Reviewed-by: Nicolin Chen nicolinc@nvidia.com
Reviewed-by: Mostafa Saleh smostafa@google.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/10-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia...
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
---
 drivers/coda/coda.c                         |  6 ++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++-------------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 3 files changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/coda/coda.c b/drivers/coda/coda.c index 4d2969c1a9d3..7e88da67478c 100644 --- a/drivers/coda/coda.c +++ b/drivers/coda/coda.c @@ -327,6 +327,8 @@ static int virtcca_secure_dev_ste_create(struct arm_smmu_device *smmu, struct arm_smmu_master *master, u32 sid) { struct tmi_smmu_ste_params *params_ptr; + struct iommu_domain *domain; + struct arm_smmu_domain *smmu_domain;
params_ptr = kzalloc(sizeof(*params_ptr), GFP_KERNEL); if (!params_ptr) @@ -335,7 +337,9 @@ static int virtcca_secure_dev_ste_create(struct arm_smmu_device *smmu, /* Sync Level 2 STE to TMM */ params_ptr->sid = sid; params_ptr->smmu_id = smmu->s_smmu_id; - params_ptr->smmu_vmid = master->domain->s2_cfg.vmid; + domain = iommu_get_domain_for_dev(master->dev); + smmu_domain = to_smmu_domain(domain); + params_ptr->smmu_vmid = smmu_domain->s2_cfg.vmid;
if (tmi_smmu_ste_create(__pa(params_ptr)) != 0) { kfree(params_ptr); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index d91bb0cbc046..d01c7a860c9c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2741,19 +2741,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static void arm_smmu_detach_dev(struct arm_smmu_master *master) { + struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_domain *smmu_domain; unsigned long flags; - struct arm_smmu_domain *smmu_domain = master->domain;
- if (!smmu_domain) + if (!domain) return;
+ smmu_domain = to_smmu_domain(domain); arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del(&master->domain_head); + list_del_init(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- master->domain = NULL; master->ats_enabled = false; }
@@ -2807,8 +2808,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
arm_smmu_detach_dev(master);
- master->domain = smmu_domain; - /* * The SMMU does not support enabling ATS with bypass. When the STE is * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and @@ -2827,10 +2826,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) case ARM_SMMU_DOMAIN_S1: if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - } } else { /* * arm_smmu_write_ctx_desc() relies on the entry being @@ -2838,17 +2835,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) */ ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - } }
ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - if (ret) { - master->domain = NULL; + if (ret) goto out_list_del; - }
arm_smmu_make_cdtable_ste(&target, master); arm_smmu_install_ste_for_dev(master, &target); @@ -2874,7 +2867,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
out_list_del: spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del(&master->domain_head); + list_del_init(&master->domain_head); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
out_unlock: @@ -3097,6 +3090,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) master->dev = dev; master->smmu = smmu; INIT_LIST_HEAD(&master->bonds); + INIT_LIST_HEAD(&master->domain_head); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index d8dbebb69f89..40aa61c7b522 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -773,7 +773,6 @@ struct arm_smmu_stream { struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; - struct arm_smmu_domain *domain; struct list_head domain_head; struct arm_smmu_stream *streams; /* Locked by the iommu core using the group mutex */
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit ae91f6552c301e5e8569667e9d5440d5f75a90c4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The SVA code only works if the RID domain is an S1 domain and has already installed the cdtable.
Originally the check for this was in arm_smmu_sva_bind() but when the op was removed the test didn't get copied over to the new arm_smmu_sva_set_dev_pasid().
Without the test, wrong usage will usually hit a WARN_ON() in arm_smmu_write_ctx_desc() due to a missing ctx table.
However, the next patches will change things so that an IDENTITY domain is not a struct arm_smmu_domain, and this will lead to memory corruption if the struct is wrongly cast.
Fail in arm_smmu_sva_set_dev_pasid() if the STE does not have an S1, which is a proxy for the STE having a pointer to the CD table. Write it in a way that will be compatible with the next patches.
Fixes: 386fa64fd52b ("arm-smmu-v3/sva: Add SVA domain support") Reported-by: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com Closes: https://lore.kernel.org/linux-iommu/2a828e481416405fb3a4cceb9e075a59@huawei.... Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/11-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index fd93039038a6..53c06c85b0e9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -365,7 +365,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_domain *smmu_domain; + + if (!(domain->type & __IOMMU_DOMAIN_PAGING)) + return -ENODEV; + smmu_domain = to_smmu_domain(domain); + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -ENODEV;
if (!master || !master->sva_enabled) return -ENODEV;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 12dacfb5b938cdd90ced0109165eee9cb27061d9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Move to the new static global for identity domains. Move all the logic out of arm_smmu_attach_dev into an identity-only function.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/12-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 82 +++++++++++++++------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - 2 files changed, 58 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index d01c7a860c9c..c9f988b4703a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2421,8 +2421,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) return arm_smmu_sva_domain_alloc();
if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA && - type != IOMMU_DOMAIN_IDENTITY) + type != IOMMU_DOMAIN_DMA) return NULL;
/* @@ -2531,11 +2530,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu;
- if (domain->type == IOMMU_DOMAIN_IDENTITY) { - smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS; - return 0; - } - /* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) smmu_domain->stage = ARM_SMMU_DOMAIN_S2; @@ -2745,7 +2739,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) struct arm_smmu_domain *smmu_domain; unsigned long flags;
- if (!domain) + if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) return;
smmu_domain = to_smmu_domain(domain); @@ -2808,15 +2802,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
arm_smmu_detach_dev(master);
- /* - * The SMMU does not support enabling ATS with bypass. When the STE is - * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and - * Translated transactions are denied as though ATS is disabled for the - * stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and - * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). - */ - if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS) - master->ats_enabled = arm_smmu_ats_supported(master); + master->ats_enabled = arm_smmu_ats_supported(master);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); list_add(&master->domain_head, &smmu_domain->devices); @@ -2853,13 +2839,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); break; - case ARM_SMMU_DOMAIN_BYPASS: - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); - break; }
arm_smmu_enable_ats(master, smmu_domain); @@ -2875,6 +2854,60 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return ret; }
+static int arm_smmu_attach_dev_ste(struct device *dev, + struct arm_smmu_ste *ste) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + if (arm_smmu_master_sva_enabled(master)) + return -EBUSY; + + /* + * Do not allow any ASID to be changed while are working on the STE, + * otherwise we could miss invalidations. + */ + mutex_lock(&arm_smmu_asid_lock); + + /* + * The SMMU does not support enabling ATS with bypass/abort. When the + * STE is in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests + * and Translated transactions are denied as though ATS is disabled for + * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and + * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). + */ + arm_smmu_detach_dev(master); + + arm_smmu_install_ste_for_dev(master, ste); + mutex_unlock(&arm_smmu_asid_lock); + + /* + * This has to be done after removing the master from the + * arm_smmu_domain->devices to avoid races updating the same context + * descriptor from arm_smmu_share_asid(). + */ + if (master->cd_table.cdtab) + arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); + return 0; +} + +static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_ste ste; + + arm_smmu_make_bypass_ste(&ste); + return arm_smmu_attach_dev_ste(dev, &ste); +} + +static const struct iommu_domain_ops arm_smmu_identity_ops = { + .attach_dev = arm_smmu_attach_dev_identity, +}; + +static struct iommu_domain arm_smmu_identity_domain = { + .type = IOMMU_DOMAIN_IDENTITY, + .ops = &arm_smmu_identity_ops, +}; + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3494,6 +3527,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) }
static struct iommu_ops arm_smmu_ops = { + .identity_domain = &arm_smmu_identity_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 40aa61c7b522..8f16c3560913 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -790,7 +790,6 @@ struct arm_smmu_master { enum arm_smmu_domain_stage { ARM_SMMU_DOMAIN_S1 = 0, ARM_SMMU_DOMAIN_S2, - ARM_SMMU_DOMAIN_BYPASS, };
struct arm_smmu_domain {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 352bd64cd8288c5c6808735d52a75809dfef8635 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Using the same design as the IDENTITY domain, install an STRTAB_STE_0_CFG_ABORT STE.
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/13-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index c9f988b4703a..cc6104fe4f5d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2908,6 +2908,24 @@ static struct iommu_domain arm_smmu_identity_domain = { .ops = &arm_smmu_identity_ops, };
+static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_ste ste; + + arm_smmu_make_abort_ste(&ste); + return arm_smmu_attach_dev_ste(dev, &ste); +} + +static const struct iommu_domain_ops arm_smmu_blocked_ops = { + .attach_dev = arm_smmu_attach_dev_blocked, +}; + +static struct iommu_domain arm_smmu_blocked_domain = { + .type = IOMMU_DOMAIN_BLOCKED, + .ops = &arm_smmu_blocked_ops, +}; + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3528,6 +3546,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, + .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit d36464f40f29c984984168ea89e08629bdba41df category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Consolidate some more code by having arm_smmu_release_device() call arm_smmu_attach_dev_identity/blocked() instead of open coding this.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/14-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index cc6104fe4f5d..07a0bf5ca188 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3180,19 +3180,16 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) static void arm_smmu_release_device(struct device *dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct arm_smmu_ste target;
if (WARN_ON(arm_smmu_master_sva_enabled(master))) iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
/* Put the STE back to what arm_smmu_init_strtab() sets */ if (disable_bypass && !dev->iommu->require_direct) - arm_smmu_make_abort_ste(&target); + arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev); else - arm_smmu_make_bypass_ste(&target); - arm_smmu_install_ste_for_dev(master, &target); + arm_smmu_attach_dev_identity(&arm_smmu_identity_domain, dev);
- arm_smmu_detach_dev(master); arm_smmu_disable_pasid(master); arm_smmu_remove_master(master); if (master->cd_table.cdtab)
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit d8cd200609cf6a404cda73794f0c8c4fd74c568c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Instead of putting container_of() casts in the internals, use the proper type in this call chain. This makes it easier to check that the two global static domains are not leaking into call chains they should not reach.
Passing the smmu spares the only caller from having to set it and then unset it in the error path.
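For context, the container_of() cast in question is the driver's existing to_smmu_domain() helper; the sketch below (illustrative, the "bogus" usage is hypothetical) shows why it must never be applied to the new static identity/blocked domains, which are bare struct iommu_domain objects not embedded in an arm_smmu_domain:

static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
{
	return container_of(dom, struct arm_smmu_domain, domain);
}

/* Hypothetical misuse: this computes a pointer into memory that is not an
 * arm_smmu_domain at all, and any access through it is corruption. */
struct arm_smmu_domain *bogus = to_smmu_domain(&arm_smmu_identity_domain);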
Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/15-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
--------------------------------
Conflicts: virtcca_smmu_set_stage --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 36 +++++++++++---------- 1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 07a0bf5ca188..7ef4d2120072 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -152,6 +152,9 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { { 0, NULL}, };
+static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_device *smmu); + static void parse_driver_options(struct arm_smmu_device *smmu) { int i = 0; @@ -2463,12 +2466,12 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) kfree(smmu_domain); }
-static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, +static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg *pgtbl_cfg) { int ret; u32 asid; - struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
@@ -2501,11 +2504,11 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, return ret; }
-static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, +static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg *pgtbl_cfg) { int vmid; - struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
/* Reserve VMID 0 for stage-2 bypass STEs */ @@ -2518,17 +2521,17 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, return 0; }
-static int arm_smmu_domain_finalise(struct iommu_domain *domain) +static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_device *smmu) { int ret; unsigned long ias, oas; enum io_pgtable_fmt fmt; struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; - int (*finalise_stage_fn)(struct arm_smmu_domain *, - struct io_pgtable_cfg *); - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - struct arm_smmu_device *smmu = smmu_domain->smmu; + int (*finalise_stage_fn)(struct arm_smmu_device *smmu, + struct arm_smmu_domain *smmu_domain, + struct io_pgtable_cfg *pgtbl_cfg);
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2537,6 +2540,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
#ifdef CONFIG_HISI_VIRTCCA_CODA + struct iommu_domain *domain = &smmu_domain->domain; virtcca_smmu_set_stage(domain, smmu_domain); #endif
@@ -2579,17 +2583,18 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) if (!pgtbl_ops) return -ENOMEM;
- domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; - domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; - domain->geometry.force_aperture = true; + smmu_domain->domain.pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; + smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; + smmu_domain->domain.geometry.force_aperture = true;
- ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg); + ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg); if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); return ret; }
smmu_domain->pgtbl_ops = pgtbl_ops; + smmu_domain->smmu = smmu; return 0; }
@@ -2781,10 +2786,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { - smmu_domain->smmu = smmu; - ret = arm_smmu_domain_finalise(domain); - if (ret) - smmu_domain->smmu = NULL; + ret = arm_smmu_domain_finalise(smmu_domain, smmu); } else if (smmu_domain->smmu != smmu) ret = -EINVAL;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 327e10b47ae99f76ac53f0b8b73a0539f390d2d2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now that the BLOCKED and IDENTITY behaviors are managed with their own domains, change to the domain_alloc_paging() op.
For now SVA keeps using the old interface; eventually it will get its own op that can pass in the device and mm_struct, which will let us have a sane lifetime for the mmu_notifier.
Call arm_smmu_domain_finalise() early if dev is available.
Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/16-v6-96275f25c39d+2d4-smmuv3_newapi_p1_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 7ef4d2120072..a45dfc7aeea9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2418,14 +2418,15 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) { - struct arm_smmu_domain *smmu_domain;
if (type == IOMMU_DOMAIN_SVA) return arm_smmu_sva_domain_alloc(); + return ERR_PTR(-EOPNOTSUPP); +}
- if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA) - return NULL; +static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) +{ + struct arm_smmu_domain *smmu_domain;
/* * Allocate the domain and initialise some of its data structures. @@ -2434,13 +2435,23 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) */ smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); if (!smmu_domain) - return NULL; + return ERR_PTR(-ENOMEM);
mutex_init(&smmu_domain->init_mutex); INIT_LIST_HEAD(&smmu_domain->devices); spin_lock_init(&smmu_domain->devices_lock); INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+ if (dev) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + int ret; + + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + if (ret) { + kfree(smmu_domain); + return ERR_PTR(ret); + } + } return &smmu_domain->domain; }
@@ -3548,6 +3559,7 @@ static struct iommu_ops arm_smmu_ops = { .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, + .domain_alloc_paging = arm_smmu_domain_alloc_paging, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group,
From: Mostafa Saleh smostafa@google.com
mainline inclusion from mainline-v6.9-rc2 commit ec9098d6bffea6e82d63640134c123a3d96e0781 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
STE attributes (NSCFG, PRIVCFG, INSTCFG) use value 0 for "Use Incoming"; for some reason SHCFG doesn't follow that and is instead defined as 0b01.
Currently the driver sets SHCFG to Use Incoming for stage-2 and bypass domains.
However, according to the User Manual (ARM IHI 0070 F.b): When SMMU_IDR1.ATTR_TYPES_OVR == 0, this field is RES0 and the incoming Shareability attribute is used.
This patch adds a condition around writing SHCFG as "Use Incoming" so the driver is compliant with the architecture, and defines ATTR_TYPES_OVR as a new feature discovered from IDR1. This also requires propagating the SMMU pointer through some function arguments.
There is no need to add a similar condition to the newly introduced arm_smmu_get_ste_used(), as the values of the STE are the same before and after any transition, so this will not trigger any change (we already do the same for the VMID).
Although this is a misconfiguration in the driver, it has been there for a long time, so probably no HW running Linux is affected by it.
Reported-by: Will Deacon will@kernel.org Closes: https://lore.kernel.org/all/20240215134952.GA690@willie-the-truck/
Signed-off-by: Mostafa Saleh smostafa@google.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20240323134658.464743-1-smostafa@google.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
--------------------------------
Conflicts: ARM_SMMU_FEAT_* bit --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 ++++++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 14 +++++---- 2 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a45dfc7aeea9..5771e1ecc047 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1687,14 +1687,17 @@ static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); }
-static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target) +static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, + struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( STRTAB_STE_0_V | FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS)); - target->data[1] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); + + if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) + target->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); }
static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, @@ -1757,6 +1760,7 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; u64 vtcr_val; + struct arm_smmu_device *smmu = master->smmu;
memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1765,9 +1769,11 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
target->data[1] = cpu_to_le64( FIELD_PREP(STRTAB_STE_1_EATS, - master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) | - FIELD_PREP(STRTAB_STE_1_SHCFG, - STRTAB_STE_1_SHCFG_INCOMING)); + master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + + if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) + target->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING));
vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | @@ -1794,7 +1800,8 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, * This can safely directly manipulate the STE memory without a sync sequence * because the STE table has not been installed in the SMMU yet. */ -static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, +static void arm_smmu_init_initial_stes(struct arm_smmu_device *smmu, + struct arm_smmu_ste *strtab, unsigned int nent) { unsigned int i; @@ -1803,7 +1810,7 @@ static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, if (disable_bypass) arm_smmu_make_abort_ste(strtab); else - arm_smmu_make_bypass_ste(strtab); + arm_smmu_make_bypass_ste(smmu, strtab); strtab++; } } @@ -1831,7 +1838,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); + arm_smmu_init_initial_stes(smmu, desc->l2ptr, 1 << STRTAB_SPLIT); arm_smmu_write_strtab_l1_desc(strtab, desc); return 0; } @@ -2907,8 +2914,9 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct device *dev) { struct arm_smmu_ste ste; + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
- arm_smmu_make_bypass_ste(&ste); + arm_smmu_make_bypass_ste(master->smmu, &ste); return arm_smmu_attach_dev_ste(dev, &ste); }
@@ -3841,7 +3849,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); cfg->strtab_base_cfg = reg;
- arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); + arm_smmu_init_initial_stes(smmu, strtab, cfg->num_l1_ents); return 0; }
@@ -4678,6 +4686,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) smmu->features |= ARM_SMMU_FEAT_ECMDQ; #endif
+ if (reg & IDR1_ATTR_TYPES_OVR) + smmu->features |= ARM_SMMU_FEAT_ATTR_TYPES_OVR; + /* Queue sizes, capped to ensure natural alignment */ smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT, FIELD_GET(IDR1_CMDQS, reg)); @@ -5031,7 +5042,7 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) * STE table is not programmed to HW, see * arm_smmu_initial_bypass_stes() */ - arm_smmu_make_bypass_ste( + arm_smmu_make_bypass_ste(smmu, arm_smmu_get_step_for_sid(smmu, rmr->sids[i])); } } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 8f16c3560913..76487e5d55cc 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -52,6 +52,7 @@ #define IDR1_TABLES_PRESET (1 << 30) #define IDR1_QUEUES_PRESET (1 << 29) #define IDR1_REL (1 << 28) +#define IDR1_ATTR_TYPES_OVR (1 << 27) #define IDR1_CMDQS GENMASK(25, 21) #define IDR1_EVTQS GENMASK(20, 16) #define IDR1_PRIQS GENMASK(15, 11) @@ -698,12 +699,13 @@ struct arm_smmu_device { #define ARM_SMMU_FEAT_BTM (1 << 16) #define ARM_SMMU_FEAT_SVA (1 << 17) #define ARM_SMMU_FEAT_E2H (1 << 18) -#define ARM_SMMU_FEAT_HA (1 << 19) -#define ARM_SMMU_FEAT_HD (1 << 20) -#define ARM_SMMU_FEAT_NESTING (1 << 21) -#define ARM_SMMU_FEAT_BBML1 (1 << 22) -#define ARM_SMMU_FEAT_BBML2 (1 << 23) -#define ARM_SMMU_FEAT_ECMDQ (1 << 24) +#define ARM_SMMU_FEAT_NESTING (1 << 19) +#define ARM_SMMU_FEAT_ATTR_TYPES_OVR (1 << 20) +#define ARM_SMMU_FEAT_HA (1 << 21) +#define ARM_SMMU_FEAT_HD (1 << 22) +#define ARM_SMMU_FEAT_BBML1 (1 << 23) +#define ARM_SMMU_FEAT_BBML2 (1 << 24) +#define ARM_SMMU_FEAT_ECMDQ (1 << 25) u32 features;
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
From: Vasant Hegde vasant.hegde@amd.com
mainline inclusion from mainline-v6.9-rc1 commit bf8aff2945ba4091f503df673b9df33002546e6a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add a function to check the iommu group mutex lock, so that device drivers can rely on the group mutex instead of adding another driver-level lock before modifying driver-specific device data structures.
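A rough usage sketch (hypothetical driver; my_driver_data and my_driver_update() are invented names, only iommu_group_mutex_assert() and dev_iommu_priv_get() are real APIs):

struct my_driver_data {
	bool enabled;
};

static void my_driver_update(struct device *dev, bool enable)
{
	struct my_driver_data *data = dev_iommu_priv_get(dev);

	/* Callers such as attach_dev paths already hold the group mutex;
	 * lockdep will flag any caller that does not, so no extra
	 * driver-level lock is needed around the update below. */
	iommu_group_mutex_assert(dev);

	data->enabled = enable;
}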
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Vasant Hegde vasant.hegde@amd.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20240205115615.6053-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel jroedel@suse.de --- drivers/iommu/iommu.c | 19 +++++++++++++++++++ include/linux/iommu.h | 8 ++++++++ 2 files changed, 27 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 28f63ad432de..b9c274820882 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1262,6 +1262,25 @@ void iommu_group_remove_device(struct device *dev) } EXPORT_SYMBOL_GPL(iommu_group_remove_device);
+#if IS_ENABLED(CONFIG_LOCKDEP) && IS_ENABLED(CONFIG_IOMMU_API) +/** + * iommu_group_mutex_assert - Check device group mutex lock + * @dev: the device that has group param set + * + * This function is called by an iommu driver to check whether it holds + * group mutex lock for the given device or not. + * + * Note that this function must be called after device group param is set. + */ +void iommu_group_mutex_assert(struct device *dev) +{ + struct iommu_group *group = dev->iommu_group; + + lockdep_assert_held(&group->mutex); +} +EXPORT_SYMBOL_GPL(iommu_group_mutex_assert); +#endif + static struct device *iommu_group_first_dev(struct iommu_group *group) { lockdep_assert_held(&group->mutex); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index f7c3f7b839c0..f5ea2ef21dd0 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1557,6 +1557,14 @@ static inline ioasid_t iommu_alloc_global_pasid(struct device *dev) static inline void iommu_free_global_pasid(ioasid_t pasid) {} #endif /* CONFIG_IOMMU_API */
+#if IS_ENABLED(CONFIG_LOCKDEP) && IS_ENABLED(CONFIG_IOMMU_API) +void iommu_group_mutex_assert(struct device *dev); +#else +static inline void iommu_group_mutex_assert(struct device *dev) +{ +} +#endif + /** * iommu_map_sgtable - Map the given buffer to the IOMMU domain * @domain: The IOMMU domain to perform the mapping
From: Yi Liu yi.l.liu@intel.com
mainline inclusion from mainline-v6.10-rc1 commit b025dea63cded0d82bccd591fa105d39efc6435d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
There is no error handling in __iommu_set_group_pasid() now; it relies on its caller to loop over all the devices to undo the pasid attachment. This is not self-contained and has drawbacks: it results in unnecessary remove_dev_pasid() calls on the devices that have not been attached to the new domain, yet the remove_dev_pasid() callback would get the new domain from the group->pasid_array. So for such devices the iommu driver won't find the attachment under the domain and is hence unable to do the cleanup. This may not be a real problem today, but it depends on the implementation of the underlying iommu driver; e.g. the intel iommu driver would warn for such devices, and such warnings are unnecessary.
To solve the above problem, handle the error within __iommu_set_group_pasid() itself: loop only over the devices that have been attached to the new domain and undo the attachment.
Fixes: 16603704559c ("iommu: Add attach/detach_dev_pasid iommu interfaces") Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240328122958.83332-2-yi.l.liu@intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommu.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index b9c274820882..54caa60b431f 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3523,15 +3523,26 @@ EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed); static int __iommu_set_group_pasid(struct iommu_domain *domain, struct iommu_group *group, ioasid_t pasid) { - struct group_device *device; - int ret = 0; + struct group_device *device, *last_gdev; + int ret;
for_each_group_device(group, device) { ret = domain->ops->set_dev_pasid(domain, device->dev, pasid); if (ret) - break; + goto err_revert; }
+ return 0; + +err_revert: + last_gdev = device; + for_each_group_device(group, device) { + const struct iommu_ops *ops = dev_iommu_ops(device->dev); + + if (device == last_gdev) + break; + ops->remove_dev_pasid(device->dev, pasid); + } return ret; }
@@ -3580,10 +3591,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, }
ret = __iommu_set_group_pasid(domain, group, pasid); - if (ret) { - __iommu_remove_group_pasid(group, pasid); + if (ret) xa_erase(&group->pasid_array, pasid); - } out_unlock: mutex_unlock(&group->mutex); return ret;
From: Yi Liu yi.l.liu@intel.com
mainline inclusion from mainline-v6.10-rc1 commit d2f85a263883b679f87ed8f911746105658e9c47 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Existing remove_dev_pasid() callbacks of the underlying iommu drivers get the attached domain from the group->pasid_array. However, the domain stored in group->pasid_array is not correct in all scenarios, and a wrong domain may result in failures in the remove_dev_pasid() callback. To avoid such problems, it is more reliable to pass the domain to the remove_dev_pasid() op.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240328122958.83332-3-yi.l.liu@intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 ++------- drivers/iommu/intel/iommu.c | 11 +++-------- drivers/iommu/iommu.c | 9 +++++---- include/linux/iommu.h | 3 ++- 4 files changed, 12 insertions(+), 20 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5771e1ecc047..969e9aa56f94 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3551,14 +3551,9 @@ static int arm_smmu_def_domain_type(struct device *dev) return ret; }
-static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain) { - struct iommu_domain *domain; - - domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); - if (WARN_ON(IS_ERR(domain)) || !domain) - return; - arm_smmu_sva_remove_dev_pasid(domain, dev, pasid); }
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index dc9526ce91be..81a74cd47208 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4658,19 +4658,15 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, return 0; }
-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain) { struct device_domain_info *info = dev_iommu_priv_get(dev); + struct dmar_domain *dmar_domain = to_dmar_domain(domain); struct dev_pasid_info *curr, *dev_pasid = NULL; struct intel_iommu *iommu = info->iommu; - struct dmar_domain *dmar_domain; - struct iommu_domain *domain; unsigned long flags;
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); - if (WARN_ON_ONCE(!domain)) - goto out_tear_down; - /* * The SVA implementation needs to handle its own stuffs like the mm * notification. Before consolidating that code into iommu core, let @@ -4681,7 +4677,6 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) goto out_tear_down; }
- dmar_domain = to_dmar_domain(domain); spin_lock_irqsave(&dmar_domain->lock, flags); list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) { if (curr->dev == dev && curr->pasid == pasid) { diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 54caa60b431f..8bc9743c03ef 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3541,20 +3541,21 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
if (device == last_gdev) break; - ops->remove_dev_pasid(device->dev, pasid); + ops->remove_dev_pasid(device->dev, pasid, domain); } return ret; }
static void __iommu_remove_group_pasid(struct iommu_group *group, - ioasid_t pasid) + ioasid_t pasid, + struct iommu_domain *domain) { struct group_device *device; const struct iommu_ops *ops;
for_each_group_device(group, device) { ops = dev_iommu_ops(device->dev); - ops->remove_dev_pasid(device->dev, pasid); + ops->remove_dev_pasid(device->dev, pasid, domain); } }
@@ -3615,7 +3616,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, struct iommu_group *group = dev->iommu_group;
mutex_lock(&group->mutex); - __iommu_remove_group_pasid(group, pasid); + __iommu_remove_group_pasid(group, pasid, domain); WARN_ON(xa_erase(&group->pasid_array, pasid) != domain); mutex_unlock(&group->mutex); } diff --git a/include/linux/iommu.h b/include/linux/iommu.h index f5ea2ef21dd0..4ad69f10fffd 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -630,7 +630,8 @@ struct iommu_ops { struct iommu_page_response *msg);
int (*def_domain_type)(struct device *dev); - void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid); + void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain);
const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc2 commit 0493e739ccc60a3e0870847f1a12d6d79b86a1fc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
STRTAB_STE_0_V is a CPU value; it needs conversion for sparse to be clean.
The missing annotation was a mistake introduced by splitting the ops out from the STE writer.
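For illustration, the sparse rule at play is roughly the following (sketch only; the second form is what the hunk below switches to):

__le64 val = entry->data[0];

/* Mixing a CPU-endian mask with a __le64 degrades the __bitwise type
 * and makes sparse warn: */
val &= ~STRTAB_STE_0_V;

/* Converting the mask first keeps the whole expression __le64: */
val &= cpu_to_le64(~STRTAB_STE_0_V);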
Fixes: 7da51af9125c ("iommu/arm-smmu-v3: Make STE programming independent of the callers") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202403011441.5WqGrYjp-lkp@intel.com/ Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/0-v1-98b23ebb0c84+9f-smmu_cputole_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 969e9aa56f94..e0193cfe433d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1363,7 +1363,8 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * requires a breaking update, zero the V bit, write all qwords * but 0, then set qword 0 */ - unused_update.data[0] = entry->data[0] & (~STRTAB_STE_0_V); + unused_update.data[0] = entry->data[0] & + cpu_to_le64(~STRTAB_STE_0_V); entry_set(smmu, sid, entry, &unused_update, 0, 1); entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); entry_set(smmu, sid, entry, target, 0, 1);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit c404f55c26fc23c70a9f2262f3f36a69fc46289b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The SVA code checks that the PASID is valid for the device when assigning the PASID to the MM, but the normal PAGING related path does not check it.
Devices that don't support PASID, or PASID values too large for the device, should not invoke the driver callback. The drivers should rely on the core code for this enforcement.
Fixes: 16603704559c7a68 ("iommu: Add attach/detach_dev_pasid iommu interfaces") Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Yi Liu yi.l.liu@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/0-v1-460705442b30+659-iommu_check_pasid_jgg@nvidia... Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommu.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 8bc9743c03ef..ddcf0a3c03d3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3572,6 +3572,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; + struct group_device *device; void *curr; int ret;
@@ -3581,10 +3582,18 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, if (!group) return -ENODEV;
- if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner) + if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner || + pasid == IOMMU_NO_PASID) return -EINVAL;
mutex_lock(&group->mutex); + for_each_group_device(group, device) { + if (pasid >= device->dev->iommu->max_pasids) { + ret = -EINVAL; + goto out_unlock; + } + } + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); if (curr) { ret = xa_err(curr) ? : -EBUSY;
From: Robin Murphy robin.murphy@arm.com
mainline inclusion from mainline-v6.10-rc1 commit 734554fdfce6731b22f0777ec3f1e4a853354883 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The disable_bypass parameter has been mostly meaningless for a long time since the introduction of default domains. Its original intent is now fulfilled by the controls users have over the default domain type, and its remaining effect in the brief window between Stream Table initialisation and default domain creation hardly seems worth the complication. Furthermore, thanks to 2-level Stream Tables, disabling disable_bypass (there's another reason not to like it right there) has never guaranteed that any particular StreamID *will* bypass anyway - any device which might actually care about that wants RMRs - so there's not really much lost by taking away that option (which has already been non-default for nearing 6 years now).
As part of this, also remove the weird behaviour where we "successfully" probe and register a non-functional SMMU if the DT "#iommu-cells" property is wrong. I have no memory of what possessed me to think that was a good idea at the time, and by now I suspect it's likely to break things worse than simply failing probe would.
Signed-off-by: Robin Murphy robin.murphy@arm.com Reviewed-by: Mostafa Saleh smostafa@google.com Link: https://lore.kernel.org/r/ea3ac4cd595a81b5511729601b2f7d4668178438.171233592... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
--------------------------------
Conflicts: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c fce2deba0d2d4afd99cdaa700b0eacc035e71de0 (iommu/arm-smmu-v3: Add suspend and resume support) ed007ade0cef81fb47328683b8a7eec5edfa8494 (virtcca feature: secure smmu init) --- drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c | 24 +++-------- drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.h | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 ++++++------------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 - 4 files changed, 19 insertions(+), 52 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c index 34e0060420af..29d4c0c71041 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c @@ -608,25 +608,14 @@ EXPORT_SYMBOL_GPL(virtcca_smmu_map_init); * arm_s_smmu_device_enable - Enable the smmu secure state * @smmu: An SMMUv3 instance * @enables: The smmu attribute need to enable - * @bypass: Bypass smmu - * @disable_bypass: Global bypass smmu */ static void arm_s_smmu_device_enable(struct arm_smmu_device *smmu, - u32 enables, bool bypass, bool disable_bypass) + u32 enables) { int ret = 0;
- /* Enable the SMMU interface, or ensure bypass */ - if (!bypass || disable_bypass) { - enables |= CR0_SMMUEN; - } else { - ret = virtcca_smmu_update_gbpa(smmu, 0, S_GBPA_ABORT); - if (ret) { - dev_err(smmu->dev, "S_SMMU: failed to update s gbpa!\n"); - smmu->s_smmu_id = ARM_S_SMMU_INVALID_ID; - return; - } - } + /* Enable the SMMU interface */ + enables |= CR0_SMMUEN; /* Mask BIT1 and BIT4 which are RES0 in SMMU_S_CRO */ ret = virtcca_smmu_write_reg_sync(smmu, enables & ~SMMU_S_CR0_RESERVED, enables & ~SMMU_S_CR0_RESERVED, ARM_SMMU_S_CR0, ARM_SMMU_S_CR0ACK); @@ -680,10 +669,9 @@ static bool arm_s_smmu_idr1_support_secure(struct arm_smmu_device *smmu) * @smmu: An SMMUv3 instance * @ioaddr: SMMU address * @resume: Resume or not - * @disable_bypass: Global disable smmu bypass */ void virtcca_smmu_device_init(struct platform_device *pdev, struct arm_smmu_device *smmu, - resource_size_t ioaddr, bool resume, bool disable_bypass) + resource_size_t ioaddr, bool resume) { u64 rv; int ret, irq; @@ -843,6 +831,6 @@ void virtcca_smmu_device_init(struct platform_device *pdev, struct arm_smmu_devi /* Enable the secure irqs */ virtcca_smmu_setup_irqs(smmu, resume);
- /* Enable the secure smmu interface, or ensure bypass */ - arm_s_smmu_device_enable(smmu, enables, smmu->bypass, disable_bypass); + /* Enable the secure smmu interface */ + arm_s_smmu_device_enable(smmu, enables); } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.h index 3d1786db7508..997c45a4f99a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.h @@ -165,7 +165,7 @@ void virtcca_smmu_cmdq_write_entries(struct arm_smmu_device *smmu, u64 *cmds, bool virtcca_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg); void _arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg); void virtcca_smmu_device_init(struct platform_device *pdev, - struct arm_smmu_device *smmu, resource_size_t ioaddr, bool resume, bool disable_bypass); + struct arm_smmu_device *smmu, resource_size_t ioaddr, bool resume);
static inline bool virtcca_smmu_enable(struct arm_smmu_device *smmu) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e0193cfe433d..f0768df45d75 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -34,11 +34,6 @@ #include "arm-s-smmu-v3.h" #endif
-static bool disable_bypass = true; -module_param(disable_bypass, bool, 0444); -MODULE_PARM_DESC(disable_bypass, - "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); - static bool disable_msipolling; module_param(disable_msipolling, bool, 0444); MODULE_PARM_DESC(disable_msipolling, @@ -1801,17 +1796,13 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, * This can safely directly manipulate the STE memory without a sync sequence * because the STE table has not been installed in the SMMU yet. */ -static void arm_smmu_init_initial_stes(struct arm_smmu_device *smmu, - struct arm_smmu_ste *strtab, +static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, unsigned int nent) { unsigned int i;
for (i = 0; i < nent; ++i) { - if (disable_bypass) - arm_smmu_make_abort_ste(strtab); - else - arm_smmu_make_bypass_ste(smmu, strtab); + arm_smmu_make_abort_ste(strtab); strtab++; } } @@ -1839,7 +1830,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_initial_stes(smmu, desc->l2ptr, 1 << STRTAB_SPLIT); + arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); arm_smmu_write_strtab_l1_desc(strtab, desc); return 0; } @@ -3207,10 +3198,10 @@ static void arm_smmu_release_device(struct device *dev) iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
/* Put the STE back to what arm_smmu_init_strtab() sets */ - if (disable_bypass && !dev->iommu->require_direct) - arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev); - else + if (dev->iommu->require_direct) arm_smmu_attach_dev_identity(&arm_smmu_identity_domain, dev); + else + arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev);
arm_smmu_disable_pasid(master); arm_smmu_remove_master(master); @@ -3845,7 +3836,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); cfg->strtab_base_cfg = reg;
- arm_smmu_init_initial_stes(smmu, strtab, cfg->num_l1_ents); + arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); return 0; }
@@ -4203,7 +4194,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume) reg = readl_relaxed(smmu->base + ARM_SMMU_CR0); if (reg & CR0_SMMUEN) { dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n"); - WARN_ON(is_kdump_kernel() && !disable_bypass); arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0); }
@@ -4312,14 +4302,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume) if (is_kdump_kernel()) enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
- /* Enable the SMMU interface, or ensure bypass */ - if (!smmu->bypass || disable_bypass) { - enables |= CR0_SMMUEN; - } else { - ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT); - if (ret) - return ret; - } + /* Enable the SMMU interface */ + enables |= CR0_SMMUEN; ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK); if (ret) { @@ -5063,12 +5047,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev) ret = arm_smmu_device_dt_probe(pdev, smmu); } else { ret = arm_smmu_device_acpi_probe(pdev, smmu); - if (ret == -ENODEV) - return ret; } - - /* Set bypass mode according to firmware probing result */ - smmu->bypass = !!ret; + if (ret) + return ret;
/* Base address */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -5137,7 +5118,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return ret;
#ifdef CONFIG_HISI_VIRTCCA_CODA - virtcca_smmu_device_init(pdev, smmu, ioaddr, false, disable_bypass); + virtcca_smmu_device_init(pdev, smmu, ioaddr, false); #endif
/* And we're up. Go go go! */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 76487e5d55cc..e4864337054d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -755,8 +755,6 @@ struct arm_smmu_device { struct rb_root streams; struct mutex streams_mutex;
- bool bypass; - #ifdef CONFIG_HISI_VIRTCCA_CODA int s_evtq_irq; int s_gerr_irq;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit fdc69d39e77f88264ee6e8174ff9aaf0953aecd9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The SVA code is wired to assume that the SVA is programmed onto the mm->pasid. The current core code always does this, so it is fine.
Add a check for clarity.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 53c06c85b0e9..15ba13078a10 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -571,6 +571,9 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, int ret = 0; struct mm_struct *mm = domain->mm;
+ if (mm_get_enqcmd_pasid(mm) != id) + return -EINVAL; + mutex_lock(&sva_lock); ret = __arm_smmu_sva_bind(dev, id, mm); mutex_unlock(&sva_lock);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit 86e5ca098dd9f8c5b80a6205395aea0535018837 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
At this point we know which master we are going to change the PCI config on; it is the only device we need to invalidate. Switch arm_smmu_atc_inv_domain() for arm_smmu_atc_inv_master().
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Moritz Fischer moritzf@google.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f0768df45d75..25ba51bbadad 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2680,7 +2680,10 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master, pdev = to_pci_dev(master->dev);
atomic_inc(&smmu_domain->nr_ats_masters); - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, 0, 0); + /* + * ATC invalidation of PASID 0 causes the entire ATC to be flushed. + */ + arm_smmu_atc_inv_master(master); if (pci_enable_ats(pdev, stu)) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit e8e4398d53f98be7ac48e0bda9ea6e26df24136d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Instead of passing a naked __le64 * around to represent a CD table entry, wrap it in a "struct arm_smmu_cd" with an array of the correct size. This makes it much clearer which functions will comprise the "CD API".
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 ++++++++++----------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++++- 2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 25ba51bbadad..f044332e2d8b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1436,7 +1436,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) +static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { __le64 *l1ptr; unsigned int idx; @@ -1445,7 +1446,8 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) - return cd_table->cdtab + ssid * CTXDESC_CD_DWORDS; + return (struct arm_smmu_cd *)(cd_table->cdtab + + ssid * CTXDESC_CD_DWORDS);
idx = ssid >> CTXDESC_SPLIT; l1_desc = &cd_table->l1_desc[idx]; @@ -1459,7 +1461,7 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) arm_smmu_sync_cd(master, ssid, false); } idx = ssid & (CTXDESC_L2_ENTRIES - 1); - return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; + return &l1_desc->l2ptr[idx]; }
int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, @@ -1478,7 +1480,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, */ u64 val; bool cd_live; - __le64 *cdptr; + struct arm_smmu_cd *cdptr; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
@@ -1489,7 +1491,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (!cdptr) return -ENOMEM;
- val = le64_to_cpu(cdptr[0]); + val = le64_to_cpu(cdptr->data[0]); cd_live = !!(val & CTXDESC_CD_0_V);
if (!cd) { /* (5) */ @@ -1506,16 +1508,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, * this substream's traffic */ } else { /* (1) and (2) */ - u64 tcr = cd->tcr; - - cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - cdptr[2] = 0; - cdptr[3] = cpu_to_le64(cd->mair); + cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); + cdptr->data[2] = 0; + cdptr->data[3] = cpu_to_le64(cd->mair);
if (!(smmu->features & ARM_SMMU_FEAT_HD)) - tcr &= ~CTXDESC_CD_0_TCR_HD; + cd->tcr &= ~CTXDESC_CD_0_TCR_HD; if (!(smmu->features & ARM_SMMU_FEAT_HA)) - tcr &= ~CTXDESC_CD_0_TCR_HA; + cd->tcr &= ~CTXDESC_CD_0_TCR_HA;
/* * STE may be live, and the SMMU might read dwords of this CD in any @@ -1524,7 +1524,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, */ arm_smmu_sync_cd(master, ssid, true);
- val = tcr | + val = cd->tcr | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | #endif @@ -1547,7 +1547,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, * field within an aligned 64-bit span of a structure can be altered * without first making the structure invalid. */ - WRITE_ONCE(cdptr[0], cpu_to_le64(val)); + WRITE_ONCE(cdptr->data[0], cpu_to_le64(val)); arm_smmu_sync_cd(master, ssid, true); return 0; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index e4864337054d..f638c24a3c54 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -318,6 +318,11 @@ struct arm_smmu_ste { #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
#define CTXDESC_CD_DWORDS 8 + +struct arm_smmu_cd { + __le64 data[CTXDESC_CD_DWORDS]; +}; + #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0) #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6) #define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8) @@ -644,7 +649,7 @@ struct arm_smmu_ctx_desc { };
struct arm_smmu_l1_ctx_desc { - __le64 *l2ptr; + struct arm_smmu_cd *l2ptr; dma_addr_t l2ptr_dma; };
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit 65442507026a79e2de8014880e6ba376b8b44584 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Make a new op that receives the device and the mm_struct that the SVA domain should be created for. Unlike domain_alloc_paging(), the dev argument is never NULL here.
This allows drivers to fully initialize the SVA domain and allocate the mmu_notifier during allocation. It allows the notifier lifetime to follow the lifetime of the iommu_domain.
Since we have only one call site, upgrade the new op to return ERR_PTR instead of NULL.
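As a rough sketch of how a driver would opt in (the driver-side names below are hypothetical and not part of this patch), the op allocates and fully initializes the SVA domain and returns ERR_PTR on failure:

struct my_sva_domain {
	struct iommu_domain domain;
	/* per-mm driver state would live here */
};

static struct iommu_domain *my_smmu_domain_alloc_sva(struct device *dev,
						     struct mm_struct *mm)
{
	struct my_sva_domain *sva;

	/* dev is never NULL here, so per-device setup can happen at alloc time */
	sva = kzalloc(sizeof(*sva), GFP_KERNEL);
	if (!sva)
		return ERR_PTR(-ENOMEM);

	/* hypothetical domain ops; the mmu_notifier for mm can be registered
	 * here so its lifetime follows the iommu_domain
	 */
	sva->domain.ops = &my_sva_domain_ops;
	return &sva->domain;
}

static const struct iommu_ops my_smmu_ops = {
	.domain_alloc_sva	= my_smmu_domain_alloc_sva,
	/* other ops elided */
};

The core then sets domain->type to IOMMU_DOMAIN_SVA and grabs the mm, as the iommu_sva_domain_alloc() hunk below shows.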
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Vasant Hegde vasant.hegde@amd.com Reviewed-by: Tina Zhang tina.zhang@intel.com Link: https://lore.kernel.org/r/20240311090843.133455-15-vasant.hegde@amd.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240416080656.60968-12-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommu-sva.c | 16 +++++++++++----- include/linux/iommu.h | 3 +++ 2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 640acc804e8c..18a35e798b72 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -108,8 +108,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
/* Allocate a new domain and set it on device pasid. */ domain = iommu_sva_domain_alloc(dev, mm); - if (!domain) { - ret = -ENOMEM; + if (IS_ERR(domain)) { + ret = PTR_ERR(domain); goto out_free_handle; }
@@ -283,9 +283,15 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, const struct iommu_ops *ops = dev_iommu_ops(dev); struct iommu_domain *domain;
- domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); - if (!domain) - return NULL; + if (ops->domain_alloc_sva) { + domain = ops->domain_alloc_sva(dev, mm); + if (IS_ERR(domain)) + return domain; + } else { + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); + if (!domain) + return ERR_PTR(-ENOMEM); + }
domain->type = IOMMU_DOMAIN_SVA; mmgrab(mm); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 4ad69f10fffd..468e329da33f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -569,6 +569,7 @@ static inline int __iommu_copy_struct_from_user_array( * Upon failure, ERR_PTR must be returned. * @domain_alloc_paging: Allocate an iommu_domain that can be used for * UNMANAGED, DMA, and DMA_FQ domain types. + * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -609,6 +610,8 @@ struct iommu_ops { struct device *dev, u32 flags, struct iommu_domain *parent, const struct iommu_user_data *user_data); struct iommu_domain *(*domain_alloc_paging)(struct device *dev); + struct iommu_domain *(*domain_alloc_sva)(struct device *dev, + struct mm_struct *mm);
struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit de31c355541286aa4c938c982dfcafbf062fcb93 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Prepare to put the CD code into the same mechanism. Add an ops indirection around all the STE-specific code and make the worker functions independent of the entry content being processed.
get_used and sync ops are provided to hook the correct code.
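The shape of the indirection is that each entry type embeds the generic writer in a small wrapper and supplies its own get_used/sync callbacks. A hypothetical additional entry type would plug in roughly like this (illustrative names only; the STE wiring in the hunks below is the real instance):

struct my_entry_writer {
	struct arm_smmu_entry_writer writer;
	u32 index;
};

static void my_entry_get_used(const __le64 *entry, __le64 *used_bits)
{
	/* set the bits in used_bits that the HW reads for this entry's CFG */
}

static void my_entry_sync(struct arm_smmu_entry_writer *writer)
{
	/* container_of(writer, struct my_entry_writer, writer) to recover the
	 * wrapper, then issue and wait for the relevant invalidation command
	 */
}

static const struct arm_smmu_entry_writer_ops my_entry_writer_ops = {
	.get_used = my_entry_get_used,
	.sync = my_entry_sync,
};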
Signed-off-by: Michael Shavit mshavit@google.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 176 ++++++++++++-------- 1 file changed, 104 insertions(+), 72 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f044332e2d8b..93b79e667483 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -105,8 +105,19 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, - ioasid_t sid); +struct arm_smmu_entry_writer_ops; +struct arm_smmu_entry_writer { + const struct arm_smmu_entry_writer_ops *ops; + struct arm_smmu_master *master; +}; + +struct arm_smmu_entry_writer_ops { + void (*get_used)(const __le64 *entry, __le64 *used); + void (*sync)(struct arm_smmu_entry_writer *writer); +}; + +#define NUM_ENTRY_QWORDS 8 +static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { @@ -1196,43 +1207,42 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) * would be nice if this was complete according to the spec, but minimally it * has to capture the bits this driver uses. */ -static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, - struct arm_smmu_ste *used_bits) +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) { - unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0])); + unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
- used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V); - if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V))) + used_bits[0] = cpu_to_le64(STRTAB_STE_0_V); + if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V))) return;
- used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG); + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
/* S1 translates */ if (cfg & BIT(0)) { - used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | - STRTAB_STE_0_S1CTXPTR_MASK | - STRTAB_STE_0_S1CDMAX); - used_bits->data[1] |= + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | + STRTAB_STE_0_S1CTXPTR_MASK | + STRTAB_STE_0_S1CDMAX); + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | STRTAB_STE_1_EATS); - used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); }
/* S2 translates */ if (cfg & BIT(1)) { - used_bits->data[1] |= + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); - used_bits->data[2] |= + used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R); - used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); + used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); }
if (cfg == STRTAB_STE_0_CFG_BYPASS) - used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); }
/* @@ -1241,57 +1251,55 @@ static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, * unused_update is an intermediate value of entry that has unused bits set to * their new values. */ -static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target, - struct arm_smmu_ste *unused_update) +static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, + const __le64 *entry, const __le64 *target, + __le64 *unused_update) { - struct arm_smmu_ste target_used = {}; - struct arm_smmu_ste cur_used = {}; + __le64 target_used[NUM_ENTRY_QWORDS] = {}; + __le64 cur_used[NUM_ENTRY_QWORDS] = {}; u8 used_qword_diff = 0; unsigned int i;
- arm_smmu_get_ste_used(entry, &cur_used); - arm_smmu_get_ste_used(target, &target_used); + writer->ops->get_used(entry, cur_used); + writer->ops->get_used(target, target_used);
- for (i = 0; i != ARRAY_SIZE(target_used.data); i++) { + for (i = 0; i != NUM_ENTRY_QWORDS; i++) { /* * Check that masks are up to date, the make functions are not * allowed to set a bit to 1 if the used function doesn't say it * is used. */ - WARN_ON_ONCE(target->data[i] & ~target_used.data[i]); + WARN_ON_ONCE(target[i] & ~target_used[i]);
/* Bits can change because they are not currently being used */ - unused_update->data[i] = (entry->data[i] & cur_used.data[i]) | - (target->data[i] & ~cur_used.data[i]); + unused_update[i] = (entry[i] & cur_used[i]) | + (target[i] & ~cur_used[i]); /* * Each bit indicates that a used bit in a qword needs to be * changed after unused_update is applied. */ - if ((unused_update->data[i] & target_used.data[i]) != - target->data[i]) + if ((unused_update[i] & target_used[i]) != target[i]) used_qword_diff |= 1 << i; } return used_qword_diff; }
-static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, - struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target, unsigned int start, +static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry, + const __le64 *target, unsigned int start, unsigned int len) { bool changed = false; unsigned int i;
for (i = start; len != 0; len--, i++) { - if (entry->data[i] != target->data[i]) { - WRITE_ONCE(entry->data[i], target->data[i]); + if (entry[i] != target[i]) { + WRITE_ONCE(entry[i], target[i]); changed = true; } }
if (changed) - arm_smmu_sync_ste_for_sid(smmu, sid); + writer->ops->sync(writer); return changed; }
@@ -1321,24 +1329,21 @@ static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, * V=0 process. This relies on the IGNORED behavior described in the * specification. */ -static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, - struct arm_smmu_ste *entry, - const struct arm_smmu_ste *target) +static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, + __le64 *entry, const __le64 *target) { - unsigned int num_entry_qwords = ARRAY_SIZE(target->data); - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ste unused_update; + __le64 unused_update[NUM_ENTRY_QWORDS]; u8 used_qword_diff;
used_qword_diff = - arm_smmu_entry_qword_diff(entry, target, &unused_update); + arm_smmu_entry_qword_diff(writer, entry, target, unused_update); if (hweight8(used_qword_diff) == 1) { /* * Only one qword needs its used bits to be changed. This is a - * hitless update, update all bits the current STE is ignoring - * to their new values, then update a single "critical qword" to - * change the STE and finally 0 out any bits that are now unused - * in the target configuration. + * hitless update, update all bits the current STE/CD is + * ignoring to their new values, then update a single "critical + * qword" to change the STE/CD and finally 0 out any bits that + * are now unused in the target configuration. */ unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
@@ -1347,22 +1352,21 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * writing it in the next step anyways. This can save a sync * when the only change is in that qword. */ - unused_update.data[critical_qword_index] = - entry->data[critical_qword_index]; - entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords); - entry_set(smmu, sid, entry, target, critical_qword_index, 1); - entry_set(smmu, sid, entry, target, 0, num_entry_qwords); + unused_update[critical_qword_index] = + entry[critical_qword_index]; + entry_set(writer, entry, unused_update, 0, NUM_ENTRY_QWORDS); + entry_set(writer, entry, target, critical_qword_index, 1); + entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS); } else if (used_qword_diff) { /* * At least two qwords need their inuse bits to be changed. This * requires a breaking update, zero the V bit, write all qwords * but 0, then set qword 0 */ - unused_update.data[0] = entry->data[0] & - cpu_to_le64(~STRTAB_STE_0_V); - entry_set(smmu, sid, entry, &unused_update, 0, 1); - entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); - entry_set(smmu, sid, entry, target, 0, 1); + unused_update[0] = 0; + entry_set(writer, entry, unused_update, 0, 1); + entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1); + entry_set(writer, entry, target, 0, 1); } else { /* * No inuse bit changed. Sanity check that all unused bits are 0 @@ -1370,18 +1374,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, * compute_qword_diff(). */ WARN_ON_ONCE( - entry_set(smmu, sid, entry, target, 0, num_entry_qwords)); - } - - /* It's likely that we'll want to use the new STE soon */ - if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { - struct arm_smmu_cmdq_ent - prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, - .prefetch = { - .sid = sid, - } }; - - arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS)); } }
@@ -1662,17 +1655,56 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) +struct arm_smmu_ste_writer { + struct arm_smmu_entry_writer writer; + u32 sid; +}; + +static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer) { + struct arm_smmu_ste_writer *ste_writer = + container_of(writer, struct arm_smmu_ste_writer, writer); struct arm_smmu_cmdq_ent cmd = { .opcode = CMDQ_OP_CFGI_STE, .cfgi = { - .sid = sid, + .sid = ste_writer->sid, .leaf = true, }, };
- arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd); +} + +static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = { + .sync = arm_smmu_ste_writer_sync_entry, + .get_used = arm_smmu_get_ste_used, +}; + +static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, + struct arm_smmu_ste *ste, + const struct arm_smmu_ste *target) +{ + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_ste_writer ste_writer = { + .writer = { + .ops = &arm_smmu_ste_writer_ops, + .master = master, + }, + .sid = sid, + }; + + arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data); + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { + struct arm_smmu_cmdq_ent + prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + } }; + + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + } }
static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit e2279e89b9e38042b301b2c06045e915089b36e2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
CD table entries and STEs have the same essential programming sequence, just with different types. Use the new ops indirection to link CD programming to the common writer.
In a few more patches all CD writers will call an appropriate make function and then directly call arm_smmu_write_cd_entry(). arm_smmu_write_ctx_desc() will be removed.
Until then, lightly tweak arm_smmu_write_ctx_desc() to also use the new programmer: build the target CD on the stack using the same logic as today, sanitize it to meet the used rules, and then hand it to the writer.
Sanitizing is necessary because the writer expects that the currently programmed CD follows the used rules. Next patches add new make functions and new direct calls to arm_smmu_write_cd_entry() which will require this.
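The call pattern the series is converging on is a two-step "make then write" at every CD writer, roughly as follows (sketch only, error handling elided; arm_smmu_make_s1_cd() is introduced in a later patch of this series):

	struct arm_smmu_cd target = {};
	struct arm_smmu_cd *cdptr;

	cdptr = arm_smmu_get_cd_ptr(master, ssid);
	/* build the complete CD image on the stack */
	arm_smmu_make_s1_cd(&target, master, smmu_domain);
	/* program it; the writer picks a hitless or V=0 sequence as needed */
	arm_smmu_write_cd_entry(master, ssid, cdptr, &target);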
Signed-off-by: Michael Shavit mshavit@google.com Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Moritz Fischer moritzf@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/2-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 ++++++++++++++++----- 1 file changed, 67 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 93b79e667483..6603ca28bdb9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -118,6 +118,7 @@ struct arm_smmu_entry_writer_ops {
#define NUM_ENTRY_QWORDS 8 static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64)); +static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { [EVTQ_MSI_INDEX] = { @@ -1457,6 +1458,59 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, return &l1_desc->l2ptr[idx]; }
+struct arm_smmu_cd_writer { + struct arm_smmu_entry_writer writer; + unsigned int ssid; +}; + +static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) +{ + used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V); + if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V))) + return; + memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd)); + + /* + * If EPD0 is set by the make function it means + * T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED + */ + if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) { + used_bits[0] &= ~cpu_to_le64( + CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | + CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | + CTXDESC_CD_0_TCR_SH0); + used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); + } +} + +static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer) +{ + struct arm_smmu_cd_writer *cd_writer = + container_of(writer, struct arm_smmu_cd_writer, writer); + + arm_smmu_sync_cd(writer->master, cd_writer->ssid, true); +} + +static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { + .sync = arm_smmu_cd_writer_sync_entry, + .get_used = arm_smmu_get_cd_used, +}; + +static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target) +{ + struct arm_smmu_cd_writer cd_writer = { + .writer = { + .ops = &arm_smmu_cd_writer_ops, + .master = master, + }, + .ssid = ssid, + }; + + arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); +} + int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, struct arm_smmu_ctx_desc *cd) { @@ -1473,26 +1527,34 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, */ u64 val; bool cd_live; - struct arm_smmu_cd *cdptr; + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr = ⌖ + struct arm_smmu_cd *cd_table_entry; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) return -E2BIG;
- cdptr = arm_smmu_get_cd_ptr(master, ssid); - if (!cdptr) + cd_table_entry = arm_smmu_get_cd_ptr(master, ssid); + if (!cd_table_entry) return -ENOMEM;
+ target = *cd_table_entry; val = le64_to_cpu(cdptr->data[0]); cd_live = !!(val & CTXDESC_CD_0_V);
if (!cd) { /* (5) */ + memset(cdptr, 0, sizeof(*cdptr)); val = 0; } else if (cd == &quiet_cd) { /* (4) */ + val &= ~(CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | + CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | + CTXDESC_CD_0_TCR_SH0); if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); val |= CTXDESC_CD_0_TCR_EPD0; + cdptr->data[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); } else if (cd_live) { /* (3) */ val &= ~CTXDESC_CD_0_ASID; val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid); @@ -1510,13 +1572,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (!(smmu->features & ARM_SMMU_FEAT_HA)) cd->tcr &= ~CTXDESC_CD_0_TCR_HA;
- /* - * STE may be live, and the SMMU might read dwords of this CD in any - * order. Ensure that it observes valid values before reading - * V=1. - */ - arm_smmu_sync_cd(master, ssid, true); - val = cd->tcr | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | @@ -1530,18 +1585,8 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (cd_table->stall_enabled) val |= CTXDESC_CD_0_S; } - - /* - * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3 - * "Configuration structures and configuration invalidation completion" - * - * The size of single-copy atomic reads made by the SMMU is - * IMPLEMENTATION DEFINED but must be at least 64 bits. Any single - * field within an aligned 64-bit span of a structure can be altered - * without first making the structure invalid. - */ - WRITE_ONCE(cdptr->data[0], cpu_to_le64(val)); - arm_smmu_sync_cd(master, ssid, true); + cdptr->data[0] = cpu_to_le64(val); + arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target); return 0; }
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit de31c355541286aa4c938c982dfcafbf062fcb93 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain, and reorganize all the places programming S1 domain CD table entries to call it.
Split arm_smmu_update_s1_domain_cd_entry() from arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call chain separate from the unrelated SVA path.
arm_smmu_update_s1_domain_cd_entry() only works on S1 domains attached to RIDs and refreshes all their CDs. Remove case (3) from arm_smmu_write_ctx_desc() as it is now handled by directly calling arm_smmu_write_cd_entry().
Remove the forced clear of the CD during S1 domain attach; arm_smmu_write_cd_entry() will do this automatically if necessary.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... [will: Drop unused arm_smmu_clean_cd_entry() function] Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 25 +++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 76 ++++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++ 3 files changed, 81 insertions(+), 29 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 15ba13078a10..9adff5fcb2c5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -53,6 +53,29 @@ static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); }
+static void +arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_master *master; + struct arm_smmu_cd target_cd; + unsigned long flags; + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + struct arm_smmu_cd *cdptr; + + /* S1 domains only support RID attachment right now */ + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + if (WARN_ON(!cdptr)) + continue; + + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + &target_cd); + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); +} + /* * Check if the CPU ASID is available on the SMMU side. If a private context * descriptor is using it, try to replace it. @@ -96,7 +119,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) * be some overlap between use of both ASIDs, until we invalidate the * TLB. */ - arm_smmu_update_ctx_desc_devices(smmu_domain, IOMMU_NO_PASID, cd); + arm_smmu_update_s1_domain_cd_entry(smmu_domain);
/* Invalidate TLB entries previously associated with that context */ arm_smmu_tlb_inv_asid(smmu, asid); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6603ca28bdb9..fcf174566c7e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1430,8 +1430,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, - u32 ssid) +struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { __le64 *l1ptr; unsigned int idx; @@ -1496,9 +1496,9 @@ static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { .get_used = arm_smmu_get_cd_used, };
-static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, - struct arm_smmu_cd *cdptr, - const struct arm_smmu_cd *target) +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target) { struct arm_smmu_cd_writer cd_writer = { .writer = { @@ -1511,6 +1511,37 @@ static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); }
+void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; + + memset(target, 0, sizeof(*target)); + + target->data[0] = cpu_to_le64( + cd->tcr | +#ifdef __BIG_ENDIAN + CTXDESC_CD_0_ENDI | +#endif + CTXDESC_CD_0_V | + CTXDESC_CD_0_AA64 | + (master->stall_enabled ? CTXDESC_CD_0_S : 0) | + CTXDESC_CD_0_R | + CTXDESC_CD_0_A | + CTXDESC_CD_0_ASET | + FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) + ); + + if (!(master->smmu->features & ARM_SMMU_FEAT_HD)) + target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (!(master->smmu->features & ARM_SMMU_FEAT_HA)) + target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HA); + + target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); + target->data[3] = cpu_to_le64(cd->mair); +} + int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, struct arm_smmu_ctx_desc *cd) { @@ -1519,14 +1550,11 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, * * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0). * (2) Install a secondary CD, for SID+SSID traffic. - * (3) Update ASID of a CD. Atomically write the first 64 bits of the - * CD, then invalidate the old entry and mappings. * (4) Quiesce the context without clearing the valid bit. Disable * translation, and ignore any translation fault. * (5) Remove a secondary CD. */ u64 val; - bool cd_live; struct arm_smmu_cd target; struct arm_smmu_cd *cdptr = ⌖ struct arm_smmu_cd *cd_table_entry; @@ -1542,7 +1570,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
target = *cd_table_entry; val = le64_to_cpu(cdptr->data[0]); - cd_live = !!(val & CTXDESC_CD_0_V);
if (!cd) { /* (5) */ memset(cdptr, 0, sizeof(*cdptr)); @@ -1555,13 +1582,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); val |= CTXDESC_CD_0_TCR_EPD0; cdptr->data[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); - } else if (cd_live) { /* (3) */ - val &= ~CTXDESC_CD_0_ASID; - val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid); - /* - * Until CD+TLB invalidation, both ASIDs may be used for tagging - * this substream's traffic - */ } else { /* (1) and (2) */ cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); cdptr->data[2] = 0; @@ -2901,29 +2921,29 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: + case ARM_SMMU_DOMAIN_S1: { + struct arm_smmu_cd target_cd; + struct arm_smmu_cd *cdptr; + if (!master->cd_table.cdtab) { ret = arm_smmu_alloc_cd_tables(master); if (ret) goto out_list_del; - } else { - /* - * arm_smmu_write_ctx_desc() relies on the entry being - * invalid to work, clear any existing entry. - */ - ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); - if (ret) - goto out_list_del; }
- ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); - if (ret) + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + if (!cdptr) { + ret = -ENOMEM; goto out_list_del; + }
+ arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + &target_cd); arm_smmu_make_cdtable_ste(&target, master); arm_smmu_install_ste_for_dev(master, &target); break; + } case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); arm_smmu_install_ste_for_dev(master, &target); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f638c24a3c54..f0f377c7862a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -832,6 +832,15 @@ extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; extern struct arm_smmu_ctx_desc quiet_cd;
+struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, + u32 ssid); +void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain); +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, + struct arm_smmu_cd *cdptr, + const struct arm_smmu_cd *target); + int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, struct arm_smmu_ctx_desc *cd); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit af8f0b83ea2bcc7cd365c32044f31bdadc07c351 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A cleared entry is all 0's. Make arm_smmu_clear_cd() do this sequence.
If we are clearing an entry and for some reason it is not already allocated in the CD table, then something has gone wrong.
Remove case (5) from arm_smmu_write_ctx_desc().
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Moritz Fischer moritzf@google.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 + 3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 9adff5fcb2c5..e90aa1dd71fd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -571,7 +571,7 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
mutex_lock(&sva_lock);
- arm_smmu_write_ctx_desc(master, id, NULL); + arm_smmu_clear_cd(master, id);
list_for_each_entry(t, &master->bonds, list) { if (t->mm == mm) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index fcf174566c7e..b4e54fb27f9b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1542,6 +1542,19 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, target->data[3] = cpu_to_le64(cd->mair); }
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) +{ + struct arm_smmu_cd target = {}; + struct arm_smmu_cd *cdptr; + + if (!master->cd_table.cdtab) + return; + cdptr = arm_smmu_get_cd_ptr(master, ssid); + if (WARN_ON(!cdptr)) + return; + arm_smmu_write_cd_entry(master, ssid, cdptr, &target); +} + int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, struct arm_smmu_ctx_desc *cd) { @@ -1552,7 +1565,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, * (2) Install a secondary CD, for SID+SSID traffic. * (4) Quiesce the context without clearing the valid bit. Disable * translation, and ignore any translation fault. - * (5) Remove a secondary CD. */ u64 val; struct arm_smmu_cd target; @@ -1571,10 +1583,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, target = *cd_table_entry; val = le64_to_cpu(cdptr->data[0]);
- if (!cd) { /* (5) */ - memset(cdptr, 0, sizeof(*cdptr)); - val = 0; - } else if (cd == &quiet_cd) { /* (4) */ + if (cd == &quiet_cd) { /* (4) */ val &= ~(CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | CTXDESC_CD_0_TCR_SH0); @@ -2947,9 +2956,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) case ARM_SMMU_DOMAIN_S2: arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); arm_smmu_install_ste_for_dev(master, &target); - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, - NULL); + arm_smmu_clear_cd(master, IOMMU_NO_PASID); break; }
@@ -2997,8 +3004,7 @@ static int arm_smmu_attach_dev_ste(struct device *dev, * arm_smmu_domain->devices to avoid races updating the same context * descriptor from arm_smmu_share_asid(). */ - if (master->cd_table.cdtab) - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); + arm_smmu_clear_cd(master, IOMMU_NO_PASID); return 0; }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f0f377c7862a..adb29c96e819 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -832,6 +832,7 @@ extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; extern struct arm_smmu_ctx_desc quiet_cd;
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid); void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit b2f4c0fcf094dacd2d1fb96a6fd6598919501589 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Only the attach callers can perform an allocation for the CD table entry; the other callers must not do so, as they do not have the correct locking and cannot sleep. Split up the functions so this is clear.
arm_smmu_get_cd_ptr() will return a pointer to a CD table entry without doing any kind of allocation.
arm_smmu_alloc_cd_ptr() will allocate the table and any required leaf.
A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr() once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never called in the wrong context.
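Stated as usage fragments (error handling trimmed), the attach path, which may sleep and holds the right locks, allocates, while the non-sleeping paths only look up and must tolerate a missing entry:

	/* attach path: may sleep, correct locks held -> allocation allowed */
	cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID);
	if (!cdptr)
		return -ENOMEM;

	/* e.g. ASID update / mm release paths: no allocation, lookup only */
	cdptr = arm_smmu_get_cd_ptr(master, ssid);
	if (WARN_ON(!cdptr))
		return;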
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 59 +++++++++++++-------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +- 2 files changed, 39 insertions(+), 23 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b4e54fb27f9b..8698d68d7fdb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -161,6 +161,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct arm_smmu_device *smmu); +static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
static void parse_driver_options(struct arm_smmu_device *smmu) { @@ -1433,29 +1434,51 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) { - __le64 *l1ptr; - unsigned int idx; struct arm_smmu_l1_ctx_desc *l1_desc; - struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+ if (!cd_table->cdtab) + return NULL; + if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) return (struct arm_smmu_cd *)(cd_table->cdtab + ssid * CTXDESC_CD_DWORDS);
- idx = ssid >> CTXDESC_SPLIT; - l1_desc = &cd_table->l1_desc[idx]; - if (!l1_desc->l2ptr) { - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES]; + if (!l1_desc->l2ptr) + return NULL; + return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES]; +} + +static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, + u32 ssid) +{ + struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; + struct arm_smmu_device *smmu = master->smmu; + + if (!cd_table->cdtab) { + if (arm_smmu_alloc_cd_tables(master)) return NULL; + } + + if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) { + unsigned int idx = ssid / CTXDESC_L2_ENTRIES; + struct arm_smmu_l1_ctx_desc *l1_desc; + + l1_desc = &cd_table->l1_desc[idx]; + if (!l1_desc->l2ptr) { + __le64 *l1ptr; + + if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + return NULL;
- l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; - arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); - /* An invalid L1CD can be cached */ - arm_smmu_sync_cd(master, ssid, false); + l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; + arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); + /* An invalid L1CD can be cached */ + arm_smmu_sync_cd(master, ssid, false); + } } - idx = ssid & (CTXDESC_L2_ENTRIES - 1); - return &l1_desc->l2ptr[idx]; + return arm_smmu_get_cd_ptr(master, ssid); }
struct arm_smmu_cd_writer { @@ -1576,7 +1599,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) return -E2BIG;
- cd_table_entry = arm_smmu_get_cd_ptr(master, ssid); + cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid); if (!cd_table_entry) return -ENOMEM;
@@ -2934,13 +2957,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct arm_smmu_cd target_cd; struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab) { - ret = arm_smmu_alloc_cd_tables(master); - if (ret) - goto out_list_del; - } - - cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID); if (!cdptr) { ret = -ENOMEM; goto out_list_del; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index adb29c96e819..a353b68a95f3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -310,8 +310,7 @@ struct arm_smmu_ste { * 2lvl: at most 1024 L1 entries, * 1024 lazy entries per table. */ -#define CTXDESC_SPLIT 10 -#define CTXDESC_L2_ENTRIES (1 << CTXDESC_SPLIT) +#define CTXDESC_L2_ENTRIES 1024
#define CTXDESC_L1_DESC_DWORDS 1 #define CTXDESC_L1_DESC_V (1UL << 0)
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit 13abe4faac4348da0cf1c4eeb2b1b39fcfdb4b8f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Avoid arm_smmu_attach_dev() having to undo the changes to the smmu_domain->devices list: acquire the cdptr earlier so we don't need to handle that error.
Now there is a clear break in arm_smmu_attach_dev() where all the prep-work has been done non-disruptively and we commit to making the HW change, which cannot fail.
This completes transforming arm_smmu_attach_dev() so that it does not disturb the HW if it fails.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/6-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 +++++++-------------- 1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8698d68d7fdb..605a81c1f2e6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2908,6 +2908,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_master *master; + struct arm_smmu_cd *cdptr;
if (!fwspec) return -ENOENT; @@ -2936,6 +2937,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) if (ret) return ret;
+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID); + if (!cdptr) + return -ENOMEM; + } + /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2955,13 +2962,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: { struct arm_smmu_cd target_cd; - struct arm_smmu_cd *cdptr; - - cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID); - if (!cdptr) { - ret = -ENOMEM; - goto out_list_del; - }
arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, @@ -2978,16 +2978,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
arm_smmu_enable_ats(master, smmu_domain); - goto out_unlock; - -out_list_del: - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del_init(&master->domain_head); - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); - -out_unlock: mutex_unlock(&arm_smmu_asid_lock); - return ret; + return 0; }
static int arm_smmu_attach_dev_ste(struct device *dev,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit 7b87c93c8b86d9d9b9567d83f0ca3d3046fdfc5a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Pull all the calculations for building the CD table entry for an mm_struct into arm_smmu_make_sva_cd().
Call it in the two places installing the SVA CD table entry.
Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove the function.
Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the locking assertions to arm_smmu_alloc_cd_ptr() since arm_smmu_update_ctx_desc_devices() was the last problematic caller.
Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates the same value.
The behavior of quiet_cd changes slightly: the old implementation edited the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD entry. This version generates a full CD entry with a 0 TTB0 and relies on arm_smmu_write_cd_entry() to install it hitlessly.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 162 +++++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 82 +-------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +- 3 files changed, 112 insertions(+), 139 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index e90aa1dd71fd..7c65abff434a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -34,25 +34,6 @@ struct arm_smmu_bond {
static DEFINE_MUTEX(sva_lock);
-/* - * Write the CD to the CD tables for all masters that this domain is attached - * to. Note that this is only used to update existing CD entries in the target - * CD table, for which it's assumed that arm_smmu_write_ctx_desc can't fail. - */ -static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain, - int ssid, - struct arm_smmu_ctx_desc *cd) -{ - struct arm_smmu_master *master; - unsigned long flags; - - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { - arm_smmu_write_ctx_desc(master, ssid, cd); - } - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); -} - static void arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { @@ -128,11 +109,90 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) return NULL; }
+static u64 page_size_to_cd(void) +{ + static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K || + PAGE_SIZE == SZ_64K); + if (PAGE_SIZE == SZ_64K) + return ARM_LPAE_TCR_TG0_64K; + if (PAGE_SIZE == SZ_16K) + return ARM_LPAE_TCR_TG0_16K; + return ARM_LPAE_TCR_TG0_4K; +} + +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, + struct mm_struct *mm, u16 asid) +{ + u64 par; + + memset(target, 0, sizeof(*target)); + + par = cpuid_feature_extract_unsigned_field( + read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1), + ID_AA64MMFR0_EL1_PARANGE_SHIFT); + + target->data[0] = cpu_to_le64( + CTXDESC_CD_0_TCR_EPD1 | +#ifdef __BIG_ENDIAN + CTXDESC_CD_0_ENDI | +#endif + CTXDESC_CD_0_V | + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) | + CTXDESC_CD_0_AA64 | + (master->stall_enabled ? CTXDESC_CD_0_S : 0) | + CTXDESC_CD_0_R | + CTXDESC_CD_0_A | + CTXDESC_CD_0_ASET | + FIELD_PREP(CTXDESC_CD_0_ASID, asid)); + + if (master->smmu->features & ARM_SMMU_FEAT_HD) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (master->smmu->features & ARM_SMMU_FEAT_HA) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA); + + /* + * If no MM is passed then this creates a SVA entry that faults + * everything. arm_smmu_write_cd_entry() can hitlessly go between these + * two entries types since TTB0 is ignored by HW when EPD0 is set. + */ + if (mm) { + target->data[0] |= cpu_to_le64( + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, + 64ULL - vabits_actual) | + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) | + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, + ARM_LPAE_TCR_RGN_WBWA) | + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, + ARM_LPAE_TCR_RGN_WBWA) | + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS)); + + target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) & + CTXDESC_CD_1_TTB0_MASK); + } else { + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0); + + /* + * Disable stall and immediately generate an abort if stall + * disable is permitted. This speeds up cleanup for an unclean + * exit if the device is still doing a lot of DMA. + */ + if (!(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) + target->data[0] &= + cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R)); + } + + /* + * MAIR value is pretty much constant and global, so we can just get it + * from the current CPU register + */ + target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); +} + static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) { u16 asid; int err = 0; - u64 tcr, par, reg; struct arm_smmu_ctx_desc *cd; struct arm_smmu_ctx_desc *ret = NULL;
@@ -166,41 +226,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) if (err) goto out_free_asid;
- /* HA and HD will be filtered out later if not supported by the SMMU */ - tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) | - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) | - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) | - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) | - CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; - - switch (PAGE_SIZE) { - case SZ_4K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K); - break; - case SZ_16K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K); - break; - case SZ_64K: - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K); - break; - default: - WARN_ON(1); - err = -EINVAL; - goto out_free_asid; - } - - reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); - par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_EL1_PARANGE_SHIFT); - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par); - - cd->ttbr = virt_to_phys(mm->pgd); - cd->tcr = tcr; - /* - * MAIR value is pretty much constant and global, so we can just get it - * from the current CPU register - */ - cd->mair = read_sysreg(mair_el1); cd->asid = asid; cd->mm = mm;
@@ -278,6 +303,8 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_master *master; + unsigned long flags;
mutex_lock(&sva_lock); if (smmu_mn->cleared) { @@ -289,8 +316,19 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events, * but disable translation. */ - arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm), - &quiet_cd); + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr; + + cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + if (WARN_ON(!cdptr)) + continue; + arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid); + arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr, + &target); + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); @@ -385,6 +423,8 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, struct mm_struct *mm) { int ret; + struct arm_smmu_cd target; + struct arm_smmu_cd *cdptr; struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); @@ -411,9 +451,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, goto err_free_bond; }
- ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd); - if (ret) + cdptr = arm_smmu_alloc_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + if (!cdptr) { + ret = -ENOMEM; goto err_put_notifier; + } + arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); + arm_smmu_write_cd_entry(master, pasid, cdptr, &target);
list_add(&bond->list, &master->bonds); return 0; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 605a81c1f2e6..eec336eb24f8 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -146,12 +146,6 @@ struct arm_smmu_option_prop { DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa); DEFINE_MUTEX(arm_smmu_asid_lock);
-/* - * Special value used by SVA when a process dies, to quiesce a CD without - * disabling it. - */ -struct arm_smmu_ctx_desc quiet_cd = { 0 }; - static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, @@ -1427,7 +1421,7 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
- /* See comment in arm_smmu_write_ctx_desc() */ + /* The HW has 64 bit atomicity with stores to the L2 CD table */ WRITE_ONCE(*dst, cpu_to_le64(val)); }
@@ -1450,12 +1444,15 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES]; }
-static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, - u32 ssid) +struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu;
+ might_sleep(); + iommu_group_mutex_assert(master->dev); + if (!cd_table->cdtab) { if (arm_smmu_alloc_cd_tables(master)) return NULL; @@ -1578,70 +1575,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) arm_smmu_write_cd_entry(master, ssid, cdptr, &target); }
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, - struct arm_smmu_ctx_desc *cd) -{ - /* - * This function handles the following cases: - * - * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0). - * (2) Install a secondary CD, for SID+SSID traffic. - * (4) Quiesce the context without clearing the valid bit. Disable - * translation, and ignore any translation fault. - */ - u64 val; - struct arm_smmu_cd target; - struct arm_smmu_cd *cdptr = ⌖ - struct arm_smmu_cd *cd_table_entry; - struct arm_smmu_device *smmu = master->smmu; - struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; - - if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) - return -E2BIG; - - cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid); - if (!cd_table_entry) - return -ENOMEM; - - target = *cd_table_entry; - val = le64_to_cpu(cdptr->data[0]); - - if (cd == &quiet_cd) { /* (4) */ - val &= ~(CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | - CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | - CTXDESC_CD_0_TCR_SH0); - if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) - val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); - val |= CTXDESC_CD_0_TCR_EPD0; - cdptr->data[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); - } else { /* (1) and (2) */ - cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - cdptr->data[2] = 0; - cdptr->data[3] = cpu_to_le64(cd->mair); - - if (!(smmu->features & ARM_SMMU_FEAT_HD)) - cd->tcr &= ~CTXDESC_CD_0_TCR_HD; - if (!(smmu->features & ARM_SMMU_FEAT_HA)) - cd->tcr &= ~CTXDESC_CD_0_TCR_HA; - - val = cd->tcr | -#ifdef __BIG_ENDIAN - CTXDESC_CD_0_ENDI | -#endif - CTXDESC_CD_0_R | CTXDESC_CD_0_A | - (cd->mm ? 0 : CTXDESC_CD_0_ASET) | - CTXDESC_CD_0_AA64 | - FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) | - CTXDESC_CD_0_V; - - if (cd_table->stall_enabled) - val |= CTXDESC_CD_0_S; - } - cdptr->data[0] = cpu_to_le64(val); - arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target); - return 0; -} - static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) { int ret; @@ -1650,7 +1583,6 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- cd_table->stall_enabled = master->stall_enabled; cd_table->s1cdmax = master->ssid_bits; max_contexts = 1 << cd_table->s1cdmax;
@@ -1748,7 +1680,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span); val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
- /* See comment in arm_smmu_write_ctx_desc() */ + /* The HW has 64 bit atomicity with stores to the L2 STE table */ WRITE_ONCE(*dst, cpu_to_le64(val)); }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index a353b68a95f3..5f1da8fd94ae 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -660,8 +660,6 @@ struct arm_smmu_ctx_desc_cfg { u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; - /* Whether CD entries in this table have the stall bit set. */ - u8 stall_enabled:1; };
struct arm_smmu_s2_cfg { @@ -829,11 +827,12 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; -extern struct arm_smmu_ctx_desc quiet_cd;
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid); +struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, + u32 ssid); void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain); @@ -841,8 +840,6 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target);
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, - struct arm_smmu_ctx_desc *cd); void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit 04905c17f64890311e6b5a5065d8c220602712e5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Half the code was living in arm_smmu_domain_finalise_s1(); just move it here and take the values directly from the pgtbl_ops instead of storing copies.
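For orientation, the change below boils down to the following condensed sketch (a restatement of the hunk, not new code): arm_smmu_make_s1_cd() now reads the translation parameters straight from the domain's live io_pgtable_cfg rather than from cached fields in struct arm_smmu_ctx_desc.

	/* Condensed from the arm_smmu_make_s1_cd() hunk below */
	const struct io_pgtable_cfg *pgtbl_cfg =
		&io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg;
	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr =
		&pgtbl_cfg->arm_lpae_s1_cfg.tcr;

	/* TCR fields are packed into CD qword 0 at write time ... */
	target->data[0] = cpu_to_le64(
		FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
		FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
		FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
		FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
		FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
		CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64 | CTXDESC_CD_0_V);
	/* ... and TTBR0/MAIR come from the same pgtbl_cfg */
	target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr &
				      CTXDESC_CD_1_TTB0_MASK);
	target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair);

The endianness, stall, access/ASID and HA/HD bits are set exactly as in the full hunk; they are omitted here only to keep the sketch short.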
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Mostafa Saleh smostafa@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/8-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 55 +++++++++------------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 -- 2 files changed, 22 insertions(+), 36 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index eec336eb24f8..c5279249af02 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1536,15 +1536,25 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, struct arm_smmu_domain *smmu_domain) { struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; + const struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = + &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
memset(target, 0, sizeof(*target));
target->data[0] = cpu_to_le64( - cd->tcr | + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | #ifdef __BIG_ENDIAN CTXDESC_CD_0_ENDI | #endif + CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_V | + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | CTXDESC_CD_0_AA64 | (master->stall_enabled ? CTXDESC_CD_0_S : 0) | CTXDESC_CD_0_R | @@ -1553,13 +1563,14 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) );
- if (!(master->smmu->features & ARM_SMMU_FEAT_HD)) - target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HD); - if (!(master->smmu->features & ARM_SMMU_FEAT_HA)) - target->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_TCR_HA); + if (master->smmu->features & ARM_SMMU_FEAT_HD) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HD); + if (master->smmu->features & ARM_SMMU_FEAT_HA) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA);
- target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); - target->data[3] = cpu_to_le64(cd->mair); + target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr & + CTXDESC_CD_1_TTB0_MASK); + target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair); }
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) @@ -2538,13 +2549,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain) }
static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg) + struct arm_smmu_domain *smmu_domain) { int ret; u32 asid; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; - typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
refcount_set(&cd->refs, 1);
@@ -2552,32 +2561,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, mutex_lock(&arm_smmu_asid_lock); ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); - if (ret) - goto out_unlock; - cd->asid = (u16)asid; - cd->ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; - cd->tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | - FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | - FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | - CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD | - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; - cd->mair = pgtbl_cfg->arm_lpae_s1_cfg.mair; - - mutex_unlock(&arm_smmu_asid_lock); - return 0; - -out_unlock: mutex_unlock(&arm_smmu_asid_lock); return ret; }
static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg) + struct arm_smmu_domain *smmu_domain) { int vmid; struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; @@ -2601,8 +2591,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; int (*finalise_stage_fn)(struct arm_smmu_device *smmu, - struct arm_smmu_domain *smmu_domain, - struct io_pgtable_cfg *pgtbl_cfg); + struct arm_smmu_domain *smmu_domain);
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2658,7 +2647,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; smmu_domain->domain.geometry.force_aperture = true;
- ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg); + ret = finalise_stage_fn(smmu, smmu_domain); if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); return ret; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 5f1da8fd94ae..5eedfe0c118f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -639,9 +639,6 @@ struct arm_smmu_strtab_l1_desc {
struct arm_smmu_ctx_desc { u16 asid; - u64 ttbr; - u64 tcr; - u64 mair;
refcount_t refs; struct mm_struct *mm;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit 56e1a4cc2588a7cb9664457a62fd7a77e005aa01 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add tests for some of the more common STE update operations that we expect to see, as well as some artificial STE updates to test the edges of arm_smmu_write_entry. These also serve as a record of which common operations are expected to be hitless, and how many syncs they require.
arm_smmu_write_entry implements a generic algorithm that updates an STE/CD to any other arbitrary STE/CD configuration. The update requires a sequence of write+sync operations with some invariants that must be held true after each sync. arm_smmu_write_entry lends itself well to unit-testing since the function's interaction with the STE/CD is already abstracted by input callbacks that we can hook to introspect into the sequence of operations. We can use these hooks to guarantee that invariants are held throughout the entire update operation.
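Restated as a hedged sketch (the real check lives in the new test file below), the invariant the tests assert at every ->sync() callback is: masked down to the bits the current configuration actually uses, the in-flight entry must be indistinguishable from either the initial entry or the target entry, otherwise the transition was not hitless.

	/* Sketch of the per-sync invariant, with both sides masked for clarity */
	static bool entry_matches_in_used_bits(const __le64 *entry,
					       const __le64 *used_bits,
					       const __le64 *reference,
					       unsigned int nqwords)
	{
		unsigned int i;

		for (i = 0; i < nqwords; i++) {
			if ((entry[i] & used_bits[i]) !=
			    (reference[i] & used_bits[i]))
				return false;
		}
		return true;
	}

	/*
	 * At each sync the tests require match(init) || match(target); an
	 * all-zero qword 0 instead records the transition as non-hitless.
	 */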
Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Michael Shavit mshavit@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/9-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/Kconfig | 13 +- drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 8 +- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 465 ++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 30 ++ 6 files changed, 533 insertions(+), 27 deletions(-) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 75131a1b762c..88fc30074eb3 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -398,9 +398,9 @@ config ARM_SMMU_V3 Say Y here if your system includes an IOMMU device implementing the ARM SMMUv3 architecture.
+if ARM_SMMU_V3 config ARM_SMMU_V3_SVA bool "Shared Virtual Addressing support for the ARM SMMUv3" - depends on ARM_SMMU_V3 select IOMMU_SVA select IOMMU_IOPF select MMU_NOTIFIER @@ -436,6 +436,17 @@ config ARM_SMMU_V3_ECMDQ
If not sure, say no.
+config ARM_SMMU_V3_KUNIT_TEST + bool "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS + depends on KUNIT + depends on ARM_SMMU_V3_SVA + default KUNIT_ALL_TESTS + help + Enable this option to unit-test arm-smmu-v3 driver functions. + + If unsure, say N. +endif + config S390_IOMMU def_bool y if S390 && PCI depends on S390 && PCI diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 8781d131961d..feada86b1cc7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -3,4 +3,5 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o arm_smmu_v3-objs-y += arm-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_HISI_VIRTCCA_CODA) += arm-s-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o +arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 7c65abff434a..c98f9c28ae60 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -8,6 +8,7 @@ #include <linux/mmu_notifier.h> #include <linux/sched/mm.h> #include <linux/slab.h> +#include <kunit/visibility.h>
#include "arm-smmu-v3.h" #include "../../io-pgtable-arm.h" @@ -120,9 +121,10 @@ static u64 page_size_to_cd(void) return ARM_LPAE_TCR_TG0_4K; }
-static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, - struct arm_smmu_master *master, - struct mm_struct *mm, u16 asid) +VISIBLE_IF_KUNIT +void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, struct mm_struct *mm, + u16 asid) { u64 par;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c new file mode 100644 index 000000000000..417804392ff0 --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -0,0 +1,465 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2024 Google LLC. + */ +#include <kunit/test.h> +#include <linux/io-pgtable.h> + +#include "arm-smmu-v3.h" + +struct arm_smmu_test_writer { + struct arm_smmu_entry_writer writer; + struct kunit *test; + const __le64 *init_entry; + const __le64 *target_entry; + __le64 *entry; + + bool invalid_entry_written; + unsigned int num_syncs; +}; + +#define NUM_ENTRY_QWORDS 8 +#define NUM_EXPECTED_SYNCS(x) x + +static struct arm_smmu_ste bypass_ste; +static struct arm_smmu_ste abort_ste; +static struct arm_smmu_device smmu = { + .features = ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_ATTR_TYPES_OVR +}; +static struct mm_struct sva_mm = { + .pgd = (void *)0xdaedbeefdeadbeefULL, +}; + +static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry, + const __le64 *used_bits, + const __le64 *target, + unsigned int length) +{ + bool differs = false; + unsigned int i; + + for (i = 0; i < length; i++) { + if ((entry[i] & used_bits[i]) != target[i]) + differs = true; + } + return differs; +} + +static void +arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer) +{ + struct arm_smmu_test_writer *test_writer = + container_of(writer, struct arm_smmu_test_writer, writer); + __le64 *entry_used_bits; + + entry_used_bits = kunit_kzalloc( + test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS, + GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits); + + pr_debug("STE value is now set to: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, + test_writer->entry, + NUM_ENTRY_QWORDS * sizeof(*test_writer->entry), + false); + + test_writer->num_syncs += 1; + if (!test_writer->entry[0]) { + test_writer->invalid_entry_written = true; + } else { + /* + * At any stage in a hitless transition, the entry must be + * equivalent to either the initial entry or the target entry + * when only considering the bits used by the current + * configuration. 
+ */ + writer->ops->get_used(test_writer->entry, entry_used_bits); + KUNIT_EXPECT_FALSE( + test_writer->test, + arm_smmu_entry_differs_in_used_bits( + test_writer->entry, entry_used_bits, + test_writer->init_entry, NUM_ENTRY_QWORDS) && + arm_smmu_entry_differs_in_used_bits( + test_writer->entry, entry_used_bits, + test_writer->target_entry, + NUM_ENTRY_QWORDS)); + } +} + +static void +arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer, + const __le64 *ste) +{ + __le64 used_bits[NUM_ENTRY_QWORDS] = {}; + + arm_smmu_get_ste_used(ste, used_bits); + pr_debug("STE used bits: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, used_bits, + sizeof(used_bits), false); +} + +static const struct arm_smmu_entry_writer_ops test_ste_ops = { + .sync = arm_smmu_test_writer_record_syncs, + .get_used = arm_smmu_get_ste_used, +}; + +static const struct arm_smmu_entry_writer_ops test_cd_ops = { + .sync = arm_smmu_test_writer_record_syncs, + .get_used = arm_smmu_get_cd_used, +}; + +static void arm_smmu_v3_test_ste_expect_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, unsigned int num_syncs_expected, + bool hitless) +{ + struct arm_smmu_ste cur_copy = *cur; + struct arm_smmu_test_writer test_writer = { + .writer = { + .ops = &test_ste_ops, + }, + .test = test, + .init_entry = cur->data, + .target_entry = target->data, + .entry = cur_copy.data, + .num_syncs = 0, + .invalid_entry_written = false, + + }; + + pr_debug("STE initial value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data, + sizeof(cur_copy), false); + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data); + pr_debug("STE target value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, target->data, + sizeof(cur_copy), false); + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, + target->data); + + arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data); + + KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless); + KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected); + KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); +} + +static void arm_smmu_v3_test_ste_expect_hitless_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, unsigned int num_syncs_expected) +{ + arm_smmu_v3_test_ste_expect_transition(test, cur, target, + num_syncs_expected, true); +} + +static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0; + +static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, + const dma_addr_t dma_addr) +{ + struct arm_smmu_master master = { + .cd_table.cdtab_dma = dma_addr, + .cd_table.s1cdmax = 0xFF, + .cd_table.s1fmt = STRTAB_STE_0_S1FMT_64K_L2, + .smmu = &smmu, + }; + + arm_smmu_make_cdtable_ste(ste, &master); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) +{ + /* + * Bypass STEs has used bits in the first two Qwords, while abort STEs + * only have used bits in the first QWord. Transitioning from bypass to + * abort requires two syncs: the first to set the first qword and make + * the STE into an abort, the second to clean up the second qword. 
+ */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &abort_ste, NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_abort_to_bypass(struct kunit *test) +{ + /* + * Transitioning from abort to bypass also requires two syncs: the first + * to set the second qword data required by the bypass STE, and the + * second to set the first qword and switch to bypass. + */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &abort_ste, &bypass_ste, NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste, + NUM_EXPECTED_SYNCS(3)); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste, + NUM_EXPECTED_SYNCS(3)); +} + +static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, + bool ats_enabled) +{ + struct arm_smmu_master master = { + .smmu = &smmu, + .ats_enabled = ats_enabled, + }; + struct io_pgtable io_pgtable = {}; + struct arm_smmu_domain smmu_domain = { + .pgtbl_ops = &io_pgtable.ops, + }; + + io_pgtable.cfg.arm_lpae_s2_cfg.vttbr = 0xdaedbeefdeadbeefULL; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.ps = 1; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tg = 2; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sh = 3; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.orgn = 1; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.irgn = 2; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3; + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4; + + arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain); +} + +static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_abort_to_s2(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_s2_to_bypass(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test) +{ + struct arm_smmu_ste ste; + + arm_smmu_test_make_s2_ste(&ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste, + NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_test_cd_expect_transition( + struct kunit *test, const struct arm_smmu_cd *cur, + const struct arm_smmu_cd *target, unsigned int num_syncs_expected, + bool 
hitless) +{ + struct arm_smmu_cd cur_copy = *cur; + struct arm_smmu_test_writer test_writer = { + .writer = { + .ops = &test_cd_ops, + }, + .test = test, + .init_entry = cur->data, + .target_entry = target->data, + .entry = cur_copy.data, + .num_syncs = 0, + .invalid_entry_written = false, + + }; + + pr_debug("CD initial value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data, + sizeof(cur_copy), false); + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data); + pr_debug("CD target value: "); + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, target->data, + sizeof(cur_copy), false); + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, + target->data); + + arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data); + + KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless); + KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected); + KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); +} + +static void arm_smmu_v3_test_cd_expect_non_hitless_transition( + struct kunit *test, const struct arm_smmu_cd *cur, + const struct arm_smmu_cd *target, unsigned int num_syncs_expected) +{ + arm_smmu_v3_test_cd_expect_transition(test, cur, target, + num_syncs_expected, false); +} + +static void arm_smmu_v3_test_cd_expect_hitless_transition( + struct kunit *test, const struct arm_smmu_cd *cur, + const struct arm_smmu_cd *target, unsigned int num_syncs_expected) +{ + arm_smmu_v3_test_cd_expect_transition(test, cur, target, + num_syncs_expected, true); +} + +static void arm_smmu_test_make_s1_cd(struct arm_smmu_cd *cd, unsigned int asid) +{ + struct arm_smmu_master master = { + .smmu = &smmu, + }; + struct io_pgtable io_pgtable = {}; + struct arm_smmu_domain smmu_domain = { + .pgtbl_ops = &io_pgtable.ops, + .cd = { + .asid = asid, + }, + }; + + io_pgtable.cfg.arm_lpae_s1_cfg.ttbr = 0xdaedbeefdeadbeefULL; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.ips = 1; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tg = 2; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.sh = 3; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.orgn = 1; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.irgn = 2; + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tsz = 4; + io_pgtable.cfg.arm_lpae_s1_cfg.mair = 0xabcdef012345678ULL; + + arm_smmu_make_s1_cd(cd, &master, &smmu_domain); +} + +static void arm_smmu_v3_write_cd_test_s1_clear(struct kunit *test) +{ + struct arm_smmu_cd cd = {}; + struct arm_smmu_cd cd_2; + + arm_smmu_test_make_s1_cd(&cd_2, 1997); + arm_smmu_v3_test_cd_expect_non_hitless_transition( + test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2)); + arm_smmu_v3_test_cd_expect_non_hitless_transition( + test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_cd_test_s1_change_asid(struct kunit *test) +{ + struct arm_smmu_cd cd = {}; + struct arm_smmu_cd cd_2; + + arm_smmu_test_make_s1_cd(&cd, 778); + arm_smmu_test_make_s1_cd(&cd_2, 1997); + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2, + NUM_EXPECTED_SYNCS(1)); + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd, + NUM_EXPECTED_SYNCS(1)); +} + +static void arm_smmu_test_make_sva_cd(struct arm_smmu_cd *cd, unsigned int asid) +{ + struct arm_smmu_master master = { + .smmu = &smmu, + }; + + arm_smmu_make_sva_cd(cd, &master, &sva_mm, asid); +} + +static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd, + unsigned int asid) +{ + struct arm_smmu_master master = { + .smmu = &smmu, + }; + + arm_smmu_make_sva_cd(cd, &master, NULL, asid); +} + +static void 
arm_smmu_v3_write_cd_test_sva_clear(struct kunit *test) +{ + struct arm_smmu_cd cd = {}; + struct arm_smmu_cd cd_2; + + arm_smmu_test_make_sva_cd(&cd_2, 1997); + arm_smmu_v3_test_cd_expect_non_hitless_transition( + test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2)); + arm_smmu_v3_test_cd_expect_non_hitless_transition( + test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2)); +} + +static void arm_smmu_v3_write_cd_test_sva_release(struct kunit *test) +{ + struct arm_smmu_cd cd; + struct arm_smmu_cd cd_2; + + arm_smmu_test_make_sva_cd(&cd, 1997); + arm_smmu_test_make_sva_release_cd(&cd_2, 1997); + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2, + NUM_EXPECTED_SYNCS(2)); + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd, + NUM_EXPECTED_SYNCS(2)); +} + +static struct kunit_case arm_smmu_v3_test_cases[] = { + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort), + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2), + KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_clear), + KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid), + KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear), + KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release), + {}, +}; + +static int arm_smmu_v3_test_suite_init(struct kunit_suite *test) +{ + arm_smmu_make_bypass_ste(&smmu, &bypass_ste); + arm_smmu_make_abort_ste(&abort_ste); + return 0; +} + +static struct kunit_suite arm_smmu_v3_test_module = { + .name = "arm-smmu-v3-kunit-test", + .suite_init = arm_smmu_v3_test_suite_init, + .test_cases = arm_smmu_v3_test_cases, +}; +kunit_test_suites(&arm_smmu_v3_test_module); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index c5279249af02..c0ad46d32171 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -26,6 +26,7 @@ #include <linux/pci.h> #include <linux/pci-ats.h> #include <linux/platform_device.h> +#include <kunit/visibility.h>
#include "arm-smmu-v3.h" #include "../../dma-iommu.h" @@ -105,17 +106,6 @@ enum arm_smmu_msi_index { ARM_SMMU_MAX_MSIS, };
-struct arm_smmu_entry_writer_ops; -struct arm_smmu_entry_writer { - const struct arm_smmu_entry_writer_ops *ops; - struct arm_smmu_master *master; -}; - -struct arm_smmu_entry_writer_ops { - void (*get_used)(const __le64 *entry, __le64 *used); - void (*sync)(struct arm_smmu_entry_writer *writer); -}; - #define NUM_ENTRY_QWORDS 8 static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64)); static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64)); @@ -1203,7 +1193,8 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) * would be nice if this was complete according to the spec, but minimally it * has to capture the bits this driver uses. */ -static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) +VISIBLE_IF_KUNIT +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) { unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
@@ -1325,8 +1316,9 @@ static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry, * V=0 process. This relies on the IGNORED behavior described in the * specification. */ -static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, - __le64 *entry, const __le64 *target) +VISIBLE_IF_KUNIT +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry, + const __le64 *target) { __le64 unused_update[NUM_ENTRY_QWORDS]; u8 used_qword_diff; @@ -1483,7 +1475,8 @@ struct arm_smmu_cd_writer { unsigned int ssid; };
-static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) +VISIBLE_IF_KUNIT +void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) { used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V); if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V))) @@ -1747,7 +1740,8 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, } }
-static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) +VISIBLE_IF_KUNIT +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1755,8 +1749,9 @@ static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); }
-static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, - struct arm_smmu_ste *target) +VISIBLE_IF_KUNIT +void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, + struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); target->data[0] = cpu_to_le64( @@ -1768,8 +1763,9 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, STRTAB_STE_1_SHCFG_INCOMING)); }
-static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master) +VISIBLE_IF_KUNIT +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1818,9 +1814,10 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, } }
-static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) +VISIBLE_IF_KUNIT +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain) { struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; const struct io_pgtable_cfg *pgtbl_cfg = diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 5eedfe0c118f..a096817dcaff 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -817,6 +817,36 @@ struct arm_smmu_domain { #endif };
+/* The following are exposed for testing purposes. */ +struct arm_smmu_entry_writer_ops; +struct arm_smmu_entry_writer { + const struct arm_smmu_entry_writer_ops *ops; + struct arm_smmu_master *master; +}; + +struct arm_smmu_entry_writer_ops { + void (*get_used)(const __le64 *entry, __le64 *used); + void (*sync)(struct arm_smmu_entry_writer *writer); +}; + +#if IS_ENABLED(CONFIG_KUNIT) +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits); +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur, + const __le64 *target); +void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits); +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); +void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, + struct arm_smmu_ste *target); +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master); +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain); +void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, + struct arm_smmu_master *master, struct mm_struct *mm, + u16 asid); +#endif + static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 678d79b98028ce2365b30e35479bea0e555c23d3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This allows the driver to receive the mm and always a device during allocation. Later patches need this to properly set up the notifier when the domain is first allocated.
Remove ops->domain_alloc() as SVA was the only remaining purpose.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 +--------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 +++----- 3 files changed, 8 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index c98f9c28ae60..517387282dc8 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -660,13 +660,15 @@ static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { .free = arm_smmu_sva_domain_free };
-struct iommu_domain *arm_smmu_sva_domain_alloc(void) +struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, + struct mm_struct *mm) { struct iommu_domain *domain;
domain = kzalloc(sizeof(*domain), GFP_KERNEL); if (!domain) - return NULL; + return ERR_PTR(-ENOMEM); + domain->type = IOMMU_DOMAIN_SVA; domain->ops = &arm_smmu_sva_domain_ops;
return domain; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index c0ad46d32171..3d54cb295db5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2484,14 +2484,6 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) } }
-static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) -{ - - if (type == IOMMU_DOMAIN_SVA) - return arm_smmu_sva_domain_alloc(); - return ERR_PTR(-EOPNOTSUPP); -} - static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) { struct arm_smmu_domain *smmu_domain; @@ -3586,8 +3578,8 @@ static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, - .domain_alloc = arm_smmu_domain_alloc, .domain_alloc_paging = arm_smmu_domain_alloc_paging, + .domain_alloc_sva = arm_smmu_sva_domain_alloc, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index a096817dcaff..eaf1ade04c65 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -883,7 +883,8 @@ int arm_smmu_master_enable_sva(struct arm_smmu_master *master); int arm_smmu_master_disable_sva(struct arm_smmu_master *master); bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); void arm_smmu_sva_notifier_synchronize(void); -struct iommu_domain *arm_smmu_sva_domain_alloc(void); +struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, + struct mm_struct *mm); void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */ @@ -919,10 +920,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
static inline void arm_smmu_sva_notifier_synchronize(void) {}
-static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void) -{ - return NULL; -} +#define arm_smmu_sva_domain_alloc NULL
static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct device *dev,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 85f2fb6ef4137c631c9d2663716d998d7e4f164f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add arm_smmu_set_pasid()/arm_smmu_remove_pasid(), which are to be used by callers that have already constructed the arm_smmu_cd they wish to program.
These functions will encapsulate the shared logic to setup a CD entry that will be shared by SVA and S1 domain cases.
Prior fixes had already moved most of this logic up into __arm_smmu_sva_bind(); move it to its final home.
Following patches will relieve some of the remaining SVA restrictions:
- The RID domain is a S1 domain and has already setup the STE to point to the CD table - The programmed PASID is the mm_get_enqcmd_pasid() - Nothing changes while SVA is running (sva_enable)
SVA invalidation will still iterate over the S1 domain's master list, later patches will resolve that.
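Condensed from the hunks below, the resulting calling pattern for SVA is roughly as follows (a sketch of the flow, not additional driver code):

	struct arm_smmu_cd target;
	int ret;

	/* Bind: SVA builds the CD it wants, the shared helper installs it */
	arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid);
	ret = arm_smmu_set_pasid(master, NULL, pasid, &target);
	if (ret)
		return ret;	/* caller unwinds the bond and notifier */

	/* Unbind: the shared helper tears the CD entry down again */
	arm_smmu_remove_pasid(master, to_smmu_domain(domain), pasid);

arm_smmu_set_pasid() allocates the CD table slot and writes the entry; arm_smmu_remove_pasid() currently just clears it.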
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/2-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 57 ++++++++++--------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 ++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 ++- 3 files changed, 67 insertions(+), 31 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 517387282dc8..02a3d2b76950 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -421,29 +421,27 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) arm_smmu_free_shared_cd(cd); }
-static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, - struct mm_struct *mm) +static struct arm_smmu_bond *__arm_smmu_sva_bind(struct device *dev, + struct mm_struct *mm) { int ret; - struct arm_smmu_cd target; - struct arm_smmu_cd *cdptr; struct arm_smmu_bond *bond; struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct arm_smmu_domain *smmu_domain;
if (!(domain->type & __IOMMU_DOMAIN_PAGING)) - return -ENODEV; + return ERR_PTR(-ENODEV); smmu_domain = to_smmu_domain(domain); if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) - return -ENODEV; + return ERR_PTR(-ENODEV);
if (!master || !master->sva_enabled) - return -ENODEV; + return ERR_PTR(-ENODEV);
bond = kzalloc(sizeof(*bond), GFP_KERNEL); if (!bond) - return -ENOMEM; + return ERR_PTR(-ENOMEM);
bond->mm = mm;
@@ -453,22 +451,12 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid, goto err_free_bond; }
- cdptr = arm_smmu_alloc_cd_ptr(master, mm_get_enqcmd_pasid(mm)); - if (!cdptr) { - ret = -ENOMEM; - goto err_put_notifier; - } - arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); - arm_smmu_write_cd_entry(master, pasid, cdptr, &target); - list_add(&bond->list, &master->bonds); - return 0; + return bond;
-err_put_notifier: - arm_smmu_mmu_notifier_put(bond->smmu_mn); err_free_bond: kfree(bond); - return ret; + return ERR_PTR(ret); }
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) @@ -615,10 +603,9 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, struct arm_smmu_bond *bond = NULL, *t; struct arm_smmu_master *master = dev_iommu_priv_get(dev);
- mutex_lock(&sva_lock); - - arm_smmu_clear_cd(master, id); + arm_smmu_remove_pasid(master, to_smmu_domain(domain), id);
+ mutex_lock(&sva_lock); list_for_each_entry(t, &master->bonds, list) { if (t->mm == mm) { bond = t; @@ -637,17 +624,33 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id) { - int ret = 0; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct mm_struct *mm = domain->mm; + struct arm_smmu_bond *bond; + struct arm_smmu_cd target; + int ret;
if (mm_get_enqcmd_pasid(mm) != id) return -EINVAL;
mutex_lock(&sva_lock); - ret = __arm_smmu_sva_bind(dev, id, mm); - mutex_unlock(&sva_lock); + bond = __arm_smmu_sva_bind(dev, mm); + if (IS_ERR(bond)) { + mutex_unlock(&sva_lock); + return PTR_ERR(bond); + }
- return ret; + arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); + ret = arm_smmu_set_pasid(master, NULL, id, &target); + if (ret) { + list_del(&bond->list); + arm_smmu_mmu_notifier_put(bond->smmu_mn); + kfree(bond); + mutex_unlock(&sva_lock); + return ret; + } + mutex_unlock(&sva_lock); + return 0; }
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 3d54cb295db5..b9eb9fcae98e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1436,8 +1436,8 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES]; }
-struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, - u32 ssid) +static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, + u32 ssid) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -2672,6 +2672,10 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, int i, j; struct arm_smmu_device *smmu = master->smmu;
+ master->cd_table.in_ste = + FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(target->data[0])) == + STRTAB_STE_0_CFG_S1_TRANS; + for (i = 0; i < master->num_streams; ++i) { u32 sid = master->streams[i].id; struct arm_smmu_ste *step = @@ -2892,6 +2896,30 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
+int arm_smmu_set_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid, + const struct arm_smmu_cd *cd) +{ + struct arm_smmu_cd *cdptr; + + /* The core code validates pasid */ + + if (!master->cd_table.in_ste) + return -ENODEV; + + cdptr = arm_smmu_alloc_cd_ptr(master, pasid); + if (!cdptr) + return -ENOMEM; + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); + return 0; +} + +void arm_smmu_remove_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid) +{ + arm_smmu_clear_cd(master, pasid); +} + static int arm_smmu_attach_dev_ste(struct device *dev, struct arm_smmu_ste *ste) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index eaf1ade04c65..aee347113366 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -654,6 +654,7 @@ struct arm_smmu_ctx_desc_cfg { dma_addr_t cdtab_dma; struct arm_smmu_l1_ctx_desc *l1_desc; unsigned int num_l1_ents; + u8 in_ste; u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; @@ -858,8 +859,6 @@ extern struct mutex arm_smmu_asid_lock; void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid); -struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, - u32 ssid); void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain); @@ -867,6 +866,12 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target);
+int arm_smmu_set_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid, + const struct arm_smmu_cd *cd); +void arm_smmu_remove_pasid(struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, ioasid_t pasid); + void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit ad10dce61303d82f7bdd2dbb116e02146778f728 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The next patch will need to store the same master twice (with different SSIDs), so allocate memory for each list element.
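The shape of the change in the hunks below can be summarised by this sketch of the new per-attachment element (it only restates what the diff adds):

	/* One element per master attached to the domain */
	struct arm_smmu_master_domain {
		struct list_head devices_elm;	/* on smmu_domain->devices */
		struct arm_smmu_master *master;
	};

	/* Walkers dereference the element instead of the master directly */
	list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm)
		master = master_domain->master;

Attach allocates and links the element, detach looks it up with arm_smmu_find_master_domain() and frees it, so a later patch can put the same master on the list more than once with different SSIDs.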
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/coda/coda.c | 4 +- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 ++++++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++- virt/kvm/vfio.c | 4 +- 5 files changed, 53 insertions(+), 12 deletions(-)
diff --git a/drivers/coda/coda.c b/drivers/coda/coda.c index 7e88da67478c..a163efbd2705 100644 --- a/drivers/coda/coda.c +++ b/drivers/coda/coda.c @@ -268,6 +268,7 @@ u32 virtcca_tmi_dev_attach(struct arm_smmu_domain *arm_smmu_domain, struct kvm * unsigned long flags; int i, j; struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain; int ret = 0; u64 cmd[CMDQ_ENT_DWORDS] = {0}; struct virtcca_cvm *virtcca_cvm = kvm->arch.virtcca_cvm; @@ -277,7 +278,8 @@ u32 virtcca_tmi_dev_attach(struct arm_smmu_domain *arm_smmu_domain, struct kvm * * Traverse all devices under the secure smmu domain and * set the correspnding address translation table for each device */ - list_for_each_entry(master, &arm_smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &arm_smmu_domain->devices, devices_elm) { + master = master_domain->master; if (master && master->num_streams >= 0) { for (i = 0; i < master->num_streams; i++) { u32 sid = master->streams[i].id; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 02a3d2b76950..edcde13bfcac 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -38,12 +38,13 @@ static DEFINE_MUTEX(sva_lock); static void arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { - struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain; struct arm_smmu_cd target_cd; unsigned long flags;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { + struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd *cdptr;
/* S1 domains only support RID attachment right now */ @@ -305,7 +306,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); struct arm_smmu_domain *smmu_domain = smmu_mn->domain; - struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain; unsigned long flags;
mutex_lock(&sva_lock); @@ -319,7 +320,9 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) * but disable translation. */ spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd target; struct arm_smmu_cd *cdptr;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b9eb9fcae98e..bd80073306ab 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2248,10 +2248,10 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, unsigned long iova, size_t size) { + struct arm_smmu_master_domain *master_domain; int i, ret; unsigned long flags; struct arm_smmu_cmdq_ent cmd; - struct arm_smmu_master *master; struct arm_smmu_cmdq_batch cmds;
if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) @@ -2280,7 +2280,10 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
arm_smmu_preempt_disable(smmu_domain->smmu); spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_for_each_entry(master, &smmu_domain->devices, domain_head) { + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + struct arm_smmu_master *master = master_domain->master; + if (!master->ats_enabled) continue;
@@ -2794,9 +2797,26 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) pci_disable_pasid(pdev); }
+static struct arm_smmu_master_domain * +arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_master *master) +{ + struct arm_smmu_master_domain *master_domain; + + lockdep_assert_held(&smmu_domain->devices_lock); + + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + if (master_domain->master == master) + return master_domain; + } + return NULL; +} + static void arm_smmu_detach_dev(struct arm_smmu_master *master) { struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_master_domain *master_domain; struct arm_smmu_domain *smmu_domain; unsigned long flags;
@@ -2807,7 +2827,11 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) arm_smmu_disable_ats(master, smmu_domain);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_del_init(&master->domain_head); + master_domain = arm_smmu_find_master_domain(smmu_domain, master); + if (master_domain) { + list_del(&master_domain->devices_elm); + kfree(master_domain); + } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
master->ats_enabled = false; @@ -2821,6 +2845,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master_domain *master_domain; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr;
@@ -2857,6 +2882,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENOMEM; }
+ master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); + if (!master_domain) + return -ENOMEM; + master_domain->master = master; + /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2870,7 +2900,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->ats_enabled = arm_smmu_ats_supported(master);
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_add(&master->domain_head, &smmu_domain->devices); + list_add(&master_domain->devices_elm, &smmu_domain->devices); spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
switch (smmu_domain->stage) { @@ -3207,7 +3237,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) master->dev = dev; master->smmu = smmu; INIT_LIST_HEAD(&master->bonds); - INIT_LIST_HEAD(&master->domain_head); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index aee347113366..c1b7233d1cfe 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -773,7 +773,6 @@ struct arm_smmu_stream { struct arm_smmu_master { struct arm_smmu_device *smmu; struct device *dev; - struct list_head domain_head; struct arm_smmu_stream *streams; /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; @@ -807,6 +806,7 @@ struct arm_smmu_domain {
struct iommu_domain domain;
+ /* List of struct arm_smmu_master_domain */ struct list_head devices; spinlock_t devices_lock;
@@ -848,6 +848,11 @@ void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, u16 asid); #endif
+struct arm_smmu_master_domain { + struct list_head devices_elm; + struct arm_smmu_master *master; +}; + static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain); diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index a07dd1badbe6..af211059332e 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -461,10 +461,12 @@ struct kvm *virtcca_arm_smmu_get_kvm(struct arm_smmu_domain *domain) struct iommu_group *iommu_group; unsigned long flags; struct arm_smmu_master *master; + struct arm_smmu_master_domain *master_domain;
spin_lock_irqsave(&domain->devices_lock, flags); /* Get smmu master from smmu domain */ - list_for_each_entry(master, &domain->devices, domain_head) { + list_for_each_entry(master_domain, &domain->devices, devices_elm) { + master = master_domain->master; if (master && master->num_streams >= 0) { ret = 0; break;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 7497f4211f4fbdcec5fc5bb4df7f6ccd345966e8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The core code allows the domain to be changed on the fly without a forced stop in BLOCKED/IDENTITY. In this flow the driver should just continually maintain the ATS with no change while the STE is updated.
ATS relies on a linked list smmu_domain->devices to keep track of which masters have the domain programmed, but this list is also used by arm_smmu_share_asid(), unrelated to ATS.
Create two new functions to encapsulate this combined logic: arm_smmu_attach_prepare() <caller generates and sets the STE> arm_smmu_attach_commit()
The two functions sequence both enabling and disabling of ATS across the STE store. Have every update of the STE use this sequence.
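As a rough sketch (using only names introduced by this patch; new_domain and the target STE are supplied by the caller), the intended caller sequence is:

    struct arm_smmu_attach_state state = {
        .master = master,
        .old_domain = iommu_get_domain_for_dev(dev),
    };

    /* runs under arm_smmu_asid_lock */
    ret = arm_smmu_attach_prepare(&state, new_domain);  /* join the new devices list, decide EATS */
    if (ret)
        return ret;
    /* ... build the STE/CD using state.ats_enabled ... */
    arm_smmu_install_ste_for_dev(master, &target);
    arm_smmu_attach_commit(&state);                     /* sync the PCI ATC, leave the old devices list */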
Installing an S1/S2 domain always enables ATS if the PCIe device supports it.
The enable flow is now ordered differently to allow it to be hitless:
1) Add the master to the new smmu_domain->devices list 2) Program the STE 3) Enable ATS at PCIe 4) Remove the master from the old smmu_domain
This flow ensures that invalidations to either domain will generate an ATC invalidation to the device while the STE is being switched. Thus we no longer need to turn off ATS for correctness.
The disable flow is the reverse: 1) Disable ATS at PCIe 2) Program the STE 3) Invalidate the ATC 4) Remove the master from the old smmu_domain
Move the nr_ats_masters adjustments to be close to the list manipulations. It is a count of the number of ATS-enabled masters currently in the list. It is adjusted strictly before and after the STE/CD are revised, and done under the list's spin_lock.
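For illustration, the list and counter manipulation in arm_smmu_attach_prepare() (excerpted from the hunk below) keeps the two in lockstep under the devices_lock:

    spin_lock_irqsave(&smmu_domain->devices_lock, flags);
    if (state->ats_enabled)
        atomic_inc(&smmu_domain->nr_ats_masters);
    list_add(&master_domain->devices_elm, &smmu_domain->devices);
    spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);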
This is part of the bigger picture to allow changing the RID domain while a PASID is in use. If an SVA PASID is relying on ATS to function then changing the RID domain cannot just temporarily toggle ATS off without also wrecking the SVA PASID. The new infrastructure here is organized so that the PASID attach/detach flows will make use of it as well in following patches.
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 5 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 237 +++++++++++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 +- 3 files changed, 177 insertions(+), 71 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c index 417804392ff0..68aa1c61fdb3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -164,7 +164,7 @@ static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, .smmu = &smmu, };
- arm_smmu_make_cdtable_ste(ste, &master); + arm_smmu_make_cdtable_ste(ste, &master, true); }
static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) @@ -231,7 +231,6 @@ static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, { struct arm_smmu_master master = { .smmu = &smmu, - .ats_enabled = ats_enabled, }; struct io_pgtable io_pgtable = {}; struct arm_smmu_domain smmu_domain = { @@ -247,7 +246,7 @@ static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3; io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4;
- arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain); + arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain, ats_enabled); }
static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index bd80073306ab..38fe1f005c97 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1765,7 +1765,7 @@ void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
VISIBLE_IF_KUNIT void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master) + struct arm_smmu_master *master, bool ats_enabled) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1788,7 +1788,7 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, STRTAB_STE_1_S1STALLD : 0) | FIELD_PREP(STRTAB_STE_1_EATS, - master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
if (smmu->features & ARM_SMMU_FEAT_E2H) { /* @@ -1817,7 +1817,8 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, VISIBLE_IF_KUNIT void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) + struct arm_smmu_domain *smmu_domain, + bool ats_enabled) { struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; const struct io_pgtable_cfg *pgtbl_cfg = @@ -1834,7 +1835,7 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
target->data[1] = cpu_to_le64( FIELD_PREP(STRTAB_STE_1_EATS, - master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0)); + ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) target->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, @@ -2710,22 +2711,16 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master) return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); }
-static void arm_smmu_enable_ats(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) +static void arm_smmu_enable_ats(struct arm_smmu_master *master) { size_t stu; struct pci_dev *pdev; struct arm_smmu_device *smmu = master->smmu;
- /* Don't enable ATS at the endpoint if it's not enabled in the STE */ - if (!master->ats_enabled) - return; - /* Smallest Translation Unit: log2 of the smallest supported granule */ stu = __ffs(smmu->pgsize_bitmap); pdev = to_pci_dev(master->dev);
- atomic_inc(&smmu_domain->nr_ats_masters); /* * ATC invalidation of PASID 0 causes the entire ATC to be flushed. */ @@ -2734,22 +2729,6 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master, dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); }
-static void arm_smmu_disable_ats(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain) -{ - if (!master->ats_enabled) - return; - - pci_disable_ats(to_pci_dev(master->dev)); - /* - * Ensure ATS is disabled at the endpoint before we issue the - * ATC invalidation via the SMMU. - */ - wmb(); - arm_smmu_atc_inv_master(master); - atomic_dec(&smmu_domain->nr_ats_masters); -} - static int arm_smmu_enable_pasid(struct arm_smmu_master *master) { int ret; @@ -2813,46 +2792,181 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, return NULL; }
-static void arm_smmu_detach_dev(struct arm_smmu_master *master) +/* + * If the domain uses the smmu_domain->devices list return the arm_smmu_domain + * structure, otherwise NULL. These domains track attached devices so they can + * issue invalidations. + */ +static struct arm_smmu_domain * +to_smmu_domain_devices(struct iommu_domain *domain) +{ + /* The domain can be NULL only when processing the first attach */ + if (!domain) + return NULL; + if (domain->type & __IOMMU_DOMAIN_PAGING) + return to_smmu_domain(domain); + return NULL; +} + +static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, + struct iommu_domain *domain) { - struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev); + struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; - struct arm_smmu_domain *smmu_domain; unsigned long flags;
- if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING)) + if (!smmu_domain) return;
- smmu_domain = to_smmu_domain(domain); - arm_smmu_disable_ats(master, smmu_domain); - spin_lock_irqsave(&smmu_domain->devices_lock, flags); master_domain = arm_smmu_find_master_domain(smmu_domain, master); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); + if (master->ats_enabled) + atomic_dec(&smmu_domain->nr_ats_masters); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); +} + +struct arm_smmu_attach_state { + /* Inputs */ + struct iommu_domain *old_domain; + struct arm_smmu_master *master; + /* Resulting state */ + bool ats_enabled; +}; + +/* + * Start the sequence to attach a domain to a master. The sequence contains three + * steps: + * arm_smmu_attach_prepare() + * arm_smmu_install_ste_for_dev() + * arm_smmu_attach_commit() + * + * If prepare succeeds then the sequence must be completed. The STE installed + * must set the STE.EATS field according to state.ats_enabled. + * + * If the device supports ATS then this determines if EATS should be enabled + * in the STE, and starts sequencing EATS disable if required. + * + * The change of the EATS in the STE and the PCI ATS config space is managed by + * this sequence to be in the right order so that if PCI ATS is enabled then + * STE.ETAS is enabled. + * + * new_domain can be a non-paging domain. In this case ATS will not be enabled, + * and invalidations won't be tracked. + */ +static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, + struct iommu_domain *new_domain) +{ + struct arm_smmu_master *master = state->master; + struct arm_smmu_master_domain *master_domain; + struct arm_smmu_domain *smmu_domain = + to_smmu_domain_devices(new_domain); + unsigned long flags; + + /* + * arm_smmu_share_asid() must not see two domains pointing to the same + * arm_smmu_master_domain contents otherwise it could randomly write one + * or the other to the CD. + */ + lockdep_assert_held(&arm_smmu_asid_lock); + + if (smmu_domain) { + /* + * The SMMU does not support enabling ATS with bypass/abort. + * When the STE is in bypass (STE.Config[2:0] == 0b100), ATS + * Translation Requests and Translated transactions are denied + * as though ATS is disabled for the stream (STE.EATS == 0b00), + * causing F_BAD_ATS_TREQ and F_TRANSL_FORBIDDEN events + * (IHI0070Ea 5.2 Stream Table Entry). Thus ATS can only be + * enabled if we have arm_smmu_domain, those always have page + * tables. + */ + state->ats_enabled = arm_smmu_ats_supported(master); + + master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); + if (!master_domain) + return -ENOMEM; + master_domain->master = master;
- master->ats_enabled = false; + /* + * During prepare we want the current smmu_domain and new + * smmu_domain to be in the devices list before we change any + * HW. This ensures that both domains will send ATS + * invalidations to the master until we are done. + * + * It is tempting to make this list only track masters that are + * using ATS, but arm_smmu_share_asid() also uses this to change + * the ASID of a domain, unrelated to ATS. + * + * Notice if we are re-attaching the same domain then the list + * will have two identical entries and commit will remove only + * one of them. + */ + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + if (state->ats_enabled) + atomic_inc(&smmu_domain->nr_ats_masters); + list_add(&master_domain->devices_elm, &smmu_domain->devices); + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + } + + if (!state->ats_enabled && master->ats_enabled) { + pci_disable_ats(to_pci_dev(master->dev)); + /* + * This is probably overkill, but the config write for disabling + * ATS should complete before the STE is configured to generate + * UR to avoid AER noise. + */ + wmb(); + } + return 0; +} + +/* + * Commit is done after the STE/CD are configured with the EATS setting. It + * completes synchronizing the PCI device's ATC and finishes manipulating the + * smmu_domain->devices list. + */ +static void arm_smmu_attach_commit(struct arm_smmu_attach_state *state) +{ + struct arm_smmu_master *master = state->master; + + lockdep_assert_held(&arm_smmu_asid_lock); + + if (state->ats_enabled && !master->ats_enabled) { + arm_smmu_enable_ats(master); + } else if (master->ats_enabled) { + /* + * The translation has changed, flush the ATC. At this point the + * SMMU is translating for the new domain and both the old&new + * domain will issue invalidations. + */ + arm_smmu_atc_inv_master(master); + } + master->ats_enabled = state->ats_enabled; + + arm_smmu_remove_master_domain(master, state->old_domain); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret = 0; - unsigned long flags; struct arm_smmu_ste target; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct arm_smmu_device *smmu; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - struct arm_smmu_master_domain *master_domain; + struct arm_smmu_attach_state state = { + .old_domain = iommu_get_domain_for_dev(dev), + }; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr;
if (!fwspec) return -ENOENT;
- master = dev_iommu_priv_get(dev); + state.master = master = dev_iommu_priv_get(dev); smmu = master->smmu;
/* @@ -2882,11 +2996,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENOMEM; }
- master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); - if (!master_domain) - return -ENOMEM; - master_domain->master = master; - /* * Prevent arm_smmu_share_asid() from trying to change the ASID * of either the old or new domain while we are working on it. @@ -2895,13 +3004,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) */ mutex_lock(&arm_smmu_asid_lock);
- arm_smmu_detach_dev(master); - - master->ats_enabled = arm_smmu_ats_supported(master); - - spin_lock_irqsave(&smmu_domain->devices_lock, flags); - list_add(&master_domain->devices_elm, &smmu_domain->devices); - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + ret = arm_smmu_attach_prepare(&state, domain); + if (ret) { + mutex_unlock(&arm_smmu_asid_lock); + return ret; + }
switch (smmu_domain->stage) { case ARM_SMMU_DOMAIN_S1: { @@ -2910,18 +3017,19 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, &target_cd); - arm_smmu_make_cdtable_ste(&target, master); + arm_smmu_make_cdtable_ste(&target, master, state.ats_enabled); arm_smmu_install_ste_for_dev(master, &target); break; } case ARM_SMMU_DOMAIN_S2: - arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); + arm_smmu_make_s2_domain_ste(&target, master, smmu_domain, + state.ats_enabled); arm_smmu_install_ste_for_dev(master, &target); arm_smmu_clear_cd(master, IOMMU_NO_PASID); break; }
- arm_smmu_enable_ats(master, smmu_domain); + arm_smmu_attach_commit(&state); mutex_unlock(&arm_smmu_asid_lock); return 0; } @@ -2950,10 +3058,14 @@ void arm_smmu_remove_pasid(struct arm_smmu_master *master, arm_smmu_clear_cd(master, pasid); }
-static int arm_smmu_attach_dev_ste(struct device *dev, - struct arm_smmu_ste *ste) +static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, + struct device *dev, struct arm_smmu_ste *ste) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_attach_state state = { + .master = master, + .old_domain = iommu_get_domain_for_dev(dev), + };
if (arm_smmu_master_sva_enabled(master)) return -EBUSY; @@ -2964,16 +3076,9 @@ static int arm_smmu_attach_dev_ste(struct device *dev, */ mutex_lock(&arm_smmu_asid_lock);
- /* - * The SMMU does not support enabling ATS with bypass/abort. When the - * STE is in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests - * and Translated transactions are denied as though ATS is disabled for - * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and - * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry). - */ - arm_smmu_detach_dev(master); - + arm_smmu_attach_prepare(&state, domain); arm_smmu_install_ste_for_dev(master, ste); + arm_smmu_attach_commit(&state); mutex_unlock(&arm_smmu_asid_lock);
/* @@ -2992,7 +3097,7 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct arm_smmu_master *master = dev_iommu_priv_get(dev);
arm_smmu_make_bypass_ste(master->smmu, &ste); - return arm_smmu_attach_dev_ste(dev, &ste); + return arm_smmu_attach_dev_ste(domain, dev, &ste); }
static const struct iommu_domain_ops arm_smmu_identity_ops = { @@ -3010,7 +3115,7 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_abort_ste(&ste); - return arm_smmu_attach_dev_ste(dev, &ste); + return arm_smmu_attach_dev_ste(domain, dev, &ste); }
static const struct iommu_domain_ops arm_smmu_blocked_ops = { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index c1b7233d1cfe..f30649fc99f1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -839,10 +839,12 @@ void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, struct arm_smmu_ste *target); void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master); + struct arm_smmu_master *master, + bool ats_enabled); void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain); + struct arm_smmu_domain *smmu_domain, + bool ats_enabled); void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, struct mm_struct *mm, u16 asid);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 64efb3def3a53effe01fa750eec6e7369f65e386 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Prepare to allow an S1 domain to be attached to a PASID as well. Keep track of the SSID the domain is using on each master in the arm_smmu_master_domain.
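The resulting per-attachment tracking structure, as modified by this patch, is simply:

    struct arm_smmu_master_domain {
        struct list_head devices_elm;
        struct arm_smmu_master *master;
        ioasid_t ssid;          /* new: the SSID this attachment uses */
    };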
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 15 ++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++++++++---- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++- 3 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index edcde13bfcac..8b88bea9eb6a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -47,13 +47,12 @@ arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) struct arm_smmu_master *master = master_domain->master; struct arm_smmu_cd *cdptr;
- /* S1 domains only support RID attachment right now */ - cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); + cdptr = arm_smmu_get_cd_ptr(master, master_domain->ssid); if (WARN_ON(!cdptr)) continue;
arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); - arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, + arm_smmu_write_cd_entry(master, master_domain->ssid, cdptr, &target_cd); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); @@ -298,8 +297,8 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, smmu_domain); }
- arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), start, - size); + arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), start, + size); }
static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) @@ -336,7 +335,7 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); - arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); + arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0);
smmu_mn->cleared = true; mutex_unlock(&sva_lock); @@ -415,8 +414,8 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) */ if (!smmu_mn->cleared) { arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid); - arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, - 0); + arm_smmu_atc_inv_domain_sva(smmu_domain, + mm_get_enqcmd_pasid(mm), 0, 0); }
/* Frees smmu_mn */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 38fe1f005c97..db07e5bbd695 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2246,8 +2246,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) return ret; }
-int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, - unsigned long iova, size_t size) +static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size) { struct arm_smmu_master_domain *master_domain; int i, ret; @@ -2275,8 +2275,6 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, if (!atomic_read(&smmu_domain->nr_ats_masters)) return 0;
- arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); - cmds.num = 0;
arm_smmu_preempt_disable(smmu_domain->smmu); @@ -2288,6 +2286,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, if (!master->ats_enabled) continue;
+ /* + * Non-zero ssid means SVA is co-opting the S1 domain to issue + * invalidations for SVA PASIDs. + */ + if (ssid != IOMMU_NO_PASID) + arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); + else + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, + &cmd); + for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd); @@ -2301,6 +2309,19 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, return ret; }
+static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size) +{ + return __arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, + size); +} + +int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size) +{ + return __arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size); +} + /* IO_PGTABLE API */ static void arm_smmu_tlb_inv_context(void *cookie) { @@ -2322,7 +2343,7 @@ static void arm_smmu_tlb_inv_context(void *cookie) cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); } - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, 0, 0); + arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, @@ -2422,7 +2443,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, * Unfortunately, this can't be leaf-only since we may have * zapped an entire table. */ - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, size); + arm_smmu_atc_inv_domain(smmu_domain, iova, size); }
void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, @@ -2778,7 +2799,8 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static struct arm_smmu_master_domain * arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, - struct arm_smmu_master *master) + struct arm_smmu_master *master, + ioasid_t ssid) { struct arm_smmu_master_domain *master_domain;
@@ -2786,7 +2808,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { - if (master_domain->master == master) + if (master_domain->master == master && + master_domain->ssid == ssid) return master_domain; } return NULL; @@ -2819,7 +2842,8 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, return;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, + IOMMU_NO_PASID); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f30649fc99f1..618a8d228618 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -853,6 +853,7 @@ void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, struct arm_smmu_master_domain { struct list_head devices_elm; struct arm_smmu_master *master; + ioasid_t ssid; };
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) @@ -884,8 +885,8 @@ void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain); bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd); -int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, - unsigned long iova, size_t size); +int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, + ioasid_t ssid, unsigned long iova, size_t size);
#ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit be7c90de39fdebdba4f9cce7575b71c6b2506ea0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We no longer need master->sva_enabled to control which attaches are allowed. Instead we can tell if the attach is legal based on the current configuration of the master.
Keep track of the number of valid CD entries for SSIDs in the cd_table, and of whether the cd_table has been installed in the STE, so we know what the current configuration is.
The attach logic is then made into: - SVA bind, check if the CD is installed - RID attach of S2, block if SSIDs are used - RID attach of IDENTITY/BLOCKING, block if SSIDs are used
arm_smmu_set_pasid() already checks whether it is possible to set up a CD entry; as of this patch that means the RID path has already installed an STE pointing at the CD table.
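A minimal sketch of the bookkeeping this patch adds (see the hunks below): the CD write path counts valid non-zero SSIDs, and the RID attach paths refuse configurations that would strand them:

    /* in arm_smmu_write_cd_entry(): cur_valid/target_valid are the
     * CTXDESC_CD_0_V bits of the old/new CD; count transitions for SSID != 0 */
    if (ssid != IOMMU_NO_PASID && cur_valid != target_valid) {
        if (cur_valid)
            master->cd_table.used_ssids--;
        else
            master->cd_table.used_ssids++;
    }

    /* in the S2/IDENTITY/BLOCKED attach paths: */
    if (arm_smmu_ssids_in_use(&master->cd_table))
        return -EBUSY;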
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/6-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 ++++++++++----------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 ++++++ 2 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index db07e5bbd695..7bda77895d66 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1513,6 +1513,8 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, struct arm_smmu_cd *cdptr, const struct arm_smmu_cd *target) { + bool target_valid = target->data[0] & cpu_to_le64(CTXDESC_CD_0_V); + bool cur_valid = cdptr->data[0] & cpu_to_le64(CTXDESC_CD_0_V); struct arm_smmu_cd_writer cd_writer = { .writer = { .ops = &arm_smmu_cd_writer_ops, @@ -1521,6 +1523,13 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, .ssid = ssid, };
+ if (ssid != IOMMU_NO_PASID && cur_valid != target_valid) { + if (cur_valid) + master->cd_table.used_ssids--; + else + master->cd_table.used_ssids++; + } + arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); }
@@ -2993,16 +3002,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) state.master = master = dev_iommu_priv_get(dev); smmu = master->smmu;
- /* - * Checking that SVA is disabled ensures that this device isn't bound to - * any mm, and can be safely detached from its old domain. Bonds cannot - * be removed concurrently since we're holding the group mutex. - */ - if (arm_smmu_master_sva_enabled(master)) { - dev_err(dev, "cannot attach - SVA enabled\n"); - return -EBUSY; - } - mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { @@ -3018,7 +3017,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID); if (!cdptr) return -ENOMEM; - } + } else if (arm_smmu_ssids_in_use(&master->cd_table)) + return -EBUSY;
/* * Prevent arm_smmu_share_asid() from trying to change the ASID @@ -3091,7 +3091,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, .old_domain = iommu_get_domain_for_dev(dev), };
- if (arm_smmu_master_sva_enabled(master)) + if (arm_smmu_ssids_in_use(&master->cd_table)) return -EBUSY;
/* diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 618a8d228618..0d648704405e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -654,12 +654,19 @@ struct arm_smmu_ctx_desc_cfg { dma_addr_t cdtab_dma; struct arm_smmu_l1_ctx_desc *l1_desc; unsigned int num_l1_ents; + unsigned int used_ssids; u8 in_ste; u8 s1fmt; /* log2 of the maximum number of CDs supported by this table */ u8 s1cdmax; };
+/* True if the cd table has SSIDS > 0 in use. */ +static inline bool arm_smmu_ssids_in_use(struct arm_smmu_ctx_desc_cfg *cd_table) +{ + return cd_table->used_ssids; +} + struct arm_smmu_s2_cfg { u16 vmid; };
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 1d5f34f0002f9f56d0ca153022cfdead07d45dc6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Allow creating and managing arm_smmu_master_domain's with a non-zero SSID through the arm_smmu_attach_*() family of functions. This triggers ATC invalidation for the correct SSID in PASID cases and tracks the per-attachment SSID in the struct arm_smmu_master_domain.
Generalize arm_smmu_attach_remove() to be able to remove SSIDs as well by ensuring the ATC for the PASID is flushed properly.
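A sketch of how the ATC flush becomes SSID-aware (taken from the hunks below; the arm_smmu_enable_ats() branch is omitted):

    static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,
                                       ioasid_t ssid);

    /* in arm_smmu_attach_commit(): */
    if (state->ats_enabled && master->ats_enabled)
        arm_smmu_atc_inv_master(master, state->ssid);      /* flush only this SSID */
    else if (!state->ats_enabled && master->ats_enabled)
        arm_smmu_atc_inv_master(master, IOMMU_NO_PASID);    /* ATS turning off: flush the whole ATC */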
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++++------- 1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 7bda77895d66..d59e349e0a48 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2235,13 +2235,14 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size, cmd->atc.size = log2_span; }
-static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) +static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, + ioasid_t ssid) { int i, ret; struct arm_smmu_cmdq_ent cmd; struct arm_smmu_cmdq_batch cmds;
- arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); + arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
cmds.num = 0; arm_smmu_preempt_disable(master->smmu); @@ -2754,7 +2755,7 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master) /* * ATC invalidation of PASID 0 causes the entire ATC to be flushed. */ - arm_smmu_atc_inv_master(master); + arm_smmu_atc_inv_master(master, IOMMU_NO_PASID); if (pci_enable_ats(pdev, stu)) dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); } @@ -2841,7 +2842,8 @@ to_smmu_domain_devices(struct iommu_domain *domain) }
static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, - struct iommu_domain *domain) + struct iommu_domain *domain, + ioasid_t ssid) { struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; @@ -2851,8 +2853,7 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, return;
spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master, - IOMMU_NO_PASID); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); @@ -2866,6 +2867,7 @@ struct arm_smmu_attach_state { /* Inputs */ struct iommu_domain *old_domain; struct arm_smmu_master *master; + ioasid_t ssid; /* Resulting state */ bool ats_enabled; }; @@ -2923,6 +2925,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, if (!master_domain) return -ENOMEM; master_domain->master = master; + master_domain->ssid = state->ssid;
/* * During prepare we want the current smmu_domain and new @@ -2970,17 +2973,20 @@ static void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
if (state->ats_enabled && !master->ats_enabled) { arm_smmu_enable_ats(master); - } else if (master->ats_enabled) { + } else if (state->ats_enabled && master->ats_enabled) { /* * The translation has changed, flush the ATC. At this point the * SMMU is translating for the new domain and both the old&new * domain will issue invalidations. */ - arm_smmu_atc_inv_master(master); + arm_smmu_atc_inv_master(master, state->ssid); + } else if (!state->ats_enabled && master->ats_enabled) { + /* ATS is being switched off, invalidate the entire ATC */ + arm_smmu_atc_inv_master(master, IOMMU_NO_PASID); } master->ats_enabled = state->ats_enabled;
- arm_smmu_remove_master_domain(master, state->old_domain); + arm_smmu_remove_master_domain(master, state->old_domain, state->ssid); }
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) @@ -2992,6 +2998,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_attach_state state = { .old_domain = iommu_get_domain_for_dev(dev), + .ssid = IOMMU_NO_PASID, }; struct arm_smmu_master *master; struct arm_smmu_cd *cdptr; @@ -3089,6 +3096,7 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, struct arm_smmu_attach_state state = { .master = master, .old_domain = iommu_get_domain_for_dev(dev), + .ssid = IOMMU_NO_PASID, };
if (arm_smmu_ssids_in_use(&master->cd_table))
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit d7b2d2ba1b84f4ae7cd94de22f74d6c6c5419de6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently the SVA domain is a naked struct iommu_domain; allocate a struct arm_smmu_domain instead.
This is necessary to be able to use the struct arm_smmu_master_domain mechanism.
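A condensed sketch of the allocation change (per the hunk below): the SVA path now gets a real arm_smmu_domain from the shared allocator:

    smmu_domain = arm_smmu_domain_alloc();
    if (IS_ERR(smmu_domain))
        return ERR_CAST(smmu_domain);
    smmu_domain->domain.type = IOMMU_DOMAIN_SVA;
    smmu_domain->domain.ops = &arm_smmu_sva_domain_ops;
    smmu_domain->smmu = smmu;
    return &smmu_domain->domain;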
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/8-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
Conflicts: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 21 ++++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++++++++++++------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ 3 files changed, 33 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 8b88bea9eb6a..83da2b2e8352 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -643,7 +643,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, }
arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); - ret = arm_smmu_set_pasid(master, NULL, id, &target); + ret = arm_smmu_set_pasid(master, to_smmu_domain(domain), id, &target); if (ret) { list_del(&bond->list); arm_smmu_mmu_notifier_put(bond->smmu_mn); @@ -657,7 +657,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) { - kfree(domain); + kfree(to_smmu_domain(domain)); }
static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { @@ -668,13 +668,16 @@ static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct mm_struct *mm) { - struct iommu_domain *domain; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_domain *smmu_domain;
- domain = kzalloc(sizeof(*domain), GFP_KERNEL); - if (!domain) - return ERR_PTR(-ENOMEM); - domain->type = IOMMU_DOMAIN_SVA; - domain->ops = &arm_smmu_sva_domain_ops; + smmu_domain = arm_smmu_domain_alloc(); + if (IS_ERR(smmu_domain)) + return ERR_CAST(smmu_domain); + smmu_domain->domain.type = IOMMU_DOMAIN_SVA; + smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; + smmu_domain->smmu = smmu;
- return domain; + return &smmu_domain->domain; } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index d59e349e0a48..50b8181ea0bf 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2519,15 +2519,10 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) } }
-static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) +struct arm_smmu_domain *arm_smmu_domain_alloc(void) { struct arm_smmu_domain *smmu_domain;
- /* - * Allocate the domain and initialise some of its data structures. - * We can't really do anything meaningful until we've added a - * master. - */ smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); if (!smmu_domain) return ERR_PTR(-ENOMEM); @@ -2537,6 +2532,22 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) spin_lock_init(&smmu_domain->devices_lock); INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+ return smmu_domain; +} + +static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) +{ + struct arm_smmu_domain *smmu_domain; + + /* + * Allocate the domain and initialise some of its data structures. + * We can't really do anything meaningful until we've added a + * master. + */ + smmu_domain = arm_smmu_domain_alloc(); + if (IS_ERR(smmu_domain)) + return ERR_CAST(smmu_domain); + if (dev) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); int ret; @@ -2550,7 +2561,7 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) return &smmu_domain->domain; }
-static void arm_smmu_domain_free(struct iommu_domain *domain) +static void arm_smmu_domain_free_paging(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; @@ -3803,7 +3814,7 @@ static struct iommu_ops arm_smmu_ops = { .sync_dirty_log = arm_smmu_sync_dirty_log, .clear_dirty_log = arm_smmu_clear_dirty_log, #endif - .free = arm_smmu_domain_free, + .free = arm_smmu_domain_free_paging, } };
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 0d648704405e..896ed43cdff6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -871,6 +871,8 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock;
+struct arm_smmu_domain *arm_smmu_domain_alloc(void); + void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 49db2ed23c52f8371c12ab8646df23fa1daad4b2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Fill in the smmu_domain->devices list in the new struct arm_smmu_domain that SVA allocates. Keep track of every SSID and master that is using the domain, reusing the logic for the RID attach.
This is the first step to making the SVA invalidation follow the same design as S1/S2 invalidation. At present nothing will read this list.
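A condensed sketch of the resulting arm_smmu_set_pasid() flow (per the hunk below), reusing the attach state machinery for a PASID:

    struct arm_smmu_attach_state state = { .master = master, .ssid = pasid };

    mutex_lock(&arm_smmu_asid_lock);
    ret = arm_smmu_attach_prepare(&state, &smmu_domain->domain);  /* adds the {master, pasid} entry */
    if (!ret) {
        arm_smmu_write_cd_entry(master, pasid, cdptr, cd);
        arm_smmu_attach_commit(&state);
    }
    mutex_unlock(&arm_smmu_asid_lock);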
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/9-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvid... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 50b8181ea0bf..9fe5178db5dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2847,7 +2847,8 @@ to_smmu_domain_devices(struct iommu_domain *domain) /* The domain can be NULL only when processing the first attach */ if (!domain) return NULL; - if (domain->type & __IOMMU_DOMAIN_PAGING) + if ((domain->type & __IOMMU_DOMAIN_PAGING) || + domain->type == IOMMU_DOMAIN_SVA) return to_smmu_domain(domain); return NULL; } @@ -3080,7 +3081,16 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, const struct arm_smmu_cd *cd) { + struct arm_smmu_attach_state state = { + .master = master, + /* + * For now the core code prevents calling this when a domain is + * already attached, no need to set old_domain. + */ + .ssid = pasid, + }; struct arm_smmu_cd *cdptr; + int ret;
/* The core code validates pasid */
@@ -3090,14 +3100,30 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, cdptr = arm_smmu_alloc_cd_ptr(master, pasid); if (!cdptr) return -ENOMEM; + + mutex_lock(&arm_smmu_asid_lock); + ret = arm_smmu_attach_prepare(&state, &smmu_domain->domain); + if (ret) + goto out_unlock; + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); - return 0; + + arm_smmu_attach_commit(&state); + +out_unlock: + mutex_unlock(&arm_smmu_asid_lock); + return ret; }
void arm_smmu_remove_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid) { + mutex_lock(&arm_smmu_asid_lock); arm_smmu_clear_cd(master, pasid); + if (master->ats_enabled) + arm_smmu_atc_inv_master(master, pasid); + arm_smmu_remove_master_domain(master, &smmu_domain->domain, pasid); + mutex_unlock(&arm_smmu_asid_lock); }
static int arm_smmu_attach_dev_ste(struct iommu_domain *domain,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit d38c28dbefeee03d7dd02004ad80d9676ac54d86 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This removes all the notifier de-duplication logic in the driver and relies on the core code to de-duplicate and allocate only one SVA domain per mm per smmu instance. This naturally gives a 1:1 relationship between SVA domain and mmu notifier.
It is a significant simplification of the flow, as we end up with a single struct arm_smmu_domain for each MM and the invalidation can then be shifted to properly use the masters list like S1/S2 do.
Remove all of the previous mmu_notifier, bond, shared cd, and cd refcount logic entirely.
The logic here is tightly wound together with the unused BTM support. Since the BTM logic requires holding all the iommu_domains in a global ASID xarray, it conflicts with the design of having a single SVA domain per PASID, as multiple SMMU instances will need to have different domains.
Following patches resolve this by making the ASID xarray per-instance instead of global. However, converting the BTM code over to this methodology requires many changes.
Thus, since ARM_SMMU_FEAT_BTM is never enabled, remove the parts of the BTM support for ASID sharing that interact with SVA as well.
A followup series is already working on fully enabling the BTM support; it requires iommufd's VIOMMU feature to bring in KVM's VMID as well. It will come with an already written patch to bring back the ASID sharing using a per-instance ASID xarray.
https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.koloth... https://lore.kernel.org/linux-iommu/26-v6-228e7adf25eb+4155-smmuv3_newapi_p2...
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Michael Shavit mshavit@google.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/10-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com
Conflicts: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 395 +++--------------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 69 +-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 15 +- 3 files changed, 86 insertions(+), 393 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 83da2b2e8352..5ab49b59a22e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -13,29 +13,9 @@ #include "arm-smmu-v3.h" #include "../../io-pgtable-arm.h"
-struct arm_smmu_mmu_notifier { - struct mmu_notifier mn; - struct arm_smmu_ctx_desc *cd; - bool cleared; - refcount_t refs; - struct list_head list; - struct arm_smmu_domain *domain; -}; - -#define mn_to_smmu(mn) container_of(mn, struct arm_smmu_mmu_notifier, mn) - -struct arm_smmu_bond { - struct mm_struct *mm; - struct arm_smmu_mmu_notifier *smmu_mn; - struct list_head list; -}; - -#define sva_to_bond(handle) \ - container_of(handle, struct arm_smmu_bond, sva) - static DEFINE_MUTEX(sva_lock);
-static void +static void __maybe_unused arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) { struct arm_smmu_master_domain *master_domain; @@ -58,58 +38,6 @@ arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); }
-/* - * Check if the CPU ASID is available on the SMMU side. If a private context - * descriptor is using it, try to replace it. - */ -static struct arm_smmu_ctx_desc * -arm_smmu_share_asid(struct mm_struct *mm, u16 asid) -{ - int ret; - u32 new_asid; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_device *smmu; - struct arm_smmu_domain *smmu_domain; - - cd = xa_load(&arm_smmu_asid_xa, asid); - if (!cd) - return NULL; - - if (cd->mm) { - if (WARN_ON(cd->mm != mm)) - return ERR_PTR(-EINVAL); - /* All devices bound to this mm use the same cd struct. */ - refcount_inc(&cd->refs); - return cd; - } - - smmu_domain = container_of(cd, struct arm_smmu_domain, cd); - smmu = smmu_domain->smmu; - - ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd, - XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); - if (ret) - return ERR_PTR(-ENOSPC); - /* - * Race with unmap: TLB invalidations will start targeting the new ASID, - * which isn't assigned yet. We'll do an invalidate-all on the old ASID - * later, so it doesn't matter. - */ - cd->asid = new_asid; - /* - * Update ASID and invalidate CD in all associated masters. There will - * be some overlap between use of both ASIDs, until we invalidate the - * TLB. - */ - arm_smmu_update_s1_domain_cd_entry(smmu_domain); - - /* Invalidate TLB entries previously associated with that context */ - arm_smmu_tlb_inv_asid(smmu, asid); - - xa_erase(&arm_smmu_asid_xa, asid); - return NULL; -} - static u64 page_size_to_cd(void) { static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K || @@ -191,69 +119,6 @@ void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); }
-static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) -{ - u16 asid; - int err = 0; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_ctx_desc *ret = NULL; - - /* Don't free the mm until we release the ASID */ - mmgrab(mm); - - asid = arm64_mm_context_get(mm); - if (!asid) { - err = -ESRCH; - goto out_drop_mm; - } - - cd = kzalloc(sizeof(*cd), GFP_KERNEL); - if (!cd) { - err = -ENOMEM; - goto out_put_context; - } - - refcount_set(&cd->refs, 1); - - mutex_lock(&arm_smmu_asid_lock); - ret = arm_smmu_share_asid(mm, asid); - if (ret) { - mutex_unlock(&arm_smmu_asid_lock); - goto out_free_cd; - } - - err = xa_insert(&arm_smmu_asid_xa, asid, cd, GFP_KERNEL); - mutex_unlock(&arm_smmu_asid_lock); - - if (err) - goto out_free_asid; - - cd->asid = asid; - cd->mm = mm; - - return cd; - -out_free_asid: - arm_smmu_free_asid(cd); -out_free_cd: - kfree(cd); -out_put_context: - arm64_mm_context_put(mm); -out_drop_mm: - mmdrop(mm); - return err < 0 ? ERR_PTR(err) : ret; -} - -static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd) -{ - if (arm_smmu_free_asid(cd)) { - /* Unpin ASID */ - arm64_mm_context_put(cd->mm); - mmdrop(cd->mm); - kfree(cd); - } -} - /* * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this * is used as a threshold to replace per-page TLBI commands to issue in the @@ -268,8 +133,8 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, unsigned long start, unsigned long end) { - struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_domain *smmu_domain = + container_of(mn, struct arm_smmu_domain, mmu_notifier); size_t size;
/* @@ -286,34 +151,22 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, size = 0; }
- if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) { - if (!size) - arm_smmu_tlb_inv_asid(smmu_domain->smmu, - smmu_mn->cd->asid); - else - arm_smmu_tlb_inv_range_asid(start, size, - smmu_mn->cd->asid, - PAGE_SIZE, false, - smmu_domain); - } + if (!size) + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + else + arm_smmu_tlb_inv_range_asid(start, size, smmu_domain->cd.asid, + PAGE_SIZE, false, smmu_domain);
- arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), start, - size); + arm_smmu_atc_inv_domain(smmu_domain, start, size); }
static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) { - struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; + struct arm_smmu_domain *smmu_domain = + container_of(mn, struct arm_smmu_domain, mmu_notifier); struct arm_smmu_master_domain *master_domain; unsigned long flags;
- mutex_lock(&sva_lock); - if (smmu_mn->cleared) { - mutex_unlock(&sva_lock); - return; - } - /* * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events, * but disable translation. @@ -325,25 +178,23 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) struct arm_smmu_cd target; struct arm_smmu_cd *cdptr;
- cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); + cdptr = arm_smmu_get_cd_ptr(master, master_domain->ssid); if (WARN_ON(!cdptr)) continue; - arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid); - arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr, + arm_smmu_make_sva_cd(&target, master, NULL, + smmu_domain->cd.asid); + arm_smmu_write_cd_entry(master, master_domain->ssid, cdptr, &target); } spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); - arm_smmu_atc_inv_domain_sva(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); - - smmu_mn->cleared = true; - mutex_unlock(&sva_lock); + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + arm_smmu_atc_inv_domain(smmu_domain, 0, 0); }
static void arm_smmu_mmu_notifier_free(struct mmu_notifier *mn) { - kfree(mn_to_smmu(mn)); + kfree(container_of(mn, struct arm_smmu_domain, mmu_notifier)); }
static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = { @@ -352,115 +203,6 @@ static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = { .free_notifier = arm_smmu_mmu_notifier_free, };
-/* Allocate or get existing MMU notifier for this {domain, mm} pair */ -static struct arm_smmu_mmu_notifier * -arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain, - struct mm_struct *mm) -{ - int ret; - struct arm_smmu_ctx_desc *cd; - struct arm_smmu_mmu_notifier *smmu_mn; - - list_for_each_entry(smmu_mn, &smmu_domain->mmu_notifiers, list) { - if (smmu_mn->mn.mm == mm) { - refcount_inc(&smmu_mn->refs); - return smmu_mn; - } - } - - cd = arm_smmu_alloc_shared_cd(mm); - if (IS_ERR(cd)) - return ERR_CAST(cd); - - smmu_mn = kzalloc(sizeof(*smmu_mn), GFP_KERNEL); - if (!smmu_mn) { - ret = -ENOMEM; - goto err_free_cd; - } - - refcount_set(&smmu_mn->refs, 1); - smmu_mn->cd = cd; - smmu_mn->domain = smmu_domain; - smmu_mn->mn.ops = &arm_smmu_mmu_notifier_ops; - - ret = mmu_notifier_register(&smmu_mn->mn, mm); - if (ret) { - kfree(smmu_mn); - goto err_free_cd; - } - - list_add(&smmu_mn->list, &smmu_domain->mmu_notifiers); - return smmu_mn; - -err_free_cd: - arm_smmu_free_shared_cd(cd); - return ERR_PTR(ret); -} - -static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn) -{ - struct mm_struct *mm = smmu_mn->mn.mm; - struct arm_smmu_ctx_desc *cd = smmu_mn->cd; - struct arm_smmu_domain *smmu_domain = smmu_mn->domain; - - if (!refcount_dec_and_test(&smmu_mn->refs)) - return; - - list_del(&smmu_mn->list); - - /* - * If we went through clear(), we've already invalidated, and no - * new TLB entry can have been formed. - */ - if (!smmu_mn->cleared) { - arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid); - arm_smmu_atc_inv_domain_sva(smmu_domain, - mm_get_enqcmd_pasid(mm), 0, 0); - } - - /* Frees smmu_mn */ - mmu_notifier_put(&smmu_mn->mn); - arm_smmu_free_shared_cd(cd); -} - -static struct arm_smmu_bond *__arm_smmu_sva_bind(struct device *dev, - struct mm_struct *mm) -{ - int ret; - struct arm_smmu_bond *bond; - struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct arm_smmu_domain *smmu_domain; - - if (!(domain->type & __IOMMU_DOMAIN_PAGING)) - return ERR_PTR(-ENODEV); - smmu_domain = to_smmu_domain(domain); - if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) - return ERR_PTR(-ENODEV); - - if (!master || !master->sva_enabled) - return ERR_PTR(-ENODEV); - - bond = kzalloc(sizeof(*bond), GFP_KERNEL); - if (!bond) - return ERR_PTR(-ENOMEM); - - bond->mm = mm; - - bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm); - if (IS_ERR(bond->smmu_mn)) { - ret = PTR_ERR(bond->smmu_mn); - goto err_free_bond; - } - - list_add(&bond->list, &master->bonds); - return bond; - -err_free_bond: - kfree(bond); - return ERR_PTR(ret); -} - bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) { unsigned long reg, fld; @@ -577,11 +319,6 @@ int arm_smmu_master_enable_sva(struct arm_smmu_master *master) int arm_smmu_master_disable_sva(struct arm_smmu_master *master) { mutex_lock(&sva_lock); - if (!list_empty(&master->bonds)) { - dev_err(master->dev, "cannot disable SVA, device is bound\n"); - mutex_unlock(&sva_lock); - return -EBUSY; - } arm_smmu_master_sva_disable_iopf(master); master->sva_enabled = false; mutex_unlock(&sva_lock); @@ -598,66 +335,51 @@ void arm_smmu_sva_notifier_synchronize(void) mmu_notifier_synchronize(); }
-void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t id) -{ - struct mm_struct *mm = domain->mm; - struct arm_smmu_bond *bond = NULL, *t; - struct arm_smmu_master *master = dev_iommu_priv_get(dev); - - arm_smmu_remove_pasid(master, to_smmu_domain(domain), id); - - mutex_lock(&sva_lock); - list_for_each_entry(t, &master->bonds, list) { - if (t->mm == mm) { - bond = t; - break; - } - } - - if (!WARN_ON(!bond)) { - list_del(&bond->list); - arm_smmu_mmu_notifier_put(bond->smmu_mn); - kfree(bond); - } - mutex_unlock(&sva_lock); -} - static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id) { + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_master *master = dev_iommu_priv_get(dev); - struct mm_struct *mm = domain->mm; - struct arm_smmu_bond *bond; struct arm_smmu_cd target; int ret;
- if (mm_get_enqcmd_pasid(mm) != id) + /* Prevent arm_smmu_mm_release from being called while we are attaching */ + if (!mmget_not_zero(domain->mm)) return -EINVAL;
- mutex_lock(&sva_lock); - bond = __arm_smmu_sva_bind(dev, mm); - if (IS_ERR(bond)) { - mutex_unlock(&sva_lock); - return PTR_ERR(bond); - } + /* + * This does not need the arm_smmu_asid_lock because SVA domains never + * get reassigned + */ + arm_smmu_make_sva_cd(&target, master, domain->mm, smmu_domain->cd.asid); + ret = arm_smmu_set_pasid(master, smmu_domain, id, &target);
- arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); - ret = arm_smmu_set_pasid(master, to_smmu_domain(domain), id, &target); - if (ret) { - list_del(&bond->list); - arm_smmu_mmu_notifier_put(bond->smmu_mn); - kfree(bond); - mutex_unlock(&sva_lock); - return ret; - } - mutex_unlock(&sva_lock); - return 0; + mmput(domain->mm); + return ret; }
static void arm_smmu_sva_domain_free(struct iommu_domain *domain) { - kfree(to_smmu_domain(domain)); + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + /* + * Ensure the ASID is empty in the iommu cache before allowing reuse. + */ + arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_domain->cd.asid); + + /* + * Notice that the arm_smmu_mm_arch_invalidate_secondary_tlbs op can + * still be called/running at this point. We allow the ASID to be + * reused, and if there is a race then it just suffers harmless + * unnecessary invalidation. + */ + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); + + /* + * Actual free is defered to the SRCU callback + * arm_smmu_mmu_notifier_free() + */ + mmu_notifier_put(&smmu_domain->mmu_notifier); }
static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { @@ -671,6 +393,8 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_domain *smmu_domain; + u32 asid; + int ret;
smmu_domain = arm_smmu_domain_alloc(); if (IS_ERR(smmu_domain)) @@ -679,5 +403,22 @@ struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, smmu_domain->domain.ops = &arm_smmu_sva_domain_ops; smmu_domain->smmu = smmu;
+ ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); + if (ret) + goto err_free; + + smmu_domain->cd.asid = asid; + smmu_domain->mmu_notifier.ops = &arm_smmu_mmu_notifier_ops; + ret = mmu_notifier_register(&smmu_domain->mmu_notifier, mm); + if (ret) + goto err_asid; + return &smmu_domain->domain; + +err_asid: + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); +err_free: + kfree(smmu_domain); + return ERR_PTR(ret); } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9fe5178db5dd..e80b792a2a14 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1668,22 +1668,6 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) cd_table->cdtab = NULL; }
-bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd) -{ - bool free; - struct arm_smmu_ctx_desc *old_cd; - - if (!cd->asid) - return false; - - free = refcount_dec_and_test(&cd->refs); - if (free) { - old_cd = xa_erase(&arm_smmu_asid_xa, cd->asid); - WARN_ON(old_cd != cd); - } - return free; -} - /* Stream table manipulation functions */ static void arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) @@ -2256,8 +2240,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, return ret; }
-static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size) +int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size) { struct arm_smmu_master_domain *master_domain; int i, ret; @@ -2296,15 +2280,7 @@ static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, if (!master->ats_enabled) continue;
- /* - * Non-zero ssid means SVA is co-opting the S1 domain to issue - * invalidations for SVA PASIDs. - */ - if (ssid != IOMMU_NO_PASID) - arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); - else - arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, - &cmd); + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);
for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; @@ -2319,19 +2295,6 @@ static int __arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, return ret; }
-static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, - unsigned long iova, size_t size) -{ - return __arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, iova, - size); -} - -int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size) -{ - return __arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size); -} - /* IO_PGTABLE API */ static void arm_smmu_tlb_inv_context(void *cookie) { @@ -2530,7 +2493,6 @@ struct arm_smmu_domain *arm_smmu_domain_alloc(void) mutex_init(&smmu_domain->init_mutex); INIT_LIST_HEAD(&smmu_domain->devices); spin_lock_init(&smmu_domain->devices_lock); - INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
return smmu_domain; } @@ -2572,7 +2534,7 @@ static void arm_smmu_domain_free_paging(struct iommu_domain *domain) if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { /* Prevent SVA from touching the CD while we're freeing it */ mutex_lock(&arm_smmu_asid_lock); - arm_smmu_free_asid(&smmu_domain->cd); + xa_erase(&arm_smmu_asid_xa, smmu_domain->cd.asid); mutex_unlock(&arm_smmu_asid_lock); } else { struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; @@ -2590,11 +2552,9 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, u32 asid; struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
- refcount_set(&cd->refs, 1); - /* Prevent SVA from modifying the ASID until it is written to the CD */ mutex_lock(&arm_smmu_asid_lock); - ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd, + ret = xa_alloc(&arm_smmu_asid_xa, &asid, smmu_domain, XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); cd->asid = (u16)asid; mutex_unlock(&arm_smmu_asid_lock); @@ -3094,6 +3054,9 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master,
/* The core code validates pasid */
+ if (smmu_domain->smmu != master->smmu) + return -EINVAL; + if (!master->cd_table.in_ste) return -ENODEV;
@@ -3115,9 +3078,14 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, return ret; }
-void arm_smmu_remove_pasid(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain, ioasid_t pasid) +static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, + struct iommu_domain *domain) { + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *smmu_domain; + + smmu_domain = to_smmu_domain(domain); + mutex_lock(&arm_smmu_asid_lock); arm_smmu_clear_cd(master, pasid); if (master->ats_enabled) @@ -3410,7 +3378,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
master->dev = dev; master->smmu = smmu; - INIT_LIST_HEAD(&master->bonds); dev_iommu_priv_set(dev, master);
ret = arm_smmu_insert_master(smmu, master); @@ -3799,12 +3766,6 @@ static int arm_smmu_def_domain_type(struct device *dev) return ret; }
-static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, - struct iommu_domain *domain) -{ - arm_smmu_sva_remove_dev_pasid(domain, dev, pasid); -} - static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 896ed43cdff6..786652700f5e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -639,9 +639,6 @@ struct arm_smmu_strtab_l1_desc {
struct arm_smmu_ctx_desc { u16 asid; - - refcount_t refs; - struct mm_struct *mm; };
struct arm_smmu_l1_ctx_desc { @@ -788,7 +785,6 @@ struct arm_smmu_master { bool stall_enabled; bool sva_enabled; bool iopf_enabled; - struct list_head bonds; unsigned int ssid_bits; };
@@ -817,7 +813,7 @@ struct arm_smmu_domain { struct list_head devices; spinlock_t devices_lock;
- struct list_head mmu_notifiers; + struct mmu_notifier mmu_notifier;
#ifdef CONFIG_HISI_VIRTCCA_CODA struct list_head node; @@ -886,16 +882,13 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, const struct arm_smmu_cd *cd); -void arm_smmu_remove_pasid(struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain, ioasid_t pasid);
void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, size_t granule, bool leaf, struct arm_smmu_domain *smmu_domain); -bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd); -int arm_smmu_atc_inv_domain_sva(struct arm_smmu_domain *smmu_domain, - ioasid_t ssid, unsigned long iova, size_t size); +int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, + unsigned long iova, size_t size);
#ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); @@ -907,8 +900,6 @@ bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); void arm_smmu_sva_notifier_synchronize(void); struct iommu_domain *arm_smmu_sva_domain_alloc(struct device *dev, struct mm_struct *mm); -void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */ static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit ce26ea9e6e12df01432bd2a1cb8cbfa025b8a977 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The HW supports this: use the S1DSS bits to configure the behavior of SSID=0, which is the RID's translation.
If SSIDs are currently being used in the CD table then just update the S1DSS bits in the STE, remove the master_domain and leave ATS alone.
For iommufd the driver design has a small problem: all the unused CD table entries are set with V=0, which will generate an event if VFIO userspace tries to use the CD entry. This patch extends this problem to include the RID as well if PASID is being used.
For BLOCKED with used PASIDs the F_STREAM_DISABLED (STRTAB_STE_1_S1DSS_TERMINATE) event is generated on untagged traffic and a substream CD table entry with V=0 (removed pasid) will generate C_BAD_CD. Arguably there is no advantage to using S1DSS over the CD entry 0 with V=0.
As we don't yet support PASID in iommufd this is a problem to resolve later, possibly by using EPD0 for unused CD table entries instead of V=0, and not using S1DSS for BLOCKED.
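For orientation, here is a standalone sketch of what each S1DSS setting means for SSID=0 once a CD table is installed; the enum and helper are illustrative stand-ins (the real encodings and logic live in arm-smmu-v3.h and the diff below), not driver code.

/* Illustrative sketch, not driver code: behaviour of SSID=0 (the RID) for
 * each S1DSS setting once a CD table is installed in the STE.
 */
enum s1dss_behaviour {
	S1DSS_SSID0,     /* SSID=0 translates through CD table entry 0 */
	S1DSS_BYPASS,    /* SSID=0 bypasses stage 1 (IDENTITY RID + PASIDs) */
	S1DSS_TERMINATE, /* SSID=0 aborts with F_STREAM_DISABLED (BLOCKED RID) */
};

/* Pick the S1DSS that preserves the RID domain while PASIDs stay attached */
static enum s1dss_behaviour s1dss_for_rid(int rid_is_identity)
{
	return rid_is_identity ? S1DSS_BYPASS : S1DSS_TERMINATE;
}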
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/11-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 60 +++++++++++++++---- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 +- 3 files changed, 50 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c index 68aa1c61fdb3..8608504fe786 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -164,7 +164,7 @@ static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, .smmu = &smmu, };
- arm_smmu_make_cdtable_ste(ste, &master, true); + arm_smmu_make_cdtable_ste(ste, &master, true, STRTAB_STE_1_S1DSS_SSID0); }
static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e80b792a2a14..24f071881484 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1215,6 +1215,14 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | STRTAB_STE_1_EATS); used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); + + /* + * See 13.5 Summary of attribute/permission configuration fields + * for the SHCFG behavior. + */ + if (FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) == + STRTAB_STE_1_S1DSS_BYPASS) + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); }
/* S2 translates */ @@ -1758,7 +1766,8 @@ void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
VISIBLE_IF_KUNIT void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, bool ats_enabled) + struct arm_smmu_master *master, bool ats_enabled, + unsigned int s1dss) { struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; struct arm_smmu_device *smmu = master->smmu; @@ -1772,7 +1781,7 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
target->data[1] = cpu_to_le64( - FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | + FIELD_PREP(STRTAB_STE_1_S1DSS, s1dss) | FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | @@ -1783,6 +1792,11 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_1_EATS, ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+ if ((smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) && + s1dss == STRTAB_STE_1_S1DSS_BYPASS) + target->data[1] |= cpu_to_le64(FIELD_PREP( + STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); + if (smmu->features & ARM_SMMU_FEAT_E2H) { /* * To support BTM the streamworld needs to match the @@ -2839,6 +2853,7 @@ struct arm_smmu_attach_state { /* Inputs */ struct iommu_domain *old_domain; struct arm_smmu_master *master; + bool cd_needs_ats; ioasid_t ssid; /* Resulting state */ bool ats_enabled; @@ -2880,7 +2895,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, */ lockdep_assert_held(&arm_smmu_asid_lock);
- if (smmu_domain) { + if (smmu_domain || state->cd_needs_ats) { /* * The SMMU does not support enabling ATS with bypass/abort. * When the STE is in bypass (STE.Config[2:0] == 0b100), ATS @@ -2892,7 +2907,9 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, * tables. */ state->ats_enabled = arm_smmu_ats_supported(master); + }
+ if (smmu_domain) { master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL); if (!master_domain) return -ENOMEM; @@ -3020,7 +3037,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, &target_cd); - arm_smmu_make_cdtable_ste(&target, master, state.ats_enabled); + arm_smmu_make_cdtable_ste(&target, master, state.ats_enabled, + STRTAB_STE_1_S1DSS_SSID0); arm_smmu_install_ste_for_dev(master, &target); break; } @@ -3094,8 +3112,10 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, mutex_unlock(&arm_smmu_asid_lock); }
-static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, - struct device *dev, struct arm_smmu_ste *ste) +static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, + struct device *dev, + struct arm_smmu_ste *ste, + unsigned int s1dss) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); struct arm_smmu_attach_state state = { @@ -3104,16 +3124,28 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, .ssid = IOMMU_NO_PASID, };
- if (arm_smmu_ssids_in_use(&master->cd_table)) - return -EBUSY; - /* * Do not allow any ASID to be changed while are working on the STE, * otherwise we could miss invalidations. */ mutex_lock(&arm_smmu_asid_lock);
- arm_smmu_attach_prepare(&state, domain); + /* + * If the CD table is not in use we can use the provided STE, otherwise + * we use a cdtable STE with the provided S1DSS. + */ + if (arm_smmu_ssids_in_use(&master->cd_table)) { + /* + * If a CD table has to be present then we need to run with ATS + * on even though the RID will fail ATS queries with UR. This is + * because we have no idea what the PASID's need. + */ + state.cd_needs_ats = true; + arm_smmu_attach_prepare(&state, domain); + arm_smmu_make_cdtable_ste(ste, master, state.ats_enabled, s1dss); + } else { + arm_smmu_attach_prepare(&state, domain); + } arm_smmu_install_ste_for_dev(master, ste); arm_smmu_attach_commit(&state); mutex_unlock(&arm_smmu_asid_lock); @@ -3124,7 +3156,6 @@ static int arm_smmu_attach_dev_ste(struct iommu_domain *domain, * descriptor from arm_smmu_share_asid(). */ arm_smmu_clear_cd(master, IOMMU_NO_PASID); - return 0; }
static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, @@ -3134,7 +3165,8 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain, struct arm_smmu_master *master = dev_iommu_priv_get(dev);
arm_smmu_make_bypass_ste(master->smmu, &ste); - return arm_smmu_attach_dev_ste(domain, dev, &ste); + arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS); + return 0; }
static const struct iommu_domain_ops arm_smmu_identity_ops = { @@ -3152,7 +3184,9 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain, struct arm_smmu_ste ste;
arm_smmu_make_abort_ste(&ste); - return arm_smmu_attach_dev_ste(domain, dev, &ste); + arm_smmu_attach_dev_ste(domain, dev, &ste, + STRTAB_STE_1_S1DSS_TERMINATE); + return 0; }
static const struct iommu_domain_ops arm_smmu_blocked_ops = { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 786652700f5e..c4a042ec111d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -842,8 +842,8 @@ void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, struct arm_smmu_ste *target); void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, - bool ats_enabled); + struct arm_smmu_master *master, bool ats_enabled, + unsigned int s1dss); void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 3b5302cbb06af6b62022360066944a1ff6aea0d1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
S1DSS brings in quite a few new transition pairs that are interesting. Test to/from S1DSS_BYPASS <-> S1DSS_SSID0, and BYPASS <-> S1DSS_SSID0.
Test a contrived non-hitless flow to make sure that the logic works.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Michael Shavit mshavit@google.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/12-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 113 +++++++++++++++++- 1 file changed, 108 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c index 8608504fe786..ee90dee4865c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -144,6 +144,14 @@ static void arm_smmu_v3_test_ste_expect_transition( KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); }
+static void arm_smmu_v3_test_ste_expect_non_hitless_transition( + struct kunit *test, const struct arm_smmu_ste *cur, + const struct arm_smmu_ste *target, unsigned int num_syncs_expected) +{ + arm_smmu_v3_test_ste_expect_transition(test, cur, target, + num_syncs_expected, false); +} + static void arm_smmu_v3_test_ste_expect_hitless_transition( struct kunit *test, const struct arm_smmu_ste *cur, const struct arm_smmu_ste *target, unsigned int num_syncs_expected) @@ -155,6 +163,7 @@ static void arm_smmu_v3_test_ste_expect_hitless_transition( static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0;
static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, + unsigned int s1dss, const dma_addr_t dma_addr) { struct arm_smmu_master master = { @@ -164,7 +173,7 @@ static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, .smmu = &smmu, };
- arm_smmu_make_cdtable_ste(ste, &master, true, STRTAB_STE_1_S1DSS_SSID0); + arm_smmu_make_cdtable_ste(ste, &master, true, s1dss); }
static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) @@ -194,7 +203,8 @@ static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test) { struct arm_smmu_ste ste;
- arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste, NUM_EXPECTED_SYNCS(2)); } @@ -203,7 +213,8 @@ static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test) { struct arm_smmu_ste ste;
- arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste, NUM_EXPECTED_SYNCS(2)); } @@ -212,7 +223,8 @@ static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test) { struct arm_smmu_ste ste;
- arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste, NUM_EXPECTED_SYNCS(3)); } @@ -221,11 +233,54 @@ static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test) { struct arm_smmu_ste ste;
- arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste, NUM_EXPECTED_SYNCS(3)); }
+static void arm_smmu_v3_write_ste_test_cdtable_s1dss_change(struct kunit *test) +{ + struct arm_smmu_ste ste; + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + + /* + * Flipping s1dss on a CD table STE only involves changes to the second + * qword of an STE and can be done in a single write. + */ + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &ste, &s1dss_bypass, NUM_EXPECTED_SYNCS(1)); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s1dss_bypass, &ste, NUM_EXPECTED_SYNCS(1)); +} + +static void +arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass(struct kunit *test) +{ + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &s1dss_bypass, &bypass_ste, NUM_EXPECTED_SYNCS(2)); +} + +static void +arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass(struct kunit *test) +{ + struct arm_smmu_ste s1dss_bypass; + + arm_smmu_test_make_cdtable_ste(&s1dss_bypass, STRTAB_STE_1_S1DSS_BYPASS, + fake_cdtab_dma_addr); + arm_smmu_v3_test_ste_expect_hitless_transition( + test, &bypass_ste, &s1dss_bypass, NUM_EXPECTED_SYNCS(2)); +} + static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, bool ats_enabled) { @@ -285,6 +340,48 @@ static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test) NUM_EXPECTED_SYNCS(2)); }
+static void arm_smmu_v3_write_ste_test_s1_to_s2(struct kunit *test) +{ + struct arm_smmu_ste s1_ste; + struct arm_smmu_ste s2_ste; + + arm_smmu_test_make_cdtable_ste(&s1_ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_s2_ste(&s2_ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &s1_ste, &s2_ste, + NUM_EXPECTED_SYNCS(3)); +} + +static void arm_smmu_v3_write_ste_test_s2_to_s1(struct kunit *test) +{ + struct arm_smmu_ste s1_ste; + struct arm_smmu_ste s2_ste; + + arm_smmu_test_make_cdtable_ste(&s1_ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_s2_ste(&s2_ste, true); + arm_smmu_v3_test_ste_expect_hitless_transition(test, &s2_ste, &s1_ste, + NUM_EXPECTED_SYNCS(3)); +} + +static void arm_smmu_v3_write_ste_test_non_hitless(struct kunit *test) +{ + struct arm_smmu_ste ste; + struct arm_smmu_ste ste_2; + + /* + * Although no flow resembles this in practice, one way to force an STE + * update to be non-hitless is to change its CD table pointer as well as + * s1 dss field in the same update. + */ + arm_smmu_test_make_cdtable_ste(&ste, STRTAB_STE_1_S1DSS_SSID0, + fake_cdtab_dma_addr); + arm_smmu_test_make_cdtable_ste(&ste_2, STRTAB_STE_1_S1DSS_BYPASS, + 0x4B4B4b4B4B); + arm_smmu_v3_test_ste_expect_non_hitless_transition( + test, &ste, &ste_2, NUM_EXPECTED_SYNCS(3)); +} + static void arm_smmu_v3_test_cd_expect_transition( struct kunit *test, const struct arm_smmu_cd *cur, const struct arm_smmu_cd *target, unsigned int num_syncs_expected, @@ -438,10 +535,16 @@ static struct kunit_case arm_smmu_v3_test_cases[] = { KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable), KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass), KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable), + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_s1dss_change), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass), + KUNIT_CASE(arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass), KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort), KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2), KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass), KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s1_to_s2), + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_s1), + KUNIT_CASE(arm_smmu_v3_write_ste_test_non_hitless), KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_clear), KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid), KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear),
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 8ee9175c25827240dd84a7adffbfa9c16938ac5d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the STE doesn't point to the CD table we can upgrade it by reprogramming the STE with the appropriate S1DSS. We may also need to turn on ATS at the same time.
Keep track of whether the installed STE is pointing at the cd_table and what the ATS state is, so this path can be triggered.
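A minimal standalone sketch of that guard (illustrative types and names; the driver's version is arm_smmu_update_ste() together with the in_ste/ste_ats_enabled tracking in the diff below):

#include <stdbool.h>

/* State tracked per master so a PASID attach knows whether the live STE
 * already points at the CD table with the right EATS value.
 */
struct master_state {
	bool cd_table_in_ste;  /* does the installed STE use the CD table? */
	bool ste_ats_enabled;  /* EATS currently programmed in the STE */
};

/* Reprogram the STE only if the CD table is missing or EATS is stale */
static bool ste_needs_update(const struct master_state *s, bool want_ats)
{
	return !s->cd_table_in_ste || s->ste_ats_enabled != want_ats;
}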
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/13-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 ++++++++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +- 2 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 24f071881484..9283b0741e02 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2695,6 +2695,9 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, master->cd_table.in_ste = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(target->data[0])) == STRTAB_STE_0_CFG_S1_TRANS; + master->ste_ats_enabled = + FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(target->data[1])) == + STRTAB_STE_1_EATS_TRANS;
for (i = 0; i < master->num_streams; ++i) { u32 sid = master->streams[i].id; @@ -3055,10 +3058,36 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
+static void arm_smmu_update_ste(struct arm_smmu_master *master, + struct iommu_domain *sid_domain, + bool ats_enabled) +{ + unsigned int s1dss = STRTAB_STE_1_S1DSS_TERMINATE; + struct arm_smmu_ste ste; + + if (master->cd_table.in_ste && master->ste_ats_enabled == ats_enabled) + return; + + if (sid_domain->type == IOMMU_DOMAIN_IDENTITY) + s1dss = STRTAB_STE_1_S1DSS_BYPASS; + else + WARN_ON(sid_domain->type != IOMMU_DOMAIN_BLOCKED); + + /* + * Change the STE into a cdtable one with SID IDENTITY/BLOCKED behavior + * using s1dss if necessary. If the cd_table is already installed then + * the S1DSS is correct and this will just update the EATS. Otherwise it + * installs the entire thing. This will be hitless. + */ + arm_smmu_make_cdtable_ste(&ste, master, ats_enabled, s1dss); + arm_smmu_install_ste_for_dev(master, &ste); +} + int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, const struct arm_smmu_cd *cd) { + struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); struct arm_smmu_attach_state state = { .master = master, /* @@ -3075,8 +3104,10 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, if (smmu_domain->smmu != master->smmu) return -EINVAL;
- if (!master->cd_table.in_ste) - return -ENODEV; + if (!master->cd_table.in_ste && + sid_domain->type != IOMMU_DOMAIN_IDENTITY && + sid_domain->type != IOMMU_DOMAIN_BLOCKED) + return -EINVAL;
cdptr = arm_smmu_alloc_cd_ptr(master, pasid); if (!cdptr) @@ -3088,6 +3119,7 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, goto out_unlock;
arm_smmu_write_cd_entry(master, pasid, cdptr, cd); + arm_smmu_update_ste(master, sid_domain, state.ats_enabled);
arm_smmu_attach_commit(&state);
@@ -3110,6 +3142,19 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, arm_smmu_atc_inv_master(master, pasid); arm_smmu_remove_master_domain(master, &smmu_domain->domain, pasid); mutex_unlock(&arm_smmu_asid_lock); + + /* + * When the last user of the CD table goes away downgrade the STE back + * to a non-cd_table one. + */ + if (!arm_smmu_ssids_in_use(&master->cd_table)) { + struct iommu_domain *sid_domain = + iommu_get_domain_for_dev(master->dev); + + if (sid_domain->type == IOMMU_DOMAIN_IDENTITY || + sid_domain->type == IOMMU_DOMAIN_BLOCKED) + sid_domain->ops->attach_dev(sid_domain, dev); + } }
static void arm_smmu_attach_dev_ste(struct iommu_domain *domain, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index c4a042ec111d..978438710f9c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -781,7 +781,8 @@ struct arm_smmu_master { /* Locked by the iommu core using the group mutex */ struct arm_smmu_ctx_desc_cfg cd_table; unsigned int num_streams; - bool ats_enabled; + bool ats_enabled : 1; + bool ste_ats_enabled : 1; bool stall_enabled; bool sva_enabled; bool iopf_enabled;
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit f3b273b7c7e42ff7ef5b6063834d768d33c7ba79 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The SVA cleanup made the SSID logic entirely general, so all we need to do is call it with the correct cd table entry for an S1 domain.
This is slightly tricky because of the ASID and how the locking works; the simple fix is to just update the ASID once we get the right locks.
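A sketch of that fix-up, assuming the CD's ASID lives in qword 0 bits [63:48] as in the driver header; names are illustrative and the real hunk is in arm_smmu_set_pasid() below:

#include <stdint.h>

#define CD0_ASID_SHIFT	48
#define CD0_ASID_MASK	(0xffffULL << CD0_ASID_SHIFT)

/* Rewrite the ASID field of CD qword 0 once the asid_lock is held and the
 * domain's ASID can no longer change under us.
 */
static void cd_fix_asid(uint64_t *cd_qword0, uint16_t locked_asid)
{
	*cd_qword0 &= ~CD0_ASID_MASK;
	*cd_qword0 |= (uint64_t)locked_asid << CD0_ASID_SHIFT;
}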
Tested-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/14-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvi... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 ++++++++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +- 2 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9283b0741e02..cbcbb7434087 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3058,6 +3058,36 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return 0; }
+static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_cd target_cd; + int ret = 0; + + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) + ret = arm_smmu_domain_finalise(smmu_domain, smmu); + else if (smmu_domain->smmu != smmu) + ret = -EINVAL; + mutex_unlock(&smmu_domain->init_mutex); + if (ret) + return ret; + + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) + return -EINVAL; + + /* + * We can read cd.asid outside the lock because arm_smmu_set_pasid() + * will fix it + */ + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); + return arm_smmu_set_pasid(master, to_smmu_domain(domain), id, + &target_cd); +} + static void arm_smmu_update_ste(struct arm_smmu_master *master, struct iommu_domain *sid_domain, bool ats_enabled) @@ -3085,7 +3115,7 @@ static void arm_smmu_update_ste(struct arm_smmu_master *master,
int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, - const struct arm_smmu_cd *cd) + struct arm_smmu_cd *cd) { struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev); struct arm_smmu_attach_state state = { @@ -3118,6 +3148,14 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master, if (ret) goto out_unlock;
+ /* + * We don't want to obtain to the asid_lock too early, so fix up the + * caller set ASID under the lock in case it changed. + */ + cd->data[0] &= ~cpu_to_le64(CTXDESC_CD_0_ASID); + cd->data[0] |= cpu_to_le64( + FIELD_PREP(CTXDESC_CD_0_ASID, smmu_domain->cd.asid)); + arm_smmu_write_cd_entry(master, pasid, cdptr, cd); arm_smmu_update_ste(master, sid_domain, state.ats_enabled);
@@ -3865,6 +3903,7 @@ static struct iommu_ops arm_smmu_ops = { .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = arm_smmu_attach_dev, + .set_dev_pasid = arm_smmu_s1_set_dev_pasid, .map_pages = arm_smmu_map_pages, .unmap_pages = arm_smmu_unmap_pages, .flush_iotlb_all = arm_smmu_flush_iotlb_all, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 978438710f9c..eb3fe79bfa10 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -882,7 +882,7 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
int arm_smmu_set_pasid(struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, ioasid_t pasid, - const struct arm_smmu_cd *cd); + struct arm_smmu_cd *cd);
void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
From: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
mainline inclusion from mainline-v6.11-rc1 commit 52acd7d8a4130ad4dda6540dbbb821a92e1c0138 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This will be used by iommufd for allocating user-managed domains and is also required when we add support for iommufd-based dirty tracking.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240703101604.2576-2-shameerali.kolothum.thodi@hu... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index cbcbb7434087..77044c0c2bfb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -99,6 +99,8 @@ static int arm_smmu_bypass_dev_domain_type(struct device *dev) } #endif
+static struct iommu_ops arm_smmu_ops; + enum arm_smmu_msi_index { EVTQ_MSI_INDEX, GERROR_MSI_INDEX, @@ -3281,6 +3283,34 @@ static struct iommu_domain arm_smmu_blocked_domain = { .ops = &arm_smmu_blocked_ops, };
+static struct iommu_domain * +arm_smmu_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *smmu_domain; + int ret; + + if (flags || parent || user_data) + return ERR_PTR(-EOPNOTSUPP); + + smmu_domain = arm_smmu_domain_alloc(); + if (!smmu_domain) + return ERR_PTR(-ENOMEM); + + smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; + smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops; + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + if (ret) + goto err_free; + return &smmu_domain->domain; + +err_free: + kfree(smmu_domain); + return ERR_PTR(ret); +} + static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) @@ -3473,8 +3503,6 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master) kfree(master->streams); }
-static struct iommu_ops arm_smmu_ops; - static struct iommu_device *arm_smmu_probe_device(struct device *dev) { int ret; @@ -3889,6 +3917,7 @@ static struct iommu_ops arm_smmu_ops = { .capable = arm_smmu_capable, .domain_alloc_paging = arm_smmu_domain_alloc_paging, .domain_alloc_sva = arm_smmu_sva_domain_alloc, + .domain_alloc_user = arm_smmu_domain_alloc_user, .probe_device = arm_smmu_probe_device, .release_device = arm_smmu_release_device, .device_group = arm_smmu_device_group,
From: Jean-Philippe Brucker jean-philippe@linaro.org
mainline inclusion from mainline-v6.11-rc1 commit 2f8d6178b4fe3e2f50782fa640921a9ee46b6d6f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the SMMU supports it and the kernel was built with HTTU support, probe support for Hardware Translation Table Update (HTTU), which essentially enables hardware update of the access and dirty flags.
Probe and set the smmu::features bits for Hardware Dirty and Hardware Access. This is in preparation for enabling it on the stage 1 format context descriptors.
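As a reading aid, a standalone sketch of the decode (field values and flag bits are illustrative assumptions; the real code is arm_smmu_get_httu() in the diff below). The point is that dirty-state tracking implies access-flag tracking, hence the fallthrough:

enum { HTTU_NONE, HTTU_ACCESS, HTTU_ACCESS_DIRTY };

#define FEAT_HA	(1u << 0)	/* hardware access flag update */
#define FEAT_HD	(1u << 1)	/* hardware dirty state update */

static unsigned int httu_to_features(unsigned int idr0_httu)
{
	unsigned int feat = 0;

	switch (idr0_httu) {
	case HTTU_ACCESS_DIRTY:
		feat |= FEAT_HD;
		/* fallthrough: HD implies HA */
	case HTTU_ACCESS:
		feat |= FEAT_HA;
	}
	return feat;
}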
Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org Signed-off-by: Joao Martins joao.m.martins@oracle.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240703101604.2576-3-shameerali.kolothum.thodi@hu... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 44 ++++++++++----------- 1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 77044c0c2bfb..e1da05d3b26f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -4852,28 +4852,6 @@ static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu) #endif }
-static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg) -{ - u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD); - u32 features = 0; - - switch (FIELD_GET(IDR0_HTTU, reg)) { - case IDR0_HTTU_ACCESS_DIRTY: - features |= ARM_SMMU_FEAT_HD; - fallthrough; - case IDR0_HTTU_ACCESS: - features |= ARM_SMMU_FEAT_HA; - } - - if (smmu->dev->of_node) - smmu->features |= features; - else if (features != fw_features) - /* ACPI IORT sets the HTTU bits */ - dev_warn(smmu->dev, - "IDR0.HTTU overridden by FW configuration (0x%x)\n", - fw_features); -} - #ifdef CONFIG_HISILICON_ERRATUM_162100602 static void hisi_smmu_check_errata(struct arm_smmu_device *smmu) { @@ -4913,6 +4891,28 @@ static void hisi_smmu_check_errata(struct arm_smmu_device *smmu) static void hisi_smmu_check_errata(struct arm_smmu_device *smmu) {} #endif
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg) +{ + u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD); + u32 hw_features = 0; + + switch (FIELD_GET(IDR0_HTTU, reg)) { + case IDR0_HTTU_ACCESS_DIRTY: + hw_features |= ARM_SMMU_FEAT_HD; + fallthrough; + case IDR0_HTTU_ACCESS: + hw_features |= ARM_SMMU_FEAT_HA; + } + + if (smmu->dev->of_node) + smmu->features |= hw_features; + else if (hw_features != fw_features) + /* ACPI IORT sets the HTTU bits */ + dev_warn(smmu->dev, + "IDR0.HTTU features(0x%x) overridden by FW configuration (0x%x)\n", + hw_features, fw_features); +} + static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) { u32 reg;
From: Shameer Kolothum shameerali.kolothum.thodi@huawei.com
mainline inclusion from mainline-v6.11-rc1 commit 4fe88fd8b4aecb7f9680bf898811db76b94095a9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The .read_and_clear_dirty() IOMMU domain op takes care of reading the dirty bits (i.e. the PTE has DBM set and AP[2] clear) and marshalling them into a bitmap of a given page size.
While reading the dirty bits we also set the PTE AP[2] bit to mark it as writeable-clean, depending on the read_and_clear_dirty() flags.
PTE states with respect to DBM bit:
                      DBM bit   AP[2] ("RDONLY" bit)
 1. writable_clean       1              1
 2. writable_dirty       1              0
 3. read-only            0              1
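Equivalently, as a standalone sketch (bit positions taken from the diff below: AP[2] is descriptor bit 7, DBM is bit 51; the driver's helpers are iopte_writeable_dirty() and iopte_set_writeable_clean()):

#include <stdbool.h>
#include <stdint.h>

#define PTE_AP_RDONLY	(1ULL << 7)	/* AP[2]: write permission removed */
#define PTE_DBM		(1ULL << 51)	/* dirty bit modifier */

/* Writable-dirty: DBM set and hardware has cleared the RDONLY bit */
static bool pte_writeable_dirty(uint64_t pte)
{
	return (pte & (PTE_DBM | PTE_AP_RDONLY)) == PTE_DBM;
}

/* Mark writable-clean again; hardware clears RDONLY on the next write */
static uint64_t pte_set_writeable_clean(uint64_t pte)
{
	return pte | PTE_AP_RDONLY;
}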
Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240703101604.2576-4-shameerali.kolothum.thodi@hu... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/io-pgtable-arm.c | 116 +++++++++++++++++++++++++++++++-- 1 file changed, 111 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 07c04322bc58..5e89e572b890 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -87,7 +87,7 @@
#define ARM_LPAE_PTE_ATTR_LO_MASK (((arm_lpae_iopte)0x3ff) << 2) /* Ignore the contiguous bit for block splitting */ -#define ARM_LPAE_PTE_ATTR_HI_MASK (((arm_lpae_iopte)13) << 51) +#define ARM_LPAE_PTE_ATTR_HI_MASK (ARM_LPAE_PTE_XN | ARM_LPAE_PTE_DBM) #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK | \ ARM_LPAE_PTE_ATTR_HI_MASK) /* Software bit for solving coherency races */ @@ -95,7 +95,11 @@
/* Stage-1 PTE */ #define ARM_LPAE_PTE_AP_UNPRIV (((arm_lpae_iopte)1) << 6) -#define ARM_LPAE_PTE_AP_RDONLY (((arm_lpae_iopte)2) << 6) +#define ARM_LPAE_PTE_AP_RDONLY_BIT 7 +#define ARM_LPAE_PTE_AP_RDONLY (((arm_lpae_iopte)1) << \ + ARM_LPAE_PTE_AP_RDONLY_BIT) +#define ARM_LPAE_PTE_AP_WR_CLEAN_MASK (ARM_LPAE_PTE_AP_RDONLY | \ + ARM_LPAE_PTE_DBM) #define ARM_LPAE_PTE_ATTRINDX_SHIFT 2 #define ARM_LPAE_PTE_nG (((arm_lpae_iopte)1) << 11)
@@ -141,6 +145,12 @@
#define iopte_prot(pte) ((pte) & ARM_LPAE_PTE_ATTR_MASK)
+#define iopte_writeable_dirty(pte) \ + (((pte) & ARM_LPAE_PTE_AP_WR_CLEAN_MASK) == ARM_LPAE_PTE_DBM) + +#define iopte_set_writeable_clean(ptep) \ + set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long *)(ptep)) + struct arm_lpae_io_pgtable { struct io_pgtable iop;
@@ -162,6 +172,13 @@ static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl, return iopte_type(pte) == ARM_LPAE_PTE_TYPE_BLOCK; }
+static inline bool iopte_table(arm_lpae_iopte pte, int lvl) +{ + if (lvl == (ARM_LPAE_MAX_LEVELS - 1)) + return false; + return iopte_type(pte) == ARM_LPAE_PTE_TYPE_TABLE; +} + static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, struct arm_lpae_io_pgtable *data) { @@ -1044,9 +1061,6 @@ static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data, size_t base, next_size; int nbits, ret, i;
- if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) - return -EINVAL; - ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); pte = READ_ONCE(*ptep); if (WARN_ON(!pte)) @@ -1123,6 +1137,97 @@ static int arm_lpae_clear_dirty_log(struct io_pgtable_ops *ops, bitmap, base_iova, bitmap_pgshift); }
+struct io_pgtable_walk_data { + struct iommu_dirty_bitmap *dirty; + unsigned long flags; + u64 addr; + const u64 end; +}; + +static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data, + struct io_pgtable_walk_data *walk_data, + arm_lpae_iopte *ptep, + int lvl); + +static int io_pgtable_visit_dirty(struct arm_lpae_io_pgtable *data, + struct io_pgtable_walk_data *walk_data, + arm_lpae_iopte *ptep, int lvl) +{ + struct io_pgtable *iop = &data->iop; + arm_lpae_iopte pte = READ_ONCE(*ptep); + + if (iopte_leaf(pte, lvl, iop->fmt)) { + size_t size = ARM_LPAE_BLOCK_SIZE(lvl, data); + + if (iopte_writeable_dirty(pte)) { + iommu_dirty_bitmap_record(walk_data->dirty, + walk_data->addr, size); + if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR)) + iopte_set_writeable_clean(ptep); + } + walk_data->addr += size; + return 0; + } + + if (WARN_ON(!iopte_table(pte, lvl))) + return -EINVAL; + + ptep = iopte_deref(pte, data); + return __arm_lpae_iopte_walk_dirty(data, walk_data, ptep, lvl + 1); +} + +static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data, + struct io_pgtable_walk_data *walk_data, + arm_lpae_iopte *ptep, + int lvl) +{ + u32 idx; + int max_entries, ret; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return -EINVAL; + + if (lvl == data->start_level) + max_entries = ARM_LPAE_PGD_SIZE(data) / sizeof(arm_lpae_iopte); + else + max_entries = ARM_LPAE_PTES_PER_TABLE(data); + + for (idx = ARM_LPAE_LVL_IDX(walk_data->addr, lvl, data); + (idx < max_entries) && (walk_data->addr < walk_data->end); ++idx) { + ret = io_pgtable_visit_dirty(data, walk_data, ptep + idx, lvl); + if (ret) + return ret; + } + + return 0; +} + +static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops, + unsigned long iova, size_t size, + unsigned long flags, + struct iommu_dirty_bitmap *dirty) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = &data->iop.cfg; + struct io_pgtable_walk_data walk_data = { + .dirty = dirty, + .flags = flags, + .addr = iova, + .end = iova + size, + }; + arm_lpae_iopte *ptep = data->pgd; + int lvl = data->start_level; + + if (WARN_ON(!size)) + return -EINVAL; + if (WARN_ON((iova + size - 1) & ~(BIT(cfg->ias) - 1))) + return -EINVAL; + if (data->iop.fmt != ARM_64_LPAE_S1) + return -EINVAL; + + return __arm_lpae_iopte_walk_dirty(data, &walk_data, ptep, lvl); +} + static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) { unsigned long granule, page_sizes; @@ -1205,6 +1310,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) .merge_page = arm_lpae_merge_page, .sync_dirty_log = arm_lpae_sync_dirty_log, .clear_dirty_log = arm_lpae_clear_dirty_log, + .read_and_clear_dirty = arm_lpae_read_and_clear_dirty, };
return data;
From: Joao Martins joao.m.martins@oracle.com
mainline inclusion from mainline-v6.11-rc1 commit eb054d67b21a53f6ccf3af49a62fb99397b48fc2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This provides all the infrastructure to enable dirty tracking if the hardware has the capability and the domain alloc request asks for it.
Also, add a device_iommu_capable() check in iommufd core for IOMMU_CAP_DIRTY_TRACKING before we request a user domain with dirty tracking support.
Please note, we still report no support for IOMMU_CAP_DIRTY_TRACKING as it will finally be enabled in a subsequent patch.
Signed-off-by: Joao Martins joao.m.martins@oracle.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240703101604.2576-5-shameerali.kolothum.thodi@hu... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 84 +++++++++++++++------ drivers/iommu/iommufd/hw_pagetable.c | 3 + include/linux/io-pgtable.h | 2 +- 3 files changed, 65 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e1da05d3b26f..e4cbb440bb46 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -27,6 +27,7 @@ #include <linux/pci-ats.h> #include <linux/platform_device.h> #include <kunit/visibility.h> +#include <uapi/linux/iommufd.h>
#include "arm-smmu-v3.h" #include "../../dma-iommu.h" @@ -100,6 +101,7 @@ static int arm_smmu_bypass_dev_domain_type(struct device *dev) #endif
static struct iommu_ops arm_smmu_ops; +static struct iommu_dirty_ops arm_smmu_dirty_ops;
enum arm_smmu_msi_index { EVTQ_MSI_INDEX, @@ -146,7 +148,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = { };
static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, - struct arm_smmu_device *smmu); + struct arm_smmu_device *smmu, u32 flags); static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
static void parse_driver_options(struct arm_smmu_device *smmu) @@ -2530,7 +2532,7 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) struct arm_smmu_master *master = dev_iommu_priv_get(dev); int ret;
- ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu, 0); if (ret) { kfree(smmu_domain); return ERR_PTR(ret); @@ -2594,15 +2596,15 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, }
static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, - struct arm_smmu_device *smmu) + struct arm_smmu_device *smmu, u32 flags) { int ret; - unsigned long ias, oas; enum io_pgtable_fmt fmt; struct io_pgtable_cfg pgtbl_cfg; struct io_pgtable_ops *pgtbl_ops; int (*finalise_stage_fn)(struct arm_smmu_device *smmu, struct arm_smmu_domain *smmu_domain); + bool enable_dirty = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
/* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) @@ -2615,17 +2617,31 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, virtcca_smmu_set_stage(domain, smmu_domain); #endif
+ pgtbl_cfg = (struct io_pgtable_cfg) { + .pgsize_bitmap = smmu->pgsize_bitmap, + .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, + .tlb = &arm_smmu_flush_ops, + .iommu_dev = smmu->dev, + }; + switch (smmu_domain->stage) { - case ARM_SMMU_DOMAIN_S1: - ias = (smmu->features & ARM_SMMU_FEAT_VAX) ? 52 : 48; - ias = min_t(unsigned long, ias, VA_BITS); - oas = smmu->ias; + case ARM_SMMU_DOMAIN_S1: { + unsigned long ias = (smmu->features & + ARM_SMMU_FEAT_VAX) ? 52 : 48; + + pgtbl_cfg.ias = min_t(unsigned long, ias, VA_BITS); + pgtbl_cfg.oas = smmu->ias; + if (enable_dirty) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD; fmt = ARM_64_LPAE_S1; finalise_stage_fn = arm_smmu_domain_finalise_s1; break; + } case ARM_SMMU_DOMAIN_S2: - ias = smmu->ias; - oas = smmu->oas; + if (enable_dirty) + return -EOPNOTSUPP; + pgtbl_cfg.ias = smmu->ias; + pgtbl_cfg.oas = smmu->oas; fmt = ARM_64_LPAE_S2; finalise_stage_fn = arm_smmu_domain_finalise_s2; break; @@ -2633,15 +2649,6 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, return -EINVAL; }
- pgtbl_cfg = (struct io_pgtable_cfg) { - .pgsize_bitmap = smmu->pgsize_bitmap, - .ias = ias, - .oas = oas, - .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, - .tlb = &arm_smmu_flush_ops, - .iommu_dev = smmu->dev, - }; - if (smmu->features & ARM_SMMU_FEAT_HD) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
@@ -2657,6 +2664,8 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, smmu_domain->domain.pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; smmu_domain->domain.geometry.force_aperture = true; + if (enable_dirty && smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + smmu_domain->domain.dirty_ops = &arm_smmu_dirty_ops;
ret = finalise_stage_fn(smmu, smmu_domain); if (ret < 0) { @@ -3006,7 +3015,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) mutex_lock(&smmu_domain->init_mutex);
if (!smmu_domain->smmu) { - ret = arm_smmu_domain_finalise(smmu_domain, smmu); + ret = arm_smmu_domain_finalise(smmu_domain, smmu, 0); } else if (smmu_domain->smmu != smmu) ret = -EINVAL;
@@ -3071,7 +3080,7 @@ static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain,
mutex_lock(&smmu_domain->init_mutex); if (!smmu_domain->smmu) - ret = arm_smmu_domain_finalise(smmu_domain, smmu); + ret = arm_smmu_domain_finalise(smmu_domain, smmu, 0); else if (smmu_domain->smmu != smmu) ret = -EINVAL; mutex_unlock(&smmu_domain->init_mutex); @@ -3289,10 +3298,13 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, const struct iommu_user_data *user_data) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING; struct arm_smmu_domain *smmu_domain; int ret;
- if (flags || parent || user_data) + if (flags & ~PAGING_FLAGS) + return ERR_PTR(-EOPNOTSUPP); + if (parent || user_data) return ERR_PTR(-EOPNOTSUPP);
smmu_domain = arm_smmu_domain_alloc(); @@ -3301,7 +3313,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags,
smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops; - ret = arm_smmu_domain_finalise(smmu_domain, master->smmu); + ret = arm_smmu_domain_finalise(smmu_domain, master->smmu, flags); if (ret) goto err_free; return &smmu_domain->domain; @@ -3578,6 +3590,27 @@ static void arm_smmu_release_device(struct device *dev) kfree(master); }
+static int arm_smmu_read_and_clear_dirty(struct iommu_domain *domain, + unsigned long iova, size_t size, + unsigned long flags, + struct iommu_dirty_bitmap *dirty) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + + return ops->read_and_clear_dirty(ops, iova, size, flags, dirty); +} + +static int arm_smmu_set_dirty_tracking(struct iommu_domain *domain, + bool enabled) +{ + /* + * Always enabled and the dirty bitmap is cleared prior to + * set_dirty_tracking(). + */ + return 0; +} + static struct iommu_group *arm_smmu_device_group(struct device *dev) { struct iommu_group *group; @@ -3952,6 +3985,11 @@ static struct iommu_ops arm_smmu_ops = { } };
+static struct iommu_dirty_ops arm_smmu_dirty_ops = { + .read_and_clear_dirty = arm_smmu_read_and_clear_dirty, + .set_dirty_tracking = arm_smmu_set_dirty_tracking, +}; + /* Probing and initialisation functions */ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu, struct arm_smmu_queue *q, diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index a9f1fe44c4c0..5fc233ae6ba4 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -114,6 +114,9 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, return ERR_PTR(-EOPNOTSUPP); if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP); + if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) && + !device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING)) + return ERR_PTR(-EOPNOTSUPP);
hwpt_paging = __iommufd_object_alloc( ictx, hwpt_paging, IOMMUFD_OBJ_HWPT_PAGING, common.obj); diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 0796205afc6d..954119ae0f84 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -86,7 +86,7 @@ struct io_pgtable_cfg { * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability * attributes set in the TCR for a non-coherent page-table walker. * - * IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status. + * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking in stage 1 pagetable. * * IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior * when changing block size.
mainline inclusion from mainline-v6.11-rc1 commit 25c776dd03b3e3ee16ad3402feabe20d811c7cb2 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the io-pgtable quirk flag indicates support for hardware update of the dirty state, enable the HA/HD bits in the SMMU CD and also set the DBM bit in the page descriptor.
Now report the dirty page tracking capability of SMMUv3 and select IOMMUFD_DRIVER for ARM_SMMU_V3 if IOMMUFD is enabled.
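With the capability reported, user space can request dirty tracking when it allocates a hardware page table through iommufd. The sketch below is a minimal illustration, not part of the patch: the iommu_hwpt_alloc layout follows the iommufd uAPI, and iommufd_fd, dev_id and ioas_id are assumed to have been obtained earlier.

#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Illustrative: allocate a stage-1 HWPT with dirty tracking enabled. */
static int alloc_dirty_hwpt(int iommufd_fd, __u32 dev_id, __u32 ioas_id)
{
        struct iommu_hwpt_alloc cmd = {
                .size = sizeof(cmd),
                .flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING,
                .dev_id = dev_id,
                .pt_id = ioas_id,
        };

        /* Fails with EOPNOTSUPP if the device lacks IOMMU_CAP_DIRTY_TRACKING */
        if (ioctl(iommufd_fd, IOMMU_HWPT_ALLOC, &cmd))
                return -1;

        return cmd.out_hwpt_id;
}

The returned hwpt id can then be used with IOMMU_HWPT_SET_DIRTY_TRACKING and IOMMU_HWPT_GET_DIRTY_BITMAP to enable tracking and retrieve the dirty bitmap.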
Co-developed-by: Keqian Zhu zhukeqian1@huawei.com Signed-off-by: Keqian Zhu zhukeqian1@huawei.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com Signed-off-by: Joao Martins joao.m.martins@oracle.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Link: https://lore.kernel.org/r/20240703101604.2576-6-shameerali.kolothum.thodi@hu... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/Kconfig | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 +++++++++++++---- 2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 88fc30074eb3..b035e042abb1 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -391,6 +391,7 @@ config ARM_SMMU_V3 select IOMMU_API select IOMMU_IO_PGTABLE_LPAE select GENERIC_MSI_IRQ + select IOMMUFD_DRIVER if IOMMUFD help Support for implementations of the ARM System MMU architecture version 3 providing translation support to a PCIe root complex. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e4cbb440bb46..9d798f1fa010 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1577,10 +1577,10 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) );
- if (master->smmu->features & ARM_SMMU_FEAT_HD) - target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HD); - if (master->smmu->features & ARM_SMMU_FEAT_HA) - target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA); + /* To enable dirty flag update, set both Access flag and dirty state update */ + if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD) + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_HA | + CTXDESC_CD_0_TCR_HD);
target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr & CTXDESC_CD_1_TTB0_MASK); @@ -2483,6 +2483,13 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_add_page = arm_smmu_tlb_inv_page_nosync, };
+static bool arm_smmu_dbm_capable(struct arm_smmu_device *smmu) +{ + u32 features = (ARM_SMMU_FEAT_HD | ARM_SMMU_FEAT_COHERENCY); + + return (smmu->features & features) == features; +} + /* IOMMU API */ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) { @@ -2495,6 +2502,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) case IOMMU_CAP_NOEXEC: case IOMMU_CAP_DEFERRED_FLUSH: return true; + case IOMMU_CAP_DIRTY_TRACKING: + return arm_smmu_dbm_capable(master->smmu); default: return false; }
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 14678219cf4093e897ab353fd78eab7994d1be7d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently, when attaching a domain to a device or its PASID, the domain is stored within the iommu group. It can be retrieved for use during the window between attachment and detachment.
With new features introduced, there's a need to store more information than just a domain pointer. This information essentially represents the association between a domain and a device. For example, the SVA code already has a custom struct iommu_sva which represents a bond between sva domain and a PASID of a device. Looking forward, IOMMUFD needs a place to store the iommufd_device pointer in the core, so that the device object ID can be quickly retrieved in the critical fault handling path.
Introduce a domain attachment handle that explicitly represents the attachment relationship between a domain and a device or its PASID.
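The expected usage pattern, also visible in the SVA conversion in the diff below, is for the owner of the attachment to embed the handle in its own state and recover that state later with container_of(). A minimal sketch, where struct my_bond and the helper names are illustrative:

#include <linux/iommu.h>
#include <linux/container_of.h>

struct my_bond {
        struct iommu_attach_handle handle;  /* stored in the group's pasid array */
        void *owner_data;                   /* e.g. an iommufd_device pointer */
};

static int my_bond_attach(struct iommu_domain *domain, struct device *dev,
                          ioasid_t pasid, struct my_bond *bond)
{
        /* The core keeps &bond->handle for the lifetime of the attachment */
        return iommu_attach_device_pasid(domain, dev, pasid, &bond->handle);
}

static struct my_bond *my_bond_from_handle(struct iommu_attach_handle *handle)
{
        return container_of(handle, struct my_bond, handle);
}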
Co-developed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240702063444.105814-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/dma/idxd/init.c | 2 +- drivers/iommu/iommu-sva.c | 13 ++++++++----- drivers/iommu/iommu.c | 26 ++++++++++++++++---------- include/linux/iommu.h | 18 +++++++++++++++--- 4 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index a7295943fa22..385c488c9cd1 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -584,7 +584,7 @@ static int idxd_enable_system_pasid(struct idxd_device *idxd) * DMA domain is owned by the driver, it should support all valid * types such as DMA-FQ, identity, etc. */ - ret = iommu_attach_device_pasid(domain, dev, pasid); + ret = iommu_attach_device_pasid(domain, dev, pasid, NULL); if (ret) { dev_err(dev, "failed to attach device pasid %d, domain type %d", pasid, domain->type); diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 18a35e798b72..0fb923254062 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -99,7 +99,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
/* Search for an existing domain. */ list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) { - ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid); + handle->handle.domain = domain; + ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid, + &handle->handle); if (!ret) { domain->users++; goto out; @@ -113,7 +115,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm goto out_free_handle; }
- ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid); + handle->handle.domain = domain; + ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid, + &handle->handle); if (ret) goto out_free_domain; domain->users = 1; @@ -124,7 +128,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm list_add(&handle->handle_item, &mm->iommu_mm->sva_handles); mutex_unlock(&iommu_sva_lock); handle->dev = dev; - handle->domain = domain; return handle;
out_free_domain: @@ -147,7 +150,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device); */ void iommu_sva_unbind_device(struct iommu_sva *handle) { - struct iommu_domain *domain = handle->domain; + struct iommu_domain *domain = handle->handle.domain; struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm; struct device *dev = handle->dev;
@@ -170,7 +173,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
u32 iommu_sva_get_pasid(struct iommu_sva *handle) { - struct iommu_domain *domain = handle->domain; + struct iommu_domain *domain = handle->handle.domain;
return mm_get_enqcmd_pasid(domain->mm); } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ddcf0a3c03d3..463eba849495 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3564,16 +3564,17 @@ static void __iommu_remove_group_pasid(struct iommu_group *group, * @domain: the iommu domain. * @dev: the attached device. * @pasid: the pasid of the device. + * @handle: the attach handle. * * Return: 0 on success, or an error. */ int iommu_attach_device_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t pasid) + struct device *dev, ioasid_t pasid, + struct iommu_attach_handle *handle) { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; struct group_device *device; - void *curr; int ret;
if (!domain->ops->set_dev_pasid) @@ -3594,11 +3595,12 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, } }
- curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); - if (curr) { - ret = xa_err(curr) ? : -EBUSY; + if (handle) + handle->domain = domain; + + ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL); + if (ret) goto out_unlock; - }
ret = __iommu_set_group_pasid(domain, group, pasid); if (ret) @@ -3626,7 +3628,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
mutex_lock(&group->mutex); __iommu_remove_group_pasid(group, pasid, domain); - WARN_ON(xa_erase(&group->pasid_array, pasid) != domain); + xa_erase(&group->pasid_array, pasid); mutex_unlock(&group->mutex); } EXPORT_SYMBOL_GPL(iommu_detach_device_pasid); @@ -3651,15 +3653,19 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev, { /* Caller must be a probed driver on dev */ struct iommu_group *group = dev->iommu_group; - struct iommu_domain *domain; + struct iommu_attach_handle *handle; + struct iommu_domain *domain = NULL;
if (!group) return NULL;
xa_lock(&group->pasid_array); - domain = xa_load(&group->pasid_array, pasid); + handle = xa_load(&group->pasid_array, pasid); + if (handle) + domain = handle->domain; + if (type && domain && domain->type != type) - domain = ERR_PTR(-EBUSY); + domain = NULL; xa_unlock(&group->pasid_array);
return domain; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 468e329da33f..4634e70ad699 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1107,12 +1107,22 @@ struct iommu_fwspec { /* ATS is supported */ #define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
+/* + * An iommu attach handle represents a relationship between an iommu domain + * and a PASID or RID of a device. It is allocated and managed by the component + * that manages the domain and is stored in the iommu group during the time the + * domain is attached. + */ +struct iommu_attach_handle { + struct iommu_domain *domain; +}; + /** * struct iommu_sva - handle to a device-mm bond */ struct iommu_sva { + struct iommu_attach_handle handle; struct device *dev; - struct iommu_domain *domain; struct list_head handle_item; refcount_t users; }; @@ -1170,7 +1180,8 @@ int iommu_device_claim_dma_owner(struct device *dev, void *owner); void iommu_device_release_dma_owner(struct device *dev);
int iommu_attach_device_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t pasid); + struct device *dev, ioasid_t pasid, + struct iommu_attach_handle *handle); void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid); struct iommu_domain * @@ -1536,7 +1547,8 @@ static inline int iommu_device_claim_dma_owner(struct device *dev, void *owner) }
static inline int iommu_attach_device_pasid(struct iommu_domain *domain, - struct device *dev, ioasid_t pasid) + struct device *dev, ioasid_t pasid, + struct iommu_attach_handle *handle) { return -ENODEV; }
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 3e7f57d1ef3f5fbed58974fae38d35e430f57d35 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The struct iommu_sva represents an association between an SVA domain and a PASID of a device. It is stored in the iommu group's pasid array and also tracked by a list in the per-mm data structure. Remove the duplicate tracking of iommu_sva by eliminating the list.
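With the list gone, an existing bond is found through the attach handle kept in the group's pasid array. The helper below is a condensed, illustrative restatement of the new lookup in iommu_sva_bind_device(); find_existing_bond() itself is not a function added by the patch, and the code assumes "iommu-priv.h" for iommu_attach_handle_get().

static struct iommu_sva *find_existing_bond(struct iommu_group *group,
                                            struct mm_struct *mm, ioasid_t pasid)
{
        struct iommu_attach_handle *attach_handle;
        struct iommu_sva *handle;

        attach_handle = iommu_attach_handle_get(group, pasid, IOMMU_DOMAIN_SVA);
        if (IS_ERR(attach_handle))
                return ERR_CAST(attach_handle); /* -ENOENT when no bond exists */

        if (attach_handle->domain->mm != mm)
                return ERR_PTR(-EBUSY);         /* PASID already bound to another mm */

        handle = container_of(attach_handle, struct iommu_sva, handle);
        refcount_inc(&handle->users);
        return handle;
}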
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240702063444.105814-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommu-priv.h | 3 +++ drivers/iommu/iommu-sva.c | 30 ++++++++++++++++++++---------- drivers/iommu/iommu.c | 31 +++++++++++++++++++++++++++++++ include/linux/iommu.h | 2 -- 4 files changed, 54 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index 2024a2313348..7708f60bd01a 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -27,4 +27,7 @@ void iommu_device_unregister_bus(struct iommu_device *iommu, struct bus_type *bus, struct notifier_block *nb);
+struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group, + ioasid_t pasid, + unsigned int type); #endif /* __LINUX_IOMMU_PRIV_H */ diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 0fb923254062..9b7f62517419 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -41,7 +41,6 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de } iommu_mm->pasid = pasid; INIT_LIST_HEAD(&iommu_mm->sva_domains); - INIT_LIST_HEAD(&iommu_mm->sva_handles); /* * Make sure the write to mm->iommu_mm is not reordered in front of * initialization to iommu_mm fields. If it does, readers may see a @@ -69,11 +68,16 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de */ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) { + struct iommu_group *group = dev->iommu_group; + struct iommu_attach_handle *attach_handle; struct iommu_mm_data *iommu_mm; struct iommu_domain *domain; struct iommu_sva *handle; int ret;
+ if (!group) + return ERR_PTR(-ENODEV); + mutex_lock(&iommu_sva_lock);
/* Allocate mm->pasid if necessary. */ @@ -83,12 +87,22 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm goto out_unlock; }
- list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) { - if (handle->dev == dev) { - refcount_inc(&handle->users); - mutex_unlock(&iommu_sva_lock); - return handle; + /* A bond already exists, just take a reference`. */ + attach_handle = iommu_attach_handle_get(group, iommu_mm->pasid, IOMMU_DOMAIN_SVA); + if (!IS_ERR(attach_handle)) { + handle = container_of(attach_handle, struct iommu_sva, handle); + if (attach_handle->domain->mm != mm) { + ret = -EBUSY; + goto out_unlock; } + refcount_inc(&handle->users); + mutex_unlock(&iommu_sva_lock); + return handle; + } + + if (PTR_ERR(attach_handle) != -ENOENT) { + ret = PTR_ERR(attach_handle); + goto out_unlock; }
handle = kzalloc(sizeof(*handle), GFP_KERNEL); @@ -99,7 +113,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
/* Search for an existing domain. */ list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) { - handle->handle.domain = domain; ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid, &handle->handle); if (!ret) { @@ -115,7 +128,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm goto out_free_handle; }
- handle->handle.domain = domain; ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid, &handle->handle); if (ret) @@ -125,7 +137,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
out: refcount_set(&handle->users, 1); - list_add(&handle->handle_item, &mm->iommu_mm->sva_handles); mutex_unlock(&iommu_sva_lock); handle->dev = dev; return handle; @@ -159,7 +170,6 @@ void iommu_sva_unbind_device(struct iommu_sva *handle) mutex_unlock(&iommu_sva_lock); return; } - list_del(&handle->handle_item);
iommu_detach_device_pasid(domain, dev, iommu_mm->pasid); if (--domain->users == 0) { diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 463eba849495..2e05e2725e36 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3698,3 +3698,34 @@ void iommu_free_global_pasid(ioasid_t pasid) ida_free(&iommu_global_pasid_ida, pasid); } EXPORT_SYMBOL_GPL(iommu_free_global_pasid); + +/** + * iommu_attach_handle_get - Return the attach handle + * @group: the iommu group that domain was attached to + * @pasid: the pasid within the group + * @type: matched domain type, 0 for any match + * + * Return handle or ERR_PTR(-ENOENT) on none, ERR_PTR(-EBUSY) on mismatch. + * + * Return the attach handle to the caller. The life cycle of an iommu attach + * handle is from the time when the domain is attached to the time when the + * domain is detached. Callers are required to synchronize the call of + * iommu_attach_handle_get() with domain attachment and detachment. The attach + * handle can only be used during its life cycle. + */ +struct iommu_attach_handle * +iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type) +{ + struct iommu_attach_handle *handle; + + xa_lock(&group->pasid_array); + handle = xa_load(&group->pasid_array, pasid); + if (!handle) + handle = ERR_PTR(-ENOENT); + else if (type && handle->domain->type != type) + handle = ERR_PTR(-EBUSY); + xa_unlock(&group->pasid_array); + + return handle; +} +EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 4634e70ad699..bc61593a4a9f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1123,14 +1123,12 @@ struct iommu_attach_handle { struct iommu_sva { struct iommu_attach_handle handle; struct device *dev; - struct list_head handle_item; refcount_t users; };
struct iommu_mm_data { u32 pasid; struct list_head sva_domains; - struct list_head sva_handles; };
int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 06cdcc32d65759d42c6340700796e2906045b6a5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Previously, the domain that a page fault targets is stored in an iopf_group, which represents a minimal set of page faults. With the introduction of attach handle, replace the domain with the handle so that the fault handler can obtain more information as needed when handling the faults.
iommu_report_device_fault() is currently used for SVA page faults, which handles the page fault in an internal cycle. The domain is retrieved with iommu_get_domain_for_dev_pasid() if the pasid in the fault message is valid. This doesn't work in the IOMMUFD case: when the PASID table of a device is wholly managed by user space, there is no domain attached to the PASID of the device, and all page faults are forwarded through a NESTING domain attached to the RID.
Add a static flag in iommu ops, which indicates whether the IOMMU driver supports user-managed PASID tables. In the iopf delivery path, if no attach handle is found for the iopf PASID, fall back to the RID domain when the IOMMU driver supports this capability.
iommu_get_domain_for_dev_pasid() is no longer used and can be removed.
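A driver whose PASID table is wholly owned by user space advertises this with the new bit in its iommu_ops. The structure below is purely illustrative; only the user_pasid_table field is the point here.

static struct iommu_ops example_nested_ops = {
        /* ...the usual callbacks elided... */

        /* An iopf for a PASID with no attached domain falls back to the
         * NESTED domain attached to the device RID. */
        .user_pasid_table = 1,
};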
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240702063444.105814-4-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/io-pgfault.c | 61 +++++++++++++++++++++----------------- drivers/iommu/iommu-sva.c | 3 +- drivers/iommu/iommu.c | 39 ------------------------ include/linux/iommu.h | 17 ++++------- 4 files changed, 42 insertions(+), 78 deletions(-)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index f2c87c695a17..b9828cf1693a 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -59,30 +59,6 @@ void iopf_free_group(struct iopf_group *group) } EXPORT_SYMBOL_GPL(iopf_free_group);
-static struct iommu_domain *get_domain_for_iopf(struct device *dev, - struct iommu_fault *fault) -{ - struct iommu_domain *domain; - - if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) { - domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0); - if (IS_ERR(domain)) - domain = NULL; - } else { - domain = iommu_get_domain_for_dev(dev); - } - - if (!domain || !domain->iopf_handler) { - dev_warn_ratelimited(dev, - "iopf (pasid %d) without domain attached or handler installed\n", - fault->prm.pasid); - - return NULL; - } - - return domain; -} - /* Non-last request of a group. Postpone until the last one. */ static int report_partial_fault(struct iommu_fault_param *fault_param, struct iommu_fault *fault) @@ -207,20 +183,51 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) if (group == &abort_group) goto err_abort;
- group->domain = get_domain_for_iopf(dev, fault); - if (!group->domain) + if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) { + group->attach_handle = iommu_attach_handle_get(dev->iommu_group, + fault->prm.pasid, + 0); + if (IS_ERR(group->attach_handle)) { + const struct iommu_ops *ops = dev_iommu_ops(dev); + + if (!ops->user_pasid_table) + goto err_abort; + + /* + * The iommu driver for this device supports user- + * managed PASID table. Therefore page faults for + * any PASID should go through the NESTING domain + * attached to the device RID. + */ + group->attach_handle = + iommu_attach_handle_get(dev->iommu_group, + IOMMU_NO_PASID, + IOMMU_DOMAIN_NESTED); + if (IS_ERR(group->attach_handle)) + goto err_abort; + } + } else { + group->attach_handle = + iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0); + if (IS_ERR(group->attach_handle)) + goto err_abort; + } + + if (!group->attach_handle->domain->iopf_handler) goto err_abort;
/* * On success iopf_handler must call iopf_group_response() and * iopf_free_group() */ - if (group->domain->iopf_handler(group)) + if (group->attach_handle->domain->iopf_handler(group)) goto err_abort;
return;
err_abort: + dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n", + fault->prm.pasid); iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE); if (group == &abort_group) __iopf_free_group(group); diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 9b7f62517419..69cde094440e 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -272,7 +272,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work) if (status != IOMMU_PAGE_RESP_SUCCESS) break;
- status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm); + status = iommu_sva_handle_mm(&iopf->fault, + group->attach_handle->domain->mm); }
iopf_group_response(group, status); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2e05e2725e36..71aa2885667f 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3633,45 +3633,6 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, } EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
-/* - * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev - * @dev: the queried device - * @pasid: the pasid of the device - * @type: matched domain type, 0 for any match - * - * This is a variant of iommu_get_domain_for_dev(). It returns the existing - * domain attached to pasid of a device. Callers must hold a lock around this - * function, and both iommu_attach/detach_dev_pasid() whenever a domain of - * type is being manipulated. This API does not internally resolve races with - * attach/detach. - * - * Return: attached domain on success, NULL otherwise. - */ -struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev, - ioasid_t pasid, - unsigned int type) -{ - /* Caller must be a probed driver on dev */ - struct iommu_group *group = dev->iommu_group; - struct iommu_attach_handle *handle; - struct iommu_domain *domain = NULL; - - if (!group) - return NULL; - - xa_lock(&group->pasid_array); - handle = xa_load(&group->pasid_array, pasid); - if (handle) - domain = handle->domain; - - if (type && domain && domain->type != type) - domain = NULL; - xa_unlock(&group->pasid_array); - - return domain; -} -EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid); - ioasid_t iommu_alloc_global_pasid(struct device *dev) { int ret; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index bc61593a4a9f..0e6bbf5312e1 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -144,7 +144,7 @@ struct iopf_group { /* list node for iommu_fault_param::faults */ struct list_head pending_node; struct work_struct work; - struct iommu_domain *domain; + struct iommu_attach_handle *attach_handle; /* The device's fault data parameter. */ struct iommu_fault_param *fault_param;
@@ -599,6 +599,10 @@ static inline int __iommu_copy_struct_from_user_array( * @default_domain: If not NULL this will always be set as the default domain. * This should be an IDENTITY/BLOCKED/PLATFORM domain. * Do not use in new drivers. + * @user_pasid_table: IOMMU driver supports user-managed PASID table. There is + * no user domain for each PASID and the I/O page faults are + * forwarded through the user domain attached to the device + * RID. */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap); @@ -642,6 +646,7 @@ struct iommu_ops { struct iommu_domain *identity_domain; struct iommu_domain *blocked_domain; struct iommu_domain *default_domain; + u8 user_pasid_table:1;
KABI_RESERVE(1) KABI_RESERVE(2) @@ -1182,9 +1187,6 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, struct iommu_attach_handle *handle); void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid); -struct iommu_domain * -iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, - unsigned int type); ioasid_t iommu_alloc_global_pasid(struct device *dev); void iommu_free_global_pasid(ioasid_t pasid); #else /* CONFIG_IOMMU_API */ @@ -1556,13 +1558,6 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain, { }
-static inline struct iommu_domain * -iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, - unsigned int type) -{ - return NULL; -} - static inline ioasid_t iommu_alloc_global_pasid(struct device *dev) { return IOMMU_PASID_INVALID;
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 8519e689834a3ecf9a36332a9abb1844bd34e459 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
In the SVA case, each PASID of a device has an SVA domain attached to it, and the I/O page faults are handled by the fault handler of that SVA domain. The I/O page faults for a user page table, by contrast, might be handled by the domain attached to the RID or by the domain attached to the PASID, depending on whether the PASID table is managed by user space or by the kernel. As a result, the group-level domain attach interfaces need attach handle support. The attach handle will be forwarded to the fault handler of the user domain.
Add variants of the group domain-attach interfaces that support the attach handle, and export them for use in IOMMUFD.
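IOMMUFD wraps the handle in a private structure so the fault path can map a handle back to the issuing device. The sketch below mirrors the __fault_domain_attach_dev() helper added later in this series; attach_with_handle() is an illustrative name.

struct iommufd_attach_handle {
        struct iommu_attach_handle handle;
        struct iommufd_device *idev;
};

static int attach_with_handle(struct iommu_domain *domain,
                              struct iommu_group *group,
                              struct iommufd_device *idev)
{
        struct iommufd_attach_handle *h = kzalloc(sizeof(*h), GFP_KERNEL);
        int ret;

        if (!h)
                return -ENOMEM;

        h->idev = idev;
        /* The iommu core stores &h->handle at IOMMU_NO_PASID in the pasid array */
        ret = iommu_attach_group_handle(domain, group, &h->handle);
        if (ret)
                kfree(h);
        return ret;
}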
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240702063444.105814-5-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommu-priv.h | 8 +++ drivers/iommu/iommu.c | 103 +++++++++++++++++++++++++++++++++++++ 2 files changed, 111 insertions(+)
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index 7708f60bd01a..f163aa0129f4 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -30,4 +30,12 @@ void iommu_device_unregister_bus(struct iommu_device *iommu, struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type); +int iommu_attach_group_handle(struct iommu_domain *domain, + struct iommu_group *group, + struct iommu_attach_handle *handle); +void iommu_detach_group_handle(struct iommu_domain *domain, + struct iommu_group *group); +int iommu_replace_group_handle(struct iommu_group *group, + struct iommu_domain *new_domain, + struct iommu_attach_handle *handle); #endif /* __LINUX_IOMMU_PRIV_H */ diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 71aa2885667f..90f63e4a7ea0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3690,3 +3690,106 @@ iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int return handle; } EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL); + +/** + * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group + * @domain: IOMMU domain to attach + * @group: IOMMU group that will be attached + * @handle: attach handle + * + * Returns 0 on success and error code on failure. + * + * This is a variant of iommu_attach_group(). It allows the caller to provide + * an attach handle and use it when the domain is attached. This is currently + * used by IOMMUFD to deliver the I/O page faults. + */ +int iommu_attach_group_handle(struct iommu_domain *domain, + struct iommu_group *group, + struct iommu_attach_handle *handle) +{ + int ret; + + if (handle) + handle->domain = domain; + + mutex_lock(&group->mutex); + ret = xa_insert(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); + if (ret) + goto err_unlock; + + ret = __iommu_attach_group(domain, group); + if (ret) + goto err_erase; + mutex_unlock(&group->mutex); + + return 0; +err_erase: + xa_erase(&group->pasid_array, IOMMU_NO_PASID); +err_unlock: + mutex_unlock(&group->mutex); + return ret; +} +EXPORT_SYMBOL_NS_GPL(iommu_attach_group_handle, IOMMUFD_INTERNAL); + +/** + * iommu_detach_group_handle - Detach an IOMMU domain from an IOMMU group + * @domain: IOMMU domain to attach + * @group: IOMMU group that will be attached + * + * Detach the specified IOMMU domain from the specified IOMMU group. + * It must be used in conjunction with iommu_attach_group_handle(). + */ +void iommu_detach_group_handle(struct iommu_domain *domain, + struct iommu_group *group) +{ + mutex_lock(&group->mutex); + __iommu_group_set_core_domain(group); + xa_erase(&group->pasid_array, IOMMU_NO_PASID); + mutex_unlock(&group->mutex); +} +EXPORT_SYMBOL_NS_GPL(iommu_detach_group_handle, IOMMUFD_INTERNAL); + +/** + * iommu_replace_group_handle - replace the domain that a group is attached to + * @group: IOMMU group that will be attached to the new domain + * @new_domain: new IOMMU domain to replace with + * @handle: attach handle + * + * This is a variant of iommu_group_replace_domain(). It allows the caller to + * provide an attach handle for the new domain and use it when the domain is + * attached. 
+ */ +int iommu_replace_group_handle(struct iommu_group *group, + struct iommu_domain *new_domain, + struct iommu_attach_handle *handle) +{ + void *curr; + int ret; + + if (!new_domain) + return -EINVAL; + + mutex_lock(&group->mutex); + if (handle) { + ret = xa_reserve(&group->pasid_array, IOMMU_NO_PASID, GFP_KERNEL); + if (ret) + goto err_unlock; + } + + ret = __iommu_group_set_domain(group, new_domain); + if (ret) + goto err_release; + + curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL); + WARN_ON(xa_is_err(curr)); + + mutex_unlock(&group->mutex); + + return 0; +err_release: + xa_release(&group->pasid_array, IOMMU_NO_PASID); +err_unlock: + mutex_unlock(&group->mutex); + return ret; +} +EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, IOMMUFD_INTERNAL);
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit c714f15860fcca02fe0fd7c3f1f1fc35b1768ac1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
An iommu_hwpt_pgfault represents a fault message that user space can retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group, with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last iommu_hwpt_pgfault.
An iommu_hwpt_page_response is a response message that the userspace should send to the kernel after finishing handling a group of fault messages. The @dev_id, @pasid, and @grpid fields in the message identify an outstanding iopf group for a device. The @cookie field, which matches the cookie field of the last fault in the group, will be used by the kernel to look up the pending message.
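Taken together, a user space fault handler reads whole iommu_hwpt_pgfault records from the fault file descriptor (created by the fault object added in the next patch) and acknowledges each group by writing back an iommu_hwpt_page_response that echoes the group's cookie. The sketch below is illustrative; handle_one_fault() stands in for the actual fault servicing logic.

#include <unistd.h>
#include <linux/iommufd.h>

/* Placeholder: resolve the faulting address, e.g. by pinning/mapping memory */
static int handle_one_fault(const struct iommu_hwpt_pgfault *f)
{
        (void)f;
        return 0;
}

static int service_faults(int fault_fd)
{
        struct iommu_hwpt_pgfault faults[32];
        struct iommu_hwpt_page_response resp = {};
        __u32 code = IOMMUFD_PAGE_RESP_SUCCESS;
        ssize_t bytes;
        size_t i, n;

        /* The kernel only hands out whole fault groups, so read a batch */
        bytes = read(fault_fd, faults, sizeof(faults));
        if (bytes <= 0)
                return -1;

        n = (size_t)bytes / sizeof(faults[0]);
        for (i = 0; i < n; i++) {
                if (handle_one_fault(&faults[i]))
                        code = IOMMUFD_PAGE_RESP_INVALID;

                if (!(faults[i].flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE))
                        continue;

                /* Last fault of the group: echo its cookie back to the kernel */
                resp.cookie = faults[i].cookie;
                resp.code = code;
                if (write(fault_fd, &resp, sizeof(resp)) != sizeof(resp))
                        return -1;
                code = IOMMUFD_PAGE_RESP_SUCCESS;
        }
        return 0;
}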
Link: https://lore.kernel.org/r/20240702063444.105814-6-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/uapi/linux/iommufd.h | 83 ++++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 1dfeaa2e649e..4d89ed97b533 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -692,4 +692,87 @@ struct iommu_hwpt_invalidate { __u32 __reserved; }; #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) + +/** + * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault + * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is + * valid. + * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group. + */ +enum iommu_hwpt_pgfault_flags { + IOMMU_PGFAULT_FLAGS_PASID_VALID = (1 << 0), + IOMMU_PGFAULT_FLAGS_LAST_PAGE = (1 << 1), +}; + +/** + * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault + * @IOMMU_PGFAULT_PERM_READ: request for read permission + * @IOMMU_PGFAULT_PERM_WRITE: request for write permission + * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the + * Execute Requested bit set in PASID TLP Prefix. + * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the + * Privileged Mode Requested bit set in PASID TLP + * Prefix. + */ +enum iommu_hwpt_pgfault_perm { + IOMMU_PGFAULT_PERM_READ = (1 << 0), + IOMMU_PGFAULT_PERM_WRITE = (1 << 1), + IOMMU_PGFAULT_PERM_EXEC = (1 << 2), + IOMMU_PGFAULT_PERM_PRIV = (1 << 3), +}; + +/** + * struct iommu_hwpt_pgfault - iommu page fault data + * @flags: Combination of enum iommu_hwpt_pgfault_flags + * @dev_id: id of the originated device + * @pasid: Process Address Space ID + * @grpid: Page Request Group Index + * @perm: Combination of enum iommu_hwpt_pgfault_perm + * @addr: Fault address + * @length: a hint of how much data the requestor is expecting to fetch. For + * example, if the PRI initiator knows it is going to do a 10MB + * transfer, it could fill in 10MB and the OS could pre-fault in + * 10MB of IOVA. It's default to 0 if there's no such hint. + * @cookie: kernel-managed cookie identifying a group of fault messages. The + * cookie number encoded in the last page fault of the group should + * be echoed back in the response message. + */ +struct iommu_hwpt_pgfault { + __u32 flags; + __u32 dev_id; + __u32 pasid; + __u32 grpid; + __u32 perm; + __u64 addr; + __u32 length; + __u32 cookie; +}; + +/** + * enum iommufd_page_response_code - Return status of fault handlers + * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables + * populated, retry the access. This is the + * "Success" defined in PCI 10.4.2.1. + * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the + * access. This is the "Invalid Request" in PCI + * 10.4.2.1. + * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from + * this device if possible. This is the "Response + * Failure" in PCI 10.4.2.1. + */ +enum iommufd_page_response_code { + IOMMUFD_PAGE_RESP_SUCCESS = 0, + IOMMUFD_PAGE_RESP_INVALID, + IOMMUFD_PAGE_RESP_FAILURE, +}; + +/** + * struct iommu_hwpt_page_response - IOMMU page fault response + * @cookie: The kernel-managed cookie reported in the fault message. + * @code: One of response code in enum iommufd_page_response_code. + */ +struct iommu_hwpt_page_response { + __u32 cookie; + __u32 code; +}; #endif
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 07838f7fd529c8a6de44b601d4b7057e6c8d36ed category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
An iommufd fault object provides an interface for delivering I/O page faults to user space. These objects are created and destroyed by user space, and they can be associated with or dissociated from hardware page table objects during page table allocation or destruction.
User space interacts with the fault object through a file interface. This interface offers a straightforward and efficient way for user space to handle page faults. It allows user space to read fault messages sequentially and respond to them by writing to the same file. The file interface supports reading messages in poll mode, so it's recommended that user space applications use io_uring to enhance read and write efficiency.
A fault object can be associated with any iopf-capable iommufd_hw_pgtable during the pgtable's allocation. All I/O page faults triggered by devices when accessing the I/O addresses of an iommufd_hw_pgtable are routed through the fault object to user space. Similarly, user space's responses to these page faults are routed back to the iommu device driver through the same fault object.
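From user space, creating the fault object is a single ioctl on the iommufd file descriptor. A minimal sketch using the uAPI added below, assuming iommufd_fd is an open /dev/iommu descriptor:

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_fault_queue(int iommufd_fd, __u32 *out_fault_id)
{
        struct iommu_fault_alloc cmd = {
                .size = sizeof(cmd),
                .flags = 0,
        };

        if (ioctl(iommufd_fd, IOMMU_FAULT_QUEUE_ALLOC, &cmd))
                return -1;

        *out_fault_id = cmd.out_fault_id; /* referenced when allocating an iopf-capable HWPT */
        return cmd.out_fault_fd;          /* poll()/read()/write() this fd */
}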
Link: https://lore.kernel.org/r/20240702063444.105814-7-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/io-pgfault.c | 2 + drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/fault.c | 226 ++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 30 ++++ drivers/iommu/iommufd/main.c | 6 + include/linux/iommu.h | 4 + include/uapi/linux/iommufd.h | 18 ++ 7 files changed, 287 insertions(+) create mode 100644 drivers/iommu/iommufd/fault.c
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index b9828cf1693a..81e9cc6e3164 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -110,6 +110,8 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param, list_add(&group->pending_node, &iopf_param->faults); mutex_unlock(&iopf_param->lock);
+ group->fault_count = list_count_nodes(&group->faults); + return group; }
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 34b446146961..cf4605962bea 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only iommufd-y := \ device.o \ + fault.o \ hw_pagetable.o \ io_pagetable.o \ ioas.o \ diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c new file mode 100644 index 000000000000..68ff94671d48 --- /dev/null +++ b/drivers/iommu/iommufd/fault.c @@ -0,0 +1,226 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2024 Intel Corporation + */ +#define pr_fmt(fmt) "iommufd: " fmt + +#include <linux/file.h> +#include <linux/fs.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/iommufd.h> +#include <linux/poll.h> +#include <linux/anon_inodes.h> +#include <uapi/linux/iommufd.h> + +#include "../iommu-priv.h" +#include "iommufd_private.h" + +void iommufd_fault_destroy(struct iommufd_object *obj) +{ + struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj); + struct iopf_group *group, *next; + + /* + * The iommufd object's reference count is zero at this point. + * We can be confident that no other threads are currently + * accessing this pointer. Therefore, acquiring the mutex here + * is unnecessary. + */ + list_for_each_entry_safe(group, next, &fault->deliver, node) { + list_del(&group->node); + iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); + iopf_free_group(group); + } +} + +static void iommufd_compose_fault_message(struct iommu_fault *fault, + struct iommu_hwpt_pgfault *hwpt_fault, + struct iommufd_device *idev, + u32 cookie) +{ + hwpt_fault->flags = fault->prm.flags; + hwpt_fault->dev_id = idev->obj.id; + hwpt_fault->pasid = fault->prm.pasid; + hwpt_fault->grpid = fault->prm.grpid; + hwpt_fault->perm = fault->prm.perm; + hwpt_fault->addr = fault->prm.addr; + hwpt_fault->length = 0; + hwpt_fault->cookie = cookie; +} + +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf, + size_t count, loff_t *ppos) +{ + size_t fault_size = sizeof(struct iommu_hwpt_pgfault); + struct iommufd_fault *fault = filep->private_data; + struct iommu_hwpt_pgfault data; + struct iommufd_device *idev; + struct iopf_group *group; + struct iopf_fault *iopf; + size_t done = 0; + int rc = 0; + + if (*ppos || count % fault_size) + return -ESPIPE; + + mutex_lock(&fault->mutex); + while (!list_empty(&fault->deliver) && count > done) { + group = list_first_entry(&fault->deliver, + struct iopf_group, node); + + if (group->fault_count * fault_size > count - done) + break; + + rc = xa_alloc(&fault->response, &group->cookie, group, + xa_limit_32b, GFP_KERNEL); + if (rc) + break; + + idev = to_iommufd_handle(group->attach_handle)->idev; + list_for_each_entry(iopf, &group->faults, list) { + iommufd_compose_fault_message(&iopf->fault, + &data, idev, + group->cookie); + if (copy_to_user(buf + done, &data, fault_size)) { + xa_erase(&fault->response, group->cookie); + rc = -EFAULT; + break; + } + done += fault_size; + } + + list_del(&group->node); + } + mutex_unlock(&fault->mutex); + + return done == 0 ? 
rc : done; +} + +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf, + size_t count, loff_t *ppos) +{ + size_t response_size = sizeof(struct iommu_hwpt_page_response); + struct iommufd_fault *fault = filep->private_data; + struct iommu_hwpt_page_response response; + struct iopf_group *group; + size_t done = 0; + int rc = 0; + + if (*ppos || count % response_size) + return -ESPIPE; + + mutex_lock(&fault->mutex); + while (count > done) { + rc = copy_from_user(&response, buf + done, response_size); + if (rc) + break; + + group = xa_erase(&fault->response, response.cookie); + if (!group) { + rc = -EINVAL; + break; + } + + iopf_group_response(group, response.code); + iopf_free_group(group); + done += response_size; + } + mutex_unlock(&fault->mutex); + + return done == 0 ? rc : done; +} + +static __poll_t iommufd_fault_fops_poll(struct file *filep, + struct poll_table_struct *wait) +{ + struct iommufd_fault *fault = filep->private_data; + __poll_t pollflags = EPOLLOUT; + + poll_wait(filep, &fault->wait_queue, wait); + mutex_lock(&fault->mutex); + if (!list_empty(&fault->deliver)) + pollflags |= EPOLLIN | EPOLLRDNORM; + mutex_unlock(&fault->mutex); + + return pollflags; +} + +static int iommufd_fault_fops_release(struct inode *inode, struct file *filep) +{ + struct iommufd_fault *fault = filep->private_data; + + refcount_dec(&fault->obj.users); + iommufd_ctx_put(fault->ictx); + return 0; +} + +static const struct file_operations iommufd_fault_fops = { + .owner = THIS_MODULE, + .open = nonseekable_open, + .read = iommufd_fault_fops_read, + .write = iommufd_fault_fops_write, + .poll = iommufd_fault_fops_poll, + .release = iommufd_fault_fops_release, + .llseek = no_llseek, +}; + +int iommufd_fault_alloc(struct iommufd_ucmd *ucmd) +{ + struct iommu_fault_alloc *cmd = ucmd->cmd; + struct iommufd_fault *fault; + struct file *filep; + int fdno; + int rc; + + if (cmd->flags) + return -EOPNOTSUPP; + + fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT); + if (IS_ERR(fault)) + return PTR_ERR(fault); + + fault->ictx = ucmd->ictx; + INIT_LIST_HEAD(&fault->deliver); + xa_init_flags(&fault->response, XA_FLAGS_ALLOC1); + mutex_init(&fault->mutex); + init_waitqueue_head(&fault->wait_queue); + + filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops, + fault, O_RDWR); + if (IS_ERR(filep)) { + rc = PTR_ERR(filep); + goto out_abort; + } + + refcount_inc(&fault->obj.users); + iommufd_ctx_get(fault->ictx); + fault->filep = filep; + + fdno = get_unused_fd_flags(O_CLOEXEC); + if (fdno < 0) { + rc = fdno; + goto out_fput; + } + + cmd->out_fault_id = fault->obj.id; + cmd->out_fault_fd = fdno; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_put_fdno; + iommufd_object_finalize(ucmd->ictx, &fault->obj); + + fd_install(fdno, fault->filep); + + return 0; +out_put_fdno: + put_unused_fd(fdno); +out_fput: + fput(filep); + refcount_dec(&fault->obj.users); + iommufd_ctx_put(fault->ictx); +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj); + + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 991f864d1f9b..c8a4519f1405 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -128,6 +128,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_HWPT_NESTED, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, + IOMMUFD_OBJ_FAULT, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -426,6 +427,35 @@ void iopt_remove_access(struct io_pagetable 
*iopt, u32 iopt_access_list_id); void iommufd_access_destroy_object(struct iommufd_object *obj);
+/* + * An iommufd_fault object represents an interface to deliver I/O page faults + * to the user space. These objects are created/destroyed by the user space and + * associated with hardware page table objects during page-table allocation. + */ +struct iommufd_fault { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct file *filep; + + /* The lists of outstanding faults protected by below mutex. */ + struct mutex mutex; + struct list_head deliver; + struct xarray response; + + struct wait_queue_head wait_queue; +}; + +struct iommufd_attach_handle { + struct iommu_attach_handle handle; + struct iommufd_device *idev; +}; + +/* Convert an iommu attach handle to iommufd handle. */ +#define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle) + +int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); +void iommufd_fault_destroy(struct iommufd_object *obj); + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 39b32932c61e..83bbd7c5d160 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -319,6 +319,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
union ucmd_buffer { struct iommu_destroy destroy; + struct iommu_fault_alloc fault; struct iommu_hw_info info; struct iommu_hwpt_alloc hwpt; struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap; @@ -355,6 +356,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), + IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc, + out_fault_fd), IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, @@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_hwpt_nested_destroy, .abort = iommufd_hwpt_nested_abort, }, + [IOMMUFD_OBJ_FAULT] = { + .destroy = iommufd_fault_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0e6bbf5312e1..41b3014230bc 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -141,12 +141,16 @@ struct iopf_fault { struct iopf_group { struct iopf_fault last_fault; struct list_head faults; + size_t fault_count; /* list node for iommu_fault_param::faults */ struct list_head pending_node; struct work_struct work; struct iommu_attach_handle *attach_handle; /* The device's fault data parameter. */ struct iommu_fault_param *fault_param; + /* Used by handler provider to hook the group on its own lists. */ + struct list_head node; + u32 cookie;
KABI_RESERVE(1) KABI_RESERVE(2) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 4d89ed97b533..70b8a38fcd46 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -50,6 +50,7 @@ enum { IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING, IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP, IOMMUFD_CMD_HWPT_INVALIDATE, + IOMMUFD_CMD_FAULT_QUEUE_ALLOC, };
/** @@ -775,4 +776,21 @@ struct iommu_hwpt_page_response { __u32 cookie; __u32 code; }; + +/** + * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_QUEUE_ALLOC) + * @size: sizeof(struct iommu_fault_alloc) + * @flags: Must be 0 + * @out_fault_id: The ID of the new FAULT + * @out_fault_fd: The fd of the new FAULT + * + * Explicitly allocate a fault handling object. + */ +struct iommu_fault_alloc { + __u32 size; + __u32 flags; + __u32 out_fault_id; + __u32 out_fault_fd; +}; +#define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC) #endif
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit b7d8833677baad8c80ed1aac8c396d687e64a376 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add iopf-capable hw page table attach/detach/replace helpers. The pointer to the iommufd_device is stored in the domain attachment handle, so that it can be echoed back in the iopf_group.
The iopf-capable hw page tables can only be attached to devices that support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an iopf-capable hw_pagetable to the device, the IOPF feature is enabled on the device. Similarly, after the last iopf-capable hwpt is detached from the device, the IOPF feature is disabled on the device.
The current implementation allows a replacement between iopf-capable and non-iopf-capable hw page tables. This matches the nested translation use case, where a parent domain is attached by default and can then be replaced with a nested user domain with iopf support.
Link: https://lore.kernel.org/r/20240702063444.105814-8-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/device.c | 7 +- drivers/iommu/iommufd/fault.c | 190 ++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 41 +++++ 3 files changed, 235 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 873630c111c1..9a7ec5997c61 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, refcount_inc(&idev->obj.users); /* igroup refcount moves into iommufd_device */ idev->igroup = igroup; + mutex_init(&idev->iopf_lock);
/* * If the caller fails after this success it must call @@ -376,7 +377,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, * attachment. */ if (list_empty(&idev->igroup->device_list)) { - rc = iommu_attach_group(hwpt->domain, idev->igroup->group); + rc = iommufd_hwpt_attach_device(hwpt, idev); if (rc) goto err_unresv; idev->igroup->hwpt = hwpt; @@ -402,7 +403,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) mutex_lock(&idev->igroup->lock); list_del(&idev->group_item); if (list_empty(&idev->igroup->device_list)) { - iommu_detach_group(hwpt->domain, idev->igroup->group); + iommufd_hwpt_detach_device(hwpt, idev); idev->igroup->hwpt = NULL; } if (hwpt_is_paging(hwpt)) @@ -497,7 +498,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, goto err_unlock; }
- rc = iommu_group_replace_domain(igroup->group, hwpt->domain); + rc = iommufd_hwpt_replace_device(idev, hwpt, old_hwpt); if (rc) goto err_unresv;
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 68ff94671d48..4934ae572638 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -8,6 +8,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/iommufd.h> +#include <linux/pci.h> #include <linux/poll.h> #include <linux/anon_inodes.h> #include <uapi/linux/iommufd.h> @@ -15,6 +16,195 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev) +{ + struct device *dev = idev->dev; + int ret; + + /* + * Once we turn on PCI/PRI support for VF, the response failure code + * should not be forwarded to the hardware due to PRI being a shared + * resource between PF and VFs. There is no coordination for this + * shared capability. This waits for a vPRI reset to recover. + */ + if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn) + return -EINVAL; + + mutex_lock(&idev->iopf_lock); + /* Device iopf has already been on. */ + if (++idev->iopf_enabled > 1) { + mutex_unlock(&idev->iopf_lock); + return 0; + } + + ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF); + if (ret) + --idev->iopf_enabled; + mutex_unlock(&idev->iopf_lock); + + return ret; +} + +static void iommufd_fault_iopf_disable(struct iommufd_device *idev) +{ + mutex_lock(&idev->iopf_lock); + if (!WARN_ON(idev->iopf_enabled == 0)) { + if (--idev->iopf_enabled == 0) + iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF); + } + mutex_unlock(&idev->iopf_lock); +} + +static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + struct iommufd_attach_handle *handle; + int ret; + + handle = kzalloc(sizeof(*handle), GFP_KERNEL); + if (!handle) + return -ENOMEM; + + handle->idev = idev; + ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group, + &handle->handle); + if (ret) + kfree(handle); + + return ret; +} + +int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + int ret; + + if (!hwpt->fault) + return -EINVAL; + + ret = iommufd_fault_iopf_enable(idev); + if (ret) + return ret; + + ret = __fault_domain_attach_dev(hwpt, idev); + if (ret) + iommufd_fault_iopf_disable(idev); + + return ret; +} + +static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt, + struct iommufd_attach_handle *handle) +{ + struct iommufd_fault *fault = hwpt->fault; + struct iopf_group *group, *next; + unsigned long index; + + if (!fault) + return; + + mutex_lock(&fault->mutex); + list_for_each_entry_safe(group, next, &fault->deliver, node) { + if (group->attach_handle != &handle->handle) + continue; + list_del(&group->node); + iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); + iopf_free_group(group); + } + + xa_for_each(&fault->response, index, group) { + if (group->attach_handle != &handle->handle) + continue; + xa_erase(&fault->response, index); + iopf_group_response(group, IOMMU_PAGE_RESP_INVALID); + iopf_free_group(group); + } + mutex_unlock(&fault->mutex); +} + +static struct iommufd_attach_handle * +iommufd_device_get_attach_handle(struct iommufd_device *idev) +{ + struct iommu_attach_handle *handle; + + handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0); + if (!handle) + return NULL; + + return to_iommufd_handle(handle); +} + +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + struct iommufd_attach_handle *handle; + + handle = iommufd_device_get_attach_handle(idev); + iommu_detach_group_handle(hwpt->domain, idev->igroup->group); + iommufd_auto_response_faults(hwpt, handle); + iommufd_fault_iopf_disable(idev); + kfree(handle); +} + +static int __fault_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old) +{ + struct iommufd_attach_handle *handle, *curr = NULL; + int ret; + + if (old->fault) + curr = iommufd_device_get_attach_handle(idev); + + if (hwpt->fault) { 
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL); + if (!handle) + return -ENOMEM; + + handle->handle.domain = hwpt->domain; + handle->idev = idev; + ret = iommu_replace_group_handle(idev->igroup->group, + hwpt->domain, &handle->handle); + } else { + ret = iommu_replace_group_handle(idev->igroup->group, + hwpt->domain, NULL); + } + + if (!ret && curr) { + iommufd_auto_response_faults(old, curr); + kfree(curr); + } + + return ret; +} + +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old) +{ + bool iopf_off = !hwpt->fault && old->fault; + bool iopf_on = hwpt->fault && !old->fault; + int ret; + + if (iopf_on) { + ret = iommufd_fault_iopf_enable(idev); + if (ret) + return ret; + } + + ret = __fault_domain_replace_dev(idev, hwpt, old); + if (ret) { + if (iopf_on) + iommufd_fault_iopf_disable(idev); + return ret; + } + + if (iopf_off) + iommufd_fault_iopf_disable(idev); + + return 0; +} + void iommufd_fault_destroy(struct iommufd_object *obj) { struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index c8a4519f1405..aa4c26c87cb9 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -11,6 +11,7 @@ #include <linux/iommu.h> #include <linux/iova_bitmap.h> #include <uapi/linux/iommufd.h> +#include "../iommu-priv.h"
struct iommu_domain; struct iommu_group; @@ -293,6 +294,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt, struct iommufd_hw_pagetable { struct iommufd_object obj; struct iommu_domain *domain; + struct iommufd_fault *fault; };
struct iommufd_hwpt_paging { @@ -396,6 +398,9 @@ struct iommufd_device { /* always the physical device */ struct device *dev; bool enforce_cache_coherency; + /* protect iopf_enabled counter */ + struct mutex iopf_lock; + unsigned int iopf_enabled; };
static inline struct iommufd_device * @@ -456,6 +461,42 @@ struct iommufd_attach_handle { int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev); +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev); +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old); + +static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + if (hwpt->fault) + return iommufd_fault_domain_attach_dev(hwpt, idev); + + return iommu_attach_group(hwpt->domain, idev->igroup->group); +} + +static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + if (hwpt->fault) { + iommufd_fault_domain_detach_dev(hwpt, idev); + return; + } + + iommu_detach_group(hwpt->domain, idev->igroup->group); +} + +static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + struct iommufd_hw_pagetable *old) +{ + if (old->fault || hwpt->fault) + return iommufd_fault_domain_replace_dev(idev, hwpt, old); + + return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); +} + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj);
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 34765cbc679c59ea5d952d738d2d16bf4aadc497 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When allocating a user iommufd_hw_pagetable, the user space is allowed to associate a fault object with the hw_pagetable by specifying the fault object ID in the page table allocation data and setting the IOMMU_HWPT_FAULT_ID_VALID flag bit.
On a successful return of hwpt allocation, the user can retrieve and respond to page faults by reading and writing the file interface of the fault object.
Once a fault object has been associated with a hwpt, the hwpt is iopf-capable, indicated by a non-NULL hwpt->fault. Attaching, detaching, or replacing an iopf-capable hwpt on an RID or PASID differs from the non-iopf-capable case.
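For illustration, a minimal userspace sketch of that flow follows: allocate a fault object, allocate an IOPF-capable hwpt that references it, then service one fault through the fault object's file descriptor. The helper name, the dev_id/pt_id values and the surrounding setup are assumptions, error handling is omitted, and whether a given pt_id/data_type combination accepts an IOPF-capable allocation is up to the IOMMU driver.

#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/iommufd.h>

static int handle_one_fault(int iommufd, __u32 dev_id, __u32 pt_id)
{
	struct iommu_fault_alloc fault_cmd = { .size = sizeof(fault_cmd) };
	struct iommu_hwpt_alloc hwpt_cmd = {
		.size = sizeof(hwpt_cmd),
		.flags = IOMMU_HWPT_FAULT_ID_VALID,
		.dev_id = dev_id,
		.pt_id = pt_id,
		.data_type = IOMMU_HWPT_DATA_NONE,
	};
	struct iommu_hwpt_pgfault pgfault;
	struct iommu_hwpt_page_response resp = {
		.code = IOMMUFD_PAGE_RESP_SUCCESS,
	};

	/* The fault object returns an object ID plus a readable/writable fd */
	ioctl(iommufd, IOMMU_FAULT_QUEUE_ALLOC, &fault_cmd);

	/* Associate the fault object with the hwpt at allocation time */
	hwpt_cmd.fault_id = fault_cmd.out_fault_id;
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &hwpt_cmd);

	/* Retrieve one page request and complete it through the same fd */
	read(fault_cmd.out_fault_fd, &pgfault, sizeof(pgfault));
	resp.cookie = pgfault.cookie;
	write(fault_cmd.out_fault_fd, &resp, sizeof(resp));

	return hwpt_cmd.out_hwpt_id;
}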
Link: https://lore.kernel.org/r/20240702063444.105814-9-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/fault.c | 17 +++++++++++ drivers/iommu/iommufd/hw_pagetable.c | 38 +++++++++++++++++++------ drivers/iommu/iommufd/iommufd_private.h | 9 ++++++ include/uapi/linux/iommufd.h | 8 ++++++ 4 files changed, 64 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 4934ae572638..54d6cd20a673 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -414,3 +414,20 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
return rc; } + +int iommufd_fault_iopf_handler(struct iopf_group *group) +{ + struct iommufd_hw_pagetable *hwpt; + struct iommufd_fault *fault; + + hwpt = group->attach_handle->domain->fault_data; + fault = hwpt->fault; + + mutex_lock(&fault->mutex); + list_add_tail(&group->node, &fault->deliver); + mutex_unlock(&fault->mutex); + + wake_up_interruptible(&fault->wait_queue); + + return 0; +} diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 5fc233ae6ba4..5118335f1b94 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,6 +8,15 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt) +{ + if (hwpt->domain) + iommu_domain_free(hwpt->domain); + + if (hwpt->fault) + refcount_dec(&hwpt->fault->obj.users); +} + void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) { struct iommufd_hwpt_paging *hwpt_paging = @@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) hwpt_paging->common.domain); }
- if (hwpt_paging->common.domain) - iommu_domain_free(hwpt_paging->common.domain); - + __iommufd_hwpt_destroy(&hwpt_paging->common); refcount_dec(&hwpt_paging->ioas->obj.users); }
@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) struct iommufd_hwpt_nested *hwpt_nested = container_of(obj, struct iommufd_hwpt_nested, common.obj);
- if (hwpt_nested->common.domain) - iommu_domain_free(hwpt_nested->common.domain); - + __iommufd_hwpt_destroy(&hwpt_nested->common); refcount_dec(&hwpt_nested->parent->common.obj.users); }
@@ -216,7 +221,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt; int rc;
- if (flags || !user_data->len || !ops->domain_alloc_user) + if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) || + !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (parent->auto_domain || !parent->nest_parent) return ERR_PTR(-EINVAL); @@ -230,7 +236,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, refcount_inc(&parent->common.obj.users); hwpt_nested->parent = parent;
- hwpt->domain = ops->domain_alloc_user(idev->dev, flags, + hwpt->domain = ops->domain_alloc_user(idev->dev, + flags & ~IOMMU_HWPT_FAULT_ID_VALID, parent->common.domain, user_data); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); @@ -312,6 +319,21 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_put_pt; }
+ if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) { + struct iommufd_fault *fault; + + fault = iommufd_get_fault(ucmd, cmd->fault_id); + if (IS_ERR(fault)) { + rc = PTR_ERR(fault); + goto out_hwpt; + } + hwpt->fault = fault; + hwpt->domain->iopf_handler = iommufd_fault_iopf_handler; + hwpt->domain->fault_data = hwpt; + refcount_inc(&fault->obj.users); + iommufd_put_object(ucmd->ictx, &fault->obj); + } + cmd->out_hwpt_id = hwpt->obj.id; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); if (rc) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index aa4c26c87cb9..92efe30a8f0d 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -458,8 +458,17 @@ struct iommufd_attach_handle { /* Convert an iommu attach handle to iommufd handle. */ #define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)
+static inline struct iommufd_fault * +iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_FAULT), + struct iommufd_fault, obj); +} + int iommufd_fault_alloc(struct iommufd_ucmd *ucmd); void iommufd_fault_destroy(struct iommufd_object *obj); +int iommufd_fault_iopf_handler(struct iopf_group *group);
int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 70b8a38fcd46..ede2b464a761 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -357,10 +357,13 @@ struct iommu_vfio_ioas { * the parent HWPT in a nesting configuration. * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is * enforced on device attachment + * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is + * valid. */ enum iommufd_hwpt_alloc_flags { IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0, IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1, + IOMMU_HWPT_FAULT_ID_VALID = 1 << 2, };
/** @@ -412,6 +415,9 @@ enum iommu_hwpt_data_type { * @data_type: One of enum iommu_hwpt_data_type * @data_len: Length of the type specific data * @data_uptr: User pointer to the type specific data + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of + * IOMMU_HWPT_FAULT_ID_VALID is set. + * @__reserved2: Padding to 64-bit alignment. Must be 0. * * Explicitly allocate a hardware page table object. This is the same object * type that is returned by iommufd_device_attach() and represents the @@ -442,6 +448,8 @@ struct iommu_hwpt_alloc { __u32 data_type; __u32 data_len; __aligned_u64 data_uptr; + __u32 fault_id; + __u32 __reserved2; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit ddee19971081b42615d62f4fdada21274708ed4d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Extend the selftest mock device to support generating and responding to an IOPF. Also add an ioctl interface to userspace applications to trigger the IOPF on the mock device. This would allow userspace applications to test the IOMMUFD's handling of IOPFs without having to rely on any real hardware.
Link: https://lore.kernel.org/r/20240702063444.105814-10-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_test.h | 8 ++++ drivers/iommu/iommufd/selftest.c | 64 ++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index e854d3f67205..acbbba1c6671 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -22,6 +22,7 @@ enum { IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS, IOMMU_TEST_OP_DIRTY, IOMMU_TEST_OP_MD_CHECK_IOTLB, + IOMMU_TEST_OP_TRIGGER_IOPF, };
enum { @@ -127,6 +128,13 @@ struct iommu_test_cmd { __u32 id; __u32 iotlb; } check_iotlb; + struct { + __u32 dev_id; + __u32 pasid; + __u32 grpid; + __u32 perm; + __u64 addr; + } trigger_iopf; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 7a2199470f31..1133f1b2362f 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -504,6 +504,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) return false; }
+static struct iopf_queue *mock_iommu_iopf_queue; + static struct iommu_device mock_iommu_device = { };
@@ -514,6 +516,29 @@ static struct iommu_device *mock_probe_device(struct device *dev) return &mock_iommu_device; }
+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, + struct iommu_page_response *msg) +{ +} + +static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat) +{ + if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue) + return -ENODEV; + + return iopf_queue_add_device(mock_iommu_iopf_queue, dev); +} + +static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat) +{ + if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue) + return -ENODEV; + + iopf_queue_remove_device(mock_iommu_iopf_queue, dev); + + return 0; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -529,6 +554,10 @@ static const struct iommu_ops mock_ops = { .capable = mock_domain_capable, .device_group = generic_device_group, .probe_device = mock_probe_device, + .page_response = mock_domain_page_response, + .dev_enable_feat = mock_dev_enable_feat, + .dev_disable_feat = mock_dev_disable_feat, + .user_pasid_table = true, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free, @@ -1375,6 +1404,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id, return rc; }
+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd, + struct iommu_test_cmd *cmd) +{ + struct iopf_fault event = { }; + struct iommufd_device *idev; + + idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; + if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID) + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; + event.fault.type = IOMMU_FAULT_PAGE_REQ; + event.fault.prm.addr = cmd->trigger_iopf.addr; + event.fault.prm.pasid = cmd->trigger_iopf.pasid; + event.fault.prm.grpid = cmd->trigger_iopf.grpid; + event.fault.prm.perm = cmd->trigger_iopf.perm; + + iommu_report_device_fault(idev->dev, &event); + iommufd_put_object(ucmd->ictx, &idev->obj); + + return 0; +} + void iommufd_selftest_destroy(struct iommufd_object *obj) { struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); @@ -1450,6 +1504,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd) cmd->dirty.page_size, u64_to_user_ptr(cmd->dirty.uptr), cmd->dirty.flags); + case IOMMU_TEST_OP_TRIGGER_IOPF: + return iommufd_test_trigger_iopf(ucmd, cmd); default: return -EOPNOTSUPP; } @@ -1491,6 +1547,9 @@ int __init iommufd_test_init(void) &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs; + + mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq"); + return 0;
err_sysfs: @@ -1506,6 +1565,11 @@ int __init iommufd_test_init(void)
void iommufd_test_exit(void) { + if (mock_iommu_iopf_queue) { + iopf_queue_free(mock_iommu_iopf_queue); + mock_iommu_iopf_queue = NULL; + } + iommu_device_sysfs_remove(&mock_iommu_device); iommu_device_unregister_bus(&mock_iommu_device, &iommufd_mock_bus_type.bus,
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit d1211768b62d02e27b46a3ff78f739c4776a0f03 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Extend the selftest tool to add coverage of testing IOPF handling. This would include the following tests:
- Allocating and destroying an iommufd fault object. - Allocating and destroying an IOPF-capable HWPT. - Attaching/detaching/replacing an IOPF-capable HWPT on a device. - Triggering an IOPF on the mock device. - Retrieving and responding to the IOPF through the file interface.
Link: https://lore.kernel.org/r/20240702063444.105814-11-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- tools/testing/selftests/iommu/iommufd.c | 22 +++++ .../selftests/iommu/iommufd_fail_nth.c | 2 +- tools/testing/selftests/iommu/iommufd_utils.h | 86 +++++++++++++++++-- 3 files changed, 104 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index c8d0f13c91dd..9ebe55b875f4 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) uint32_t parent_hwpt_id = 0; uint32_t parent_hwpt_id_not_work = 0; uint32_t test_hwpt_id = 0; + uint32_t iopf_hwpt_id; + uint32_t fault_id; + uint32_t fault_fd;
if (self->device_id) { /* Negative tests */ @@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) sizeof(data));
/* Allocate two nested hwpts sharing one common parent hwpt */ + test_ioctl_fault_alloc(&fault_id, &fault_fd); test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0, &nested_hwpt_id[0], IOMMU_HWPT_DATA_SELFTEST, &data, @@ -334,6 +338,14 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) &nested_hwpt_id[1], IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_err_hwpt_alloc_iopf(ENOENT, self->device_id, parent_hwpt_id, + UINT32_MAX, IOMMU_HWPT_FAULT_ID_VALID, + &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id, + IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], IOMMU_TEST_IOTLB_DEFAULT); test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], @@ -504,14 +516,24 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) _test_ioctl_destroy(self->fd, nested_hwpt_id[1])); test_ioctl_destroy(nested_hwpt_id[0]);
+ /* Switch from nested_hwpt_id[1] to iopf_hwpt_id */ + test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, iopf_hwpt_id)); + /* Trigger an IOPF on the device */ + test_cmd_trigger_iopf(self->device_id, fault_fd); + /* Detach from nested_hwpt_id[1] and destroy it */ test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id); test_ioctl_destroy(nested_hwpt_id[1]); + test_ioctl_destroy(iopf_hwpt_id);
/* Detach from the parent hw_pagetable and destroy it */ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); test_ioctl_destroy(parent_hwpt_id); test_ioctl_destroy(parent_hwpt_id_not_work); + close(fault_fd); + test_ioctl_destroy(fault_id); } else { test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0, &parent_hwpt_id); diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index f590417cd67a..c5d5e69452b0 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL)) return -1;
- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id, + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id, IOMMU_HWPT_DATA_NONE, 0, 0)) return -1;
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 8d2b46b2114d..a70e35a78584 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \ pt_id, NULL))
-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, +static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id, __u32 flags, __u32 *hwpt_id, __u32 data_type, void *data, size_t data_len) { @@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, .data_type = data_type, .data_len = data_len, .data_uptr = (uint64_t)data, + .fault_id = ft_id, }; int ret;
@@ -177,24 +178,36 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, }
#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \ 0)) #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \ - self->fd, device_id, pt_id, flags, \ + self->fd, device_id, pt_id, 0, flags, \ hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len)) #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \ data_type, data, data_len) \ EXPECT_ERRNO(_errno, \ - _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \ hwpt_id, data_type, data, data_len))
+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id, \ + data_type, data, data_len) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \ + flags, hwpt_id, data_type, data, \ + data_len)) +#define test_err_hwpt_alloc_iopf(_errno, device_id, pt_id, fault_id, flags, \ + hwpt_id, data_type, data, data_len) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \ + flags, hwpt_id, data_type, data, \ + data_len)) + #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \ ({ \ struct iommu_test_cmd test_cmd = { \ @@ -684,3 +697,66 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
#define test_cmd_get_hw_capabilities(device_id, caps, mask) \ ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps)) + +static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd) +{ + struct iommu_fault_alloc cmd = { + .size = sizeof(cmd), + }; + int ret; + + ret = ioctl(fd, IOMMU_FAULT_QUEUE_ALLOC, &cmd); + if (ret) + return ret; + *fault_id = cmd.out_fault_id; + *fault_fd = cmd.out_fault_fd; + return 0; +} + +#define test_ioctl_fault_alloc(fault_id, fault_fd) \ + ({ \ + ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \ + fault_fd)); \ + ASSERT_NE(0, *(fault_id)); \ + ASSERT_NE(0, *(fault_fd)); \ + }) + +static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd) +{ + struct iommu_test_cmd trigger_iopf_cmd = { + .size = sizeof(trigger_iopf_cmd), + .op = IOMMU_TEST_OP_TRIGGER_IOPF, + .trigger_iopf = { + .dev_id = device_id, + .pasid = 0x1, + .grpid = 0x2, + .perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE, + .addr = 0xdeadbeaf, + }, + }; + struct iommu_hwpt_page_response response = { + .code = IOMMUFD_PAGE_RESP_SUCCESS, + }; + struct iommu_hwpt_pgfault fault = {}; + ssize_t bytes; + int ret; + + ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd); + if (ret) + return ret; + + bytes = read(fault_fd, &fault, sizeof(fault)); + if (bytes <= 0) + return -EIO; + + response.cookie = fault.cookie; + + bytes = write(fault_fd, &response, sizeof(response)); + if (bytes <= 0) + return -EIO; + + return 0; +} + +#define test_cmd_trigger_iopf(device_id, fault_fd) \ + ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 861f96a785149a0062cce6578e0fa7cb95435a7e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The response code of IOMMUFD_PAGE_RESP_FAILURE was defined to be equivalent to the "Response Failure" in PCI spec, section 10.4.2.1. This response code indicates that one or more pages within the associated request group have encountered or caused an unrecoverable error. Therefore, this response disables the PRI at the function.
Modern I/O virtualization technologies, like SR-IOV, share PRI among the assignable device units. Therefore, a response failure on one unit might cause I/O failure on other units.
Remove this response code so that user space can only respond with SUCCESS or INVALID. The VMM is recommended to emulate a failure response as a PRI reset, or by disabling PRI and switching to a non-PRI domain.
Fixes: c714f15860fc ("iommufd: Add fault and response message definitions") Link: https://lore.kernel.org/r/20240710083341.44617-2-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/uapi/linux/iommufd.h | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index ede2b464a761..e31385b75d0b 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -765,14 +765,10 @@ struct iommu_hwpt_pgfault { * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the * access. This is the "Invalid Request" in PCI * 10.4.2.1. - * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from - * this device if possible. This is the "Response - * Failure" in PCI 10.4.2.1. */ enum iommufd_page_response_code { IOMMUFD_PAGE_RESP_SUCCESS = 0, IOMMUFD_PAGE_RESP_INVALID, - IOMMUFD_PAGE_RESP_FAILURE, };
/**
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit d73cf5ff743b5a8de6fa20651baba5bd56ba98a3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The response code from user space is only allowed to be SUCCESS or INVALID. All other values are treated by the device as a response code of Response Failure according to PCI spec, section 10.4.2.1. This response disables the Page Request Interface for the Function.
Add a check in iommufd_fault_fops_write() to reject invalid response codes.
Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://lore.kernel.org/r/20240710083341.44617-3-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/fault.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 54d6cd20a673..9c142cefa2d2 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -305,6 +305,16 @@ static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *b if (rc) break;
+ static_assert((int)IOMMUFD_PAGE_RESP_SUCCESS == + (int)IOMMU_PAGE_RESP_SUCCESS); + static_assert((int)IOMMUFD_PAGE_RESP_INVALID == + (int)IOMMU_PAGE_RESP_INVALID); + if (response.code != IOMMUFD_PAGE_RESP_SUCCESS && + response.code != IOMMUFD_PAGE_RESP_INVALID) { + rc = -EINVAL; + break; + } + group = xa_erase(&fault->response, response.cookie); if (!group) { rc = -EINVAL;
From: Lu Baolu baolu.lu@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 595572aae3d0c3bf295ea759b74b948e7493a9ff category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The Smatch static checker reported the following warning:
drivers/iommu/iommufd/fault.c:131 iommufd_device_get_attach_handle() warn: 'handle' is an error pointer or valid
Fix it by checking 'handle' with IS_ERR().
Fixes: b7d8833677ba ("iommufd: Fault-capable hwpt attach/detach/replace") Link: https://lore.kernel.org/r/20240712025819.63147-1-baolu.lu@linux.intel.com Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/linux-iommu/8bb4f37a-4514-4dea-aabb-7380be303895@sta... Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/fault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 9c142cefa2d2..a643d5c7c535 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -128,7 +128,7 @@ iommufd_device_get_attach_handle(struct iommufd_device *idev) struct iommu_attach_handle *handle;
handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0); - if (!handle) + if (IS_ERR(handle)) return NULL;
return to_iommufd_handle(handle);
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit c84c5ab76c9c04b5f1c8cc66ee93... category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc (the list of DMA addresses for the L2 entries) is already zero'd.
arm_smmu_init_l1_strtab() goes through and calls arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it again.
Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from arm_smmu_init_strtab_2lvl to allocate the companion struct.
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Link: https://lore.kernel.org/r/1-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 ++++++--------------- 1 file changed, 7 insertions(+), 21 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9d798f1fa010..cf621da8d2da 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -4116,25 +4116,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu) PRIQ_ENT_DWORDS, "priq"); }
-static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu) -{ - unsigned int i; - struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - void *strtab = smmu->strtab_cfg.strtab; - - cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents, - sizeof(*cfg->l1_desc), GFP_KERNEL); - if (!cfg->l1_desc) - return -ENOMEM; - - for (i = 0; i < cfg->num_l1_ents; ++i) { - arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]); - strtab += STRTAB_L1_DESC_DWORDS << 3; - } - - return 0; -} - #ifdef CONFIG_SMMU_BYPASS_DEV static void arm_smmu_install_bypass_ste_for_dev(struct arm_smmu_device *smmu, u32 sid) @@ -4184,7 +4165,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) u64 reg; u32 size, l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - int ret; + int ret = 0;
/* Calculate the L1 size, capped to the SIDSIZE. */ size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); @@ -4213,7 +4194,12 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size); reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); cfg->strtab_base_cfg = reg; - ret = arm_smmu_init_l1_strtab(smmu); + + cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents, + sizeof(*cfg->l1_desc), GFP_KERNEL); + if (!cfg->l1_desc) + ret = -ENOMEM; + #ifdef CONFIG_SMMU_BYPASS_DEV if (!ret && smmu_bypass_devices_num) { ret = bus_for_each_dev(&pci_bus_type, NULL, (void *)smmu,
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit a4d75360f7a6d979edd66af577847b0f4dbf4377 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The top of the 2 level stream table is (at most) 128k entries big, and two high order allocations are required. One of __le64 which is programmed into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds the CPU pointer (3M).
There is no reason to store the l2ptr_dma as nothing reads it. devm stores a copy of it and the DMA memory will be freed via devm mechanisms. span is a constant of 8+1. Remove both.
This removes 16 bytes from each arm_smmu_strtab_l1_desc and saves up to 2M of memory per iommu instance.
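As a rough cross-check of those numbers, the standalone snippet below redoes the arithmetic with simplified stand-in structures. It assumes LP64 pointer/dma_addr_t sizes and natural padding; the declarations are illustrative, not the driver's.

#include <stdio.h>
#include <stdint.h>

struct old_l1_desc {		/* before this patch: span + l2ptr + l2ptr_dma */
	uint8_t span;
	void *l2ptr;
	uint64_t l2ptr_dma;
};				/* 24 bytes once padded */

struct new_l1_desc {		/* after this patch: only the CPU pointer */
	void *l2ptr;
};				/* 8 bytes */

int main(void)
{
	unsigned long entries = 1UL << 17;	/* up to 128k L1 entries */

	printf("old: %lu MiB\n", entries * sizeof(struct old_l1_desc) >> 20);
	printf("new: %lu MiB\n", entries * sizeof(struct new_l1_desc) >> 20);
	return 0;
}

This prints 3 MiB before and 1 MiB after, i.e. the up-to-2M saving quoted above.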
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Link: https://lore.kernel.org/r/2-v2-318ed5f6983b+198f-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 --- 2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index cf621da8d2da..5d17f52e3f8c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1681,13 +1681,12 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) }
/* Stream table manipulation functions */ -static void -arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) +static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma) { u64 val = 0;
- val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span); - val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK; + val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, STRTAB_SPLIT + 1); + val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
/* The HW has 64 bit atomicity with stores to the L2 STE table */ WRITE_ONCE(*dst, cpu_to_le64(val)); @@ -1892,6 +1891,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) { size_t size; void *strtab; + dma_addr_t l2ptr_dma; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
@@ -1901,8 +1901,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3); strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
- desc->span = STRTAB_SPLIT + 1; - desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma, + desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, GFP_KERNEL); if (!desc->l2ptr) { dev_err(smmu->dev, @@ -1912,7 +1911,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) }
arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); - arm_smmu_write_strtab_l1_desc(strtab, desc); + arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma); return 0; }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index eb3fe79bfa10..70a68c00267a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -631,10 +631,7 @@ struct arm_smmu_priq {
/* High-level stream table and context descriptor structures */ struct arm_smmu_strtab_l1_desc { - u8 span; - struct arm_smmu_ste *l2ptr; - dma_addr_t l2ptr_dma; };
struct arm_smmu_ctx_desc {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit a2bb820e862d61f9ca1499e50091... category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Since v5.12 the rbtree has gained some simplifying helpers aimed at making rb tree users write less convoluted boilerplate code. Instead the caller provides a single comparison function and the helpers generate the prior open-coded stuff.
Update smmu->streams to use rb_find_add() and rb_find().
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v3-9fef8cdc2ff6+150d1-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 68 ++++++++++----------- 1 file changed, 31 insertions(+), 37 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5d17f52e3f8c..5cac4f67c98b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1915,26 +1915,37 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return 0; }
+static int arm_smmu_streams_cmp_key(const void *lhs, const struct rb_node *rhs) +{ + struct arm_smmu_stream *stream_rhs = + rb_entry(rhs, struct arm_smmu_stream, node); + const u32 *sid_lhs = lhs; + + if (*sid_lhs < stream_rhs->id) + return -1; + if (*sid_lhs > stream_rhs->id) + return 1; + return 0; +} + +static int arm_smmu_streams_cmp_node(struct rb_node *lhs, + const struct rb_node *rhs) +{ + return arm_smmu_streams_cmp_key( + &rb_entry(lhs, struct arm_smmu_stream, node)->id, rhs); +} + static struct arm_smmu_master * arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid) { struct rb_node *node; - struct arm_smmu_stream *stream;
lockdep_assert_held(&smmu->streams_mutex);
- node = smmu->streams.rb_node; - while (node) { - stream = rb_entry(node, struct arm_smmu_stream, node); - if (stream->id < sid) - node = node->rb_right; - else if (stream->id > sid) - node = node->rb_left; - else - return stream->master; - } - - return NULL; + node = rb_find(&sid, &smmu->streams, arm_smmu_streams_cmp_key); + if (!node) + return NULL; + return rb_entry(node, struct arm_smmu_stream, node)->master; }
/* IRQ and event handlers */ @@ -3449,8 +3460,6 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu, { int i; int ret = 0; - struct arm_smmu_stream *new_stream, *cur_stream; - struct rb_node **new_node, *parent_node = NULL; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams), @@ -3461,9 +3470,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
mutex_lock(&smmu->streams_mutex); for (i = 0; i < fwspec->num_ids; i++) { + struct arm_smmu_stream *new_stream = &master->streams[i]; u32 sid = fwspec->ids[i];
- new_stream = &master->streams[i]; new_stream->id = sid; new_stream->master = master;
@@ -3472,28 +3481,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu, break;
/* Insert into SID tree */ - new_node = &(smmu->streams.rb_node); - while (*new_node) { - cur_stream = rb_entry(*new_node, struct arm_smmu_stream, - node); - parent_node = *new_node; - if (cur_stream->id > new_stream->id) { - new_node = &((*new_node)->rb_left); - } else if (cur_stream->id < new_stream->id) { - new_node = &((*new_node)->rb_right); - } else { - dev_warn(master->dev, - "stream %u already in tree\n", - cur_stream->id); - ret = -EINVAL; - break; - } - } - if (ret) + if (rb_find_add(&new_stream->node, &smmu->streams, + arm_smmu_streams_cmp_node)) { + dev_warn(master->dev, "stream %u already in tree\n", + sid); + ret = -EINVAL; break; - - rb_link_node(&new_stream->node, parent_node, new_node); - rb_insert_color(&new_stream->node, &smmu->streams); + } }
if (ret) {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit ce410410f1a7db0259ca9282a285fb80fd553b8c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Don't open code the calculations of the indexes for each level; provide two functions to do that math and call them in all the places. Update all the places computing indexes.
Calculate the L1 table size directly based on the max required index from the cap. Remove STRTAB_L1_SZ_SHIFT in favour of STRTAB_NUM_L2_STES.
Use STRTAB_NUM_L2_STES to replace remaining open coded 1 << STRTAB_SPLIT.
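As a quick illustration of what the two helpers compute, here is a standalone sketch that mirrors the definitions added in this patch (with the usual STRTAB_SPLIT of 8); the helper and variable names are illustrative.

#include <stdio.h>
#include <stdint.h>

#define STRTAB_SPLIT		8
#define STRTAB_NUM_L2_STES	(1 << STRTAB_SPLIT)

static uint32_t strtab_l1_idx(uint32_t sid) { return sid / STRTAB_NUM_L2_STES; }
static uint32_t strtab_l2_idx(uint32_t sid) { return sid % STRTAB_NUM_L2_STES; }

int main(void)
{
	uint32_t sid = 0x1234;

	/* SID 0x1234 lands in L1 slot 0x12, STE 0x34 within that L2 table */
	printf("sid %#x -> l1 %#x, l2 %#x\n",
	       sid, strtab_l1_idx(sid), strtab_l2_idx(sid));
	return 0;
}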
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 45 +++++++++------------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 14 ++++++- 2 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5cac4f67c98b..1720adc1c9ca 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1890,17 +1890,15 @@ static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) { size_t size; - void *strtab; dma_addr_t l2ptr_dma; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT]; + struct arm_smmu_strtab_l1_desc *desc;
+ desc = &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]; if (desc->l2ptr) return 0;
- size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3); - strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS]; - + size = STRTAB_NUM_L2_STES * sizeof(struct arm_smmu_ste); desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, GFP_KERNEL); if (!desc->l2ptr) { @@ -1910,8 +1908,9 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); - arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma); + arm_smmu_init_initial_stes(desc->l2ptr, STRTAB_NUM_L2_STES); + arm_smmu_write_strtab_l1_desc(&cfg->strtab[arm_smmu_strtab_l1_idx(sid)], + l2ptr_dma); return 0; }
@@ -2703,12 +2702,9 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { - unsigned int idx1, idx2; - /* Two-level walk */ - idx1 = (sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS; - idx2 = sid & ((1 << STRTAB_SPLIT) - 1); - return &cfg->l1_desc[idx1].l2ptr[idx2]; + return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)] + .l2ptr[arm_smmu_strtab_l2_idx(sid)]; } else { /* Simple linear lookup */ return (struct arm_smmu_ste *)&cfg @@ -3434,12 +3430,9 @@ struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid) { - unsigned long limit = smmu->strtab_cfg.num_l1_ents; - if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) - limit *= 1UL << STRTAB_SPLIT; - - return sid < limit; + return arm_smmu_strtab_l1_idx(sid) < smmu->strtab_cfg.num_l1_ents; + return sid < smmu->strtab_cfg.num_l1_ents; }
static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid) @@ -4156,20 +4149,19 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { void *strtab; u64 reg; - u32 size, l1size; + u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; int ret = 0; + unsigned int last_sid_idx = + arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */ - size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); - size = min(size, smmu->sid_bits - STRTAB_SPLIT); - cfg->num_l1_ents = 1 << size; - - size += STRTAB_SPLIT; - if (size < smmu->sid_bits) + cfg->num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); + if (cfg->num_l1_ents <= last_sid_idx) dev_warn(smmu->dev, "2-level strtab only covers %u/%u bits of SID\n", - size, smmu->sid_bits); + ilog2(cfg->num_l1_ents * STRTAB_NUM_L2_STES), + smmu->sid_bits);
l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3); strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma, @@ -4184,7 +4176,8 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
/* Configure strtab_base_cfg for 2 levels */ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL); - reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size); + reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, + ilog2(cfg->num_l1_ents) + STRTAB_SPLIT); reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); cfg->strtab_base_cfg = reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 70a68c00267a..74322dc988e0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -234,7 +234,6 @@ * 2lvl: 128k L1 entries, * 256 lazy entries per table (each table covers a PCI bus) */ -#define STRTAB_L1_SZ_SHIFT 20 #define STRTAB_SPLIT 8
#define STRTAB_L1_DESC_DWORDS 1 @@ -247,6 +246,19 @@ struct arm_smmu_ste { __le64 data[STRTAB_STE_DWORDS]; };
+#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT) +#define STRTAB_MAX_L1_ENTRIES (1 << 17) + +static inline u32 arm_smmu_strtab_l1_idx(u32 sid) +{ + return sid / STRTAB_NUM_L2_STES; +} + +static inline u32 arm_smmu_strtab_l2_idx(u32 sid) +{ + return sid % STRTAB_NUM_L2_STES; +} + #define STRTAB_STE_0_V (1UL << 0) #define STRTAB_STE_0_CFG GENMASK_ULL(3, 1) #define STRTAB_STE_0_CFG_ABORT 0
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit abb4f9d323a8d53870cc842d3c5024f71c2d4951 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add types struct arm_smmu_strtab_l1 and l2 to represent the HW layout of the descriptors, and use them in most places; following patches will get the remaining places. The sizes of the l1 and l2 HW allocations are sizeof(struct arm_smmu_strtab_l1/2).
This provides some more clarity than having raw __le64 *'s and sizes computed via macros.
Remove STRTAB_L1_DESC_DWORDS.
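The size arithmetic that used to live in the open-coded macros now falls out of sizeof() on the new types. A small standalone check, using simplified userspace copies of the structures and assuming the usual 8-dword (64-byte) STE:

#include <stdio.h>
#include <stdint.h>

#define STRTAB_STE_DWORDS	8
#define STRTAB_SPLIT		8
#define STRTAB_NUM_L2_STES	(1 << STRTAB_SPLIT)

struct arm_smmu_ste { uint64_t data[STRTAB_STE_DWORDS]; };
struct arm_smmu_strtab_l2 { struct arm_smmu_ste stes[STRTAB_NUM_L2_STES]; };
struct arm_smmu_strtab_l1 { uint64_t l2ptr; };

int main(void)
{
	/* 256 STEs of 64 bytes = 16 KiB per L2 table, 8 bytes per L1 descriptor */
	printf("l2 table: %zu bytes, l1 desc: %zu bytes\n",
	       sizeof(struct arm_smmu_strtab_l2),
	       sizeof(struct arm_smmu_strtab_l1));
	return 0;
}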
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/2-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 21 +++++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 10 ++++++++-- 2 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 1720adc1c9ca..a50fde2e0306 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1681,7 +1681,8 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) }
/* Stream table manipulation functions */ -static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma) +static void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst, + dma_addr_t l2ptr_dma) { u64 val = 0;
@@ -1689,7 +1690,7 @@ static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma) val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
/* The HW has 64 bit atomicity with stores to the L2 STE table */ - WRITE_ONCE(*dst, cpu_to_le64(val)); + WRITE_ONCE(dst->l2ptr, cpu_to_le64(val)); }
struct arm_smmu_ste_writer { @@ -1889,18 +1890,17 @@ static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab,
static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) { - size_t size; dma_addr_t l2ptr_dma; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; struct arm_smmu_strtab_l1_desc *desc; + __le64 *dst;
desc = &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]; if (desc->l2ptr) return 0;
- size = STRTAB_NUM_L2_STES * sizeof(struct arm_smmu_ste); - desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, - GFP_KERNEL); + desc->l2ptr = dmam_alloc_coherent(smmu->dev, sizeof(*desc->l2ptr), + &l2ptr_dma, GFP_KERNEL); if (!desc->l2ptr) { dev_err(smmu->dev, "failed to allocate l2 stream table for SID %u\n", @@ -1908,8 +1908,9 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; }
- arm_smmu_init_initial_stes(desc->l2ptr, STRTAB_NUM_L2_STES); - arm_smmu_write_strtab_l1_desc(&cfg->strtab[arm_smmu_strtab_l1_idx(sid)], + arm_smmu_init_initial_stes(desc->l2ptr->stes, STRTAB_NUM_L2_STES); + dst = &cfg->strtab[arm_smmu_strtab_l1_idx(sid)]; + arm_smmu_write_strtab_l1_desc((struct arm_smmu_strtab_l1 *)dst, l2ptr_dma); return 0; } @@ -2704,7 +2705,7 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { /* Two-level walk */ return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)] - .l2ptr[arm_smmu_strtab_l2_idx(sid)]; + .l2ptr->stes[arm_smmu_strtab_l2_idx(sid)]; } else { /* Simple linear lookup */ return (struct arm_smmu_ste *)&cfg @@ -4163,7 +4164,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) ilog2(cfg->num_l1_ents * STRTAB_NUM_L2_STES), smmu->sid_bits);
- l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3); + l1size = cfg->num_l1_ents * sizeof(struct arm_smmu_strtab_l1); strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma, GFP_KERNEL); if (!strtab) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 74322dc988e0..a751529a0225 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -236,7 +236,6 @@ */ #define STRTAB_SPLIT 8
-#define STRTAB_L1_DESC_DWORDS 1 #define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0) #define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
@@ -247,6 +246,13 @@ struct arm_smmu_ste { };
#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT) +struct arm_smmu_strtab_l2 { + struct arm_smmu_ste stes[STRTAB_NUM_L2_STES]; +}; + +struct arm_smmu_strtab_l1 { + __le64 l2ptr; +}; #define STRTAB_MAX_L1_ENTRIES (1 << 17)
static inline u32 arm_smmu_strtab_l1_idx(u32 sid) @@ -643,7 +649,7 @@ struct arm_smmu_priq {
/* High-level stream table and context descriptor structures */ struct arm_smmu_strtab_l1_desc { - struct arm_smmu_ste *l2ptr; + struct arm_smmu_strtab_l2 *l2ptr; };
struct arm_smmu_ctx_desc {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 85196f54743d97b0678e7889df72fdcc58ab2b02 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The members here are being used for both the linear and the 2 level case, with the meaning of each item slightly different in the two cases.
Split it into a clean union where both cases have their own struct with their own logical names and correct types.
Adjust all the users to detect linear/2lvl and use the right sub structure and types consistently.
Remove STRTAB_STE_DWORDS by changing the last places to use sizeof(struct arm_smmu_ste).
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 78 ++++++++++----------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 22 +++--- 2 files changed, 50 insertions(+), 50 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a50fde2e0306..ce9a81caebe5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1892,25 +1892,24 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) { dma_addr_t l2ptr_dma; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - struct arm_smmu_strtab_l1_desc *desc; - __le64 *dst; + struct arm_smmu_strtab_l2 **l2table;
- desc = &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]; - if (desc->l2ptr) + l2table = &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)]; + if (*l2table) return 0;
- desc->l2ptr = dmam_alloc_coherent(smmu->dev, sizeof(*desc->l2ptr), - &l2ptr_dma, GFP_KERNEL); - if (!desc->l2ptr) { + *l2table = dmam_alloc_coherent(smmu->dev, sizeof(**l2table), + &l2ptr_dma, GFP_KERNEL); + if (!*l2table) { dev_err(smmu->dev, "failed to allocate l2 stream table for SID %u\n", sid); return -ENOMEM; }
- arm_smmu_init_initial_stes(desc->l2ptr->stes, STRTAB_NUM_L2_STES); - dst = &cfg->strtab[arm_smmu_strtab_l1_idx(sid)]; - arm_smmu_write_strtab_l1_desc((struct arm_smmu_strtab_l1 *)dst, + arm_smmu_init_initial_stes((*l2table)->stes, + ARRAY_SIZE((*l2table)->stes)); + arm_smmu_write_strtab_l1_desc(&cfg->l2.l1tab[arm_smmu_strtab_l1_idx(sid)], l2ptr_dma); return 0; } @@ -2704,12 +2703,11 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { /* Two-level walk */ - return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)] - .l2ptr->stes[arm_smmu_strtab_l2_idx(sid)]; + return &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)] + ->stes[arm_smmu_strtab_l2_idx(sid)]; } else { /* Simple linear lookup */ - return (struct arm_smmu_ste *)&cfg - ->strtab[sid * STRTAB_STE_DWORDS]; + return &cfg->linear.table[sid]; } }
@@ -3432,8 +3430,8 @@ struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode) static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid) { if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) - return arm_smmu_strtab_l1_idx(sid) < smmu->strtab_cfg.num_l1_ents; - return sid < smmu->strtab_cfg.num_l1_ents; + return arm_smmu_strtab_l1_idx(sid) < smmu->strtab_cfg.l2.num_l1_ents; + return sid < smmu->strtab_cfg.linear.num_ents; }
static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid) @@ -4148,7 +4146,6 @@ static int arm_smmu_prepare_init_l2_strtab(struct device *dev, void *data)
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { - void *strtab; u64 reg; u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; @@ -4157,34 +4154,33 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */ - cfg->num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); - if (cfg->num_l1_ents <= last_sid_idx) + cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES); + if (cfg->l2.num_l1_ents <= last_sid_idx) dev_warn(smmu->dev, "2-level strtab only covers %u/%u bits of SID\n", - ilog2(cfg->num_l1_ents * STRTAB_NUM_L2_STES), + ilog2(cfg->l2.num_l1_ents * STRTAB_NUM_L2_STES), smmu->sid_bits);
- l1size = cfg->num_l1_ents * sizeof(struct arm_smmu_strtab_l1); - strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma, - GFP_KERNEL); - if (!strtab) { + l1size = cfg->l2.num_l1_ents * sizeof(struct arm_smmu_strtab_l1); + cfg->l2.l1tab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->l2.l1_dma, + GFP_KERNEL); + if (!cfg->l2.l1tab) { dev_err(smmu->dev, "failed to allocate l1 stream table (%u bytes)\n", l1size); return -ENOMEM; } - cfg->strtab = strtab;
/* Configure strtab_base_cfg for 2 levels */ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL); reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, - ilog2(cfg->num_l1_ents) + STRTAB_SPLIT); + ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT); reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); cfg->strtab_base_cfg = reg;
- cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents, - sizeof(*cfg->l1_desc), GFP_KERNEL); - if (!cfg->l1_desc) + cfg->l2.l2ptrs = devm_kcalloc(smmu->dev, cfg->l2.num_l1_ents, + sizeof(*cfg->l2.l2ptrs), GFP_KERNEL); + if (!cfg->l2.l2ptrs) ret = -ENOMEM;
#ifdef CONFIG_SMMU_BYPASS_DEV @@ -4198,29 +4194,28 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) { - void *strtab; u64 reg; u32 size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3); - strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma, - GFP_KERNEL); - if (!strtab) { + size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste); + cfg->linear.table = dmam_alloc_coherent(smmu->dev, size, + &cfg->linear.ste_dma, + GFP_KERNEL); + if (!cfg->linear.table) { dev_err(smmu->dev, "failed to allocate linear stream table (%u bytes)\n", size); return -ENOMEM; } - cfg->strtab = strtab; - cfg->num_l1_ents = 1 << smmu->sid_bits; + cfg->linear.num_ents = 1 << smmu->sid_bits;
/* Configure strtab_base_cfg for a linear table covering all SIDs */ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR); reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); cfg->strtab_base_cfg = reg;
- arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); + arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents); return 0; }
@@ -4229,16 +4224,17 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu) u64 reg; int ret;
- if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { ret = arm_smmu_init_strtab_2lvl(smmu); - else + reg = smmu->strtab_cfg.l2.l1_dma & STRTAB_BASE_ADDR_MASK; + } else { ret = arm_smmu_init_strtab_linear(smmu); - + reg = smmu->strtab_cfg.linear.ste_dma & STRTAB_BASE_ADDR_MASK; + } if (ret) return ret;
/* Set the strtab base address */ - reg = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK; reg |= STRTAB_BASE_RA; smmu->strtab_cfg.strtab_base = reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index a751529a0225..ff1c87300775 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -648,10 +648,6 @@ struct arm_smmu_priq { };
/* High-level stream table and context descriptor structures */ -struct arm_smmu_strtab_l1_desc { - struct arm_smmu_strtab_l2 *l2ptr; -}; - struct arm_smmu_ctx_desc { u16 asid; }; @@ -684,11 +680,19 @@ struct arm_smmu_s2_cfg { };
struct arm_smmu_strtab_cfg { - __le64 *strtab; - dma_addr_t strtab_dma; - struct arm_smmu_strtab_l1_desc *l1_desc; - unsigned int num_l1_ents; - + union { + struct { + struct arm_smmu_ste *table; + dma_addr_t ste_dma; + unsigned int num_ents; + } linear; + struct { + struct arm_smmu_strtab_l1 *l1tab; + struct arm_smmu_strtab_l2 **l2ptrs; + dma_addr_t l1_dma; + unsigned int num_l1_ents; + } l2; + }; u64 strtab_base; u32 strtab_base_cfg; };
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 8c153ef95697242b72646d2c4cf6c4b23ccf35a3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
These values can be computed from the other values already stored in the config. Move the calculation to arm_smmu_write_strtab() and do it directly before writing the registers.
This moves all the logic to calculate the two registers into one function from three and saves an unimportant 16 bytes from the arm_smmu_device.
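Condensed from the diff below, the consolidated helper derives both registers from the per-format fields at reset time (a summary sketch of the new arm_smmu_write_strtab(), not additional code):

    static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
    {
            struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
            dma_addr_t dma;
            u32 reg;

            if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
                    /* LOG2SIZE covers the L1 entries plus the per-L2 split */
                    reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL) |
                          FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
                                     ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT) |
                          FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
                    dma = cfg->l2.l1_dma;
            } else {
                    reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR) |
                          FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
                    dma = cfg->linear.ste_dma;
            }
            /* Base address and config are written together; nothing is cached */
            writeq_relaxed((dma & STRTAB_BASE_ADDR_MASK) | STRTAB_BASE_RA,
                           smmu->base + ARM_SMMU_STRTAB_BASE);
            writel_relaxed(reg, smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
    }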
Suggested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c | 6 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 55 +++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 - 3 files changed, 29 insertions(+), 34 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c index 29d4c0c71041..7fe90d410ba9 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-s-smmu-v3.c @@ -770,8 +770,7 @@ void virtcca_smmu_device_init(struct platform_device *pdev, struct arm_smmu_devi params_ptr->is_cmd_queue = 1; params_ptr->smmu_id = smmu->s_smmu_id; params_ptr->ioaddr = smmu->ioaddr; - params_ptr->strtab_base_RA_bit = - (smmu->strtab_cfg.strtab_base >> S_STRTAB_BASE_RA_SHIFT) & 0x1; + params_ptr->strtab_base_RA_bit = 0x1; params_ptr->q_base_RA_WA_bit = (smmu->cmdq.q.q_base >> S_CMDQ_BASE_RA_SHIFT) & 0x1; if (tmi_smmu_device_reset(__pa(params_ptr)) != 0) { @@ -801,8 +800,7 @@ void virtcca_smmu_device_init(struct platform_device *pdev, struct arm_smmu_devi params_ptr->smmu_id = smmu->s_smmu_id; params_ptr->q_base_RA_WA_bit = (smmu->evtq.q.q_base >> S_EVTQ_BASE_WA_SHIFT) & 0x1; - params_ptr->strtab_base_RA_bit = - (smmu->strtab_cfg.strtab_base >> S_STRTAB_BASE_RA_SHIFT) & 0x1; + params_ptr->strtab_base_RA_bit = 0x1; if (tmi_smmu_device_reset(__pa(params_ptr)) != 0) { dev_err(smmu->dev, "S_SMMU: failed to set s event queue regs\n"); smmu->s_smmu_id = ARM_S_SMMU_INVALID_ID; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ce9a81caebe5..0953abb8093d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -4146,7 +4146,6 @@ static int arm_smmu_prepare_init_l2_strtab(struct device *dev, void *data)
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { - u64 reg; u32 l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; int ret = 0; @@ -4171,13 +4170,6 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) return -ENOMEM; }
- /* Configure strtab_base_cfg for 2 levels */ - reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL); - reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, - ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT); - reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); - cfg->strtab_base_cfg = reg; - cfg->l2.l2ptrs = devm_kcalloc(smmu->dev, cfg->l2.num_l1_ents, sizeof(*cfg->l2.l2ptrs), GFP_KERNEL); if (!cfg->l2.l2ptrs) @@ -4194,7 +4186,6 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) { - u64 reg; u32 size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
@@ -4210,34 +4201,21 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) } cfg->linear.num_ents = 1 << smmu->sid_bits;
- /* Configure strtab_base_cfg for a linear table covering all SIDs */ - reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR); - reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); - cfg->strtab_base_cfg = reg; - arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents); return 0; }
static int arm_smmu_init_strtab(struct arm_smmu_device *smmu) { - u64 reg; int ret;
- if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) ret = arm_smmu_init_strtab_2lvl(smmu); - reg = smmu->strtab_cfg.l2.l1_dma & STRTAB_BASE_ADDR_MASK; - } else { + else ret = arm_smmu_init_strtab_linear(smmu); - reg = smmu->strtab_cfg.linear.ste_dma & STRTAB_BASE_ADDR_MASK; - } if (ret) return ret;
- /* Set the strtab base address */ - reg |= STRTAB_BASE_RA; - smmu->strtab_cfg.strtab_base = reg; - ida_init(&smmu->vmid_map);
return 0; @@ -4564,6 +4542,30 @@ static int arm_smmu_ecmdq_reset(struct arm_smmu_device *smmu) } #endif
+static void arm_smmu_write_strtab(struct arm_smmu_device *smmu) +{ + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + dma_addr_t dma; + u32 reg; + + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { + reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, + STRTAB_BASE_CFG_FMT_2LVL) | + FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, + ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT) | + FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); + dma = cfg->l2.l1_dma; + } else { + reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, + STRTAB_BASE_CFG_FMT_LINEAR) | + FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); + dma = cfg->linear.ste_dma; + } + writeq_relaxed((dma & STRTAB_BASE_ADDR_MASK) | STRTAB_BASE_RA, + smmu->base + ARM_SMMU_STRTAB_BASE); + writel_relaxed(reg, smmu->base + ARM_SMMU_STRTAB_BASE_CFG); +} + static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume) { int ret; @@ -4599,10 +4601,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume) writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
/* Stream table */ - writeq_relaxed(smmu->strtab_cfg.strtab_base, - smmu->base + ARM_SMMU_STRTAB_BASE); - writel_relaxed(smmu->strtab_cfg.strtab_base_cfg, - smmu->base + ARM_SMMU_STRTAB_BASE_CFG); + arm_smmu_write_strtab(smmu);
/* Command queue */ writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index ff1c87300775..9b19a9407afc 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -693,8 +693,6 @@ struct arm_smmu_strtab_cfg { unsigned int num_l1_ents; } l2; }; - u64 strtab_base; - u32 strtab_base_cfg; };
/* An SMMUv3 instance */
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 47b2de35cab2b683f69d03515c2658c2d8515323 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The master->cd_table is entirely contained within the struct arm_smmu_master which is guaranteed to be freed by the core code under arm_smmu_release_device().
There is no reason to use devm here; arm_smmu_free_cd_tables() is reliably called to free the CD-related memory. Remove it and save some memory.
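A minimal sketch of the pattern change (illustrative only, arguments abbreviated): the devm-managed variants are dropped in favour of plain allocations paired with the explicit free path that already runs at release time.

    /* Before: lifetime tied to the smmu struct device via devm */
    cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size,
                                          &cd_table->cdtab_dma, GFP_KERNEL);

    /* After: plain allocation ... */
    cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size,
                                         &cd_table->cdtab_dma, GFP_KERNEL);

    /* ... released explicitly in arm_smmu_free_cd_tables(), which the core
     * guarantees runs via arm_smmu_release_device() before the master is freed.
     */
    dma_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma);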
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++------------ 1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0953abb8093d..30098154f6bd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1409,8 +1409,8 @@ static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, { size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
- l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, - &l1_desc->l2ptr_dma, GFP_KERNEL); + l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size, + &l1_desc->l2ptr_dma, GFP_KERNEL); if (!l1_desc->l2ptr) { dev_warn(smmu->dev, "failed to allocate context descriptor table\n"); @@ -1622,17 +1622,17 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts, CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = devm_kcalloc(smmu->dev, cd_table->num_l1_ents, - sizeof(*cd_table->l1_desc), - GFP_KERNEL); + cd_table->l1_desc = kcalloc(cd_table->num_l1_ents, + sizeof(*cd_table->l1_desc), + GFP_KERNEL); if (!cd_table->l1_desc) return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); }
- cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cd_table->cdtab_dma, - GFP_KERNEL); + cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size, + &cd_table->cdtab_dma, GFP_KERNEL); if (!cd_table->cdtab) { dev_warn(smmu->dev, "failed to allocate context descriptor\n"); ret = -ENOMEM; @@ -1643,7 +1643,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
err_free_l1: if (cd_table->l1_desc) { - devm_kfree(smmu->dev, cd_table->l1_desc); + kfree(cd_table->l1_desc); cd_table->l1_desc = NULL; } return ret; @@ -1663,21 +1663,18 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) if (!cd_table->l1_desc[i].l2ptr) continue;
- dmam_free_coherent(smmu->dev, size, - cd_table->l1_desc[i].l2ptr, - cd_table->l1_desc[i].l2ptr_dma); + dma_free_coherent(smmu->dev, size, + cd_table->l1_desc[i].l2ptr, + cd_table->l1_desc[i].l2ptr_dma); } - devm_kfree(smmu->dev, cd_table->l1_desc); - cd_table->l1_desc = NULL; + kfree(cd_table->l1_desc);
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); } else { l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3); }
- dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma); - cd_table->cdtab_dma = 0; - cd_table->cdtab = NULL; + dma_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma); }
/* Stream table manipulation functions */
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit c0a25a96dee9c3af01fbcad227871fc0f222900b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The top of the 2-level CD table is (at most) 1024 entries big, and two high-order allocations are required: one of __le64 entries that are programmed into the HW (8k) and one of struct arm_smmu_l1_ctx_desc entries that hold the CPU pointers (16k).
There are two copies of the l2ptr_dma: one is stored in the struct arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to use. Instead of storing two copies, just decode the value from the __le64.
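The resulting encode/decode pair (as in the diff below) keeps the DMA address only in the descriptor word the HW walks, and masks it back out when the table is freed:

    static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
    {
            u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;

            /* The HW has 64 bit atomicity with stores to the L2 CD table */
            WRITE_ONCE(*dst, cpu_to_le64(val));
    }

    static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
    {
            /* Recover the DMA address rather than storing a second copy of it */
            return le64_to_cpu(*src) & CTXDESC_L1_DESC_L2PTR_MASK;
    }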
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/6-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++------------ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - 2 files changed, 18 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 30098154f6bd..ef9516191b4a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1404,29 +1404,17 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master, arm_smmu_preempt_enable(smmu); }
-static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, - struct arm_smmu_l1_ctx_desc *l1_desc) +static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma) { - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
- l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size, - &l1_desc->l2ptr_dma, GFP_KERNEL); - if (!l1_desc->l2ptr) { - dev_warn(smmu->dev, - "failed to allocate context descriptor table\n"); - return -ENOMEM; - } - return 0; + /* The HW has 64 bit atomicity with stores to the L2 CD table */ + WRITE_ONCE(*dst, cpu_to_le64(val)); }
-static void arm_smmu_write_cd_l1_desc(__le64 *dst, - struct arm_smmu_l1_ctx_desc *l1_desc) +static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src) { - u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | - CTXDESC_L1_DESC_V; - - /* The HW has 64 bit atomicity with stores to the L2 CD table */ - WRITE_ONCE(*dst, cpu_to_le64(val)); + return le64_to_cpu(*src) & CTXDESC_L1_DESC_L2PTR_MASK; }
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, @@ -1468,13 +1456,18 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
l1_desc = &cd_table->l1_desc[idx]; if (!l1_desc->l2ptr) { - __le64 *l1ptr; - - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + dma_addr_t l2ptr_dma; + size_t size; + + size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd); + l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size, + &l2ptr_dma, + GFP_KERNEL); + if (!l1_desc->l2ptr) return NULL;
- l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; - arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); + arm_smmu_write_cd_l1_desc(&cd_table->cdtab[idx], + l2ptr_dma); /* An invalid L1CD can be cached */ arm_smmu_sync_cd(master, ssid, false); } @@ -1665,7 +1658,8 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
dma_free_coherent(smmu->dev, size, cd_table->l1_desc[i].l2ptr, - cd_table->l1_desc[i].l2ptr_dma); + arm_smmu_cd_l1_get_desc( + &cd_table->cdtab[i])); } kfree(cd_table->l1_desc);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 9b19a9407afc..b5ac794a7df2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -654,7 +654,6 @@ struct arm_smmu_ctx_desc {
struct arm_smmu_l1_ctx_desc { struct arm_smmu_cd *l2ptr; - dma_addr_t l2ptr_dma; };
struct arm_smmu_ctx_desc_cfg {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 7c567eb1e1d2a835140091ff8d4b73ac5454ba7b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add types for each level of the CD table, as well as indexing helpers arm_smmu_cdtab_l1/2_idx().

Remove CTXDESC_L1_DESC_DWORDS and CTXDESC_CD_DWORDS, replacing them all with type-specific calculations.
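Condensed from the header diff below, the new per-level types and index helpers look like this; an SSID selects an L1 slot and then a CD within that leaf table:

    struct arm_smmu_cdtab_l2 {
            struct arm_smmu_cd cds[CTXDESC_L2_ENTRIES];
    };

    struct arm_smmu_cdtab_l1 {
            __le64 l2ptr;
    };

    /* An SSID selects an L1 slot, then a CD within that leaf table */
    static inline unsigned int arm_smmu_cdtab_l1_idx(unsigned int ssid)
    {
            return ssid / CTXDESC_L2_ENTRIES;
    }

    static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid)
    {
            return ssid % CTXDESC_L2_ENTRIES;
    }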
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 45 +++++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 23 +++++++++-- 2 files changed, 44 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ef9516191b4a..062f9800c20d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1404,17 +1404,18 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master, arm_smmu_preempt_enable(smmu); }
-static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma) +static void arm_smmu_write_cd_l1_desc(struct arm_smmu_cdtab_l1 *dst, + dma_addr_t l2ptr_dma) { u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
/* The HW has 64 bit atomicity with stores to the L2 CD table */ - WRITE_ONCE(*dst, cpu_to_le64(val)); + WRITE_ONCE(dst->l2ptr, cpu_to_le64(val)); }
-static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src) +static dma_addr_t arm_smmu_cd_l1_get_desc(const struct arm_smmu_cdtab_l1 *src) { - return le64_to_cpu(*src) & CTXDESC_L1_DESC_L2PTR_MASK; + return le64_to_cpu(src->l2ptr) & CTXDESC_L1_DESC_L2PTR_MASK; }
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, @@ -1427,13 +1428,12 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, return NULL;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) - return (struct arm_smmu_cd *)(cd_table->cdtab + - ssid * CTXDESC_CD_DWORDS); + return &((struct arm_smmu_cd *)cd_table->cdtab)[ssid];
- l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES]; + l1_desc = &cd_table->l1_desc[arm_smmu_cdtab_l1_idx(ssid)]; if (!l1_desc->l2ptr) return NULL; - return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES]; + return &l1_desc->l2ptr->cds[arm_smmu_cdtab_l2_idx(ssid)]; }
static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, @@ -1451,11 +1451,12 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, }
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) { - unsigned int idx = ssid / CTXDESC_L2_ENTRIES; + unsigned int idx = arm_smmu_cdtab_l1_idx(ssid); struct arm_smmu_l1_ctx_desc *l1_desc;
l1_desc = &cd_table->l1_desc[idx]; if (!l1_desc->l2ptr) { + struct arm_smmu_cdtab_l1 *dst; dma_addr_t l2ptr_dma; size_t size;
@@ -1466,8 +1467,8 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, if (!l1_desc->l2ptr) return NULL;
- arm_smmu_write_cd_l1_desc(&cd_table->cdtab[idx], - l2ptr_dma); + dst = &((struct arm_smmu_cdtab_l1 *)cd_table->cdtab)[idx]; + arm_smmu_write_cd_l1_desc(dst, l2ptr_dma); /* An invalid L1CD can be cached */ arm_smmu_sync_cd(master, ssid, false); } @@ -1609,7 +1610,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR; cd_table->num_l1_ents = max_contexts;
- l1size = max_contexts * (CTXDESC_CD_DWORDS << 3); + l1size = max_contexts * sizeof(struct arm_smmu_cd); } else { cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2; cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts, @@ -1621,7 +1622,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) if (!cd_table->l1_desc) return -ENOMEM;
- l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cdtab_l1); }
cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size, @@ -1645,27 +1646,29 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) { int i; - size_t size, l1size; + size_t l1size; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->l1_desc) { - size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); - for (i = 0; i < cd_table->num_l1_ents; i++) { + dma_addr_t dma_handle; + if (!cd_table->l1_desc[i].l2ptr) continue;
- dma_free_coherent(smmu->dev, size, + dma_handle = arm_smmu_cd_l1_get_desc(&( + (struct arm_smmu_cdtab_l1 *)cd_table->cdtab)[i]); + dma_free_coherent(smmu->dev, + sizeof(*cd_table->l1_desc[i].l2ptr), cd_table->l1_desc[i].l2ptr, - arm_smmu_cd_l1_get_desc( - &cd_table->cdtab[i])); + dma_handle); } kfree(cd_table->l1_desc);
- l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cdtab_l1); } else { - l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3); + l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cd); }
dma_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index b5ac794a7df2..fdd9f4555ec2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -330,7 +330,6 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) */ #define CTXDESC_L2_ENTRIES 1024
-#define CTXDESC_L1_DESC_DWORDS 1 #define CTXDESC_L1_DESC_V (1UL << 0) #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
@@ -340,6 +339,24 @@ struct arm_smmu_cd { __le64 data[CTXDESC_CD_DWORDS]; };
+struct arm_smmu_cdtab_l2 { + struct arm_smmu_cd cds[CTXDESC_L2_ENTRIES]; +}; + +struct arm_smmu_cdtab_l1 { + __le64 l2ptr; +}; + +static inline unsigned int arm_smmu_cdtab_l1_idx(unsigned int ssid) +{ + return ssid / CTXDESC_L2_ENTRIES; +} + +static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid) +{ + return ssid % CTXDESC_L2_ENTRIES; +} + #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0) #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6) #define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8) @@ -370,7 +387,7 @@ struct arm_smmu_cd { * When the SMMU only supports linear context descriptor tables, pick a * reasonable size limit (64kB). */ -#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3)) +#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / sizeof(struct arm_smmu_cd))
/* Command queue */ #define CMDQ_ENT_SZ_SHIFT 4 @@ -653,7 +670,7 @@ struct arm_smmu_ctx_desc { };
struct arm_smmu_l1_ctx_desc { - struct arm_smmu_cd *l2ptr; + struct arm_smmu_cdtab_l2 *l2ptr; };
struct arm_smmu_ctx_desc_cfg {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit e3b1be2e73dbe599f8b8886e120d206aa87e90f9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The members here are being used for both the linear and the 2 level case, with the meaning of each item slightly different in the two cases.
Split it into a clean union where both cases have their own struct with their own logical names and correct types.
Adjust all the users to detect linear/2lvl and use the right sub structure and types consistently.
Remove CTXDESC_CD_DWORDS by changing the last places to use sizeof(struct arm_smmu_cd).
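Condensed from the diffs below: the union gives each layout its own names and types, and callers pick a branch from the s1fmt already stored in the cfg (cd_ptr() here just mirrors the reworked arm_smmu_get_cd_ptr()):

    struct arm_smmu_ctx_desc_cfg {
            union {
                    struct {
                            struct arm_smmu_cd *table;      /* linear array of CDs */
                            unsigned int num_ents;
                    } linear;
                    struct {
                            struct arm_smmu_cdtab_l1 *l1tab;   /* HW-walked L1 table */
                            struct arm_smmu_cdtab_l2 **l2ptrs; /* CPU pointers to leaves */
                            unsigned int num_l1_ents;
                    } l2;
            };
            dma_addr_t cdtab_dma;
            /* used_ssids, in_ste, s1fmt, s1cdmax are unchanged */
    };

    static struct arm_smmu_cd *cd_ptr(struct arm_smmu_ctx_desc_cfg *cd_table, u32 ssid)
    {
            struct arm_smmu_cdtab_l2 *l2;

            if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
                    return &cd_table->linear.table[ssid];
            l2 = cd_table->l2.l2ptrs[arm_smmu_cdtab_l1_idx(ssid)];
            return l2 ? &l2->cds[arm_smmu_cdtab_l2_idx(ssid)] : NULL;
    }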
Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/8-v4-6416877274e1+1af-smmuv3_tidy_jgg@nvidia.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 115 ++++++++++---------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 ++-- 2 files changed, 72 insertions(+), 67 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 062f9800c20d..f9966cf1f5dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1421,19 +1421,19 @@ static dma_addr_t arm_smmu_cd_l1_get_desc(const struct arm_smmu_cdtab_l1 *src) struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) { - struct arm_smmu_l1_ctx_desc *l1_desc; + struct arm_smmu_cdtab_l2 *l2; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (!cd_table->cdtab) + if (!arm_smmu_cdtab_allocated(cd_table)) return NULL;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) - return &((struct arm_smmu_cd *)cd_table->cdtab)[ssid]; + return &cd_table->linear.table[ssid];
- l1_desc = &cd_table->l1_desc[arm_smmu_cdtab_l1_idx(ssid)]; - if (!l1_desc->l2ptr) + l2 = cd_table->l2.l2ptrs[arm_smmu_cdtab_l1_idx(ssid)]; + if (!l2) return NULL; - return &l1_desc->l2ptr->cds[arm_smmu_cdtab_l2_idx(ssid)]; + return &l2->cds[arm_smmu_cdtab_l2_idx(ssid)]; }
static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, @@ -1445,30 +1445,25 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, might_sleep(); iommu_group_mutex_assert(master->dev);
- if (!cd_table->cdtab) { + if (!arm_smmu_cdtab_allocated(cd_table)) { if (arm_smmu_alloc_cd_tables(master)) return NULL; }
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) { unsigned int idx = arm_smmu_cdtab_l1_idx(ssid); - struct arm_smmu_l1_ctx_desc *l1_desc; + struct arm_smmu_cdtab_l2 **l2ptr = &cd_table->l2.l2ptrs[idx];
- l1_desc = &cd_table->l1_desc[idx]; - if (!l1_desc->l2ptr) { - struct arm_smmu_cdtab_l1 *dst; + if (!*l2ptr) { dma_addr_t l2ptr_dma; - size_t size;
- size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd); - l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size, - &l2ptr_dma, - GFP_KERNEL); - if (!l1_desc->l2ptr) + *l2ptr = dma_alloc_coherent(smmu->dev, sizeof(**l2ptr), + &l2ptr_dma, GFP_KERNEL); + if (!*l2ptr) return NULL;
- dst = &((struct arm_smmu_cdtab_l1 *)cd_table->cdtab)[idx]; - arm_smmu_write_cd_l1_desc(dst, l2ptr_dma); + arm_smmu_write_cd_l1_desc(&cd_table->l2.l1tab[idx], + l2ptr_dma); /* An invalid L1CD can be cached */ arm_smmu_sync_cd(master, ssid, false); } @@ -1586,7 +1581,7 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) struct arm_smmu_cd target = {}; struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab) + if (!arm_smmu_cdtab_allocated(&master->cd_table)) return; cdptr = arm_smmu_get_cd_ptr(master, ssid); if (WARN_ON(!cdptr)) @@ -1608,70 +1603,70 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) || max_contexts <= CTXDESC_L2_ENTRIES) { cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR; - cd_table->num_l1_ents = max_contexts; + cd_table->linear.num_ents = max_contexts;
- l1size = max_contexts * sizeof(struct arm_smmu_cd); + l1size = max_contexts * sizeof(struct arm_smmu_cd); + cd_table->linear.table = dma_alloc_coherent(smmu->dev, l1size, + &cd_table->cdtab_dma, + GFP_KERNEL); + if (!cd_table->linear.table) + return -ENOMEM; } else { cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2; - cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts, - CTXDESC_L2_ENTRIES); + cd_table->l2.num_l1_ents = + DIV_ROUND_UP(max_contexts, CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = kcalloc(cd_table->num_l1_ents, - sizeof(*cd_table->l1_desc), - GFP_KERNEL); - if (!cd_table->l1_desc) + cd_table->l2.l2ptrs = kcalloc(cd_table->l2.num_l1_ents, + sizeof(*cd_table->l2.l2ptrs), + GFP_KERNEL); + if (!cd_table->l2.l2ptrs) return -ENOMEM;
- l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cdtab_l1); - } - - cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size, - &cd_table->cdtab_dma, GFP_KERNEL); - if (!cd_table->cdtab) { - dev_warn(smmu->dev, "failed to allocate context descriptor\n"); - ret = -ENOMEM; - goto err_free_l1; + l1size = cd_table->l2.num_l1_ents * sizeof(struct arm_smmu_cdtab_l1); + cd_table->l2.l1tab = dma_alloc_coherent(smmu->dev, l1size, + &cd_table->cdtab_dma, + GFP_KERNEL); + if (!cd_table->l2.l1tab) { + ret = -ENOMEM; + goto err_free_l2ptrs; + } } return 0;
-err_free_l1: - if (cd_table->l1_desc) { - kfree(cd_table->l1_desc); - cd_table->l1_desc = NULL; - } +err_free_l2ptrs: + kfree(cd_table->l2.l2ptrs); + cd_table->l2.l2ptrs = NULL; return ret; }
static void arm_smmu_free_cd_tables(struct arm_smmu_master *master) { int i; - size_t l1size; struct arm_smmu_device *smmu = master->smmu; struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (cd_table->l1_desc) { - for (i = 0; i < cd_table->num_l1_ents; i++) { - dma_addr_t dma_handle; - - if (!cd_table->l1_desc[i].l2ptr) + if (cd_table->s1fmt != STRTAB_STE_0_S1FMT_LINEAR) { + for (i = 0; i < cd_table->l2.num_l1_ents; i++) { + if (!cd_table->l2.l2ptrs[i]) continue;
- dma_handle = arm_smmu_cd_l1_get_desc(&( - (struct arm_smmu_cdtab_l1 *)cd_table->cdtab)[i]); dma_free_coherent(smmu->dev, - sizeof(*cd_table->l1_desc[i].l2ptr), - cd_table->l1_desc[i].l2ptr, - dma_handle); + sizeof(*cd_table->l2.l2ptrs[i]), + cd_table->l2.l2ptrs[i], + arm_smmu_cd_l1_get_desc(&cd_table->l2.l1tab[i])); } - kfree(cd_table->l1_desc); + kfree(cd_table->l2.l2ptrs);
- l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cdtab_l1); + dma_free_coherent(smmu->dev, + cd_table->l2.num_l1_ents * + sizeof(struct arm_smmu_cdtab_l1), + cd_table->l2.l1tab, cd_table->cdtab_dma); } else { - l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cd); + dma_free_coherent(smmu->dev, + cd_table->linear.num_ents * + sizeof(struct arm_smmu_cd), + cd_table->linear.table, cd_table->cdtab_dma); } - - dma_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma); }
/* Stream table manipulation functions */ @@ -3573,7 +3568,7 @@ static void arm_smmu_release_device(struct device *dev)
arm_smmu_disable_pasid(master); arm_smmu_remove_master(master); - if (master->cd_table.cdtab) + if (arm_smmu_cdtab_allocated(&master->cd_table)) arm_smmu_free_cd_tables(master); kfree(master); } diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index fdd9f4555ec2..45cad8ba28c4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -669,15 +669,19 @@ struct arm_smmu_ctx_desc { u16 asid; };
-struct arm_smmu_l1_ctx_desc { - struct arm_smmu_cdtab_l2 *l2ptr; -}; - struct arm_smmu_ctx_desc_cfg { - __le64 *cdtab; + union { + struct { + struct arm_smmu_cd *table; + unsigned int num_ents; + } linear; + struct { + struct arm_smmu_cdtab_l1 *l1tab; + struct arm_smmu_cdtab_l2 **l2ptrs; + unsigned int num_l1_ents; + } l2; + }; dma_addr_t cdtab_dma; - struct arm_smmu_l1_ctx_desc *l1_desc; - unsigned int num_l1_ents; unsigned int used_ssids; u8 in_ste; u8 s1fmt; @@ -685,6 +689,12 @@ struct arm_smmu_ctx_desc_cfg { u8 s1cdmax; };
+static inline bool +arm_smmu_cdtab_allocated(struct arm_smmu_ctx_desc_cfg *cfg) +{ + return cfg->linear.table || cfg->l2.l1tab; +} + /* True if the cd table has SSIDS > 0 in use. */ static inline bool arm_smmu_ssids_in_use(struct arm_smmu_ctx_desc_cfg *cd_table) {
From: Nicolin Chen nicolinc@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 1d4684fbe88dc28e2bf79f5e94a432f0469d2dac category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Reorder include files to alphabetic order to simplify maintenance, and separate local headers and global headers with a blank line.
No functional change intended.
Link: https://patch.msgid.link/r/7524b037cc05afe19db3c18f863253e1d1554fa2.17226448... Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/device.c | 4 ++-- drivers/iommu/iommufd/fault.c | 4 ++-- drivers/iommu/iommufd/io_pagetable.c | 8 ++++---- drivers/iommu/iommufd/io_pagetable.h | 2 +- drivers/iommu/iommufd/ioas.c | 2 +- drivers/iommu/iommufd/iommufd_private.h | 9 +++++---- drivers/iommu/iommufd/iommufd_test.h | 2 +- drivers/iommu/iommufd/iova_bitmap.c | 2 +- drivers/iommu/iommufd/main.c | 8 ++++---- drivers/iommu/iommufd/pages.c | 10 +++++----- drivers/iommu/iommufd/selftest.c | 9 +++++---- include/linux/iommufd.h | 4 ++-- include/uapi/linux/iommufd.h | 2 +- 13 files changed, 34 insertions(+), 32 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 9a7ec5997c61..11573a84f68a 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -1,12 +1,12 @@ // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES */ +#include <linux/iommu.h> #include <linux/iommufd.h> #include <linux/slab.h> -#include <linux/iommu.h> #include <uapi/linux/iommufd.h> -#include "../iommu-priv.h"
+#include "../iommu-priv.h" #include "io_pagetable.h" #include "iommufd_private.h"
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index a643d5c7c535..df03411c8728 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -3,14 +3,14 @@ */ #define pr_fmt(fmt) "iommufd: " fmt
+#include <linux/anon_inodes.h> #include <linux/file.h> #include <linux/fs.h> +#include <linux/iommufd.h> #include <linux/module.h> #include <linux/mutex.h> -#include <linux/iommufd.h> #include <linux/pci.h> #include <linux/poll.h> -#include <linux/anon_inodes.h> #include <uapi/linux/iommufd.h>
#include "../iommu-priv.h" diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c index 9f193c933de6..4bf7ccd39d46 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -8,17 +8,17 @@ * The datastructure uses the iopt_pages to optimize the storage of the PFNs * between the domains and xarray. */ +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/iommu.h> #include <linux/iommufd.h> #include <linux/lockdep.h> -#include <linux/iommu.h> #include <linux/sched/mm.h> -#include <linux/err.h> #include <linux/slab.h> -#include <linux/errno.h> #include <uapi/linux/iommufd.h>
-#include "io_pagetable.h" #include "double_span.h" +#include "io_pagetable.h"
struct iopt_pages_list { struct iopt_pages *pages; diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h index 0ec3509b7e33..c61d74471684 100644 --- a/drivers/iommu/iommufd/io_pagetable.h +++ b/drivers/iommu/iommufd/io_pagetable.h @@ -6,8 +6,8 @@ #define __IO_PAGETABLE_H
#include <linux/interval_tree.h> -#include <linux/mutex.h> #include <linux/kref.h> +#include <linux/mutex.h> #include <linux/xarray.h>
#include "iommufd_private.h" diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c index 157a89b993e4..2c4b2bb11e78 100644 --- a/drivers/iommu/iommufd/ioas.c +++ b/drivers/iommu/iommufd/ioas.c @@ -3,8 +3,8 @@ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES */ #include <linux/interval_tree.h> -#include <linux/iommufd.h> #include <linux/iommu.h> +#include <linux/iommufd.h> #include <uapi/linux/iommufd.h>
#include "io_pagetable.h" diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 92efe30a8f0d..017e50574f3b 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -4,13 +4,14 @@ #ifndef __IOMMUFD_PRIVATE_H #define __IOMMUFD_PRIVATE_H
-#include <linux/rwsem.h> -#include <linux/xarray.h> -#include <linux/refcount.h> -#include <linux/uaccess.h> #include <linux/iommu.h> #include <linux/iova_bitmap.h> +#include <linux/refcount.h> +#include <linux/rwsem.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> #include <uapi/linux/iommufd.h> + #include "../iommu-priv.h"
struct iommu_domain; diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index acbbba1c6671..f4bc23a92f9a 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -4,8 +4,8 @@ #ifndef _UAPI_IOMMUFD_TEST_H #define _UAPI_IOMMUFD_TEST_H
-#include <linux/types.h> #include <linux/iommufd.h> +#include <linux/types.h>
enum { IOMMU_TEST_OP_ADD_RESERVED = 1, diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c index 8068cd20f41f..ca5a1757adcd 100644 --- a/drivers/iommu/iommufd/iova_bitmap.c +++ b/drivers/iommu/iommufd/iova_bitmap.c @@ -3,10 +3,10 @@ * Copyright (c) 2022, Oracle and/or its affiliates. * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved */ +#include <linux/highmem.h> #include <linux/iova_bitmap.h> #include <linux/mm.h> #include <linux/slab.h> -#include <linux/highmem.h>
#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_BYTE)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 83bbd7c5d160..b5f5d27ee963 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -8,15 +8,15 @@ */ #define pr_fmt(fmt) "iommufd: " fmt
+#include <linux/bug.h> #include <linux/file.h> #include <linux/fs.h> -#include <linux/module.h> -#include <linux/slab.h> +#include <linux/iommufd.h> #include <linux/miscdevice.h> +#include <linux/module.h> #include <linux/mutex.h> -#include <linux/bug.h> +#include <linux/slab.h> #include <uapi/linux/iommufd.h> -#include <linux/iommufd.h>
#include "io_pagetable.h" #include "iommufd_private.h" diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c index 528f356238b3..e4b484e87943 100644 --- a/drivers/iommu/iommufd/pages.c +++ b/drivers/iommu/iommufd/pages.c @@ -45,16 +45,16 @@ * last_iova + 1 can overflow. An iopt_pages index will always be much less than * ULONG_MAX so last_index + 1 cannot overflow. */ +#include <linux/highmem.h> +#include <linux/iommu.h> +#include <linux/iommufd.h> +#include <linux/kthread.h> #include <linux/overflow.h> #include <linux/slab.h> -#include <linux/iommu.h> #include <linux/sched/mm.h> -#include <linux/highmem.h> -#include <linux/kthread.h> -#include <linux/iommufd.h>
-#include "io_pagetable.h" #include "double_span.h" +#include "io_pagetable.h"
#ifndef CONFIG_IOMMUFD_TEST #define TEMP_MEMORY_LIMIT 65536 diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 1133f1b2362f..d53e579013fd 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -3,13 +3,14 @@ * * Kernel side components to support tools/testing/selftests/iommu */ -#include <linux/slab.h> -#include <linux/iommu.h> -#include <linux/xarray.h> -#include <linux/file.h> #include <linux/anon_inodes.h> +#include <linux/debugfs.h> #include <linux/fault-inject.h> +#include <linux/file.h> +#include <linux/iommu.h> #include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/xarray.h> #include <uapi/linux/iommufd.h>
#include "../iommu-priv.h" diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index ffc3a949f837..c2f2f6b9148e 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -6,9 +6,9 @@ #ifndef __LINUX_IOMMUFD_H #define __LINUX_IOMMUFD_H
-#include <linux/types.h> -#include <linux/errno.h> #include <linux/err.h> +#include <linux/errno.h> +#include <linux/types.h>
struct device; struct iommufd_device; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index e31385b75d0b..7079269dd265 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -4,8 +4,8 @@ #ifndef _UAPI_IOMMUFD_H #define _UAPI_IOMMUFD_H
-#include <linux/types.h> #include <linux/ioctl.h> +#include <linux/types.h>
#define IOMMUFD_TYPE (';')
From: Nicolin Chen nicolinc@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit 3e6a7e3cda773adacc2fa4b9623d0a8c0f904d50 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Reorder struct forward declarations to alphabetic order to simplify maintenance, as upcoming patches will add more to the list.
No functional change intended.
Link: https://patch.msgid.link/r/c5dd87100f6f01389b838c63237e28c5dd373358.17247763... Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommufd.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index c2f2f6b9148e..30f832a60ccb 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -11,12 +11,12 @@ #include <linux/types.h>
struct device; -struct iommufd_device; -struct page; -struct iommufd_ctx; -struct iommufd_access; struct file; struct iommu_group; +struct iommufd_access; +struct iommufd_ctx; +struct iommufd_device; +struct page;
struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, struct device *dev, u32 *id);
From: Nicolin Chen nicolinc@nvidia.com
mainline inclusion from mainline-v6.11-rc5 commit 950aeefb34923fe3c28ade35fe05f24e2c5b1d55 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The rewind routine should remove the reserved iovas added to the new hwpt.
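In other words (a two-line view of the unwind path, matching the one-line diff below): on this error path the reserved IOVAs were just enforced for the new hwpt, so the rollback must target the new hwpt's paging side rather than the old one.

    err_unresv:
            if (hwpt_is_paging(hwpt))
                    iommufd_group_remove_reserved_iova(igroup,
                                                       to_hwpt_paging(hwpt)); /* was old_hwpt */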
Fixes: 89db31635c87 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/20240718050130.1956804-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 11573a84f68a..895f2a59fde1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -526,7 +526,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, err_unresv: if (hwpt_is_paging(hwpt)) iommufd_group_remove_reserved_iova(igroup, - to_hwpt_paging(old_hwpt)); + to_hwpt_paging(hwpt)); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc);
From: Nicolin Chen nicolinc@nvidia.com
mainline inclusion from mainline-v6.12-rc1 commit b2f44814680b569be98e58111bd582fd3a689d4d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Currently, device reserved regions are only enforced when the device is attached to an hwpt_paging. In other words, if the device gets attached to an hwpt_nested directly, the parent hwpt_paging of the hwpt_nested would not enforce those reserved IOVAs. This works for most reserved region types, but not for IOMMU_RESV_SW_MSI, which is a unique software-defined window required in the nesting case too, to set up an MSI doorbell on the parent stage-2 hwpt/domain.

Kevin pointed out in [1] that: 1) there is no usage yet that comes close to using up the entire IOVA space,

2) the guest may change the viommu mode to switch between nested and paging, and then the VMM has to take all devices' reserved regions into consideration anyway when composing the GPA space.
So it would be actually convenient for us to also enforce reserved IOVA onto the parent hwpt_paging, when attaching a device to an hwpt_nested.
Repurpose the existing attach/replace_paging helpers to attach device's reserved IOVAs exclusively.
Add a new find_hwpt_paging helper, which is only used by these reserved IOVA functions, to allow an IOMMUFD_OBJ_HWPT_NESTED hwpt to redirect to its parent hwpt_paging. Return a NULL in these two helpers for any new HWPT type in the future.
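A condensed view of the helper added below: a nested HWPT is redirected to its parent paging HWPT, and any future HWPT type deliberately falls through to NULL so the reserved-IOVA paths simply skip it.

    static inline struct iommufd_hwpt_paging *
    find_hwpt_paging(struct iommufd_hw_pagetable *hwpt)
    {
            switch (hwpt->obj.type) {
            case IOMMUFD_OBJ_HWPT_PAGING:
                    return to_hwpt_paging(hwpt);
            case IOMMUFD_OBJ_HWPT_NESTED:
                    /* enforce the reserved IOVAs on the parent stage-2 domain too */
                    return to_hwpt_nested(hwpt)->parent;
            default:
                    return NULL;
            }
    }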
Link: https://patch.msgid.link/r/20240807003446.3740368-1-nicolinc@nvidia.com Link: https://lore.kernel.org/all/BN9PR11MB5276497781C96415272E6FED8CB12@BN9PR11MB... #1 Suggested-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/device.c | 52 ++++++++++++------------- drivers/iommu/iommufd/iommufd_private.h | 19 +++++++++ 2 files changed, 45 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 895f2a59fde1..5fd3dd420290 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -327,8 +327,9 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup, return 0; }
-static int iommufd_hwpt_paging_attach(struct iommufd_hwpt_paging *hwpt_paging, - struct iommufd_device *idev) +static int +iommufd_device_attach_reserved_iova(struct iommufd_device *idev, + struct iommufd_hwpt_paging *hwpt_paging) { int rc;
@@ -354,6 +355,7 @@ static int iommufd_hwpt_paging_attach(struct iommufd_hwpt_paging *hwpt_paging, int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { + struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt); int rc;
mutex_lock(&idev->igroup->lock); @@ -363,8 +365,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, goto err_unlock; }
- if (hwpt_is_paging(hwpt)) { - rc = iommufd_hwpt_paging_attach(to_hwpt_paging(hwpt), idev); + if (hwpt_paging) { + rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging); if (rc) goto err_unlock; } @@ -387,9 +389,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, mutex_unlock(&idev->igroup->lock); return 0; err_unresv: - if (hwpt_is_paging(hwpt)) - iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, - idev->dev); + if (hwpt_paging) + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev); err_unlock: mutex_unlock(&idev->igroup->lock); return rc; @@ -399,6 +400,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev) { struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt; + struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
mutex_lock(&idev->igroup->lock); list_del(&idev->group_item); @@ -406,9 +408,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) iommufd_hwpt_detach_device(hwpt, idev); idev->igroup->hwpt = NULL; } - if (hwpt_is_paging(hwpt)) - iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, - idev->dev); + if (hwpt_paging) + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev); mutex_unlock(&idev->igroup->lock);
/* Caller must destroy hwpt */ @@ -440,17 +441,17 @@ iommufd_group_remove_reserved_iova(struct iommufd_group *igroup, }
static int -iommufd_group_do_replace_paging(struct iommufd_group *igroup, - struct iommufd_hwpt_paging *hwpt_paging) +iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup, + struct iommufd_hwpt_paging *hwpt_paging) { - struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt; + struct iommufd_hwpt_paging *old_hwpt_paging; struct iommufd_device *cur; int rc;
lockdep_assert_held(&igroup->lock);
- if (!hwpt_is_paging(old_hwpt) || - hwpt_paging->ioas != to_hwpt_paging(old_hwpt)->ioas) { + old_hwpt_paging = find_hwpt_paging(igroup->hwpt); + if (!old_hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas) { list_for_each_entry(cur, &igroup->device_list, group_item) { rc = iopt_table_enforce_dev_resv_regions( &hwpt_paging->ioas->iopt, cur->dev, NULL); @@ -473,6 +474,8 @@ static struct iommufd_hw_pagetable * iommufd_device_do_replace(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt) { + struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt); + struct iommufd_hwpt_paging *old_hwpt_paging; struct iommufd_group *igroup = idev->igroup; struct iommufd_hw_pagetable *old_hwpt; unsigned int num_devices; @@ -491,9 +494,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, }
old_hwpt = igroup->hwpt; - if (hwpt_is_paging(hwpt)) { - rc = iommufd_group_do_replace_paging(igroup, - to_hwpt_paging(hwpt)); + if (hwpt_paging) { + rc = iommufd_group_do_replace_reserved_iova(igroup, hwpt_paging); if (rc) goto err_unlock; } @@ -502,11 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev, if (rc) goto err_unresv;
- if (hwpt_is_paging(old_hwpt) && - (!hwpt_is_paging(hwpt) || - to_hwpt_paging(hwpt)->ioas != to_hwpt_paging(old_hwpt)->ioas)) - iommufd_group_remove_reserved_iova(igroup, - to_hwpt_paging(old_hwpt)); + old_hwpt_paging = find_hwpt_paging(old_hwpt); + if (old_hwpt_paging && + (!hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas)) + iommufd_group_remove_reserved_iova(igroup, old_hwpt_paging);
igroup->hwpt = hwpt;
@@ -524,9 +525,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, /* Caller must destroy old_hwpt */ return old_hwpt; err_unresv: - if (hwpt_is_paging(hwpt)) - iommufd_group_remove_reserved_iova(igroup, - to_hwpt_paging(hwpt)); + if (hwpt_paging) + iommufd_group_remove_reserved_iova(igroup, hwpt_paging); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 017e50574f3b..5d3768d77099 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -325,6 +325,25 @@ to_hwpt_paging(struct iommufd_hw_pagetable *hwpt) return container_of(hwpt, struct iommufd_hwpt_paging, common); }
+static inline struct iommufd_hwpt_nested * +to_hwpt_nested(struct iommufd_hw_pagetable *hwpt) +{ + return container_of(hwpt, struct iommufd_hwpt_nested, common); +} + +static inline struct iommufd_hwpt_paging * +find_hwpt_paging(struct iommufd_hw_pagetable *hwpt) +{ + switch (hwpt->obj.type) { + case IOMMUFD_OBJ_HWPT_PAGING: + return to_hwpt_paging(hwpt); + case IOMMUFD_OBJ_HWPT_NESTED: + return to_hwpt_nested(hwpt)->parent; + default: + return NULL; + } +} + static inline struct iommufd_hwpt_paging * iommufd_get_hwpt_paging(struct iommufd_ucmd *ucmd, u32 id) {
From: Suravee Suthikulpanit suravee.suthikulpanit@amd.com
mainline inclusion from mainline-v6.9-rc1 commit 6f35fe5d8a0ad0125c52fb20f5d67000e369eb3a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Introduce get_amd_iommu_from_dev() and get_amd_iommu_from_dev_data(), and replace rlookup_amd_iommu() with the new helper functions where applicable, to avoid an unnecessary loop to look up struct amd_iommu from struct device.
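The generic helper added below wraps the dev->iommu->iommu_dev lookup in a container_of(); the AMD wrappers named in this message would plausibly be thin macros on top of it, along these lines (a sketch only -- the member names of struct amd_iommu and struct iommu_dev_data are assumptions, since the AMD-side hunks are not shown here):

    /* Hypothetical sketch built on iommu_get_iommu_dev() from the diff below */
    #define get_amd_iommu_from_dev(dev) \
            iommu_get_iommu_dev(dev, struct amd_iommu, iommu)

    #define get_amd_iommu_from_dev_data(dev_data) \
            get_amd_iommu_from_dev((dev_data)->dev)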
Suggested-by: Jason Gunthorpe jgg@ziepe.ca Signed-off-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Vasant Hegde vasant.hegde@amd.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20240205115615.6053-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommu.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 41b3014230bc..8c6dc1018664 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -863,6 +863,22 @@ static inline struct iommu_device *dev_to_iommu_device(struct device *dev) return (struct iommu_device *)dev_get_drvdata(dev); }
+/** + * iommu_get_iommu_dev - Get iommu_device for a device + * @dev: an end-point device + * + * Note that this function must be called from the iommu_ops + * to retrieve the iommu_device for a device, which the core code + * guarentees it will not invoke the op without an attached iommu. + */ +static inline struct iommu_device *__iommu_get_iommu_dev(struct device *dev) +{ + return dev->iommu->iommu_dev; +} + +#define iommu_get_iommu_dev(dev, type, member) \ + container_of(__iommu_get_iommu_dev(dev), type, member) + static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) { *gather = (struct iommu_iotlb_gather) {
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.10-rc1 commit da55da5a42d4247d7a48b843fa5fcd9a4a10f4fe category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It turns out kconfig has problems ensuring the SMMU module and the KUNIT module are consistently y/m to allow linking. It will permit KUNIT to be a module while SMMU is built in.
Also, Fedora apparently enables kunit on production kernels.
So, put the entire kunit test in its own module using the VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT machinery. This keeps it out of vmlinux on Fedora and makes the kconfig work in the normal way. There is no cost if kunit is disabled.
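For context, the VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT markers used throughout the diff come from <kunit/visibility.h> and conceptually expand along these lines (a simplified sketch, not the verbatim header):

    #if IS_ENABLED(CONFIG_KUNIT)
    /* Symbol keeps external linkage and is exported into a kunit-only namespace,
     * so the separate test module can reference it after MODULE_IMPORT_NS().
     */
    #define VISIBLE_IF_KUNIT
    #define EXPORT_SYMBOL_IF_KUNIT(symbol) \
            EXPORT_SYMBOL_NS(symbol, EXPORTED_FOR_KUNIT_TESTING)
    #else
    /* Tests disabled: the symbol stays static and nothing is exported */
    #define VISIBLE_IF_KUNIT static
    #define EXPORT_SYMBOL_IF_KUNIT(symbol)
    #endif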
Fixes: 56e1a4cc2588 ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry") Reported-by: Thorsten Leemhuis linux@leemhuis.info Link: https://lore.kernel.org/all/aeea8546-5bce-4c51-b506-5d2008e52fef@leemhuis.in... Signed-off-by: Jason Gunthorpe jgg@nvidia.com Tested-by: Thorsten Leemhuis linux@leemhuis.info Acked-by: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/0-v1-24cba6c0f404+2ae-smmu_kunit_module_jgg@nvidia... Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/Kconfig | 2 +- drivers/iommu/arm/arm-smmu-v3/Makefile | 3 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 3 +++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++++++++ 5 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index b035e042abb1..9a28b0f824e8 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -438,7 +438,7 @@ config ARM_SMMU_V3_ECMDQ If not sure, say no.
config ARM_SMMU_V3_KUNIT_TEST - bool "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS + tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS depends on KUNIT depends on ARM_SMMU_V3_SVA default KUNIT_ALL_TESTS diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index feada86b1cc7..306f4002ae8f 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -3,5 +3,6 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o arm_smmu_v3-objs-y += arm-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_HISI_VIRTCCA_CODA) += arm-s-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o -arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) + +obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 5ab49b59a22e..9342fac71801 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -118,6 +118,7 @@ void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, */ target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_sva_cd);
/* * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c index ee90dee4865c..e0fce31eba54 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c @@ -565,3 +565,6 @@ static struct kunit_suite arm_smmu_v3_test_module = { .test_cases = arm_smmu_v3_test_cases, }; kunit_test_suites(&arm_smmu_v3_test_module); + +MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f9966cf1f5dd..2fe4a147be76 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1243,6 +1243,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) if (cfg == STRTAB_STE_0_CFG_BYPASS) used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used);
/* * Figure out if we can do a hitless update of entry to become target. Returns a @@ -1377,6 +1378,7 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry, entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS)); } } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry);
static void arm_smmu_sync_cd(struct arm_smmu_master *master, int ssid, bool leaf) @@ -1496,6 +1498,7 @@ void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); } } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_cd_used);
static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer) { @@ -1575,6 +1578,7 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, CTXDESC_CD_1_TTB0_MASK); target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_s1_cd);
void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) { @@ -1742,6 +1746,7 @@ void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) STRTAB_STE_0_V | FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_abort_ste);
VISIBLE_IF_KUNIT void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, @@ -1756,6 +1761,7 @@ void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, target->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_bypass_ste);
VISIBLE_IF_KUNIT void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, @@ -1813,6 +1819,7 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0)); } } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_cdtable_ste);
VISIBLE_IF_KUNIT void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, @@ -1861,6 +1868,7 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr & STRTAB_STE_3_S2TTB_MASK); } +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_s2_domain_ste);
/* * This can safely directly manipulate the STE memory without a sync sequence
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
mainline inclusion from mainline-v6.11-rc1 commit 16c0bad7ae04e4a1e7361fbb91573248de06a008 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The *-objs suffix is reserved for (user-space) host programs, while the *-y suffix is normally used for kernel drivers (although *-objs still works for that purpose for now).
Let's correct the old usages of *-objs in Makefiles.
Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20240508151611.1444352-1-andriy.shevchenko@linux.i... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/Makefile | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 306f4002ae8f..82beca51e6b2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o -arm_smmu_v3-objs-y += arm-smmu-v3.o -arm_smmu_v3-objs-$(CONFIG_HISI_VIRTCCA_CODA) += arm-s-smmu-v3.o -arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o -arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) +arm_smmu_v3-y := arm-smmu-v3.o +arm_smmu_v3-$(CONFIG_HISI_VIRTCCA_HOST) += arm-s-smmu-v3.o +arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
From: Jason Gunthorpe jgg@nvidia.com
mainline inclusion from mainline-v6.11-rc1 commit 136a8066676e593cd29627219467fc222c8f3b04 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Relying on position in the enum makes merge resolutions and backporting subtly harder: it is easy to grab a patch and not notice it is a uAPI change with a differently ordered enum. This may become a bigger problem in upcoming cycles when iommu_hwpt_invalidate_data_type and other per-driver enums have patches flowing through different trees.
So let's start including explicit constants for all the uAPI enums to make this safer.
No functional change.
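A minimal illustration of the hazard and the fix (names are made up for this sketch, not part of the uAPI):

    /* Positional values: a backport that inserts a value in the middle
     * silently renumbers everything after it, breaking existing userspace. */
    enum example_cmds_positional {
        EXAMPLE_CMD_BASE = 0x80,
        EXAMPLE_CMD_FIRST = EXAMPLE_CMD_BASE,
        /* EXAMPLE_CMD_NEW,   <- a mis-ordered backport inserting here... */
        EXAMPLE_CMD_SECOND,     /* ...silently shifts this value */
    };

    /* Explicit constants keep the ABI stable and make such a mistake
     * obvious in review: */
    enum example_cmds_explicit {
        EXAMPLE_CMD_FIRST_V2  = 0x80,
        EXAMPLE_CMD_SECOND_V2 = 0x81,   /* stays 0x81 regardless of ordering */
    };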
Link: https://lore.kernel.org/r/0-v1-2c06ec044924+133-iommufd_uapi_const_jgg@nvidi... Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Yi Liu yi.l.liu@intel.com Tested-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/uapi/linux/iommufd.h | 40 ++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 7079269dd265..72010f71c5e4 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -37,20 +37,20 @@ enum { IOMMUFD_CMD_BASE = 0x80, IOMMUFD_CMD_DESTROY = IOMMUFD_CMD_BASE, - IOMMUFD_CMD_IOAS_ALLOC, - IOMMUFD_CMD_IOAS_ALLOW_IOVAS, - IOMMUFD_CMD_IOAS_COPY, - IOMMUFD_CMD_IOAS_IOVA_RANGES, - IOMMUFD_CMD_IOAS_MAP, - IOMMUFD_CMD_IOAS_UNMAP, - IOMMUFD_CMD_OPTION, - IOMMUFD_CMD_VFIO_IOAS, - IOMMUFD_CMD_HWPT_ALLOC, - IOMMUFD_CMD_GET_HW_INFO, - IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING, - IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP, - IOMMUFD_CMD_HWPT_INVALIDATE, - IOMMUFD_CMD_FAULT_QUEUE_ALLOC, + IOMMUFD_CMD_IOAS_ALLOC = 0x81, + IOMMUFD_CMD_IOAS_ALLOW_IOVAS = 0x82, + IOMMUFD_CMD_IOAS_COPY = 0x83, + IOMMUFD_CMD_IOAS_IOVA_RANGES = 0x84, + IOMMUFD_CMD_IOAS_MAP = 0x85, + IOMMUFD_CMD_IOAS_UNMAP = 0x86, + IOMMUFD_CMD_OPTION = 0x87, + IOMMUFD_CMD_VFIO_IOAS = 0x88, + IOMMUFD_CMD_HWPT_ALLOC = 0x89, + IOMMUFD_CMD_GET_HW_INFO = 0x8a, + IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING = 0x8b, + IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c, + IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, + IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, };
/** @@ -400,8 +400,8 @@ struct iommu_hwpt_vtd_s1 { * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table */ enum iommu_hwpt_data_type { - IOMMU_HWPT_DATA_NONE, - IOMMU_HWPT_DATA_VTD_S1, + IOMMU_HWPT_DATA_NONE = 0, + IOMMU_HWPT_DATA_VTD_S1 = 1, };
/** @@ -491,8 +491,8 @@ struct iommu_hw_info_vtd { * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type */ enum iommu_hw_info_type { - IOMMU_HW_INFO_TYPE_NONE, - IOMMU_HW_INFO_TYPE_INTEL_VTD, + IOMMU_HW_INFO_TYPE_NONE = 0, + IOMMU_HW_INFO_TYPE_INTEL_VTD = 1, };
/** @@ -629,7 +629,7 @@ struct iommu_hwpt_get_dirty_bitmap { * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1 */ enum iommu_hwpt_invalidate_data_type { - IOMMU_HWPT_INVALIDATE_DATA_VTD_S1, + IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0, };
/** @@ -768,7 +768,7 @@ struct iommu_hwpt_pgfault { */ enum iommufd_page_response_code { IOMMUFD_PAGE_RESP_SUCCESS = 0, - IOMMUFD_PAGE_RESP_INVALID, + IOMMUFD_PAGE_RESP_INVALID = 1, };
/**
From: Pranjal Shrivastava praan@google.com
mainline inclusion from mainline-v6.12-rc1 commit 8c153ef95697242b72646d2c4cf6c4b23ccf35a3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The iommu_report_device_fault function was updated to return void while assuming that drivers only need to call iommu_report_device_fault() for reporting an iopf. This implementation causes the following problems:
1. The drivers rely on the core code to call their page_response op; however, when a fault is received and no fault-capable domain is attached / iopf_param is NULL, ops->page_response is NOT called, causing the device to stall if the fault type was PAGE_REQ.
2. The arm_smmu_v3 driver relies on the returned value to log errors; returning void from iommu_report_device_fault causes these events to be missed while logging.
Modify iommu_report_device_fault to return -EINVAL when no fault-capable domain is attached or iopf_param is NULL, and to call back into the driver (ops->page_response) when the fault type is IOMMU_FAULT_PAGE_REQ. The returned value can then be used by the drivers to log the fault/event as needed.
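As a rough sketch (not the actual arm-smmu-v3 code), a driver event handler can now use the return value along these lines:

    #include <linux/device.h>
    #include <linux/iommu.h>

    /* Hypothetical driver event handler using the new return value */
    static int example_handle_evt(struct device *dev, struct iopf_fault *evt)
    {
        int ret = iommu_report_device_fault(dev, evt);

        if (ret)
            /* No fault-capable domain or iopf params were set up */
            dev_warn_ratelimited(dev, "unhandled IOPF (type %u): %d\n",
                                 evt->fault.type, ret);
        return ret;
    }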
Reported-by: Kunkun Jiang jiangkunkun@huawei.com Closes: https://lore.kernel.org/all/6147caf0-b9a0-30ca-795e-a1aa502a5c51@huawei.com/ Fixes: 3dfa64aecbaf ("iommu: Make iommu_report_device_fault() return void") Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Pranjal Shrivastava praan@google.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240816104906.1010626-1-praan@google.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +- drivers/iommu/io-pgfault.c | 121 ++++++++++++++------ include/linux/iommu.h | 5 +- 3 files changed, 87 insertions(+), 41 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2fe4a147be76..2c7ab074cc2b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2003,7 +2003,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) goto out_unlock; }
- iommu_report_device_fault(master->dev, &fault_evt); + ret = iommu_report_device_fault(master->dev, &fault_evt); out_unlock: mutex_unlock(&smmu->streams_mutex); return ret; diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index 81e9cc6e3164..4674e618797c 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -115,6 +115,59 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param, return group; }
+static struct iommu_attach_handle *find_fault_handler(struct device *dev, + struct iopf_fault *evt) +{ + struct iommu_fault *fault = &evt->fault; + struct iommu_attach_handle *attach_handle; + + if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) { + attach_handle = iommu_attach_handle_get(dev->iommu_group, + fault->prm.pasid, 0); + if (IS_ERR(attach_handle)) { + const struct iommu_ops *ops = dev_iommu_ops(dev); + + if (!ops->user_pasid_table) + return NULL; + /* + * The iommu driver for this device supports user- + * managed PASID table. Therefore page faults for + * any PASID should go through the NESTING domain + * attached to the device RID. + */ + attach_handle = iommu_attach_handle_get( + dev->iommu_group, IOMMU_NO_PASID, + IOMMU_DOMAIN_NESTED); + if (IS_ERR(attach_handle)) + return NULL; + } + } else { + attach_handle = iommu_attach_handle_get(dev->iommu_group, + IOMMU_NO_PASID, 0); + + if (IS_ERR(attach_handle)) + return NULL; + } + + if (!attach_handle->domain->iopf_handler) + return NULL; + + return attach_handle; +} + +static void iopf_error_response(struct device *dev, struct iopf_fault *evt) +{ + const struct iommu_ops *ops = dev_iommu_ops(dev); + struct iommu_fault *fault = &evt->fault; + struct iommu_page_response resp = { + .pasid = fault->prm.pasid, + .grpid = fault->prm.grpid, + .code = IOMMU_PAGE_RESP_INVALID + }; + + ops->page_response(dev, evt, &resp); +} + /** * iommu_report_device_fault() - Report fault event to device driver * @dev: the device @@ -153,24 +206,39 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param, * handling framework should guarantee that the iommu domain could only be * freed after the device has stopped generating page faults (or the iommu * hardware has been set to block the page faults) and the pending page faults - * have been flushed. + * have been flushed. In case no page fault handler is attached or no iopf params + * are setup, then the ops->page_response() is called to complete the evt. + * + * Returns 0 on success, or an error in case of a bad/failed iopf setup. */ -void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) +int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) { + struct iommu_attach_handle *attach_handle; struct iommu_fault *fault = &evt->fault; struct iommu_fault_param *iopf_param; struct iopf_group abort_group = {}; struct iopf_group *group;
+ attach_handle = find_fault_handler(dev, evt); + if (!attach_handle) + goto err_bad_iopf; + + /* + * Something has gone wrong if a fault capable domain is attached but no + * iopf_param is setup + */ iopf_param = iopf_get_dev_fault_param(dev); if (WARN_ON(!iopf_param)) - return; + goto err_bad_iopf;
if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { - report_partial_fault(iopf_param, fault); + int ret; + + ret = report_partial_fault(iopf_param, fault); iopf_put_dev_fault_param(iopf_param); /* A request that is not the last does not need to be ack'd */ - return; + + return ret; }
/* @@ -185,38 +253,7 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) if (group == &abort_group) goto err_abort;
- if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) { - group->attach_handle = iommu_attach_handle_get(dev->iommu_group, - fault->prm.pasid, - 0); - if (IS_ERR(group->attach_handle)) { - const struct iommu_ops *ops = dev_iommu_ops(dev); - - if (!ops->user_pasid_table) - goto err_abort; - - /* - * The iommu driver for this device supports user- - * managed PASID table. Therefore page faults for - * any PASID should go through the NESTING domain - * attached to the device RID. - */ - group->attach_handle = - iommu_attach_handle_get(dev->iommu_group, - IOMMU_NO_PASID, - IOMMU_DOMAIN_NESTED); - if (IS_ERR(group->attach_handle)) - goto err_abort; - } - } else { - group->attach_handle = - iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0); - if (IS_ERR(group->attach_handle)) - goto err_abort; - } - - if (!group->attach_handle->domain->iopf_handler) - goto err_abort; + group->attach_handle = attach_handle;
/* * On success iopf_handler must call iopf_group_response() and @@ -225,7 +262,7 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) if (group->attach_handle->domain->iopf_handler(group)) goto err_abort;
- return; + return 0;
err_abort: dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n", @@ -235,6 +272,14 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) __iopf_free_group(group); else iopf_free_group(group); + + return 0; + +err_bad_iopf: + if (fault->type == IOMMU_FAULT_PAGE_REQ) + iopf_error_response(dev, evt); + + return -EINVAL; } EXPORT_SYMBOL_GPL(iommu_report_device_fault);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 8c6dc1018664..0150dc2bec6a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1749,7 +1749,7 @@ struct iopf_queue *iopf_queue_alloc(const char *name); void iopf_queue_free(struct iopf_queue *queue); int iopf_queue_discard_partial(struct iopf_queue *queue); void iopf_free_group(struct iopf_group *group); -void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt); +int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt); void iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status); #else @@ -1787,9 +1787,10 @@ static inline void iopf_free_group(struct iopf_group *group) { }
-static inline void +static inline int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) { + return -ENODEV; }
static inline void iopf_group_response(struct iopf_group *group,
From: Nicolin Chen nicolinc@nvidia.com
mainline inclusion from mainline-v6.9-rc1 commit category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The previous IOMMUFD_OBJ_HW_PAGETABLE has been reworked into two separate objects, IOMMUFD_OBJ_HWPT_PAGING and IOMMUFD_OBJ_HWPT_NESTED, in order to support a nested translation context.
Update the doc to reflect the new design and match the latest iommufd APIs and uAPIs.
Link: https://patch.msgid.link/r/20240913052519.2153-1-nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Bagas Sanjaya bagasdotme@gmail.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- Documentation/userspace-api/iommufd.rst | 134 +++++++++++++++--------- 1 file changed, 87 insertions(+), 47 deletions(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index aa004faed5fd..2deba93bf159 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -41,46 +41,65 @@ Following IOMMUFD objects are exposed to userspace: - IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an external driver.
-- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table - (i.e. a single struct iommu_domain) managed by the iommu driver. - - The IOAS has a list of HW_PAGETABLES that share the same IOVA mapping and - it will synchronize its mapping with each member HW_PAGETABLE. +- IOMMUFD_OBJ_HWPT_PAGING, representing an actual hardware I/O page table + (i.e. a single struct iommu_domain) managed by the iommu driver. "PAGING" + primarly indicates this type of HWPT should be linked to an IOAS. It also + indicates that it is backed by an iommu_domain with __IOMMU_DOMAIN_PAGING + feature flag. This can be either an UNMANAGED stage-1 domain for a device + running in the user space, or a nesting parent stage-2 domain for mappings + from guest-level physical addresses to host-level physical addresses. + + The IOAS has a list of HWPT_PAGINGs that share the same IOVA mapping and + it will synchronize its mapping with each member HWPT_PAGING. + +- IOMMUFD_OBJ_HWPT_NESTED, representing an actual hardware I/O page table + (i.e. a single struct iommu_domain) managed by user space (e.g. guest OS). + "NESTED" indicates that this type of HWPT should be linked to an HWPT_PAGING. + It also indicates that it is backed by an iommu_domain that has a type of + IOMMU_DOMAIN_NESTED. This must be a stage-1 domain for a device running in + the user space (e.g. in a guest VM enabling the IOMMU nested translation + feature.) As such, it must be created with a given nesting parent stage-2 + domain to associate to. This nested stage-1 page table managed by the user + space usually has mappings from guest-level I/O virtual addresses to guest- + level physical addresses.
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
-The diagram below shows relationship between user-visible objects and kernel +The diagrams below show relationships between user-visible objects and kernel datastructures (external to iommufd), with numbers referred to operations creating the objects and links::
- _________________________________________________________ - | iommufd | - | [1] | - | _________________ | - | | | | - | | | | - | | | | - | | | | - | | | | - | | | | - | | | [3] [2] | - | | | ____________ __________ | - | | IOAS |<--| |<------| | | - | | | |HW_PAGETABLE| | DEVICE | | - | | | |____________| |__________| | - | | | | | | - | | | | | | - | | | | | | - | | | | | | - | | | | | | - | |_________________| | | | - | | | | | - |_________|___________________|___________________|_______| - | | | - | _____v______ _______v_____ - | PFN storage | | | | - |------------>|iommu_domain| |struct device| - |____________| |_____________| + _______________________________________________________________________ + | iommufd (HWPT_PAGING only) | + | | + | [1] [3] [2] | + | ________________ _____________ ________ | + | | | | | | | | + | | IOAS |<---| HWPT_PAGING |<---------------------| DEVICE | | + | |________________| |_____________| |________| | + | | | | | + |_________|____________________|__________________________________|_____| + | | | + | ______v_____ ___v__ + | PFN storage | (paging) | |struct| + |------------>|iommu_domain|<-----------------------|device| + |____________| |______| + + _______________________________________________________________________ + | iommufd (with HWPT_NESTED) | + | | + | [1] [3] [4] [2] | + | ________________ _____________ _____________ ________ | + | | | | | | | | | | + | | IOAS |<---| HWPT_PAGING |<---| HWPT_NESTED |<--| DEVICE | | + | |________________| |_____________| |_____________| |________| | + | | | | | | + |_________|____________________|__________________|_______________|_____| + | | | | + | ______v_____ ______v_____ ___v__ + | PFN storage | (paging) | | (nested) | |struct| + |------------>|iommu_domain|<----|iommu_domain|<----|device| + |____________| |____________| |______|
1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can hold multiple IOAS objects. IOAS is the most generic object and does not @@ -94,21 +113,41 @@ creating the objects and links:: device. The driver must also set the driver_managed_dma flag and must not touch the device until this operation succeeds.
-3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD - kAPI to attach a bound device to an IOAS. Similarly the external driver uAPI - allows userspace to initiate the attaching operation. If a compatible - pagetable already exists then it is reused for the attachment. Otherwise a - new pagetable object and iommu_domain is created. Successful completion of - this operation sets up the linkages among IOAS, device and iommu_domain. Once - this completes the device could do DMA. - - Every iommu_domain inside the IOAS is also represented to userspace as a - HW_PAGETABLE object. +3. IOMMUFD_OBJ_HWPT_PAGING can be created in two ways: + + * IOMMUFD_OBJ_HWPT_PAGING is automatically created when an external driver + calls the IOMMUFD kAPI to attach a bound device to an IOAS. Similarly the + external driver uAPI allows userspace to initiate the attaching operation. + If a compatible member HWPT_PAGING object exists in the IOAS's HWPT_PAGING + list, then it will be reused. Otherwise a new HWPT_PAGING that represents + an iommu_domain to userspace will be created, and then added to the list. + Successful completion of this operation sets up the linkages among IOAS, + device and iommu_domain. Once this completes the device could do DMA. + + * IOMMUFD_OBJ_HWPT_PAGING can be manually created via the IOMMU_HWPT_ALLOC + uAPI, provided an ioas_id via @pt_id to associate the new HWPT_PAGING to + the corresponding IOAS object. The benefit of this manual allocation is to + allow allocation flags (defined in enum iommufd_hwpt_alloc_flags), e.g. it + allocates a nesting parent HWPT_PAGING if the IOMMU_HWPT_ALLOC_NEST_PARENT + flag is set. + +4. IOMMUFD_OBJ_HWPT_NESTED can be only manually created via the IOMMU_HWPT_ALLOC + uAPI, provided an hwpt_id via @pt_id to associate the new HWPT_NESTED object + to the corresponding HWPT_PAGING object. The associating HWPT_PAGING object + must be a nesting parent manually allocated via the same uAPI previously with + an IOMMU_HWPT_ALLOC_NEST_PARENT flag, otherwise the allocation will fail. The + allocation will be further validated by the IOMMU driver to ensure that the + nesting parent domain and the nested domain being allocated are compatible. + Successful completion of this operation sets up linkages among IOAS, device, + and iommu_domains. Once this completes the device could do DMA via a 2-stage + translation, a.k.a nested translation. Note that multiple HWPT_NESTED objects + can be allocated by (and then associated to) the same nesting parent.
.. note::
- Future IOMMUFD updates will provide an API to create and manipulate the - HW_PAGETABLE directly. + Either a manual IOMMUFD_OBJ_HWPT_PAGING or an IOMMUFD_OBJ_HWPT_NESTED is + created via the same IOMMU_HWPT_ALLOC uAPI. The difference is at the type + of the object passed in via the @pt_id field of struct iommufd_hwpt_alloc.
A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet). @@ -120,7 +159,8 @@ User visible objects are backed by following datastructures:
- iommufd_ioas for IOMMUFD_OBJ_IOAS. - iommufd_device for IOMMUFD_OBJ_DEVICE. -- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE. +- iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. +- iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
Several terminologies when looking at these datastructures:
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Prepare for an embedded structure design for driver-level iommufd_viommu objects:

    // include/linux/iommufd.h
    struct iommufd_viommu {
        struct iommufd_object obj;
        ....
    };

    // Some IOMMU driver
    struct iommu_driver_viommu {
        struct iommufd_viommu core;
        ....
    };
This requires moving struct iommufd_object and enum iommufd_object_type from the core-level private header to the public iommufd header.
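A minimal sketch of the embedding pattern this enables (names are hypothetical, and struct iommufd_viommu itself is only added by a later patch in this series): the driver wraps the core object and recovers its own structure with container_of().

    #include <linux/container_of.h>
    #include <linux/iommufd.h>
    #include <linux/types.h>

    /* Hypothetical driver-level structure wrapping the now-public core object */
    struct my_driver_viommu {
        struct iommufd_viommu core;
        u32 my_hw_config;               /* driver-owned state */
    };

    static struct my_driver_viommu *to_my_viommu(struct iommufd_viommu *viommu)
    {
        return container_of(viommu, struct my_driver_viommu, core);
    }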
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_private.h | 25 +------------------------ include/linux/iommufd.h | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 5d3768d77099..7f2e5a6164ac 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -5,8 +5,8 @@ #define __IOMMUFD_PRIVATE_H
#include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/iova_bitmap.h> -#include <linux/refcount.h> #include <linux/rwsem.h> #include <linux/uaccess.h> #include <linux/xarray.h> @@ -122,29 +122,6 @@ static inline int iommufd_ucmd_respond(struct iommufd_ucmd *ucmd, return 0; }
-enum iommufd_object_type { - IOMMUFD_OBJ_NONE, - IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE, - IOMMUFD_OBJ_DEVICE, - IOMMUFD_OBJ_HWPT_PAGING, - IOMMUFD_OBJ_HWPT_NESTED, - IOMMUFD_OBJ_IOAS, - IOMMUFD_OBJ_ACCESS, - IOMMUFD_OBJ_FAULT, -#ifdef CONFIG_IOMMUFD_TEST - IOMMUFD_OBJ_SELFTEST, -#endif - IOMMUFD_OBJ_MAX, -}; - -/* Base struct for all objects with a userspace ID handle. */ -struct iommufd_object { - refcount_t shortterm_users; - refcount_t users; - enum iommufd_object_type type; - unsigned int id; -}; - static inline bool iommufd_lock_obj(struct iommufd_object *obj) { if (!refcount_inc_not_zero(&obj->users)) diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 30f832a60ccb..22948dd03d67 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -8,6 +8,7 @@
#include <linux/err.h> #include <linux/errno.h> +#include <linux/refcount.h> #include <linux/types.h>
struct device; @@ -18,6 +19,29 @@ struct iommufd_ctx; struct iommufd_device; struct page;
+enum iommufd_object_type { + IOMMUFD_OBJ_NONE, + IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE, + IOMMUFD_OBJ_DEVICE, + IOMMUFD_OBJ_HWPT_PAGING, + IOMMUFD_OBJ_HWPT_NESTED, + IOMMUFD_OBJ_IOAS, + IOMMUFD_OBJ_ACCESS, + IOMMUFD_OBJ_FAULT, +#ifdef CONFIG_IOMMUFD_TEST + IOMMUFD_OBJ_SELFTEST, +#endif + IOMMUFD_OBJ_MAX, +}; + +/* Base struct for all objects with a userspace ID handle. */ +struct iommufd_object { + refcount_t shortterm_users; + refcount_t users; + enum iommufd_object_type type; + unsigned int id; +}; + struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, struct device *dev, u32 *id); void iommufd_device_unbind(struct iommufd_device *idev);
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
The following patch will add a new vIOMMU allocator that will require this _iommufd_object_alloc to be sharable with IOMMU drivers (and iommufd too).
Add a new driver.c file that will be built with CONFIG_IOMMUFD_DRIVER_CORE, selected by CONFIG_IOMMUFD, and keep CONFIG_IOMMUFD_DRIVER under it, still selectable by drivers that need to build the existing iova_bitmap.c file.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/Kconfig | 4 +++ drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/driver.c | 40 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 4 --- drivers/iommu/iommufd/main.c | 32 -------------------- include/linux/iommufd.h | 13 ++++++++ 6 files changed, 58 insertions(+), 36 deletions(-) create mode 100644 drivers/iommu/iommufd/driver.c
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig index 99d4b075df49..abdee10f9085 100644 --- a/drivers/iommu/iommufd/Kconfig +++ b/drivers/iommu/iommufd/Kconfig @@ -1,4 +1,8 @@ # SPDX-License-Identifier: GPL-2.0-only +config IOMMUFD_DRIVER_CORE + tristate + default (IOMMUFD_DRIVER || IOMMUFD) if IOMMUFD!=n + config IOMMUFD tristate "IOMMU Userspace API" select INTERVAL_TREE diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index cf4605962bea..de675df52913 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -13,3 +13,4 @@ iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o +obj-$(CONFIG_IOMMUFD_DRIVER_CORE) += driver.o diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c new file mode 100644 index 000000000000..2bc47d92a0ab --- /dev/null +++ b/drivers/iommu/iommufd/driver.c @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#include "iommufd_private.h" + +struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, + size_t size, + enum iommufd_object_type type) +{ + struct iommufd_object *obj; + int rc; + + obj = kzalloc(size, GFP_KERNEL_ACCOUNT); + if (!obj) + return ERR_PTR(-ENOMEM); + obj->type = type; + /* Starts out bias'd by 1 until it is removed from the xarray */ + refcount_set(&obj->shortterm_users, 1); + refcount_set(&obj->users, 1); + + /* + * Reserve an ID in the xarray but do not publish the pointer yet since + * the caller hasn't initialized it yet. Once the pointer is published + * in the xarray and visible to other threads we can't reliably destroy + * it anymore, so the caller must complete all errorable operations + * before calling iommufd_object_finalize(). + */ + rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, xa_limit_31b, + GFP_KERNEL_ACCOUNT); + if (rc) + goto out_free; + return obj; +out_free: + kfree(obj); + return ERR_PTR(rc); +} +EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD); + +MODULE_DESCRIPTION("iommufd code shared with builtin modules"); +MODULE_LICENSE("GPL"); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 7f2e5a6164ac..27ff0d6db47c 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -202,10 +202,6 @@ iommufd_object_put_and_try_destroy(struct iommufd_ctx *ictx, iommufd_object_remove(ictx, obj, obj->id, 0); }
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, - size_t size, - enum iommufd_object_type type); - #define __iommufd_object_alloc(ictx, ptr, type, obj) \ container_of(_iommufd_object_alloc( \ ictx, \ diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index b5f5d27ee963..92bd075108e5 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -29,38 +29,6 @@ struct iommufd_object_ops { static const struct iommufd_object_ops iommufd_object_ops[]; static struct miscdevice vfio_misc_dev;
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, - size_t size, - enum iommufd_object_type type) -{ - struct iommufd_object *obj; - int rc; - - obj = kzalloc(size, GFP_KERNEL_ACCOUNT); - if (!obj) - return ERR_PTR(-ENOMEM); - obj->type = type; - /* Starts out bias'd by 1 until it is removed from the xarray */ - refcount_set(&obj->shortterm_users, 1); - refcount_set(&obj->users, 1); - - /* - * Reserve an ID in the xarray but do not publish the pointer yet since - * the caller hasn't initialized it yet. Once the pointer is published - * in the xarray and visible to other threads we can't reliably destroy - * it anymore, so the caller must complete all errorable operations - * before calling iommufd_object_finalize(). - */ - rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, - xa_limit_31b, GFP_KERNEL_ACCOUNT); - if (rc) - goto out_free; - return obj; -out_free: - kfree(obj); - return ERR_PTR(rc); -} - /* * Allow concurrent access to the object. * diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 22948dd03d67..94522d4029ca 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -135,4 +135,17 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx) return -EOPNOTSUPP; } #endif /* CONFIG_IOMMUFD */ + +#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER_CORE) +struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, + size_t size, + enum iommufd_object_type type); +#else /* !CONFIG_IOMMUFD_DRIVER_CORE */ +static inline struct iommufd_object * +_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, + enum iommufd_object_type type) +{ + return ERR_PTR(-EOPNOTSUPP); +} +#endif /* CONFIG_IOMMUFD_DRIVER_CORE */ #endif
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a new IOMMUFD_OBJ_VIOMMU with an iommufd_viommu structure to represent a slice of a physical IOMMU device passed to or shared with a user space VM. This slice, now a vIOMMU object, is a group of virtualization resources of a physical IOMMU, such as:
- Security namespace for guest owned IDs, e.g. guest-controlled cache tags
- Non-device-affiliated event reporting, e.g. invalidation queue errors
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platform IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
Add a new viommu_alloc op in iommu_ops for drivers to allocate their own vIOMMU structures. This allocation also needs a free(), so add struct iommufd_viommu_ops.
To simplify a vIOMMU allocation, provide an iommufd_viommu_alloc() helper. It's suggested that a driver should embed a core-level viommu structure in its driver-level viommu struct and call the iommufd_viommu_alloc() helper, while also implementing its viommu ops:

    struct my_driver_viommu {
        struct iommufd_viommu core;
        /* driver-owned properties/features */
        ....
    };
    static const struct iommufd_viommu_ops my_driver_viommu_ops = {
        .destroy = my_driver_viommu_destroy,
        /* future ops for virtualization features */
        ....
    };
    static struct iommufd_viommu *my_driver_viommu_alloc(...)
    {
        struct my_driver_viommu *my_viommu =
            iommufd_viommu_alloc(ictx, my_driver_viommu, core,
                                 my_driver_viommu_ops);

        /* Init my_viommu and related HW feature */
        ....
        return &my_viommu->core;
    }
    static struct iommu_ops my_driver_iommu_ops = {
        ....
        .viommu_alloc = my_driver_viommu_alloc,
    };
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommu.h | 14 ++++++++++++++ include/linux/iommufd.h | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0150dc2bec6a..aab3016958d0 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -42,6 +42,8 @@ struct notifier_block; struct iommu_sva; struct iommu_dma_cookie; struct iommu_fault_param; +struct iommufd_ctx; +struct iommufd_viommu;
#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ @@ -594,6 +596,14 @@ static inline int __iommu_copy_struct_from_user_array( * @remove_dev_pasid: Remove any translation configurations of a specific * pasid, so that any DMA transactions with this pasid * will be blocked by the hardware. + * @viommu_alloc: Allocate an iommufd_viommu on a physical IOMMU instance behind + * the @dev, as the set of virtualization resources shared/passed + * to user space IOMMU instance. And associate it with a nesting + * @parent_domain. The @viommu_type must be defined in the header + * include/uapi/linux/iommufd.h + * It is required to call iommufd_viommu_alloc() helper for + * a bundled allocation of the core and the driver structures, + * using the given @ictx pointer. * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops * @identity_domain: An always available, always attachable identity @@ -644,6 +654,10 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid, struct iommu_domain *domain);
+ struct iommufd_viommu *(*viommu_alloc)( + struct device *dev, struct iommu_domain *parent_domain, + struct iommufd_ctx *ictx, unsigned int viommu_type); + const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap; struct module *owner; diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 94522d4029ca..4fc2ce332f10 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -17,6 +17,7 @@ struct iommu_group; struct iommufd_access; struct iommufd_ctx; struct iommufd_device; +struct iommufd_viommu_ops; struct page;
enum iommufd_object_type { @@ -28,6 +29,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_FAULT, + IOMMUFD_OBJ_VIOMMU, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -78,6 +80,26 @@ void iommufd_access_detach(struct iommufd_access *access);
void iommufd_ctx_get(struct iommufd_ctx *ictx);
+struct iommufd_viommu { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct iommu_device *iommu_dev; + struct iommufd_hwpt_paging *hwpt; + + const struct iommufd_viommu_ops *ops; + + unsigned int type; +}; + +/** + * struct iommufd_viommu_ops - vIOMMU specific operations + * @destroy: Clean up all driver-specific parts of an iommufd_viommu. The memory + * of the vIOMMU will be free-ed by iommufd core after calling this op + */ +struct iommufd_viommu_ops { + void (*destroy)(struct iommufd_viommu *viommu); +}; + #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_ctx *iommufd_ctx_from_file(struct file *file); struct iommufd_ctx *iommufd_ctx_from_fd(int fd); @@ -148,4 +170,22 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, return ERR_PTR(-EOPNOTSUPP); } #endif /* CONFIG_IOMMUFD_DRIVER_CORE */ + +/* + * Helpers for IOMMU driver to allocate driver structures that will be freed by + * the iommufd core. The free op will be called prior to freeing the memory. + */ +#define iommufd_viommu_alloc(ictx, drv_struct, member, viommu_ops) \ + ({ \ + drv_struct *ret; \ + \ + static_assert(__same_type(struct iommufd_viommu, \ + ((drv_struct *)NULL)->member)); \ + static_assert(offsetof(drv_struct, member.obj) == 0); \ + ret = (drv_struct *)_iommufd_object_alloc( \ + ictx, sizeof(drv_struct), IOMMUFD_OBJ_VIOMMU); \ + if (!IS_ERR(ret)) \ + ret->member.ops = viommu_ops; \ + ret; \ + }) #endif
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
To support driver-allocated vIOMMU objects, an IOMMU driver is required to call the provided iommufd_viommu_alloc helper to embed the core struct. However, there is no guarantee that every driver will call it and allocate objects properly.
Make the iommufd_object_finalize/abort functions more robust by verifying that the xarray slot indexed by the input obj->id holds an XA_ZERO_ENTRY, which is the reserved value stored by xa_alloc via iommufd_object_alloc.
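For context, a minimal sketch of the reserve-then-publish pattern this check relies on (illustrative only, not the iommufd code; the function and parameter names are made up):

    #include <linux/types.h>
    #include <linux/xarray.h>

    static int example_reserve_then_publish(struct xarray *xa, void *obj, u32 *id)
    {
        int rc;

        /* Reserve an index: the slot holds XA_ZERO_ENTRY, readers see NULL */
        rc = xa_alloc(xa, id, XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
        if (rc)
            return rc;

        /* ... fallible initialization of obj happens in between ... */

        /* Publish: finalize can now WARN if the slot was not XA_ZERO_ENTRY */
        xa_store(xa, *id, obj, GFP_KERNEL);
        return 0;
    }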
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/main.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 92bd075108e5..33efe156f7ce 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -41,20 +41,26 @@ static struct miscdevice vfio_misc_dev; void iommufd_object_finalize(struct iommufd_ctx *ictx, struct iommufd_object *obj) { + XA_STATE(xas, &ictx->objects, obj->id); void *old;
- old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); - /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ - WARN_ON(old); + xa_lock(&ictx->objects); + old = xas_store(&xas, obj); + xa_unlock(&ictx->objects); + /* obj->id was returned from xa_alloc() so the xas_store() cannot fail */ + WARN_ON(old != XA_ZERO_ENTRY); }
/* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj) { + XA_STATE(xas, &ictx->objects, obj->id); void *old;
- old = xa_erase(&ictx->objects, obj->id); - WARN_ON(old); + xa_lock(&ictx->objects); + old = xas_store(&xas, NULL); + xa_unlock(&ictx->objects); + WARN_ON(old != XA_ZERO_ENTRY); kfree(obj); }
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a new ioctl for user space to do a vIOMMU allocation. It must be based on a nesting parent HWPT, so take its refcount.
An IOMMU driver wanting to support vIOMMUs must define its IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in its iommu_ops.
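A hedged userspace sketch of the new ioctl (it assumes an open iommufd, a bound device id and a nesting-parent HWPT id obtained earlier; the type must be a driver-specific value since IOMMU_VIOMMU_TYPE_DEFAULT is rejected):

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    static int viommu_alloc(int iommufd, __u32 dev_id, __u32 nest_parent_hwpt_id,
                            __u32 driver_type, __u32 *out_viommu_id)
    {
        struct iommu_viommu_alloc cmd = {
            .size = sizeof(cmd),
            .type = driver_type,            /* driver-specific IOMMU_VIOMMU_TYPE_* */
            .dev_id = dev_id,               /* device bound to this iommufd */
            .hwpt_id = nest_parent_hwpt_id, /* HWPT allocated with NEST_PARENT */
        };

        if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
            return -errno;
        *out_viommu_id = cmd.out_viommu_id;
        return 0;
    }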
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/Makefile | 3 +- drivers/iommu/iommufd/iommufd_private.h | 3 + drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 81 +++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 40 ++++++++++++ 5 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/iommufd/viommu.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index de675df52913..9e321140cccd 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -7,7 +7,8 @@ iommufd-y := \ ioas.o \ main.o \ pages.o \ - vfio_compat.o + vfio_compat.o \ + viommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 27ff0d6db47c..a6c87c3512c7 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -499,6 +499,9 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); +void iommufd_viommu_destroy(struct iommufd_object *obj); + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 33efe156f7ce..c0c5f1daf549 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -307,6 +307,7 @@ union ucmd_buffer { struct iommu_ioas_unmap unmap; struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; + struct iommu_viommu_alloc viommu; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -358,6 +359,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { val64), IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas, __reserved), + IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, + struct iommu_viommu_alloc, out_viommu_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -493,6 +496,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_FAULT] = { .destroy = iommufd_fault_destroy, }, + [IOMMUFD_OBJ_VIOMMU] = { + .destroy = iommufd_viommu_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c new file mode 100644 index 000000000000..888239b78667 --- /dev/null +++ b/drivers/iommu/iommufd/viommu.c @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#include "iommufd_private.h" + +void iommufd_viommu_destroy(struct iommufd_object *obj) +{ + struct iommufd_viommu *viommu = + container_of(obj, struct iommufd_viommu, obj); + + if (viommu->ops && viommu->ops->destroy) + viommu->ops->destroy(viommu); + refcount_dec(&viommu->hwpt->common.obj.users); +} + +int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_alloc *cmd = ucmd->cmd; + struct iommufd_hwpt_paging *hwpt_paging; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + const struct iommu_ops *ops; + int rc; + + if (cmd->flags || cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT) + return -EOPNOTSUPP; + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + ops = dev_iommu_ops(idev->dev); + if (!ops->viommu_alloc) { + rc = -EOPNOTSUPP; + goto out_put_idev; + } + + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt_paging)) { + rc = PTR_ERR(hwpt_paging); + goto out_put_idev; + } + + if (!hwpt_paging->nest_parent) { + rc = -EINVAL; + goto out_put_hwpt; + } + + viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain, + ucmd->ictx, cmd->type); + if (IS_ERR(viommu)) { + rc = PTR_ERR(viommu); + goto out_put_hwpt; + } + + viommu->type = cmd->type; + viommu->ictx = ucmd->ictx; + viommu->hwpt = hwpt_paging; + refcount_inc(&viommu->hwpt->common.obj.users); + /* + * It is the most likely case that a physical IOMMU is unpluggable. A + * pluggable IOMMU instance (if exists) is responsible for refcounting + * on its own. 
+ */ + viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev); + + cmd->out_viommu_id = viommu->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_abort; + iommufd_object_finalize(ucmd->ictx, &viommu->obj); + goto out_put_hwpt; + +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &viommu->obj); +out_put_hwpt: + iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj); +out_put_idev: + iommufd_put_object(ucmd->ictx, &idev->obj); + return rc; +} diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 72010f71c5e4..9dfc30e3e849 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -51,6 +51,7 @@ enum { IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c, IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, + IOMMUFD_CMD_VIOMMU_ALLOC = 0x90, };
/** @@ -797,4 +798,43 @@ struct iommu_fault_alloc { __u32 out_fault_fd; }; #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC) + +/** + * enum iommu_viommu_type - Virtual IOMMU Type + * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use + */ +enum iommu_viommu_type { + IOMMU_VIOMMU_TYPE_DEFAULT = 0, +}; + +/** + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC) + * @size: sizeof(struct iommu_viommu_alloc) + * @flags: Must be 0 + * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type + * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU + * @hwpt_id: ID of a nesting parent HWPT to associate to + * @out_viommu_id: Output virtual IOMMU ID for the allocated object + * + * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's + * virtualization support that is a security-isolated slice of the real IOMMU HW + * that is unique to a specific VM. Operations global to the IOMMU are connected + * to the vIOMMU, such as: + * - Security namespace for guest owned ID, e.g. guest-controlled cache tags + * - Non-device-affiliated event reporting, e.g. invalidation queue errors + * - Access to a sharable nesting parent pagetable across physical IOMMUs + * - Virtualization of various platforms IDs, e.g. RIDs and others + * - Delivery of paravirtualized invalidation + * - Direct assigned invalidation queues + * - Direct assigned interrupts + */ +struct iommu_viommu_alloc { + __u32 size; + __u32 flags; + __u32 type; + __u32 dev_id; + __u32 hwpt_id; + __u32 out_viommu_id; +}; +#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) #endif
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Allow an IOMMU driver to use a vIOMMU object that holds a nesting parent hwpt/domain to allocate a nested domain.
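A rough driver-side sketch of the new op, with hypothetical structure and function names (the real logic, including validation of user_data, is entirely driver specific):

    #include <linux/err.h>
    #include <linux/iommu.h>
    #include <linux/iommufd.h>
    #include <linux/slab.h>

    /* Hypothetical nested domain wrapper; a real driver has its own layout */
    struct my_nested_domain {
        struct iommu_domain domain;
        /* driver-specific stage-1 configuration copied from user_data */
    };

    static struct iommu_domain *
    my_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
                                  const struct iommu_user_data *user_data)
    {
        struct my_nested_domain *nested;

        if (flags)
            return ERR_PTR(-EOPNOTSUPP);

        nested = kzalloc(sizeof(*nested), GFP_KERNEL);
        if (!nested)
            return ERR_PTR(-ENOMEM);

        nested->domain.type = IOMMU_DOMAIN_NESTED;
        /* a real driver also sets nested->domain.ops and validates user_data,
         * e.g. with iommu_copy_struct_from_user(), before returning */
        return &nested->domain;
    }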
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommufd.h | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 4fc2ce332f10..de9b56265c9c 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -14,6 +14,7 @@ struct device; struct file; struct iommu_group; +struct iommu_user_data; struct iommufd_access; struct iommufd_ctx; struct iommufd_device; @@ -95,9 +96,17 @@ struct iommufd_viommu { * struct iommufd_viommu_ops - vIOMMU specific operations * @destroy: Clean up all driver-specific parts of an iommufd_viommu. The memory * of the vIOMMU will be free-ed by iommufd core after calling this op + * @alloc_domain_nested: Allocate a IOMMU_DOMAIN_NESTED on a vIOMMU that holds a + * nesting parent domain (IOMMU_DOMAIN_PAGING). @user_data + * must be defined in include/uapi/linux/iommufd.h. + * It must fully initialize the new iommu_domain before + * returning. Upon failure, ERR_PTR must be returned. */ struct iommufd_viommu_ops { void (*destroy)(struct iommufd_viommu *viommu); + struct iommu_domain *(*alloc_domain_nested)( + struct iommufd_viommu *viommu, u32 flags, + const struct iommu_user_data *user_data); };
#if IS_ENABLED(CONFIG_IOMMUFD)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Now a vIOMMU holds a shareable nesting parent HWPT. So, it can act like that nesting parent HWPT to allocate a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, add an iommufd_viommu_alloc_hwpt_nested helper to allocate a nested HWPT for a vIOMMU object. Since a vIOMMU object holds the parent hwpt's refcount already, increase the refcount of the vIOMMU only.
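A hedged userspace sketch of passing the vIOMMU as @pt_id (viommu_id comes from IOMMU_VIOMMU_ALLOC; data/data_type/data_len are the driver-specific stage-1 description, exactly as for a regular nested allocation):

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    static int hwpt_alloc_nested_on_viommu(int iommufd, __u32 dev_id,
                                           __u32 viommu_id, __u32 data_type,
                                           void *data, __u32 data_len,
                                           __u32 *out_hwpt_id)
    {
        struct iommu_hwpt_alloc cmd = {
            .size = sizeof(cmd),
            .dev_id = dev_id,
            .pt_id = viommu_id,        /* a vIOMMU is now accepted here */
            .data_type = data_type,
            .data_len = data_len,
            .data_uptr = (uintptr_t)data,
        };

        if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
            return -errno;
        *out_hwpt_id = cmd.out_hwpt_id;
        return 0;
    }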
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/hw_pagetable.c | 73 ++++++++++++++++++++++++- drivers/iommu/iommufd/iommufd_private.h | 1 + include/uapi/linux/iommufd.h | 14 +++-- 3 files changed, 81 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 5118335f1b94..c5bfa842d4e2 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -57,7 +57,10 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) container_of(obj, struct iommufd_hwpt_nested, common.obj);
__iommufd_hwpt_destroy(&hwpt_nested->common); - refcount_dec(&hwpt_nested->parent->common.obj.users); + if (hwpt_nested->viommu) + refcount_dec(&hwpt_nested->viommu->obj.users); + else + refcount_dec(&hwpt_nested->parent->common.obj.users); }
void iommufd_hwpt_nested_abort(struct iommufd_object *obj) @@ -258,6 +261,58 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, return ERR_PTR(rc); }
+/** + * iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU + * @viommu: vIOMMU ojbect to associate the hwpt_nested/domain with + * @flags: Flags from userspace + * @user_data: user_data pointer. Must be valid + * + * Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED + * hw_pagetable. + */ +static struct iommufd_hwpt_nested * +iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags, + const struct iommu_user_data *user_data) +{ + struct iommufd_hwpt_nested *hwpt_nested; + struct iommufd_hw_pagetable *hwpt; + int rc; + + if (!user_data->len) + return ERR_PTR(-EOPNOTSUPP); + if (!viommu->ops || !viommu->ops->alloc_domain_nested) + return ERR_PTR(-EOPNOTSUPP); + + hwpt_nested = __iommufd_object_alloc( + viommu->ictx, hwpt_nested, IOMMUFD_OBJ_HWPT_NESTED, common.obj); + if (IS_ERR(hwpt_nested)) + return ERR_CAST(hwpt_nested); + hwpt = &hwpt_nested->common; + + hwpt_nested->viommu = viommu; + refcount_inc(&viommu->obj.users); + hwpt_nested->parent = viommu->hwpt; + + hwpt->domain = + viommu->ops->alloc_domain_nested(viommu, flags, user_data); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + hwpt->domain->owner = viommu->iommu_dev->ops; + + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { + rc = -EINVAL; + goto out_abort; + } + return hwpt_nested; + +out_abort: + iommufd_object_abort_and_destroy(viommu->ictx, &hwpt->obj); + return ERR_PTR(rc); +} + int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; @@ -314,6 +369,22 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_unlock; } hwpt = &hwpt_nested->common; + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { + struct iommufd_hwpt_nested *hwpt_nested; + struct iommufd_viommu *viommu; + + viommu = container_of(pt_obj, struct iommufd_viommu, obj); + if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) { + rc = -EINVAL; + goto out_unlock; + } + hwpt_nested = iommufd_viommu_alloc_hwpt_nested( + viommu, cmd->flags, &user_data); + if (IS_ERR(hwpt_nested)) { + rc = PTR_ERR(hwpt_nested); + goto out_unlock; + } + hwpt = &hwpt_nested->common; } else { rc = -EINVAL; goto out_put_pt; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index a6c87c3512c7..9d82e093a244 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -285,6 +285,7 @@ struct iommufd_hwpt_paging { struct iommufd_hwpt_nested { struct iommufd_hw_pagetable common; struct iommufd_hwpt_paging *parent; + struct iommufd_viommu *viommu; };
static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 9dfc30e3e849..831265b4371c 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -410,7 +410,7 @@ enum iommu_hwpt_data_type { * @size: sizeof(struct iommu_hwpt_alloc) * @flags: Combination of enum iommufd_hwpt_alloc_flags * @dev_id: The device to allocate this HWPT for - * @pt_id: The IOAS or HWPT to connect this HWPT to + * @pt_id: The IOAS or HWPT or vIOMMU to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT * @__reserved: Must be 0 * @data_type: One of enum iommu_hwpt_data_type @@ -429,11 +429,13 @@ enum iommu_hwpt_data_type { * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. * - * A user-managed nested HWPT will be created from a given parent HWPT via - * @pt_id, in which the parent HWPT must be allocated previously via the - * same ioctl from a given IOAS (@pt_id). In this case, the @data_type - * must be set to a pre-defined type corresponding to an I/O page table - * type supported by the underlying IOMMU hardware. + * A user-managed nested HWPT will be created from a given vIOMMU (wrapping a + * parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be + * allocated previously via the same ioctl from a given IOAS (@pt_id). In this + * case, the @data_type must be set to a pre-defined type corresponding to an + * I/O page table type supported by the underlying IOMMU hardware. The device + * via @dev_id and the vIOMMU via @pt_id must be associated to the same IOMMU + * instance. * * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Use these inline helpers to shorten those container_of lines.
Note that one of them goes back and forth between iommu_domain and mock_iommu_domain, which isn't necessary. So drop its container_of.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/selftest.c | 75 ++++++++++++++++++-------------- 1 file changed, 42 insertions(+), 33 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index d53e579013fd..4d8170010a6b 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -126,12 +126,24 @@ struct mock_iommu_domain { struct xarray pfns; };
+static inline struct mock_iommu_domain * +to_mock_domain(struct iommu_domain *domain) +{ + return container_of(domain, struct mock_iommu_domain, domain); +} + struct mock_iommu_domain_nested { struct iommu_domain domain; struct mock_iommu_domain *parent; u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; };
+static inline struct mock_iommu_domain_nested * +to_mock_nested(struct iommu_domain *domain) +{ + return container_of(domain, struct mock_iommu_domain_nested, domain); +} + enum selftest_obj_type { TYPE_IDEV, }; @@ -142,6 +154,11 @@ struct mock_dev { int id; };
+static inline struct mock_dev *to_mock_dev(struct device *dev) +{ + return container_of(dev, struct mock_dev, dev); +} + struct selftest_obj { struct iommufd_object obj; enum selftest_obj_type type; @@ -155,10 +172,15 @@ struct selftest_obj { }; };
+static inline struct selftest_obj *to_selftest_obj(struct iommufd_object *obj) +{ + return container_of(obj, struct selftest_obj, obj); +} + static int mock_domain_nop_attach(struct iommu_domain *domain, struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) return -EINVAL; @@ -193,8 +215,7 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type) static int mock_domain_set_dirty_tracking(struct iommu_domain *domain, bool enable) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long flags = mock->flags;
if (enable && !domain->dirty_ops) @@ -243,8 +264,7 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain, unsigned long flags, struct iommu_dirty_bitmap *dirty) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long end = iova + size; void *ent;
@@ -281,7 +301,7 @@ const struct iommu_dirty_ops dirty_ops = {
static struct iommu_domain *mock_domain_alloc_paging(struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev); struct mock_iommu_domain *mock;
mock = kzalloc(sizeof(*mock), GFP_KERNEL); @@ -327,7 +347,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
/* must be mock_domain */ if (!parent) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev); bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; struct iommu_domain *domain; @@ -341,8 +361,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!domain) return ERR_PTR(-ENOMEM); if (has_dirty_flag) - container_of(domain, struct mock_iommu_domain, domain) - ->domain.dirty_ops = &dirty_ops; + domain->dirty_ops = &dirty_ops; return domain; }
@@ -352,7 +371,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!parent || parent->ops != mock_ops.default_domain_ops) return ERR_PTR(-EINVAL);
- mock_parent = container_of(parent, struct mock_iommu_domain, domain); + mock_parent = to_mock_domain(parent); if (!mock_parent) return ERR_PTR(-EINVAL);
@@ -366,8 +385,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
static void mock_domain_free(struct iommu_domain *domain) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain);
WARN_ON(!xa_empty(&mock->pfns)); kfree(mock); @@ -378,8 +396,7 @@ static int mock_domain_map_pages(struct iommu_domain *domain, size_t pgsize, size_t pgcount, int prot, gfp_t gfp, size_t *mapped) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); unsigned long flags = MOCK_PFN_START_IOVA; unsigned long start_iova = iova;
@@ -430,8 +447,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain, size_t pgcount, struct iommu_iotlb_gather *iotlb_gather) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); bool first = true; size_t ret = 0; void *ent; @@ -479,8 +495,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain, static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) { - struct mock_iommu_domain *mock = - container_of(domain, struct mock_iommu_domain, domain); + struct mock_iommu_domain *mock = to_mock_domain(domain); void *ent;
WARN_ON(iova % MOCK_IO_PAGE_SIZE); @@ -491,7 +506,7 @@ static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain,
static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
switch (cap) { case IOMMU_CAP_CACHE_COHERENCY: @@ -571,18 +586,14 @@ static const struct iommu_ops mock_ops = {
static void mock_domain_free_nested(struct iommu_domain *domain) { - struct mock_iommu_domain_nested *mock_nested = - container_of(domain, struct mock_iommu_domain_nested, domain); - - kfree(mock_nested); + kfree(to_mock_nested(domain)); }
static int mock_domain_cache_invalidate_user(struct iommu_domain *domain, struct iommu_user_data_array *array) { - struct mock_iommu_domain_nested *mock_nested = - container_of(domain, struct mock_iommu_domain_nested, domain); + struct mock_iommu_domain_nested *mock_nested = to_mock_nested(domain); struct iommu_hwpt_invalidate_selftest inv; u32 processed = 0; int i = 0, j; @@ -657,7 +668,7 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, iommufd_put_object(ucmd->ictx, &hwpt->obj); return ERR_PTR(-EINVAL); } - *mock = container_of(hwpt->domain, struct mock_iommu_domain, domain); + *mock = to_mock_domain(hwpt->domain); return hwpt; }
@@ -675,14 +686,13 @@ get_md_pagetable_nested(struct iommufd_ucmd *ucmd, u32 mockpt_id, iommufd_put_object(ucmd->ictx, &hwpt->obj); return ERR_PTR(-EINVAL); } - *mock_nested = container_of(hwpt->domain, - struct mock_iommu_domain_nested, domain); + *mock_nested = to_mock_nested(hwpt->domain); return hwpt; }
static void mock_dev_release(struct device *dev) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + struct mock_dev *mdev = to_mock_dev(dev);
ida_free(&mock_dev_ida, mdev->id); kfree(mdev); @@ -813,7 +823,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd, if (IS_ERR(dev_obj)) return PTR_ERR(dev_obj);
- sobj = container_of(dev_obj, struct selftest_obj, obj); + sobj = to_selftest_obj(dev_obj); if (sobj->type != TYPE_IDEV) { rc = -EINVAL; goto out_dev_obj; @@ -951,8 +961,7 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, if (IS_ERR(hwpt)) return PTR_ERR(hwpt);
- mock_nested = container_of(hwpt->domain, - struct mock_iommu_domain_nested, domain); + mock_nested = to_mock_nested(hwpt->domain);
if (iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX || mock_nested->iotlb[iotlb_id] != iotlb) @@ -1432,7 +1441,7 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
void iommufd_selftest_destroy(struct iommufd_object *obj) { - struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); + struct selftest_obj *sobj = to_selftest_obj(obj);
switch (sobj->type) { case TYPE_IDEV:
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
A nested domain can now be allocated for a parent domain or for a vIOMMU object. Rework the existing allocators to prepare for the latter case.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/selftest.c | 89 ++++++++++++++++++-------------- 1 file changed, 50 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 4d8170010a6b..85ff45f1776d 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -318,55 +318,39 @@ static struct iommu_domain *mock_domain_alloc_paging(struct device *dev) return &mock->domain; }
-static struct iommu_domain * -__mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent, - const struct iommu_hwpt_selftest *user_cfg) +static struct mock_iommu_domain_nested * +__mock_domain_alloc_nested(const struct iommu_user_data *user_data) { struct mock_iommu_domain_nested *mock_nested; - int i; + struct iommu_hwpt_selftest user_cfg; + int rc, i; + + if (user_data->type != IOMMU_HWPT_DATA_SELFTEST) + return ERR_PTR(-EOPNOTSUPP); + + rc = iommu_copy_struct_from_user(&user_cfg, user_data, + IOMMU_HWPT_DATA_SELFTEST, iotlb); + if (rc) + return ERR_PTR(rc);
mock_nested = kzalloc(sizeof(*mock_nested), GFP_KERNEL); if (!mock_nested) return ERR_PTR(-ENOMEM); - mock_nested->parent = mock_parent; mock_nested->domain.ops = &domain_nested_ops; mock_nested->domain.type = IOMMU_DOMAIN_NESTED; for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) - mock_nested->iotlb[i] = user_cfg->iotlb; - return &mock_nested->domain; + mock_nested->iotlb[i] = user_cfg.iotlb; + return mock_nested; }
static struct iommu_domain * -mock_domain_alloc_user(struct device *dev, u32 flags, - struct iommu_domain *parent, - const struct iommu_user_data *user_data) +mock_domain_alloc_nested(struct iommu_domain *parent, u32 flags, + const struct iommu_user_data *user_data) { + struct mock_iommu_domain_nested *mock_nested; struct mock_iommu_domain *mock_parent; - struct iommu_hwpt_selftest user_cfg; - int rc; - - /* must be mock_domain */ - if (!parent) { - struct mock_dev *mdev = to_mock_dev(dev); - bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; - bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; - struct iommu_domain *domain; - - if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT | - IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) - return ERR_PTR(-EOPNOTSUPP); - if (user_data || (has_dirty_flag && no_dirty_ops)) - return ERR_PTR(-EOPNOTSUPP); - domain = mock_domain_alloc_paging(dev); - if (!domain) - return ERR_PTR(-ENOMEM); - if (has_dirty_flag) - domain->dirty_ops = &dirty_ops; - return domain; - }
- /* must be mock_domain_nested */ - if (user_data->type != IOMMU_HWPT_DATA_SELFTEST || flags) + if (flags) return ERR_PTR(-EOPNOTSUPP); if (!parent || parent->ops != mock_ops.default_domain_ops) return ERR_PTR(-EINVAL); @@ -375,12 +359,39 @@ mock_domain_alloc_user(struct device *dev, u32 flags, if (!mock_parent) return ERR_PTR(-EINVAL);
- rc = iommu_copy_struct_from_user(&user_cfg, user_data, - IOMMU_HWPT_DATA_SELFTEST, iotlb); - if (rc) - return ERR_PTR(rc); + mock_nested = __mock_domain_alloc_nested(user_data); + if (IS_ERR(mock_nested)) + return ERR_CAST(mock_nested); + mock_nested->parent = mock_parent; + return &mock_nested->domain; +} + +static struct iommu_domain * +mock_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) +{ + bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING | + IOMMU_HWPT_ALLOC_NEST_PARENT; + bool no_dirty_ops = to_mock_dev(dev)->flags & + MOCK_FLAGS_DEVICE_NO_DIRTY; + struct iommu_domain *domain; + + if (parent) + return mock_domain_alloc_nested(parent, flags, user_data);
- return __mock_domain_alloc_nested(mock_parent, &user_cfg); + if (user_data) + return ERR_PTR(-EOPNOTSUPP); + if ((flags & ~PAGING_FLAGS) || (has_dirty_flag && no_dirty_ops)) + return ERR_PTR(-EOPNOTSUPP); + + domain = mock_domain_alloc_paging(dev); + if (!domain) + return ERR_PTR(-ENOMEM); + if (has_dirty_flag) + domain->dirty_ops = &dirty_ops; + return domain; }
static void mock_domain_free(struct iommu_domain *domain)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
For an iommu_dev that can unplug (so far only this selftest does so), the viommu->iommu_dev pointer has no life-cycle guarantee after it is copied from idev->dev->iommu->iommu_dev.

Track the user count of the iommu_dev. Postpone the exit routine using a completion if the refcount is unbalanced. The refcount inc/dec will be added in the following patch.
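For reference, a minimal sketch of the intended pairing on the user side, assuming the mock_iommu_device layout added below; the helper names here are hypothetical and the real inc/dec only lands in the next patch:

#include <linux/completion.h>
#include <linux/iommu.h>
#include <linux/refcount.h>

struct mock_iommu_device {
	struct iommu_device iommu_dev;
	struct completion complete;
	refcount_t users;
};

/* Hypothetical helpers showing the expected get/put pairing */
static void mock_iommu_get(struct mock_iommu_device *mock_iommu)
{
	refcount_inc(&mock_iommu->users);
}

static void mock_iommu_put(struct mock_iommu_device *mock_iommu)
{
	/* The last put wakes up iommufd_test_wait_for_users() in the exit path */
	if (refcount_dec_and_test(&mock_iommu->users))
		complete(&mock_iommu->complete);
}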
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/selftest.c | 39 +++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 85ff45f1776d..1ad2d090b10b 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -533,14 +533,17 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
static struct iopf_queue *mock_iommu_iopf_queue;
-static struct iommu_device mock_iommu_device = { -}; +static struct mock_iommu_device { + struct iommu_device iommu_dev; + struct completion complete; + refcount_t users; +} mock_iommu;
static struct iommu_device *mock_probe_device(struct device *dev) { if (dev->bus != &iommufd_mock_bus_type.bus) return ERR_PTR(-ENODEV); - return &mock_iommu_device; + return &mock_iommu.iommu_dev; }
static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt, @@ -1557,24 +1560,27 @@ int __init iommufd_test_init(void) if (rc) goto err_platform;
- rc = iommu_device_sysfs_add(&mock_iommu_device, + rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev, &selftest_iommu_dev->dev, NULL, "%s", dev_name(&selftest_iommu_dev->dev)); if (rc) goto err_bus;
- rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops, + rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); if (rc) goto err_sysfs;
+ refcount_set(&mock_iommu.users, 1); + init_completion(&mock_iommu.complete); + mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
return 0;
err_sysfs: - iommu_device_sysfs_remove(&mock_iommu_device); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); err_bus: bus_unregister(&iommufd_mock_bus_type.bus); err_platform: @@ -1584,6 +1590,22 @@ int __init iommufd_test_init(void) return rc; }
+static void iommufd_test_wait_for_users(void) +{ + if (refcount_dec_and_test(&mock_iommu.users)) + return; + /* + * Time out waiting for iommu device user count to become 0. + * + * Note that this is just making an example here, since the selftest is + * built into the iommufd module, i.e. it only unplugs the iommu device + * when unloading the module. So, it is expected that this WARN_ON will + * not trigger, as long as any iommufd FDs are open. + */ + WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete, + msecs_to_jiffies(10000))); +} + void iommufd_test_exit(void) { if (mock_iommu_iopf_queue) { @@ -1591,8 +1613,9 @@ void iommufd_test_exit(void) mock_iommu_iopf_queue = NULL; }
- iommu_device_sysfs_remove(&mock_iommu_device); - iommu_device_unregister_bus(&mock_iommu_device, + iommufd_test_wait_for_users(); + iommu_device_sysfs_remove(&mock_iommu.iommu_dev); + iommu_device_unregister_bus(&mock_iommu.iommu_dev, &iommufd_mock_bus_type.bus, &iommufd_mock_bus_type.nb); bus_unregister(&iommufd_mock_bus_type.bus);
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Implement the viommu alloc/free functions to increase/reduce the refcount of their dependent mock iommu device. User space can verify this loop via IOMMU_VIOMMU_TYPE_SELFTEST.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_test.h | 2 + drivers/iommu/iommufd/selftest.c | 67 ++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index f4bc23a92f9a..edced4ac7cd3 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -180,4 +180,6 @@ struct iommu_hwpt_invalidate_selftest { __u32 iotlb_id; };
+#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 1ad2d090b10b..e73353652d0b 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -134,6 +134,7 @@ to_mock_domain(struct iommu_domain *domain)
struct mock_iommu_domain_nested { struct iommu_domain domain; + struct mock_viommu *mock_viommu; struct mock_iommu_domain *parent; u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; }; @@ -144,6 +145,16 @@ to_mock_nested(struct iommu_domain *domain) return container_of(domain, struct mock_iommu_domain_nested, domain); }
+struct mock_viommu { + struct iommufd_viommu core; + struct mock_iommu_domain *s2_parent; +}; + +static inline struct mock_viommu *to_mock_viommu(struct iommufd_viommu *viommu) +{ + return container_of(viommu, struct mock_viommu, core); +} + enum selftest_obj_type { TYPE_IDEV, }; @@ -569,6 +580,61 @@ static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features fea return 0; }
+static void mock_viommu_destroy(struct iommufd_viommu *viommu) +{ + struct mock_iommu_device *mock_iommu = container_of( + viommu->iommu_dev, struct mock_iommu_device, iommu_dev); + + if (refcount_dec_and_test(&mock_iommu->users)) + complete(&mock_iommu->complete); + + /* iommufd core frees mock_viommu and viommu */ +} + +static struct iommu_domain * +mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, + const struct iommu_user_data *user_data) +{ + struct mock_viommu *mock_viommu = to_mock_viommu(viommu); + struct mock_iommu_domain_nested *mock_nested; + + if (flags & ~IOMMU_HWPT_FAULT_ID_VALID) + return ERR_PTR(-EOPNOTSUPP); + + mock_nested = __mock_domain_alloc_nested(user_data); + if (IS_ERR(mock_nested)) + return ERR_CAST(mock_nested); + mock_nested->mock_viommu = mock_viommu; + mock_nested->parent = mock_viommu->s2_parent; + return &mock_nested->domain; +} + +static struct iommufd_viommu_ops mock_viommu_ops = { + .destroy = mock_viommu_destroy, + .alloc_domain_nested = mock_viommu_alloc_domain_nested, +}; + +static struct iommufd_viommu *mock_viommu_alloc(struct device *dev, + struct iommu_domain *domain, + struct iommufd_ctx *ictx, + unsigned int viommu_type) +{ + struct mock_iommu_device *mock_iommu = + iommu_get_iommu_dev(dev, struct mock_iommu_device, iommu_dev); + struct mock_viommu *mock_viommu; + + if (viommu_type != IOMMU_VIOMMU_TYPE_SELFTEST) + return ERR_PTR(-EOPNOTSUPP); + + mock_viommu = iommufd_viommu_alloc(ictx, struct mock_viommu, core, + &mock_viommu_ops); + if (IS_ERR(mock_viommu)) + return ERR_CAST(mock_viommu); + + refcount_inc(&mock_iommu->users); + return &mock_viommu->core; +} + static const struct iommu_ops mock_ops = { /* * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type() @@ -588,6 +654,7 @@ static const struct iommu_ops mock_ops = { .dev_enable_feat = mock_dev_enable_feat, .dev_disable_feat = mock_dev_disable_feat, .user_pasid_table = true, + .viommu_alloc = mock_viommu_alloc, .default_domain_ops = &(struct iommu_domain_ops){ .free = mock_domain_free,
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a new iommufd_viommu FIXTURE and set it up with a vIOMMU object.
Any new vIOMMU feature will be added as a TEST_F under that.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- tools/testing/selftests/iommu/iommufd.c | 137 ++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 11 ++ tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++ 3 files changed, 176 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 9ebe55b875f4..f512846f6361 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -128,6 +128,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length); TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); + TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); #undef TEST_LENGTH }
@@ -2368,4 +2369,140 @@ TEST_F(vfio_compat_mock_domain, huge_map) } }
+FIXTURE(iommufd_viommu) +{ + int fd; + uint32_t ioas_id; + uint32_t stdev_id; + uint32_t hwpt_id; + uint32_t nested_hwpt_id; + uint32_t device_id; + uint32_t viommu_id; +}; + +FIXTURE_VARIANT(iommufd_viommu) +{ + unsigned int viommu; +}; + +FIXTURE_SETUP(iommufd_viommu) +{ + self->fd = open("/dev/iommu", O_RDWR); + ASSERT_NE(-1, self->fd); + test_ioctl_ioas_alloc(&self->ioas_id); + test_ioctl_set_default_memory_limit(); + + if (variant->viommu) { + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + + test_cmd_mock_domain(self->ioas_id, &self->stdev_id, NULL, + &self->device_id); + + /* Allocate a nesting parent hwpt */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &self->hwpt_id); + + /* Allocate a vIOMMU taking refcount of the parent hwpt */ + test_cmd_viommu_alloc(self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, + &self->viommu_id); + + /* Allocate a regular nested hwpt */ + test_cmd_hwpt_alloc_nested(self->device_id, self->viommu_id, 0, + &self->nested_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + } +} + +FIXTURE_TEARDOWN(iommufd_viommu) +{ + teardown_iommufd(self->fd, _metadata); +} + +FIXTURE_VARIANT_ADD(iommufd_viommu, no_viommu) +{ + .viommu = 0, +}; + +FIXTURE_VARIANT_ADD(iommufd_viommu, mock_viommu) +{ + .viommu = 1, +}; + +TEST_F(iommufd_viommu, viommu_auto_destroy) +{ +} + +TEST_F(iommufd_viommu, viommu_negative_tests) +{ + uint32_t device_id = self->device_id; + uint32_t ioas_id = self->ioas_id; + uint32_t hwpt_id; + + if (self->device_id) { + /* Negative test -- invalid hwpt (hwpt_id=0) */ + test_err_viommu_alloc(ENOENT, device_id, 0, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + + /* Negative test -- not a nesting parent hwpt */ + test_cmd_hwpt_alloc(device_id, ioas_id, 0, &hwpt_id); + test_err_viommu_alloc(EINVAL, device_id, hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + test_ioctl_destroy(hwpt_id); + + /* Negative test -- unsupported viommu type */ + test_err_viommu_alloc(EOPNOTSUPP, device_id, self->hwpt_id, + 0xdead, NULL); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, self->hwpt_id)); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, self->viommu_id)); + } else { + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); + } +} + +TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) +{ + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t iopf_hwpt_id; + uint32_t fault_id; + uint32_t fault_fd; + + if (self->device_id) { + test_ioctl_fault_alloc(&fault_id, &fault_fd); + test_err_hwpt_alloc_iopf( + ENOENT, dev_id, viommu_id, UINT32_MAX, + IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_err_hwpt_alloc_iopf( + EOPNOTSUPP, dev_id, viommu_id, fault_id, + IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_cmd_hwpt_alloc_iopf( + dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID, + &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + + test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, iopf_hwpt_id)); + test_cmd_trigger_iopf(dev_id, fault_fd); + + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(iopf_hwpt_id); + close(fault_fd); + test_ioctl_destroy(fault_id); + } +} + TEST_HARNESS_MAIN diff --git 
a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index c5d5e69452b0..e9a980b7729b 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -582,6 +582,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) uint32_t stdev_id; uint32_t idev_id; uint32_t hwpt_id; + uint32_t viommu_id; __u64 iova;
self->fd = open("/dev/iommu", O_RDWR); @@ -624,6 +625,16 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL)) return -1; + + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, + IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id, + IOMMU_HWPT_DATA_NONE, 0, 0)) + return -1; + + if (_test_cmd_viommu_alloc(self->fd, idev_id, hwpt_id, + IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id)) + return -1; + return 0; }
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index a70e35a78584..caedfc154a89 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -760,3 +760,31 @@ static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
#define test_cmd_trigger_iopf(device_id, fault_fd) \ ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd)) + +static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, + __u32 type, __u32 flags, __u32 *viommu_id) +{ + struct iommu_viommu_alloc cmd = { + .size = sizeof(cmd), + .flags = flags, + .type = type, + .dev_id = device_id, + .hwpt_id = hwpt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &cmd); + if (ret) + return ret; + if (viommu_id) + *viommu_id = cmd.out_viommu_id; + return 0; +} + +#define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id) \ + ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ + type, 0, viommu_id)) +#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ + type, 0, viommu_id))
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
With the introduction of the new object and its infrastructure, update the doc to reflect that and add a new graph.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- Documentation/userspace-api/iommufd.rst | 69 ++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 2deba93bf159..a8b7766c2849 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,37 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses.
+- IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, + passed to or shared with a VM. It may be some HW-accelerated virtualization + features and some SW resources used by the VM. For examples: + * Security namespace for guest owned ID, e.g. guest-controlled cache tags + * Non-device-affiliated event reporting, e.g. invalidation queue errors + * Access to a sharable nesting parent pagetable across physical IOMMUs + * Virtualization of various platforms IDs, e.g. RIDs and others + * Delivery of paravirtualized invalidation + * Direct assigned invalidation queues + * Direct assigned interrupts + Such a vIOMMU object generally has the access to a nesting parent pagetable + to support some HW-accelerated virtualization features. So, a vIOMMU object + must be created given a nesting parent HWPT_PAGING object, and then it would + encapsulate that HWPT_PAGING object. Therefore, a vIOMMU object can be used + to allocate an HWPT_NESTED object in place of the encapsulated HWPT_PAGING. + + .. note:: + + The name "vIOMMU" isn't necessarily identical to a virtualized IOMMU in a + VM. A VM can have one giant virtualized IOMMU running on a machine having + multiple physical IOMMUs, in which case the VMM will dispatch the requests + or configurations from this single virtualized IOMMU instance to multiple + vIOMMU objects created for individual slices of different physical IOMMUs. + In other words, a vIOMMU object is always a representation of one physical + IOMMU, not necessarily of a virtualized IOMMU. For VMMs that want the full + virtualization features from physical IOMMUs, it is suggested to build the + same number of virtualized IOMMUs as the number of physical IOMMUs, so the + passed-through devices would be connected to their own virtualized IOMMUs + backed by corresponding vIOMMU objects, in which case a guest OS would do + the "dispatch" naturally instead of VMM trappings. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -101,6 +132,28 @@ creating the objects and links:: |------------>|iommu_domain|<----|iommu_domain|<----|device| |____________| |____________| |______|
+ _______________________________________________________________________ + | iommufd (with vIOMMU) | + | | + | [5] | + | _____________ | + | | | | + | |----------------| vIOMMU | | + | | | | | + | | | | | + | | [1] | | [4] [2] | + | | ______ | | _____________ ________ | + | | | | | [3] | | | | | | + | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | + | | |______| |_____________| |_____________| |________| | + | | | | | | | + |______|________|______________|__________________|_______________|_____| + | | | | | + ______v_____ | ______v_____ ______v_____ ___v__ + | struct | | PFN | (paging) | | (nested) | |struct| + |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| + |____________| storage|____________| |____________| |______| + 1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can hold multiple IOAS objects. IOAS is the most generic object and does not expose interfaces that are specific to single IOMMU drivers. All operations @@ -132,7 +185,8 @@ creating the objects and links:: flag is set.
4. IOMMUFD_OBJ_HWPT_NESTED can be only manually created via the IOMMU_HWPT_ALLOC - uAPI, provided an hwpt_id via @pt_id to associate the new HWPT_NESTED object + uAPI, provided an hwpt_id or a viommu_id of a vIOMMU object encapsulating a + nesting parent HWPT_PAGING via @pt_id to associate the new HWPT_NESTED object to the corresponding HWPT_PAGING object. The associating HWPT_PAGING object must be a nesting parent manually allocated via the same uAPI previously with an IOMMU_HWPT_ALLOC_NEST_PARENT flag, otherwise the allocation will fail. The @@ -149,6 +203,18 @@ creating the objects and links:: created via the same IOMMU_HWPT_ALLOC uAPI. The difference is at the type of the object passed in via the @pt_id field of struct iommufd_hwpt_alloc.
+5. IOMMUFD_OBJ_VIOMMU can be only manually created via the IOMMU_VIOMMU_ALLOC + uAPI, provided a dev_id (for the device's physical IOMMU to back the vIOMMU) + and an hwpt_id (to associate the vIOMMU to a nesting parent HWPT_PAGING). The + iommufd core will link the vIOMMU object to the struct iommu_device that the + struct device is behind. And an IOMMU driver can implement a viommu_alloc op + to allocate its own vIOMMU data structure embedding the core-level structure + iommufd_viommu and some driver-specific data. If necessary, the driver can + also configure its HW virtualization feature for that vIOMMU (and thus for + the VM). Successful completion of this operation sets up the linkages between + the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used + as a nesting parent object to allocate an HWPT_NESTED object described above. + A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet).
@@ -161,6 +227,7 @@ User visible objects are backed by following datastructures: - iommufd_device for IOMMUFD_OBJ_DEVICE. - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. +- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
Several terminologies when looking at these datastructures:
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct device) against a vIOMMU (struct iommufd_viommu) object in a VM.
This vDEVICE object (and its structure) holds all the information and attributes in the VM regarding the device related to the vIOMMU.

As an initial patch, add a per-vIOMMU virtual ID. This can be:
 - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
 - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
 - Virtual RID on a nested Intel VT-d IOMMU, an index to a Context Table
Potentially, this vDEVICE structure would hold some vData for Confidential Compute Architecture (CCA). Use this virtual ID to index a "vdevs" xarray that belongs to a vIOMMU object.
Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection of a device object and an iommufd_viommu object, take two refcounts in the ioctl handler.
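For illustration only, a hedged userspace sketch of the new ioctl based on the uapi struct added below (the fd and IDs are placeholders supplied by the caller):

#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Allocate a vDEVICE binding dev_id to viommu_id under the given virtual ID */
static int vdevice_alloc(int iommufd, __u32 viommu_id, __u32 dev_id,
			 __u64 virt_id, __u32 *out_vdevice_id)
{
	struct iommu_vdevice_alloc cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = virt_id,	/* e.g. vSID, vDeviceID or vRID */
	};

	if (ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &cmd))
		return -1;
	*out_vdevice_id = cmd.out_vdevice_id;
	return 0;
}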
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_private.h | 18 ++++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 76 +++++++++++++++++++++++++ include/linux/iommufd.h | 4 ++ include/uapi/linux/iommufd.h | 22 +++++++ 5 files changed, 126 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 9d82e093a244..85a2d4c1cd83 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -500,8 +500,26 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); }
+static inline struct iommufd_viommu * +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIOMMU), + struct iommufd_viommu, obj); +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); +void iommufd_vdevice_destroy(struct iommufd_object *obj); + +struct iommufd_vdevice { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct iommufd_viommu *viommu; + struct device *dev; + u64 id; /* per-vIOMMU virtual ID */ +};
#ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index c0c5f1daf549..3872abbd8729 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -308,6 +308,7 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_vdevice_alloc vdev; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -361,6 +362,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl, + struct iommu_vdevice_alloc, virt_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -499,6 +502,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy, }, + [IOMMUFD_OBJ_VDEVICE] = { + .destroy = iommufd_vdevice_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 888239b78667..69b88e8c7c26 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -11,6 +11,7 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) if (viommu->ops && viommu->ops->destroy) viommu->ops->destroy(viommu); refcount_dec(&viommu->hwpt->common.obj.users); + xa_destroy(&viommu->vdevs); }
int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) @@ -53,6 +54,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) goto out_put_hwpt; }
+ xa_init(&viommu->vdevs); viommu->type = cmd->type; viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; @@ -79,3 +81,77 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +void iommufd_vdevice_destroy(struct iommufd_object *obj) +{ + struct iommufd_vdevice *vdev = + container_of(obj, struct iommufd_vdevice, obj); + struct iommufd_viommu *viommu = vdev->viommu; + + /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); + refcount_dec(&viommu->obj.users); + put_device(vdev->dev); +} + +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) +{ + struct iommu_vdevice_alloc *cmd = ucmd->cmd; + struct iommufd_vdevice *vdev, *curr; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 virt_id = cmd->virt_id; + int rc = 0; + + /* virt_id indexes an xarray */ + if (virt_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) { + rc = -EINVAL; + goto out_put_idev; + } + + vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE); + if (IS_ERR(vdev)) { + rc = PTR_ERR(vdev); + goto out_put_idev; + } + + vdev->id = virt_id; + vdev->dev = idev->dev; + get_device(idev->dev); + vdev->viommu = viommu; + refcount_inc(&viommu->obj.users); + + curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); + if (curr) { + rc = xa_err(curr) ?: -EEXIST; + goto out_abort; + } + + cmd->out_vdevice_id = vdev->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_abort; + iommufd_object_finalize(ucmd->ictx, &vdev->obj); + goto out_put_idev; + +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); +out_put_idev: + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index de9b56265c9c..71fa1e343023 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -10,6 +10,7 @@ #include <linux/errno.h> #include <linux/refcount.h> #include <linux/types.h> +#include <linux/xarray.h>
struct device; struct file; @@ -31,6 +32,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_FAULT, IOMMUFD_OBJ_VIOMMU, + IOMMUFD_OBJ_VDEVICE, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -89,6 +91,8 @@ struct iommufd_viommu {
const struct iommufd_viommu_ops *ops;
+ struct xarray vdevs; + unsigned int type; };
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 831265b4371c..79b626a4e3bd 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -52,6 +52,7 @@ enum { IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, IOMMUFD_CMD_VIOMMU_ALLOC = 0x90, + IOMMUFD_CMD_VDEVICE_ALLOC = 0x91, };
/** @@ -839,4 +840,25 @@ struct iommu_viommu_alloc { __u32 out_viommu_id; }; #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) + +/** + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) + * @size: sizeof(struct iommu_vdevice_alloc) + * @viommu_id: vIOMMU ID to associate with the virtual device + * @dev_id: The physical device to allocate a virtual instance on the vIOMMU + * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTORY + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID + * of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table + * + * Allocate a virtual device instance (for a physical device) against a vIOMMU. + * This instance holds the device's information (related to its vIOMMU) in a VM. + */ +struct iommu_vdevice_alloc { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 out_vdevice_id; + __aligned_u64 virt_id; +}; +#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC) #endif
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a vdevice_alloc op to the viommu mock_viommu_ops for the coverage of IOMMU_VIOMMU_TYPE_SELFTEST allocations. Then, add a vdevice_alloc TEST_F to cover the IOMMU_VDEVICE_ALLOC ioctl.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- tools/testing/selftests/iommu/iommufd.c | 20 ++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 4 +++ tools/testing/selftests/iommu/iommufd_utils.h | 27 +++++++++++++++++++ 3 files changed, 51 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index f512846f6361..5897d16404fc 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id); + TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, virt_id); #undef TEST_LENGTH }
@@ -2505,4 +2506,23 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) } }
+TEST_F(iommufd_viommu, vdevice_alloc) +{ + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + + if (dev_id) { + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99, + &vdev_id); + test_ioctl_destroy(vdev_id); + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id); + test_ioctl_destroy(vdev_id); + } else { + test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL); + } +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index e9a980b7729b..28f11b26f836 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -583,6 +583,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) uint32_t idev_id; uint32_t hwpt_id; uint32_t viommu_id; + uint32_t vdev_id; __u64 iova;
self->fd = open("/dev/iommu", O_RDWR); @@ -635,6 +636,9 @@ TEST_FAIL_NTH(basic_fail_nth, device) IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id)) return -1;
+ if (_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, 0, &vdev_id)) + return -1; + return 0; }
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index caedfc154a89..997bf1b2c7a8 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -788,3 +788,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, EXPECT_ERRNO(_errno, \ _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ type, 0, viommu_id)) + +static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id, + __u64 virt_id, __u32 *vdev_id) +{ + struct iommu_vdevice_alloc cmd = { + .size = sizeof(cmd), + .dev_id = idev_id, + .viommu_id = viommu_id, + .virt_id = virt_id, + }; + int ret; + + ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd); + if (ret) + return ret; + if (vdev_id) + *vdev_id = cmd.out_vdevice_id; + return 0; +} + +#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id) \ + ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ + virt_id, vdev_id)) +#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ + virt_id, vdev_id))
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI ATC invalidations).
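A hedged sketch of what a driver-side implementation could look like; my_viommu_cache_invalidate, struct my_inv, my_flush_one, MY_VIOMMU_INVALIDATE_DATA and the "last" member are hypothetical, only the op signature and the existing iommu_copy_struct_from_user_array helper come from the kernel:

static int my_viommu_cache_invalidate(struct iommufd_viommu *viommu,
				      struct iommu_user_data_array *array)
{
	struct my_inv inv;
	u32 i;
	int rc = 0;

	for (i = 0; i != array->entry_num; i++) {
		rc = iommu_copy_struct_from_user_array(
			&inv, array, MY_VIOMMU_INVALIDATE_DATA, i, last);
		if (rc)
			break;
		rc = my_flush_one(viommu, &inv);	/* TLB or device cache */
		if (rc)
			break;
	}
	array->entry_num = i;	/* report how many requests were handled */
	return rc;
}

static const struct iommufd_viommu_ops my_viommu_ops = {
	.cache_invalidate = my_viommu_cache_invalidate,
	/* .destroy, .alloc_domain_nested, ... */
};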
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommufd.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 71fa1e343023..2bc735ff9511 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -16,6 +16,7 @@ struct device; struct file; struct iommu_group; struct iommu_user_data; +struct iommu_user_data_array; struct iommufd_access; struct iommufd_ctx; struct iommufd_device; @@ -105,12 +106,21 @@ struct iommufd_viommu { * must be defined in include/uapi/linux/iommufd.h. * It must fully initialize the new iommu_domain before * returning. Upon failure, ERR_PTR must be returned. + * @cache_invalidate: Flush hardware cache used by a vIOMMU. It can be used for + * any IOMMU hardware specific cache: TLB and device cache. + * The @array passes in the cache invalidation requests, in + * form of a driver data structure. A driver must update the + * array->entry_num to report the number of handled requests. + * The data structure of the array entry must be defined in + * include/uapi/linux/iommufd.h */ struct iommufd_viommu_ops { void (*destroy)(struct iommufd_viommu *viommu); struct iommu_domain *(*alloc_domain_nested)( struct iommufd_viommu *viommu, u32 flags, const struct iommu_user_data *user_data); + int (*cache_invalidate)(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array); };
#if IS_ENABLED(CONFIG_IOMMUFD)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
With a vIOMMU object, user space can flush any IOMMU-related cache that can be directed via a vIOMMU object. It is similar to the IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g. device/descriptor cache.

Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id, and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers can define different structures for vIOMMU invalidations vs. HWPT ones.

Since both the HWPT-based and vIOMMU-based invalidation pathways check their own cache invalidation op, remove the WARN_ON_ONCE in the allocator.
Update the uAPI, kdoc, and selftest case accordingly.
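A hedged userspace sketch of the reuse: the same IOMMU_HWPT_INVALIDATE ioctl, but with a vIOMMU object ID in @hwpt_id; the request array and its data type are driver-defined and only stand-ins here:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* reqs points to *num driver-defined entries, each entry_len bytes long */
static int viommu_invalidate(int iommufd, __u32 viommu_id, void *reqs,
			     __u32 data_type, __u32 entry_len, __u32 *num)
{
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = viommu_id,	/* a vIOMMU ID is now accepted here */
		.data_uptr = (uintptr_t)reqs,
		.data_type = data_type,
		.entry_len = entry_len,
		.entry_num = *num,
	};
	int rc = ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);

	*num = cmd.entry_num;	/* number of requests the kernel handled */
	return rc;
}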
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/hw_pagetable.c | 37 ++++++++++++++++++++----- include/uapi/linux/iommufd.h | 9 ++++-- tools/testing/selftests/iommu/iommufd.c | 4 +-- 3 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index c5bfa842d4e2..9e059eecef4e 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -481,7 +481,7 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) .entry_len = cmd->entry_len, .entry_num = cmd->entry_num, }; - struct iommufd_hw_pagetable *hwpt; + struct iommufd_object *pt_obj; u32 done_num = 0; int rc;
@@ -495,17 +495,40 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) goto out; }
- hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id); - if (IS_ERR(hwpt)) { - rc = PTR_ERR(hwpt); + pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) { + rc = PTR_ERR(pt_obj); goto out; } + if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) { + struct iommufd_hw_pagetable *hwpt = + container_of(pt_obj, struct iommufd_hw_pagetable, obj); + + if (!hwpt->domain->ops || + !hwpt->domain->ops->cache_invalidate_user) { + rc = -EOPNOTSUPP; + goto out_put_pt; + } + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, + &data_array); + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { + struct iommufd_viommu *viommu = + container_of(pt_obj, struct iommufd_viommu, obj); + + if (!viommu->ops || !viommu->ops->cache_invalidate) { + rc = -EOPNOTSUPP; + goto out_put_pt; + } + rc = viommu->ops->cache_invalidate(viommu, &data_array); + } else { + rc = -EINVAL; + goto out_put_pt; + }
- rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, - &data_array); done_num = data_array.entry_num;
- iommufd_put_object(ucmd->ictx, &hwpt->obj); +out_put_pt: + iommufd_put_object(ucmd->ictx, pt_obj); out: cmd->entry_num = done_num; if (iommufd_ucmd_respond(ucmd, sizeof(*cmd))) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 79b626a4e3bd..e266dfa6a38d 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -675,7 +675,7 @@ struct iommu_hwpt_vtd_s1_invalidate { /** * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) * @size: sizeof(struct iommu_hwpt_invalidate) - * @hwpt_id: ID of a nested HWPT for cache invalidation + * @hwpt_id: ID of a nested HWPT or a vIOMMU, for cache invalidation * @data_uptr: User pointer to an array of driver-specific cache invalidation * data. * @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data @@ -686,8 +686,11 @@ struct iommu_hwpt_vtd_s1_invalidate { * Output the number of requests successfully handled by kernel. * @__reserved: Must be 0. * - * Invalidate the iommu cache for user-managed page table. Modifications on a - * user-managed page table should be followed by this operation to sync cache. + * Invalidate iommu cache for user-managed page table or vIOMMU. Modifications + * on a user-managed page table should be followed by this operation, if a HWPT + * is passed in via @hwpt_id. Other caches, such as device cache or descriptor + * cache can be flushed if a vIOMMU is passed in via the @hwpt_id field. + * * Each ioctl can support one or more cache invalidation requests in the array * that has a total size of @entry_len * @entry_num. * diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 5897d16404fc..8a4b961acae3 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -362,9 +362,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));
- /* hwpt_invalidate only supports a user-managed hwpt (nested) */ + /* hwpt_invalidate does not support a parent hwpt */ num_inv = 1; - test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs, + test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs, IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, sizeof(*inv_reqs), &num_inv); assert(!num_inv);
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
The iommu_copy_struct_from_user_array helper can be used to copy a single entry from a user array; copying entry by entry might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user array at once. Update the existing iommu_copy_struct_from_user_array kdoc accordingly.
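A hedged sketch of the intended use, e.g. inside a vIOMMU cache_invalidate handler; struct my_req and MY_INVALIDATE_DATA are hypothetical placeholders for a driver's own definitions:

static int my_copy_all_requests(struct iommu_user_data_array *array,
				struct my_req **out_reqs)
{
	struct my_req *reqs;
	int rc;

	reqs = kcalloc(array->entry_num, sizeof(*reqs), GFP_KERNEL);
	if (!reqs)
		return -ENOMEM;

	/* One copy for the whole array instead of entry_num small copies */
	rc = iommu_copy_struct_from_full_user_array(reqs, sizeof(*reqs),
						    array, MY_INVALIDATE_DATA);
	if (rc) {
		kfree(reqs);
		return rc;
	}
	*out_reqs = reqs;	/* caller processes and kfree()s the array */
	return 0;
}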
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/linux/iommu.h | 48 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index aab3016958d0..a2a83ad29962 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -545,7 +545,9 @@ static inline int __iommu_copy_struct_from_user_array( * @index: Index to the location in the array to copy user data from * @min_last: The last member of the data structure @kdst points in the * initial version. - * Return 0 for success, otherwise -error. + * + * Copy a single entry from a user array. Return 0 for success, otherwise + * -error. */ #define iommu_copy_struct_from_user_array(kdst, user_array, data_type, index, \ min_last) \ @@ -553,6 +555,50 @@ static inline int __iommu_copy_struct_from_user_array( kdst, user_array, data_type, index, sizeof(*(kdst)), \ offsetofend(typeof(*(kdst)), min_last))
+/** + * iommu_copy_struct_from_full_user_array - Copy iommu driver specific user + * space data from an iommu_user_data_array + * @kdst: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @kdst_entry_size: sizeof(*kdst) + * @user_array: Pointer to a struct iommu_user_data_array for a user space + * array + * @data_type: The data type of the @kdst. Must match with @user_array->type + * + * Copy the entire user array. kdst must have room for kdst_entry_size * + * user_array->entry_num bytes. Return 0 for success, otherwise -error. + */ +static inline int +iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, + struct iommu_user_data_array *user_array, + unsigned int data_type) +{ + unsigned int i; + int ret; + + if (user_array->type != data_type) + return -EINVAL; + if (!user_array->entry_num) + return -EINVAL; + if (likely(user_array->entry_len == kdst_entry_size)) { + if (copy_from_user(kdst, user_array->uptr, + user_array->entry_num * + user_array->entry_len)) + return -EFAULT; + } + + /* Copy item by item */ + for (i = 0; i != user_array->entry_num; i++) { + ret = copy_struct_from_user( + kdst + kdst_entry_size * i, kdst_entry_size, + user_array->uptr + user_array->entry_len * i, + user_array->entry_len); + if (ret) + return ret; + } + return 0; +} + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add an iommufd_viommu_find_dev() helper that converts a per-vIOMMU virtual device ID to the corresponding struct device pointer by looking it up in the vIOMMU's vdevs xarray. This avoids the bigger trouble of exposing struct iommufd_device and struct iommufd_vdevice in the public header.
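A hedged sketch of the intended call pattern, holding the vdevs xa_lock as the helper's comment requires; my_report_event is a hypothetical driver function:

static int my_report_event(struct iommufd_viommu *viommu, unsigned long virt_id)
{
	struct device *dev;
	int rc = -ENODEV;

	xa_lock(&viommu->vdevs);
	dev = iommufd_viommu_find_dev(viommu, virt_id);
	if (dev) {
		/* dev stays valid only while the xa_lock pins the vDEVICE */
		dev_dbg(dev, "event for virtual ID %lu\n", virt_id);
		rc = 0;
	}
	xa_unlock(&viommu->vdevs);
	return rc;
}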
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/driver.c | 13 +++++++++++++ include/linux/iommufd.h | 8 ++++++++ 2 files changed, 21 insertions(+)
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c index 2bc47d92a0ab..7b67fdf44134 100644 --- a/drivers/iommu/iommufd/driver.c +++ b/drivers/iommu/iommufd/driver.c @@ -36,5 +36,18 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, } EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD);
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */ +struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu, + unsigned long vdev_id) +{ + struct iommufd_vdevice *vdev; + + lockdep_assert_held(&viommu->vdevs.xa_lock); + + vdev = xa_load(&viommu->vdevs, vdev_id); + return vdev ? vdev->dev : NULL; +} +EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, IOMMUFD); + MODULE_DESCRIPTION("iommufd code shared with builtin modules"); MODULE_LICENSE("GPL"); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 2bc735ff9511..11110c749200 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -185,6 +185,8 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx) struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, enum iommufd_object_type type); +struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu, + unsigned long vdev_id); #else /* !CONFIG_IOMMUFD_DRIVER_CORE */ static inline struct iommufd_object * _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, @@ -192,6 +194,12 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, { return ERR_PTR(-EOPNOTSUPP); } + +static inline struct device * +iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id) +{ + return NULL; +} #endif /* CONFIG_IOMMUFD_DRIVER_CORE */
/*
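A small usage sketch (not part of the patch): the returned pointer is only stable while the vdevs xarray lock is held, so callers must keep their use of it inside the lock, as the mock driver later in this series does.

	xa_lock(&viommu->vdevs);
	dev = iommufd_viommu_find_dev(viommu, vdev_id);
	if (dev) {
		/* ... use dev while still under the lock ... */
	}
	xa_unlock(&viommu->vdevs);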
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Similar to the coverage of cache_invalidate_user for iotlb invalidation, add a device cache and a viommu_cache_invalidate function to test it out.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_test.h | 25 +++++++++ drivers/iommu/iommufd/selftest.c | 76 +++++++++++++++++++++++++++- 2 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index edced4ac7cd3..46558f83e734 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -54,6 +54,11 @@ enum { MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, };
+enum { + MOCK_DEV_CACHE_ID_MAX = 3, + MOCK_DEV_CACHE_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -152,6 +157,7 @@ struct iommu_test_hw_info { /* Should not be equal to any defined value in enum iommu_hwpt_data_type */ #define IOMMU_HWPT_DATA_SELFTEST 0xdead #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef +#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
/** * struct iommu_hwpt_selftest @@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest {
#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */ +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef +#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef + +/** + * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU + * (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) + * @flags: Invalidate flags + * @cache_id: Invalidate cache entry index + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored + */ +struct iommu_viommu_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0) + __u32 flags; + __u32 vdev_id; + __u32 cache_id; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index e73353652d0b..6c382b235a4d 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -163,6 +163,7 @@ struct mock_dev { struct device dev; unsigned long flags; int id; + u32 cache[MOCK_DEV_CACHE_NUM]; };
static inline struct mock_dev *to_mock_dev(struct device *dev) @@ -609,9 +610,80 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, return &mock_nested->domain; }
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct iommu_viommu_invalidate_selftest *cmds; + struct iommu_viommu_invalidate_selftest *cur; + struct iommu_viommu_invalidate_selftest *end; + int rc; + + /* A zero-length array is allowed to validate the array type */ + if (array->entry_num == 0 && + array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) { + array->entry_num = 0; + return 0; + } + + cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL); + if (!cmds) + return -ENOMEM; + cur = cmds; + end = cmds + array->entry_num; + + static_assert(sizeof(*cmds) == 3 * sizeof(u32)); + rc = iommu_copy_struct_from_full_user_array( + cmds, sizeof(*cmds), array, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST); + if (rc) + goto out; + + while (cur != end) { + struct mock_dev *mdev; + struct device *dev; + int i; + + if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) { + rc = -EOPNOTSUPP; + goto out; + } + + if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) { + rc = -EINVAL; + goto out; + } + + xa_lock(&viommu->vdevs); + dev = iommufd_viommu_find_dev(viommu, + (unsigned long)cur->vdev_id); + if (!dev) { + xa_unlock(&viommu->vdevs); + rc = -EINVAL; + goto out; + } + mdev = container_of(dev, struct mock_dev, dev); + + if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) { + /* Invalidate all cache entries and ignore cache_id */ + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = 0; + } else { + mdev->cache[cur->cache_id] = 0; + } + xa_unlock(&viommu->vdevs); + + cur++; + } +out: + array->entry_num = cur - cmds; + kfree(cmds); + return rc; +} + static struct iommufd_viommu_ops mock_viommu_ops = { .destroy = mock_viommu_destroy, .alloc_domain_nested = mock_viommu_alloc_domain_nested, + .cache_invalidate = mock_viommu_cache_invalidate, };
static struct iommufd_viommu *mock_viommu_alloc(struct device *dev, @@ -782,7 +854,7 @@ static void mock_dev_release(struct device *dev) static struct mock_dev *mock_dev_create(unsigned long dev_flags) { struct mock_dev *mdev; - int rc; + int rc, i;
if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA)) @@ -796,6 +868,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags) mdev->flags = dev_flags; mdev->dev.release = mock_dev_release; mdev->dev.bus = &iommufd_mock_bus_type.bus; + for (i = 0; i < MOCK_DEV_CACHE_NUM; i++) + mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;
rc = ida_alloc(&mock_dev_ida, GFP_KERNEL); if (rc < 0)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb, IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/iommufd_test.h | 5 ++++ drivers/iommu/iommufd/selftest.c | 22 +++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 7 +++++- tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++++ 4 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 46558f83e734..a6b7a163f636 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -23,6 +23,7 @@ enum { IOMMU_TEST_OP_DIRTY, IOMMU_TEST_OP_MD_CHECK_IOTLB, IOMMU_TEST_OP_TRIGGER_IOPF, + IOMMU_TEST_OP_DEV_CHECK_CACHE, };
enum { @@ -140,6 +141,10 @@ struct iommu_test_cmd { __u32 perm; __u64 addr; } trigger_iopf; + struct { + __u32 id; + __u32 cache; + } check_dev_cache; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 6c382b235a4d..8e727bdca877 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -1125,6 +1125,24 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, return rc; }
+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd, u32 idev_id, + unsigned int cache_id, u32 cache) +{ + struct iommufd_device *idev; + struct mock_dev *mdev; + int rc = 0; + + idev = iommufd_get_device(ucmd, idev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + mdev = container_of(idev->dev, struct mock_dev, dev); + + if (cache_id > MOCK_DEV_CACHE_ID_MAX || mdev->cache[cache_id] != cache) + rc = -EINVAL; + iommufd_put_object(ucmd->ictx, &idev->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1635,6 +1653,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd) return iommufd_test_md_check_iotlb(ucmd, cmd->id, cmd->check_iotlb.id, cmd->check_iotlb.iotlb); + case IOMMU_TEST_OP_DEV_CHECK_CACHE: + return iommufd_test_dev_check_cache(ucmd, cmd->id, + cmd->check_dev_cache.id, + cmd->check_dev_cache.cache); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 8a4b961acae3..89dbea97f652 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -222,6 +222,8 @@ FIXTURE_SETUP(iommufd_ioas) for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id, &self->device_id); + test_cmd_dev_check_cache_all(self->device_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); self->base_iova = MOCK_APERTURE_START; } } @@ -1386,9 +1388,12 @@ FIXTURE_SETUP(iommufd_mock_domain)
ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
- for (i = 0; i != variant->mock_domains; i++) + for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i], &self->hwpt_ids[i], &self->idev_ids[i]); + test_cmd_dev_check_cache_all(self->idev_ids[0], + IOMMU_TEST_DEV_CACHE_DEFAULT); + } self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS; diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 997bf1b2c7a8..3591c5c8de88 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -232,6 +232,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_i test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ })
+#define test_cmd_dev_check_cache(device_id, cache_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_DEV_CHECK_CACHE, \ + .id = device_id, \ + .check_dev_cache = { \ + .id = cache_id, \ + .cache = expected, \ + }, \ + }; \ + ASSERT_EQ(0, ioctl(self->fd, \ + _IOMMU_TEST_CMD( \ + IOMMU_TEST_OP_DEV_CHECK_CACHE), \ + &test_cmd)); \ + }) + +#define test_cmd_dev_check_cache_all(device_id, expected) \ + ({ \ + int c; \ + for (c = 0; c < MOCK_DEV_CACHE_NUM; c++) \ + test_cmd_dev_check_cache(device_id, c, expected); \ + }) + static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, uint32_t data_type, uint32_t lreq, uint32_t *nreqs)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a vdevice_cache test case to cover vIOMMU invalidations using the updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU via its hwpt_id field.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- tools/testing/selftests/iommu/iommufd.c | 173 ++++++++++++++++++ tools/testing/selftests/iommu/iommufd_utils.h | 32 ++++ 2 files changed, 205 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 89dbea97f652..8b37f56c2f51 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2530,4 +2530,177 @@ TEST_F(iommufd_viommu, vdevice_alloc) } }
+TEST_F(iommufd_viommu, vdevice_cache) +{ + struct iommu_viommu_invalidate_selftest inv_reqs[2] = {}; + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + uint32_t num_inv; + + if (dev_id) { + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + + test_cmd_dev_check_cache_all(dev_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Check data_type by passing zero-length array */ + num_inv = 0; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: Invalid data_type */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs) + 1, &num_inv); + assert(!num_inv); + + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 1, &num_inv); + assert(!num_inv); + + /* Negative test: invalid flag is passed */ + num_inv = 1; + inv_reqs[0].flags = 0xffffffff; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid data_uptr when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, NULL, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid entry_len when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 0, &num_inv); + assert(!num_inv); + + /* Negative test: invalid cache_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid vdev_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x9; + inv_reqs[0].cache_id = 0; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid flags configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0xffffffff; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 1; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid cache_id configuration in the 2nd request. 
+ */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 2nd cache entry and verify */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 1; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, 0); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 3rd and 4th cache entries and verify */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 2; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 3; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + test_cmd_dev_check_cache_all(dev_id, 0); + + /* Invalidate all cache entries for nested_dev_id[1] and verify */ + num_inv = 1; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache_all(dev_id, 0); + test_ioctl_destroy(vdev_id); + } +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 3591c5c8de88..93039ff02fbf 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -287,6 +287,38 @@ static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, data_type, lreq, nreqs)); \ })
+static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs, + uint32_t data_type, uint32_t lreq, + uint32_t *nreqs) +{ + struct iommu_hwpt_invalidate cmd = { + .size = sizeof(cmd), + .hwpt_id = viommu_id, + .data_type = data_type, + .data_uptr = (uint64_t)reqs, + .entry_len = lreq, + .entry_num = *nreqs, + }; + int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); + *nreqs = cmd.entry_num; + return rc; +} + +#define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs) \ + ({ \ + ASSERT_EQ(0, \ + _test_cmd_viommu_invalidate(self->fd, viommu, reqs, \ + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \ + lreq, nreqs)); \ + }) +#define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq, \ + nreqs) \ + ({ \ + EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate( \ + self->fd, viommu_id, reqs, \ + data_type, lreq, nreqs)); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
With the introduction of the new object and its infrastructure, update the doc and the vIOMMU graph to reflect that.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- Documentation/userspace-api/iommufd.rst | 41 +++++++++++++++++++------ 1 file changed, 32 insertions(+), 9 deletions(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index a8b7766c2849..8ba868ce7960 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -94,6 +94,19 @@ Following IOMMUFD objects are exposed to userspace: backed by corresponding vIOMMU objects, in which case a guest OS would do the "dispatch" naturally instead of VMM trappings.
+- IOMMUFD_OBJ_VDEVICE, representing a virtual device for an IOMMUFD_OBJ_DEVICE + against an IOMMUFD_OBJ_VIOMMU. This virtual device holds the device's virtual + information or attributes (related to the vIOMMU) in a VM. An immediate vDATA + example can be the virtual ID of the device on a vIOMMU, which is a unique ID + that VMM assigns to the device for a translation channel/port of the vIOMMU, + e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to a + Context Table. Potential use cases of some advanced security information can + be forwarded via this object too, such as security level or realm information + in a Confidential Compute Architecture. A VMM should create a vDEVICE object + to forward all the device information in a VM, when it connects a device to a + vIOMMU, which is a separate ioctl call from attaching the same device to an + HWPT_PAGING that the vIOMMU holds. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel @@ -133,16 +146,16 @@ creating the objects and links:: |____________| |____________| |______|
_______________________________________________________________________ - | iommufd (with vIOMMU) | + | iommufd (with vIOMMU/vDEVICE) | | | - | [5] | - | _____________ | - | | | | - | |----------------| vIOMMU | | - | | | | | - | | | | | - | | [1] | | [4] [2] | - | | ______ | | _____________ ________ | + | [5] [6] | + | _____________ _____________ | + | | | | | | + | |----------------| vIOMMU |<---| vDEVICE |<----| | + | | | | |_____________| | | + | | | | | | + | | [1] | | [4] | [2] | + | | ______ | | _____________ _|______ | | | | | | [3] | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | @@ -215,6 +228,15 @@ creating the objects and links:: the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used as a nesting parent object to allocate an HWPT_NESTED object described above.
+6. IOMMUFD_OBJ_VDEVICE can be only manually created via the IOMMU_VDEVICE_ALLOC + uAPI, provided a viommu_id for an iommufd_viommu object and a dev_id for an + iommufd_device object. The vDEVICE object will be the binding between these + two parent objects. Another @virt_id will be also set via the uAPI providing + the iommufd core an index to store the vDEVICE object to a vDEVICE array per + vIOMMU. If necessary, the IOMMU driver may choose to implement a vdevce_alloc + op to init its HW for virtualization feature related to a vDEVICE. Successful + completion of this operation sets up the linkages between vIOMMU and device. + A device can only bind to an iommufd due to DMA ownership claim and attach to at most one IOAS object (no support of PASID yet).
@@ -228,6 +250,7 @@ User visible objects are backed by following datastructures: - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. - iommufd_viommu for IOMMUFD_OBJ_VIOMMU. +- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
Several terminologies when looking at these datastructures:
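To make step 6 of the documentation above concrete, here is a hedged userspace sketch of the allocation (designated initializers against the IOMMU_VDEVICE_ALLOC uAPI; the 0x99 virtual ID mirrors the selftest earlier in this series and is only an example value).

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: bind dev_id to viommu_id under the guest-visible ID 0x99 */
static int vdevice_alloc(int iommufd, uint32_t viommu_id, uint32_t dev_id,
			 uint64_t virt_id, uint32_t *out_vdevice_id)
{
	struct iommu_vdevice_alloc cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = virt_id,	/* e.g. vSID for ARM SMMUv3 */
	};
	int rc = ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &cmd);

	if (!rc)
		*out_vdevice_id = cmd.out_vdevice_id;
	return rc;
}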
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the usual stage 1 default); however, this choice has no significant visible impact on the VFIO user. Further, qemu never implemented this and no other userspace user is known.
The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 and manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code.
Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation.
Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op.
Just in case there is some userspace using this, continue to treat requesting it as a NOP, but do not advertise support any more.
Acked-by: Alex Williamson alex.williamson@redhat.com Reviewed-by: Mostafa Saleh smostafa@google.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Donald Dutile ddutile@redhat.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ---------------- drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 ---------------- drivers/iommu/iommu.c | 10 ---------- drivers/iommu/iommufd/vfio_compat.c | 7 +------ drivers/vfio/vfio_iommu_type1.c | 12 +----------- include/linux/iommu.h | 3 --- include/uapi/linux/vfio.h | 2 +- 7 files changed, 3 insertions(+), 63 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2c7ab074cc2b..45bb8c43c304 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3619,21 +3619,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; }
-static int arm_smmu_enable_nesting(struct iommu_domain *domain) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - int ret = 0; - - mutex_lock(&smmu_domain->init_mutex); - if (smmu_domain->smmu) - ret = -EPERM; - else - smmu_domain->stage = ARM_SMMU_DOMAIN_S2; - mutex_unlock(&smmu_domain->init_mutex); - - return ret; -} - #ifdef CONFIG_ARM_SMMU_V3_HTTU static int arm_smmu_split_block(struct iommu_domain *domain, unsigned long iova, size_t size) @@ -3965,7 +3950,6 @@ static struct iommu_ops arm_smmu_ops = { .iotlb_sync_map = arm_smmu_iotlb_sync_map, #endif .iova_to_phys = arm_smmu_iova_to_phys, - .enable_nesting = arm_smmu_enable_nesting, #ifdef CONFIG_ARM_SMMU_V3_HTTU .support_dirty_log = arm_smmu_support_dirty_log, .switch_dirty_log = arm_smmu_switch_dirty_log, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2b57ae0e2166..29a68dacd56b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1538,21 +1538,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; }
-static int arm_smmu_enable_nesting(struct iommu_domain *domain) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - int ret = 0; - - mutex_lock(&smmu_domain->init_mutex); - if (smmu_domain->smmu) - ret = -EPERM; - else - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED; - mutex_unlock(&smmu_domain->init_mutex); - - return ret; -} - static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks) { @@ -1635,7 +1620,6 @@ static struct iommu_ops arm_smmu_ops = { .flush_iotlb_all = arm_smmu_flush_iotlb_all, .iotlb_sync = arm_smmu_iotlb_sync, .iova_to_phys = arm_smmu_iova_to_phys, - .enable_nesting = arm_smmu_enable_nesting, .set_pgtable_quirks = arm_smmu_set_pgtable_quirks, .free = arm_smmu_domain_free, } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 90f63e4a7ea0..9cdc28dc2cf8 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2713,16 +2713,6 @@ static int __init iommu_init(void) } core_initcall(iommu_init);
-int iommu_enable_nesting(struct iommu_domain *domain) -{ - if (domain->type != IOMMU_DOMAIN_UNMANAGED) - return -EINVAL; - if (!domain->ops->enable_nesting) - return -EINVAL; - return domain->ops->enable_nesting(domain); -} -EXPORT_SYMBOL_GPL(iommu_enable_nesting); - int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirk) { diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c index a3ad5f0b6c59..514aacd64009 100644 --- a/drivers/iommu/iommufd/vfio_compat.c +++ b/drivers/iommu/iommufd/vfio_compat.c @@ -291,12 +291,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx, case VFIO_DMA_CC_IOMMU: return iommufd_vfio_cc_iommu(ictx);
- /* - * This is obsolete, and to be removed from VFIO. It was an incomplete - * idea that got merged. - * https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.... - */ - case VFIO_TYPE1_NESTING_IOMMU: + case __VFIO_RESERVED_TYPE1_NESTING_IOMMU: return 0;
/* diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index af4af1bb02c3..128ed8ac8eb6 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -79,7 +79,6 @@ struct vfio_iommu { uint64_t num_non_pinned_groups; uint64_t num_non_hwdbm_domains; bool v2; - bool nesting; bool dirty_page_tracking; struct list_head emulated_iommu_groups; bool dirty_log_get_no_clear; @@ -2694,12 +2693,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, if (!domain->domain) goto out_free_domain;
- if (iommu->nesting) { - ret = iommu_enable_nesting(domain->domain); - if (ret) - goto out_domain; - } - #ifdef CONFIG_HISI_VIRTCCA_CODA if (is_virtcca_cvm_enable() && iommu->secure) domain->domain->secure = true; @@ -3060,9 +3053,7 @@ static void *vfio_iommu_type1_open(unsigned long arg) switch (arg) { case VFIO_TYPE1_IOMMU: break; - case VFIO_TYPE1_NESTING_IOMMU: - iommu->nesting = true; - fallthrough; + case __VFIO_RESERVED_TYPE1_NESTING_IOMMU: case VFIO_TYPE1v2_IOMMU: iommu->v2 = true; break; @@ -3166,7 +3157,6 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, #ifdef CONFIG_HISI_VIRTCCA_CODA case VFIO_TYPE1v2_S_IOMMU: #endif - case VFIO_TYPE1_NESTING_IOMMU: case VFIO_UNMAP_ALL: return 1; case VFIO_UPDATE_VADDR: diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a2a83ad29962..6cdc80fde05c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -757,7 +757,6 @@ struct iommu_ops { * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform * specific mechanisms. - * @enable_nesting: Enable nesting * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @support_dirty_log: Check whether domain supports dirty log tracking * @switch_dirty_log: Perform actions to start|stop dirty log tracking @@ -789,7 +788,6 @@ struct iommu_domain_ops { dma_addr_t iova);
bool (*enforce_cache_coherency)(struct iommu_domain *domain); - int (*enable_nesting)(struct iommu_domain *domain); int (*set_pgtable_quirks)(struct iommu_domain *domain, unsigned long quirks);
@@ -1010,7 +1008,6 @@ extern void iommu_group_put(struct iommu_group *group); extern int iommu_group_id(struct iommu_group *group); extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
-int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index df0790085d3d..6e116724196d 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -36,7 +36,7 @@ #define VFIO_EEH 5
/* Two-stage IOMMU */ -#define VFIO_TYPE1_NESTING_IOMMU 6 /* Implies v2 */ +#define __VFIO_RESERVED_TYPE1_NESTING_IOMMU 6 /* Implies v2 */
#define VFIO_SPAPR_TCE_v2_IOMMU 7
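As a hedged userspace sketch of the replacement flow named above (assuming the existing iommufd IOMMU_HWPT_ALLOC uAPI), requesting a stage 2 nesting parent becomes an explicit allocation-time decision rather than a VFIO container type.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: allocate a stage-2 nesting parent HWPT over an IOAS */
static int alloc_nest_parent(int iommufd, uint32_t dev_id, uint32_t ioas_id,
			     uint32_t *out_hwpt_id)
{
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,
	};
	int rc = ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd);

	if (!rc)
		*out_hwpt_id = cmd.out_hwpt_id;
	return rc;
}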
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
ACPICA commit c4f5c083d24df9ddd71d5782c0988408cf0fc1ab
The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory Access Flag field in the Memory Access Properties table, mainly for a PCI Root Complex.
CANWBS describes the coherency of memory accesses that are not marked IOWB cacheable/shareable. Its value further implies the coherency impact of a pair of mismatched memory attributes (e.g. in a nested translation case):
0x0: Use of mismatched memory attributes for accesses made by this device may lead to a loss of coherency.
0x1: Coherency of accesses made by this device to locations in Conventional memory is ensured, even if the memory attributes for the accesses presented by the device or provided by the SMMU are different from Inner and Outer Write-back cacheable, Shareable.
Link: https://github.com/acpica/acpica/commit/c4f5c083 Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Acked-by: Hanjun Guo guohanjun@huawei.com Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Donald Dutile ddutile@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/2-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- include/acpi/actbl2.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index f805bc581003..7eaaefe304c9 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -376,7 +376,7 @@ struct acpi_table_ccel { * IORT - IO Remapping Table * * Conforms to "IO Remapping Table System Software on ARM Platforms", - * Document number: ARM DEN 0049E.e, Sep 2022 + * Document number: ARM DEN 0049E.f, Apr 2024 * ******************************************************************************/
@@ -447,6 +447,7 @@ struct acpi_iort_memory_access {
#define ACPI_IORT_MF_COHERENCY (1) #define ACPI_IORT_MF_ATTRIBUTES (1<<1) +#define ACPI_IORT_MF_CANWBS (1<<2)
/* * IORT node specific subtables
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory Access Flag field in the Memory Access Properties table, mainly for a PCI Root Complex.
CANWBS describes the coherency of memory accesses that are not marked IOWB cacheable/shareable. Its value further implies the coherency impact of a pair of mismatched memory attributes (e.g. in a nested translation case):
0x0: Use of mismatched memory attributes for accesses made by this device may lead to a loss of coherency.
0x1: Coherency of accesses made by this device to locations in Conventional memory is ensured, even if the memory attributes for the accesses presented by the device or provided by the SMMU are different from Inner and Outer Write-back cacheable, Shareable.
Note that on HW without CANWBS, a loss of coherency can typically occur behind an SMMU that does not implement the S2FWB feature; additional cache flush operations would then be required to prevent it.
Add a new IOMMU_FWSPEC_PCI_RC_CANWBS flag and set it when the ACPI_IORT_MF_CANWBS memory access flag is present in the IORT.
CANWBS and S2FWB are similar features, in that they both guarantee the VM can not violate coherency, however S2FWB can be bypassed by PCI No Snoop TLPs, while CANWBS cannot. Thus CANWBS meets the requirements to set IOMMU_CAP_ENFORCE_CACHE_COHERENCY.
Architecturally ARM has expected that VFIO would disable No Snoop through PCI Config space, if this is done then the two would have the same protections.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Acked-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Donald Dutile ddutile@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/3-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/acpi/arm64/iort.c | 13 +++++++++++++ include/linux/iommu.h | 2 ++ 2 files changed, 15 insertions(+)
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index b1f483845bc0..ef6816ee8723 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -1218,6 +1218,17 @@ static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node) return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED; }
+static bool iort_pci_rc_supports_canwbs(struct acpi_iort_node *node) +{ + struct acpi_iort_memory_access *memory_access; + struct acpi_iort_root_complex *pci_rc; + + pci_rc = (struct acpi_iort_root_complex *)node->node_data; + memory_access = + (struct acpi_iort_memory_access *)&pci_rc->memory_properties; + return memory_access->memory_flags & ACPI_IORT_MF_CANWBS; +} + static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node, u32 streamid) { @@ -1344,6 +1355,8 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in) fwspec = dev_iommu_fwspec_get(dev); if (fwspec && iort_pci_rc_supports_ats(node)) fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS; + if (fwspec && iort_pci_rc_supports_canwbs(node)) + fwspec->flags |= IOMMU_FWSPEC_PCI_RC_CANWBS; } else { node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT, iort_match_node_callback, dev); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6cdc80fde05c..c616ba00bb86 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1188,6 +1188,8 @@ struct iommu_fwspec {
/* ATS is supported */ #define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0) +/* CANWBS is supported */ +#define IOMMU_FWSPEC_PCI_RC_CANWBS (1 << 1)
/* * An iommu attach handle represents a relationship between an iommu domain
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
HW with CANWBS is always cache coherent and ignores PCI No Snoop requests as well. This meets the requirement for IOMMU_CAP_ENFORCE_CACHE_COHERENCY, so let's return it.
Implement the enforce_cache_coherency() op to reject attaching devices that don't have CANWBS.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Mostafa Saleh smostafa@google.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Donald Dutile ddutile@redhat.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/4-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 31 +++++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +++++ 2 files changed, 38 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 45bb8c43c304..44b26704b481 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2505,6 +2505,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) case IOMMU_CAP_CACHE_COHERENCY: /* Assume that a coherent TCU implies coherent TBUs */ return master->smmu->features & ARM_SMMU_FEAT_COHERENCY; + case IOMMU_CAP_ENFORCE_CACHE_COHERENCY: + return arm_smmu_master_canwbs(master); case IOMMU_CAP_NOEXEC: case IOMMU_CAP_DEFERRED_FLUSH: return true; @@ -2515,6 +2517,26 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) } }
+static bool arm_smmu_enforce_cache_coherency(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master_domain *master_domain; + unsigned long flags; + bool ret = true; + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master_domain, &smmu_domain->devices, + devices_elm) { + if (!arm_smmu_master_canwbs(master_domain->master)) { + ret = false; + break; + } + } + smmu_domain->enforce_cache_coherency = ret; + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + return ret; +} + struct arm_smmu_domain *arm_smmu_domain_alloc(void) { struct arm_smmu_domain *smmu_domain; @@ -2956,6 +2978,14 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, * one of them. */ spin_lock_irqsave(&smmu_domain->devices_lock, flags); + if (smmu_domain->enforce_cache_coherency && + !arm_smmu_master_canwbs(master)) { + spin_unlock_irqrestore(&smmu_domain->devices_lock, + flags); + kfree(master_domain); + return -EINVAL; + } + if (state->ats_enabled) atomic_inc(&smmu_domain->nr_ats_masters); list_add(&master_domain->devices_elm, &smmu_domain->devices); @@ -3941,6 +3971,7 @@ static struct iommu_ops arm_smmu_ops = { .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = arm_smmu_attach_dev, + .enforce_cache_coherency = arm_smmu_enforce_cache_coherency, .set_dev_pasid = arm_smmu_s1_set_dev_pasid, .map_pages = arm_smmu_map_pages, .unmap_pages = arm_smmu_unmap_pages, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 45cad8ba28c4..77dfb396ad9c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -856,6 +856,7 @@ struct arm_smmu_domain { /* List of struct arm_smmu_master_domain */ struct list_head devices; spinlock_t devices_lock; + bool enforce_cache_coherency : 1;
struct mmu_notifier mmu_notifier;
@@ -934,6 +935,12 @@ void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, unsigned long iova, size_t size);
+static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master) +{ + return dev_iommu_fwspec_get(master->dev)->flags & + IOMMU_FWSPEC_PCI_RC_CANWBS; +} + #ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
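For illustration (not part of the patch), a consumer such as iommufd or VFIO would combine the two pieces added here roughly as follows: probe the per-device capability, then latch enforcement on the domain before relying on it.

	if (!device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY))
		return -EINVAL;	/* device may honour PCI No Snoop */

	if (!domain->ops->enforce_cache_coherency(domain))
		return -EINVAL;	/* some attached master lacks CANWBS */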
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU instance need to be available to the VMM so it can construct an appropriate vSMMUv3 that reflects the correct HW capabilities.
For userspace page tables these values are required to constrain the valid values within the CD table and the IOPTEs.
The kernel does not sanitize these values. If building a VMM, then userspace is required to only forward bits into a VM that it knows it can implement. Some bits will also require a VMM to detect if appropriate kernel support is available, such as for ATS and BTM.
Start a new file and kconfig for the advanced iommufd support. This lets it be compiled out for kernels that are not intended to support virtualization, and allows distros to leave it disabled until they are shipping a matching qemu too.
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Donald Dutile ddutile@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/5-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/Kconfig | 9 +++++ drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++++ include/uapi/linux/iommufd.h | 35 +++++++++++++++++++ 6 files changed, 86 insertions(+) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 9a28b0f824e8..3dc1c6bd434e 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -437,6 +437,15 @@ config ARM_SMMU_V3_ECMDQ
If not sure, say no.
+config ARM_SMMU_V3_IOMMUFD + bool "Enable IOMMUFD features for ARM SMMUv3 (EXPERIMENTAL)" + depends on IOMMUFD + help + Support for IOMMUFD features intended to support virtual machines + with accelerated virtual IOMMUs. + + Say Y here if you are doing development and testing on this feature. + config ARM_SMMU_V3_KUNIT_TEST tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS depends on KUNIT diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 82beca51e6b2..92aaf97b5ced 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -2,6 +2,7 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o arm_smmu_v3-y := arm-smmu-v3.o arm_smmu_v3-$(CONFIG_HISI_VIRTCCA_HOST) += arm-s-smmu-v3.o +arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_IOMMUFD) += arm-smmu-v3-iommufd.o arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c new file mode 100644 index 000000000000..3d2671031c9b --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ + +#include <uapi/linux/iommufd.h> + +#include "arm-smmu-v3.h" + +void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct iommu_hw_info_arm_smmuv3 *info; + u32 __iomem *base_idr; + unsigned int i; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + base_idr = master->smmu->base + ARM_SMMU_IDR0; + for (i = 0; i <= 5; i++) + info->idr[i] = readl_relaxed(base_idr + i); + info->iidr = readl_relaxed(master->smmu->base + ARM_SMMU_IIDR); + info->aidr = readl_relaxed(master->smmu->base + ARM_SMMU_AIDR); + + *length = sizeof(*info); + *type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3; + + return info; +} diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 44b26704b481..7e4f88f1a256 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3954,6 +3954,7 @@ static struct iommu_ops arm_smmu_ops = { .identity_domain = &arm_smmu_identity_domain, .blocked_domain = &arm_smmu_blocked_domain, .capable = arm_smmu_capable, + .hw_info = arm_smmu_hw_info, .domain_alloc_paging = arm_smmu_domain_alloc_paging, .domain_alloc_sva = arm_smmu_sva_domain_alloc, .domain_alloc_user = arm_smmu_domain_alloc_user, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 77dfb396ad9c..e290d9eae41d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -88,6 +88,8 @@ #define IIDR_REVISION GENMASK(15, 12) #define IIDR_IMPLEMENTER GENMASK(11, 0)
+#define ARM_SMMU_AIDR 0x1C + #define ARM_SMMU_CR0 0x20 #define CR0_ATSCHK (1 << 4) #define CR0_CMDQEN (1 << 3) @@ -992,4 +994,11 @@ static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, { } #endif /* CONFIG_ARM_SMMU_V3_SVA */ + +#if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD) +void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type); +#else +#define arm_smmu_hw_info NULL +#endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */ + #endif /* _ARM_SMMU_V3_H */ diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index e266dfa6a38d..b227ac16333f 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -488,15 +488,50 @@ struct iommu_hw_info_vtd { __aligned_u64 ecap_reg; };
+/** + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) + * + * @flags: Must be set to 0 + * @__reserved: Must be 0 + * @idr: Implemented features for ARM SMMU Non-secure programming interface + * @iidr: Information about the implementation and implementer of ARM SMMU, + * and architecture version supported + * @aidr: ARM SMMU architecture version + * + * For the details of @idr, @iidr and @aidr, please refer to the chapters + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. + * + * User space should read the underlying ARM SMMUv3 hardware information for + * the list of supported features. + * + * Note that these values reflect the raw HW capability, without any insight if + * any required kernel driver support is present. Bits may be set indicating the + * HW has functionality that is lacking kernel software support, such as BTM. If + * a VMM is using this information to construct emulated copies of these + * registers it should only forward bits that it knows it can support. + * + * In future, presence of required kernel support will be indicated in flags. + */ +struct iommu_hw_info_arm_smmuv3 { + __u32 flags; + __u32 __reserved; + __u32 idr[6]; + __u32 iidr; + __u32 aidr; +}; + /** * enum iommu_hw_info_type - IOMMU Hardware Info Types * @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that do not report hardware * info * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type + * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type */ enum iommu_hw_info_type { IOMMU_HW_INFO_TYPE_NONE = 0, IOMMU_HW_INFO_TYPE_INTEL_VTD = 1, + IOMMU_HW_INFO_TYPE_ARM_SMMUV3 = 2, };
/**
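A hedged userspace sketch of consuming this (assuming the existing IOMMU_GET_HW_INFO uAPI); the VMM then masks the raw register values before exposing them to the guest, as the kdoc above requires.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: read the raw SMMUv3 ID registers for a bound device */
static int get_smmuv3_info(int iommufd, uint32_t dev_id,
			   struct iommu_hw_info_arm_smmuv3 *info)
{
	struct iommu_hw_info cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		.data_len = sizeof(*info),
		.data_uptr = (uintptr_t)info,
	};
	int rc = ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd);

	if (!rc && cmd.out_data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
		return -1;
	/* e.g. clear unsupported feature bits in info->idr[0] here */
	return rc;
}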
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
For SMMUv3 the parent must be an S2 domain, which can be composed into an IOMMU_DOMAIN_NESTED.
In future the S2 parent will also need a VMID linked to the VIOMMU and even to KVM.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Mostafa Saleh smostafa@google.com Reviewed-by: Donald Dutile ddutile@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/6-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 7e4f88f1a256..0406f96ec021 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3339,7 +3339,8 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, const struct iommu_user_data *user_data) { struct arm_smmu_master *master = dev_iommu_priv_get(dev); - const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING; + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING | + IOMMU_HWPT_ALLOC_NEST_PARENT; struct arm_smmu_domain *smmu_domain; int ret;
@@ -3352,6 +3353,14 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, if (!smmu_domain) return ERR_PTR(-ENOMEM);
+ if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) { + if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING)) { + ret = -EOPNOTSUPP; + goto err_free; + } + smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + } + smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops; ret = arm_smmu_domain_finalise(smmu_domain, master->smmu, flags);
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
The arm-smmu-v3-iommufd.c file will need to call these functions too. Remove the static qualifiers and put the declarations in the header file. Remove the kunit visibility protections from arm_smmu_make_abort_ste() and arm_smmu_make_s2_domain_ste().
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Mostafa Saleh smostafa@google.com Reviewed-by: Donald Dutile ddutile@redhat.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.co... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++------------- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 27 +++++++++++++++++---- 2 files changed, 27 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0406f96ec021..eb6f65303bf5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1738,7 +1738,6 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, } }
-VISIBLE_IF_KUNIT void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) { memset(target, 0, sizeof(*target)); @@ -1821,7 +1820,6 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, } EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_cdtable_ste);
-VISIBLE_IF_KUNIT void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, struct arm_smmu_domain *smmu_domain, @@ -2730,8 +2728,8 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) } }
-static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, - const struct arm_smmu_ste *target) +void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, + const struct arm_smmu_ste *target) { int i, j; struct arm_smmu_device *smmu = master->smmu; @@ -2896,16 +2894,6 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); }
-struct arm_smmu_attach_state { - /* Inputs */ - struct iommu_domain *old_domain; - struct arm_smmu_master *master; - bool cd_needs_ats; - ioasid_t ssid; - /* Resulting state */ - bool ats_enabled; -}; - /* * Start the sequence to attach a domain to a master. The sequence contains three * steps: @@ -2926,8 +2914,8 @@ struct arm_smmu_attach_state { * new_domain can be a non-paging domain. In this case ATS will not be enabled, * and invalidations won't be tracked. */ -static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, - struct iommu_domain *new_domain) +int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, + struct iommu_domain *new_domain) { struct arm_smmu_master *master = state->master; struct arm_smmu_master_domain *master_domain; @@ -3009,7 +2997,7 @@ static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, * completes synchronizing the PCI device's ATC and finishes manipulating the * smmu_domain->devices list. */ -static void arm_smmu_attach_commit(struct arm_smmu_attach_state *state) +void arm_smmu_attach_commit(struct arm_smmu_attach_state *state) { struct arm_smmu_master *master = state->master;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index e290d9eae41d..9a00026dff28 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -880,21 +880,22 @@ struct arm_smmu_entry_writer_ops { void (*sync)(struct arm_smmu_entry_writer *writer); };
+void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, + struct arm_smmu_master *master, + struct arm_smmu_domain *smmu_domain, + bool ats_enabled); + #if IS_ENABLED(CONFIG_KUNIT) void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits); void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur, const __le64 *target); void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits); -void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, struct arm_smmu_ste *target); void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, struct arm_smmu_master *master, bool ats_enabled, unsigned int s1dss); -void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, - struct arm_smmu_master *master, - struct arm_smmu_domain *smmu_domain, - bool ats_enabled); void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, struct arm_smmu_master *master, struct mm_struct *mm, u16 asid); @@ -943,6 +944,22 @@ static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master) IOMMU_FWSPEC_PCI_RC_CANWBS; }
+struct arm_smmu_attach_state { + /* Inputs */ + struct iommu_domain *old_domain; + struct arm_smmu_master *master; + bool cd_needs_ats; + ioasid_t ssid; + /* Resulting state */ + bool ats_enabled; +}; + +int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, + struct iommu_domain *new_domain); +void arm_smmu_attach_commit(struct arm_smmu_attach_state *state); +void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, + const struct arm_smmu_ste *target); + #ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Add a new driver type for ARM SMMUv3 to enum iommu_viommu_type and implement arm_vsmmu_alloc().
As an initial step, copy the VMID from s2_parent. A follow-up series is required to give the VIOMMU object its own VMID that will be used in all nesting configurations.
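For orientation, userspace reaches this allocation path through the generic iommufd IOMMU_VIOMMU_ALLOC ioctl. The sketch below is not part of this patch; it assumes the struct iommu_viommu_alloc layout from the core vIOMMU series, and iommufd/dev_id/s2_hwpt_id are hypothetical handles obtained earlier:

    /* Hedged sketch: allocate an ARM SMMUv3 vIOMMU on top of an existing
     * S2 nesting-parent HWPT. The field names are assumptions, not
     * verified against this tree. */
    struct iommu_viommu_alloc cmd = {
        .size = sizeof(cmd),
        .type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
        .dev_id = dev_id,
        .hwpt_id = s2_hwpt_id,
    };
    if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
        err(1, "IOMMU_VIOMMU_ALLOC");
    viommu_id = cmd.out_viommu_id;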
Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 45 +++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 13 ++++++ include/uapi/linux/iommufd.h | 4 ++ 4 files changed, 63 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 3d2671031c9b..60dd9e907595 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -29,3 +29,48 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
return info; } + +static const struct iommufd_viommu_ops arm_vsmmu_ops = { +}; + +struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, + struct iommu_domain *parent, + struct iommufd_ctx *ictx, + unsigned int viommu_type) +{ + struct arm_smmu_device *smmu = + iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *s2_parent = to_smmu_domain(parent); + struct arm_vsmmu *vsmmu; + + if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3) + return ERR_PTR(-EOPNOTSUPP); + + if (!(smmu->features & ARM_SMMU_FEAT_NESTING)) + return ERR_PTR(-EOPNOTSUPP); + + if (s2_parent->smmu != master->smmu) + return ERR_PTR(-EINVAL); + + /* + * Must support some way to prevent the VM from bypassing the cache + * because VFIO currently does not do any cache maintenance. canwbs + * indicates the device is fully coherent and no cache maintenance is + * ever required, even for PCI No-Snoop. + */ + if (!arm_smmu_master_canwbs(master)) + return ERR_PTR(-EOPNOTSUPP); + + vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core, + &arm_vsmmu_ops); + if (IS_ERR(vsmmu)) + return ERR_CAST(vsmmu); + + vsmmu->smmu = smmu; + vsmmu->s2_parent = s2_parent; + /* FIXME Move VMID allocation from the S2 domain allocation to here */ + vsmmu->vmid = s2_parent->s2_cfg.vmid; + + return &vsmmu->core; +} diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index eb6f65303bf5..33282873d8bc 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3965,6 +3965,7 @@ static struct iommu_ops arm_smmu_ops = { .dev_disable_feat = arm_smmu_dev_disable_feature, .page_response = arm_smmu_page_response, .def_domain_type = arm_smmu_def_domain_type, + .viommu_alloc = arm_vsmmu_alloc, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 9a00026dff28..6f99f4fb5fc4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -10,6 +10,7 @@
#include <linux/bitfield.h> #include <linux/iommu.h> +#include <linux/iommufd.h> #include <linux/kernel.h> #include <linux/mmzone.h> #include <linux/sizes.h> @@ -1012,10 +1013,22 @@ static inline void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain, } #endif /* CONFIG_ARM_SMMU_V3_SVA */
+struct arm_vsmmu { + struct iommufd_viommu core; + struct arm_smmu_device *smmu; + struct arm_smmu_domain *s2_parent; + u16 vmid; +}; + #if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD) void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type); +struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, + struct iommu_domain *parent, + struct iommufd_ctx *ictx, + unsigned int viommu_type); #else #define arm_smmu_hw_info NULL +#define arm_vsmmu_alloc NULL #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
#endif /* _ARM_SMMU_V3_H */ diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index b227ac16333f..27c5117db985 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -400,10 +400,12 @@ struct iommu_hwpt_vtd_s1 { * enum iommu_hwpt_data_type - IOMMU HWPT Data Type * @IOMMU_HWPT_DATA_NONE: no data * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table + * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table */ enum iommu_hwpt_data_type { IOMMU_HWPT_DATA_NONE = 0, IOMMU_HWPT_DATA_VTD_S1 = 1, + IOMMU_HWPT_DATA_ARM_SMMUV3 = 2, };
/** @@ -843,9 +845,11 @@ struct iommu_fault_alloc { /** * enum iommu_viommu_type - Virtual IOMMU Type * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use + * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type */ enum iommu_viommu_type { IOMMU_VIOMMU_TYPE_DEFAULT = 0, + IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1, };
/**
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
For SMMUv3 an IOMMU_DOMAIN_NESTED is composed of an S2 iommu_domain acting as the parent and a user-provided STE fragment that defines the CD table and related data, with addresses translated by the S2 iommu_domain.
The kernel permits userspace to control only certain allowed bits of the STE that are safe for user/guest control.
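As a concrete illustration, a vSTE requesting stage-1 translation through a guest-owned CD table might look as follows. This is only a sketch written with the kernel's bitfield helpers for readability; the S1ContextPtr/S1CDMax fields, which would point at the guest CD table, are omitted:

    /* Hypothetical vSTE from userspace: valid, Cfg = S1 translate. Anything
     * outside the STRTAB_STE_*_NESTING_ALLOWED masks added below is
     * rejected with -EIO. */
    struct iommu_hwpt_arm_smmuv3 vste = {
        .ste[0] = cpu_to_le64(STRTAB_STE_0_V |
                              FIELD_PREP(STRTAB_STE_0_CFG,
                                         STRTAB_STE_0_CFG_S1_TRANS)),
        .ste[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_S1DSS,
                                         STRTAB_STE_1_S1DSS_SSID0)),
    };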
IOTLB maintenance is a bit subtle here: the S1 implicitly includes the S2 translation, but there is no way of knowing which S1 entries refer to a given S2 range.
For the IOTLB we follow ARM's guidance: whenever the S2 changes, flush the S2 itself and then issue a CMDQ_OP_TLBI_NH_ALL to flush all ASIDs under the VMID.
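A rough sketch of that ordering, as implemented in arm_smmu_tlb_inv_range_domain() in the diff below:

    /* 1) Range-invalidate the S2 itself. */
    __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);

    /* 2) Drop every cached S1 walk under this VMID, since we cannot tell
     *    which guest ASIDs/VAs depended on the changed S2 range. */
    if (smmu_domain->nest_parent) {
        cmd.opcode = CMDQ_OP_TLBI_NH_ALL;
        arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd);
    }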
An IOMMU_DOMAIN_NESTED can only be created from inside a VIOMMU, as the invalidation path relies on the VIOMMU to translate the virtual stream IDs used in the CD table and ATS invalidation commands.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 163 ++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++ include/uapi/linux/iommufd.h | 20 +++ 4 files changed, 225 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 60dd9e907595..91247a2a2d2c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -30,7 +30,170 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type) return info; }
+static void arm_smmu_make_nested_cd_table_ste( + struct arm_smmu_ste *target, struct arm_smmu_master *master, + struct arm_smmu_nested_domain *nested_domain, bool ats_enabled) +{ + arm_smmu_make_s2_domain_ste( + target, master, nested_domain->vsmmu->s2_parent, ats_enabled); + + target->data[0] = cpu_to_le64(STRTAB_STE_0_V | + FIELD_PREP(STRTAB_STE_0_CFG, + STRTAB_STE_0_CFG_NESTED)); + target->data[0] |= nested_domain->ste[0] & + ~cpu_to_le64(STRTAB_STE_0_CFG); + target->data[1] |= nested_domain->ste[1]; +} + +/* + * Create a physical STE from the virtual STE that userspace provided when it + * created the nested domain. Using the vSTE userspace can request: + * - Non-valid STE + * - Abort STE + * - Bypass STE (install the S2, no CD table) + * - CD table STE (install the S2 and the userspace CD table) + */ +static void arm_smmu_make_nested_domain_ste( + struct arm_smmu_ste *target, struct arm_smmu_master *master, + struct arm_smmu_nested_domain *nested_domain, bool ats_enabled) +{ + unsigned int cfg = + FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0])); + + /* + * Userspace can request a non-valid STE through the nesting interface. + * We relay that into an abort physical STE with the intention that + * C_BAD_STE for this SID can be generated to userspace. + */ + if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) + cfg = STRTAB_STE_0_CFG_ABORT; + + switch (cfg) { + case STRTAB_STE_0_CFG_S1_TRANS: + arm_smmu_make_nested_cd_table_ste(target, master, nested_domain, + ats_enabled); + break; + case STRTAB_STE_0_CFG_BYPASS: + arm_smmu_make_s2_domain_ste(target, master, + nested_domain->vsmmu->s2_parent, + ats_enabled); + break; + case STRTAB_STE_0_CFG_ABORT: + default: + arm_smmu_make_abort_ste(target); + break; + } +} + +static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, + struct device *dev) +{ + struct arm_smmu_nested_domain *nested_domain = + to_smmu_nested_domain(domain); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_attach_state state = { + .master = master, + .old_domain = iommu_get_domain_for_dev(dev), + .ssid = IOMMU_NO_PASID, + /* Currently invalidation of ATC is not supported */ + .disable_ats = true, + }; + struct arm_smmu_ste ste; + int ret; + + if (nested_domain->vsmmu->smmu != master->smmu) + return -EINVAL; + if (arm_smmu_ssids_in_use(&master->cd_table)) + return -EBUSY; + + mutex_lock(&arm_smmu_asid_lock); + ret = arm_smmu_attach_prepare(&state, domain); + if (ret) { + mutex_unlock(&arm_smmu_asid_lock); + return ret; + } + + arm_smmu_make_nested_domain_ste(&ste, master, nested_domain, + state.ats_enabled); + arm_smmu_install_ste_for_dev(master, &ste); + arm_smmu_attach_commit(&state); + mutex_unlock(&arm_smmu_asid_lock); + return 0; +} + +static void arm_smmu_domain_nested_free(struct iommu_domain *domain) +{ + kfree(to_smmu_nested_domain(domain)); +} + +static const struct iommu_domain_ops arm_smmu_nested_ops = { + .attach_dev = arm_smmu_attach_dev_nested, + .free = arm_smmu_domain_nested_free, +}; + +static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) +{ + unsigned int cfg; + + if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) { + memset(arg->ste, 0, sizeof(arg->ste)); + return 0; + } + + /* EIO is reserved for invalid STE data. 
*/ + if ((arg->ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) || + (arg->ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED)) + return -EIO; + + cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(arg->ste[0])); + if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS && + cfg != STRTAB_STE_0_CFG_S1_TRANS) + return -EIO; + return 0; +} + +static struct iommu_domain * +arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, + const struct iommu_user_data *user_data) +{ + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); + const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID; + struct arm_smmu_nested_domain *nested_domain; + struct iommu_hwpt_arm_smmuv3 arg; + int ret; + + /* + * Faults delivered to the nested domain are faults that originated by + * the S1 in the domain. The core code will match all PASIDs when + * delivering the fault due to user_pasid_table + */ + if (flags & ~SUPPORTED_FLAGS) + return ERR_PTR(-EOPNOTSUPP); + + ret = iommu_copy_struct_from_user(&arg, user_data, + IOMMU_HWPT_DATA_ARM_SMMUV3, ste); + if (ret) + return ERR_PTR(ret); + + ret = arm_smmu_validate_vste(&arg); + if (ret) + return ERR_PTR(ret); + + nested_domain = kzalloc(sizeof(*nested_domain), GFP_KERNEL_ACCOUNT); + if (!nested_domain) + return ERR_PTR(-ENOMEM); + + nested_domain->domain.type = IOMMU_DOMAIN_NESTED; + nested_domain->domain.ops = &arm_smmu_nested_ops; + nested_domain->vsmmu = vsmmu; + nested_domain->ste[0] = arg.ste[0]; + nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS); + + return &nested_domain->domain; +} + static const struct iommufd_viommu_ops arm_vsmmu_ops = { + .alloc_domain_nested = arm_vsmmu_alloc_domain_nested, };
struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 33282873d8bc..9a4476f6c5b2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -377,6 +377,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) case CMDQ_OP_TLBI_NH_ASID: cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); fallthrough; + case CMDQ_OP_TLBI_NH_ALL: case CMDQ_OP_TLBI_S12_VMALL: cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); break; @@ -2434,6 +2435,15 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, } __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
+ if (smmu_domain->nest_parent) { + /* + * When the S2 domain changes all the nested S1 ASIDs have to be + * flushed too. + */ + cmd.opcode = CMDQ_OP_TLBI_NH_ALL; + arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd); + } + /* * Unfortunately, this can't be leaf-only since we may have * zapped an entire table. @@ -2869,6 +2879,8 @@ to_smmu_domain_devices(struct iommu_domain *domain) if ((domain->type & __IOMMU_DOMAIN_PAGING) || domain->type == IOMMU_DOMAIN_SVA) return to_smmu_domain(domain); + if (domain->type == IOMMU_DOMAIN_NESTED) + return to_smmu_nested_domain(domain)->vsmmu->s2_parent; return NULL; }
@@ -2941,7 +2953,8 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, * enabled if we have arm_smmu_domain, those always have page * tables. */ - state->ats_enabled = arm_smmu_ats_supported(master); + state->ats_enabled = !state->disable_ats && + arm_smmu_ats_supported(master); }
if (smmu_domain) { @@ -3347,6 +3360,7 @@ arm_smmu_domain_alloc_user(struct device *dev, u32 flags, goto err_free; } smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + smmu_domain->nest_parent = true; }
smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; @@ -3966,6 +3980,7 @@ static struct iommu_ops arm_smmu_ops = { .page_response = arm_smmu_page_response, .def_domain_type = arm_smmu_def_domain_type, .viommu_alloc = arm_vsmmu_alloc, + .user_pasid_table = 1, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 6f99f4fb5fc4..3bb9c157a6c8 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -274,6 +274,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) #define STRTAB_STE_0_CFG_BYPASS 4 #define STRTAB_STE_0_CFG_S1_TRANS 5 #define STRTAB_STE_0_CFG_S2_TRANS 6 +#define STRTAB_STE_0_CFG_NESTED 7
#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4) #define STRTAB_STE_0_S1FMT_LINEAR 0 @@ -324,6 +325,15 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4)
+/* These bits can be controlled by userspace for STRTAB_STE_0_CFG_NESTED */ +#define STRTAB_STE_0_NESTING_ALLOWED \ + cpu_to_le64(STRTAB_STE_0_V | STRTAB_STE_0_CFG | STRTAB_STE_0_S1FMT | \ + STRTAB_STE_0_S1CTXPTR_MASK | STRTAB_STE_0_S1CDMAX) +#define STRTAB_STE_1_NESTING_ALLOWED \ + cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \ + STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \ + STRTAB_STE_1_S1STALLD) + /* * Context descriptors. * @@ -543,6 +553,7 @@ struct arm_smmu_cmdq_ent { }; } cfgi;
+ #define CMDQ_OP_TLBI_NH_ALL 0x10 #define CMDQ_OP_TLBI_NH_ASID 0x11 #define CMDQ_OP_TLBI_NH_VA 0x12 #define CMDQ_OP_TLBI_EL2_ALL 0x20 @@ -860,6 +871,7 @@ struct arm_smmu_domain { struct list_head devices; spinlock_t devices_lock; bool enforce_cache_coherency : 1; + bool nest_parent : 1;
struct mmu_notifier mmu_notifier;
@@ -869,6 +881,13 @@ struct arm_smmu_domain { #endif };
+struct arm_smmu_nested_domain { + struct iommu_domain domain; + struct arm_vsmmu *vsmmu; + + __le64 ste[2]; +}; + /* The following are exposed for testing purposes. */ struct arm_smmu_entry_writer_ops; struct arm_smmu_entry_writer { @@ -913,6 +932,12 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) return container_of(dom, struct arm_smmu_domain, domain); }
+static inline struct arm_smmu_nested_domain * +to_smmu_nested_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_nested_domain, domain); +} + extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock;
@@ -950,6 +975,7 @@ struct arm_smmu_attach_state { struct iommu_domain *old_domain; struct arm_smmu_master *master; bool cd_needs_ats; + bool disable_ats; ioasid_t ssid; /* Resulting state */ bool ats_enabled; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 27c5117db985..47ee35ce050b 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -396,6 +396,26 @@ struct iommu_hwpt_vtd_s1 { __u32 __reserved; };
+/** + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 nested STE + * (IOMMU_HWPT_DATA_ARM_SMMUV3) + * + * @ste: The first two double words of the user space Stream Table Entry for + * the translation. Must be little-endian. + * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec) + * - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax + * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD + * + * -EIO will be returned if @ste is not legal or contains any non-allowed field. + * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass + * nested domain will translate the same as the nesting parent. The S1 will + * install a Context Descriptor Table pointing at userspace memory translated + * by the nesting parent. + */ +struct iommu_hwpt_arm_smmuv3 { + __aligned_le64 ste[2]; +}; + /** * enum iommu_hwpt_data_type - IOMMU HWPT Data Type * @IOMMU_HWPT_DATA_NONE: no data
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field works. When S2FWB is supported and enabled, the IOPTE will force cachable access to IOMMU_CACHE memory when nesting with an S1 and deny cachable access when !IOMMU_CACHE.
When using a single stage of translation (a plain S2 domain) it doesn't change things for PCI devices, as it is just a different encoding for the existing mapping of the IOMMU protection flags to cachability attributes. For non-PCI it also changes the attribute-combining rules when incoming transactions have inconsistent attributes.
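A condensed view of the resulting S2 attribute selection, mirroring the io-pgtable-arm.c change in the diff below (s2fwb abbreviates the IO_PGTABLE_QUIRK_ARM_S2FWB test):

    /* MemAttr selection for S2 leaf entries, sketch:
     *   IOMMU_MMIO  -> MEMATTR_DEV    (Device-nGnRE; forced Device with FWB)
     *   IOMMU_CACHE -> MEMATTR_OIWB   (0b1111) without FWB
     *                  MEMATTR_FWB_WB (0b0110) with FWB: force Normal-WB
     *   otherwise   -> MEMATTR_NC     (0b0101; Normal forced NC with FWB)
     */
    if (prot & IOMMU_MMIO)
        pte |= ARM_LPAE_PTE_MEMATTR_DEV;
    else if (prot & IOMMU_CACHE)
        pte |= s2fwb ? ARM_LPAE_PTE_MEMATTR_FWB_WB : ARM_LPAE_PTE_MEMATTR_OIWB;
    else
        pte |= ARM_LPAE_PTE_MEMATTR_NC;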
However, when used with a nested S1, FWB prevents the guest from choosing a MemAttr in its S1 that would cause ordinary DMA to bypass the cache. Consistent with KVM, we wish to deny the guest the ability to become incoherent with cached memory the hypervisor believes is cachable, so that we don't have to flush it.
Allow NESTED domains to be created if the SMMU has S2FWB support and use S2FWB for NESTING_PARENTS. This is an additional option to CANWBS.
Reviewed-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 7 +++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +++ drivers/iommu/io-pgtable-arm.c | 27 ++++++++++++++++--- include/linux/io-pgtable.h | 2 ++ 5 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 91247a2a2d2c..a1c8fcd4797c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -220,9 +220,12 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, * Must support some way to prevent the VM from bypassing the cache * because VFIO currently does not do any cache maintenance. canwbs * indicates the device is fully coherent and no cache maintenance is - * ever required, even for PCI No-Snoop. + * ever required, even for PCI No-Snoop. S2FWB means the S1 can't make + * things non-coherent using the memattr, but No-Snoop behavior is not + * effected. */ - if (!arm_smmu_master_canwbs(master)) + if (!arm_smmu_master_canwbs(master) && + !(smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP);
vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9a4476f6c5b2..2ca079df2fa2 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1233,7 +1233,8 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) /* S2 translates */ if (cfg & BIT(1)) { used_bits[1] |= - cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); + cpu_to_le64(STRTAB_STE_1_S2FWB | STRTAB_STE_1_EATS | + STRTAB_STE_1_SHCFG); used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | @@ -1843,6 +1844,8 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, FIELD_PREP(STRTAB_STE_1_EATS, ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+ if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_S2FWB) + target->data[1] |= cpu_to_le64(STRTAB_STE_1_S2FWB); if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) target->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING)); @@ -2689,6 +2692,9 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, pgtbl_cfg.oas = smmu->oas; fmt = ARM_64_LPAE_S2; finalise_stage_fn = arm_smmu_domain_finalise_s2; + if ((smmu->features & ARM_SMMU_FEAT_S2FWB) && + (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_S2FWB; break; default: return -EINVAL; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3bb9c157a6c8..900cc4d341c6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -65,6 +65,7 @@ #define IDR3_BBML0 0 #define IDR3_BBML1 1 #define IDR3_BBML2 2 +#define IDR3_FWB (1 << 8) #define IDR3_RIL (1 << 10)
#define ARM_SMMU_IDR5 0x14 @@ -295,6 +296,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) #define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4) #define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6)
+#define STRTAB_STE_1_S2FWB (1UL << 25) #define STRTAB_STE_1_S1STALLD (1UL << 27)
#define STRTAB_STE_1_EATS GENMASK_ULL(29, 28) @@ -767,6 +769,7 @@ struct arm_smmu_device { #define ARM_SMMU_FEAT_BBML1 (1 << 23) #define ARM_SMMU_FEAT_BBML2 (1 << 24) #define ARM_SMMU_FEAT_ECMDQ (1 << 25) +#define ARM_SMMU_FEAT_S2FWB (1 << 26) u32 features;
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 5e89e572b890..865a1d475551 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -107,6 +107,18 @@ #define ARM_LPAE_PTE_HAP_FAULT (((arm_lpae_iopte)0) << 6) #define ARM_LPAE_PTE_HAP_READ (((arm_lpae_iopte)1) << 6) #define ARM_LPAE_PTE_HAP_WRITE (((arm_lpae_iopte)2) << 6) +/* + * For !FWB these code to: + * 1111 = Normal outer write back cachable / Inner Write Back Cachable + * Permit S1 to override + * 0101 = Normal Non-cachable / Inner Non-cachable + * 0001 = Device / Device-nGnRE + * For S2FWB these code: + * 0110 Force Normal Write Back + * 0101 Normal* is forced Normal-NC, Device unchanged + * 0001 Force Device-nGnRE + */ +#define ARM_LPAE_PTE_MEMATTR_FWB_WB (((arm_lpae_iopte)0x6) << 2) #define ARM_LPAE_PTE_MEMATTR_OIWB (((arm_lpae_iopte)0xf) << 2) #define ARM_LPAE_PTE_MEMATTR_NC (((arm_lpae_iopte)0x5) << 2) #define ARM_LPAE_PTE_MEMATTR_DEV (((arm_lpae_iopte)0x1) << 2) @@ -477,12 +489,16 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, */ if (data->iop.fmt == ARM_64_LPAE_S2 || data->iop.fmt == ARM_32_LPAE_S2) { - if (prot & IOMMU_MMIO) + if (prot & IOMMU_MMIO) { pte |= ARM_LPAE_PTE_MEMATTR_DEV; - else if (prot & IOMMU_CACHE) - pte |= ARM_LPAE_PTE_MEMATTR_OIWB; - else + } else if (prot & IOMMU_CACHE) { + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_S2FWB) + pte |= ARM_LPAE_PTE_MEMATTR_FWB_WB; + else + pte |= ARM_LPAE_PTE_MEMATTR_OIWB; + } else { pte |= ARM_LPAE_PTE_MEMATTR_NC; + } } else { if (prot & IOMMU_MMIO) pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV @@ -1436,6 +1452,9 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) IO_PGTABLE_QUIRK_ARM_BBML2)) return NULL;
+ if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_S2FWB)) + return NULL; + data = arm_lpae_alloc_pgtable(cfg); if (!data) return NULL; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 954119ae0f84..87950d35b996 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -93,6 +93,7 @@ struct io_pgtable_cfg { * * IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior * when changing block size. + * IO_PGTABLE_QUIRK_ARM_S2FWB: Use the FWB format for the MemAttrs bits */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -103,6 +104,7 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_ARM_HD BIT(7) #define IO_PGTABLE_QUIRK_ARM_BBML1 BIT(8) #define IO_PGTABLE_QUIRK_ARM_BBML2 BIT(9) + #define IO_PGTABLE_QUIRK_ARM_S2FWB BIT(10) unsigned long quirks; unsigned long pgsize_bitmap; unsigned int ias;
From: Jason Gunthorpe jgg@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
The EATS flag needs to flow through the vSTE and into the pSTE, and ensure physical ATS is enabled on the PCI device.
The physical ATS state must match the VM's idea of EATS as we rely on the VM to issue the ATS invalidation commands. Thus ATS must remain off at the device until EATS on a nesting domain turns it on. Attaching a nesting domain is the point where the invalidation responsibility transfers to userspace.
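For example, a guest that wants ATS on a stream sets EATS in its vSTE. A sketch of the relevant fragment (vste[1] stands for the second word of the user-provided STE):

    /* Hypothetical vSTE word-1 fragment: the guest enables ATS translation.
     * arm_smmu_validate_vste() strips this field, records it as
     * nested_domain->enable_ats, and physical ATS then follows the guest's
     * choice; the VM becomes responsible for issuing CMDQ_OP_ATC_INV. */
    vste[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
                                      STRTAB_STE_1_EATS_TRANS));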
Update the ATS logic to track EATS for nesting domains and flush the ATC whenever the S2 nesting parent changes.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 +++++++++++++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++- include/uapi/linux/iommufd.h | 2 +- 4 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index a1c8fcd4797c..84c8a21c00ae 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -95,8 +95,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, .master = master, .old_domain = iommu_get_domain_for_dev(dev), .ssid = IOMMU_NO_PASID, - /* Currently invalidation of ATC is not supported */ - .disable_ats = true, }; struct arm_smmu_ste ste; int ret; @@ -107,6 +105,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain, return -EBUSY;
mutex_lock(&arm_smmu_asid_lock); + /* + * The VM has to control the actual ATS state at the PCI device because + * we forward the invalidations directly from the VM. If the VM doesn't + * think ATS is on it will not generate ATC flushes and the ATC will + * become incoherent. Since we can't access the actual virtual PCI ATS + * config bit here base this off the EATS value in the STE. If the EATS + * is set then the VM must generate ATC flushes. + */ + state.disable_ats = !nested_domain->enable_ats; ret = arm_smmu_attach_prepare(&state, domain); if (ret) { mutex_unlock(&arm_smmu_asid_lock); @@ -131,8 +138,10 @@ static const struct iommu_domain_ops arm_smmu_nested_ops = { .free = arm_smmu_domain_nested_free, };
-static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) +static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg, + bool *enable_ats) { + unsigned int eats; unsigned int cfg;
if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) { @@ -149,6 +158,18 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg) if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS && cfg != STRTAB_STE_0_CFG_S1_TRANS) return -EIO; + + /* + * Only Full ATS or ATS UR is supported + * The EATS field will be set by arm_smmu_make_nested_domain_ste() + */ + eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1])); + arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS); + if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS) + return -EIO; + + if (cfg == STRTAB_STE_0_CFG_S1_TRANS) + *enable_ats = (eats == STRTAB_STE_1_EATS_TRANS); return 0; }
@@ -160,6 +181,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID; struct arm_smmu_nested_domain *nested_domain; struct iommu_hwpt_arm_smmuv3 arg; + bool enable_ats = false; int ret;
/* @@ -175,7 +197,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, if (ret) return ERR_PTR(ret);
- ret = arm_smmu_validate_vste(&arg); + ret = arm_smmu_validate_vste(&arg, &enable_ats); if (ret) return ERR_PTR(ret);
@@ -185,6 +207,7 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
nested_domain->domain.type = IOMMU_DOMAIN_NESTED; nested_domain->domain.ops = &arm_smmu_nested_ops; + nested_domain->enable_ats = enable_ats; nested_domain->vsmmu = vsmmu; nested_domain->ste[0] = arg.ste[0]; nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2ca079df2fa2..0cdb8643aebd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2306,7 +2306,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, if (!master->ats_enabled) continue;
- arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd); + if (master_domain->nested_ats_flush) { + /* + * If a S2 used as a nesting parent is changed we have + * no option but to completely flush the ATC. + */ + arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); + } else { + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, + &cmd); + }
for (i = 0; i < master->num_streams; i++) { cmd.atc.sid = master->streams[i].id; @@ -2856,7 +2865,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master) static struct arm_smmu_master_domain * arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, struct arm_smmu_master *master, - ioasid_t ssid) + ioasid_t ssid, bool nested_ats_flush) { struct arm_smmu_master_domain *master_domain;
@@ -2865,7 +2874,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, list_for_each_entry(master_domain, &smmu_domain->devices, devices_elm) { if (master_domain->master == master && - master_domain->ssid == ssid) + master_domain->ssid == ssid && + master_domain->nested_ats_flush == nested_ats_flush) return master_domain; } return NULL; @@ -2896,13 +2906,18 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master, { struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); struct arm_smmu_master_domain *master_domain; + bool nested_ats_flush = false; unsigned long flags;
if (!smmu_domain) return;
+ if (domain->type == IOMMU_DOMAIN_NESTED) + nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats; + spin_lock_irqsave(&smmu_domain->devices_lock, flags); - master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid); + master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid, + nested_ats_flush); if (master_domain) { list_del(&master_domain->devices_elm); kfree(master_domain); @@ -2969,6 +2984,9 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, return -ENOMEM; master_domain->master = master; master_domain->ssid = state->ssid; + if (new_domain->type == IOMMU_DOMAIN_NESTED) + master_domain->nested_ats_flush = + to_smmu_nested_domain(new_domain)->enable_ats;
/* * During prepare we want the current smmu_domain and new diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 900cc4d341c6..8b3dac344213 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -334,7 +334,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid) #define STRTAB_STE_1_NESTING_ALLOWED \ cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \ STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \ - STRTAB_STE_1_S1STALLD) + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS)
/* * Context descriptors. @@ -887,6 +887,7 @@ struct arm_smmu_domain { struct arm_smmu_nested_domain { struct iommu_domain domain; struct arm_vsmmu *vsmmu; + bool enable_ats : 1;
__le64 ste[2]; }; @@ -928,6 +929,7 @@ struct arm_smmu_master_domain { struct list_head devices_elm; struct arm_smmu_master *master; ioasid_t ssid; + bool nested_ats_flush : 1; };
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 47ee35ce050b..125b51b78ad8 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -404,7 +404,7 @@ struct iommu_hwpt_vtd_s1 { * the translation. Must be little-endian. * Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec) * - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax - * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD + * - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD * * -EIO will be returned if @ste is not legal or contains any non-allowed field. * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Implement the vIOMMU's cache_invalidate op for user space to invalidate the IOTLB entries, Device ATS and CD entries that are cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries that are simply in the native format of a 128-bit TLBI command. Scan those commands against the permitted command list and fix their VMID/SID fields to match what is stored in the vIOMMU.
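For instance, a guest TLBI of a single ASID is passed through as one 128-bit entry in this format. The sketch below builds such an entry with the kernel's field macros for readability; the ASID value 5 is made up, and whatever VMID the guest wrote is irrelevant because the host overwrites it with the vIOMMU's VMID:

    struct iommu_viommu_arm_smmuv3_invalidate inv = {
        .cmd[0] = cpu_to_le64(FIELD_PREP(CMDQ_0_OP, CMDQ_OP_TLBI_NH_ASID) |
                              FIELD_PREP(CMDQ_TLBI_0_ASID, 5)),
        .cmd[1] = 0,
    };
    /* Queued via IOMMU_HWPT_INVALIDATE with the vIOMMU passed as @hwpt_id
     * and data_type = IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3. */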
Co-developed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Eric Auger eric.auger@redhat.com Co-developed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 133 ++++++++++++++++++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 + include/uapi/linux/iommufd.h | 24 ++++ 4 files changed, 163 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 84c8a21c00ae..53c17d445006 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -215,8 +215,133 @@ arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags, return &nested_domain->domain; }
+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid) +{ + struct arm_smmu_master *master; + struct device *dev; + int ret = 0; + + xa_lock(&vsmmu->core.vdevs); + dev = iommufd_viommu_find_dev(&vsmmu->core, (unsigned long)vsid); + if (!dev) { + ret = -EIO; + goto unlock; + } + master = dev_iommu_priv_get(dev); + + /* At this moment, iommufd only supports PCI device that has one SID */ + if (sid) + *sid = master->streams[0].id; +unlock: + xa_unlock(&vsmmu->core.vdevs); + return ret; +} + +/* This is basically iommu_viommu_arm_smmuv3_invalidate in u64 for conversion */ +struct arm_vsmmu_invalidation_cmd { + union { + u64 cmd[2]; + struct iommu_viommu_arm_smmuv3_invalidate ucmd; + }; +}; + +/* + * Convert, in place, the raw invalidation command into an internal format that + * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are + * stored in CPU endian. + * + * Enforce the VMID or SID on the command. + */ +static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu, + struct arm_vsmmu_invalidation_cmd *cmd) +{ + /* Commands are le64 stored in u64 */ + cmd->cmd[0] = le64_to_cpu(cmd->ucmd.cmd[0]); + cmd->cmd[1] = le64_to_cpu(cmd->ucmd.cmd[1]); + + switch (cmd->cmd[0] & CMDQ_0_OP) { + case CMDQ_OP_TLBI_NSNH_ALL: + /* Convert to NH_ALL */ + cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL | + FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); + cmd->cmd[1] = 0; + break; + case CMDQ_OP_TLBI_NH_VA: + case CMDQ_OP_TLBI_NH_VAA: + case CMDQ_OP_TLBI_NH_ALL: + case CMDQ_OP_TLBI_NH_ASID: + cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; + cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); + break; + case CMDQ_OP_ATC_INV: + case CMDQ_OP_CFGI_CD: + case CMDQ_OP_CFGI_CD_ALL: { + u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); + + if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid)) + return -EIO; + cmd->cmd[0] &= ~CMDQ_CFGI_0_SID; + cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid); + break; + } + default: + return -EIO; + } + return 0; +} + +static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu, + struct iommu_user_data_array *array) +{ + struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core); + struct arm_smmu_device *smmu = vsmmu->smmu; + struct arm_vsmmu_invalidation_cmd *last; + struct arm_vsmmu_invalidation_cmd *cmds; + struct arm_vsmmu_invalidation_cmd *cur; + struct arm_vsmmu_invalidation_cmd *end; + int ret; + + cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL); + if (!cmds) + return -ENOMEM; + cur = cmds; + end = cmds + array->entry_num; + + static_assert(sizeof(*cmds) == 2 * sizeof(u64)); + ret = iommu_copy_struct_from_full_user_array( + cmds, sizeof(*cmds), array, + IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3); + if (ret) + goto out; + + last = cmds; + while (cur != end) { + ret = arm_vsmmu_convert_user_cmd(vsmmu, cur); + if (ret) + goto out; + + /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */ + cur++; + if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1) + continue; + + /* FIXME always uses the main cmdq rather than trying to group by type */ + ret = arm_smmu_cmdq_issue_cmdlist(smmu, last->cmd, cur - last, true); + if (ret) { + cur--; + goto out; + } + last = cur; + } +out: + array->entry_num = cur - cmds; + kfree(cmds); + return ret; +} + static const struct iommufd_viommu_ops arm_vsmmu_ops = { .alloc_domain_nested = arm_vsmmu_alloc_domain_nested, + .cache_invalidate = arm_vsmmu_cache_invalidate, };
struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, @@ -239,6 +364,14 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, if (s2_parent->smmu != master->smmu) return ERR_PTR(-EINVAL);
+ /* + * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW + * defect is needed to determine if arm_vsmmu_cache_invalidate() needs + * any change to remove this. + */ + if (WARN_ON(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) + return ERR_PTR(-EOPNOTSUPP); + /* * Must support some way to prevent the VM from bypassing the cache * because VFIO currently does not do any cache maintenance. canwbs diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 0cdb8643aebd..c1f53c9ab594 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -952,8 +952,8 @@ static int arm_smmu_ecmdq_issue_cmdlist(struct arm_smmu_device *smmu, * insert their own list of commands then all of the commands from one * CPU will appear before any of the commands from the other CPU. */ -static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, - u64 *cmds, int n, bool sync) +int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) { u64 cmd_sync[CMDQ_ENT_DWORDS]; u32 prod; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 8b3dac344213..7d0db75602c5 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -558,6 +558,7 @@ struct arm_smmu_cmdq_ent { #define CMDQ_OP_TLBI_NH_ALL 0x10 #define CMDQ_OP_TLBI_NH_ASID 0x11 #define CMDQ_OP_TLBI_NH_VA 0x12 + #define CMDQ_OP_TLBI_NH_VAA 0x13 #define CMDQ_OP_TLBI_EL2_ALL 0x20 #define CMDQ_OP_TLBI_EL2_ASID 0x21 #define CMDQ_OP_TLBI_EL2_VA 0x22 @@ -992,6 +993,9 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state); void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, const struct arm_smmu_ste *target);
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync); + #ifdef CONFIG_ARM_SMMU_V3_SVA bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); bool arm_smmu_master_sva_supported(struct arm_smmu_master *master); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 125b51b78ad8..2a492e054fb7 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -688,9 +688,11 @@ struct iommu_hwpt_get_dirty_bitmap { * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation * Data Type * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1 + * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3 */ enum iommu_hwpt_invalidate_data_type { IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0, + IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1, };
/** @@ -729,6 +731,28 @@ struct iommu_hwpt_vtd_s1_invalidate { __u32 __reserved; };
+/** + * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation + * (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3) + * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ. + * Must be little-endian. + * + * Supported command list only when passing in a vIOMMU via @hwpt_id: + * CMDQ_OP_TLBI_NSNH_ALL + * CMDQ_OP_TLBI_NH_VA + * CMDQ_OP_TLBI_NH_VAA + * CMDQ_OP_TLBI_NH_ALL + * CMDQ_OP_TLBI_NH_ASID + * CMDQ_OP_ATC_INV + * CMDQ_OP_CFGI_CD + * CMDQ_OP_CFGI_CD_ALL + * + * -EIO will be returned if the command is not supported. + */ +struct iommu_viommu_arm_smmuv3_invalidate { + __aligned_le64 cmd[2]; +}; + /** * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) * @size: sizeof(struct iommu_hwpt_invalidate)
From: Robin Murphy robin.murphy@arm.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Currently, iommu-dma is the only place outside of IOMMUFD and drivers which might need to be aware of the stage 2 domain encapsulated within a nested domain. This is still the RMR solution, where we're using host-managed MSIs with an identity mapping at stage 1 and it is the underlying stage 2 domain which owns the MSI cookie and holds the corresponding dynamic mappings. Hook up the new op to resolve what we need from a nested domain.
Signed-off-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/dma-iommu.c | 18 ++++++++++++++++-- include/linux/iommu.h | 4 ++++ 2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 9a3dc42a3309..e3c05b6657ef 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1849,6 +1849,20 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; }
+/* + * Nested domains may not have an MSI cookie or accept mappings, but they may + * be related to a domain which does, so we let them tell us what they need. + */ +static struct iommu_domain *iommu_dma_get_msi_mapping_domain(struct device *dev) +{ + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + + if (domain && domain->type == IOMMU_DOMAIN_NESTED && + domain->ops && domain->ops->get_msi_mapping_domain) + domain = domain->ops->get_msi_mapping_domain(domain); + return domain; +} + /** * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain * @desc: MSI descriptor, will store the MSI page @@ -1859,7 +1873,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { struct device *dev = msi_desc_to_dev(desc); - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev); struct iommu_dma_msi_page *msi_page; static DEFINE_MUTEX(msi_prepare_lock); /* see below */
@@ -1892,7 +1906,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) { struct device *dev = msi_desc_to_dev(desc); - const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev); const struct iommu_dma_msi_page *msi_page;
msi_page = msi_desc_get_iommu_cookie(desc); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index c616ba00bb86..d96066b342ed 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -763,6 +763,8 @@ struct iommu_ops { * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap * @free: Release the domain after use. + * @get_msi_mapping_domain: Return the related iommu_domain that should hold the + * MSI cookie and accept mapping(s). */ struct iommu_domain_ops { int (*attach_dev)(struct iommu_domain *domain, struct device *dev); @@ -807,6 +809,8 @@ struct iommu_domain_ops { unsigned long *bitmap, unsigned long base_iova, unsigned long bitmap_pgshift); void (*free)(struct iommu_domain *domain); + struct iommu_domain * + (*get_msi_mapping_domain)(struct iommu_domain *domain);
KABI_RESERVE(1) KABI_RESERVE(2)
From: Nicolin Chen nicolinc@nvidia.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
In a 1-stage translation setup, a device is attached to a paging domain. In a 2-stage translation setup, a device is attached to a nested domain, which does not itself hold the mappings for the MSI page but only points to an s2_parent paging domain that does.
Add arm_smmu_get_msi_mapping_domain in arm_smmu_nested_ops to return the correct paging domain.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 53c17d445006..2e09f0f70cec 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -30,6 +30,15 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type) return info; }
+static struct iommu_domain * +arm_smmu_get_msi_mapping_domain(struct iommu_domain *domain) +{ + struct arm_smmu_nested_domain *nested_domain = + container_of(domain, struct arm_smmu_nested_domain, domain); + + return &nested_domain->vsmmu->s2_parent->domain; +} + static void arm_smmu_make_nested_cd_table_ste( struct arm_smmu_ste *target, struct arm_smmu_master *master, struct arm_smmu_nested_domain *nested_domain, bool ats_enabled) @@ -134,6 +143,7 @@ static void arm_smmu_domain_nested_free(struct iommu_domain *domain) }
static const struct iommu_domain_ops arm_smmu_nested_ops = { + .get_msi_mapping_domain = arm_smmu_get_msi_mapping_domain, .attach_dev = arm_smmu_attach_dev_nested, .free = arm_smmu_domain_nested_free, };
From: Zhangfei Gao zhangfei.gao@linaro.org
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
iommufd_fault_iopf_enable() restricts PRI on PCI SR-IOV VFs because PRI might be a shared resource, and the current IOMMU subsystem is not ready to support enabling/disabling PRI on a VF without impacting the others.
However, there are devices that appear as PCI but are actually on the AMBA bus. These fake PCI devices have the PASID capability and support stall as well as SR-IOV, so remove the limitation for these devices.
Co-developed-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Zhangfei Gao zhangfei.gao@linaro.org Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/fault.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index df03411c8728..3ed0ea414943 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -10,6 +10,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/pci.h> +#include <linux/pci-ats.h> #include <linux/poll.h> #include <uapi/linux/iommufd.h>
@@ -27,8 +28,12 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev) * resource between PF and VFs. There is no coordination for this * shared capability. This waits for a vPRI reset to recover. */ - if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn) - return -EINVAL; + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + + if (pdev->is_virtfn && pci_pri_supported(pdev)) + return -EINVAL; + }
mutex_lock(&idev->iopf_lock); /* Device iopf has already been on. */
From: Yi Liu yi.l.liu@intel.com
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
PASID usage requires PASID support in both the device and the IOMMU. Since the IOMMU drivers always enable the PASID capability for the device if it is supported, it is reasonable to extend IOMMU_GET_HW_INFO to report the PASID capability to userspace.
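A consumer-side sketch of the new field (names follow the uAPI changes below; iommufd is a hypothetical open fd of /dev/iommu and dev_id a previously bound device):

    struct iommu_hw_info info = {
        .size = sizeof(info),
        .dev_id = dev_id,
    };
    unsigned long max_pasid = 0;

    if (ioctl(iommufd, IOMMU_GET_HW_INFO, &info))
        err(1, "IOMMU_GET_HW_INFO");
    if (info.out_max_pasid_log2)
        max_pasid = (1UL << info.out_max_pasid_log2) - 1;
    /* IOMMU_HW_CAP_PCI_PASID_EXEC/_PRIV in out_capabilities are only
     * meaningful when out_max_pasid_log2 is non-zero. */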
Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/iommufd/device.c | 27 ++++++++++++++++++++++++++- drivers/pci/ats.c | 32 ++++++++++++++++++++++++++++++++ include/linux/pci-ats.h | 3 +++ include/uapi/linux/iommufd.h | 14 +++++++++++++- 4 files changed, 74 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5fd3dd420290..c99d8b6dd1ef 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -4,6 +4,8 @@ #include <linux/iommu.h> #include <linux/iommufd.h> #include <linux/slab.h> +#include <linux/pci.h> +#include <linux/pci-ats.h> #include <uapi/linux/iommufd.h>
#include "../iommu-priv.h" @@ -1183,7 +1185,8 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd) void *data; int rc;
- if (cmd->flags || cmd->__reserved) + if (cmd->flags || cmd->__reserved[0] || cmd->__reserved[1] || + cmd->__reserved[2]) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id); @@ -1240,6 +1243,28 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd) if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING)) cmd->out_capabilities |= IOMMU_HW_CAP_DIRTY_TRACKING;
+ cmd->out_max_pasid_log2 = 0; + + if (dev_is_pci(idev->dev)) { + struct pci_dev *pdev = to_pci_dev(idev->dev); + int ctrl; + + if (pdev->is_virtfn) + pdev = pci_physfn(pdev); + + ctrl = pci_pasid_ctrl_status(pdev); + if (ctrl >= 0 && (ctrl & PCI_PASID_CTRL_ENABLE)) { + cmd->out_max_pasid_log2 = + ilog2(idev->dev->iommu->max_pasids); + if (ctrl & PCI_PASID_CTRL_EXEC) + cmd->out_capabilities |= + IOMMU_HW_CAP_PCI_PASID_EXEC; + if (ctrl & PCI_PASID_CTRL_PRIV) + cmd->out_capabilities |= + IOMMU_HW_CAP_PCI_PASID_PRIV; + } + } + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); out_free: kfree(data); diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c index f9cc2e10b676..eb354c5300cc 100644 --- a/drivers/pci/ats.c +++ b/drivers/pci/ats.c @@ -508,4 +508,36 @@ int pci_max_pasids(struct pci_dev *pdev) return (1 << supported); } EXPORT_SYMBOL_GPL(pci_max_pasids); + +/** + * pci_pasid_ctrl_status - Check the PASID status + * @pdev: PCI device structure + * + * Returns a negative value when no PASI capability is present. + * Otherwise the value of the control register is returned. + * Status reported are: + * + * PCI_PASID_CTRL_ENABLE - PASID enabled + * PCI_PASID_CTRL_EXEC - Execute permission enabled + * PCI_PASID_CTRL_PRIV - Privileged mode enabled + */ +int pci_pasid_ctrl_status(struct pci_dev *pdev) +{ + u16 ctrl = 0; + int pasid; + + if (pdev->is_virtfn) + pdev = pci_physfn(pdev); + + pasid = pdev->pasid_cap; + if (!pasid) + return -EINVAL; + + pci_read_config_word(pdev, pasid + PCI_PASID_CTRL, &ctrl); + + ctrl &= PCI_PASID_CTRL_ENABLE | PCI_PASID_CTRL_EXEC | + PCI_PASID_CTRL_PRIV; + + return ctrl; +} #endif /* CONFIG_PCI_PASID */ diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h index df54cd5b15db..5cee388752a0 100644 --- a/include/linux/pci-ats.h +++ b/include/linux/pci-ats.h @@ -39,6 +39,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features); void pci_disable_pasid(struct pci_dev *pdev); int pci_pasid_features(struct pci_dev *pdev); int pci_max_pasids(struct pci_dev *pdev); +int pci_pasid_ctrl_status(struct pci_dev *pdev); #else /* CONFIG_PCI_PASID */ static inline int pci_enable_pasid(struct pci_dev *pdev, int features) { return -EINVAL; } @@ -47,6 +48,8 @@ static inline int pci_pasid_features(struct pci_dev *pdev) { return -EINVAL; } static inline int pci_max_pasids(struct pci_dev *pdev) { return -EINVAL; } +static inline int pci_pasid_ctrl_status(struct pci_dev *pdev) +{ return -EINVAL; } #endif /* CONFIG_PCI_PASID */
#endif /* LINUX_PCI_ATS_H */ diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 2a492e054fb7..0be75c63dba8 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -565,9 +565,17 @@ enum iommu_hw_info_type { * IOMMU_HWPT_GET_DIRTY_BITMAP * IOMMU_HWPT_SET_DIRTY_TRACKING * + * @IOMMU_HW_CAP_PASID_EXEC: Execute Permission Supported, user ignores it + * when the struct iommu_hw_info::out_max_pasid_log2 + * is zero. + * @IOMMU_HW_CAP_PASID_PRIV: Privileged Mode Supported, user ignores it + * when the struct iommu_hw_info::out_max_pasid_log2 + * is zero. */ enum iommufd_hw_capabilities { IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0, + IOMMU_HW_CAP_PCI_PASID_EXEC = 1 << 1, + IOMMU_HW_CAP_PCI_PASID_PRIV = 1 << 2, };
/** @@ -583,6 +591,9 @@ enum iommufd_hw_capabilities { * iommu_hw_info_type. * @out_capabilities: Output the generic iommu capability info type as defined * in the enum iommu_hw_capabilities. + * @out_max_pasid_log2: Output the width of PASIDs. 0 means no PASID support. + * PCI devices turn to out_capabilities to check if the + * specific capabilities is supported or not. * @__reserved: Must be 0 * * Query an iommu type specific hardware information data from an iommu behind @@ -606,7 +617,8 @@ struct iommu_hw_info { __u32 data_len; __aligned_u64 data_uptr; __u32 out_data_type; - __u32 __reserved; + __u8 out_max_pasid_log2; + __u8 __reserved[3]; __aligned_u64 out_capabilities; }; #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
From: Zhangfei Gao zhangfei.gao@linaro.org
virt inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4WDJ CVE: NA
--------------------------------
Signed-off-by: Zhangfei Gao zhangfei.gao@linaro.org Signed-off-by: Kunkun Jiang jiangkunkun@huawei.com --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 3 ++- drivers/iommu/iommufd/fault.c | 6 ++++++ 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c index 2e09f0f70cec..a121616f5403 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c @@ -392,7 +392,8 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, */ if (!arm_smmu_master_canwbs(master) && !(smmu->features & ARM_SMMU_FEAT_S2FWB)) - return ERR_PTR(-EOPNOTSUPP); + printk("%s hack\n", __func__); + //return ERR_PTR(-EOPNOTSUPP);
vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core, &arm_vsmmu_ops); diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c index 3ed0ea414943..35b7723c577d 100644 --- a/drivers/iommu/iommufd/fault.c +++ b/drivers/iommu/iommufd/fault.c @@ -45,6 +45,11 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev) ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF); if (ret) --idev->iopf_enabled; + + ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); + if (ret) + return ret; + mutex_unlock(&idev->iopf_lock);
return ret; @@ -53,6 +58,7 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev) static void iommufd_fault_iopf_disable(struct iommufd_device *idev) { mutex_lock(&idev->iopf_lock); + iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); if (!WARN_ON(idev->iopf_enabled == 0)) { if (--idev->iopf_enabled == 0) iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);