mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 38 participants
  • 23872 discussions
[PATCH OLK-5.10] iommu/amd: move wait_on_sem() out of spinlock
by Zhang Yuwei 08 Jun '26

From: Ankit Soni <Ankit.Soni(a)amd.com>

mainline inclusion
from mainline-v7.0-rc1
commit d2a0cac10597068567d336e85fa3cbdbe8ca62bf
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14797
CVE: CVE-2026-43253

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

With iommu.strict=1, the existing completion wait path can cause soft
lockups under stressed environment, as wait_on_sem() busy-waits under the
spinlock with interrupts disabled. Move the completion wait in
iommu_completion_wait() out of the spinlock. wait_on_sem() only polls the
hardware-updated cmd_sem and does not require iommu->lock, so holding the
lock during the busy wait unnecessarily increases contention and extends
the time with interrupts disabled.

Signed-off-by: Ankit Soni <Ankit.Soni(a)amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde(a)amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel(a)amd.com>

Conflicts:
	drivers/iommu/amd/iommu.c
[commit 1ce018df87640 not merged]
Signed-off-by: Zhang Yuwei <zhangyuwei20(a)huawei.com>
---
 drivers/iommu/amd/iommu.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ae63eb3fe3d0..c141f98465b7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -882,7 +882,12 @@ static int wait_on_sem(struct amd_iommu *iommu, u64 data)
 {
 	int i = 0;

-	while (*iommu->cmd_sem != data && i < LOOP_TIMEOUT) {
+	/*
+	 * cmd_sem holds a monotonically non-decreasing completion sequence
+	 * number.
+	 */
+	while ((__s64)(READ_ONCE(*iommu->cmd_sem) - data) < 0 &&
+	       i < LOOP_TIMEOUT) {
 		udelay(1);
 		i += 1;
 	}
@@ -1143,14 +1148,13 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
 	build_completion_wait(&cmd, iommu, data);

 	ret = __iommu_queue_command_sync(iommu, &cmd, false);
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+
 	if (ret)
-		goto out_unlock;
+		return ret;

 	ret = wait_on_sem(iommu, data);

-out_unlock:
-	raw_spin_unlock_irqrestore(&iommu->lock, flags);
-
 	return ret;
 }

--
2.22.0
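The new loop condition in wait_on_sem() uses a signed difference instead of `!=`, so the wait also ends when the counter has already moved past the awaited value. The following is a minimal user-space sketch of that comparison, not the driver code: `seq_reached()` and `wait_for_seq()` are hypothetical names, `int64_t` stands in for the kernel's `__s64`, and a plain poll budget replaces `LOOP_TIMEOUT` and `udelay()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): true once the completion
 * counter `cur` has reached or passed `want`. The unsigned subtraction
 * wraps modulo 2^64 and the signed cast reads the result as a distance,
 * which is the inverse of the "(__s64)(cur - data) < 0" test in the diff.
 * Assumes the usual two's-complement conversion to int64_t.
 */
static bool seq_reached(uint64_t cur, uint64_t want)
{
	return (int64_t)(cur - want) >= 0;
}

/*
 * Poll a counter until it reaches `want` or the poll budget is used up.
 * Returns 0 on success and -1 on timeout (illustrative return values).
 */
static int wait_for_seq(const volatile uint64_t *sem, uint64_t want,
			unsigned int budget)
{
	unsigned int i = 0;

	assert(sem);
	while (!seq_reached(*sem, want) && i < budget)
		i++;

	return seq_reached(*sem, want) ? 0 : -1;
}
```

With an equality test, a waiter that samples the counter after it has advanced beyond its own sequence number would spin until the timeout; the signed comparison treats "already passed" as complete.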
[PATCH OLK-5.10] ALSA: ctxfi: Add fallback to default RSR for S/PDIF
by Zhang Yuwei 08 Jun '26

From: Harin Lee <me(a)harin.net>

mainline inclusion
from mainline-v7.1-rc1
commit 7d61662197ecdc458e33e475b6ada7f6da61d364
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15293
CVE: CVE-2026-46049

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

spdif_passthru_playback_get_resources() uses atc->pll_rate as the RSR for
the MSR calculation loop. However, pll_rate is only updated in
atc_pll_init() and not in hw_pll_init(), so it remains 0 after the card
init.

When spdif_passthru_playback_setup() skips atc_pll_init() for 32000 Hz,
(rsr * desc.msr) always becomes 0, causing the loop to spin indefinitely.

Add fallback to use atc->rsr when atc->pll_rate is 0. This reflects the
hardware state, since hw_card_init() already configures the PLL to the
default RSR.

Fixes: 8cc72361481f ("ALSA: SB X-Fi driver merge")
Cc: stable(a)vger.kernel.org
Signed-off-by: Harin Lee <me(a)harin.net>
Link: https://patch.msgid.link/20260406074913.217374-1-me@harin.net
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Zhang Yuwei <zhangyuwei20(a)huawei.com>
---
 sound/pci/ctxfi/ctatc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sound/pci/ctxfi/ctatc.c b/sound/pci/ctxfi/ctatc.c
index 06775519dab0..fd821303999a 100644
--- a/sound/pci/ctxfi/ctatc.c
+++ b/sound/pci/ctxfi/ctatc.c
@@ -791,7 +791,8 @@ static int spdif_passthru_playback_get_resources(struct ct_atc *atc,
 	struct src *src;
 	int err;
 	int n_amixer = apcm->substream->runtime->channels, i;
-	unsigned int pitch, rsr = atc->pll_rate;
+	unsigned int pitch;
+	unsigned int rsr = atc->pll_rate ? atc->pll_rate : atc->rsr;

 	/* first release old resources */
 	atc_pcm_release_resources(atc, apcm);
--
2.22.0
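The hang described in this patch comes from a doubling loop bounded by `rsr * msr`: with `rsr == 0` the product never grows, so the bound is never met. The sketch below models that in user space. `pick_rsr()` and `find_msr()` are hypothetical names, the loop shape is an assumption reconstructed from the commit message rather than a copy of ctatc.c, and the iteration cap exists only so that the broken case terminates in a demo.

```c
#include <assert.h>

/* Fallback from the patch: use pll_rate when it is set, else the default RSR. */
static unsigned int pick_rsr(unsigned int pll_rate, unsigned int default_rsr)
{
	return pll_rate ? pll_rate : default_rsr;
}

/*
 * Double msr until rsr * msr covers the stream rate.
 * Returns the msr found, or 0 when the bound is never reached within the
 * cap, which is the situation that spins forever without a cap.
 */
static unsigned int find_msr(unsigned int rate, unsigned int rsr)
{
	unsigned int msr = 1;
	int guard;

	assert(rate > 0);
	for (guard = 0; guard < 16; guard++) {
		if (rate <= rsr * msr)
			return msr;
		msr <<= 1;
	}

	return 0;
}
```

For a 32000 Hz stream, `find_msr(32000, 0)` never converges, while `find_msr(32000, pick_rsr(0, 48000))` settles on the first iteration.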
[PATCH OLK-6.6] iommu/vt-d: Clear Present bit before tearing down context entry
by Zhang Yuwei 08 Jun '26

From: Lu Baolu <baolu.lu(a)linux.intel.com>

mainline inclusion
from mainline-v7.0-rc1
commit c1e4f1dccbe9d7656d1c6872ebeadb5992d0aaa2
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15190
CVE: CVE-2026-45944

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry, where some fields are
already zeroed while the 'Present' bit is still set, leading to
unpredictable behavior or spurious faults.

While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.

Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:

1. Clear only the 'Present' (P) bit of the context entry first to
   signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
   hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.

Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.

Fixes: ba39592764ed ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka(a)chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka(a)chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja(a)google.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel(a)amd.com>

Conflicts:
	drivers/iommu/intel/pasid.c
	drivers/iommu/intel/iommu.c
	drivers/iommu/intel/iommu.h
[context conflict]
Signed-off-by: Zhang Yuwei <zhangyuwei20(a)huawei.com>
---
 drivers/iommu/intel/iommu.c |  4 +++-
 drivers/iommu/intel/iommu.h | 21 ++++++++++++++++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 86d713aefe91..ba4b298d456b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2180,7 +2180,7 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 		did_old = context_domain_id(context);
 	}

-	context_clear_entry(context);
+	context_clear_present(context);
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 	spin_unlock(&iommu->lock);
 	iommu->flush.flush_context(iommu,
@@ -2199,6 +2199,8 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 				 DMA_TLB_DSI_FLUSH);

 	__iommu_flush_dev_iotlb(info, 0, MAX_AGAW_PFN_WIDTH);
+	context_clear_entry(context);
+	__iommu_flush_cache(iommu, context, sizeof(*context));
 }

 static int domain_setup_first_level(struct intel_iommu *iommu,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 9e566a2f22df..791326fa95ba 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -929,7 +929,26 @@ static inline unsigned long virt_to_dma_pfn(void *p)

 static inline void context_set_present(struct context_entry *context)
 {
-	context->lo |= 1;
+	u64 val;
+
+	dma_wmb();
+	val = READ_ONCE(context->lo) | 1;
+	WRITE_ONCE(context->lo, val);
+}
+
+/*
+ * Clear the Present (P) bit (bit 0) of a context table entry. This initiates
+ * the transition of the entry's ownership from hardware to software. The
+ * caller is responsible for fulfilling the invalidation handshake recommended
+ * by the VT-d spec, Section 6.5.3.3 (Guidance to Software for Invalidations).
+ */
+static inline void context_clear_present(struct context_entry *context)
+{
+	u64 val;
+
+	val = READ_ONCE(context->lo) & GENMASK_ULL(63, 1);
+	WRITE_ONCE(context->lo, val);
+	dma_wmb();
 }

 static inline void context_set_fault_enable(struct context_entry *context)
--
2.22.0
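The four-step handshake in this patch can be modelled in user space with C11 atomics. This is only an analogy under stated assumptions: `struct ctx_entry`, `ctx_clear_present()`, `ctx_clear_entry()` and `ctx_teardown()` are hypothetical names, the release fence merely stands in for `dma_wmb()` (a CPU-side fence in a user program cannot order writes against a real IOMMU), and the invalidation in step 3 is reduced to a single read of the entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 128-bit context entry (two 64-bit halves). */
struct ctx_entry {
	_Atomic uint64_t lo;	/* bit 0 models the Present bit */
	_Atomic uint64_t hi;
};

/* Steps 1 and 2: clear only bit 0, then fence before anything that follows. */
static void ctx_clear_present(struct ctx_entry *e)
{
	uint64_t val = atomic_load_explicit(&e->lo, memory_order_relaxed);

	atomic_store_explicit(&e->lo, val & ~UINT64_C(1), memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Step 4: zero both halves; only safe once invalidation has finished. */
static void ctx_clear_entry(struct ctx_entry *e)
{
	atomic_store_explicit(&e->lo, 0, memory_order_relaxed);
	atomic_store_explicit(&e->hi, 0, memory_order_relaxed);
}

/*
 * Full teardown. Returns the low half as it looked during the
 * invalidation window, i.e. what a concurrent reader could have fetched.
 */
static uint64_t ctx_teardown(struct ctx_entry *e)
{
	uint64_t seen;

	assert(e);
	ctx_clear_present(e);
	/* Step 3 placeholder: real code flushes caches and the context cache. */
	seen = atomic_load_explicit(&e->lo, memory_order_relaxed);
	ctx_clear_entry(e);

	return seen;
}
```

The point of the ordering is visible in the return value: during the window the entry is intact except for bit 0, so a reader sees either a fully valid entry or a non-present one, never a half-zeroed entry that still claims to be present.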
[PATCH OLK-6.6 0/2] irqchip/gic-v3: Fix ALLINT masking logic by decoupling from GIC NMI support
by Jinjie Ruan 08 Jun '26

Fix ALLINT masking logic by decoupling from GIC NMI support.

Jinjie Ruan (2):
  Revert "irqchip/gic-v3: Fix ALLINT masking logic by decoupling from
    GIC NMI support"
  irqchip/gic-v3: Fix ALLINT masking logic by decoupling from GIC NMI
    support

 arch/arm64/include/asm/nmi.h |  5 +++++
 drivers/irqchip/irq-gic-v3.c | 14 ++++++++------
 2 files changed, 13 insertions(+), 6 deletions(-)

--
2.34.1
[PATCH OLK-5.10] irqchip/gic-v3: Fix ALLINT masking logic by decoupling from GIC NMI support
by Jinjie Ruan 08 Jun '26

hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9343

--------------------------------

The ARM64 FEAT_NMI implementation relies on two independent hardware
validation paths:

1. CPU Feature Check (`system_uses_nmi()`): Validates if the PE supports
   FEAT_NMI via ID_AA64PFR1_EL1.NMI. If pseudo-NMI is disabled and this
   bit is set, enables SCTLR_EL1.NMI (map to SCTLR_EL2.NMI in VHE mode)
   and clears SCTLR_EL1.SPINTMASK. Once active, the hardware automatically
   sets PSTATE.ALLINT upon taking an exception to EL1/EL2, effectively
   masking interrupts.

   `sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPINTMASK, SCTLR_EL1_NMI)`

2. GICv3 Feature Check (gic_data.has_nmi): Validates if the interrupt
   controller supports Non-maskable Interrupts via GICD_TYPER.NMI.

   `gic_data.has_nmi = !!(typer & GICD_TYPER_NMI)`

And PSTATE.ALLINT is strictly a PE-level state register. Its existence
and behavior are entirely independent of whether the underlying GICv3
hardware implementation has its own FEAT_NMI capabilities enabled.

Currently, __gic_handle_irq_from_irqson() uses has_v3_3_nmi() to guard
the clearing of PSTATE.ALLINT. This creates a destructive dependency
where the GIC's NMI status dictates PE register management.

	static inline bool has_v3_3_nmi(void)
	{
		return gic_data.has_nmi && system_uses_nmi();
	}

If a platform features a PE that supports FEAT_NMI but has a GICv3 that
does not support (or has disabled) the NMI feature, the current
implementation fails to clear ALLINT during IRQ handling. Because ALLINT
is automatically set by the hardware exception entry, it remains
permanently set.

As a consequence, local_irq_enable() only clears PSTATE.I and F bits,
leaving ALLINT untouched. Since ALLINT masks both standard IRQs and NMIs,
even if PSTATE.I is cleared, the system becomes completely incapable of
responding to subsequent NMIs and standard IRQs during downstream
processing (such as softirq execution).

	handle_softirqs()
	  -> local_irq_enable()
	       -> asm volatile("msr daifclr, #3")
	  -> do pending softirq action
	  -> local_irq_disable()
	       -> asm volatile("msr daifset, #3")

Fix this by replacing the has_v3_3_nmi() check with system_uses_nmi()
when clearing ALLINT. This ensures that PE state management correctly
reflects PE capabilities.

Additionally, explicitly log whether the GIC supports hardware NMI based
on the GICD_TYPER.NMI bit during initialization. Since FEAT_NMI
deployment relies on both subsystems, printing this status provides
critical diagnostic information for system verification.

Fixes: eefea6156921 ("irqchip/gic-v3: Fix hard LOCKUP caused by NMI being masked")
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
---
 arch/arm64/include/asm/daifflags.h | 5 +++++
 drivers/irqchip/irq-gic-v3.c       | 8 +++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index e4aec9469dcf..f04c5ccbd176 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -17,6 +17,7 @@
 #define DAIF_ERRCTX		(PSR_I_BIT | PSR_A_BIT)
 #define DAIF_MASK		(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

+#ifdef CONFIG_ARM64_NMI
 static __always_inline void _allint_clear(void)
 {
 	asm volatile(__msr_s(SYS_ALLINT_CLR, "xzr"));
@@ -26,6 +27,10 @@ static __always_inline void _allint_set(void)
 {
 	asm volatile(__msr_s(SYS_ALLINT_SET, "xzr"));
 }
+#else
+static __always_inline void _allint_clear(void) { }
+static __always_inline void _allint_set(void) { }
+#endif

 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 9870ad4cc766..38a429c29fdf 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -821,11 +821,12 @@ static void __gic_handle_irq_from_irqson(struct pt_regs *regs)
 	if (gic_prio_masking_enabled()) {
 		gic_pmr_mask_irqs();
 		gic_arch_enable_irqs();
-	} else if (has_v3_3_nmi()) {
-#ifdef CONFIG_ARM64_NMI
+	}
+
+#ifdef CONFIG_ARM64
+	if (system_uses_nmi())
 		_allint_clear();
 #endif
-	}

 	if (!is_nmi)
 		__gic_handle_irq(irqnr, regs);
@@ -2273,6 +2274,7 @@ static int __init gic_init_bases(void __iomem *dist_base,
 	gic_data.has_nmi = !!(typer & GICD_TYPER_NMI);
 	pr_info("Distributor has %sRange Selector support\n",
 		gic_data.has_rss ? "" : "no ");
+	pr_info("GICD_TYPER NMI is%s supported.\n", gic_data.has_nmi ? "" : " not");

 	if (typer & GICD_TYPER_MBIS) {
 		err = mbi_init(handle, gic_data.domain);
--
2.34.1
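The failure case in this patch is a two-bit truth table: the PE uses FEAT_NMI but the GIC does not advertise NMI support. The sketch below models only that decision. It assumes nothing about real register access; `struct platform` and the three helpers are hypothetical names, with `pe_uses_nmi` standing in for `system_uses_nmi()` and `gic_has_nmi` for `gic_data.has_nmi`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two independent capability checks. */
struct platform {
	bool pe_uses_nmi;	/* models system_uses_nmi() */
	bool gic_has_nmi;	/* models gic_data.has_nmi */
};

/* Old guard, equivalent to has_v3_3_nmi(): needs PE and GIC support. */
static bool clears_allint_old(const struct platform *p)
{
	assert(p);
	return p->gic_has_nmi && p->pe_uses_nmi;
}

/* New guard: the PE capability alone decides. */
static bool clears_allint_new(const struct platform *p)
{
	assert(p);
	return p->pe_uses_nmi;
}

/*
 * Hardware sets ALLINT on exception entry whenever the PE uses FEAT_NMI,
 * so it stays set unless the IRQ handler clears it.
 */
static bool allint_stuck(const struct platform *p, bool handler_clears)
{
	assert(p);
	return p->pe_uses_nmi && !handler_clears;
}
```

Only the combination `pe_uses_nmi = true`, `gic_has_nmi = false` behaves differently under the two guards, which matches the platform described in the commit message.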
[PATCH OLK-6.6 0/2] arm64: Fix ALLINT masking logic by decoupling from GIC NMI support
by Jinjie Ruan 08 Jun '26

Fix ALLINT masking logic by decoupling from GIC NMI support

Jinjie Ruan (2):
  Revert "irqchip/gic-v3: Fix ALLINT masking logic by decoupling from
    GIC NMI support"
  irqchip/gic-v3: Fix ALLINT masking logic by decoupling from GIC NMI
    support

 drivers/irqchip/irq-gic-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.34.1
[PATCH OLK-5.10] iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
by Zhang Yuwei 08 Jun '26

From: Jinhui Guo <guojinhui.liam(a)bytedance.com>

stable inclusion
from stable-v5.10.252
commit 581ce094d9eafb78ec4f9de77bd24b780c151236
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14677

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 10e60d87813989e20eac1f3eda30b3bae461e7f9 ]

Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") relies on
pci_dev_is_disconnected() to skip ATS invalidation for
safely-removed devices, but it does not cover link-down caused
by faults, which can still hard-lock the system.

For example, if a VM fails to connect to the PCIe device,
"virsh destroy" is executed to release resources and isolate
the fault, but a hard-lockup occurs while releasing the group fd.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 intel_pasid_tear_down_entry
 device_block_translation
 blocking_domain_attach_dev
 __iommu_attach_device
 __iommu_device_set_domain
 __iommu_group_set_domain_internal
 iommu_detach_group
 vfio_iommu_type1_detach_group
 vfio_group_detach_container
 vfio_group_fops_release
 __fput

Although pci_device_is_present() is slower than
pci_dev_is_disconnected(), it still takes only ~70 µs on a
ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed
and width increase.

Besides, devtlb_invalidation_with_pasid() is called only in the
paths below, which are far less frequent than memory map/unmap.

1. mm-struct release
2. {attach,release}_dev
3. set/remove PASID
4. dirty-tracking setup

The gain in system stability far outweighs the negligible cost
of using pci_device_is_present() instead of pci_dev_is_disconnected()
to decide when to skip ATS invalidation, especially under GDR
high-load conditions.

Fixes: 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam(a)bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Lin Yujun <linyujun809(a)h-partners.com>
---
 drivers/iommu/intel/pasid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index cc913f00f296..33c85d9e3a45 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -513,7 +513,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (!info || !info->ats_enabled)
 		return;

-	if (pci_dev_is_disconnected(to_pci_dev(dev)))
+	if (!pci_device_is_present(to_pci_dev(dev)))
 		return;

 	sid = info->bus << 8 | info->devfn;
--
2.22.0
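The difference between the two checks in this patch can be pictured with a simplified user-space model. Everything here is an assumption made for illustration: `struct fake_pci_dev` and its helpers are hypothetical, the "disconnected" state is reduced to a software flag, and presence is reduced to whether a config-space vendor ID read returns all ones, which is a simplification of what the kernel helpers actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the device state that the two checks look at. */
struct fake_pci_dev {
	bool marked_disconnected;	/* software flag, set on orderly removal */
	uint32_t vendor_id_read;	/* what a config-space read returns now */
	uint8_t bus;
	uint8_t devfn;
};

/* Old check: only sees removals that software already knows about. */
static bool is_disconnected(const struct fake_pci_dev *d)
{
	assert(d);
	return d->marked_disconnected;
}

/* New check: also probes the device; all ones means nothing answered. */
static bool is_present(const struct fake_pci_dev *d)
{
	assert(d);
	if (d->marked_disconnected)
		return false;

	return d->vendor_id_read != UINT32_MAX;
}

/* Source ID as built in the patched function: bus in bits 15:8, devfn in 7:0. */
static uint16_t source_id(const struct fake_pci_dev *d)
{
	assert(d);
	return (uint16_t)(d->bus << 8 | d->devfn);
}
```

A device whose link dropped because of a fault has `marked_disconnected == false` but no longer answers config reads, so the old check lets the invalidation proceed while the new one skips it.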
[PATCH OLK-5.10] netfilter: flowtable: strictly check for maximum number of actions
by Zhang Changzhong 06 Jun '26

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

mainline inclusion
from mainline-v7.0-rc7
commit 76522fcdbc3a02b568f5d957f7e66fc194abb893
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14887
CVE: CVE-2026-43329

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

The maximum number of flowtable hardware offload actions in IPv6 is:

* ethernet mangling (4 payload actions, 2 for each ethernet address)
* SNAT (4 payload actions)
* DNAT (4 payload actions)
* Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing)
  for QinQ.
* Redirect (1 action)

Which makes 17, while the maximum is 16. But act_ct supports for tunnels
actions too. Note that payload action operates at 32-bit word level, so
mangling an IPv6 address takes 4 payload actions.

Update flow_action_entry_next() calls to check for the maximum number of
supported actions.

While at it, rise the maximum number of actions per flow from 16 to 24
so this works fine with IPv6 setups.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Hyunwoo Kim <imv4bel(a)gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>

Conflicts:
	net/netfilter/nf_flow_table_offload.c
[context conflict because commits 2ed37183abb7 and eeff3000f240 were not merged]
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
---
 net/netfilter/nf_flow_table_offload.c | 173 +++++++++++++++++++++++-----------
 1 file changed, 116 insertions(+), 57 deletions(-)

diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index f6275d9..dc53bf4 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -13,6 +13,8 @@
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_tuple.h>

+#define NF_FLOW_RULE_ACTION_MAX	24
+
 static struct workqueue_struct *nf_flow_offload_wq;

 struct flow_offload_work {
@@ -165,7 +167,12 @@ static void flow_offload_mangle(struct flow_action_entry *entry,
 static inline struct flow_action_entry *
 flow_action_entry_next(struct nf_flow_rule *flow_rule)
 {
-	int i = flow_rule->rule->action.num_entries++;
+	int i;
+
+	if (unlikely(flow_rule->rule->action.num_entries >= NF_FLOW_RULE_ACTION_MAX))
+		return NULL;
+
+	i = flow_rule->rule->action.num_entries++;

 	return &flow_rule->rule->action.entries[i];
 }
@@ -182,6 +189,9 @@ static int flow_offload_eth_src(struct net *net,
 	u32 mask, val;
 	u16 val16;

+	if (!entry0 || !entry1)
+		return -E2BIG;
+
 	dev = dev_get_by_index(net, tuple->iifidx);
 	if (!dev)
 		return -ENOENT;
@@ -216,6 +226,9 @@ static int flow_offload_eth_dst(struct net *net,
 	u8 nud_state;
 	u16 val16;

+	if (!entry0 || !entry1)
+		return -E2BIG;
+
 	dst_cache = flow->tuplehash[dir].tuple.dst_cache;
 	n = dst_neigh_lookup(dst_cache, daddr);
 	if (!n)
@@ -246,16 +259,19 @@ static int flow_offload_eth_dst(struct net *net,
 	return 0;
 }

-static void flow_offload_ipv4_snat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_ipv4_snat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
 	u32 mask = ~htonl(0xffffffff);
 	__be32 addr;
 	u32 offset;

+	if (!entry)
+		return -E2BIG;
+
 	switch (dir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr;
@@ -266,23 +282,27 @@ static void flow_offload_ipv4_snat(struct net *net,
 		offset = offsetof(struct iphdr, daddr);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

 	flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP4, offset,
 			    &addr, &mask);
+	return 0;
 }

-static void flow_offload_ipv4_dnat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_ipv4_dnat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
 	u32 mask = ~htonl(0xffffffff);
 	__be32 addr;
 	u32 offset;

+	if (!entry)
+		return -E2BIG;
+
 	switch (dir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr;
@@ -293,14 +313,15 @@ static void flow_offload_ipv4_dnat(struct net *net,
 		offset = offsetof(struct iphdr, saddr);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

 	flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP4, offset,
 			    &addr, &mask);
+	return 0;
 }

-static void flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule,
+static int flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule,
 				     unsigned int offset,
 				     const __be32 *addr, const __be32 *mask)
 {
@@ -309,15 +330,20 @@ static void flow_offload_ipv6_mangle(struct nf_flow_rule *flow_rule,

 	for (i = 0; i < sizeof(struct in6_addr) / sizeof(u32); i++) {
 		entry = flow_action_entry_next(flow_rule);
+		if (!entry)
+			return -E2BIG;
+
 		flow_offload_mangle(entry, FLOW_ACT_MANGLE_HDR_TYPE_IP6,
 				    offset + i * sizeof(u32), &addr[i], mask);
 	}
+
+	return 0;
 }

-static void flow_offload_ipv6_snat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_ipv6_snat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	u32 mask = ~htonl(0xffffffff);
 	const __be32 *addr;
@@ -333,16 +359,16 @@ static void flow_offload_ipv6_snat(struct net *net,
 		offset = offsetof(struct ipv6hdr, daddr);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

-	flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask);
+	return flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask);
 }

-static void flow_offload_ipv6_dnat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_ipv6_dnat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	u32 mask = ~htonl(0xffffffff);
 	const __be32 *addr;
@@ -358,10 +384,10 @@ static void flow_offload_ipv6_dnat(struct net *net,
 		offset = offsetof(struct ipv6hdr, saddr);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

-	flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask);
+	return flow_offload_ipv6_mangle(flow_rule, offset, addr, &mask);
 }

 static int flow_offload_l4proto(const struct flow_offload *flow)
@@ -383,15 +409,18 @@ static int flow_offload_l4proto(const struct flow_offload *flow)
 	return type;
 }

-static void flow_offload_port_snat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_port_snat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
 	u32 mask, port;
 	u32 offset;

+	if (!entry)
+		return -E2BIG;
+
 	switch (dir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		port = ntohs(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port);
@@ -406,22 +435,26 @@ static void flow_offload_port_snat(struct net *net,
 		mask = ~htonl(0xffff);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

 	flow_offload_mangle(entry, flow_offload_l4proto(flow), offset,
 			    &port, &mask);
+	return 0;
 }

-static void flow_offload_port_dnat(struct net *net,
-				   const struct flow_offload *flow,
-				   enum flow_offload_tuple_dir dir,
-				   struct nf_flow_rule *flow_rule)
+static int flow_offload_port_dnat(struct net *net,
+				  const struct flow_offload *flow,
+				  enum flow_offload_tuple_dir dir,
+				  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
 	u32 mask, port;
 	u32 offset;

+	if (!entry)
+		return -E2BIG;
+
 	switch (dir) {
 	case FLOW_OFFLOAD_DIR_ORIGINAL:
 		port = ntohs(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port);
@@ -436,20 +469,24 @@ static void flow_offload_port_dnat(struct net *net,
 		mask = ~htonl(0xffff0000);
 		break;
 	default:
-		return;
+		return -EOPNOTSUPP;
 	}

 	flow_offload_mangle(entry, flow_offload_l4proto(flow), offset,
 			    &port, &mask);
+	return 0;
 }

-static void flow_offload_ipv4_checksum(struct net *net,
-				       const struct flow_offload *flow,
-				       struct nf_flow_rule *flow_rule)
+static int flow_offload_ipv4_checksum(struct net *net,
+				      const struct flow_offload *flow,
+				      struct nf_flow_rule *flow_rule)
 {
 	u8 protonum = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l4proto;
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);

+	if (!entry)
+		return -E2BIG;
+
 	entry->id = FLOW_ACTION_CSUM;
 	entry->csum_flags = TCA_CSUM_UPDATE_FLAG_IPV4HDR;

@@ -461,22 +498,29 @@ static void flow_offload_ipv4_checksum(struct net *net,
 		entry->csum_flags |= TCA_CSUM_UPDATE_FLAG_UDP;
 		break;
 	}
+
+	return 0;
 }

-static void flow_offload_redirect(const struct flow_offload *flow,
+static int flow_offload_redirect(const struct flow_offload *flow,
 				  enum flow_offload_tuple_dir dir,
 				  struct nf_flow_rule *flow_rule)
 {
 	struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
 	struct rtable *rt;

+	if (!entry)
+		return -E2BIG;
+
 	rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
 	entry->id = FLOW_ACTION_REDIRECT;
 	entry->dev = rt->dst.dev;
 	dev_hold(rt->dst.dev);
+
+	return 0;
 }

-static void flow_offload_encap_tunnel(const struct flow_offload *flow,
+static int flow_offload_encap_tunnel(const struct flow_offload *flow,
 				      enum flow_offload_tuple_dir dir,
 				      struct nf_flow_rule *flow_rule)
 {
@@ -490,13 +534,17 @@ static void flow_offload_encap_tunnel(const struct flow_offload *flow,
 		tun_info = lwt_tun_info(dst->lwtstate);
 		if (tun_info && (tun_info->mode & IP_TUNNEL_INFO_TX)) {
 			entry = flow_action_entry_next(flow_rule);
+			if (!entry)
+				return -E2BIG;
 			entry->id = FLOW_ACTION_TUNNEL_ENCAP;
 			entry->tunnel = tun_info;
 		}
 	}
+
+	return 0;
 }

-static void flow_offload_decap_tunnel(const struct flow_offload *flow,
+static int flow_offload_decap_tunnel(const struct flow_offload *flow,
 				      enum flow_offload_tuple_dir dir,
 				      struct nf_flow_rule *flow_rule)
 {
@@ -510,35 +558,44 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow,
 		tun_info = lwt_tun_info(dst->lwtstate);
 		if (tun_info && (tun_info->mode & IP_TUNNEL_INFO_TX)) {
 			entry = flow_action_entry_next(flow_rule);
+			if (!entry)
+				return -E2BIG;
 			entry->id = FLOW_ACTION_TUNNEL_DECAP;
 		}
 	}
+
+	return 0;
 }

 int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule)
 {
-	flow_offload_decap_tunnel(flow, dir, flow_rule);
-	flow_offload_encap_tunnel(flow, dir, flow_rule);
+	if (flow_offload_decap_tunnel(flow, dir, flow_rule) < 0 ||
+	    flow_offload_encap_tunnel(flow, dir, flow_rule) < 0)
+		return -1;

 	if (flow_offload_eth_src(net, flow, dir, flow_rule) < 0 ||
 	    flow_offload_eth_dst(net, flow, dir, flow_rule) < 0)
 		return -1;

 	if (test_bit(NF_FLOW_SNAT, &flow->flags)) {
-		flow_offload_ipv4_snat(net, flow, dir, flow_rule);
-		flow_offload_port_snat(net, flow, dir, flow_rule);
+		if (flow_offload_ipv4_snat(net, flow, dir, flow_rule) < 0 ||
+		    flow_offload_port_snat(net, flow, dir, flow_rule) < 0)
+			return -1;
 	}
 	if (test_bit(NF_FLOW_DNAT, &flow->flags)) {
-		flow_offload_ipv4_dnat(net, flow, dir, flow_rule);
-		flow_offload_port_dnat(net, flow, dir, flow_rule);
+		if (flow_offload_ipv4_dnat(net, flow, dir, flow_rule) < 0 ||
+		    flow_offload_port_dnat(net, flow, dir, flow_rule) < 0)
+			return -1;
 	}
 	if (test_bit(NF_FLOW_SNAT, &flow->flags) ||
 	    test_bit(NF_FLOW_DNAT, &flow->flags))
-		flow_offload_ipv4_checksum(net, flow, flow_rule);
+		if (flow_offload_ipv4_checksum(net, flow, flow_rule) < 0)
+			return -1;

-	flow_offload_redirect(flow, dir, flow_rule);
+	if (flow_offload_redirect(flow, dir, flow_rule) < 0)
+		return -1;

 	return 0;
 }
@@ -548,30 +605,32 @@ int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow,
 			    enum flow_offload_tuple_dir dir,
 			    struct nf_flow_rule *flow_rule)
 {
-	flow_offload_decap_tunnel(flow, dir, flow_rule);
-	flow_offload_encap_tunnel(flow, dir, flow_rule);
+	if (flow_offload_decap_tunnel(flow, dir, flow_rule) < 0 ||
+	    flow_offload_encap_tunnel(flow, dir, flow_rule) < 0)
+		return -1;

 	if (flow_offload_eth_src(net, flow, dir, flow_rule) < 0 ||
 	    flow_offload_eth_dst(net, flow, dir, flow_rule) < 0)
 		return -1;

 	if (test_bit(NF_FLOW_SNAT, &flow->flags)) {
-		flow_offload_ipv6_snat(net, flow, dir, flow_rule);
-		flow_offload_port_snat(net, flow, dir, flow_rule);
+		if (flow_offload_ipv6_snat(net, flow, dir, flow_rule) < 0 ||
+		    flow_offload_port_snat(net, flow, dir, flow_rule) < 0)
+			return -1;
 	}
 	if (test_bit(NF_FLOW_DNAT, &flow->flags)) {
-		flow_offload_ipv6_dnat(net, flow, dir, flow_rule);
-		flow_offload_port_dnat(net, flow, dir, flow_rule);
+		if (flow_offload_ipv6_dnat(net, flow, dir, flow_rule) < 0 ||
+		    flow_offload_port_dnat(net, flow, dir, flow_rule) < 0)
+			return -1;
 	}

-	flow_offload_redirect(flow, dir, flow_rule);
+	if (flow_offload_redirect(flow, dir, flow_rule) < 0)
+		return -1;

 	return 0;
 }
 EXPORT_SYMBOL_GPL(nf_flow_rule_route_ipv6);

-#define NF_FLOW_RULE_ACTION_MAX	16
-
 static struct nf_flow_rule *
 nf_flow_offload_rule_alloc(struct net *net,
 			   const struct flow_offload_work *offload,
--
2.9.5
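The core of this patch is a bounds check in the slot allocator plus callers that propagate the failure. A minimal user-space sketch of that pattern follows, assuming a fixed-size array like the one the patch guards. `struct rule`, `action_next()`, `add_actions()` and `ipv6_worst_case()` are hypothetical names, and `-1` stands in for the `-E2BIG` used in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define ACTION_MAX 24	/* mirrors NF_FLOW_RULE_ACTION_MAX after the patch */

struct action {
	int id;
};

struct rule {
	unsigned int num_entries;
	struct action entries[ACTION_MAX];
};

/* Bounded allocator: hands out the next slot, or NULL once the array is full. */
static struct action *action_next(struct rule *r)
{
	assert(r);
	if (r->num_entries >= ACTION_MAX)
		return NULL;

	return &r->entries[r->num_entries++];
}

/* Add n actions; returns 0, or -1 as soon as the rule runs out of slots. */
static int add_actions(struct rule *r, unsigned int n)
{
	while (n--) {
		struct action *a = action_next(r);

		if (!a)
			return -1;
		a->id = (int)r->num_entries;
	}

	return 0;
}

/* Worst case from the commit message: eth + SNAT + DNAT + QinQ + redirect. */
static unsigned int ipv6_worst_case(void)
{
	return 4 + 4 + 4 + 4 + 1;
}
```

The worst case of 17 actions exceeds the old limit of 16 by one, which is why an unchecked `num_entries++` could index past the array; with the limit at 24 the same rule fits and any further overflow is reported instead of written.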
[PATCH OLK-5.10] iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
by Zhang Yuwei 06 Jun '26

From: Jinhui Guo <guojinhui.liam(a)bytedance.com>

stable inclusion
from stable-v5.10.252
commit 581ce094d9eafb78ec4f9de77bd24b780c151236
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15190

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 10e60d87813989e20eac1f3eda30b3bae461e7f9 ]

Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") relies on
pci_dev_is_disconnected() to skip ATS invalidation for
safely-removed devices, but it does not cover link-down caused
by faults, which can still hard-lock the system.

For example, if a VM fails to connect to the PCIe device,
"virsh destroy" is executed to release resources and isolate
the fault, but a hard-lockup occurs while releasing the group fd.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 intel_pasid_tear_down_entry
 device_block_translation
 blocking_domain_attach_dev
 __iommu_attach_device
 __iommu_device_set_domain
 __iommu_group_set_domain_internal
 iommu_detach_group
 vfio_iommu_type1_detach_group
 vfio_group_detach_container
 vfio_group_fops_release
 __fput

Although pci_device_is_present() is slower than
pci_dev_is_disconnected(), it still takes only ~70 µs on a
ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed
and width increase.

Besides, devtlb_invalidation_with_pasid() is called only in the
paths below, which are far less frequent than memory map/unmap.

1. mm-struct release
2. {attach,release}_dev
3. set/remove PASID
4. dirty-tracking setup

The gain in system stability far outweighs the negligible cost
of using pci_device_is_present() instead of pci_dev_is_disconnected()
to decide when to skip ATS invalidation, especially under GDR
high-load conditions.

Fixes: 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam(a)bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Lin Yujun <linyujun809(a)h-partners.com>
---
 drivers/iommu/intel/pasid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index cc913f00f296..33c85d9e3a45 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -513,7 +513,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (!info || !info->ats_enabled)
 		return;

-	if (pci_dev_is_disconnected(to_pci_dev(dev)))
+	if (!pci_device_is_present(to_pci_dev(dev)))
 		return;

 	sid = info->bus << 8 | info->devfn;
--
2.22.0