This patch set fix CVE-2024-26812.
Alex Williamson (4): vfio: Introduce interface to flush virqfd inject workqueue vfio/pci: Disable auto-enable of exclusive INTx IRQ vfio/pci: Lock external INTx masking ops vfio/pci: Create persistent INTx handler
Barry Song (1): genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()
drivers/vfio/pci/vfio_pci_intrs.c | 177 ++++++++++++++++++------------ drivers/vfio/virqfd.c | 21 ++++ include/linux/interrupt.h | 4 + include/linux/vfio.h | 2 + kernel/irq/manage.c | 11 +- 5 files changed, 143 insertions(+), 72 deletions(-)
From: Barry Song song.bao.hua@hisilicon.com
stable inclusion from stable-v5.10.163 commit 50aaa6b1742cb26f718bc72f4196aaf6a4efa7a7 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9E6TE CVE: CVE-2024-26812
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit cbe16f35bee6880becca6f20d2ebf6b457148552 ]
Many drivers don't want interrupts enabled automatically via request_irq(). So they are handling this issue by either way of the below two:
(1) irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...);
(2) request_irq(dev, irq...); disable_irq(irq);
The code in the second way is silly and unsafe. In the small time gap between request_irq() and disable_irq(), interrupts can still come.
The code in the first way is safe though it's subobtimal.
Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to request_irq() and request_nmi(). It prevents the automatic enabling of the requested interrupt/nmi in the same safe way as #1 above. With that the various usage sites of #1 and #2 above can be simplified and corrected.
Signed-off-by: Barry Song song.bao.hua@hisilicon.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Ingo Molnar mingo@kernel.org Cc: dmitry.torokhov@gmail.com Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com Stable-dep-of: 99c05e4283a1 ("iio: adis: add '__adis_enable_irq()' implementation") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com --- include/linux/interrupt.h | 4 ++++ kernel/irq/manage.c | 11 +++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 97de36a38770..f1683bd0d2ff 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -61,6 +61,9 @@ * interrupt handler after suspending interrupts. For system * wakeup devices users need to implement wakeup detection in * their interrupt handlers. + * IRQF_NO_AUTOEN - Don't enable IRQ or NMI automatically when users request it. + * Users will enable it explicitly by enable_irq() or enable_nmi() + * later. */ #define IRQF_SHARED 0x00000080 #define IRQF_PROBE_SHARED 0x00000100 @@ -74,6 +77,7 @@ #define IRQF_NO_THREAD 0x00010000 #define IRQF_EARLY_RESUME 0x00020000 #define IRQF_COND_SUSPEND 0x00040000 +#define IRQF_NO_AUTOEN 0x00080000
#define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 6d01d1450789..e6f6450ed454 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1628,7 +1628,8 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) irqd_set(&desc->irq_data, IRQD_NO_BALANCING); }
- if (irq_settings_can_autoenable(desc)) { + if (!(new->flags & IRQF_NO_AUTOEN) && + irq_settings_can_autoenable(desc)) { irq_startup(desc, IRQ_RESEND, IRQ_START_COND); } else { /* @@ -2059,10 +2060,15 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler, * which interrupt is which (messes up the interrupt freeing * logic etc). * + * Also shared interrupts do not go well with disabling auto enable. + * The sharing interrupt might request it while it's still disabled + * and then wait for interrupts forever. + * * Also IRQF_COND_SUSPEND only makes sense for shared interrupts and * it cannot be set along with IRQF_NO_SUSPEND. */ if (((irqflags & IRQF_SHARED) && !dev_id) || + ((irqflags & IRQF_SHARED) && (irqflags & IRQF_NO_AUTOEN)) || (!(irqflags & IRQF_SHARED) && (irqflags & IRQF_COND_SUSPEND)) || ((irqflags & IRQF_NO_SUSPEND) && (irqflags & IRQF_COND_SUSPEND))) return -EINVAL; @@ -2218,7 +2224,8 @@ int request_nmi(unsigned int irq, irq_handler_t handler,
desc = irq_to_desc(irq);
- if (!desc || irq_settings_can_autoenable(desc) || + if (!desc || (irq_settings_can_autoenable(desc) && + !(irqflags & IRQF_NO_AUTOEN)) || !irq_settings_can_request(desc) || WARN_ON(irq_settings_is_per_cpu_devid(desc)) || !irq_supports_nmi(desc))
From: Alex Williamson alex.williamson@redhat.com
stable inclusion from stable-v5.15.154 commit 26a6a1e0b4ecea56862f40fd2939f327395afc49 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9E6TE CVE: CVE-2024-26812
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit b620ecbd17a03cacd06f014a5d3f3a11285ce053 ]
In order to synchronize changes that can affect the thread callback, introduce an interface to force a flush of the inject workqueue. The irqfd pointer is only valid under spinlock, but the workqueue cannot be flushed under spinlock. Therefore the flush work for the irqfd is queued under spinlock. The vfio_irqfd_cleanup_wq workqueue is re-used for queuing this work such that flushing the workqueue is also ordered relative to shutdown.
Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Eric Auger eric.auger@redhat.com Link: https://lore.kernel.org/r/20240308230557.805580-4-alex.williamson@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: include/linux/vfio.h
Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com --- drivers/vfio/virqfd.c | 21 +++++++++++++++++++++ include/linux/vfio.h | 2 ++ 2 files changed, 23 insertions(+)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c index 2a1be859ee71..ab2a429d36e7 100644 --- a/drivers/vfio/virqfd.c +++ b/drivers/vfio/virqfd.c @@ -104,6 +104,13 @@ static void virqfd_inject(struct work_struct *work) virqfd->thread(virqfd->opaque, virqfd->data); }
+static void virqfd_flush_inject(struct work_struct *work) +{ + struct virqfd *virqfd = container_of(work, struct virqfd, flush_inject); + + flush_work(&virqfd->inject); +} + int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *), void (*thread)(void *, void *), @@ -127,6 +134,7 @@ int vfio_virqfd_enable(void *opaque,
INIT_WORK(&virqfd->shutdown, virqfd_shutdown); INIT_WORK(&virqfd->inject, virqfd_inject); + INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
irqfd = fdget(fd); if (!irqfd.file) { @@ -217,6 +225,19 @@ void vfio_virqfd_disable(struct virqfd **pvirqfd) } EXPORT_SYMBOL_GPL(vfio_virqfd_disable);
+void vfio_virqfd_flush_thread(struct virqfd **pvirqfd) +{ + unsigned long flags; + + spin_lock_irqsave(&virqfd_lock, flags); + if (*pvirqfd && (*pvirqfd)->thread) + queue_work(vfio_irqfd_cleanup_wq, &(*pvirqfd)->flush_inject); + spin_unlock_irqrestore(&virqfd_lock, flags); + + flush_workqueue(vfio_irqfd_cleanup_wq); +} +EXPORT_SYMBOL_GPL(vfio_virqfd_flush_thread); + module_init(vfio_virqfd_init); module_exit(vfio_virqfd_exit);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 66741ab087c1..ad17afecd070 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -189,6 +189,7 @@ struct virqfd { wait_queue_entry_t wait; poll_table pt; struct work_struct shutdown; + struct work_struct flush_inject; struct virqfd **pvirqfd; };
@@ -197,5 +198,6 @@ extern int vfio_virqfd_enable(void *opaque, void (*thread)(void *, void *), void *data, struct virqfd **pvirqfd, int fd); extern void vfio_virqfd_disable(struct virqfd **pvirqfd); +void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
#endif /* VFIO_H */
From: Alex Williamson alex.williamson@redhat.com
stable inclusion from stable-v5.15.154 commit b7a2f0955ffceffadfe098b40b50307431f45438 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9E6TE CVE: CVE-2024-26812
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit fe9a7082684eb059b925c535682e68c34d487d43 ]
Currently for devices requiring masking at the irqchip for INTx, ie. devices without DisINTx support, the IRQ is enabled in request_irq() and subsequently disabled as necessary to align with the masked status flag. This presents a window where the interrupt could fire between these events, resulting in the IRQ incrementing the disable depth twice. This would be unrecoverable for a user since the masked flag prevents nested enables through vfio.
Instead, invert the logic using IRQF_NO_AUTOEN such that exclusive INTx is never auto-enabled, then unmask as required.
Cc: stable@vger.kernel.org Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Eric Auger eric.auger@redhat.com Link: https://lore.kernel.org/r/20240308230557.805580-2-alex.williamson@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com --- drivers/vfio/pci/vfio_pci_intrs.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index c989f777bf77..75d9bb39d07f 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -202,8 +202,15 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
vdev->ctx[0].trigger = trigger;
+ /* + * Devices without DisINTx support require an exclusive interrupt, + * IRQ masking is performed at the IRQ chip. The masked status is + * protected by vdev->irqlock. Setup the IRQ without auto-enable and + * unmask as necessary below under lock. DisINTx is unmodified by + * the IRQ configuration and may therefore use auto-enable. + */ if (!vdev->pci_2_3) - irqflags = 0; + irqflags = IRQF_NO_AUTOEN;
ret = request_irq(pdev->irq, vfio_intx_handler, irqflags, vdev->ctx[0].name, vdev); @@ -214,13 +221,9 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) return ret; }
- /* - * INTx disable will stick across the new irq setup, - * disable_irq won't. - */ spin_lock_irqsave(&vdev->irqlock, flags); - if (!vdev->pci_2_3 && vdev->ctx[0].masked) - disable_irq_nosync(pdev->irq); + if (!vdev->pci_2_3 && !vdev->ctx[0].masked) + enable_irq(pdev->irq); spin_unlock_irqrestore(&vdev->irqlock, flags);
return 0;
From: Alex Williamson alex.williamson@redhat.com
stable inclusion from stable-v5.15.154 commit ec73e079729258a05452356cf6d098bf1504d5a6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9E6TE CVE: CVE-2024-26812
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 810cd4bb53456d0503cc4e7934e063835152c1b7 ]
Mask operations through config space changes to DisINTx may race INTx configuration changes via ioctl. Create wrappers that add locking for paths outside of the core interrupt code.
In particular, irq_type is updated holding igate, therefore testing is_intx() requires holding igate. For example clearing DisINTx from config space can otherwise race changes of the interrupt configuration.
This aligns interfaces which may trigger the INTx eventfd into two camps, one side serialized by igate and the other only enabled while INTx is configured. A subsequent patch introduces synchronization for the latter flows.
Cc: stable@vger.kernel.org Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Eric Auger eric.auger@redhat.com Link: https://lore.kernel.org/r/20240308230557.805580-3-alex.williamson@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: drivers/vfio/pci/vfio_pci_intrs.c
Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com --- drivers/vfio/pci/vfio_pci_intrs.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 75d9bb39d07f..84e2cfd5fe2a 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -36,11 +36,13 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused) eventfd_signal(vdev->ctx[0].trigger, 1); }
-void vfio_pci_intx_mask(struct vfio_pci_device *vdev) +static void __vfio_pci_intx_mask(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev->pdev; unsigned long flags;
+ lockdep_assert_held(&vdev->igate); + spin_lock_irqsave(&vdev->irqlock, flags);
/* @@ -68,6 +70,13 @@ void vfio_pci_intx_mask(struct vfio_pci_device *vdev) spin_unlock_irqrestore(&vdev->irqlock, flags); }
+void vfio_pci_intx_mask(struct vfio_pci_device *vdev) +{ + mutex_lock(&vdev->igate); + __vfio_pci_intx_mask(vdev); + mutex_unlock(&vdev->igate); +} + /* * If this is triggered by an eventfd, we can't call eventfd_signal * or else we'll deadlock on the eventfd wait queue. Return >0 when @@ -110,12 +119,22 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused) return ret; }
-void vfio_pci_intx_unmask(struct vfio_pci_device *vdev) +static void __vfio_pci_intx_unmask(struct vfio_pci_device *vdev) { + lockdep_assert_held(&vdev->igate); + if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0) vfio_send_intx_eventfd(vdev, NULL); }
+void vfio_pci_intx_unmask(struct vfio_pci_device *vdev) +{ + mutex_lock(&vdev->igate); + __vfio_pci_intx_unmask(vdev); + mutex_unlock(&vdev->igate); +} + + static irqreturn_t vfio_intx_handler(int irq, void *dev_id) { struct vfio_pci_device *vdev = dev_id; @@ -431,11 +450,11 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev, return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_NONE) { - vfio_pci_intx_unmask(vdev); + __vfio_pci_intx_unmask(vdev); } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { uint8_t unmask = *(uint8_t *)data; if (unmask) - vfio_pci_intx_unmask(vdev); + __vfio_pci_intx_unmask(vdev); } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { int32_t fd = *(int32_t *)data; if (fd >= 0) @@ -458,11 +477,11 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_device *vdev, return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_NONE) { - vfio_pci_intx_mask(vdev); + __vfio_pci_intx_mask(vdev); } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { uint8_t mask = *(uint8_t *)data; if (mask) - vfio_pci_intx_mask(vdev); + __vfio_pci_intx_mask(vdev); } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { return -ENOTTY; /* XXX implement me */ }
From: Alex Williamson alex.williamson@redhat.com
stable inclusion from stable-v5.15.154 commit 4cb0d7532126d23145329826c38054b4e9a05e7c category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9E6TE CVE: CVE-2024-26812
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 18c198c96a815c962adc2b9b77909eec0be7df4d ]
A vulnerability exists where the eventfd for INTx signaling can be deconfigured, which unregisters the IRQ handler but still allows eventfds to be signaled with a NULL context through the SET_IRQS ioctl or through unmask irqfd if the device interrupt is pending.
Ideally this could be solved with some additional locking; the igate mutex serializes the ioctl and config space accesses, and the interrupt handler is unregistered relative to the trigger, but the irqfd path runs asynchronous to those. The igate mutex cannot be acquired from the atomic context of the eventfd wake function. Disabling the irqfd relative to the eventfd registration is potentially incompatible with existing userspace.
As a result, the solution implemented here moves configuration of the INTx interrupt handler to track the lifetime of the INTx context object and irq_type configuration, rather than registration of a particular trigger eventfd. Synchronization is added between the ioctl path and eventfd_signal() wrapper such that the eventfd trigger can be dynamically updated relative to in-flight interrupts or irqfd callbacks.
Cc: stable@vger.kernel.org Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Eric Auger eric.auger@redhat.com Link: https://lore.kernel.org/r/20240308230557.805580-5-alex.williamson@redhat.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: drivers/vfio/pci/vfio_pci_intrs.c
Signed-off-by: Jinjie Ruan ruanjinjie@huawei.com --- drivers/vfio/pci/vfio_pci_intrs.c | 149 ++++++++++++++++-------------- 1 file changed, 82 insertions(+), 67 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 84e2cfd5fe2a..c190f418804b 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -32,8 +32,13 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused) { struct vfio_pci_device *vdev = opaque;
- if (likely(is_intx(vdev) && !vdev->virq_disabled)) - eventfd_signal(vdev->ctx[0].trigger, 1); + if (likely(is_intx(vdev) && !vdev->virq_disabled)) { + struct eventfd_ctx *trigger; + + trigger = READ_ONCE(vdev->ctx[0].trigger); + if (likely(trigger)) + eventfd_signal(trigger, 1); + } }
static void __vfio_pci_intx_mask(struct vfio_pci_device *vdev) @@ -161,98 +166,104 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id) return ret; }
-static int vfio_intx_enable(struct vfio_pci_device *vdev) +static int vfio_intx_enable(struct vfio_pci_device *vdev, + struct eventfd_ctx *trigger) { + struct pci_dev *pdev = vdev->pdev; + unsigned long irqflags; + char *name; + int ret; + if (!is_irq_none(vdev)) return -EINVAL;
- if (!vdev->pdev->irq) + if (!pdev->irq) return -ENODEV;
+ name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev)); + if (!name) + return -ENOMEM; + vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL); if (!vdev->ctx) return -ENOMEM;
vdev->num_ctx = 1;
+ vdev->ctx[0].name = name; + vdev->ctx[0].trigger = trigger; + /* - * If the virtual interrupt is masked, restore it. Devices - * supporting DisINTx can be masked at the hardware level - * here, non-PCI-2.3 devices will have to wait until the - * interrupt is enabled. + * Fill the initial masked state based on virq_disabled. After + * enable, changing the DisINTx bit in vconfig directly changes INTx + * masking. igate prevents races during setup, once running masked + * is protected via irqlock. + * + * Devices supporting DisINTx also reflect the current mask state in + * the physical DisINTx bit, which is not affected during IRQ setup. + * + * Devices without DisINTx support require an exclusive interrupt. + * IRQ masking is performed at the IRQ chip. Again, igate protects + * against races during setup and IRQ handlers and irqfds are not + * yet active, therefore masked is stable and can be used to + * conditionally auto-enable the IRQ. + * + * irq_type must be stable while the IRQ handler is registered, + * therefore it must be set before request_irq(). */ vdev->ctx[0].masked = vdev->virq_disabled; - if (vdev->pci_2_3) - pci_intx(vdev->pdev, !vdev->ctx[0].masked); + if (vdev->pci_2_3) { + pci_intx(pdev, !vdev->ctx[0].masked); + irqflags = IRQF_SHARED; + } else { + irqflags = vdev->ctx[0].masked ? IRQF_NO_AUTOEN : 0; + }
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
+ ret = request_irq(pdev->irq, vfio_intx_handler, + irqflags, vdev->ctx[0].name, vdev); + if (ret) { + vdev->irq_type = VFIO_PCI_NUM_IRQS; + kfree(name); + vdev->num_ctx = 0; + kfree(vdev->ctx); + return ret; + } + return 0; }
-static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) +static int vfio_intx_set_signal(struct vfio_pci_device *vdev, + struct eventfd_ctx *trigger) { struct pci_dev *pdev = vdev->pdev; - unsigned long irqflags = IRQF_SHARED; - struct eventfd_ctx *trigger; - unsigned long flags; - int ret; + struct eventfd_ctx *old;
- if (vdev->ctx[0].trigger) { - free_irq(pdev->irq, vdev); - kfree(vdev->ctx[0].name); - eventfd_ctx_put(vdev->ctx[0].trigger); - vdev->ctx[0].trigger = NULL; - } - - if (fd < 0) /* Disable only */ - return 0; + old = vdev->ctx[0].trigger;
- vdev->ctx[0].name = kasprintf(GFP_KERNEL, "vfio-intx(%s)", - pci_name(pdev)); - if (!vdev->ctx[0].name) - return -ENOMEM; + WRITE_ONCE(vdev->ctx[0].trigger, trigger);
- trigger = eventfd_ctx_fdget(fd); - if (IS_ERR(trigger)) { - kfree(vdev->ctx[0].name); - return PTR_ERR(trigger); - } - - vdev->ctx[0].trigger = trigger; - - /* - * Devices without DisINTx support require an exclusive interrupt, - * IRQ masking is performed at the IRQ chip. The masked status is - * protected by vdev->irqlock. Setup the IRQ without auto-enable and - * unmask as necessary below under lock. DisINTx is unmodified by - * the IRQ configuration and may therefore use auto-enable. - */ - if (!vdev->pci_2_3) - irqflags = IRQF_NO_AUTOEN; - - ret = request_irq(pdev->irq, vfio_intx_handler, - irqflags, vdev->ctx[0].name, vdev); - if (ret) { - vdev->ctx[0].trigger = NULL; - kfree(vdev->ctx[0].name); - eventfd_ctx_put(trigger); - return ret; + /* Releasing an old ctx requires synchronizing in-flight users */ + if (old) { + synchronize_irq(pdev->irq); + vfio_virqfd_flush_thread(&vdev->ctx[0].unmask); + eventfd_ctx_put(old); }
- spin_lock_irqsave(&vdev->irqlock, flags); - if (!vdev->pci_2_3 && !vdev->ctx[0].masked) - enable_irq(pdev->irq); - spin_unlock_irqrestore(&vdev->irqlock, flags); - return 0; }
static void vfio_intx_disable(struct vfio_pci_device *vdev) { + struct pci_dev *pdev = vdev->pdev; + vfio_virqfd_disable(&vdev->ctx[0].unmask); vfio_virqfd_disable(&vdev->ctx[0].mask); - vfio_intx_set_signal(vdev, -1); + free_irq(pdev->irq, vdev); + if (vdev->ctx[0].trigger) + eventfd_ctx_put(vdev->ctx[0].trigger); + kfree(vdev->ctx[0].name); vdev->irq_type = VFIO_PCI_NUM_IRQS; vdev->num_ctx = 0; kfree(vdev->ctx); @@ -502,19 +513,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_device *vdev, return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + struct eventfd_ctx *trigger = NULL; int32_t fd = *(int32_t *)data; int ret;
- if (is_intx(vdev)) - return vfio_intx_set_signal(vdev, fd); + if (fd >= 0) { + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) + return PTR_ERR(trigger); + }
- ret = vfio_intx_enable(vdev); - if (ret) - return ret; + if (is_intx(vdev)) + ret = vfio_intx_set_signal(vdev, trigger); + else + ret = vfio_intx_enable(vdev, trigger);
- ret = vfio_intx_set_signal(vdev, fd); - if (ret) - vfio_intx_disable(vdev); + if (ret && trigger) + eventfd_ctx_put(trigger);
return ret; }
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/6062 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/6062 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D...