[PATCH OLK-5.10 0/8] CVE-2024-21823
Arjan van de Ven (2):
  VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
  dmaengine: idxd: add a new security check to deal with a hardware
    erratum

Dave Jiang (2):
  dmaengine: idxd: add per DSA wq workqueue for processing cr faults
  dmaengine: idxd: Fix ->poll() return value

Fenghua Yu (1):
  dmaengine: idxd: add idxd_copy_cr() to copy user completion record
    during page fault handling

Harshit Mogalapalli (1):
  dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()

Nikhil Rao (1):
  dmaengine: idxd: add a write() method for applications to submit work

Vinicius Costa Gomes (1):
  dmaengine: idxd: Fix allowing write() from different address spaces

 drivers/dma/idxd/cdev.c      | 205 ++++++++++++++++++++++++++++++++++-
 drivers/dma/idxd/idxd.h      |  10 ++
 drivers/dma/idxd/init.c      |   6 +
 drivers/dma/idxd/registers.h |   3 -
 drivers/dma/idxd/sysfs.c     |  28 ++++-
 drivers/vfio/pci/vfio_pci.c  |   2 +
 include/linux/pci_ids.h      |   2 +
 7 files changed, 245 insertions(+), 11 deletions(-)

-- 
2.34.3
From: Arjan van de Ven <arjan@linux.intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit 95feb3160eef0caa6018e175a5560b816aee8e79
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Due to an erratum with the SPR_DSA and SPR_IAX devices, it is not secure
to assign these devices to virtual machines. Add the PCI IDs of these
devices to the VFIO denylist to ensure that this is handled appropriately
by the VFIO subsystem.

The SPR_DSA and SPR_IAX devices are on-SOC devices for the Sapphire Rapids
(and related) family of products that perform data movement and
compression.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Conflicts:
	drivers/dma/idxd/registers.h
	include/linux/pci_ids.h
[Contextual conflict of macro definitions.]
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/registers.h | 3 ---
 drivers/vfio/pci/vfio_pci.c  | 2 ++
 include/linux/pci_ids.h      | 2 ++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index fe3b8d04f9db..fdfe7930f183 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -2,13 +2,10 @@
 /* Copyright(c) 2019 Intel Corporation. All rights rsvd. */
 #ifndef _IDXD_REGISTERS_H_
 #define _IDXD_REGISTERS_H_

 /* PCI Config */
-#define PCI_DEVICE_ID_INTEL_DSA_SPR0	0x0b25
-#define PCI_DEVICE_ID_INTEL_IAX_SPR0	0x0cfe
-
 #define DEVICE_VERSION_1		0x100
 #define DEVICE_VERSION_2		0x200

 #define IDXD_MMIO_BAR		0
 #define IDXD_WQ_BAR		2
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index c04b34247067..4c593554f6a0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -137,10 +137,12 @@ static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
 	case PCI_DEVICE_ID_INTEL_QAT_C3XXX_VF:
 	case PCI_DEVICE_ID_INTEL_QAT_C62X:
 	case PCI_DEVICE_ID_INTEL_QAT_C62X_VF:
 	case PCI_DEVICE_ID_INTEL_QAT_DH895XCC:
 	case PCI_DEVICE_ID_INTEL_QAT_DH895XCC_VF:
+	case PCI_DEVICE_ID_INTEL_DSA_SPR0:
+	case PCI_DEVICE_ID_INTEL_IAX_SPR0:
 		return true;
 	default:
 		return false;
 	}
 }
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 53d27d4d888c..cc5c372744c9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2694,11 +2694,13 @@
 #define PCI_DEVICE_ID_INTEL_MFD_EMMC1	0x0824
 #define PCI_DEVICE_ID_INTEL_MRST_SD2	0x084F
 #define PCI_DEVICE_ID_INTEL_QUARK_X1000_ILB	0x095E
 #define PCI_DEVICE_ID_INTEL_I960	0x0960
 #define PCI_DEVICE_ID_INTEL_I960RM	0x0962
+#define PCI_DEVICE_ID_INTEL_DSA_SPR0	0x0b25
 #define PCI_DEVICE_ID_INTEL_CENTERTON_ILB	0x0c60
+#define PCI_DEVICE_ID_INTEL_IAX_SPR0	0x0cfe
 #define PCI_DEVICE_ID_INTEL_8257X_SOL	0x1062
 #define PCI_DEVICE_ID_INTEL_82573E_SOL	0x1085
 #define PCI_DEVICE_ID_INTEL_82573L_SOL	0x108F
 #define PCI_DEVICE_ID_INTEL_82815_MC	0x1130
 #define PCI_DEVICE_ID_INTEL_82815_CGC	0x1132
-- 
2.34.3
From: Arjan van de Ven <arjan@linux.intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit e11452eb071b2a8e6ba52892b2e270bbdaa6640d
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

On Sapphire Rapids and related platforms, the DSA and IAA devices have an
erratum that causes direct access (for example, by using the ENQCMD or
MOVDIR64B instructions) from untrusted applications to be a security
problem.

To solve this, add a flag to the PCI device enumeration and device
structures to indicate the presence/absence of this security exposure. In
the mmap() method of the device, this flag is then used to enforce that
the user has the CAP_SYS_RAWIO capability.

In a future patch, a write() based method will be added that allows
untrusted applications to submit work to the accelerator, where the kernel
can do sanity checking on the user input to ensure secure operation of the
accelerator.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Conflicts:
	drivers/dma/idxd/cdev.c
	drivers/dma/idxd/idxd.h
	drivers/dma/idxd/init.c
[The mainline code has undergone multiple iterations, leading to
conflicts in the structure context.]
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c | 12 ++++++++++++
 drivers/dma/idxd/idxd.h |  3 +++
 drivers/dma/idxd/init.c |  4 ++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index a9b96b18772f..2138c993f207 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -196,10 +196,22 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
 	phys_addr_t base = pci_resource_start(pdev, IDXD_WQ_BAR);
 	unsigned long pfn;
 	int rc;

 	dev_dbg(&pdev->dev, "%s called\n", __func__);
+
+	/*
+	 * Due to an erratum in some of the devices supported by the driver,
+	 * direct user submission to the device can be unsafe.
+	 * (See the INTEL-SA-01084 security advisory)
+	 *
+	 * For the devices that exhibit this behavior, require that the user
+	 * has CAP_SYS_RAWIO capabilities.
+	 */
+	if (!idxd->user_submission_safe && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
 	rc = check_vma(wq, vma, __func__);
 	if (rc < 0)
 		return rc;

 	vma->vm_flags |= VM_DONTCOPY;
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 7ced8d283d98..14c6ef987fed 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -256,10 +256,11 @@ struct idxd_driver_data {
 	const char *name_prefix;
 	enum idxd_type type;
 	struct device_type *dev_type;
 	int compl_size;
 	int align;
+	bool user_submission_safe;
 };

 struct idxd_device {
 	struct idxd_dev idxd_dev;
 	struct idxd_driver_data *data;
@@ -314,10 +315,12 @@ struct idxd_device {
 	struct work_struct work;

 	struct idxd_pmu *idxd_pmu;

 	unsigned long *opcap_bmap;
+
+	bool user_submission_safe;
 };

 /* IDXD software descriptor */
 struct idxd_desc {
 	union {
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index ec61449e2adc..cdc4043471fb 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -46,17 +46,19 @@ static struct idxd_driver_data idxd_driver_data[] = {
 		.name_prefix = "dsa",
 		.type = IDXD_TYPE_DSA,
 		.compl_size = sizeof(struct dsa_completion_record),
 		.align = 32,
 		.dev_type = &dsa_device_type,
+		.user_submission_safe = false, /* See INTEL-SA-01084 security advisory */
 	},
 	[IDXD_TYPE_IAX] = {
 		.name_prefix = "iax",
 		.type = IDXD_TYPE_IAX,
 		.compl_size = sizeof(struct iax_completion_record),
 		.align = 64,
 		.dev_type = &iax_device_type,
+		.user_submission_safe = false, /* See INTEL-SA-01084 security advisory */
 	},
 };

 static struct pci_device_id idxd_pci_tbl[] = {
 	/* DSA ver 1.0 platforms */
@@ -669,10 +671,12 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}

 	dev_info(&pdev->dev, "Intel(R) Accelerator Device (v%x)\n",
 		 idxd->hw.version);

+	idxd->user_submission_safe = data->user_submission_safe;
+
 	return 0;

  err_dev_register:
 	idxd_cleanup(idxd);
  err:
-- 
2.34.3
From: Nikhil Rao <nikhil.rao@intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit 6827738dc684a87ad54ebba3ae7f3d7c977698eb
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

After the patch to restrict the use of mmap() to CAP_SYS_RAWIO for the
currently existing devices, most applications can no longer make use of
the accelerators as in production "you don't run things as root".

To keep the DSA and IAA accelerators usable, hook up a write() method so
that applications can still submit work. In the write method, sufficient
input validation is performed to avoid the security issue that required
the mmap CAP_SYS_RAWIO check.

One complication is that the DSA device allows for indirect ("batched")
descriptors. There is no reasonable way to do the input validation on
these indirect descriptors so the write() method will not allow these to
be submitted to the hardware on affected hardware, and the sysfs
enumeration of support for the opcode is also removed.

Early performance data shows that the performance delta for most common
cases is within the noise.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c  | 65 ++++++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/sysfs.c | 27 +++++++++++++++--
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 2138c993f207..9f8adb7013eb 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -222,10 +222,74 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)

 	return io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE,
 					vma->vm_page_prot);
 }

+static int idxd_submit_user_descriptor(struct idxd_user_context *ctx,
+				       struct dsa_hw_desc __user *udesc)
+{
+	struct idxd_wq *wq = ctx->wq;
+	struct idxd_dev *idxd_dev = &wq->idxd->idxd_dev;
+	const uint64_t comp_addr_align = is_dsa_dev(idxd_dev) ? 0x20 : 0x40;
+	void __iomem *portal = idxd_wq_portal_addr(wq);
+	struct dsa_hw_desc descriptor __aligned(64);
+	int rc;
+
+	rc = copy_from_user(&descriptor, udesc, sizeof(descriptor));
+	if (rc)
+		return -EFAULT;
+
+	/*
+	 * DSA devices are capable of indirect ("batch") command submission.
+	 * On devices where direct user submissions are not safe, we cannot
+	 * allow this since there is no good way for us to verify these
+	 * indirect commands.
+	 */
+	if (is_dsa_dev(idxd_dev) && descriptor.opcode == DSA_OPCODE_BATCH &&
+	    !wq->idxd->user_submission_safe)
+		return -EINVAL;
+	/*
+	 * As per the programming specification, the completion address must be
+	 * aligned to 32 or 64 bytes. If this is violated the hardware
+	 * engine can get very confused (security issue).
+	 */
+	if (!IS_ALIGNED(descriptor.completion_addr, comp_addr_align))
+		return -EINVAL;
+
+	if (wq_dedicated(wq))
+		iosubmit_cmds512(portal, &descriptor, 1);
+	else {
+		descriptor.priv = 0;
+		descriptor.pasid = ctx->pasid;
+		rc = idxd_enqcmds(wq, portal, &descriptor);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
+static ssize_t idxd_cdev_write(struct file *filp, const char __user *buf, size_t len,
+			       loff_t *unused)
+{
+	struct dsa_hw_desc __user *udesc = (struct dsa_hw_desc __user *)buf;
+	struct idxd_user_context *ctx = filp->private_data;
+	ssize_t written = 0;
+	int i;
+
+	for (i = 0; i < len/sizeof(struct dsa_hw_desc); i++) {
+		int rc = idxd_submit_user_descriptor(ctx, udesc + i);
+
+		if (rc)
+			return written ? written : rc;
+
+		written += sizeof(struct dsa_hw_desc);
+	}
+
+	return written;
+}
+
 static __poll_t idxd_cdev_poll(struct file *filp,
 			       struct poll_table_struct *wait)
 {
 	struct idxd_user_context *ctx = filp->private_data;
 	struct idxd_wq *wq = ctx->wq;
@@ -244,10 +308,11 @@ static __poll_t idxd_cdev_poll(struct file *filp,
 static const struct file_operations idxd_cdev_fops = {
 	.owner = THIS_MODULE,
 	.open = idxd_cdev_open,
 	.release = idxd_cdev_release,
 	.mmap = idxd_cdev_mmap,
+	.write = idxd_cdev_write,
 	.poll = idxd_cdev_poll,
 };

 int idxd_cdev_get_major(struct idxd_device *idxd)
 {
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 3229dfc78650..c73989ec7d01 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1160,16 +1160,39 @@ static ssize_t wq_enqcmds_retries_store(struct device *dev, struct device_attrib
 }

 static struct device_attribute dev_attr_wq_enqcmds_retries =
 		__ATTR(enqcmds_retries, 0644, wq_enqcmds_retries_show, wq_enqcmds_retries_store);

+static ssize_t op_cap_show_common(struct device *dev, char *buf, unsigned long *opcap_bmap)
+{
+	ssize_t pos;
+	int i;
+
+	pos = 0;
+	for (i = IDXD_MAX_OPCAP_BITS/64 - 1; i >= 0; i--) {
+		unsigned long val = opcap_bmap[i];
+
+		/* On systems where direct user submissions are not safe, we need to clear out
+		 * the BATCH capability from the capability mask in sysfs since we cannot support
+		 * that command on such systems.
+		 */
+		if (i == DSA_OPCODE_BATCH/64 && !confdev_to_idxd(dev)->user_submission_safe)
+			clear_bit(DSA_OPCODE_BATCH % 64, &val);
+
+		pos += sysfs_emit_at(buf, pos, "%*pb", 64, &val);
+		pos += sysfs_emit_at(buf, pos, "%c", i == 0 ? '\n' : ',');
+	}
+
+	return pos;
+}
+
 static ssize_t wq_op_config_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
 	struct idxd_wq *wq = confdev_to_wq(dev);

-	return sysfs_emit(buf, "%*pb\n", IDXD_MAX_OPCAP_BITS, wq->opcap_bmap);
+	return op_cap_show_common(dev, buf, wq->opcap_bmap);
 }

 static int idxd_verify_supported_opcap(struct idxd_device *idxd, unsigned long *opmask)
 {
 	int bit;
@@ -1379,11 +1402,11 @@ static DEVICE_ATTR_RO(max_transfer_size);
 static ssize_t op_cap_show(struct device *dev,
 			   struct device_attribute *attr, char *buf)
 {
 	struct idxd_device *idxd = confdev_to_idxd(dev);

-	return sysfs_emit(buf, "%*pb\n", IDXD_MAX_OPCAP_BITS, idxd->opcap_bmap);
+	return op_cap_show_common(dev, buf, idxd->opcap_bmap);
 }
 static DEVICE_ATTR_RO(op_cap);

 static ssize_t gen_cap_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
-- 
2.34.3
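
Reviewer note (not part of the patch): the per-descriptor validation and the partial-write return convention of idxd_cdev_write() can be modelled in plain userspace C. This is only a sketch under assumptions. The names struct sketch_desc, sketch_validate() and sketch_write() are hypothetical, the descriptor is trimmed to the two fields the checks read, and the BATCH opcode value of 1 is assumed from the DSA opcode enum in uapi/linux/idxd.h. Nothing here touches hardware or the real driver API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: BATCH is opcode 1 in the DSA opcode enum (uapi/linux/idxd.h). */
#define SKETCH_OPCODE_BATCH 1u

/* Hypothetical, trimmed-down descriptor: only the fields the checks read. */
struct sketch_desc {
	uint8_t  opcode;
	uint64_t completion_addr;
};

/*
 * Models the two rejections in idxd_submit_user_descriptor():
 * batch descriptors on DSA devices flagged unsafe, and completion
 * addresses that are not 32-byte (DSA) or 64-byte (IAX) aligned.
 */
int sketch_validate(const struct sketch_desc *d, bool is_dsa, bool submission_safe)
{
	uint64_t align = is_dsa ? 0x20 : 0x40;

	if (is_dsa && d->opcode == SKETCH_OPCODE_BATCH && !submission_safe)
		return -EINVAL;
	if (d->completion_addr & (align - 1))
		return -EINVAL;
	return 0;
}

/*
 * Models the loop in idxd_cdev_write(): whole descriptors are consumed
 * in order; an error on the first one is returned as-is, an error on a
 * later one returns the number of bytes accepted so far.
 */
ssize_t sketch_write(const struct sketch_desc *descs, size_t len,
		     bool is_dsa, bool submission_safe)
{
	ssize_t written = 0;
	size_t i;

	for (i = 0; i < len / sizeof(*descs); i++) {
		int rc = sketch_validate(&descs[i], is_dsa, submission_safe);

		if (rc)
			return written ? written : rc;
		written += (ssize_t)sizeof(*descs);
	}
	return written;
}
```

Because a failure after the first descriptor is reported as a short byte count rather than an error, a caller of the real write() path would presumably need to compare the return value against the length it passed in.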
From: Dave Jiang <dave.jiang@intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit 2f30decd2f23a376d2ed73dfe4c601421edf501a
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Add a workqueue for user submitted completion record fault processing.
The workqueue creation and destruction lifetime will be tied to the user
sub-driver since it will only be used when the wq is a user type.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-7-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c | 11 +++++++++++
 drivers/dma/idxd/idxd.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 9f8adb7013eb..e2a89873c6e1 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -406,10 +406,17 @@ static int idxd_user_drv_probe(struct idxd_dev *idxd_dev)

 		return -EOPNOTSUPP;
 	}

 	mutex_lock(&wq->wq_lock);
+
+	wq->wq = create_workqueue(dev_name(wq_confdev(wq)));
+	if (!wq->wq) {
+		rc = -ENOMEM;
+		goto wq_err;
+	}
+
 	wq->type = IDXD_WQT_USER;
 	rc = drv_enable_wq(wq);
 	if (rc < 0)
 		goto err;

@@ -424,11 +431,13 @@ static int idxd_user_drv_probe(struct idxd_dev *idxd_dev)
 	return 0;

 err_cdev:
 	drv_disable_wq(wq);
 err:
+	destroy_workqueue(wq->wq);
 	wq->type = IDXD_WQT_NONE;
+wq_err:
 	mutex_unlock(&wq->wq_lock);
 	return rc;
 }

 static void idxd_user_drv_remove(struct idxd_dev *idxd_dev)
@@ -437,10 +446,12 @@ static void idxd_user_drv_remove(struct idxd_dev *idxd_dev)

 	mutex_lock(&wq->wq_lock);
 	idxd_wq_del_cdev(wq);
 	drv_disable_wq(wq);
 	wq->type = IDXD_WQT_NONE;
+	destroy_workqueue(wq->wq);
+	wq->wq = NULL;
 	mutex_unlock(&wq->wq_lock);
 }

 static enum idxd_dev_type dev_types[] = {
 	IDXD_DEV_WQ,
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 14c6ef987fed..5dbb67ff1c0c 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -183,10 +183,11 @@ struct idxd_wq {
 	struct completion wq_dead;
 	struct completion wq_resurrect;
 	struct idxd_dev idxd_dev;
 	struct idxd_cdev *idxd_cdev;
 	struct wait_queue_head err_queue;
+	struct workqueue_struct *wq;
 	struct idxd_device *idxd;
 	int id;
 	struct idxd_irq_entry ie;
 	enum idxd_wq_type type;
 	struct idxd_group *group;
-- 
2.34.3
From: Fenghua Yu <fenghua.yu@intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit b022f59725f0ae846191abbd6d2e611d7f60f826
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Define idxd_copy_cr() to copy a completion record to the fault address in
the user address space that is found by work queue (wq) and PASID. It will
be used to write the user's completion record that the hardware device is
not able to write due to a user completion record page fault.

An xarray is added to associate the PASID and mm with the struct
idxd_user_context so the mm can be found by PASID and wq.

idxd_copy_cr() is called when handling the completion record fault in a
kernel thread context. It switches to the mm using kthread_use_mm() and
copies the completion record to the mm via copy_to_user(). Once the copy
is completed, it switches back to the current mm using kthread_unuse_mm().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-9-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c  | 107 +++++++++++++++++++++++++++++++++++++--
 drivers/dma/idxd/idxd.h  |   6 +++
 drivers/dma/idxd/init.c  |   2 +
 drivers/dma/idxd/sysfs.c |   1 +
 4 files changed, 111 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index e2a89873c6e1..c7aa47f01df0 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -10,11 +10,13 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/cdev.h>
 #include <linux/fs.h>
 #include <linux/poll.h>
 #include <linux/iommu.h>
+#include <linux/highmem.h>
 #include <uapi/linux/idxd.h>
+#include <linux/xarray.h>
 #include "registers.h"
 #include "idxd.h"

 struct idxd_cdev_context {
 	const char *name;
@@ -33,10 +35,11 @@ static struct idxd_cdev_context ictx[IDXD_TYPE_MAX] = {

 struct idxd_user_context {
 	struct idxd_wq *wq;
 	struct task_struct *task;
 	unsigned int pasid;
+	struct mm_struct *mm;
 	unsigned int flags;
 	struct iommu_sva *sva;
 };

 static void idxd_cdev_dev_release(struct device *dev)
@@ -67,10 +70,23 @@ static inline struct idxd_wq *inode_wq(struct inode *inode)
 	struct idxd_cdev *idxd_cdev = inode_idxd_cdev(inode);

 	return idxd_cdev->wq;
 }

+static void idxd_xa_pasid_remove(struct idxd_user_context *ctx)
+{
+	struct idxd_wq *wq = ctx->wq;
+	void *ptr;
+
+	mutex_lock(&wq->uc_lock);
+	ptr = xa_cmpxchg(&wq->upasid_xa, ctx->pasid, ctx, NULL, GFP_KERNEL);
+	if (ptr != (void *)ctx)
+		dev_warn(&wq->idxd->pdev->dev, "xarray cmpxchg failed for pasid %u\n",
+			 ctx->pasid);
+	mutex_unlock(&wq->uc_lock);
+}
+
 static int idxd_cdev_open(struct inode *inode, struct file *filp)
 {
 	struct idxd_user_context *ctx;
 	struct idxd_device *idxd;
 	struct idxd_wq *wq;
@@ -107,33 +123,45 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 			goto failed;
 		}

 		pasid = iommu_sva_get_pasid(sva);
 		if (pasid == IOMMU_PASID_INVALID) {
-			iommu_sva_unbind_device(sva);
 			rc = -EINVAL;
-			goto failed;
+			goto failed_get_pasid;
 		}

 		ctx->sva = sva;
 		ctx->pasid = pasid;
+		ctx->mm = current->mm;
+
+		mutex_lock(&wq->uc_lock);
+		rc = xa_insert(&wq->upasid_xa, pasid, ctx, GFP_KERNEL);
+		mutex_unlock(&wq->uc_lock);
+		if (rc < 0)
+			dev_warn(dev, "PASID entry already exist in xarray.\n");

 		if (wq_dedicated(wq)) {
 			rc = idxd_wq_set_pasid(wq, pasid);
 			if (rc < 0) {
 				iommu_sva_unbind_device(sva);
 				dev_err(dev, "wq set pasid failed: %d\n", rc);
-				goto failed;
+				goto failed_set_pasid;
 			}
 		}
 	}

 	idxd_wq_get(wq);
 	mutex_unlock(&wq->wq_lock);
 	return 0;

- failed:
+failed_set_pasid:
+	if (device_user_pasid_enabled(idxd))
+		idxd_xa_pasid_remove(ctx);
+failed_get_pasid:
+	if (device_user_pasid_enabled(idxd))
+		iommu_sva_unbind_device(sva);
+failed:
 	mutex_unlock(&wq->wq_lock);
 	kfree(ctx);
 	return rc;
 }

@@ -160,12 +188,14 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 		} else {
 			idxd_wq_drain(wq);
 		}
 	}

-	if (ctx->sva)
+	if (ctx->sva) {
 		iommu_sva_unbind_device(ctx->sva);
+		idxd_xa_pasid_remove(ctx);
+	}
 	kfree(ctx);
 	mutex_lock(&wq->wq_lock);
 	idxd_wq_put(wq);
 	mutex_unlock(&wq->wq_lock);
 	return 0;
@@ -494,5 +524,72 @@ void idxd_cdev_remove(void)
 	for (i = 0; i < IDXD_TYPE_MAX; i++) {
 		unregister_chrdev_region(ictx[i].devt, MINORMASK);
 		ida_destroy(&ictx[i].minor_ida);
 	}
 }
+
+/**
+ * idxd_copy_cr - copy completion record to user address space found by wq and
+ *		  PASID
+ * @wq:		work queue
+ * @pasid:	PASID
+ * @addr:	user fault address to write
+ * @cr:		completion record
+ * @len:	number of bytes to copy
+ *
+ * This is called by a work that handles completion record fault.
+ *
+ * Return: number of bytes copied.
+ */
+int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
+		 void *cr, int len)
+{
+	struct device *dev = &wq->idxd->pdev->dev;
+	int left = len, status_size = 1;
+	struct idxd_user_context *ctx;
+	struct mm_struct *mm;
+
+	mutex_lock(&wq->uc_lock);
+
+	ctx = xa_load(&wq->upasid_xa, pasid);
+	if (!ctx) {
+		dev_warn(dev, "No user context\n");
+		goto out;
+	}
+
+	mm = ctx->mm;
+	/*
+	 * The completion record fault handling work is running in kernel
+	 * thread context. It temporarily switches to the mm to copy cr
+	 * to addr in the mm.
+	 */
+	kthread_use_mm(mm);
+	left = copy_to_user((void __user *)addr + status_size, cr + status_size,
+			    len - status_size);
+	/*
+	 * Copy status only after the rest of completion record is copied
+	 * successfully so that the user gets the complete completion record
+	 * when a non-zero status is polled.
+	 */
+	if (!left) {
+		u8 status;
+
+		/*
+		 * Ensure that the completion record's status field is written
+		 * after the rest of the completion record has been written.
+		 * This ensures that the user receives the correct completion
+		 * record information once polling for a non-zero status.
+		 */
+		wmb();
+		status = *(u8 *)cr;
+		if (put_user(status, (u8 __user *)addr))
+			left += status_size;
+	} else {
+		left += status_size;
+	}
+	kthread_unuse_mm(mm);
+
+out:
+	mutex_unlock(&wq->uc_lock);
+
+	return len - left;
+}
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 5dbb67ff1c0c..c3ace4aed0fc 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -213,10 +213,14 @@ struct idxd_wq {
 	struct sbitmap_queue sbq;
 	struct idxd_dma_chan *idxd_chan;
 	char name[WQ_NAME_SIZE + 1];
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
+
+	/* Lock to protect upasid_xa access. */
+	struct mutex uc_lock;
+	struct xarray upasid_xa;
 };

 struct idxd_engine {
 	struct idxd_dev idxd_dev;
 	int id;
@@ -664,10 +668,12 @@ void idxd_dma_complete_txd(struct idxd_desc *desc,
 int idxd_cdev_register(void);
 void idxd_cdev_remove(void);
 int idxd_cdev_get_major(struct idxd_device *idxd);
 int idxd_wq_add_cdev(struct idxd_wq *wq);
 void idxd_wq_del_cdev(struct idxd_wq *wq);
+int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
+		 void *buf, int len);

 /* perfmon */
 #if IS_ENABLED(CONFIG_INTEL_IDXD_PERFMON)
 int perfmon_pmu_init(struct idxd_device *idxd);
 void perfmon_pmu_remove(struct idxd_device *idxd);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index cdc4043471fb..25bc0313db49 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -203,10 +203,12 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
 				rc = -ENOMEM;
 				goto err;
 			}
 			bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS);
 		}
+		mutex_init(&wq->uc_lock);
+		xa_init(&wq->upasid_xa);
 		idxd->wqs[i] = wq;
 	}

 	return 0;

diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index c73989ec7d01..f900976acf7e 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1313,10 +1313,11 @@ static void idxd_conf_wq_release(struct device *dev)
 {
 	struct idxd_wq *wq = confdev_to_wq(dev);

 	bitmap_free(wq->opcap_bmap);
 	kfree(wq->wqcfg);
+	xa_destroy(&wq->upasid_xa);
 	kfree(wq);
 }

 struct device_type idxd_wq_device_type = {
 	.name = "wq",
-- 
2.34.3
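
Reviewer note (not part of the patch): idxd_copy_cr() writes the body of the completion record first and the status byte last, so that a poller that sees a non-zero status can rely on the rest of the record. The userspace sketch below illustrates that publication order only. It is an analogy under assumptions: struct sketch_cr, sketch_publish_cr() and sketch_poll_cr() are hypothetical, the 31-byte body size is arbitrary, and a C11 release store paired with an acquire load stands in for the kernel's wmb() plus put_user(); it is not the kernel mechanism itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CR_BODY 31	/* arbitrary body size for the illustration */

/* Hypothetical record layout: status byte first, then the body. */
struct sketch_cr {
	_Atomic uint8_t status;
	uint8_t body[SKETCH_CR_BODY];
};

/* Writer side: fill the body, then publish the non-zero status last. */
void sketch_publish_cr(struct sketch_cr *dst, uint8_t status, const uint8_t *body)
{
	memcpy(dst->body, body, SKETCH_CR_BODY);
	/* Release store: the body writes above become visible before status. */
	atomic_store_explicit(&dst->status, status, memory_order_release);
}

/* Reader side: only touch the body once a non-zero status is observed. */
bool sketch_poll_cr(struct sketch_cr *src, uint8_t *status, uint8_t *body)
{
	uint8_t s = atomic_load_explicit(&src->status, memory_order_acquire);

	if (!s)
		return false;
	*status = s;
	memcpy(body, src->body, SKETCH_CR_BODY);
	return true;
}
```

If the status were stored before the body, a poller could observe a non-zero status together with stale body bytes, which is the situation the comment above wmb() in the patch describes.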
From: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

mainline inclusion
from mainline-v6.10-rc1
commit 0642287e3ecdd0d1f88e6a2e63768e16153a990c
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Smatch warns:
	drivers/dma/idxd/cdev.c:327:
		idxd_cdev_open() warn: 'sva' was already freed.

When idxd_wq_set_pasid() fails, the current code unbinds sva and then
goes to 'failed_set_pasid' where iommu_sva_unbind_device is called
again causing the above warning.
[ device_user_pasid_enabled(idxd) is still true when calling
failed_set_pasid ]

Fix this by removing additional unbind when idxd_wq_set_pasid() fails.

Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230509060716.2830630-1-harshit.m.mogalapalli@ora...
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c7aa47f01df0..6701b2265b7f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -140,11 +140,10 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 			dev_warn(dev, "PASID entry already exist in xarray.\n");

 		if (wq_dedicated(wq)) {
 			rc = idxd_wq_set_pasid(wq, pasid);
 			if (rc < 0) {
-				iommu_sva_unbind_device(sva);
 				dev_err(dev, "wq set pasid failed: %d\n", rc);
 				goto failed_set_pasid;
 			}
 		}
 	}
-- 
2.34.3
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit 8dfa57aabff625bf445548257f7711ef294cd30e
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Check if the process submitting the descriptor belongs to the same
address space as the one that opened the file, reject otherwise.

Fixes: 6827738dc684 ("dmaengine: idxd: add a write() method for applications to submit work")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20250421170337.3008875-1-dave.jiang@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Conflicts:
	drivers/dma/idxd/cdev.c
[Commit 1c71222e5f23 (mm: replace vma->vm_flags direct modifications
with modifier calls) not merged, so the absence of the vm_flags_set
modifier function has caused a context conflict.]
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 6701b2265b7f..5c39a29dbde6 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -237,10 +237,13 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
 	 * has CAP_SYS_RAWIO capabilities.
 	 */
 	if (!idxd->user_submission_safe && !capable(CAP_SYS_RAWIO))
 		return -EPERM;

+	if (current->mm != ctx->mm)
+		return -EPERM;
+
 	rc = check_vma(wq, vma, __func__);
 	if (rc < 0)
 		return rc;

 	vma->vm_flags |= VM_DONTCOPY;
@@ -303,10 +306,13 @@ static ssize_t idxd_cdev_write(struct file *filp, const char __user *buf, size_t
 	struct dsa_hw_desc __user *udesc = (struct dsa_hw_desc __user *)buf;
 	struct idxd_user_context *ctx = filp->private_data;
 	ssize_t written = 0;
 	int i;

+	if (current->mm != ctx->mm)
+		return -EPERM;
+
 	for (i = 0; i < len/sizeof(struct dsa_hw_desc); i++) {
 		int rc = idxd_submit_user_descriptor(ctx, udesc + i);

 		if (rc)
 			return written ? written : rc;
@@ -323,10 +329,13 @@ static __poll_t idxd_cdev_poll(struct file *filp,
 	struct idxd_user_context *ctx = filp->private_data;
 	struct idxd_wq *wq = ctx->wq;
 	struct idxd_device *idxd = wq->idxd;
 	__poll_t out = 0;

+	if (current->mm != ctx->mm)
+		return -EPERM;
+
 	poll_wait(filp, &wq->err_queue, wait);
 	spin_lock(&idxd->dev_lock);
 	if (idxd->sw_err.valid)
 		out = EPOLLIN | EPOLLRDNORM;
 	spin_unlock(&idxd->dev_lock);
-- 
2.34.3
From: Dave Jiang <dave.jiang@intel.com>

mainline inclusion
from mainline-v6.10-rc1
commit ae74cd15ade833adc289279b5c6f12e78f64d4d7
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/4650
CVE: CVE-2024-21823

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

The fix to block access from different address space did not return a
correct value for ->poll() change. kernel test bot reported that a
return value of type __poll_t is expected rather than int. Fix to return
POLLNVAL to indicate invalid request.

Fixes: 8dfa57aabff6 ("dmaengine: idxd: Fix allowing write() from different address spaces")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505081851.rwD7jVxg-lkp@intel.com/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20250508170548.2747425-1-dave.jiang@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
---
 drivers/dma/idxd/cdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 5c39a29dbde6..9b07474f450b 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -330,11 +330,11 @@ static __poll_t idxd_cdev_poll(struct file *filp,
 	struct idxd_wq *wq = ctx->wq;
 	struct idxd_device *idxd = wq->idxd;
 	__poll_t out = 0;

 	if (current->mm != ctx->mm)
-		return -EPERM;
+		return POLLNVAL;

 	poll_wait(filp, &wq->err_queue, wait);
 	spin_lock(&idxd->dev_lock);
 	if (idxd->sw_err.valid)
 		out = EPOLLIN | EPOLLRDNORM;
-- 
2.34.3
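
Reviewer note (not part of the patch): a ->poll() handler returns an event bitmask, not an errno, so a negative error code is misread as a set of ready events. The userspace sketch below illustrates the effect with the POLL* constants from <poll.h>. The helper sketch_int_as_poll_mask() is hypothetical and simply mimics an int being carried through an unsigned mask type, which is an assumption about how the value would be interpreted.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: reinterpret an int return value as an unsigned
 * event mask, the way a mask-typed return slot would carry it.
 */
unsigned int sketch_int_as_poll_mask(int ret)
{
	return (unsigned int)ret;
}

/* True if the mask would tell the caller the fd is readable or writable. */
int sketch_mask_reports_ready(unsigned int mask)
{
	return (mask & (unsigned int)(POLLIN | POLLOUT)) != 0;
}
```

With these helpers, sketch_mask_reports_ready(sketch_int_as_poll_mask(-EPERM)) is non-zero, because -EPERM is -1 and converts to a mask with every bit set, whereas the mask for POLLNVAL carries neither POLLIN nor POLLOUT.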
Feedback:
The patch(es) you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request.
Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/20052
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GUZ...