From: Tejun Heo <tj@kernel.org>

mainline inclusion
from mainline-5.11-rc3
commit d16baa3f1453c14d680c5fee01cd122a22d0e0ce
category: bugfix
bugzilla: 47177
CVE: NA
-------------------------------------------------
When initializing iocost for a queue, its rqos should be registered before the blkcg policy is activated to allow policy data initialization to look up the associated ioc. This unfortunately means that the rqos methods can be called on bios before iocgs are attached to all existing blkgs.

While the race is theoretically possible on ioc_rqos_throttle(), it mostly happened in ioc_rqos_merge() due to the difference in how they look up ioc. The former determines it from the passed in @rqos and then bails before dereferencing iocg if the looked up ioc is disabled, which most likely is the case if initialization is still in progress. The latter looked up ioc by dereferencing the possibly NULL iocg, making it a lot more prone to actually triggering the bug.
* Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look up ioc for consistency.
* Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before dereferencing it.
* Explain the danger of NULL iocgs in blk_iocost_init().
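To illustrate the race window, a simplified sketch of the bypass check both hooks end up sharing (illustrative only, not the literal patch; rqos_to_ioc() is the mainline helper name and the backported lookup may differ):

	struct ioc *ioc = rqos_to_ioc(rqos);
	struct ioc_gq *iocg = blkg_to_iocg(blkg);

	/*
	 * Between rq_qos_add() and blkcg_activate_policy() finishing,
	 * blkg_to_iocg() can return NULL because the per-blkg policy
	 * data has not been allocated yet, hence the !iocg test.
	 */
	if (!ioc->enabled || !iocg || !iocg->level)
		return;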
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jonathan Lemon <bsd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-iocost.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-iocost.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index a3b1f1c87b0ea..92180abc43e15 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1716,8 +1716,8 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio,
 
 	iocg = blkg_to_iocg(blkg);
 
-	/* bypass IOs if disabled or for root cgroup */
-	if (!ioc->enabled || !iocg->level)
+	/* bypass IOs if disabled, still initializing, or for root cgroup */
+	if (!ioc->enabled || !iocg || !iocg->level)
 		return;
 
 	/* always activate so that even 0 cost IOs get protected to some level */
@@ -1857,8 +1857,8 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
 	rcu_read_unlock();
 
 	iocg = blkg_to_iocg(blkg);
-	/* bypass if disabled or for root cgroup */
-	if (!ioc->enabled || !iocg->level)
+	/* bypass if disabled, still initializing, or for root cgroup */
+	if (!ioc->enabled || !iocg || !iocg->level)
 		return;
 
 	abs_cost = calc_vtime_cost(bio, iocg, true);
@@ -1997,6 +1997,12 @@ static int blk_iocost_init(struct request_queue *q)
 	ioc_refresh_params(ioc, true);
 	spin_unlock_irq(&ioc->lock);
 
+	/*
+	 * rqos must be added before activation to allow iocg_pd_init() to
+	 * lookup the ioc from q. This means that the rqos methods may get
+	 * called before policy activation completion, can't assume that the
+	 * target bio has an iocg associated and need to test for NULL iocg.
+	 */
 	rq_qos_add(q, rqos);
 	ret = blkcg_activate_policy(q, &blkcg_policy_iocost);
 	if (ret) {
From: Yufen Yu <yuyufen@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 51878
CVE: NA
-------------------------------------------------
We found that offlining a SATA device attached to a hisi_sas controller and then scanning the host can probe 255 non-existent devices into the system.
[root@localhost ~]# lsscsi
[2:0:0:0]    disk    ATA      Samsung SSD 860  2B6Q  /dev/sda
[2:0:1:0]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdb
[2:0:2:0]    disk    SEAGATE  ST600MM0006      B001  /dev/sdc

1) echo "offline" > /sys/block/sdb/device/state
2) echo "- - -" > /sys/class/scsi_host/host2/scan

Then, we can see another 255 non-existent devices in the system:
[root@localhost ~]# lsscsi
[2:0:0:0]    disk    ATA      Samsung SSD 860  2B6Q  /dev/sda
[2:0:1:0]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdb
[2:0:1:1]    disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdh
...
[2:0:1:255]  disk    ATA      WDC WD2003FYYS-3 1D01  /dev/sdjb
After the REPORT LUNS command issued to the offline device fails, the scan code falls back to a sequential scan and successfully probes every device whose LUN is not 0.

To fix the problem, we do the same thing as commit 2fc62e2ac350 ("[SCSI] libsas: disable scanning lun > 0 on ata devices"), which prevents devices whose LUN number is not zero from being probed into the system.
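For context, this works because the SCSI midlayer invokes ->slave_alloc() while allocating each scsi_device during a scan and aborts the probe of that LUN on a non-zero return. Roughly (a paraphrased sketch of the scsi_scan.c flow, not a verbatim copy):

	/* inside scsi_alloc_sdev(), called once per LUN being probed */
	if (shost->hostt->slave_alloc) {
		ret = shost->hostt->slave_alloc(sdev);
		if (ret) {
			/*
			 * sas_slave_alloc() returns -ENXIO for SATA
			 * devices with lun > 0, so the bogus LUNs are
			 * never registered with the system.
			 */
			goto out_device_destroy;
		}
	}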
Reported-by: Wu Bo <wubo40@huawei.com>
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/scsi/aic94xx/aic94xx_init.c    | 1 +
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 1 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 +
 drivers/scsi/isci/init.c               | 1 +
 drivers/scsi/libsas/sas_scsi_host.c    | 9 +++++++++
 drivers/scsi/mvsas/mv_init.c           | 1 +
 drivers/scsi/pm8001/pm8001_init.c      | 1 +
 8 files changed, 16 insertions(+)
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 702da909cee5e..ad8a65ab489cf 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -71,6 +71,7 @@ static struct scsi_host_template aic94xx_sht = {
 	.use_clustering = ENABLE_CLUSTERING,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
+	.slave_alloc = sas_slave_alloc,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 	.track_queue_depth = 1,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 746f0aca5e3bd..5c766f7508923 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1785,6 +1785,7 @@ static struct scsi_host_template sht_v1_hw = {
 	.max_sectors = SCSI_DEFAULT_MAX_SECTORS,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
+	.slave_alloc = sas_slave_alloc,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 	.shost_attrs = host_attrs_v1_hw,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 703a8a81176a2..61ff065e5d9b8 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3603,6 +3603,7 @@ static struct scsi_host_template sht_v2_hw = {
 	.max_sectors = SCSI_DEFAULT_MAX_SECTORS,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
+	.slave_alloc = sas_slave_alloc,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 	.shost_attrs = host_attrs_v2_hw,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 3cf0255dccf13..512f96b80ecc0 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -3269,6 +3269,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.max_sectors = SCSI_DEFAULT_MAX_SECTORS,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
+	.slave_alloc = sas_slave_alloc,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 	.shost_attrs = host_attrs_v3_hw,
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index dde84f7443136..07de94ea38192 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -167,6 +167,7 @@ static struct scsi_host_template isci_sht = {
 	.eh_abort_handler = sas_eh_abort_handler,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
+	.slave_alloc = sas_slave_alloc,
 	.target_destroy = sas_target_destroy,
 	.ioctl = sas_ioctl,
 	.shost_attrs = isci_host_attrs,
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 4eeaafbe05043..0ce716a534690 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -844,6 +844,14 @@ struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy)
 	return found_dev;
 }
 
+int sas_slave_alloc(struct scsi_device *sdev)
+{
+	if (dev_is_sata(sdev_to_domain_dev(sdev)) && sdev->lun)
+		return -ENXIO;
+
+	return 0;
+}
+
 int sas_target_alloc(struct scsi_target *starget)
 {
 	struct sas_rphy *rphy = dev_to_rphy(starget->dev.parent);
@@ -988,5 +996,6 @@ EXPORT_SYMBOL_GPL(sas_task_abort);
 EXPORT_SYMBOL_GPL(sas_phy_reset);
 EXPORT_SYMBOL_GPL(sas_eh_device_reset_handler);
 EXPORT_SYMBOL_GPL(sas_eh_target_reset_handler);
+EXPORT_SYMBOL_GPL(sas_slave_alloc);
 EXPORT_SYMBOL_GPL(sas_target_destroy);
 EXPORT_SYMBOL_GPL(sas_ioctl);
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 8c91637cd598d..77699c08a1bd0 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -49,6 +49,7 @@ static struct scsi_host_template mvs_sht = {
 	.module = THIS_MODULE,
 	.name = DRV_NAME,
 	.queuecommand = sas_queuecommand,
+	.slave_alloc = sas_slave_alloc,
 	.target_alloc = sas_target_alloc,
 	.slave_configure = sas_slave_configure,
 	.scan_finished = mvs_scan_finished,
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index 1d59d7447a1c8..07d753b1c0d08 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -74,6 +74,7 @@ static struct scsi_host_template pm8001_sht = {
 	.module = THIS_MODULE,
 	.name = DRV_NAME,
 	.queuecommand = sas_queuecommand,
+	.slave_alloc = sas_slave_alloc,
 	.target_alloc = sas_target_alloc,
 	.slave_configure = sas_slave_configure,
 	.scan_finished = pm8001_scan_finished,
From: Zhenzhong Duan <zhenzhong.duan@oracle.com>

mainline inclusion
from mainline-5.1
commit 05eee619ed61c8cd89633954d38c4e5653086845
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3T22N
CVE: NA

-------------------------------------------------
There are cases where a guest tries to switch spinlocks to bare metal behavior (e.g. by setting "xen_nopvspin" on the XEN platform and "hv_nopvspin" on HYPER_V).

That feature is missing on KVM, so add a new parameter "nopvspin" to disable PV spinlocks for KVM guests.

The new 'nopvspin' parameter will also replace Xen and Hyper-V specific parameters in future patches.

Define the variable nopvspin as global because it will be used in future patches as above.
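As a usage note, the knob is consumed the same way as the existing Xen/Hyper-V ones, i.e. appended to the guest kernel command line (an illustrative boot entry; the root device and console arguments are placeholders):

	linux /boot/vmlinuz root=/dev/vda1 console=ttyS0 nopvspin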
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jiajun Chen <chenjiajun8@huawei.com>
Reviewed-by: Xiangyou Xie <xiexiangyou@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 .../admin-guide/kernel-parameters.txt |  5 +++
 arch/x86/include/asm/qspinlock.h      |  1 +
 arch/x86/kernel/kvm.c                 | 41 +++++++++++++++----
 kernel/locking/qspinlock.c            |  7 ++++
 4 files changed, 47 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 05b9b8c6034cd..40dc1e3e89d93 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5392,6 +5392,11 @@
 			with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
 			Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
 
+	nopvspin	[X86,KVM]
+			Disables the qspinlock slow path using PV optimizations
+			which allow the hypervisor to 'idle' the guest on lock
+			contention.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index e34ffb0af6bd2..4d425673d831a 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -39,6 +39,7 @@ extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_init_lock_hash(void);
 extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
 
 #define queued_spin_unlock queued_spin_unlock
 /**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 9c52eec69bfc6..d1ccd5bbf6231 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -843,16 +843,36 @@ void __init kvm_spinlock_init(void)
 {
 	if (!kvm_para_available())
 		return;
-	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
-	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+	/*
+	 * In case host doesn't support KVM_FEATURE_PV_UNHALT there is still an
+	 * advantage of keeping virt_spin_lock_key enabled: virt_spin_lock() is
+	 * preferred over native qspinlock when vCPU is preempted.
+	 */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
+		pr_info("PV spinlocks disabled, no host support\n");
 		return;
+	}
 
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
-		return;
+	/*
+	 * Disable PV spinlocks and use native qspinlock when dedicated pCPUs
+	 * are available.
+	 */
+	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints\n");
+		goto out;
+	}
 
-	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
-		return;
+	if (num_possible_cpus() == 1) {
+		pr_info("PV spinlocks disabled, single CPU\n");
+		goto out;
+	}
+
+	if (nopvspin) {
+		pr_info("PV spinlocks disabled, forced by \"nopvspin\" parameter\n");
+		goto out;
+	}
+
+	pr_info("PV spinlocks enabled\n");
 
 	__pv_init_lock_hash();
 	pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
@@ -864,6 +884,13 @@ void __init kvm_spinlock_init(void)
 		pv_lock_ops.vcpu_is_preempted =
 			PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
 	}
+	/*
+	 * When PV spinlock is enabled which is preferred over
+	 * virt_spin_lock(), virt_spin_lock_key's value is meaningless.
+	 * Just disable it anyway.
+	 */
+out:
+	static_branch_disable(&virt_spin_lock_key);
 }
 
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index a33c509199fd8..fd148973080f6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -631,4 +631,11 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #include "qspinlock_paravirt.h"
 #include "qspinlock.c"
 
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
 #endif
From: Andreas Gruenbacher <agruenba@redhat.com>

mainline inclusion
from mainline-5.9-rc1
commit 856473cd5d17dbbf3055710857c67a4af6d9fcc0
category: bugfix
bugzilla: 40769
CVE: NA

---------------------------
Make sure iomap_end is always called when iomap_begin succeeds.
Without this fix, iomap_end won't be called when a filesystem's iomap_begin operation returns an invalid mapping, bypassing any unlocking done in iomap_end. With this fix, the unlocking will still happen.
This bug was found by Bob Peterson during code review. It's unlikely that such iomap_begin bugs will survive to affect users, so backporting this fix seems unnecessary.
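To make the failure mode concrete, consider a filesystem whose ->iomap_begin takes a lock that its ->iomap_end releases (a hypothetical sketch; the foo_* names and mapping_sem are illustrative, not from any real filesystem):

	static int foo_iomap_begin(struct inode *inode, loff_t pos,
			loff_t length, unsigned flags, struct iomap *iomap)
	{
		/* released by foo_iomap_end() */
		down_read(&FOO_I(inode)->mapping_sem);
		return foo_fill_iomap(inode, pos, length, iomap);
	}

	static int foo_iomap_end(struct inode *inode, loff_t pos,
			loff_t length, ssize_t written, unsigned flags,
			struct iomap *iomap)
	{
		up_read(&FOO_I(inode)->mapping_sem);
		return 0;
	}

If foo_iomap_begin() succeeds but fills in an invalid mapping, the old iomap_apply() returned -EIO without calling foo_iomap_end(), leaving mapping_sem held.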
Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fs/iomap/apply.c does not exist; applied to fs/iomap.c instead]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/iomap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index bb2f966798d3e..fd3439997d292 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -67,10 +67,14 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
-		return -EIO;
-	if (WARN_ON(iomap.length == 0))
-		return -EIO;
+	if (WARN_ON(iomap.offset > pos)) {
+		written = -EIO;
+		goto out;
+	}
+	if (WARN_ON(iomap.length == 0)) {
+		written = -EIO;
+		goto out;
+	}
 
 	/*
 	 * Cut down the length to the one actually provided by the filesystem,
@@ -86,6 +90,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 */
 	written = actor(inode, pos, length, data, &iomap);
 
+out:
 	/*
 	 * Now the data has been copied, commit the range we've copied. This
 	 * should not fail unless the filesystem has had a fatal error.