From: Peng Wu <wupeng58@huawei.com>
ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------
Give the variable node_id in shmem_getpage_gfp() an initial value at its declaration. Without it, paths that jump straight to the alloc_nohuge label skip the later assignment and use node_id uninitialized, which triggers an Oops in some scenarios:
[20987.530901] Internal error: Oops: 96000007 [#1] SMP
[20987.541162] Modules linked in: cfg80211 rfkill ib_isert iscsi_target_mod rpcrdma ib_srpt target_core_mod dm_mirror dm_region_hash ib_srp scsi_transport_srp dm_log sunrpc dm_mod ib_ipoib rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm ib_cm iw_cm aes_ce_blk crypto_simd cryptd hns_roce_hw_v2 aes_ce_cipher ghash_ce hns_roce sha1_ce ib_core sg ipmi_ssif hi_sfc sbsa_gwdt mtd sch_fq_codel ip_tables realtek hclge hinic sha2_ce sha256_arm64 hns3 ipmi_si hisi_sas_v3_hw hibmc_drm host_edma_drv hnae3 hisi_sas_main ipmi_devintf ipmi_msghandler
[20987.639396] Process move_pages03 (pid: 40173, stack limit = 0x00000000804b9d00)
[20987.654773] CPU: 50 PID: 40173 Comm: move_pages03 Kdump: loaded Not tainted 4.19.195+ #1
[20987.671794] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.08 12/14/2019
[20987.687355] pstate: 80400009 (Nzcv daif +PAN -UAO)
[20987.697433] pc : __alloc_pages_nodemask+0x7c/0xdc0
[20987.707510] lr : alloc_pages_vma+0xac/0x318
[20987.716304] sp : ffff0001537cb690
[20987.723268] x29: ffff0001537cb690 x28: 00000000006200ca
[20987.734439] x27: 0000000000000000 x26: ffff802fd24439c8
[20987.745610] x25: 0000000000000000 x24: 00000000ffff0000
[20987.756782] x23: 0000000000000000 x22: 0000000000000000
[20987.767952] x21: 00000000ffff0000 x20: ffff000009b69000
[20987.779123] x19: ffff802fd24439c8 x18: 0000000000000000
[20987.790294] x17: 0000000000000000 x16: 0000000000000000
[20987.801466] x15: 0000000000000000 x14: 0000000000000000
[20987.812637] x13: 0000000000000000 x12: 0000000000000000
[20987.823808] x11: ffff000009b69748 x10: 0000000000000040
[20987.834978] x9 : 0000000000000000 x8 : ffff0001537cb978
[20987.846149] x7 : 0000000000000000 x6 : 000000000000003f
[20987.857320] x5 : 0000000000000000 x4 : 00000000007fffff
[20987.868491] x3 : ffff000009b6c998 x2 : 0000000000000000
[20987.879662] x1 : 0000000000250015 x0 : ffff000009b69788
[20987.890833] Call trace:
[20987.895970]  __alloc_pages_nodemask+0x7c/0xdc0
[20987.905312]  alloc_pages_vma+0xac/0x318
[20987.913374]  shmem_alloc_page+0x6c/0xc0
[20987.921436]  shmem_alloc_and_acct_page+0x124/0x1f8
[20987.931510]  shmem_getpage_gfp+0x16c/0x1028
[20987.940305]  shmem_fault+0x94/0x2a0
[20987.947636]  __do_fault+0x50/0x220
[20987.954784]  do_shared_fault+0x28/0x228
[20987.962846]  __handle_mm_fault+0x610/0x8f0
[20987.971457]  handle_mm_fault+0xe4/0x1d8
[20987.979520]  do_page_fault+0x210/0x4f8
[20987.987398]  do_translation_fault+0xa8/0xbc
[20987.996192]  do_mem_abort+0x68/0x118
[20988.003706]  el0_da+0x24/0x28
[20988.009941] Code: b9404c64 72a004a1 b9401062 0a04039c (f875d800)
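For illustration only, a standalone userspace sketch of the bug class (pick_node() and alloc_page_on() are made-up stand-ins, not kernel functions): when the assignment sits between the goto and the label, any jump straight to alloc_nohuge reads the variable uninitialized, which is what the backtrace above shows in kernel terms. Initializing at the declaration covers every path.

#include <stdio.h>

static int pick_node(void) { return 0; }    /* stand-in for shmem_node_id(vma) */

static void alloc_page_on(int node, int huge)
{
    printf("allocating on node %d (huge=%d)\n", node, huge);
}

static void getpage(int want_huge)
{
    int node = pick_node();     /* the fix: initialized at the declaration */

    if (!want_huge)
        goto alloc_nohuge;

    /*
     * Before the fix, the "node = pick_node();" assignment lived right
     * here, so the goto above reached alloc_nohuge with node still
     * uninitialized.
     */
    alloc_page_on(node, 1);
    return;

alloc_nohuge:
    alloc_page_on(node, 0);
}

int main(void)
{
    getpage(0);
    getpage(1);
    return 0;
}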
Fixes: d3edfd4f60bae ("share_pool: Alloc shared memory on a specified memory node")
Signed-off-by: Peng Wu <wupeng58@huawei.com>
Reviewed-by: 为珑 陈 <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/shmem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index f08d5ce17a092..4522348cfc189 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1737,7 +1737,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	int error;
 	int once = 0;
 	int alloced = 0;
-	int node_id;
+	int node_id = shmem_node_id(vma);
 
 	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return -EFBIG;
@@ -1889,7 +1889,6 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 			goto alloc_nohuge;
 	}
 
-	node_id = shmem_node_id(vma);
 alloc_huge:
 	page = shmem_alloc_and_acct_page(gfp, inode, index, true,
From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-5.10-rc2
commit 65ff5cd04551daf2c11c7928e48fc3483391c900
category: bugfix
bugzilla: 45589
CVE: NA
-------------------------------------------------
Mark the flush request as IDLE in its .end_io(), aligning it with how normal requests behave. The flush request stays in the in-flight tags if we're not using an IO scheduler, so we need to change its state to IDLE. Otherwise, we will hang in blk_mq_tagset_wait_completed_request() during error recovery because the flush request's state is left as COMPLETED.
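As a rough userspace sketch (simplified types, not the real blk-mq structures), here is why the state flip matters: the error-recovery wait only finishes once no request in the set is still marked COMPLETE, so the flush request's .end_io() has to put it back to IDLE.

#include <stdatomic.h>
#include <stdio.h>

enum rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request { _Atomic enum rq_state state; };

/* stand-in for blk_mq_tagset_wait_completed_request(): the real code keeps
 * waiting as long as any request in the set is still marked COMPLETE */
static int all_quiesced(struct request *rqs, int n)
{
    for (int i = 0; i < n; i++)
        if (atomic_load(&rqs[i].state) == MQ_RQ_COMPLETE)
            return 0;
    return 1;
}

/* stand-in for flush_end_io(): with the fix the flush request is put back
 * to IDLE here; without it, its state stays COMPLETE and the wait above
 * can never finish */
static void flush_end_io(struct request *flush_rq)
{
    atomic_store(&flush_rq->state, MQ_RQ_IDLE);
}

int main(void)
{
    struct request rqs[2] = {
        { MQ_RQ_COMPLETE },     /* the flush request after completion */
        { MQ_RQ_IDLE },
    };

    flush_end_io(&rqs[0]);
    printf("error recovery can make progress: %d\n", all_quiesced(rqs, 2));
    return 0;
}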
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
  block/blk-flush.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-flush.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 2a8369eb6c1cb..c1ba915658a2c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -233,6 +233,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		/* release the tag's ownership to the req cloned from */
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
+		WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE);
 		if (!refcount_dec_and_test(&flush_rq->ref)) {
 			fq->rq_status = error;
 			spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-5.10-rc5
commit 9f16a66733c90b5f33f624b0b0e36a345b0aaf93
category: bugfix
bugzilla: 45589
CVE: NA
-------------------------------------------------
To avoid a use-after-free on the flush request, its .end_io() is called from both the timeout code path and __blk_mq_end_request().

As long as the flush request's reference count has not dropped to zero, the request is still in use and must not be marked as IDLE. Fix this by marking it IDLE only when its refcount really drops to zero.
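A minimal userspace sketch of the fixed ordering (toy types, not the real blk-mq code): only the caller that drops the last reference may mark the request IDLE; earlier callers, e.g. the timeout path, must leave the state alone because the request is still being inspected.

#include <stdatomic.h>
#include <stdio.h>

enum rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE };

struct request {
    atomic_int ref;
    _Atomic enum rq_state state;
};

/* sketch of the fixed ordering: only the caller that drops the last
 * reference may mark the request IDLE; anyone else must leave the state
 * alone because the request is still in use */
static void flush_end_io(struct request *flush_rq)
{
    if (atomic_fetch_sub(&flush_rq->ref, 1) != 1)
        return;                 /* someone else still holds a reference */

    atomic_store(&flush_rq->state, MQ_RQ_IDLE);
    /* ...the rest of the completion work would happen here... */
}

int main(void)
{
    struct request rq = { 2, MQ_RQ_COMPLETE };

    flush_end_io(&rq);      /* e.g. timeout path: ref 2 -> 1, state untouched */
    flush_end_io(&rq);      /* last reference: now it is safe to go IDLE */
    printf("state=%d (0 == MQ_RQ_IDLE)\n", (int)rq.state);
    return 0;
}

The real patch keeps the same shape: the WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE) is simply moved below the refcount_dec_and_test() check, as the diff below shows.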
Fixes: 65ff5cd04551 ("blk-mq: mark flush request as IDLE in flush_end_io()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
  block/blk-flush.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-flush.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index c1ba915658a2c..c357e5c16d89c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -233,13 +233,18 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		/* release the tag's ownership to the req cloned from */
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
-		WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE);
 		if (!refcount_dec_and_test(&flush_rq->ref)) {
 			fq->rq_status = error;
 			spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 			return;
 		}
 
+		/*
+		 * Flush request has to be marked as IDLE when it is really ended
+		 * because its .end_io() is called from timeout code path too for
+		 * avoiding use-after-free.
+		 */
+		WRITE_ONCE(flush_rq->state, MQ_RQ_IDLE);
 		if (fq->rq_status != BLK_STS_OK)
 			error = fq->rq_status;
From: Jan Kara <jack@suse.cz>
mainline inclusion
from mainline-5.12-rc1
commit 767630c63bb23acf022adb265574996ca39a4645
category: bugfix
bugzilla: 107770
CVE: NA
-------------------------------------------------
blkdev_fallocate() tries to detect whether a discard raced with an overlapping write by calling invalidate_inode_pages2_range(). However, this check can give both false negatives (when writing using direct IO or when writeback already writes out the written pagecache range) and false positives (when the write is not actually overlapping but ends in the same page when blocksize < pagesize). This actually causes issues for qemu, which gets confused by the EBUSY errors.
Fix the problem by removing this conflicting write detection since it is inherently racy and thus of little use anyway.
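For context, roughly what an affected caller such as qemu does from userspace (the device path and length below are hypothetical, shown only as a sketch of the affected call): it punches holes in the block device with fallocate(), and before this change the final page-cache invalidation could fail the whole call with EBUSY.

#define _GNU_SOURCE
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sdX", O_RDWR);  /* scratch block device, hypothetical */

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* discard the first 1 MiB; this ends up in blkdev_fallocate() */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, 1 << 20) < 0)
        fprintf(stderr, "fallocate: %s\n", strerror(errno));
    close(fd);
    return 0;
}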
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.co...
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/block_dev.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 30dd7b19bd2e3..06f73a1a1f66b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2174,13 +2174,11 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 		return error;
 
 	/*
-	 * Invalidate again; if someone wandered in and dirtied a page,
-	 * the caller will be given -EBUSY. The third argument is
-	 * inclusive, so the rounding here is safe.
+	 * Invalidate the page cache again; if someone wandered in and dirtied
+	 * a page, we just discard it - userspace has no way of knowing whether
+	 * the write happened before or after discard completing...
 	 */
-	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
-					     start >> PAGE_SHIFT,
-					     end >> PAGE_SHIFT);
+	return truncate_bdev_range(bdev, file->f_mode, start, end);
 }
 
 const struct file_operations def_blk_fops = {
From: Al Viro <viro@zeniv.linux.org.uk>
mainline inclusion
from mainline-5.9-rc7
commit 933a3752babcf6513117d5773d2b70782d6ad149
category: bugfix
bugzilla: 42553
CVE: NA
-------------------------------------------------
The callers rely upon any iov_iter_truncate() done inside ->direct_IO() being countered by a matching iov_iter_reexpand().
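A simplified sketch of that invariant (a toy iterator with only a byte count; the real iov_iter carries much more state): whatever ->direct_IO() truncates away must be re-expanded before returning, so the caller's view of the residual count is unchanged.

#include <stddef.h>
#include <stdio.h>

struct toy_iter { size_t count; };

static size_t iter_count(const struct toy_iter *i) { return i->count; }

static void iter_truncate(struct toy_iter *i, size_t max)
{
    if (i->count > max)
        i->count = max;
}

static void iter_reexpand(struct toy_iter *i, size_t count)
{
    i->count = count;
}

/* roughly what the fixed fuse_direct_IO() does around the short-read path */
static void direct_io(struct toy_iter *iter, size_t bytes_before_eof)
{
    size_t count = iter_count(iter), shortened = 0;

    iter_truncate(iter, bytes_before_eof);      /* don't read past EOF */
    shortened = count - iter_count(iter);

    /* ...the transfer consumes some or all of iter->count here... */

    /* undo the truncation before returning to the caller */
    iter_reexpand(iter, iter_count(iter) + shortened);
}

int main(void)
{
    struct toy_iter it = { .count = 4096 };

    direct_io(&it, 1000);       /* only 1000 bytes left before EOF */
    printf("count seen by caller: %zu\n", it.count);    /* 4096 again */
    return 0;
}

This mirrors the shortened bookkeeping in the diff below: the truncation is recorded, the transfer runs on the truncated iterator, and the count is restored before returning.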
Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflict:
  fs/fuse/file.c
  Commit 5da784cce430 ("fuse: add max_pages to init_out") is not backported,
  so fuse_round_up() only accepts one parameter.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 fs/fuse/file.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9c11897845728..41e2a7b567d7f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2866,11 +2866,10 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	ssize_t ret = 0;
 	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
-	bool async_dio = ff->fc->async_dio;
 	loff_t pos = 0;
 	struct inode *inode;
 	loff_t i_size;
-	size_t count = iov_iter_count(iter);
+	size_t count = iov_iter_count(iter), shortened = 0;
 	loff_t offset = iocb->ki_pos;
 	struct fuse_io_priv *io;
 
@@ -2878,17 +2877,9 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);
 
-	if ((iov_iter_rw(iter) == READ) && (offset > i_size))
+	if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
 		return 0;
 
-	/* optimization for short read */
-	if (async_dio && iov_iter_rw(iter) != WRITE && offset + count > i_size) {
-		if (offset >= i_size)
-			return 0;
-		iov_iter_truncate(iter, fuse_round_up(i_size - offset));
-		count = iov_iter_count(iter);
-	}
-
 	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
 	if (!io)
 		return -ENOMEM;
@@ -2904,15 +2895,22 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * By default, we want to optimize all I/Os with async request
 	 * submission to the client filesystem if supported.
 	 */
-	io->async = async_dio;
+	io->async = ff->fc->async_dio;
 	io->iocb = iocb;
 	io->blocking = is_sync_kiocb(iocb);
 
+	/* optimization for short read */
+	if (io->async && !io->write && offset + count > i_size) {
+		iov_iter_truncate(iter, fuse_round_up(i_size - offset));
+		shortened = count - iov_iter_count(iter);
+		count -= shortened;
+	}
+
 	/*
 	 * We cannot asynchronously extend the size of a file.
 	 * In such case the aio will behave exactly like sync io.
 	 */
-	if ((offset + count > i_size) && iov_iter_rw(iter) == WRITE)
+	if ((offset + count > i_size) && io->write)
 		io->blocking = true;
 
 	if (io->async && io->blocking) {
@@ -2930,6 +2928,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	} else {
 		ret = __fuse_direct_read(io, iter, &pos);
 	}
+	iov_iter_reexpand(iter, iov_iter_count(iter) + shortened);
 
 	if (io->async) {
 		bool blocking = io->blocking;
From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-v5.13-rc6
commit 3719f4ff047e20062b8314c23ec3cab84d74c908
category: bugfix
bugzilla: 115514
CVE: NA
-----------------------------------------------
When scsi_add_host_with_dma() returns failure, the caller will call scsi_host_put(shost) to release everything allocated for this host instance. Consequently we can't also free allocated stuff in scsi_add_host_with_dma(), otherwise we will end up with a double free.
Strictly speaking, host resource allocations should have been done in scsi_host_alloc(). However, the allocations may need information which is not yet provided by the driver when that function is called. So leave the allocations where they are but rely on host device's release handler to free resources.
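A toy userspace model of that ownership rule (made-up names, not the SCSI midlayer API): allocations made while adding the host are freed only by the release handler that runs on the final put, so the error path must simply unwind registrations and leave the freeing to the release callback.

#include <stdio.h>
#include <stdlib.h>

struct host {
    int refs;
    void *shost_data;   /* stands in for the per-host allocations */
};

static void host_release(struct host *h)
{
    free(h->shost_data);    /* the single point of freeing */
    free(h);
}

static void host_put(struct host *h)
{
    if (--h->refs == 0)
        host_release(h);
}

static struct host *host_alloc(void)
{
    struct host *h = calloc(1, sizeof(*h));

    if (h)
        h->refs = 1;
    return h;
}

static int host_add(struct host *h, int simulate_failure)
{
    h->shost_data = malloc(64);
    if (!h->shost_data || simulate_failure) {
        /*
         * Error path: do NOT free shost_data here.  The caller is
         * expected to drop its reference, and host_release() will free
         * it; freeing in both places is the double free this patch
         * removes from scsi_add_host_with_dma().
         */
        return -1;
    }
    return 0;
}

int main(void)
{
    struct host *h = host_alloc();

    if (!h)
        return 1;
    if (host_add(h, 1) < 0)
        host_put(h);    /* release handler cleans up, exactly once */
    return 0;
}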
Link: https://lore.kernel.org/r/20210602133029.2864069-3-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/scsi/hosts.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index fa03be813f2cb..3bbbeb8e79f3c 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -282,23 +282,22 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 					shost->work_q_name);
 		if (!shost->work_q) {
 			error = -EINVAL;
-			goto out_free_shost_data;
+			goto out_del_dev;
 		}
 	}
 
 	error = scsi_sysfs_add_host(shost);
 	if (error)
-		goto out_destroy_host;
+		goto out_del_dev;
 
 	scsi_proc_host_add(shost);
 	scsi_autopm_put_host(shost);
 	return error;
 
- out_destroy_host:
-	if (shost->work_q)
-		destroy_workqueue(shost->work_q);
- out_free_shost_data:
-	kfree(shost->shost_data);
+	/*
+	 * Any host allocation in this function will be freed in
+	 * scsi_host_dev_release().
+	 */
 out_del_dev:
 	device_del(&shost->shost_dev);
 out_del_gendev:
@@ -313,8 +312,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 	pm_runtime_disable(&shost->shost_gendev);
 	pm_runtime_set_suspended(&shost->shost_gendev);
 	pm_runtime_put_noidle(&shost->shost_gendev);
-	if (shost_use_blk_mq(shost))
-		scsi_mq_destroy_tags(shost);
 fail:
 	return error;
 }