Christoph Hellwig (1):
      block: return -EBUSY when there are open partitions in blkdev_reread_part

Eric Biggers (1):
      random: fix data race on crng_node_pool

Salman Qazi (1):
      block: Limit number of items taken from the I/O scheduler in one go

Sebastian Andrzej Siewior (1):
      crypto: cryptd - Protect per-CPU resource by disabling BH.

Siddh Raman Pant (1):
      loop: Check for overflow while configuring loop

Yu Kuai (2):
      blk-wbt: make enable_state more accurate
      block: don't set GD_NEED_PART_SCAN if scan partition failed

Zhong Jinghua (2):
      Revert "loop: Check for overflow while configuring loop"
      loop: loop_set_status_from_info() check before assignment
 block/blk-mq-sched.c  | 66 ++++++++++++++++++++++++++++++++++---------
 block/blk-wbt.c       |  7 ++++-
 block/blk-wbt.h       | 12 ++++----
 block/genhd.c         |  6 ++++
 block/ioctl.c         |  4 +++
 crypto/cryptd.c       | 23 ++++++++-------
 drivers/block/loop.c  | 10 +++----
 drivers/char/random.c | 42 ++++++++++++++-------------
 8 files changed, 113 insertions(+), 57 deletions(-)
From: Eric Biggers <ebiggers@google.com>
stable inclusion
from stable-v4.19.226
commit a6f8ba674655f511bbf0e55d4f16c581f7a8a35a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
CVE: NA
----------------------------------------
commit 5d73d1e320c3fd94ea15ba5f79301da9a8bcc7de upstream.
extract_crng() and crng_backtrack_protect() load crng_node_pool with a plain load, which causes undefined behavior if do_numa_crng_init() modifies it concurrently.
Fix this by using READ_ONCE(). Note: as per the previous discussion https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u, READ_ONCE() is believed to be sufficient here, and it was requested that it be used here instead of smp_load_acquire().
Also change do_numa_crng_init() to set crng_node_pool using cmpxchg_release() instead of mb() + cmpxchg(), as the former is sufficient here but is more lightweight.
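
As background, here is a minimal standalone C11 sketch of the same publish/consume pattern (userspace code with illustrative names, not the kernel's; C11 has no direct READ_ONCE() equivalent, so an acquire load stands in for the address-dependency ordering the kernel relies on):

    /* Minimal C11 sketch of release-publish / consume. Build: cc -std=c11 sketch.c */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct pool { int id; };

    static struct pool primary = { .id = -1 };
    static _Atomic(struct pool *) node_pool; /* NULL until published */

    static void publish_pool(void)
    {
        struct pool *p = malloc(sizeof(*p));
        struct pool *expected = NULL;

        p->id = 0;
        /* Release: the initialized *p is visible to any reader that
         * observes the new pointer (the kernel's cmpxchg_release()). */
        if (!atomic_compare_exchange_strong_explicit(&node_pool, &expected, p,
                                                     memory_order_release,
                                                     memory_order_relaxed))
            free(p); /* lost the race: another publisher won */
    }

    static struct pool *select_pool(void)
    {
        /* The kernel uses READ_ONCE() and relies on address-dependency
         * ordering; acquire is the portable C11 over-approximation. */
        struct pool *p = atomic_load_explicit(&node_pool, memory_order_acquire);

        return p ? p : &primary;
    }

    int main(void)
    {
        printf("before publish: id=%d\n", select_pool()->id);
        publish_pool();
        printf("after publish: id=%d\n", select_pool()->id);
        return 0;
    }

Readers fall back to the primary instance until the pool has been published exactly once, which mirrors select_crng() in the patch below.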
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/char/random.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 05daadfbf1ed..a008d816cf2b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -836,8 +836,8 @@ static void do_numa_crng_init(struct work_struct *work)
 		crng_initialize(crng);
 		pool[i] = crng;
 	}
-	mb();
-	if (cmpxchg(&crng_node_pool, NULL, pool)) {
+	/* pairs with READ_ONCE() in select_crng() */
+	if (cmpxchg_release(&crng_node_pool, NULL, pool) != NULL) {
 		for_each_node(i)
 			kfree(pool[i]);
 		kfree(pool);
@@ -850,8 +850,26 @@ static void numa_crng_init(void)
 {
 	schedule_work(&numa_crng_init_work);
 }
+
+static struct crng_state *select_crng(void)
+{
+	struct crng_state **pool;
+	int nid = numa_node_id();
+
+	/* pairs with cmpxchg_release() in do_numa_crng_init() */
+	pool = READ_ONCE(crng_node_pool);
+	if (pool && pool[nid])
+		return pool[nid];
+
+	return &primary_crng;
+}
 #else
 static void numa_crng_init(void) {}
+
+static struct crng_state *select_crng(void)
+{
+	return &primary_crng;
+}
 #endif
 
 /*
@@ -1000,15 +1018,7 @@ static void _extract_crng(struct crng_state *crng,
 
 static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
 {
-	struct crng_state *crng = NULL;
-
-#ifdef CONFIG_NUMA
-	if (crng_node_pool)
-		crng = crng_node_pool[numa_node_id()];
-	if (crng == NULL)
-#endif
-		crng = &primary_crng;
-	_extract_crng(crng, out);
+	_extract_crng(select_crng(), out);
 }
 
 /*
@@ -1037,15 +1047,7 @@ static void _crng_backtrack_protect(struct crng_state *crng,
 
 static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
 {
-	struct crng_state *crng = NULL;
-
-#ifdef CONFIG_NUMA
-	if (crng_node_pool)
-		crng = crng_node_pool[numa_node_id()];
-	if (crng == NULL)
-#endif
-		crng = &primary_crng;
-	_crng_backtrack_protect(crng, tmp, used);
+	_crng_backtrack_protect(select_crng(), tmp, used);
 }
 
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
mainline inclusion
from mainline-v5.19-rc1
commit 91e8bcd7b4da182e09ea19a2c73167345fe14c98
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JJ9V
CVE: NA
--------------------------------
The access to cryptd_queue::cpu_queue is synchronized by disabling preemption in cryptd_enqueue_request() and disabling BH in cryptd_queue_worker(). This implies that access is allowed from BH.
If cryptd_enqueue_request() is invoked from preemptible context _and_ soft interrupt then this can lead to list corruption since cryptd_enqueue_request() is not protected against access from soft interrupt.
Replace get_cpu() in cryptd_enqueue_request() with local_bh_disable() to ensure BH is always disabled. Remove preempt_disable() from cryptd_queue_worker() since it is not needed because local_bh_disable() ensures synchronisation.
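
As a loose userspace analogy of the fix (a sketch only: a POSIX signal stands in for the softirq, and sigprocmask() for local_bh_disable(); this is not the kernel mechanism):

    /* Userspace analogy: state touched from both "task" context and an
     * async context must mask the async context around every access,
     * the way local_bh_disable() does. Build: cc -std=c11 sketch.c */
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t queue_len; /* stands in for the per-CPU queue */

    static void softirq_like_handler(int sig)
    {
        (void)sig;
        queue_len = queue_len + 1; /* async "enqueue" */
    }

    static void enqueue_from_task_context(void)
    {
        sigset_t set, old;

        sigemptyset(&set);
        sigaddset(&set, SIGALRM);
        /* like local_bh_disable(): the async context cannot interrupt us */
        sigprocmask(SIG_BLOCK, &set, &old);
        queue_len = queue_len + 1; /* safe read-modify-write */
        /* like local_bh_enable() */
        sigprocmask(SIG_SETMASK, &old, NULL);
    }

    int main(void)
    {
        signal(SIGALRM, softirq_like_handler);
        alarm(1);
        for (int i = 0; i < 5000000; i++)
            enqueue_from_task_context();
        sleep(2);
        printf("queue_len = %d\n", (int)queue_len);
        return 0;
    }

Without the masking, the handler could fire in the middle of the read-modify-write, which is the analogue of the list corruption described above.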
Fixes: 254eff771441 ("crypto: cryptd - Per-CPU thread implementation...")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Conflicts:
	crypto/cryptd.c

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 crypto/cryptd.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index e0c8e907b086..3feebeec1d05 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -42,6 +42,10 @@ struct cryptd_cpu_queue {
 };
 
 struct cryptd_queue {
+	/*
+	 * Protected by disabling BH to allow enqueueing from softinterrupt and
+	 * dequeuing from kworker (cryptd_queue_worker()).
+	 */
 	struct cryptd_cpu_queue __percpu *cpu_queue;
 };
 
@@ -137,28 +141,28 @@ static void cryptd_fini_queue(struct cryptd_queue *queue)
 static int cryptd_enqueue_request(struct cryptd_queue *queue,
 				  struct crypto_async_request *request)
 {
-	int cpu, err;
+	int err;
 	struct cryptd_cpu_queue *cpu_queue;
 	atomic_t *refcnt;
 
-	cpu = get_cpu();
+	local_bh_disable();
 	cpu_queue = this_cpu_ptr(queue->cpu_queue);
 	err = crypto_enqueue_request(&cpu_queue->queue, request);
 
 	refcnt = crypto_tfm_ctx(request->tfm);
 
 	if (err == -ENOSPC)
-		goto out_put_cpu;
+		goto out;
 
-	queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
+	queue_work_on(smp_processor_id(), kcrypto_wq, &cpu_queue->work);
 
 	if (!atomic_read(refcnt))
-		goto out_put_cpu;
+		goto out;
 
 	atomic_inc(refcnt);
 
-out_put_cpu:
-	put_cpu();
+out:
+	local_bh_enable();
 
 	return err;
 }
@@ -174,15 +178,10 @@ static void cryptd_queue_worker(struct work_struct *work)
 	cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
 	/*
 	 * Only handle one request at a time to avoid hogging crypto workqueue.
-	 * preempt_disable/enable is used to prevent being preempted by
-	 * cryptd_enqueue_request(). local_bh_disable/enable is used to prevent
-	 * cryptd_enqueue_request() being accessed from software interrupts.
 	 */
 	local_bh_disable();
-	preempt_disable();
 	backlog = crypto_get_backlog(&cpu_queue->queue);
 	req = crypto_dequeue_request(&cpu_queue->queue);
-	preempt_enable();
 	local_bh_enable();
 
 	if (!req)
From: Salman Qazi <sqazi@google.com>
mainline inclusion
from mainline-v5.8-rc1
commit 28d65729b050977d8a9125e6726871e83bd22124
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RQVT
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
Flushes bypass the I/O scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert. This can happen while a kworker is running hctx->run_work work item and is past the point in blk_mq_sched_dispatch_requests where hctx->dispatch is checked.
The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time, because the I/O scheduler can feed an arbitrary number of commands.
Since we have only one hctx->run_work, the commands waiting in hctx->dispatch will wait an arbitrary length of time for run_work to be rerun.
A similar phenomenon exists with dispatches from the software queue.
The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and blk_mq_do_dispatch_ctx and return from the run_work handler and let it rerun.
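
The control flow can be summarized with a small standalone C sketch (illustrative names and counters, not the blk-mq code): the drain loop polls the bypass list on every iteration and returns -EAGAIN so the caller can reschedule itself:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    static int sched_queue = 8;   /* requests the elevator can still feed */
    static int dispatch_list;     /* flushes that bypassed the scheduler */

    static bool dispatch_one_from_sched(void)
    {
        if (sched_queue <= 0)
            return false;
        sched_queue--;
        return true;
    }

    static int do_dispatch_sched(void)
    {
        do {
            /* poll the bypass list each iteration; without this check an
             * endless stream of scheduler requests starves the flushes */
            if (dispatch_list > 0)
                return -EAGAIN;
        } while (dispatch_one_from_sched());

        return 0;
    }

    int main(void)
    {
        dispatch_list = 1; /* a flush arrived mid-run */
        if (do_dispatch_sched() == -EAGAIN)
            printf("dispatch list non-empty, re-run requested\n");
        return 0;
    }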
Signed-off-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-mq-sched.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/blk-mq-sched.c | 66 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 14 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index fa4781865be6..0fb33abac3f6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -87,12 +87,16 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
+ * be run again. This is necessary to avoid starving flushes.
  */
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
+	int ret = 0;
 	unsigned long end = jiffies + HZ;
 
 	do {
@@ -102,6 +106,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		    !e->type->ops.mq.has_work(hctx))
 			break;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = -EAGAIN;
+			break;
+		}
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -133,6 +142,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 			break;
 		}
 	} while (1);
+
+	return ret;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -150,16 +161,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
+ * be run again. This is necessary to avoid starving flushes.
  */
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	LIST_HEAD(rq_list);
 	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	int ret = 0;
 
 	do {
 		struct request *rq;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = -EAGAIN;
+			break;
+		}
+
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
@@ -193,22 +213,17 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
+	return ret;
 }
 
-void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
+	int ret = 0;
 	LIST_HEAD(rq_list);
 
-	/* RCU or SRCU read lock is needed before checking quiesced flag */
-	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q) ||
-	    blk_queue_quiesced_internal(q)))
-		return;
-
-	hctx->run++;
-
 	/*
 	 * If we have previous entries on our dispatch list, grab them first for
 	 * more fair dispatch.
@@ -237,19 +252,42 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
-				blk_mq_do_dispatch_sched(hctx);
+				ret = blk_mq_do_dispatch_sched(hctx);
 			else
-				blk_mq_do_dispatch_ctx(hctx);
+				ret = blk_mq_do_dispatch_ctx(hctx);
 		}
 	} else if (has_sched_dispatch) {
-		blk_mq_do_dispatch_sched(hctx);
+		ret = blk_mq_do_dispatch_sched(hctx);
 	} else if (hctx->dispatch_busy) {
 		/* dequeue request one by one from sw queue if queue is busy */
-		blk_mq_do_dispatch_ctx(hctx);
+		ret = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
+
+	return ret;
+}
+
+void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+
+	/* RCU or SRCU read lock is needed before checking quiesced flag */
+	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q) ||
+	    blk_queue_quiesced_internal(q)))
+		return;
+
+	hctx->run++;
+
+	/*
+	 * A return of -EAGAIN is an indication that hctx->dispatch is not
+	 * empty and we must run again in order to avoid starving flushes.
+	 */
+	if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) {
+		if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN)
+			blk_mq_run_hw_queue(hctx, true);
+	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
From: Yu Kuai <yukuai3@huawei.com>
mainline inclusion
from mainline-v6.2-rc1
commit a9a236d238a5e8ab2e74ca62c2c7ba5dd435af77
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z1UG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
----------------------------------------
Currently, if the user disables wbt through sysfs, 'enable_state' will be 'WBT_STATE_ON_MANUAL', which is confusing. Add a new state 'WBT_STATE_OFF_MANUAL' to cover that case.
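
A compilable sketch of the resulting states (the enum values mirror the patch below; state_for_write() is a hypothetical helper summarizing what wbt_set_min_lat() now records):

    #include <stdio.h>

    enum wbt_enable_state {
        WBT_STATE_ON_DEFAULT  = 1, /* on by default */
        WBT_STATE_ON_MANUAL   = 2, /* on manually by sysfs */
        WBT_STATE_OFF_DEFAULT = 3, /* off by default */
        WBT_STATE_OFF_MANUAL  = 4, /* off manually by sysfs */
    };

    /* what a sysfs write of wbt_lat_usec now records */
    static enum wbt_enable_state state_for_write(unsigned long long min_lat_nsec)
    {
        return min_lat_nsec ? WBT_STATE_ON_MANUAL : WBT_STATE_OFF_MANUAL;
    }

    int main(void)
    {
        printf("echo 0     -> state %d (off, manual)\n", state_for_write(0));
        printf("echo 75000 -> state %d (on, manual)\n", state_for_write(75000));
        return 0;
    }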
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
	block/blk-wbt.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/blk-wbt.c |  7 ++++++-
 block/blk-wbt.h | 12 +++++++-----
 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 1a5ea2e0c055..94b5eff0cd3a 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -442,8 +442,13 @@ void wbt_set_min_lat(struct request_queue *q, u64 val)
 	struct rq_qos *rqos = wbt_rq_qos(q);
 	if (!rqos)
 		return;
+
 	RQWB(rqos)->min_lat_nsec = val;
-	RQWB(rqos)->enable_state = WBT_STATE_ON_MANUAL;
+	if (val)
+		RQWB(rqos)->enable_state = WBT_STATE_ON_MANUAL;
+	else
+		RQWB(rqos)->enable_state = WBT_STATE_OFF_MANUAL;
+
 	__wbt_update_limits(RQWB(rqos));
 }
 
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index dd0d0f297d1e..58b36e019f9e 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -28,13 +28,15 @@ enum {
 };
 
 /*
- * Enable states. Either off, or on by default (done at init time),
- * or on through manual setup in sysfs.
+ * If current state is WBT_STATE_ON/OFF_DEFAULT, it can be covered to any other
+ * state, if current state is WBT_STATE_ON/OFF_MANUAL, it can only be covered
+ * to WBT_STATE_OFF/ON_MANUAL.
 */
 enum {
-	WBT_STATE_ON_DEFAULT	= 1,
-	WBT_STATE_ON_MANUAL	= 2,
-	WBT_STATE_OFF_DEFAULT
+	WBT_STATE_ON_DEFAULT	= 1,	/* on by default */
+	WBT_STATE_ON_MANUAL	= 2,	/* on manually by sysfs */
+	WBT_STATE_OFF_DEFAULT	= 3,	/* off by default */
+	WBT_STATE_OFF_MANUAL	= 4,	/* off manually by sysfs */
 };
 
 struct rq_wb {
From: Christoph Hellwig <hch@lst.de>
mainline inclusion
from mainline-v5.12~10
commit 68e6582e8f2dc32fd2458b9926564faa1fb4560e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7F3M1
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
----------------------------------------------
The switch to go through blkdev_get_by_dev means we now ignore the return value from bdev_disk_changed in __blkdev_get. Add a manual check to restore the old semantics.
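
From userspace, the restored semantics can be exercised with a small program along these lines (assumes root and a real block device; the default path is only an example):

    #include <errno.h>
    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dev = argc > 1 ? argv[1] : "/dev/sda"; /* example path */
        int fd = open(dev, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (ioctl(fd, BLKRRPART) < 0) {
            if (errno == EBUSY)
                fprintf(stderr, "%s: partitions are open, not rescanning\n", dev);
            else
                perror("BLKRRPART");
            close(fd);
            return 1;
        }
        printf("%s: partition table re-read\n", dev);
        close(fd);
        return 0;
    }

With a partition mounted, BLKRRPART again fails with EBUSY instead of silently succeeding.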
Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part")
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210421160502.447418-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/ioctl.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/ioctl.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/block/ioctl.c b/block/ioctl.c
index e5da2482342a..9d0eeb1b0bbd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -79,6 +79,8 @@ int __blkdev_reread_part(struct block_device *bdev)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+	if (bdev->bd_part_count)
+		return -EBUSY;
 
 	lockdep_assert_held(&bdev->bd_mutex);
 
@@ -503,6 +505,8 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 			return -EACCES;
 		if (bdev != bdev->bd_contains)
 			return -EINVAL;
+		if (bdev->bd_part_count)
+			return -EBUSY;
 		return disk_scan_partitions(bdev->bd_disk, mode);
 	case BLKGETSIZE:
 		size = i_size_read(bdev->bd_inode);
From: Yu Kuai <yukuai3@huawei.com>
mainline inclusion
from mainline-v6.3-rc6
commit 3723091ea1884d599cc8b8bf719d6f42e8d4d8b1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7F3M1
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still set, and partition scan will be proceed again when blkdev_get_by_dev() is called. However, this will cause a problem that re-assemble partitioned raid device will creat partition for underlying disk.
Test procedure:
mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb
Test result: the underlying disk partitions and the raid partitions can be observed at the same time.

Note that this can still happen in some corner cases where GD_NEED_PART_SCAN can be set for the underlying disk while re-assembling the raid device.
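
The hazard reduces to a stale flag, as in this standalone sketch (illustrative names; in this kernel the flag is bd_invalidated, upstream GD_NEED_PART_SCAN):

    #include <stdbool.h>
    #include <stdio.h>

    static bool need_part_scan;

    static int scan_partitions(bool open_ok)
    {
        need_part_scan = true;
        if (!open_ok) {
            need_part_scan = false; /* the fix: don't leave stale state */
            return -1;
        }
        /* ... read the partition table ... */
        need_part_scan = false;
        return 0;
    }

    static void open_device(void)
    {
        if (need_part_scan)
            printf("re-scanning partitions (possibly on a raid member!)\n");
        else
            printf("open without rescan\n");
    }

    int main(void)
    {
        scan_partitions(false); /* exclusive open failed */
        open_device();          /* with the fix: no spurious rescan */
        return 0;
    }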
Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/genhd.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c
index ae4c4c4ae5a9..d75d52a53e10 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -673,6 +673,12 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	if (!ret)
 		blkdev_put(bdev, mode & ~FMODE_EXCL);
 
+	/*
+	 * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set,
+	 * and this will cause that re-assemble partitioned raid device will
+	 * create partition for underlying disk.
+	 */
+	bdev->bd_invalidated = 0;
 	if (!(mode & FMODE_EXCL)) {
 		bd_abort_claiming(bdev, bdev, disk_scan_partitions);
 		bdput(bdev);
From: Zhong Jinghua <zhongjinghua@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA
----------------------------------------
This reverts commit ba3f75d02acb40b681fe7b4de7044e8103fa32ad.
The code was merged in the wrong location; roll it back and start again.
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/block/loop.c | 5 -----
 1 file changed, 5 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 826633aa328c..303c123bf78c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1354,11 +1354,6 @@ loop_get_status(struct loop_device *lo, struct loop_info64 *info)
 	info->lo_number = lo->lo_number;
 	info->lo_offset = lo->lo_offset;
 	info->lo_sizelimit = lo->lo_sizelimit;
-
-	/* loff_t vars have been assigned __u64 */
-	if (lo->lo_offset < 0 || lo->lo_sizelimit < 0)
-		return -EOVERFLOW;
-
 	info->lo_flags = lo->lo_flags;
 	memcpy(info->lo_file_name, lo->lo_file_name, LO_NAME_SIZE);
 	memcpy(info->lo_crypt_name, lo->lo_crypt_name, LO_NAME_SIZE);
From: Siddh Raman Pant <code@siddh.me>
mainline inclusion
from mainline-v6.0-rc3
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset.
loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typedef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
The function directly copies the offset from info to the device as follows (see line 980 of loop.c):

	lo->lo_offset = info->lo_offset;
This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has:

	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
Thus, check for negative value during loop_set_status_from_info().
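
The overflow is easy to reproduce in a standalone program (illustrative: long long stands in for loff_t):

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t lo_offset_from_user = (uint64_t)LLONG_MAX + 2; /* userspace __u64 */
        long long lo_offset = (long long)lo_offset_from_user;   /* loff_t-like */

        /* Converting an out-of-range value to a signed type is
         * implementation-defined; on Linux targets it wraps negative. */
        printf("lo_offset = %lld\n", lo_offset);
        if (lo_offset < 0)
            printf("negative offset -> the new check returns -EOVERFLOW\n");
        return 0;
    }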
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29...
Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 303c123bf78c..69e986737111 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1287,6 +1287,9 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 			err = -EFBIG;
 			goto out_unfreeze;
 		}
+		/* loff_t vars have been assigned __u64 */
+		if (lo->lo_offset < 0 || lo->lo_sizelimit < 0)
+			return -EOVERFLOW;
 	}
 
 	loop_config_discard(lo);
From: Zhong Jinghua <zhongjinghua@huawei.com>
mainline inclusion
from mainline-v6.3-rc1
commit 9f6ad5d533d1c71e51bdd06a5712c4fbc8768dfa
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should be checked before reassignment, because if an overflow error occurs, the original correct value will be changed to the wrong value, and it will not be changed back.
Moreover, the original patch did not fully solve the problem: the value was set and the ioctl returned an error, but subsequent I/O in the loop driver still used the value, which still triggered the warning:
loop_handle_cmd
 do_req_filebacked
  loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
  lo_rw_aio
   cmd->iocb.ki_pos = pos
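
The rule the patch enforces is validate-before-assign, sketched here with hypothetical names (fake_loop_device and set_status are illustrative, not the driver's API):

    #include <errno.h>
    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fake_loop_device { long long lo_offset; };

    static int set_status(struct fake_loop_device *lo, uint64_t new_offset)
    {
        /* check first: never write a wrapped negative value and then
         * fail, as the reverted ordering did */
        if (new_offset > LLONG_MAX)
            return -EOVERFLOW;
        lo->lo_offset = (long long)new_offset;
        return 0;
    }

    int main(void)
    {
        struct fake_loop_device lo = { .lo_offset = 4096 };

        if (set_status(&lo, (uint64_t)LLONG_MAX + 1) == -EOVERFLOW)
            printf("rejected; lo_offset still %lld\n", lo.lo_offset);
        return 0;
    }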
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud....
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/block/loop.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 69e986737111..a21196213472 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1283,13 +1283,15 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 				lo->lo_device->bd_inode->i_mapping->nrpages);
 			goto out_unfreeze;
 		}
+
+		/* Avoid assigning overflow values */
+		if (info->lo_offset > LLONG_MAX || info->lo_sizelimit > LLONG_MAX)
+			return -EOVERFLOW;
+
 		if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
 			err = -EFBIG;
 			goto out_unfreeze;
 		}
-		/* loff_t vars have been assigned __u64 */
-		if (lo->lo_offset < 0 || lo->lo_sizelimit < 0)
-			return -EOVERFLOW;
 	}
 
 	loop_config_discard(lo);