From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-v5.14
commit 2cafe29a8d03f02a3d16193bdaae2f3e82a423f9
category: bugfix
bugzilla: 182135
CVE: NA
-------------------------------------------------
Yi reported several kernel panics on:
[16687.001777] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
...
[16687.163549] pc : __rq_qos_track+0x38/0x60
or
[  997.690455] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
[  997.850347] pc : __rq_qos_done+0x2c/0x50
Turns out they are caused by a race between adding an rq qos (wbt) and normal IO: rq_qos_add() can run while IO is being submitted. Fix this issue by freezing the queue before adding/deleting an rq qos to/from the queue.
rq_qos_exit() needn't freeze the queue because it is called after the queue has already been frozen.
iolatency calls rq_qos_add() while allocating the queue, so freezing won't add delay there because the queue usage refcount is still working in atomic mode at that time.

iocost calls rq_qos_add() when writing a cgroup attribute file, and it is fine to freeze the queue at that time since we usually freeze the queue when storing to a queue sysfs attribute; meantime iocost only exists on the root cgroup.

wbt_init() calls it from blk_register_queue() and from the queue sysfs attribute store path (queue_wb_lat_store(), when written the 1st time in case of !BLK_WBT_MQ); the following patch will speed up the queue freezing in wbt_init().
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflict: block/blk-rq-qos.h
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-rq-qos.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index efb3e5430ce3c..7c36442ba7f7f 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -6,6 +6,7 @@
 #include <linux/blk_types.h>
 #include <linux/atomic.h>
 #include <linux/wait.h>
+#include <linux/blk-mq.h>
 
 enum rq_qos_id {
 	RQ_QOS_WBT,
@@ -77,20 +78,43 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
 
 static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 {
+	/*
+	 * No IO can be in-flight when adding rqos, so freeze queue, which
+	 * is fine since we only support rq_qos for blk-mq queue.
+	 *
+	 * Reuse ->queue_lock for protecting against other concurrent
+	 * rq_qos adding/deleting
+	 */
+	blk_mq_freeze_queue(q);
+
+	spin_lock_irq(q->queue_lock);
 	rqos->next = q->rq_qos;
 	q->rq_qos = rqos;
+	spin_unlock_irq(q->queue_lock);
+
+	blk_mq_unfreeze_queue(q);
 }
 
 static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
 {
 	struct rq_qos **cur;
 
+	/*
+	 * See comment in rq_qos_add() about freezing queue & using
+	 * ->queue_lock.
+	 */
+	blk_mq_freeze_queue(q);
+
+	spin_lock_irq(q->queue_lock);
 	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
 		if (*cur == rqos) {
 			*cur = rqos->next;
 			break;
 		}
 	}
+	spin_unlock_irq(q->queue_lock);
+
+	blk_mq_unfreeze_queue(q);
 }
 
 bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit);
From: Ming Lei <ming.lei@redhat.com>
mainline inclusion
from mainline-v5.14
commit a72c374f97a4c7b2f9dde5144c867fec4bdcd798
category: bugfix
bugzilla: 182135
CVE: NA
-------------------------------------------------
Mark queue init done only when everything has completed successfully in blk_register_queue(), so that wbt_enable_default() can run quickly without involving any RCU grace period, since adding an rq qos now requires freezing the queue.
Also there is no side effect from delaying the marking of queue init done.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflict: block/blk-sysfs.c
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-sysfs.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1f958bace71fc..66fc91da4dfa0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -902,21 +902,6 @@ int blk_register_queue(struct gendisk *disk)
 			  "%s is registering an already registered queue\n",
 			  kobject_name(&dev->kobj));
 
-	/*
-	 * SCSI probing may synchronously create and destroy a lot of
-	 * request_queues for non-existent devices. Shutting down a fully
-	 * functional queue takes measureable wallclock time as RCU grace
-	 * periods are involved. To avoid excessive latency in these
-	 * cases, a request_queue starts out in a degraded mode which is
-	 * faster to shut down and is made fully functional here as
-	 * request_queues for non-existent devices never get registered.
-	 */
-	if (!blk_queue_init_done(q)) {
-		queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
-		percpu_ref_switch_to_percpu(&q->q_usage_counter);
-		blk_queue_bypass_end(q);
-	}
-
 	ret = blk_trace_init_sysfs(dev);
 	if (ret)
 		return ret;
@@ -962,6 +947,21 @@ int blk_register_queue(struct gendisk *disk)
 	ret = 0;
 unlock:
 	mutex_unlock(&q->sysfs_dir_lock);
+
+	/*
+	 * SCSI probing may synchronously create and destroy a lot of
+	 * request_queues for non-existent devices. Shutting down a fully
+	 * functional queue takes measureable wallclock time as RCU grace
+	 * periods are involved. To avoid excessive latency in these
+	 * cases, a request_queue starts out in a degraded mode which is
+	 * faster to shut down and is made fully functional here as
+	 * request_queues for non-existent devices never get registered.
+	 */
+	if (!blk_queue_init_done(q)) {
+		queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
+		percpu_ref_switch_to_percpu(&q->q_usage_counter);
+		blk_queue_bypass_end(q);
+	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blk_register_queue);
From: Laibin Qiu <qiulaibin@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 182135
CVE: NA
-------------------------------------------------
blkcg_init_queue
  blk_iolatency_init
    blk_mq_freeze_queue
      blk_throtl_drain  <- blk_throtl will be initialized later, so
                           q->td is NULL and triggers this NULL
                           pointer BUG
  blk_throtl_init
The following is the log of the problem.

------------[ cut here ]------------
[    8.516269] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    8.516296] RIP: 0010:blk_throtl_drain+0x142/0x760
[    8.520477] Call Trace:
[    8.520477]  blkcg_drain_queue+0x90/0x120
[    8.520477]  __blk_drain_queue+0x18a/0x760
[    8.520477]  blk_drain_queue+0x45/0x80
[    8.520477]  blk_freeze_queue+0x58/0x70
[    8.520477]  blk_iolatency_init+0xdc/0x3e7
[    8.520477]  blkcg_init_queue+0x214/0x5e0
[    8.520477]  blk_alloc_queue_node+0x826/0xc10
[    8.520477]  ? ramdisk_size+0x27/0x27
[    8.520477]  brd_alloc+0x118/0x540
[    8.520477]  ? ramdisk_size+0x27/0x27
[    8.520477]  brd_init+0x179/0x4be
[    8.520477]  ? do_one_initcall+0x5fe/0x783
[    8.520477]  ? ramdisk_size+0x27/0x27
[    8.520477]  do_one_initcall+0xf7/0x783
[    8.520477]  ? initcall_blacklisted+0x1b0/0x1b0
[    8.520477]  ? __wake_up_common+0x600/0x620
[    8.520477]  ? lock_downgrade+0x720/0x720
[    8.520477]  ? check_preemption_disabled+0x40/0x2a0
[    8.520477]  kernel_init_freeable+0xb34/0xc27
[    8.520477]  ? rest_init+0x41c/0x41c
[    8.520477]  kernel_init+0x10/0x1e0
[    8.520477]  ? rest_init+0x41c/0x41c
[    8.520477]  ret_from_fork+0x24/0x30
[    8.520477] Modules linked in:
[    8.520477] Dumping ftrace buffer:
[    8.520477]    (ftrace buffer empty)
[    8.520477] ---[ end trace 74cf51ecec6ee7a1 ]---
Fix this by checking whether q->td is NULL; if so, the io_throttle module has not been initialized yet and there is nothing to drain.

Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-cgroup.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 492e49f8e76c0..c55a903bd8058 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1338,6 +1338,13 @@ void blkcg_drain_queue(struct request_queue *q)
 	if (!q->root_blkg)
 		return;
 
+	/*
+	 * @q could be exiting and q->td has not been initialized.
+	 * If so, don't need drain any throttled bios.
+	 */
+	if (!q->td)
+		return;
+
 	blk_throtl_drain(q);
 }
From: Laibin Qiu <qiulaibin@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 182135
CVE: NA
-------------------------------------------------
td is guarded by CONFIG_BLK_DEV_THROTTLING in struct request_queue, so the q->td check added by the previous patch must be wrapped in the same #ifdef; otherwise the build fails when throttling is disabled.
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 block/blk-cgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index c55a903bd8058..ce04c5a5ebb11 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1342,9 +1342,10 @@ void blkcg_drain_queue(struct request_queue *q)
 	 * @q could be exiting and q->td has not been initialized.
 	 * If so, don't need drain any throttled bios.
 	 */
+#ifdef CONFIG_BLK_DEV_THROTTLING
 	if (!q->td)
 		return;
-
+#endif
 	blk_throtl_drain(q);
 }
 