From: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com
mainline inclusion from mainline-v5.18-rc1 commit 572299f03afd676dd4e20669cdaf5ed0fe1379d4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61Z7G CVE: NA
--------------------------------
When IO requests are made continuously and the target block device handles requests faster than request arrival, the request dispatch loop keeps on repeating to dispatch the arriving requests very long time, more than a minute. Since the loop runs as a workqueue worker task, the very long loop duration triggers workqueue watchdog timeout and BUG [1].
To avoid the very long loop duration, break the loop periodically. When opportunity to dispatch requests still exists, check need_resched(). If need_resched() returns true, the dispatch loop already consumed its time slice, then reschedule the dispatch work and break the loop. With heavy IO load, need_resched() does not return true for 20~30 seconds. To cover such case, check time spent in the dispatch loop with jiffies. If more than 1 second is spent, reschedule the dispatch work and break the loop.
[1]
[ 609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s! [ 609.701820] Showing busy workqueues and worker pools: [ 609.707915] workqueue events: flags=0x0 [ 609.712615] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.712626] pending: drm_fb_helper_damage_work [drm_kms_helper] [ 609.712687] workqueue events_freezable: flags=0x4 [ 609.732943] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.732952] pending: pci_pme_list_scan [ 609.732968] workqueue events_power_efficient: flags=0x80 [ 609.751947] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 609.751955] pending: neigh_managed_work [ 609.752018] workqueue kblockd: flags=0x18 [ 609.769480] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 [ 609.769488] in-flight: 1020:blk_mq_run_work_fn [ 609.769498] pending: blk_mq_timeout_work, blk_mq_run_work_fn [ 609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67 [ 639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s! [ 639.909513] Showing busy workqueues and worker pools: [ 639.915404] workqueue events: flags=0x0 [ 639.920197] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 [ 639.920215] pending: drm_fb_helper_damage_work [drm_kms_helper] [ 639.920365] workqueue kblockd: flags=0x18 [ 639.939932] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 [ 639.939942] in-flight: 1020:blk_mq_run_work_fn [ 639.939955] pending: blk_mq_timeout_work, blk_mq_run_work_fn [ 639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67
Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io") Signed-off-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Cc: stable@vger.kernel.org # v5.10+ Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/ Link: https://lore.kernel.org/r/20220318022641.133484-1-shinichiro.kawasaki@wdc.co... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- block/blk-mq-sched.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 7bdc151c0616..fa4781865be6 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -93,6 +93,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) struct request_queue *q = hctx->queue; struct elevator_queue *e = q->elevator; LIST_HEAD(rq_list); + unsigned long end = jiffies + HZ;
do { struct request *rq; @@ -124,7 +125,14 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) * in blk_mq_dispatch_rq_list(). */ list_add(&rq->queuelist, &rq_list); - } while (blk_mq_dispatch_rq_list(q, &rq_list, true)); + + if (!blk_mq_dispatch_rq_list(q, &rq_list, true)) + break; + if (need_resched() || time_is_before_jiffies(end)) { + blk_mq_delay_run_hw_queue(hctx, 0); + break; + } + } while (1); }
static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,