From: Wu Bo wubo40@huawei.com
hulk inclusion category: bugfix bugzilla: NA CVE: NA
https://gitee.com/src-openeuler/kernel/issues/I28N9J ---------------------------
For some reason, during reboot the system, iscsi.service failed to logout all sessions. kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths.
[ 1044.098991] reboot: Mddev shutdown finished. [ 1044.099311] reboot: Usermodehelper disable finished. [ 1050.611244] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295152378, last ping 4295153633, now 4295154944 [ 1348.599676] Call trace: [ 1348.599887] __switch_to+0xe8/0x150 [ 1348.600113] __schedule+0x33c/0xa08 [ 1348.600372] schedule+0x2c/0x88 [ 1348.600567] schedule_timeout+0x184/0x3a8 [ 1348.600820] io_schedule_timeout+0x28/0x48 [ 1348.601089] wait_for_common_io.constprop.2+0x168/0x258 [ 1348.601425] wait_for_completion_io_timeout+0x28/0x38 [ 1348.601762] blk_execute_rq+0x98/0xd8 [ 1348.602006] __scsi_execute+0xe0/0x1e8 [ 1348.602262] sd_sync_cache+0xd0/0x220 [sd_mod] [ 1348.602551] sd_shutdown+0x6c/0xf8 [sd_mod] [ 1348.602826] device_shutdown+0x13c/0x250 [ 1348.603078] kernel_restart_prepare+0x5c/0x68 [ 1348.603400] kernel_restart+0x20/0x98 [ 1348.603683] __se_sys_reboot+0x214/0x260 [ 1348.603987] __arm64_sys_reboot+0x24/0x30 [ 1348.604300] el0_svc_common+0x80/0x1b8 [ 1348.604590] el0_svc_handler+0x78/0xe0 [ 1348.604877] el0_svc+0x10/0x260
The scsi Mid Layer timeout handler function(scsi_times_out) will be call iscsi_eh_cmd_timed_out function to do error handler.
if the system_state is still SYSTEM_RUNNING, the iscsi_eh_cmd_timed_out will return BLK_EH_RESET_TIMER, the block layer will reset timer, and Waiting for the next timeout. Repeat the above operation.
In order to solve the cmd hung problem, Added a timeout limit.
Signed-off-by: Wu Bo wubo40@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: Cheng Jian cj.chengjian@huawei.com --- drivers/scsi/libiscsi.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 35d603db33bb..6a437743f1c0 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -2025,7 +2025,10 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) * Instead, handle cmd, allow completion to happen and let * upper layer to deal with the result. */ - if (unlikely(system_state != SYSTEM_RUNNING)) { + conn = session->leadconn; + if (unlikely(system_state != SYSTEM_RUNNING) || (conn && + time_before_eq(conn->last_recv + + (cls_session->recovery_tmo * HZ), jiffies))) { sc->result = DID_NO_CONNECT << 16; ISCSI_DBG_EH(session, "sc on shutdown, handled\n"); rc = BLK_EH_DONE; @@ -2471,10 +2474,11 @@ int iscsi_eh_session_reset(struct scsi_cmnd *sc) iscsi_conn_failure(conn, ISCSI_ERR_SCSI_EH_SESSION_RST);
ISCSI_DBG_EH(session, "wait for relogin\n"); - wait_event_interruptible(conn->ehwait, + wait_event_interruptible_timeout(conn->ehwait, session->state == ISCSI_STATE_TERMINATE || session->state == ISCSI_STATE_LOGGED_IN || - session->state == ISCSI_STATE_RECOVERY_FAILED); + session->state == ISCSI_STATE_RECOVERY_FAILED, + cls_session->recovery_tmo * HZ); if (signal_pending(current)) flush_signals(current);