On 2021/11/16 19:04, John Garry wrote:
On 16/11/2021 03:10, chenxiang wrote:
From: John Garry <john.garry@huawei.com>
Please fix the author here (to be Alan)
ok
John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device.
The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock.
The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure.
This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock.
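The change described above can be sketched as a small userspace model. This is only an illustration, not the kernel code: struct toy_queue, the toy_* functions and the pm_only flag are hypothetical stand-ins for the request queue, blk_post_runtime_resume() and the queue's PM-only state, and the real routine also handles locking and autosuspend, which are omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's runtime-PM states. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED };

/* Toy model of a request queue; not the real struct request_queue. */
struct toy_queue {
	enum rpm_status rpm_status;
	bool pm_only;	/* true: ordinary requests may not be dispatched */
};

/* Modelled old behaviour: a failed resume leaves the queue blocked. */
static void toy_post_runtime_resume_old(struct toy_queue *q, int err)
{
	if (!err) {
		q->rpm_status = RPM_ACTIVE;
		q->pm_only = false;
	} else {
		q->rpm_status = RPM_SUSPENDED;
	}
}

/* Modelled new behaviour: no err argument, always treated as success. */
static void toy_post_runtime_resume_new(struct toy_queue *q)
{
	q->rpm_status = RPM_ACTIVE;
	q->pm_only = false;
}

/* Ordinary commands can reach the device only when pm_only is clear. */
static bool toy_can_dispatch(const struct toy_queue *q)
{
	return !q->pm_only;
}
```

In this model, a queue that goes through toy_post_runtime_resume_old() with a nonzero err stays undispatchable, which corresponds to the rescan waiting forever while holding the device lock. With toy_post_runtime_resume_new() the queue always becomes dispatchable, so the waiter can make progress and the error can be handled through the queue.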
Reported-and-tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
You need to add your SoB when you send to the community.
ok
Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
Did Alan add this? I didn't think that it was necessary
Yes, he added it in the thread (https://lore.kernel.org/linux-scsi/20210714161027.GC380727@rowland.harvard.e...).