From: Logan Gunthorpe logang@deltatee.com
stable inclusion from stable-v5.10.138 commit 6d7aabdba60cc7e6d051617f6c984f5d1bfe2974 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60QFD
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9973f0fa7d20269fe6fefe6333997fb5914449c1 ]
The mdadm test 07layouts randomly produces a kernel hung task deadlock. The deadlock is caused by the suspend_lo/suspend_hi files being set by the mdadm background process during reshape and not being cleared because the process hangs. (Leaving aside the issue of the fragility of freezing kernel tasks by buggy userspace processes...)
When the background mdadm process hangs it, is waiting (without a timeout) on a change to the sync_completed file signalling that the reshape has completed. The process is woken up a couple times when the reshape finishes but it is woken up before MD_RECOVERY_RUNNING is cleared so sync_completed_show() reports 0 instead of "none".
To fix this, notify the sysfs file in md_reap_sync_thread() after MD_RECOVERY_RUNNING has been cleared. This wakes up mdadm and causes it to continue and write to suspend_lo/suspend_hi to allow IO to continue.
Signed-off-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Song Liu song@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Reviewed-by: Wei Li liwei391@huawei.com --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 961cc4e6d801..abee20704bd5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9457,6 +9457,7 @@ void md_reap_sync_thread(struct mddev *mddev) wake_up(&resync_wait); /* flag recovery needed just to double check */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + sysfs_notify_dirent_safe(mddev->sysfs_completed); sysfs_notify_dirent_safe(mddev->sysfs_action); md_new_event(mddev); if (mddev->event_work.func)