[PATCH OLK-6.6] quota: fix livelock between quotactl and freeze_super
From: Abhishek Bapat <abhishekbapat@google.com> mainline inclusion from mainline-v7.0-rc1 commit 77449e453dfc006ad738dec55374c4cbc056fd39 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When a filesystem is frozen, quotactl_block() enters a retry loop waiting for the filesystem to thaw. It acquires s_umount, checks the freeze state, drops s_umount and uses sb_start_write() - sb_end_write() pair to wait for the unfreeze. However, this retry loop can trigger a livelock issue, specifically on kernels with preemption disabled. The mechanism is as follows: 1. freeze_super() sets SB_FREEZE_WRITE and calls sb_wait_write(). 2. sb_wait_write() calls percpu_down_write(), which initiates synchronize_rcu(). 3. Simultaneously, quotactl_block() spins in its retry loop, immediately executing the sb_start_write() - sb_end_write() pair. 4. Because the kernel is non-preemptible and the loop contains no scheduling points, quotactl_block() never yields the CPU. This prevents that CPU from reaching an RCU quiescent state. 5. synchronize_rcu() in the freezer thread waits indefinitely for the quotactl_block() CPU to report a quiescent state. 6. quotactl_block() spins indefinitely waiting for the freezer to advance, which it cannot do as it is blocked on the RCU sync. This results in a hang of the freezer process and 100% CPU usage by the quota process. While this can occur intermittently on multi-core systems, it is reliably reproducing on a node with the following script, running both the freezer and the quota toggle on the same CPU: # mkfs.ext4 -O quota /dev/sda 2g && mkdir a_mount # mount /dev/sda -o quota,usrquota,grpquota a_mount # taskset -c 3 bash -c "while true; do xfs_freeze -f a_mount; \ xfs_freeze -u a_mount; done" & # taskset -c 3 bash -c "while true; do quotaon a_mount; \ quotaoff a_mount; done" & Adding cond_resched() to the retry loop fixes the issue. It acts as an RCU quiescent state, allowing synchronize_rcu() in percpu_down_write() to complete. Fixes: 576215cffdef ("fs: Drop wait_unfrozen wait queue") Signed-off-by: Abhishek Bapat <abhishekbapat@google.com> Link: https://patch.msgid.link/20260115213103.1089129-1-abhishekbapat@google.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> --- fs/quota/quota.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 0e41fb84060f..5be53cae2c95 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -899,6 +899,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd) sb_start_write(sb); sb_end_write(sb); put_super(sb); + cond_resched(); goto retry; } return sb; -- 2.39.2
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/21391 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/B7T... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/21391 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/B7T...
participants (2)
-
patchwork bot -
Yongjian Sun