This patch set fixes file creation failures that occur when a wrong longest free extent is read.
Zizhi Wo (2):
  xfs: Fix file creation failure
  xfs: Fix agf_longest update error

 fs/xfs/libxfs/xfs_alloc.c       | 14 ++++++++++++++
 fs/xfs/libxfs/xfs_alloc_btree.c | 13 ++++++++++--
 fs/xfs/libxfs/xfs_btree.h       |  1 +
 3 files changed, 25 insertions(+), 3 deletions(-)
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9TDTA
CVE: NA
--------------------------------
In the file system expansion test and concurrent file creation and writing scenarios, file creation fails occasionally.
The detailed test scheme is as follows:
1. If the remaining space is less than 128 MB, expand the space by 1 GB;
   --xfs_growfs /$DEV -D $bc -m 100
2. 32 processes create a file every 0.5s and write 4 KB to 4 MB data
   randomly.
   --filesize=$((RANDOM % 1024 + 1))
   --dd if=/dev/zero oflag=direct of=$filename bs=4K count=$filesize

And when the file creation fails, there are still hundreds of megabytes
of free space. The overall analysis process is as follows:
Direct write                            Create file
xfs_file_write_iter
 ...
 xfs_direct_write_iomap_begin
  xfs_iomap_write_direct
   ...
   xfs_alloc_vextent
    xfs_alloc_ag_vextent
     xfs_alloc_ag_vextent_near
      xfs_alloc_cur_finish
       xfs_alloc_fixup_trees
        xfs_btree_delete
         xfs_btree_delrec
          xfs_allocbt_update_lastrec
          /* The longest update is 0
           * because numrec == 0. */
          agf->agf_longest = len = 0
                                        xfs_create
                                         ...
                                         xfs_dialloc
                                          xfs_ialloc_ag_alloc
                                           xfs_alloc_vextent
                                            xfs_alloc_fix_freelist
                                             xfs_alloc_space_available
                                             -> as longest=0, it will return
                                                false, no space for inode alloc.
The root cause is that extent allocation runs with the AGF lock held, whereas inode creation first does a quick check for free space without taking the AGF lock. If this first check fails, it returns immediately. If it passes, the lock is taken and the check is repeated; this is how the "check-lock-check again" algorithm is designed. If every AG fails the unlocked check, an error is returned. The problem is therefore likely to occur when none of the preceding AGs has enough space left, and the last AG has deleted the last CNT tree record but has not yet inserted the new one.
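The window described above can be illustrated with a small user-space model. This is only a sketch: struct model_agf and every model_* function are hypothetical names invented for the illustration, not kernel code, and the model is single-threaded, so it only shows what an unlocked reader would observe at each point in the delete-then-insert sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AGF; only the field under discussion. */
struct model_agf {
	uint32_t longest;	/* models agf->agf_longest */
};

/* Models the unlocked "is there space?" check done by file creation. */
static bool model_space_available(const struct model_agf *agf, uint32_t need)
{
	return agf->longest >= need;
}

/* Models the pre-patch behaviour: deleting the only by-size record
 * publishes a longest value of 0. */
static void model_delete_only_record(struct model_agf *agf)
{
	agf->longest = 0;
}

/* Models re-inserting the unused remainder of the extent. */
static void model_insert_remainder(struct model_agf *agf, uint32_t remainder)
{
	if (remainder > agf->longest)
		agf->longest = remainder;
}

/* What an unlocked check sees between the delete and the insert. */
bool model_check_in_window(uint32_t extent_len, uint32_t need)
{
	struct model_agf agf = { .longest = extent_len };

	model_delete_only_record(&agf);
	return model_space_available(&agf, need);
}

/* What the same check sees once the remainder has been inserted. */
bool model_check_after_insert(uint32_t extent_len, uint32_t alloc_len,
			      uint32_t need)
{
	struct model_agf agf = { .longest = extent_len };

	model_delete_only_record(&agf);
	model_insert_remainder(&agf, extent_len - alloc_len);
	return model_space_available(&agf, need);
}
```

With a 1000-block extent, an 8-block allocation and a 64-block requirement, the check fails inside the window and passes after the insert, although the free space is almost unchanged.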
Fix this issue by adding the bc_free_longest field to the xfs_btree_cur_t structure to store the longest free extent length that is about to be inserted. The assignment is done in xfs_alloc_fixup_trees() and xfs_free_ag_extent().
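As a worked illustration of the value recorded in xfs_alloc_fixup_trees(), the following sketch computes the larger of the two remainders left when an allocation is carved out of a free extent. model_free_longest is a hypothetical helper, and it assumes the allocated range [rbno, rbno + rlen) lies entirely inside the free extent [fbno, fbno + flen). It is a simplification: it returns 0 for an exact-fit allocation, a case in which the patch leaves bc_free_longest unassigned.

```c
#include <stdint.h>

/*
 * Hypothetical helper: length of the longest piece that remains free
 * after [rbno, rbno + rlen) is allocated out of [fbno, fbno + flen).
 * Assumes the allocated range is fully contained in the free extent.
 */
uint32_t model_free_longest(uint32_t fbno, uint32_t flen,
			    uint32_t rbno, uint32_t rlen)
{
	uint32_t left = rbno - fbno;			   /* free part before */
	uint32_t right = (fbno + flen) - (rbno + rlen);   /* free part after */

	return left > right ? left : right;
}
```

For example, allocating 100 blocks at block 400 from a free extent that starts at block 100 and is 1000 blocks long leaves remainders of 300 and 600 blocks, so the recorded value would be 600.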
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Long Li leo.lilong@huawei.com
---
 fs/xfs/libxfs/xfs_alloc.c       | 14 ++++++++++++++
 fs/xfs/libxfs/xfs_alloc_btree.c |  9 ++++++++-
 fs/xfs/libxfs/xfs_btree.h       |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index b901eedf3bb8..95bfe89651ad 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -543,6 +543,13 @@ xfs_alloc_fixup_trees(
 		nfbno2 = rbno + rlen;
 		nflen2 = (fbno + flen) - nfbno2;
 	}
+
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (nfbno1 != NULLAGBLOCK || nfbno2 != NULLAGBLOCK)
+		cnt_cur->bc_free_longest = max_t(xfs_extlen_t, nflen1, nflen2);
+
 	/*
 	 * Delete the entry from the by-size btree.
 	 */
@@ -2039,6 +2046,13 @@ xfs_free_ag_extent(
 	 * Now allocate and initialize a cursor for the by-size tree.
 	 */
 	cnt_cur = xfs_allocbt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_CNT);
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (haveleft)
+		cnt_cur->bc_free_longest = ltlen;
+	if (haveright)
+		cnt_cur->bc_free_longest = gtlen;
 	/*
 	 * Have both left and right contiguous neighbors.
 	 * Merge all three into a single free block.
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 613a1743fdd5..d6dbbe85e819 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -147,7 +147,14 @@ xfs_allocbt_update_lastrec(
 			rrp = XFS_ALLOC_REC_ADDR(cur->bc_mp, block, numrecs);
 			len = rrp->ar_blockcount;
 		} else {
-			len = 0;
+			/*
+			 * Update in advance to prevent file creation failure
+			 * for concurrent processes even though there is no
+			 * numrec currently.
+			 * And there's no need to worry as the value that not
+			 * less than bc_free_longest will be inserted later.
+			 */
+			len = cpu_to_be32(cur->bc_free_longest);
 		}
 
 		break;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index d172046ae833..1d89b897c056 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -235,6 +235,7 @@ typedef struct xfs_btree_cur
 		struct xfs_btree_cur_ag	bc_ag;
 		struct xfs_btree_cur_ino	bc_ino;
 	};
+	xfs_extlen_t	bc_free_longest; /* the actual longest free extent */
 } xfs_btree_cur_t;
 
 /* cursor flags */
From: Zizhi Wo wozizhi@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9TDTA
CVE: NA
--------------------------------
Concurrent file creation and small writes can unexpectedly return -ENOSPC, because there is a race window in which the allocator can read a wrong agf->agf_longest.
Write file process steps:
1) Find the entry that best meets the conditions, then calculate the start
   address and length of the remaining part of the entry after allocation.
2) Delete this entry and update the agf->agf_longest.
3) Insert the remaining unused parts of this entry based on the
   calculations in 1), and update the agf->agf_longest again if necessary.
Create file process steps:
1) Check whether there are free inodes in the inode chunk.
2) If there is no free inode, check whether there is space for creating
   inode chunks, performing the no-lock judgment first.
3) If the judgment succeeds, the judgment is performed again with the agf
   lock held. Otherwise, an error is returned directly.
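Steps 2) and 3) of the create path can be sketched as a scan over the AGs. The function below is a hypothetical single-threaded model, not the kernel's code: model_pick_ag and its array argument are invented for the illustration, and the locked re-check is only marked by a comment because nothing in this model can change the value in between.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: longest[agno] is the value an unlocked reader
 * would see in that AG's agf_longest. Returns the first AG that passes
 * the check, or -ENOSPC if every AG fails it.
 */
int model_pick_ag(const uint32_t *longest, size_t nags, uint32_t need)
{
	for (size_t agno = 0; agno < nags; agno++) {
		/* Step 2: cheap check without the AGF lock. */
		if (longest[agno] < need)
			continue;
		/*
		 * Step 3: the real code would take the AGF lock here and
		 * check again; this model simply accepts the AG.
		 */
		return (int)agno;
	}
	/* Every AG failed the unlocked check. */
	return -ENOSPC;
}
```

If the first AGs are nearly full and the last AG momentarily shows a longest value of 0, the scan returns -ENOSPC even though that AG would pass the check a moment later.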
If the write process has completed step 2) but has not yet reached step 3) when the create file process reaches its step 2), the create process wrongly concludes that there is no space, so file creation fails even though the file system still has free space. In the previous code, if numrec == 0, longest was temporarily set to 0 in step 2) of the write process; commit 628ab796d8b1 ("xfs: Fix file creation failure") fixed this by updating longest in advance.
But that fix is incomplete. If numrec is not 0 in xfs_allocbt_update_lastrec, xfs updates agf_longest to the length of the -current- longest node. This value can be wrong, because that node may be shorter than the remainder of the original longest extent that is left after part of it is allocated. The agf_longest written at this moment is therefore inaccurate.
Fix it by comparing cur->bc_free_longest with the -current- longest node and taking the maximum value as the agf_longest.
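The three behaviours discussed in this series can be compared side by side in a small model. The enum and function names are hypothetical and exist only for this sketch; values are plain host-endian integers, so the cpu_to_be32()/be32_to_cpu() conversions of the real code are left out.

```c
#include <stdint.h>

/* Hypothetical labels for the code before and after each patch. */
enum model_variant {
	MODEL_ORIGINAL,	/* before this series */
	MODEL_PATCH1,	/* after "xfs: Fix file creation failure" */
	MODEL_PATCH2	/* after "xfs: Fix agf_longest update error" */
};

/*
 * Value published as the longest free extent right after the last
 * by-size record has been deleted.
 *   numrecs      - records left in the rightmost leaf block
 *   last_rec_len - length of the current longest record (if numrecs != 0)
 *   free_longest - models cur->bc_free_longest, the pending remainder
 */
uint32_t model_longest_after_delete(enum model_variant v, uint32_t numrecs,
				    uint32_t last_rec_len,
				    uint32_t free_longest)
{
	switch (v) {
	case MODEL_ORIGINAL:
		return numrecs ? last_rec_len : 0;
	case MODEL_PATCH1:
		return numrecs ? last_rec_len : free_longest;
	case MODEL_PATCH2:
		if (numrecs && last_rec_len > free_longest)
			return last_rec_len;
		return free_longest;
	}
	return 0;
}
```

Suppose 100 blocks are allocated from a 1000-block extent while the next longest record is 300 blocks. After the delete, the first two variants publish 300, whereas the last one publishes 900, which matches the remainder that is about to be inserted.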
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: 628ab796d8b1 ("xfs: Fix file creation failure")
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Long Li leo.lilong@huawei.com
---
 fs/xfs/libxfs/xfs_alloc_btree.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index d6dbbe85e819..8282a171131e 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -141,20 +141,20 @@ xfs_allocbt_update_lastrec(
 			return;
 		ASSERT(ptr == numrecs + 1);
 
+		/*
+		 * Update in advance to prevent file creation failure
+		 * for concurrent processes even though there is no
+		 * numrec currently.
+		 * And there's no need to worry as the value that not
+		 * less than bc_free_longest will be inserted later.
+		 */
+		len = cpu_to_be32(cur->bc_free_longest);
 		if (numrecs) {
 			xfs_alloc_rec_t *rrp;
 
 			rrp = XFS_ALLOC_REC_ADDR(cur->bc_mp, block, numrecs);
-			len = rrp->ar_blockcount;
-		} else {
-			/*
-			 * Update in advance to prevent file creation failure
-			 * for concurrent processes even though there is no
-			 * numrec currently.
-			 * And there's no need to worry as the value that not
-			 * less than bc_free_longest will be inserted later.
-			 */
-			len = cpu_to_be32(cur->bc_free_longest);
+			len = cpu_to_be32(max_t(xfs_extlen_t, cur->bc_free_longest,
+					be32_to_cpu(rrp->ar_blockcount)));
 		}
 
 		break;
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10941 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/Y...