From: yangerkun <yangerkun@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 188788, https://gitee.com/openeuler/kernel/issues/I76JSK
CVE: NA
--------------------------------
A single transaction (tp) may fix up the freelist of the same AG twice. AGFL blocks consumed after the first fixup can make the second fixup fail, which is unintended and causes an xfs shutdown[1][2].

Gao Xiang described one solution, reserving more blocks during the first fixup, but it has several logic errors:
- We may see postallocs as 1 on the first fixup and 0 on the second, which can trigger pointless AGFL filling or shortening.
- The case above (postallocs equals 1 first, then 0) is an example where the AGFL needs shortening, but xfs_alloc_fix_freelist can only free AGFL blocks after the free space check succeeds. Besides, filling or shortening the AGFL does not change fdblocks, so fdblocks (or resblocks) can show space available while the AG fixup still rejects us, and then xfs can shut down too.
- Once postallocs equals 1, it also changes the logic of xfs_alloc_ag_max_usable, which changes the block allocation logic (found by checking each AG's free blocks after fallocating a huge file).
- Once postallocs equals 1, we reserve 2 * xfs_alloc_min_freelist(), but sometimes that is not enough once the bno/cnt btrees grow and the second fixup needs a larger reserve...
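The difference between the two checks can be sketched as a small stand-alone model. This is an illustration only, not kernel code: `old_need` and `new_minfree` are hypothetical names, and the numeric inputs stand in for the results of xfs_alloc_min_freelist() and xfs_ag_fixup_aside().

```c
/* Hypothetical user-space model of the space check; not kernel code. */
typedef unsigned int extlen_t;

/*
 * Before this patch: the value passed to the space check scales with
 * postallocs, so two fixups of the same AG in one transaction can work
 * with different values (postallocs 1 first, then 0).
 */
static extlen_t old_need(extlen_t min_freelist, int postallocs)
{
	return min_freelist * (1 + postallocs);
}

/*
 * After this patch: only the value used for the free space check grows.
 * 'fixup_aside' stands in for the result of xfs_ag_fixup_aside().
 */
static extlen_t new_minfree(extlen_t min_freelist, int postallocs,
			    extlen_t fixup_aside)
{
	extlen_t minfree = min_freelist;

	if (postallocs)
		minfree += fixup_aside;
	return minfree;
}
```

In this model the freelist target itself stays at min_freelist regardless of postallocs; only the threshold for the space check is raised.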
This patch fixes all of the bugs above by using m_ag_maxlevels to reserve more blocks, and adapts xfs_alloc_set_aside/xfs_alloc_ag_max_usable to match the larger reserve. Note that we only reserve more blocks; we do not fill or shorten the AGFL according to that reserve.
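As a rough illustration of the arithmetic, the extra reserve and the resulting set-aside can be modeled in user space as below. The struct and function names (`mount_model`, `fixup_aside_model`, `set_aside_model`) are hypothetical; the value 4 for the AGFL reserve follows the existing comment above xfs_alloc_set_aside ("4 fsbs per AG for the freelist and 4 more").

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few struct xfs_mount fields used here. */
struct mount_model {
	unsigned int	agcount;	/* models m_sb.sb_agcount */
	unsigned int	ag_maxlevels;	/* models m_ag_maxlevels */
	unsigned int	rmap_maxlevels;	/* models m_rmap_maxlevels */
	bool		has_rmapbt;	/* models xfs_has_rmapbt(mp) */
};

/* Models XFS_ALLOC_AGFL_RESERVE; 4 per the existing code comment. */
#define AGFL_RESERVE_MODEL	4

/* Extra per-AG blocks reserved so that a second fixup can succeed. */
static unsigned int fixup_aside_model(const struct mount_model *mp)
{
	unsigned int ret = 2 * mp->ag_maxlevels;

	if (mp->has_rmapbt)
		ret += mp->rmap_maxlevels;
	return ret;
}

/* Filesystem-wide set-aside: per-AG reserve times the number of AGs. */
static unsigned int set_aside_model(const struct mount_model *mp)
{
	return mp->agcount * (AGFL_RESERVE_MODEL + 4 + fixup_aside_model(mp));
}
```

With 4 AGs, m_ag_maxlevels of 5 and m_rmap_maxlevels of 9, this model sets aside 4 * (4 + 4 + 19) = 108 blocks with rmapbt enabled and 4 * (4 + 4 + 10) = 72 without it. Both maxlevels values must be computed before the set-aside is, which is why the patch redoes the initialization in xfs_mountfs.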
[1] https://www.spinics.net/lists/linux-xfs/msg66440.html
[2] https://lore.kernel.org/linux-xfs/20221228133204.4021519-1-guoxuenan@huawei....
Fixes: 53f85096f93e ("xfs: account extra freespace btree splits for multiple allocations")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 41 ++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_mount.c        |  9 +++++++++
 2 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 433798f5d935..d4d7a99114c7 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -81,6 +81,25 @@ xfs_prealloc_blocks(
 	return XFS_IBT_BLOCK(mp) + 1;
 }
 
+/*
+ * Twice fixup for the same ag may happen within exact one tp, and the consume
+ * of agfl after first fixup may trigger second fixup's failure, then xfs will
+ * shutdown. To avoid that, we reserve blocks which can satisfy the second
+ * fixup.
+ */
+xfs_extlen_t
+xfs_ag_fixup_aside(
+	struct xfs_mount	*mp)
+{
+	xfs_extlen_t	ret;
+
+	ret = 2 * mp->m_ag_maxlevels;
+	if (xfs_has_rmapbt(mp))
+		ret += mp->m_rmap_maxlevels;
+
+	return ret;
+}
+
 /*
  * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
  * AGF buffer (PV 947395), we place constraints on the relationship among
@@ -95,12 +114,15 @@ xfs_prealloc_blocks(
  *
  * We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to handle a
  * potential split of the file's bmap btree.
+ *
+ * Besides, comment for xfs_ag_fixup_aside show why we reserve more blocks.
  */
 unsigned int
 xfs_alloc_set_aside(
 	struct xfs_mount	*mp)
 {
-	return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE + 4);
+	return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE +
+			4 + xfs_ag_fixup_aside(mp));
 }
 
 /*
@@ -133,6 +155,8 @@ xfs_alloc_ag_max_usable(
 	if (xfs_has_reflink(mp))
 		blocks++;		/* refcount root block */
 
+	blocks += xfs_ag_fixup_aside(mp);
+
 	return mp->m_sb.sb_agblocks - blocks;
 }
 
@@ -2591,6 +2615,7 @@ xfs_alloc_fix_freelist(
 	struct xfs_alloc_arg	targs;	/* local allocation arguments */
 	xfs_agblock_t		bno;	/* freelist block */
 	xfs_extlen_t		need;	/* total blocks needed in freelist */
+	xfs_extlen_t		minfree;
 	int			error = 0;
 
 	/* deferred ops (AGFL block frees) require permanent transactions */
@@ -2622,8 +2647,11 @@ xfs_alloc_fix_freelist(
 	 * blocks to perform multiple allocations from a single AG and
 	 * transaction if needed.
 	 */
-	need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs);
-	if (!xfs_alloc_space_available(args, need, alloc_flags |
+	minfree = need = xfs_alloc_min_freelist(mp, pag);
+	if (args->postallocs)
+		minfree += xfs_ag_fixup_aside(mp);
+
+	if (!xfs_alloc_space_available(args, minfree, alloc_flags |
 			XFS_ALLOC_FLAG_CHECK))
 		goto out_agbp_relse;
 
@@ -2646,8 +2674,11 @@ xfs_alloc_fix_freelist(
 		xfs_agfl_reset(tp, agbp, pag);
 
 	/* If there isn't enough total space or single-extent, reject it. */
-	need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs);
-	if (!xfs_alloc_space_available(args, need, alloc_flags))
+	minfree = need = xfs_alloc_min_freelist(mp, pag);
+	if (args->postallocs)
+		minfree += xfs_ag_fixup_aside(mp);
+
+	if (!xfs_alloc_space_available(args, minfree, alloc_flags))
 		goto out_agbp_relse;
 
 	/*
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3f7044611286..04b347fe1b59 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -778,6 +778,15 @@ xfs_mountfs(
 	xfs_rmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
 
+	/*
+	 * We now need m_ag_maxlevels/m_rmap_maxlevels to initialize
+	 * m_alloc_set_aside/m_ag_max_usable. And when we first do the
+	 * init in xfs_sb_mount_common, m_alloc_set_aside/m_ag_max_usable
+	 * still equals to 0. Redo it now.
+	 */
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
+
 	/*
 	 * Check if sb_agblocks is aligned at stripe boundary. If sb_agblocks
 	 * is NOT aligned turn off m_dalign since allocator alignment is within