From: "Darrick J. Wong" djwong@kernel.org
mainline-inclusion from mainline-v5.12-rc4 commit 1aec7c3d05670b92b7339b19999009a93808efb9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------------
In commit f8f2835a9cf3 we changed the behavior of XFS to use EFIs to remove blocks from an overfilled AGFL because there were complaints about transaction overruns that stemmed from trying to free multiple blocks in a single transaction.
Unfortunately, that commit missed a subtlety in the debug-mode transaction accounting when a realtime volume is attached. If a realtime file undergoes a data fork mapping change such that realtime extents are allocated (or freed) in the same transaction that a data device block is also allocated (or freed), we can trip a debugging assertion. This can happen (for example) if a realtime extent is allocated and it is necessary to reshape the bmbt to hold the new mapping.
When we go to allocate a bmbt block from an AG, the first thing the data device block allocator does is ensure that the freelist is the proper length. If the freelist is too long, it will trim the freelist to the proper length.
In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to record the decrement in the AG free list count. Prior to f8f28 we would put the free block back in the free space btrees in the same transaction, which calls xfs_trans_agblocks_delta() to record the increment in the AG free block count. Since AGFL blocks are included in the global free block count (fdblocks), there is no corresponding fdblocks update, so the AGFL free satisfies the following condition in xfs_trans_apply_sb_deltas:
/* * Check that superblock mods match the mods made to AGF counters. */ ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) == (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta + tp->t_ag_btree_delta));
The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is the number blocks that were allocated.
After commit f8f28 we defer the block freeing to the next chained transaction, which means that the calls to xfs_trans_agflist_delta and xfs_trans_agblocks_delta occur in separate transactions. The (first) transaction that shortens the free list trips on the comparison, which has now become:
(X + 0) == ((X) + -1 + 0)
because we haven't freed the AGFL block yet; we've only logged an intention to free it. When the second transaction (the deferred free) commits, it will evaluate the expression as:
(0 + 0) == (1 + 0 + 0)
and trip over that in turn.
At this point, the astute reader may note that the two commits tagged by this patch have been in the kernel for a long time but haven't generated any bug reports. How is it that the author became aware of this bug?
This originally surfaced as an intermittent failure when I was testing realtime rmap, but a different bug report by Zorro Lang reveals the same assertion occuring on !lazysbcount filesystems.
The common factor to both reports (and why this problem wasn't previously reported) becomes apparent if we consider when xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():
if (tp->t_flags & XFS_TRANS_SB_DIRTY) xfs_trans_apply_sb_deltas(tp);
With a modern lazysbcount filesystem, transactions update only the percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence xfs_trans_apply_sb_deltas is rarely called.
However, updates to the count of free realtime extents are not part of lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or removing data fork mappings to realtime files; similarly, XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.
Dave mentioned in response to an earlier version of this patch:
"IIUC, what you are saying is that this debug code is simply not exercised in normal testing and hasn't been for the past decade? And it still won't be exercised on anything other than realtime device testing?
"...it was debugging code from 1994 that was largely turned into dead code when lazysbcounters were introduced in 2007. Hence I'm not sure it holds any value anymore."
This debugging code isn't especially helpful - you can modify the flcount on one AG and the freeblks of another AG, and it won't trigger. Add the fact that nobody noticed for a decade, and let's just get rid of it (and start testing realtime :P).
This bug was found by running generic/051 on either a V4 filesystem lacking lazysbcount; or a V5 filesystem with a realtime volume.
Cc: bfoster@redhat.com, zlang@redhat.com Fixes: f8f2835a9cf3 ("xfs: defer agfl block frees when dfops is available") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Lihong Kou koulihong@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 3 --- fs/xfs/libxfs/xfs_alloc_btree.c | 2 -- fs/xfs/libxfs/xfs_rmap_btree.c | 2 -- fs/xfs/xfs_fsops.c | 2 -- fs/xfs/xfs_trans.c | 7 ------- fs/xfs/xfs_trans.h | 15 --------------- 6 files changed, 31 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 15640015be9d..6989efd55b5b 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -718,7 +718,6 @@ xfs_alloc_update_counters( agbp->b_pag->pagf_freeblks += len; be32_add_cpu(&agf->agf_freeblks, len);
- xfs_trans_agblocks_delta(tp, len); if (unlikely(be32_to_cpu(agf->agf_freeblks) > be32_to_cpu(agf->agf_length))) { xfs_buf_mark_corrupt(agbp); @@ -2689,7 +2688,6 @@ xfs_alloc_get_freelist( pag = agbp->b_pag; ASSERT(!pag->pagf_agflreset); be32_add_cpu(&agf->agf_flcount, -1); - xfs_trans_agflist_delta(tp, -1); pag->pagf_flcount--;
logflags = XFS_AGF_FLFIRST | XFS_AGF_FLCOUNT; @@ -2796,7 +2794,6 @@ xfs_alloc_put_freelist( pag = agbp->b_pag; ASSERT(!pag->pagf_agflreset); be32_add_cpu(&agf->agf_flcount, 1); - xfs_trans_agflist_delta(tp, 1); pag->pagf_flcount++;
logflags = XFS_AGF_FLLAST | XFS_AGF_FLCOUNT; diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c index 8e01231b308e..dbe302d1cb8d 100644 --- a/fs/xfs/libxfs/xfs_alloc_btree.c +++ b/fs/xfs/libxfs/xfs_alloc_btree.c @@ -73,7 +73,6 @@ xfs_allocbt_alloc_block(
xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false);
- xfs_trans_agbtree_delta(cur->bc_tp, 1); new->s = cpu_to_be32(bno);
*stat = 1; @@ -97,7 +96,6 @@ xfs_allocbt_free_block(
xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, XFS_EXTENT_BUSY_SKIP_DISCARD); - xfs_trans_agbtree_delta(cur->bc_tp, -1); return 0; }
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c index beb81c84a937..9f5bcbd834c3 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.c +++ b/fs/xfs/libxfs/xfs_rmap_btree.c @@ -103,7 +103,6 @@ xfs_rmapbt_alloc_block( xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false);
- xfs_trans_agbtree_delta(cur->bc_tp, 1); new->s = cpu_to_be32(bno); be32_add_cpu(&agf->agf_rmap_blocks, 1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_RMAP_BLOCKS); @@ -136,7 +135,6 @@ xfs_rmapbt_free_block(
xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, XFS_EXTENT_BUSY_SKIP_DISCARD); - xfs_trans_agbtree_delta(cur->bc_tp, -1);
pag = cur->bc_ag.agbp->b_pag; xfs_ag_resv_free_extent(pag, XFS_AG_RESV_RMAPBT, NULL, 1); diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index ef1d5bb88b93..4a3a4f38531a 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -107,8 +107,6 @@ xfs_growfs_data_private( if (error) goto out_trans_cancel;
- xfs_trans_agblocks_delta(tp, id.nfree); - /* If there are new blocks in the old last AG, extend it. */ if (new) { error = xfs_ag_extend_space(mp, tp, &id, new); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index c94e71f741b6..7b9a394876bd 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -471,13 +471,6 @@ xfs_trans_apply_sb_deltas( bp = xfs_trans_getsb(tp); sbp = bp->b_addr;
- /* - * Check that superblock mods match the mods made to AGF counters. - */ - ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) == - (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta + - tp->t_ag_btree_delta)); - /* * Only update the superblock counters if we are logging them */ diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 084658946cc8..2f802d6c9fcf 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -140,11 +140,6 @@ typedef struct xfs_trans { int64_t t_res_fdblocks_delta; /* on-disk only chg */ int64_t t_frextents_delta;/* superblock freextents chg*/ int64_t t_res_frextents_delta; /* on-disk only chg */ -#if defined(DEBUG) || defined(XFS_WARN) - int64_t t_ag_freeblks_delta; /* debugging counter */ - int64_t t_ag_flist_delta; /* debugging counter */ - int64_t t_ag_btree_delta; /* debugging counter */ -#endif int64_t t_dblocks_delta;/* superblock dblocks change */ int64_t t_agcount_delta;/* superblock agcount change */ int64_t t_imaxpct_delta;/* superblock imaxpct change */ @@ -165,16 +160,6 @@ typedef struct xfs_trans { */ #define xfs_trans_set_sync(tp) ((tp)->t_flags |= XFS_TRANS_SYNC)
-#if defined(DEBUG) || defined(XFS_WARN) -#define xfs_trans_agblocks_delta(tp, d) ((tp)->t_ag_freeblks_delta += (int64_t)d) -#define xfs_trans_agflist_delta(tp, d) ((tp)->t_ag_flist_delta += (int64_t)d) -#define xfs_trans_agbtree_delta(tp, d) ((tp)->t_ag_btree_delta += (int64_t)d) -#else -#define xfs_trans_agblocks_delta(tp, d) -#define xfs_trans_agflist_delta(tp, d) -#define xfs_trans_agbtree_delta(tp, d) -#endif - /* * XFS transaction mechanism exported interfaces. */