Baokun Li (1): xfs: propagate the return value of xfs_log_force() to avoid soft lockup
Colin Ian King (2): xfs: remove redundant initializations of pointers drop_leaf and save_leaf xfs: remove redundant pointer lip
Darrick J. Wong (9): xfs: use setattr_copy to set vfs inode attributes xfs: remove kmem_zone typedef xfs: rename _zone variables to _cache xfs: compact deferred intent item structures xfs: create slab caches for frequently-used deferred items xfs: rename xfs_bmap_add_free to xfs_free_extent_later xfs: reduce the size of struct xfs_extent_free_item xfs: remove unused parameter from refcount code xfs: pass xfs_extent_free_item directly through the log intent code
Dave Chinner (19): xfs: don't assert fail on perag references on teardown xfs: set prealloc flag in xfs_alloc_file_space() xfs: validity check agbnos on the AGFL xfs: validate block number being freed before adding to xefi xfs: don't reverse order of items in bulk AIL insertion xfs: use deferred frees for btree block freeing xfs: pass alloc flags through to xfs_extent_busy_flush() xfs: allow extent free intents to be retried xfs: don't block in busy flushing when freeing extents xfs: journal geometry is not properly bounds checked xfs: AGF length has never been bounds checked xfs: fix bounds check in xfs_defer_agfl_block() xfs: block reservation too large for minleft allocation xfs: punching delalloc extents on write failure is racy xfs: use byte ranges for write cleanup ranges xfs,iomap: move delalloc punching to iomap iomap: buffered write failure should not truncate the page cache xfs: xfs_bmap_punch_delalloc_range() should take a byte range xfs: fix off-by-one-block in xfs_discard_folio()
Gaosheng Cui (1): xfs: remove xfs_setattr_time() declaration
Guo Xuenan (1): xfs: set minleft correctly for randomly sparse inode allocations
Jiapeng Chong (1): xfs: Remove redundant assignment to busy
Long Li (6): xfs: fix dir3 block read verify fail during log recover Revert "xfs: propagate the return value of xfs_log_force() to avoid soft lockup" xfs: xfs_trans_cancel() path must check for log shutdown xfs: don't verify agf length when log recovery xfs: shutdown to ensure submits buffers on LSN boundaries xfs: update the last_sync_lsn with ctx start lsn
yangerkun (4): xfs: keep growfs sb log item active until ail flush success xfs: fix xfs shutdown since we reserve more blocks in agfl fixup xfs: longest free extent no need consider postalloc xfs: shutdown xfs once inode double free
fs/xfs/kmem.h | 4 - fs/xfs/libxfs/xfs_alloc.c | 390 +++++++++++++++++++++-------- fs/xfs/libxfs/xfs_alloc.h | 51 +++- fs/xfs/libxfs/xfs_alloc_btree.c | 2 +- fs/xfs/libxfs/xfs_attr_leaf.c | 2 - fs/xfs/libxfs/xfs_bmap.c | 90 +++---- fs/xfs/libxfs/xfs_bmap.h | 37 +-- fs/xfs/libxfs/xfs_bmap_btree.c | 27 +- fs/xfs/libxfs/xfs_btree.c | 4 +- fs/xfs/libxfs/xfs_btree.h | 2 +- fs/xfs/libxfs/xfs_da_btree.c | 6 +- fs/xfs/libxfs/xfs_da_btree.h | 3 +- fs/xfs/libxfs/xfs_defer.c | 70 +++++- fs/xfs/libxfs/xfs_defer.h | 3 + fs/xfs/libxfs/xfs_ialloc.c | 32 ++- fs/xfs/libxfs/xfs_ialloc_btree.c | 8 +- fs/xfs/libxfs/xfs_inode_fork.c | 4 +- fs/xfs/libxfs/xfs_inode_fork.h | 2 +- fs/xfs/libxfs/xfs_refcount.c | 56 +++-- fs/xfs/libxfs/xfs_refcount.h | 7 +- fs/xfs/libxfs/xfs_refcount_btree.c | 11 +- fs/xfs/libxfs/xfs_rmap.c | 21 +- fs/xfs/libxfs/xfs_rmap.h | 7 +- fs/xfs/libxfs/xfs_rmap_btree.c | 2 +- fs/xfs/libxfs/xfs_sb.c | 56 ++++- fs/xfs/libxfs/xfs_types.c | 23 ++ fs/xfs/libxfs/xfs_types.h | 2 + fs/xfs/xfs_aops.c | 32 +-- fs/xfs/xfs_bmap_item.c | 16 +- fs/xfs/xfs_bmap_item.h | 6 +- fs/xfs/xfs_bmap_util.c | 19 +- fs/xfs/xfs_bmap_util.h | 2 +- fs/xfs/xfs_buf.c | 16 +- fs/xfs/xfs_buf_item.c | 10 +- fs/xfs/xfs_buf_item.h | 11 +- fs/xfs/xfs_buf_item_recover.c | 9 +- fs/xfs/xfs_dquot.c | 26 +- fs/xfs/xfs_extent_busy.c | 36 ++- fs/xfs/xfs_extent_busy.h | 6 +- fs/xfs/xfs_extfree_item.c | 137 +++++++--- fs/xfs/xfs_extfree_item.h | 6 +- fs/xfs/xfs_file.c | 8 - fs/xfs/xfs_icache.c | 8 +- fs/xfs/xfs_icreate_item.c | 6 +- fs/xfs/xfs_icreate_item.h | 2 +- fs/xfs/xfs_inode.c | 2 +- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_inode_item.c | 6 +- fs/xfs/xfs_inode_item.h | 2 +- fs/xfs/xfs_iomap.c | 292 ++++++++++++++++++--- fs/xfs/xfs_iops.c | 56 +---- fs/xfs/xfs_iops.h | 1 - fs/xfs/xfs_log.c | 72 +++--- fs/xfs/xfs_log_priv.h | 2 +- fs/xfs/xfs_log_recover.c | 6 +- fs/xfs/xfs_mount.c | 12 +- fs/xfs/xfs_mru_cache.c | 2 +- fs/xfs/xfs_pnfs.c | 3 +- fs/xfs/xfs_qm.h | 2 +- fs/xfs/xfs_refcount_item.c | 16 +- fs/xfs/xfs_refcount_item.h | 6 +- fs/xfs/xfs_reflink.c | 7 +- fs/xfs/xfs_rmap_item.c | 16 +- fs/xfs/xfs_rmap_item.h | 6 +- fs/xfs/xfs_super.c | 233 ++++++++--------- fs/xfs/xfs_trans.c | 24 +- fs/xfs/xfs_trans.h | 2 +- fs/xfs/xfs_trans_ail.c | 5 +- fs/xfs/xfs_trans_dquot.c | 4 +- mm/filemap.c | 1 + 70 files changed, 1358 insertions(+), 700 deletions(-)
From: yangerkun yangerkun@huawei.com
hulk inclusion category: bugfix bugzilla: 188870, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
Our growfs test trigger mount error show as below:
[ 40.100164] ------------[ cut here ]------------ [ 40.101808] WARNING: CPU: 5 PID: 769 at fs/xfs/xfs_buf.c:615 xfs_buf_find+0x40d/0x510 ... [ 40.127061] Call Trace: [ 40.127502] xfs_buf_get_map+0x45/0x240 [ 40.128205] xfs_buf_read_map+0x54/0x230 [ 40.129854] xlog_recover_buf_commit_pass2+0x14e/0x490 [ 40.131743] xlog_recover_items_pass2+0x4d/0xa0 [ 40.132605] xlog_recover_commit_trans+0x325/0x350 [ 40.133496] xlog_recovery_process_trans+0xa7/0xe0 [ 40.134408] xlog_recover_process_data+0x8e/0x130 [ 40.135288] xlog_do_recovery_pass+0x3a4/0x730 [ 40.136874] xlog_do_log_recovery+0x62/0xb0 [ 40.137672] xlog_do_recover+0x34/0x1b0 [ 40.138392] xlog_recover+0xd9/0x170 [ 40.139072] xfs_log_mount+0x17f/0x2e0 [ 40.139790] xfs_mountfs+0x3de/0x8a0 [ 40.140473] xfs_fc_fill_super+0x485/0x7f0 [ 40.142095] get_tree_bdev+0x169/0x260 [ 40.142771] vfs_get_tree+0x1f/0xb0 [ 40.143438] do_new_mount+0x15e/0x2d0 [ 40.144141] __x64_sys_mount+0x101/0x140 [ 40.144900] do_syscall_64+0x2d/0x40 [ 40.145591] entry_SYSCALL_64_after_hwframe+0x61/0xc6 ... [ 40.158456] ---[ end trace 0fa38b12a77950ba ]--- [ 40.159354] XFS (loop0): log mount/recovery failed: error -117 [ 40.160956] XFS (loop0): log mount failed
---------------------------> time line +-------------------+----------------+------------------+ | growfs sb item ...|new agi item ...|growfs sb item ...| SHUTDOWN +-------------------+----------------+------------------+ CTX1 CTX2 CTX3
The testcase do multi growfs and then fail the IO which shutdown the xfs. Like the upper order, CTX1 add a new ag, CTX2 may log the agi, CTX3 will do another gorwfs. The sb item may still exist in ail after CTX3 iclog bio success, then the item lsn will change, which may change the tail_lsn.
Then, when we mount the img, and when we recover CTX2, the read for new agi will fail since the ag number still keep invalid.
Fix it by pin the sb lsn as CTX1, and then mount will first replay CTX1.
Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_buf_item.c | 2 +- fs/xfs/xfs_buf_item.h | 9 ++++++++- fs/xfs/xfs_trans.c | 14 +++++++++++--- 3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 3a9e006b7220..245d8d1899dd 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -715,7 +715,7 @@ xfs_buf_item_committed(
trace_xfs_buf_item_committed(bip);
- if ((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) && lip->li_lsn != 0) + if ((bip->bli_flags & XFS_BLI_KEEP_LSN) && lip->li_lsn != 0) return lip->li_lsn; return lsn; } diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index 50aa0f5ef959..d890fcef4543 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -17,6 +17,12 @@ #define XFS_BLI_STALE_INODE 0x20 #define XFS_BLI_INODE_BUF 0x40 #define XFS_BLI_ORDERED 0x80 +#define XFS_BLI_GROW_SB_BUF 0x100 + +#define XFS_BLI_KEEP_LSN \ + (XFS_BLI_INODE_ALLOC_BUF | \ + XFS_BLI_GROW_SB_BUF) +
#define XFS_BLI_FLAGS \ { XFS_BLI_HOLD, "HOLD" }, \ @@ -26,7 +32,8 @@ { XFS_BLI_INODE_ALLOC_BUF, "INODE_ALLOC" }, \ { XFS_BLI_STALE_INODE, "STALE_INODE" }, \ { XFS_BLI_INODE_BUF, "INODE_BUF" }, \ - { XFS_BLI_ORDERED, "ORDERED" } + { XFS_BLI_ORDERED, "ORDERED" }, \ + { XFS_BLI_GROW_SB_BUF, "GROW_SB" }
struct xfs_buf; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 8b6617833c58..12b0163d321c 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -24,6 +24,7 @@ #include "xfs_dquot_item.h" #include "xfs_dquot.h" #include "xfs_icache.h" +#include "xfs_buf_item.h"
kmem_zone_t *xfs_trans_zone;
@@ -478,11 +479,14 @@ STATIC void xfs_trans_apply_sb_deltas( xfs_trans_t *tp) { - xfs_dsb_t *sbp; - xfs_buf_t *bp; - int whole = 0; + xfs_dsb_t *sbp; + xfs_buf_t *bp; + struct xfs_buf_log_item *bip; + int whole = 0; + int grow = 0;
bp = xfs_trans_getsb(tp); + bip = bp->b_log_item; sbp = bp->b_addr;
/* @@ -507,10 +511,12 @@ xfs_trans_apply_sb_deltas( if (tp->t_dblocks_delta) { be64_add_cpu(&sbp->sb_dblocks, tp->t_dblocks_delta); whole = 1; + grow = 1; } if (tp->t_agcount_delta) { be32_add_cpu(&sbp->sb_agcount, tp->t_agcount_delta); whole = 1; + grow = 1; } if (tp->t_imaxpct_delta) { sbp->sb_imax_pct += tp->t_imaxpct_delta; @@ -538,6 +544,8 @@ xfs_trans_apply_sb_deltas( }
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); + if (grow) + bip->bli_flags |= XFS_BLI_GROW_SB_BUF; if (whole) /* * Log the whole thing, the fields are noncontiguous.
From: Baokun Li libaokun1@huawei.com
hulk inclusion category: bugfix bugzilla: 188871, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
xfs_alloc_ag_vextent_near() may keep restarting after xfs shutdown, triggering soft lockup, because although xfs_log_force() detects that the current filesystem has been shut down and returns an error, but xfs_extent_busy _flush() is a void function, so xfs_alloc_ag_vextent_near() does not sense that the current filesystem has been shut down. So we avoid this problem by propagating the return value of xfs_log_force() so that xfs_alloc_ag_vextent_near() does not restart and retry when the file system is shut down, and exits directly.
Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 22 ++++++++++++++++------ fs/xfs/xfs_extent_busy.c | 6 ++++-- fs/xfs/xfs_extent_busy.h | 2 +- 3 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 60aab422e818..c90324354e3e 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1629,8 +1629,11 @@ xfs_alloc_ag_vextent_near( if (!acur.len) { if (acur.busy) { trace_xfs_alloc_near_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, + error = xfs_extent_busy_flush(args->mp, args->pag, acur.busy_gen); + if (error) + goto out; + goto restart; } trace_xfs_alloc_size_neither(args); @@ -1733,11 +1736,14 @@ xfs_alloc_ag_vextent_size( * Make it unbusy by forcing the log out and * retrying. */ - xfs_btree_del_cursor(cnt_cur, - XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, + error = xfs_extent_busy_flush(args->mp, args->pag, busy_gen); + if (error) + goto error0; + + xfs_btree_del_cursor(cnt_cur, + XFS_BTREE_NOERROR); goto restart; } } @@ -1819,9 +1825,13 @@ xfs_alloc_ag_vextent_size( args->len = rlen; if (rlen < args->minlen) { if (busy) { - xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, busy_gen); + error = xfs_extent_busy_flush(args->mp, args->pag, + busy_gen); + if (error) + goto error0; + + xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); goto restart; } goto out_nominleft; diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index 26680444969c..ea3cee00149a 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -579,7 +579,7 @@ xfs_extent_busy_clear( /* * Flush out all busy extents for this AG. */ -void +int xfs_extent_busy_flush( struct xfs_mount *mp, struct xfs_perag *pag, @@ -590,7 +590,7 @@ xfs_extent_busy_flush(
error = xfs_log_force(mp, XFS_LOG_SYNC); if (error) - return; + return error;
do { prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE); @@ -600,6 +600,8 @@ xfs_extent_busy_flush( } while (1);
finish_wait(&pag->pagb_wait, &wait); + + return 0; }
void diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index 8aea07100092..7099f4bb358c 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -50,7 +50,7 @@ bool xfs_extent_busy_trim(struct xfs_alloc_arg *args, xfs_agblock_t *bno, xfs_extlen_t *len, unsigned *busy_gen);
-void +int xfs_extent_busy_flush(struct xfs_mount *mp, struct xfs_perag *pag, unsigned busy_gen);
hulk inclusion category: bugfix bugzilla: 188996, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
Our growfs test trigger mount error show as below:
XFS (dm-3): Starting recovery (logdev: internal) XFS (dm-3): Internal error !ino_ok at line 201 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir_ino_validate+0x54/0xc0 [xfs] CPU: 0 PID: 3719345 Comm: mount Kdump: loaded Not tainted 5.10.0-136.12.0.86.h1036.kasan.eulerosv2r12.aarch64 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x3a4 show_stack+0x34/0x4c dump_stack+0x170/0x1dc xfs_corruption_error+0x104/0x11c [xfs] xfs_dir_ino_validate+0x8c/0xc0 [xfs] __xfs_dir3_data_check+0x5e4/0xb60 [xfs] xfs_dir3_block_verify+0x108/0x190 [xfs] xfs_dir3_block_read_verify+0x10c/0x184 [xfs] xlog_recover_buf_commit_pass2+0x388/0x8e0 [xfs] xlog_recover_items_pass2+0xc8/0x160 [xfs] xlog_recover_commit_trans+0x56c/0x58c [xfs] xlog_recovery_process_trans+0x174/0x180 [xfs] xlog_recover_process_ophdr+0x120/0x210 [xfs] xlog_recover_process_data+0xcc/0x1b4 [xfs] xlog_recover_process+0x124/0x25c [xfs] xlog_do_recovery_pass+0x534/0x864 [xfs] xlog_do_log_recovery+0x98/0xc4 [xfs] xlog_do_recover+0x64/0x2ec [xfs] xlog_recover+0x1c4/0x2f0 [xfs] xfs_log_mount+0x1b8/0x550 [xfs] xfs_mountfs+0x768/0xe40 [xfs] xfs_fc_fill_super+0xb54/0xeb0 [xfs] get_tree_bdev+0x240/0x3e0 xfs_fc_get_tree+0x30/0x40 [xfs] vfs_get_tree+0x5c/0x1a4 do_new_mount+0x1c8/0x220 path_mount+0x2a8/0x3f0 __arm64_sys_mount+0x1cc/0x220 el0_svc_common.constprop.0+0xc0/0x2c4 do_el0_svc+0xb4/0xec el0_svc+0x24/0x3c el0_sync_handler+0x160/0x164 el0_sync+0x160/0x180 XFS (dm-3): Corruption detected. Unmount and run xfs_repair XFS (dm-3): Invalid inode number 0xd00082 XFS (dm-3): Metadata corruption detected at __xfs_dir3_data_check+0xa08/0xb60 [xfs], xfs_dir3_block block 0x100070 XFS (dm-3): Unmount and run xfs_repair XFS (dm-3): First 128 bytes of corrupted metadata buffer: 00000000: 58 44 42 33 f3 c3 ae cf 00 00 00 00 00 10 00 70 XDB3...........p 00000010: 00 00 00 01 00 00 0c 3e 85 a6 68 6c 63 b3 42 30 .......>..hlc.B0 00000020: b0 6b d1 b5 9d eb 55 7d 00 00 00 00 00 10 00 90 .k....U}........ 00000030: 03 60 0b 78 01 40 00 40 00 b0 00 40 00 00 00 00 .`.x.@.@...@.... 00000040: 00 00 00 00 00 10 00 90 01 2e 02 00 00 00 00 40 ...............@ 00000050: 00 00 00 00 00 10 00 81 02 2e 2e 02 00 00 00 50 ...............P 00000060: 00 00 00 00 00 10 00 91 02 66 65 01 00 00 00 60 .........fe....` 00000070: 00 00 00 00 00 10 00 92 02 66 66 01 00 00 00 70 .........ff....p
Consider the following log format, dir3 block has 2 items in the log, and the inode number recorded ondisk in diri3 block exceeds the current file system boundary. When replaying log items, it will skipping replay of first dir3 buffer log item due to the log item LSN being behind the ondisk buffer, but verification is still required. Since the superblock hasn't been replayed yet, the inode number in dir3 block exceeds the file system boundary and causes log recovery to fail.
log record: +---------------+----------------+--------------+-------------------+ | dir3 buf item | growfs sb item | inode item | dir3 buf item .. | +---------------+----------------+--------------+-------------------+ lsn X lsn X+A lsn X+A+B lsn X+A+B+C
metadata block: +-----------+-----------+------------+-----------+-----------+ | sb block | ... | dir3 block | ... | inodes | +-----------+-----------+------------+-----------+-----------+ lsn < X lsn X+A+B+C
Remove buffer read verify during log recovry pass2, clear buffer's XBF_DONE flag, so it can be verified in the next buf read after log recover.
Fixes: d38a530e3710 ("xfs: verify buffer contents when we skip log replay") Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_buf_item_recover.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 9cb79a494d06..d214e0f9cc09 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -950,13 +950,10 @@ xlog_recover_buf_commit_pass2(
/* * We're skipping replay of this buffer log item due to the log - * item LSN being behind the ondisk buffer. Verify the buffer - * contents since we aren't going to run the write verifier. + * item LSN being behind the ondisk buffer. clear XBF_DONE flag + * of the buffer to prevent buffer from being used without verify. */ - if (bp->b_ops) { - bp->b_ops->verify_read(bp); - error = bp->b_error; - } + bp->b_flags &= ~XBF_DONE; goto out_release; }
From: Dave Chinner dchinner@redhat.com
stable inclusion from stable-v5.10.167 commit d6f223cfef322d92305a697b848852ba4c2caecc category: bugfix bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
--------------------------------
commit 5b55cbc2d72632e874e50d2e36bce608e55aaaea upstream.
[backport for 5.10.y, prior to perag refactoring in v5.14]
Not fatal, the assert is there to catch developer attention. I'm seeing this occasionally during recoveryloop testing after a shutdown, and I don't want this to stop an overnight recoveryloop run as it is currently doing.
Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a corruption report into the log and cause a test failure that way, but it won't stop the machine dead.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: fs/xfs/xfs_mount.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_mount.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 8346441985b0..3f7044611286 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -128,7 +128,6 @@ __xfs_free_perag( struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
ASSERT(!delayed_work_pending(&pag->pag_blockgc_work)); - ASSERT(atomic_read(&pag->pag_ref) == 0); kmem_free(pag); }
@@ -147,7 +146,7 @@ xfs_free_perag( pag = radix_tree_delete(&mp->m_perag_tree, agno); spin_unlock(&mp->m_perag_lock); ASSERT(pag); - ASSERT(atomic_read(&pag->pag_ref) == 0); + XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0); cancel_delayed_work_sync(&pag->pag_blockgc_work); xfs_iunlink_destroy(pag); xfs_buf_hash_destroy(pag);
From: Dave Chinner dchinner@redhat.com
stable inclusion from stable-v5.10.167 commit 8cf9400f8948781cc9175f94666421e77622d639 category: bugfix bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
--------------------------------
commit 0b02c8c0d75a738c98c35f02efb36217c170d78c upstream.
[backport for 5.10.y]
Now that we only call xfs_update_prealloc_flags() from xfs_file_fallocate() in the case where we need to set the preallocation flag, do this in xfs_alloc_file_space() where we already have the inode joined into a transaction and get rid of the call to xfs_update_prealloc_flags() from the fallocate code.
This also means that we now correctly avoid setting the XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as these inodes will never have preallocated extents.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: fs/xfs/xfs_bmap_util.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_bmap_util.c | 9 +++------ fs/xfs/xfs_file.c | 8 -------- 2 files changed, 3 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 85d3804ebbe7..a44416597880 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -798,9 +798,6 @@ xfs_alloc_file_space( rblocks = 0; }
- /* - * Allocate and setup the transaction. - */ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, dblocks, rblocks, false, &tp); if (error) @@ -817,9 +814,9 @@ xfs_alloc_file_space( if (error) goto error;
- /* - * Complete the transaction - */ + ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + error = xfs_trans_commit(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); if (error) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a6a59bd6c189..9f52365995c7 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -894,7 +894,6 @@ xfs_file_fallocate( struct inode *inode = file_inode(file); struct xfs_inode *ip = XFS_I(inode); long error; - enum xfs_prealloc_flags flags = 0; uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; loff_t new_size = 0; bool do_file_insert = false; @@ -992,8 +991,6 @@ xfs_file_fallocate( } do_file_insert = true; } else { - flags |= XFS_PREALLOC_SET; - if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > i_size_read(inode)) { new_size = offset + len; @@ -1044,11 +1041,6 @@ xfs_file_fallocate( if (error) goto out_unlock; } - - error = xfs_update_prealloc_flags(ip, XFS_PREALLOC_SET); - if (error) - goto out_unlock; - }
/* Change file size if needed */
From: "Darrick J. Wong" djwong@kernel.org
stable inclusion from stable-v5.10.167 commit f60b68c46444e4e9321a4313efd5ea0eddd77fb7 category: bugfix bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
--------------------------------
commit e014f37db1a2d109afa750042ac4d69cf3e3d88e upstream.
[remove userns argument of setattr_copy() for 5.10.y backport]
Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid revocation isn't consistent with btrfs[1] or ext4. Those two filesystems use the VFS function setattr_copy to convey certain attributes from struct iattr into the VFS inode structure.
Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to decide if it should clear setgid and setuid on a file attribute update. This is a second symptom of the problem that Filipe noticed.
XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode, xfs_setattr_nonsize, and xfs_setattr_time. Regrettably, setattr_copy is /not/ a simple copy function; it contains additional logic to clear the setgid bit when setting the mode, and XFS' version no longer matches.
The VFS implements its own setuid/setgid stripping logic, which establishes consistent behavior. It's a tad unfortunate that it's scattered across notify_change, should_remove_suid, and setattr_copy but XFS should really follow the Linux VFS. Adapt XFS to use the VFS functions and get rid of the old functions.
[1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y... [2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@...
Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Christian Brauner brauner@kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: fs/xfs/xfs_iops.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_iops.c | 56 +++-------------------------------------------- fs/xfs/xfs_pnfs.c | 3 ++- 2 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index cc478df14996..a527a544a684 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -573,37 +573,6 @@ xfs_vn_getattr( return 0; }
-static void -xfs_setattr_mode( - struct xfs_inode *ip, - struct iattr *iattr) -{ - struct inode *inode = VFS_I(ip); - umode_t mode = iattr->ia_mode; - - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); - - inode->i_mode &= S_IFMT; - inode->i_mode |= mode & ~S_IFMT; -} - -void -xfs_setattr_time( - struct xfs_inode *ip, - struct iattr *iattr) -{ - struct inode *inode = VFS_I(ip); - - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); - - if (iattr->ia_valid & ATTR_ATIME) - inode->i_atime = iattr->ia_atime; - if (iattr->ia_valid & ATTR_CTIME) - inode->i_ctime = iattr->ia_ctime; - if (iattr->ia_valid & ATTR_MTIME) - inode->i_mtime = iattr->ia_mtime; -} - static int xfs_vn_change_ok( struct dentry *dentry, @@ -701,16 +670,6 @@ xfs_setattr_nonsize( gid = (mask & ATTR_GID) ? iattr->ia_gid : igid; uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
- /* - * CAP_FSETID overrides the following restrictions: - * - * The set-user-ID and set-group-ID bits of a file will be - * cleared upon successful return from chown() - */ - if ((inode->i_mode & (S_ISUID|S_ISGID)) && - !capable(CAP_FSETID)) - inode->i_mode &= ~(S_ISUID|S_ISGID); - /* * Change the ownerships and register quota modifications * in the transaction. @@ -722,7 +681,6 @@ xfs_setattr_nonsize( olddquot1 = xfs_qm_vop_chown(tp, ip, &ip->i_udquot, udqp); } - inode->i_uid = uid; } if (!gid_eq(igid, gid)) { if (XFS_IS_GQUOTA_ON(mp)) { @@ -733,15 +691,10 @@ xfs_setattr_nonsize( olddquot2 = xfs_qm_vop_chown(tp, ip, &ip->i_gdquot, gdqp); } - inode->i_gid = gid; } }
- if (mask & ATTR_MODE) - xfs_setattr_mode(ip, iattr); - if (mask & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME)) - xfs_setattr_time(ip, iattr); - + setattr_copy(inode, iattr); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
XFS_STATS_INC(mp, xs_ig_attrchg); @@ -981,11 +934,8 @@ xfs_setattr_size( xfs_inode_clear_eofblocks_tag(ip); }
- if (iattr->ia_valid & ATTR_MODE) - xfs_setattr_mode(ip, iattr); - if (iattr->ia_valid & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME)) - xfs_setattr_time(ip, iattr); - + ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID))); + setattr_copy(inode, iattr); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
XFS_STATS_INC(mp, xs_ig_attrchg); diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c index 2876b1808e33..21839e35e098 100644 --- a/fs/xfs/xfs_pnfs.c +++ b/fs/xfs/xfs_pnfs.c @@ -287,7 +287,8 @@ xfs_fs_commit_blocks( xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- xfs_setattr_time(ip, iattr); + ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID))); + setattr_copy(inode, iattr); if (update_isize) { i_size_write(inode, iattr->ia_size); ip->i_d.di_size = iattr->ia_size;
From: Gaosheng Cui cuigaosheng1@huawei.com
stable inclusion from stable-v5.10.167 commit daa97e770e780b5e74d85afd236849359711a8d0 category: bugfix bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
--------------------------------
commit b0463b9dd7030a766133ad2f1571f97f204d7bdf upstream.
xfs_setattr_time() has been removed since commit e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes"), so remove it.
Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Reviewed-by: Carlos Maiolino cmaiolino@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_iops.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h index fb3604b68208..42aab2a39a95 100644 --- a/fs/xfs/xfs_iops.h +++ b/fs/xfs/xfs_iops.h @@ -18,7 +18,6 @@ extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size); */ #define XFS_ATTR_NOACL 0x01 /* Don't call posix_acl_chmod */
-extern void xfs_setattr_time(struct xfs_inode *ip, struct iattr *iattr); extern int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap, int flags); extern int xfs_vn_setattr_nonsize(struct dentry *dentry, struct iattr *vap);
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.15-rc4 commit e7720afad068a6729d9cd3aaa08212f2f5a7ceff category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Remove these typedefs by referencing kmem_cache directly.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Conflicts: fs/xfs/libxfs/xfs_alloc_btree.c fs/xfs/libxfs/xfs_bmap_btree.c fs/xfs/libxfs/xfs_btree.h fs/xfs/libxfs/xfs_ialloc_btree.c fs/xfs/libxfs/xfs_refcount_btree.c fs/xfs/libxfs/xfs_rmap_btree.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/kmem.h | 4 ---- fs/xfs/libxfs/xfs_alloc.c | 2 +- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_bmap.h | 2 +- fs/xfs/libxfs/xfs_btree.c | 2 +- fs/xfs/libxfs/xfs_btree.h | 2 +- fs/xfs/libxfs/xfs_da_btree.c | 2 +- fs/xfs/libxfs/xfs_da_btree.h | 2 +- fs/xfs/libxfs/xfs_inode_fork.c | 2 +- fs/xfs/libxfs/xfs_inode_fork.h | 2 +- fs/xfs/xfs_bmap_item.c | 4 ++-- fs/xfs/xfs_bmap_item.h | 6 +++--- fs/xfs/xfs_buf.c | 2 +- fs/xfs/xfs_buf_item.c | 2 +- fs/xfs/xfs_buf_item.h | 2 +- fs/xfs/xfs_dquot.c | 4 ++-- fs/xfs/xfs_extfree_item.c | 4 ++-- fs/xfs/xfs_extfree_item.h | 6 +++--- fs/xfs/xfs_icreate_item.c | 2 +- fs/xfs/xfs_icreate_item.h | 2 +- fs/xfs/xfs_inode.c | 2 +- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_inode_item.c | 2 +- fs/xfs/xfs_inode_item.h | 2 +- fs/xfs/xfs_log.c | 2 +- fs/xfs/xfs_log_priv.h | 2 +- fs/xfs/xfs_qm.h | 2 +- fs/xfs/xfs_refcount_item.c | 4 ++-- fs/xfs/xfs_refcount_item.h | 6 +++--- fs/xfs/xfs_rmap_item.c | 4 ++-- fs/xfs/xfs_rmap_item.h | 6 +++--- fs/xfs/xfs_trans.c | 2 +- fs/xfs/xfs_trans.h | 2 +- 33 files changed, 45 insertions(+), 49 deletions(-)
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h index 54da6d717a06..b987dc2c6851 100644 --- a/fs/xfs/kmem.h +++ b/fs/xfs/kmem.h @@ -72,10 +72,6 @@ kmem_zalloc(size_t size, xfs_km_flags_t flags) /* * Zone interfaces */ - -#define kmem_zone kmem_cache -#define kmem_zone_t struct kmem_cache - static inline struct page * kmem_to_page(void *addr) { diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index c90324354e3e..5a09e343c753 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -27,7 +27,7 @@ #include "xfs_ag_resv.h" #include "xfs_bmap.h"
-extern kmem_zone_t *xfs_bmap_free_item_zone; +extern struct kmem_cache *xfs_bmap_free_item_zone;
struct workqueue_struct *xfs_alloc_wq;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 34fe4aed0ba8..1cfd57809a35 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -37,7 +37,7 @@ #include "xfs_iomap.h"
-kmem_zone_t *xfs_bmap_free_item_zone; +struct kmem_cache *xfs_bmap_free_item_zone;
/* * Miscellaneous helper functions diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 6747e97a7949..522d384a1fa7 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -13,7 +13,7 @@ struct xfs_inode; struct xfs_mount; struct xfs_trans;
-extern kmem_zone_t *xfs_bmap_free_item_zone; +extern struct kmem_cache *xfs_bmap_free_item_zone;
/* * Argument structure for xfs_bmap_alloc. diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 145ee148a6e0..b9bf326a6f59 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -25,7 +25,7 @@ /* * Cursor allocation zone. */ -kmem_zone_t *xfs_btree_cur_zone; +struct kmem_cache *xfs_btree_cur_zone;
/* * Btree magic numbers. diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index ba11d2a4b686..08e53dbc8963 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -12,7 +12,7 @@ struct xfs_mount; struct xfs_trans; struct xfs_ifork;
-extern kmem_zone_t *xfs_btree_cur_zone; +extern struct kmem_cache *xfs_btree_cur_zone;
/* * Generic key, ptr and record wrapper structures. diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c index 10e93c9ce827..b7e7c24f7092 100644 --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -72,7 +72,7 @@ STATIC int xfs_da3_blk_unlink(xfs_da_state_t *state, xfs_da_state_blk_t *save_blk);
-kmem_zone_t *xfs_da_state_zone; /* anchor for state struct zone */ +struct kmem_cache *xfs_da_state_zone; /* anchor for state struct zone */
/* * Allocate a dir-state structure. diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h index adb3c4419051..cf9c5c8d6626 100644 --- a/fs/xfs/libxfs/xfs_da_btree.h +++ b/fs/xfs/libxfs/xfs_da_btree.h @@ -238,6 +238,6 @@ void xfs_da3_node_hdr_from_disk(struct xfs_mount *mp, void xfs_da3_node_hdr_to_disk(struct xfs_mount *mp, struct xfs_da_intnode *to, struct xfs_da3_icnode_hdr *from);
-extern struct kmem_zone *xfs_da_state_zone; +extern struct kmem_cache *xfs_da_state_zone;
#endif /* __XFS_DA_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 11000cb8129a..d083654e3197 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -25,7 +25,7 @@ #include "xfs_attr_leaf.h" #include "xfs_types.h"
-kmem_zone_t *xfs_ifork_zone; +struct kmem_cache *xfs_ifork_zone;
void xfs_init_local_fork( diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 64465472f0f0..910c895ed5a2 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -211,7 +211,7 @@ static inline bool xfs_iext_peek_prev_extent(struct xfs_ifork *ifp, xfs_iext_get_extent((ifp), (ext), (got)); \ xfs_iext_next((ifp), (ext)))
-extern struct kmem_zone *xfs_ifork_zone; +extern struct kmem_cache *xfs_ifork_zone;
extern void xfs_ifork_init_cow(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index fb26c6123a21..201233a5e19e 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -26,8 +26,8 @@ #include "xfs_log_recover.h" #include "xfs_quota.h"
-kmem_zone_t *xfs_bui_zone; -kmem_zone_t *xfs_bud_zone; +struct kmem_cache *xfs_bui_zone; +struct kmem_cache *xfs_bud_zone;
static const struct xfs_item_ops xfs_bui_item_ops;
diff --git a/fs/xfs/xfs_bmap_item.h b/fs/xfs/xfs_bmap_item.h index b9be62f8bd52..6af6b02d4b66 100644 --- a/fs/xfs/xfs_bmap_item.h +++ b/fs/xfs/xfs_bmap_item.h @@ -25,7 +25,7 @@ /* kernel only BUI/BUD definitions */
struct xfs_mount; -struct kmem_zone; +struct kmem_cache;
/* * Max number of extents in fast allocation path. @@ -65,7 +65,7 @@ struct xfs_bud_log_item { struct xfs_bud_log_format bud_format; };
-extern struct kmem_zone *xfs_bui_zone; -extern struct kmem_zone *xfs_bud_zone; +extern struct kmem_cache *xfs_bui_zone; +extern struct kmem_cache *xfs_bud_zone;
#endif /* __XFS_BMAP_ITEM_H__ */ diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 0d3d057c4af4..798a8ce0cc76 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -21,7 +21,7 @@ #include "xfs_errortag.h" #include "xfs_error.h"
-static kmem_zone_t *xfs_buf_zone; +static struct kmem_cache *xfs_buf_zone;
#define xb_to_gfp(flags) \ ((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 245d8d1899dd..8621b75a7a4f 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -24,7 +24,7 @@ #include "xfs_log_priv.h"
-kmem_zone_t *xfs_buf_item_zone; +struct kmem_cache *xfs_buf_item_zone;
static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip) { diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index d890fcef4543..7b3db6f5a9f0 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -78,6 +78,6 @@ static inline void xfs_buf_dquot_io_fail(struct xfs_buf *bp) void xfs_buf_iodone(struct xfs_buf *); bool xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
-extern kmem_zone_t *xfs_buf_item_zone; +extern struct kmem_cache *xfs_buf_item_zone;
#endif /* __XFS_BUF_ITEM_H__ */ diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index c0c62f37c396..a829a79231ec 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -38,8 +38,8 @@ * otherwise by the lowest id first, see xfs_dqlock2. */
-struct kmem_zone *xfs_qm_dqtrxzone; -static struct kmem_zone *xfs_qm_dqzone; +struct kmem_cache *xfs_qm_dqtrxzone; +static struct kmem_cache *xfs_qm_dqzone;
static struct lock_class_key xfs_dquot_group_class; static struct lock_class_key xfs_dquot_project_class; diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index b33276cb525f..c2430ac399f3 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -25,8 +25,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-kmem_zone_t *xfs_efi_zone; -kmem_zone_t *xfs_efd_zone; +struct kmem_cache *xfs_efi_zone; +struct kmem_cache *xfs_efd_zone;
static const struct xfs_item_ops xfs_efi_item_ops;
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h index cd2860c875bf..e8644945290e 100644 --- a/fs/xfs/xfs_extfree_item.h +++ b/fs/xfs/xfs_extfree_item.h @@ -9,7 +9,7 @@ /* kernel only EFI/EFD definitions */
struct xfs_mount; -struct kmem_zone; +struct kmem_cache;
/* * Max number of extents in fast allocation path. @@ -69,7 +69,7 @@ struct xfs_efd_log_item { */ #define XFS_EFD_MAX_FAST_EXTENTS 16
-extern struct kmem_zone *xfs_efi_zone; -extern struct kmem_zone *xfs_efd_zone; +extern struct kmem_cache *xfs_efi_zone; +extern struct kmem_cache *xfs_efd_zone;
#endif /* __XFS_EXTFREE_ITEM_H__ */ diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c index aa8c7c261d24..288c3af1b8e9 100644 --- a/fs/xfs/xfs_icreate_item.c +++ b/fs/xfs/xfs_icreate_item.c @@ -20,7 +20,7 @@ #include "xfs_ialloc.h" #include "xfs_trace.h"
-kmem_zone_t *xfs_icreate_zone; /* inode create item zone */ +struct kmem_cache *xfs_icreate_zone; /* inode create item zone */
static inline struct xfs_icreate_item *ICR_ITEM(struct xfs_log_item *lip) { diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h index a50d0b01e15a..944427b33645 100644 --- a/fs/xfs/xfs_icreate_item.h +++ b/fs/xfs/xfs_icreate_item.h @@ -12,7 +12,7 @@ struct xfs_icreate_item { struct xfs_icreate_log ic_format; };
-extern kmem_zone_t *xfs_icreate_zone; /* inode create item zone */ +extern struct kmem_cache *xfs_icreate_zone; /* inode create item zone */
void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno, xfs_agblock_t agbno, unsigned int count, diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7b777540a44c..9f7cafc6f737 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -37,7 +37,7 @@ #include "xfs_reflink.h" #include "xfs_log_priv.h"
-kmem_zone_t *xfs_inode_zone; +struct kmem_cache *xfs_inode_zone;
/* * Used in xfs_itruncate_extents(). This is the maximum number of extents diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index ad51b5707677..e320f1e0290f 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -546,7 +546,7 @@ static inline void xfs_setup_existing_inode(struct xfs_inode *ip)
void xfs_irele(struct xfs_inode *ip);
-extern struct kmem_zone *xfs_inode_zone; +extern struct kmem_cache *xfs_inode_zone;
/* The default CoW extent size hint. */ #define XFS_DEFAULT_COWEXTSZ_HINT 32 diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index fb1d482c7200..c3ffb9536393 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -22,7 +22,7 @@
#include <linux/iversion.h>
-kmem_zone_t *xfs_ili_zone; /* inode log item zone */ +struct kmem_cache *xfs_ili_zone; /* inode log item zone */
static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip) { diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h index 9c829cf5c839..35f2be490287 100644 --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -48,6 +48,6 @@ extern void xfs_iflush_shutdown_abort(struct xfs_inode *); extern int xfs_inode_item_format_convert(xfs_log_iovec_t *, struct xfs_inode_log_format *);
-extern struct kmem_zone *xfs_ili_zone; +extern struct kmem_cache *xfs_ili_zone;
#endif /* __XFS_INODE_ITEM_H__ */ diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 7220945cf816..8cd7bf27f965 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -21,7 +21,7 @@ #include "xfs_sb.h" #include "xfs_health.h"
-kmem_zone_t *xfs_log_ticket_zone; +struct kmem_cache *xfs_log_ticket_zone;
/* Local miscellaneous function prototypes */ STATIC struct xlog * diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index ecb9ec8a4d05..fe399acc0463 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -502,7 +502,7 @@ xlog_recover_cancel(struct xlog *); extern __le32 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead, char *dp, int size);
-extern kmem_zone_t *xfs_log_ticket_zone; +extern struct kmem_cache *xfs_log_ticket_zone; struct xlog_ticket * xlog_ticket_alloc( struct xlog *log, diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h index 442a0f97a9d4..5e8b70526538 100644 --- a/fs/xfs/xfs_qm.h +++ b/fs/xfs/xfs_qm.h @@ -11,7 +11,7 @@
struct xfs_inode;
-extern struct kmem_zone *xfs_qm_dqtrxzone; +extern struct kmem_cache *xfs_qm_dqtrxzone;
/* * Number of bmaps that we ask from bmapi when doing a quotacheck. diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 93535c110ca1..ccfcc91d616e 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -21,8 +21,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-kmem_zone_t *xfs_cui_zone; -kmem_zone_t *xfs_cud_zone; +struct kmem_cache *xfs_cui_zone; +struct kmem_cache *xfs_cud_zone;
static const struct xfs_item_ops xfs_cui_item_ops;
diff --git a/fs/xfs/xfs_refcount_item.h b/fs/xfs/xfs_refcount_item.h index f4f2e836540b..22c69c5a8394 100644 --- a/fs/xfs/xfs_refcount_item.h +++ b/fs/xfs/xfs_refcount_item.h @@ -25,7 +25,7 @@ /* kernel only CUI/CUD definitions */
struct xfs_mount; -struct kmem_zone; +struct kmem_cache;
/* * Max number of extents in fast allocation path. @@ -68,7 +68,7 @@ struct xfs_cud_log_item { struct xfs_cud_log_format cud_format; };
-extern struct kmem_zone *xfs_cui_zone; -extern struct kmem_zone *xfs_cud_zone; +extern struct kmem_cache *xfs_cui_zone; +extern struct kmem_cache *xfs_cud_zone;
#endif /* __XFS_REFCOUNT_ITEM_H__ */ diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 46ada19c3c26..597e55668cb5 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -21,8 +21,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-kmem_zone_t *xfs_rui_zone; -kmem_zone_t *xfs_rud_zone; +struct kmem_cache *xfs_rui_zone; +struct kmem_cache *xfs_rud_zone;
static const struct xfs_item_ops xfs_rui_item_ops;
diff --git a/fs/xfs/xfs_rmap_item.h b/fs/xfs/xfs_rmap_item.h index 31e6cdfff71f..b062b983a82f 100644 --- a/fs/xfs/xfs_rmap_item.h +++ b/fs/xfs/xfs_rmap_item.h @@ -28,7 +28,7 @@ /* kernel only RUI/RUD definitions */
struct xfs_mount; -struct kmem_zone; +struct kmem_cache;
/* * Max number of extents in fast allocation path. @@ -68,7 +68,7 @@ struct xfs_rud_log_item { struct xfs_rud_log_format rud_format; };
-extern struct kmem_zone *xfs_rui_zone; -extern struct kmem_zone *xfs_rud_zone; +extern struct kmem_cache *xfs_rui_zone; +extern struct kmem_cache *xfs_rud_zone;
#endif /* __XFS_RMAP_ITEM_H__ */ diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 12b0163d321c..fa5132bf9180 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -26,7 +26,7 @@ #include "xfs_icache.h" #include "xfs_buf_item.h"
-kmem_zone_t *xfs_trans_zone; +struct kmem_cache *xfs_trans_zone;
#if defined(CONFIG_TRACEPOINTS) static void diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 2aad408b1313..a9ea901d7449 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -244,7 +244,7 @@ void xfs_trans_buf_set_type(struct xfs_trans *, struct xfs_buf *, void xfs_trans_buf_copy_type(struct xfs_buf *dst_bp, struct xfs_buf *src_bp);
-extern kmem_zone_t *xfs_trans_zone; +extern struct kmem_cache *xfs_trans_zone;
static inline struct xfs_log_item * xfs_trans_item_relog(
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.15-rc4 commit 182696fb021fc196e5cbe641565ca40fcf0f885a category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Conflicts: fs/xfs/libxfs/xfs_attr_leaf.c fs/xfs/libxfs/xfs_inode_fork.c fs/xfs/libxfs/xfs_btree.h fs/xfs/libxfs/xfs_ialloc_btree.c fs/xfs/libxfs/xfs_alloc_btree.c fs/xfs/libxfs/xfs_rmap_btree.c fs/xfs/libxfs/xfs_refcount_btree.c fs/xfs/libxfs/xfs_bmap_btree.c fs/xfs/xfs_attr_inactive.c fs/xfs/xfs_bmap_item.c fs/xfs/xfs_icache.c fs/xfs/xfs_icreate_item.c fs/xfs/xfs_refcount_item.c fs/xfs/xfs_rmap_item.c fs/xfs/xfs_super.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 6 +- fs/xfs/libxfs/xfs_alloc_btree.c | 2 +- fs/xfs/libxfs/xfs_bmap.c | 6 +- fs/xfs/libxfs/xfs_bmap.h | 2 +- fs/xfs/libxfs/xfs_bmap_btree.c | 2 +- fs/xfs/libxfs/xfs_btree.c | 4 +- fs/xfs/libxfs/xfs_btree.h | 2 +- fs/xfs/libxfs/xfs_da_btree.c | 6 +- fs/xfs/libxfs/xfs_da_btree.h | 3 +- fs/xfs/libxfs/xfs_ialloc_btree.c | 2 +- fs/xfs/libxfs/xfs_inode_fork.c | 4 +- fs/xfs/libxfs/xfs_inode_fork.h | 2 +- fs/xfs/libxfs/xfs_refcount_btree.c | 2 +- fs/xfs/libxfs/xfs_rmap_btree.c | 2 +- fs/xfs/xfs_bmap_item.c | 12 +- fs/xfs/xfs_bmap_item.h | 4 +- fs/xfs/xfs_buf.c | 16 +- fs/xfs/xfs_buf_item.c | 8 +- fs/xfs/xfs_buf_item.h | 2 +- fs/xfs/xfs_dquot.c | 26 ++-- fs/xfs/xfs_extfree_item.c | 18 +-- fs/xfs/xfs_extfree_item.h | 4 +- fs/xfs/xfs_icache.c | 8 +- fs/xfs/xfs_icreate_item.c | 6 +- fs/xfs/xfs_icreate_item.h | 2 +- fs/xfs/xfs_inode.c | 2 +- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_inode_item.c | 6 +- fs/xfs/xfs_inode_item.h | 2 +- fs/xfs/xfs_log.c | 6 +- fs/xfs/xfs_log_priv.h | 2 +- fs/xfs/xfs_mru_cache.c | 2 +- fs/xfs/xfs_qm.h | 2 +- fs/xfs/xfs_refcount_item.c | 12 +- fs/xfs/xfs_refcount_item.h | 4 +- fs/xfs/xfs_rmap_item.c | 12 +- fs/xfs/xfs_rmap_item.h | 4 +- fs/xfs/xfs_super.c | 226 ++++++++++++++--------------- fs/xfs/xfs_trans.c | 8 +- fs/xfs/xfs_trans.h | 2 +- fs/xfs/xfs_trans_dquot.c | 4 +- 41 files changed, 223 insertions(+), 224 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 5a09e343c753..9d1097331b49 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -27,7 +27,7 @@ #include "xfs_ag_resv.h" #include "xfs_bmap.h"
-extern struct kmem_cache *xfs_bmap_free_item_zone; +extern struct kmem_cache *xfs_bmap_free_item_cache;
struct workqueue_struct *xfs_alloc_wq;
@@ -2468,10 +2468,10 @@ xfs_defer_agfl_block( struct xfs_mount *mp = tp->t_mountp; struct xfs_extent_free_item *new; /* new element */
- ASSERT(xfs_bmap_free_item_zone != NULL); + ASSERT(xfs_bmap_free_item_cache != NULL); ASSERT(oinfo != NULL);
- new = kmem_cache_alloc(xfs_bmap_free_item_zone, + new = kmem_cache_alloc(xfs_bmap_free_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); new->xefi_blockcount = 1; diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c index 4452b4255917..613a1743fdd5 100644 --- a/fs/xfs/libxfs/xfs_alloc_btree.c +++ b/fs/xfs/libxfs/xfs_alloc_btree.c @@ -480,7 +480,7 @@ xfs_allocbt_init_common(
ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
- cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL); + cur = kmem_cache_zalloc(xfs_btree_cur_cache, GFP_NOFS | __GFP_NOFAIL);
cur->bc_tp = tp; cur->bc_mp = mp; diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 1cfd57809a35..180c212c862b 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -37,7 +37,7 @@ #include "xfs_iomap.h"
-struct kmem_cache *xfs_bmap_free_item_zone; +struct kmem_cache *xfs_bmap_free_item_cache;
/* * Miscellaneous helper functions @@ -551,9 +551,9 @@ __xfs_bmap_add_free( ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif - ASSERT(xfs_bmap_free_item_zone != NULL); + ASSERT(xfs_bmap_free_item_cache != NULL);
- new = kmem_cache_alloc(xfs_bmap_free_item_zone, + new = kmem_cache_alloc(xfs_bmap_free_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = bno; new->xefi_blockcount = (xfs_extlen_t)len; diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 522d384a1fa7..05c5db07d9fa 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -13,7 +13,7 @@ struct xfs_inode; struct xfs_mount; struct xfs_trans;
-extern struct kmem_cache *xfs_bmap_free_item_zone; +extern struct kmem_cache *xfs_bmap_free_item_cache;
/* * Argument structure for xfs_bmap_alloc. diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 64a3cbcb22a2..65762ae9b40a 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -552,7 +552,7 @@ xfs_bmbt_init_cursor( struct xfs_btree_cur *cur; ASSERT(whichfork != XFS_COW_FORK);
- cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL); + cur = kmem_cache_zalloc(xfs_btree_cur_cache, GFP_NOFS | __GFP_NOFAIL);
cur->bc_tp = tp; cur->bc_mp = mp; diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index b9bf326a6f59..04ab18e209fb 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -25,7 +25,7 @@ /* * Cursor allocation zone. */ -struct kmem_cache *xfs_btree_cur_zone; +struct kmem_cache *xfs_btree_cur_cache;
/* * Btree magic numbers. @@ -454,7 +454,7 @@ xfs_btree_del_cursor( xfs_is_shutdown(cur->bc_mp) || error != 0); if (unlikely(cur->bc_flags & XFS_BTREE_STAGING)) kmem_free(cur->bc_ops); - kmem_cache_free(xfs_btree_cur_zone, cur); + kmem_cache_free(xfs_btree_cur_cache, cur); }
/* diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 08e53dbc8963..d172046ae833 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -12,7 +12,7 @@ struct xfs_mount; struct xfs_trans; struct xfs_ifork;
-extern struct kmem_cache *xfs_btree_cur_zone; +extern struct kmem_cache *xfs_btree_cur_cache;
/* * Generic key, ptr and record wrapper structures. diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c index b7e7c24f7092..f1258e294ead 100644 --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -72,7 +72,7 @@ STATIC int xfs_da3_blk_unlink(xfs_da_state_t *state, xfs_da_state_blk_t *save_blk);
-struct kmem_cache *xfs_da_state_zone; /* anchor for state struct zone */ +struct kmem_cache *xfs_da_state_cache; /* anchor for dir/attr state */
/* * Allocate a dir-state structure. @@ -84,7 +84,7 @@ xfs_da_state_alloc( { struct xfs_da_state *state;
- state = kmem_cache_zalloc(xfs_da_state_zone, GFP_NOFS | __GFP_NOFAIL); + state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL); state->args = args; state->mp = args->dp->i_mount; return state; @@ -113,7 +113,7 @@ xfs_da_state_free(xfs_da_state_t *state) #ifdef DEBUG memset((char *)state, 0, sizeof(*state)); #endif /* DEBUG */ - kmem_cache_free(xfs_da_state_zone, state); + kmem_cache_free(xfs_da_state_cache, state); }
static inline int xfs_dabuf_nfsb(struct xfs_mount *mp, int whichfork) diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h index cf9c5c8d6626..6902fdda184e 100644 --- a/fs/xfs/libxfs/xfs_da_btree.h +++ b/fs/xfs/libxfs/xfs_da_btree.h @@ -9,7 +9,6 @@
struct xfs_inode; struct xfs_trans; -struct zone;
/* * Directory/attribute geometry information. There will be one of these for each @@ -238,6 +237,6 @@ void xfs_da3_node_hdr_from_disk(struct xfs_mount *mp, void xfs_da3_node_hdr_to_disk(struct xfs_mount *mp, struct xfs_da_intnode *to, struct xfs_da3_icnode_hdr *from);
-extern struct kmem_cache *xfs_da_state_zone; +extern struct kmem_cache *xfs_da_state_cache;
#endif /* __XFS_DA_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index e554b58abc65..0733270062f6 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -432,7 +432,7 @@ xfs_inobt_init_common( { struct xfs_btree_cur *cur;
- cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL); + cur = kmem_cache_zalloc(xfs_btree_cur_cache, GFP_NOFS | __GFP_NOFAIL); cur->bc_tp = tp; cur->bc_mp = mp; cur->bc_btnum = btnum; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index d083654e3197..26a1696e1bce 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -25,7 +25,7 @@ #include "xfs_attr_leaf.h" #include "xfs_types.h"
-struct kmem_cache *xfs_ifork_zone; +struct kmem_cache *xfs_ifork_cache;
void xfs_init_local_fork( @@ -688,7 +688,7 @@ xfs_ifork_init_cow( if (ip->i_cowfp) return;
- ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_zone, + ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_cache, GFP_NOFS | __GFP_NOFAIL); ip->i_cowfp->if_flags = XFS_IFEXTENTS; ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS; diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 910c895ed5a2..fe2324c53855 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -211,7 +211,7 @@ static inline bool xfs_iext_peek_prev_extent(struct xfs_ifork *ifp, xfs_iext_get_extent((ifp), (ext), (got)); \ xfs_iext_next((ifp), (ext)))
-extern struct kmem_cache *xfs_ifork_zone; +extern struct kmem_cache *xfs_ifork_cache;
extern void xfs_ifork_init_cow(struct xfs_inode *ip);
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index 679f5d9763da..59b1fee4f680 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -323,7 +323,7 @@ xfs_refcountbt_init_common( ASSERT(agno != NULLAGNUMBER); ASSERT(agno < mp->m_sb.sb_agcount);
- cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL); + cur = kmem_cache_zalloc(xfs_btree_cur_cache, GFP_NOFS | __GFP_NOFAIL); cur->bc_tp = tp; cur->bc_mp = mp; cur->bc_btnum = XFS_BTNUM_REFC; diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c index 19cb2c11a3a3..97c519ede1d6 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.c +++ b/fs/xfs/libxfs/xfs_rmap_btree.c @@ -454,7 +454,7 @@ xfs_rmapbt_init_common( { struct xfs_btree_cur *cur;
- cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL); + cur = kmem_cache_zalloc(xfs_btree_cur_cache, GFP_NOFS | __GFP_NOFAIL); cur->bc_tp = tp; cur->bc_mp = mp; /* Overlapping btree; 2 keys per pointer. */ diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 201233a5e19e..ec3c3c43b4f9 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -26,8 +26,8 @@ #include "xfs_log_recover.h" #include "xfs_quota.h"
-struct kmem_cache *xfs_bui_zone; -struct kmem_cache *xfs_bud_zone; +struct kmem_cache *xfs_bui_cache; +struct kmem_cache *xfs_bud_cache;
static const struct xfs_item_ops xfs_bui_item_ops;
@@ -41,7 +41,7 @@ xfs_bui_item_free( struct xfs_bui_log_item *buip) { kmem_free(buip->bui_item.li_lv_shadow); - kmem_cache_free(xfs_bui_zone, buip); + kmem_cache_free(xfs_bui_cache, buip); }
/* @@ -140,7 +140,7 @@ xfs_bui_init( { struct xfs_bui_log_item *buip;
- buip = kmem_cache_zalloc(xfs_bui_zone, GFP_KERNEL | __GFP_NOFAIL); + buip = kmem_cache_zalloc(xfs_bui_cache, GFP_KERNEL | __GFP_NOFAIL);
xfs_log_item_init(mp, &buip->bui_item, XFS_LI_BUI, &xfs_bui_item_ops); buip->bui_format.bui_nextents = XFS_BUI_MAX_FAST_EXTENTS; @@ -201,7 +201,7 @@ xfs_bud_item_release(
xfs_bui_release(budp->bud_buip); kmem_free(budp->bud_item.li_lv_shadow); - kmem_cache_free(xfs_bud_zone, budp); + kmem_cache_free(xfs_bud_cache, budp); }
static const struct xfs_item_ops xfs_bud_item_ops = { @@ -218,7 +218,7 @@ xfs_trans_get_bud( { struct xfs_bud_log_item *budp;
- budp = kmem_cache_zalloc(xfs_bud_zone, GFP_KERNEL | __GFP_NOFAIL); + budp = kmem_cache_zalloc(xfs_bud_cache, GFP_KERNEL | __GFP_NOFAIL); xfs_log_item_init(tp->t_mountp, &budp->bud_item, XFS_LI_BUD, &xfs_bud_item_ops); budp->bud_buip = buip; diff --git a/fs/xfs/xfs_bmap_item.h b/fs/xfs/xfs_bmap_item.h index 6af6b02d4b66..3fafd3881a0b 100644 --- a/fs/xfs/xfs_bmap_item.h +++ b/fs/xfs/xfs_bmap_item.h @@ -65,7 +65,7 @@ struct xfs_bud_log_item { struct xfs_bud_log_format bud_format; };
-extern struct kmem_cache *xfs_bui_zone; -extern struct kmem_cache *xfs_bud_zone; +extern struct kmem_cache *xfs_bui_cache; +extern struct kmem_cache *xfs_bud_cache;
#endif /* __XFS_BMAP_ITEM_H__ */ diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 798a8ce0cc76..af5dc20c5e27 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -21,7 +21,7 @@ #include "xfs_errortag.h" #include "xfs_error.h"
-static struct kmem_cache *xfs_buf_zone; +static struct kmem_cache *xfs_buf_cache;
#define xb_to_gfp(flags) \ ((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN) @@ -224,7 +224,7 @@ _xfs_buf_alloc( int i;
*bpp = NULL; - bp = kmem_cache_zalloc(xfs_buf_zone, GFP_NOFS | __GFP_NOFAIL); + bp = kmem_cache_zalloc(xfs_buf_cache, GFP_NOFS | __GFP_NOFAIL);
/* * We don't want certain flags to appear in b_flags unless they are @@ -251,7 +251,7 @@ _xfs_buf_alloc( */ error = xfs_buf_get_maps(bp, nmaps); if (error) { - kmem_cache_free(xfs_buf_zone, bp); + kmem_cache_free(xfs_buf_cache, bp); return error; }
@@ -345,7 +345,7 @@ xfs_buf_free( kmem_free(bp->b_addr); _xfs_buf_free_pages(bp); xfs_buf_free_maps(bp); - kmem_cache_free(xfs_buf_zone, bp); + kmem_cache_free(xfs_buf_cache, bp); }
/* @@ -1005,7 +1005,7 @@ xfs_buf_get_uncached( _xfs_buf_free_pages(bp); fail_free_buf: xfs_buf_free_maps(bp); - kmem_cache_free(xfs_buf_zone, bp); + kmem_cache_free(xfs_buf_cache, bp); fail: return error; } @@ -2344,12 +2344,12 @@ xfs_buf_delwri_pushbuf( int __init xfs_buf_init(void) { - xfs_buf_zone = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0, + xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0, SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL); - if (!xfs_buf_zone) + if (!xfs_buf_cache) goto out;
return 0; @@ -2361,7 +2361,7 @@ xfs_buf_init(void) void xfs_buf_terminate(void) { - kmem_cache_destroy(xfs_buf_zone); + kmem_cache_destroy(xfs_buf_cache); }
void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 8621b75a7a4f..7826028a5861 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -24,7 +24,7 @@ #include "xfs_log_priv.h"
-struct kmem_cache *xfs_buf_item_zone; +struct kmem_cache *xfs_buf_item_cache;
static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip) { @@ -786,7 +786,7 @@ xfs_buf_item_init( return 0; }
- bip = kmem_cache_zalloc(xfs_buf_item_zone, GFP_KERNEL | __GFP_NOFAIL); + bip = kmem_cache_zalloc(xfs_buf_item_cache, GFP_KERNEL | __GFP_NOFAIL); xfs_log_item_init(mp, &bip->bli_item, XFS_LI_BUF, &xfs_buf_item_ops); bip->bli_buf = bp;
@@ -807,7 +807,7 @@ xfs_buf_item_init( map_size = DIV_ROUND_UP(chunks, NBWORD);
if (map_size > XFS_BLF_DATAMAP_SIZE) { - kmem_cache_free(xfs_buf_item_zone, bip); + kmem_cache_free(xfs_buf_item_cache, bip); xfs_err(mp, "buffer item dirty bitmap (%u uints) too small to reflect %u bytes!", map_size, @@ -984,7 +984,7 @@ xfs_buf_item_free( { xfs_buf_item_free_format(bip); kmem_free(bip->bli_item.li_lv_shadow); - kmem_cache_free(xfs_buf_item_zone, bip); + kmem_cache_free(xfs_buf_item_cache, bip); }
/* diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index 7b3db6f5a9f0..30f55e133696 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -78,6 +78,6 @@ static inline void xfs_buf_dquot_io_fail(struct xfs_buf *bp) void xfs_buf_iodone(struct xfs_buf *); bool xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
-extern struct kmem_cache *xfs_buf_item_zone; +extern struct kmem_cache *xfs_buf_item_cache;
#endif /* __XFS_BUF_ITEM_H__ */ diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index a829a79231ec..da8ab127d161 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -38,8 +38,8 @@ * otherwise by the lowest id first, see xfs_dqlock2. */
-struct kmem_cache *xfs_qm_dqtrxzone; -static struct kmem_cache *xfs_qm_dqzone; +struct kmem_cache *xfs_dqtrx_cache; +static struct kmem_cache *xfs_dquot_cache;
static struct lock_class_key xfs_dquot_group_class; static struct lock_class_key xfs_dquot_project_class; @@ -57,7 +57,7 @@ xfs_qm_dqdestroy( mutex_destroy(&dqp->q_qlock);
XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot); - kmem_cache_free(xfs_qm_dqzone, dqp); + kmem_cache_free(xfs_dquot_cache, dqp); }
/* @@ -458,7 +458,7 @@ xfs_dquot_alloc( { struct xfs_dquot *dqp;
- dqp = kmem_cache_zalloc(xfs_qm_dqzone, GFP_KERNEL | __GFP_NOFAIL); + dqp = kmem_cache_zalloc(xfs_dquot_cache, GFP_KERNEL | __GFP_NOFAIL);
dqp->q_type = type; dqp->q_id = id; @@ -1365,22 +1365,22 @@ xfs_dqlock2( int __init xfs_qm_init(void) { - xfs_qm_dqzone = kmem_cache_create("xfs_dquot", + xfs_dquot_cache = kmem_cache_create("xfs_dquot", sizeof(struct xfs_dquot), 0, 0, NULL); - if (!xfs_qm_dqzone) + if (!xfs_dquot_cache) goto out;
- xfs_qm_dqtrxzone = kmem_cache_create("xfs_dqtrx", + xfs_dqtrx_cache = kmem_cache_create("xfs_dqtrx", sizeof(struct xfs_dquot_acct), 0, 0, NULL); - if (!xfs_qm_dqtrxzone) - goto out_free_dqzone; + if (!xfs_dqtrx_cache) + goto out_free_dquot_cache;
return 0;
-out_free_dqzone: - kmem_cache_destroy(xfs_qm_dqzone); +out_free_dquot_cache: + kmem_cache_destroy(xfs_dquot_cache); out: return -ENOMEM; } @@ -1388,8 +1388,8 @@ xfs_qm_init(void) void xfs_qm_exit(void) { - kmem_cache_destroy(xfs_qm_dqtrxzone); - kmem_cache_destroy(xfs_qm_dqzone); + kmem_cache_destroy(xfs_dqtrx_cache); + kmem_cache_destroy(xfs_dquot_cache); }
/* diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index c2430ac399f3..4bcdc363ec12 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -25,8 +25,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-struct kmem_cache *xfs_efi_zone; -struct kmem_cache *xfs_efd_zone; +struct kmem_cache *xfs_efi_cache; +struct kmem_cache *xfs_efd_cache;
static const struct xfs_item_ops xfs_efi_item_ops;
@@ -43,7 +43,7 @@ xfs_efi_item_free( if (efip->efi_format.efi_nextents > XFS_EFI_MAX_FAST_EXTENTS) kmem_free(efip); else - kmem_cache_free(xfs_efi_zone, efip); + kmem_cache_free(xfs_efi_cache, efip); }
/* @@ -161,7 +161,7 @@ xfs_efi_init( (nextents * sizeof(xfs_extent_t))); efip = kmem_zalloc(size, 0); } else { - efip = kmem_cache_zalloc(xfs_efi_zone, + efip = kmem_cache_zalloc(xfs_efi_cache, GFP_KERNEL | __GFP_NOFAIL); }
@@ -246,7 +246,7 @@ xfs_efd_item_free(struct xfs_efd_log_item *efdp) if (efdp->efd_format.efd_nextents > XFS_EFD_MAX_FAST_EXTENTS) kmem_free(efdp); else - kmem_cache_free(xfs_efd_zone, efdp); + kmem_cache_free(xfs_efd_cache, efdp); }
/* @@ -338,7 +338,7 @@ xfs_trans_get_efd( nextents * sizeof(struct xfs_extent), 0); } else { - efdp = kmem_cache_zalloc(xfs_efd_zone, + efdp = kmem_cache_zalloc(xfs_efd_cache, GFP_KERNEL | __GFP_NOFAIL); }
@@ -487,7 +487,7 @@ xfs_extent_free_finish_item( free->xefi_startblock, free->xefi_blockcount, &free->xefi_oinfo, free->xefi_skip_discard); - kmem_cache_free(xfs_bmap_free_item_zone, free); + kmem_cache_free(xfs_bmap_free_item_cache, free); return error; }
@@ -507,7 +507,7 @@ xfs_extent_free_cancel_item( struct xfs_extent_free_item *free;
free = container_of(item, struct xfs_extent_free_item, xefi_list); - kmem_cache_free(xfs_bmap_free_item_zone, free); + kmem_cache_free(xfs_bmap_free_item_cache, free); }
const struct xfs_defer_op_type xfs_extent_free_defer_type = { @@ -569,7 +569,7 @@ xfs_agfl_free_finish_item( extp->ext_len = free->xefi_blockcount; efdp->efd_next_extent++;
- kmem_cache_free(xfs_bmap_free_item_zone, free); + kmem_cache_free(xfs_bmap_free_item_cache, free); return error; }
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h index e8644945290e..186d0f2137f1 100644 --- a/fs/xfs/xfs_extfree_item.h +++ b/fs/xfs/xfs_extfree_item.h @@ -69,7 +69,7 @@ struct xfs_efd_log_item { */ #define XFS_EFD_MAX_FAST_EXTENTS 16
-extern struct kmem_cache *xfs_efi_zone; -extern struct kmem_cache *xfs_efd_zone; +extern struct kmem_cache *xfs_efi_cache; +extern struct kmem_cache *xfs_efd_cache;
#endif /* __XFS_EXTFREE_ITEM_H__ */ diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 6847bbb5a735..bdf55ed88680 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -87,10 +87,10 @@ xfs_inode_alloc( * XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL * and return NULL here on ENOMEM. */ - ip = kmem_cache_alloc(xfs_inode_zone, GFP_KERNEL | __GFP_NOFAIL); + ip = kmem_cache_alloc(xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
if (inode_init_always(mp->m_super, VFS_I(ip))) { - kmem_cache_free(xfs_inode_zone, ip); + kmem_cache_free(xfs_inode_cache, ip); return NULL; }
@@ -141,7 +141,7 @@ xfs_inode_free_callback(
if (ip->i_cowfp) { xfs_idestroy_fork(ip->i_cowfp); - kmem_cache_free(xfs_ifork_zone, ip->i_cowfp); + kmem_cache_free(xfs_ifork_cache, ip->i_cowfp); } if (ip->i_itemp) { ASSERT(!test_bit(XFS_LI_IN_AIL, @@ -150,7 +150,7 @@ xfs_inode_free_callback( ip->i_itemp = NULL; }
- kmem_cache_free(xfs_inode_zone, ip); + kmem_cache_free(xfs_inode_cache, ip); }
static void diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c index 288c3af1b8e9..cd90b4f31495 100644 --- a/fs/xfs/xfs_icreate_item.c +++ b/fs/xfs/xfs_icreate_item.c @@ -20,7 +20,7 @@ #include "xfs_ialloc.h" #include "xfs_trace.h"
-struct kmem_cache *xfs_icreate_zone; /* inode create item zone */ +struct kmem_cache *xfs_icreate_cache; /* inode create item */
static inline struct xfs_icreate_item *ICR_ITEM(struct xfs_log_item *lip) { @@ -64,7 +64,7 @@ xfs_icreate_item_release( struct xfs_log_item *lip) { kmem_free(ICR_ITEM(lip)->ic_item.li_lv_shadow); - kmem_cache_free(xfs_icreate_zone, ICR_ITEM(lip)); + kmem_cache_free(xfs_icreate_cache, ICR_ITEM(lip)); }
static const struct xfs_item_ops xfs_icreate_item_ops = { @@ -98,7 +98,7 @@ xfs_icreate_log( { struct xfs_icreate_item *icp;
- icp = kmem_cache_zalloc(xfs_icreate_zone, GFP_KERNEL | __GFP_NOFAIL); + icp = kmem_cache_zalloc(xfs_icreate_cache, GFP_KERNEL | __GFP_NOFAIL);
xfs_log_item_init(tp->t_mountp, &icp->ic_item, XFS_LI_ICREATE, &xfs_icreate_item_ops); diff --git a/fs/xfs/xfs_icreate_item.h b/fs/xfs/xfs_icreate_item.h index 944427b33645..64992823108a 100644 --- a/fs/xfs/xfs_icreate_item.h +++ b/fs/xfs/xfs_icreate_item.h @@ -12,7 +12,7 @@ struct xfs_icreate_item { struct xfs_icreate_log ic_format; };
-extern struct kmem_cache *xfs_icreate_zone; /* inode create item zone */ +extern struct kmem_cache *xfs_icreate_cache; /* inode create item */
void xfs_icreate_log(struct xfs_trans *tp, xfs_agnumber_t agno, xfs_agblock_t agbno, unsigned int count, diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 9f7cafc6f737..268bbc2d978b 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -37,7 +37,7 @@ #include "xfs_reflink.h" #include "xfs_log_priv.h"
-struct kmem_cache *xfs_inode_zone; +struct kmem_cache *xfs_inode_cache;
/* * Used in xfs_itruncate_extents(). This is the maximum number of extents diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index e320f1e0290f..b552daae323f 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -546,7 +546,7 @@ static inline void xfs_setup_existing_inode(struct xfs_inode *ip)
void xfs_irele(struct xfs_inode *ip);
-extern struct kmem_cache *xfs_inode_zone; +extern struct kmem_cache *xfs_inode_cache;
/* The default CoW extent size hint. */ #define XFS_DEFAULT_COWEXTSZ_HINT 32 diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index c3ffb9536393..9f496cdcde3b 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -22,7 +22,7 @@
#include <linux/iversion.h>
-struct kmem_cache *xfs_ili_zone; /* inode log item zone */ +struct kmem_cache *xfs_ili_cache; /* inode log item */
static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip) { @@ -661,7 +661,7 @@ xfs_inode_item_init( struct xfs_inode_log_item *iip;
ASSERT(ip->i_itemp == NULL); - iip = ip->i_itemp = kmem_cache_zalloc(xfs_ili_zone, + iip = ip->i_itemp = kmem_cache_zalloc(xfs_ili_cache, GFP_KERNEL | __GFP_NOFAIL);
iip->ili_inode = ip; @@ -683,7 +683,7 @@ xfs_inode_item_destroy(
ip->i_itemp = NULL; kmem_free(iip->ili_item.li_lv_shadow); - kmem_cache_free(xfs_ili_zone, iip); + kmem_cache_free(xfs_ili_cache, iip); }
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h index 35f2be490287..bbd836a44ff0 100644 --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -48,6 +48,6 @@ extern void xfs_iflush_shutdown_abort(struct xfs_inode *); extern int xfs_inode_item_format_convert(xfs_log_iovec_t *, struct xfs_inode_log_format *);
-extern struct kmem_cache *xfs_ili_zone; +extern struct kmem_cache *xfs_ili_cache;
#endif /* __XFS_INODE_ITEM_H__ */ diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 8cd7bf27f965..c9bfafd0b9f5 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -21,7 +21,7 @@ #include "xfs_sb.h" #include "xfs_health.h"
-struct kmem_cache *xfs_log_ticket_zone; +struct kmem_cache *xfs_log_ticket_cache;
/* Local miscellaneous function prototypes */ STATIC struct xlog * @@ -3397,7 +3397,7 @@ xfs_log_ticket_put( { ASSERT(atomic_read(&ticket->t_ref) > 0); if (atomic_dec_and_test(&ticket->t_ref)) - kmem_cache_free(xfs_log_ticket_zone, ticket); + kmem_cache_free(xfs_log_ticket_cache, ticket); }
xlog_ticket_t * @@ -3521,7 +3521,7 @@ xlog_ticket_alloc( struct xlog_ticket *tic; int unit_res;
- tic = kmem_cache_zalloc(xfs_log_ticket_zone, GFP_NOFS | __GFP_NOFAIL); + tic = kmem_cache_zalloc(xfs_log_ticket_cache, GFP_NOFS | __GFP_NOFAIL);
unit_res = xlog_calc_unit_res(log, unit_bytes);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index fe399acc0463..ee004a779d36 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -502,7 +502,7 @@ xlog_recover_cancel(struct xlog *); extern __le32 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead, char *dp, int size);
-extern struct kmem_cache *xfs_log_ticket_zone; +extern struct kmem_cache *xfs_log_ticket_cache; struct xlog_ticket * xlog_ticket_alloc( struct xlog *log, diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c index 34c3b16f834f..f85e3b07ab44 100644 --- a/fs/xfs/xfs_mru_cache.c +++ b/fs/xfs/xfs_mru_cache.c @@ -219,7 +219,7 @@ _xfs_mru_cache_list_insert( * When destroying or reaping, all the elements that were migrated to the reap * list need to be deleted. For each element this involves removing it from the * data store, removing it from the reap list, calling the client's free - * function and deleting the element from the element zone. + * function and deleting the element from the element cache. * * We get called holding the mru->lock, which we drop and then reacquire. * Sparse need special help with this to tell it we know what we are doing. diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h index 5e8b70526538..5bb12717ea28 100644 --- a/fs/xfs/xfs_qm.h +++ b/fs/xfs/xfs_qm.h @@ -11,7 +11,7 @@
struct xfs_inode;
-extern struct kmem_cache *xfs_qm_dqtrxzone; +extern struct kmem_cache *xfs_dqtrx_cache;
/* * Number of bmaps that we ask from bmapi when doing a quotacheck. diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index ccfcc91d616e..09efa723ddfa 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -21,8 +21,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-struct kmem_cache *xfs_cui_zone; -struct kmem_cache *xfs_cud_zone; +struct kmem_cache *xfs_cui_cache; +struct kmem_cache *xfs_cud_cache;
static const struct xfs_item_ops xfs_cui_item_ops;
@@ -39,7 +39,7 @@ xfs_cui_item_free( if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS) kmem_free(cuip); else - kmem_cache_free(xfs_cui_zone, cuip); + kmem_cache_free(xfs_cui_cache, cuip); }
/* @@ -144,7 +144,7 @@ xfs_cui_init( cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents), 0); else - cuip = kmem_cache_zalloc(xfs_cui_zone, + cuip = kmem_cache_zalloc(xfs_cui_cache, GFP_KERNEL | __GFP_NOFAIL);
xfs_log_item_init(mp, &cuip->cui_item, XFS_LI_CUI, &xfs_cui_item_ops); @@ -206,7 +206,7 @@ xfs_cud_item_release(
xfs_cui_release(cudp->cud_cuip); kmem_free(cudp->cud_item.li_lv_shadow); - kmem_cache_free(xfs_cud_zone, cudp); + kmem_cache_free(xfs_cud_cache, cudp); }
static const struct xfs_item_ops xfs_cud_item_ops = { @@ -223,7 +223,7 @@ xfs_trans_get_cud( { struct xfs_cud_log_item *cudp;
- cudp = kmem_cache_zalloc(xfs_cud_zone, GFP_KERNEL | __GFP_NOFAIL); + cudp = kmem_cache_zalloc(xfs_cud_cache, GFP_KERNEL | __GFP_NOFAIL); xfs_log_item_init(tp->t_mountp, &cudp->cud_item, XFS_LI_CUD, &xfs_cud_item_ops); cudp->cud_cuip = cuip; diff --git a/fs/xfs/xfs_refcount_item.h b/fs/xfs/xfs_refcount_item.h index 22c69c5a8394..eb0ab13682d0 100644 --- a/fs/xfs/xfs_refcount_item.h +++ b/fs/xfs/xfs_refcount_item.h @@ -68,7 +68,7 @@ struct xfs_cud_log_item { struct xfs_cud_log_format cud_format; };
-extern struct kmem_cache *xfs_cui_zone; -extern struct kmem_cache *xfs_cud_zone; +extern struct kmem_cache *xfs_cui_cache; +extern struct kmem_cache *xfs_cud_cache;
#endif /* __XFS_REFCOUNT_ITEM_H__ */ diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 597e55668cb5..52e3f983821a 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -21,8 +21,8 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h"
-struct kmem_cache *xfs_rui_zone; -struct kmem_cache *xfs_rud_zone; +struct kmem_cache *xfs_rui_cache; +struct kmem_cache *xfs_rud_cache;
static const struct xfs_item_ops xfs_rui_item_ops;
@@ -39,7 +39,7 @@ xfs_rui_item_free( if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS) kmem_free(ruip); else - kmem_cache_free(xfs_rui_zone, ruip); + kmem_cache_free(xfs_rui_cache, ruip); }
/* @@ -142,7 +142,7 @@ xfs_rui_init( if (nextents > XFS_RUI_MAX_FAST_EXTENTS) ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0); else - ruip = kmem_cache_zalloc(xfs_rui_zone, + ruip = kmem_cache_zalloc(xfs_rui_cache, GFP_KERNEL | __GFP_NOFAIL);
xfs_log_item_init(mp, &ruip->rui_item, XFS_LI_RUI, &xfs_rui_item_ops); @@ -204,7 +204,7 @@ xfs_rud_item_release(
xfs_rui_release(rudp->rud_ruip); kmem_free(rudp->rud_item.li_lv_shadow); - kmem_cache_free(xfs_rud_zone, rudp); + kmem_cache_free(xfs_rud_cache, rudp); }
static const struct xfs_item_ops xfs_rud_item_ops = { @@ -221,7 +221,7 @@ xfs_trans_get_rud( { struct xfs_rud_log_item *rudp;
- rudp = kmem_cache_zalloc(xfs_rud_zone, GFP_KERNEL | __GFP_NOFAIL); + rudp = kmem_cache_zalloc(xfs_rud_cache, GFP_KERNEL | __GFP_NOFAIL); xfs_log_item_init(tp->t_mountp, &rudp->rud_item, XFS_LI_RUD, &xfs_rud_item_ops); rudp->rud_ruip = ruip; diff --git a/fs/xfs/xfs_rmap_item.h b/fs/xfs/xfs_rmap_item.h index b062b983a82f..802e5119eaca 100644 --- a/fs/xfs/xfs_rmap_item.h +++ b/fs/xfs/xfs_rmap_item.h @@ -68,7 +68,7 @@ struct xfs_rud_log_item { struct xfs_rud_log_format rud_format; };
-extern struct kmem_cache *xfs_rui_zone; -extern struct kmem_cache *xfs_rud_zone; +extern struct kmem_cache *xfs_rui_cache; +extern struct kmem_cache *xfs_rud_cache;
#endif /* __XFS_RMAP_ITEM_H__ */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e83af9027131..935cd48df9a2 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1989,196 +1989,196 @@ static struct file_system_type xfs_fs_type = { MODULE_ALIAS_FS("xfs");
STATIC int __init -xfs_init_zones(void) +xfs_init_caches(void) { - xfs_log_ticket_zone = kmem_cache_create("xfs_log_ticket", + xfs_log_ticket_cache = kmem_cache_create("xfs_log_ticket", sizeof(struct xlog_ticket), 0, 0, NULL); - if (!xfs_log_ticket_zone) + if (!xfs_log_ticket_cache) goto out;
- xfs_bmap_free_item_zone = kmem_cache_create("xfs_bmap_free_item", + xfs_bmap_free_item_cache = kmem_cache_create("xfs_bmap_free_item", sizeof(struct xfs_extent_free_item), 0, 0, NULL); - if (!xfs_bmap_free_item_zone) - goto out_destroy_log_ticket_zone; + if (!xfs_bmap_free_item_cache) + goto out_destroy_log_ticket_cache;
- xfs_btree_cur_zone = kmem_cache_create("xfs_btree_cur", + xfs_btree_cur_cache = kmem_cache_create("xfs_btree_cur", sizeof(struct xfs_btree_cur), 0, 0, NULL); - if (!xfs_btree_cur_zone) - goto out_destroy_bmap_free_item_zone; + if (!xfs_btree_cur_cache) + goto out_destroy_bmap_free_item_cache;
- xfs_da_state_zone = kmem_cache_create("xfs_da_state", + xfs_da_state_cache = kmem_cache_create("xfs_da_state", sizeof(struct xfs_da_state), 0, 0, NULL); - if (!xfs_da_state_zone) - goto out_destroy_btree_cur_zone; + if (!xfs_da_state_cache) + goto out_destroy_btree_cur_cache;
- xfs_ifork_zone = kmem_cache_create("xfs_ifork", + xfs_ifork_cache = kmem_cache_create("xfs_ifork", sizeof(struct xfs_ifork), 0, 0, NULL); - if (!xfs_ifork_zone) - goto out_destroy_da_state_zone; + if (!xfs_ifork_cache) + goto out_destroy_da_state_cache;
- xfs_trans_zone = kmem_cache_create("xfs_trans", + xfs_trans_cache = kmem_cache_create("xfs_trans", sizeof(struct xfs_trans), 0, 0, NULL); - if (!xfs_trans_zone) - goto out_destroy_ifork_zone; + if (!xfs_trans_cache) + goto out_destroy_ifork_cache;
/* - * The size of the zone allocated buf log item is the maximum + * The size of the cache-allocated buf log item is the maximum * size possible under XFS. This wastes a little bit of memory, * but it is much faster. */ - xfs_buf_item_zone = kmem_cache_create("xfs_buf_item", + xfs_buf_item_cache = kmem_cache_create("xfs_buf_item", sizeof(struct xfs_buf_log_item), 0, 0, NULL); - if (!xfs_buf_item_zone) - goto out_destroy_trans_zone; + if (!xfs_buf_item_cache) + goto out_destroy_trans_cache;
- xfs_efd_zone = kmem_cache_create("xfs_efd_item", + xfs_efd_cache = kmem_cache_create("xfs_efd_item", (sizeof(struct xfs_efd_log_item) + XFS_EFD_MAX_FAST_EXTENTS * sizeof(struct xfs_extent)), 0, 0, NULL); - if (!xfs_efd_zone) - goto out_destroy_buf_item_zone; + if (!xfs_efd_cache) + goto out_destroy_buf_item_cache;
- xfs_efi_zone = kmem_cache_create("xfs_efi_item", + xfs_efi_cache = kmem_cache_create("xfs_efi_item", (sizeof(struct xfs_efi_log_item) + XFS_EFI_MAX_FAST_EXTENTS * sizeof(struct xfs_extent)), 0, 0, NULL); - if (!xfs_efi_zone) - goto out_destroy_efd_zone; + if (!xfs_efi_cache) + goto out_destroy_efd_cache;
- xfs_inode_zone = kmem_cache_create("xfs_inode", + xfs_inode_cache = kmem_cache_create("xfs_inode", sizeof(struct xfs_inode), 0, (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT), xfs_fs_inode_init_once); - if (!xfs_inode_zone) - goto out_destroy_efi_zone; + if (!xfs_inode_cache) + goto out_destroy_efi_cache;
- xfs_ili_zone = kmem_cache_create("xfs_ili", + xfs_ili_cache = kmem_cache_create("xfs_ili", sizeof(struct xfs_inode_log_item), 0, SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL); - if (!xfs_ili_zone) - goto out_destroy_inode_zone; + if (!xfs_ili_cache) + goto out_destroy_inode_cache;
- xfs_icreate_zone = kmem_cache_create("xfs_icr", + xfs_icreate_cache = kmem_cache_create("xfs_icr", sizeof(struct xfs_icreate_item), 0, 0, NULL); - if (!xfs_icreate_zone) - goto out_destroy_ili_zone; + if (!xfs_icreate_cache) + goto out_destroy_ili_cache;
- xfs_rud_zone = kmem_cache_create("xfs_rud_item", + xfs_rud_cache = kmem_cache_create("xfs_rud_item", sizeof(struct xfs_rud_log_item), 0, 0, NULL); - if (!xfs_rud_zone) - goto out_destroy_icreate_zone; + if (!xfs_rud_cache) + goto out_destroy_icreate_cache;
- xfs_rui_zone = kmem_cache_create("xfs_rui_item", + xfs_rui_cache = kmem_cache_create("xfs_rui_item", xfs_rui_log_item_sizeof(XFS_RUI_MAX_FAST_EXTENTS), 0, 0, NULL); - if (!xfs_rui_zone) - goto out_destroy_rud_zone; + if (!xfs_rui_cache) + goto out_destroy_rud_cache;
- xfs_cud_zone = kmem_cache_create("xfs_cud_item", + xfs_cud_cache = kmem_cache_create("xfs_cud_item", sizeof(struct xfs_cud_log_item), 0, 0, NULL); - if (!xfs_cud_zone) - goto out_destroy_rui_zone; + if (!xfs_cud_cache) + goto out_destroy_rui_cache;
- xfs_cui_zone = kmem_cache_create("xfs_cui_item", + xfs_cui_cache = kmem_cache_create("xfs_cui_item", xfs_cui_log_item_sizeof(XFS_CUI_MAX_FAST_EXTENTS), 0, 0, NULL); - if (!xfs_cui_zone) - goto out_destroy_cud_zone; + if (!xfs_cui_cache) + goto out_destroy_cud_cache;
- xfs_bud_zone = kmem_cache_create("xfs_bud_item", + xfs_bud_cache = kmem_cache_create("xfs_bud_item", sizeof(struct xfs_bud_log_item), 0, 0, NULL); - if (!xfs_bud_zone) - goto out_destroy_cui_zone; + if (!xfs_bud_cache) + goto out_destroy_cui_cache;
- xfs_bui_zone = kmem_cache_create("xfs_bui_item", + xfs_bui_cache = kmem_cache_create("xfs_bui_item", xfs_bui_log_item_sizeof(XFS_BUI_MAX_FAST_EXTENTS), 0, 0, NULL); - if (!xfs_bui_zone) - goto out_destroy_bud_zone; + if (!xfs_bui_cache) + goto out_destroy_bud_cache;
return 0;
- out_destroy_bud_zone: - kmem_cache_destroy(xfs_bud_zone); - out_destroy_cui_zone: - kmem_cache_destroy(xfs_cui_zone); - out_destroy_cud_zone: - kmem_cache_destroy(xfs_cud_zone); - out_destroy_rui_zone: - kmem_cache_destroy(xfs_rui_zone); - out_destroy_rud_zone: - kmem_cache_destroy(xfs_rud_zone); - out_destroy_icreate_zone: - kmem_cache_destroy(xfs_icreate_zone); - out_destroy_ili_zone: - kmem_cache_destroy(xfs_ili_zone); - out_destroy_inode_zone: - kmem_cache_destroy(xfs_inode_zone); - out_destroy_efi_zone: - kmem_cache_destroy(xfs_efi_zone); - out_destroy_efd_zone: - kmem_cache_destroy(xfs_efd_zone); - out_destroy_buf_item_zone: - kmem_cache_destroy(xfs_buf_item_zone); - out_destroy_trans_zone: - kmem_cache_destroy(xfs_trans_zone); - out_destroy_ifork_zone: - kmem_cache_destroy(xfs_ifork_zone); - out_destroy_da_state_zone: - kmem_cache_destroy(xfs_da_state_zone); - out_destroy_btree_cur_zone: - kmem_cache_destroy(xfs_btree_cur_zone); - out_destroy_bmap_free_item_zone: - kmem_cache_destroy(xfs_bmap_free_item_zone); - out_destroy_log_ticket_zone: - kmem_cache_destroy(xfs_log_ticket_zone); + out_destroy_bud_cache: + kmem_cache_destroy(xfs_bud_cache); + out_destroy_cui_cache: + kmem_cache_destroy(xfs_cui_cache); + out_destroy_cud_cache: + kmem_cache_destroy(xfs_cud_cache); + out_destroy_rui_cache: + kmem_cache_destroy(xfs_rui_cache); + out_destroy_rud_cache: + kmem_cache_destroy(xfs_rud_cache); + out_destroy_icreate_cache: + kmem_cache_destroy(xfs_icreate_cache); + out_destroy_ili_cache: + kmem_cache_destroy(xfs_ili_cache); + out_destroy_inode_cache: + kmem_cache_destroy(xfs_inode_cache); + out_destroy_efi_cache: + kmem_cache_destroy(xfs_efi_cache); + out_destroy_efd_cache: + kmem_cache_destroy(xfs_efd_cache); + out_destroy_buf_item_cache: + kmem_cache_destroy(xfs_buf_item_cache); + out_destroy_trans_cache: + kmem_cache_destroy(xfs_trans_cache); + out_destroy_ifork_cache: + kmem_cache_destroy(xfs_ifork_cache); + out_destroy_da_state_cache: + kmem_cache_destroy(xfs_da_state_cache); + out_destroy_btree_cur_cache: + kmem_cache_destroy(xfs_btree_cur_cache); + out_destroy_bmap_free_item_cache: + kmem_cache_destroy(xfs_bmap_free_item_cache); + out_destroy_log_ticket_cache: + kmem_cache_destroy(xfs_log_ticket_cache); out: return -ENOMEM; }
STATIC void -xfs_destroy_zones(void) +xfs_destroy_caches(void) { /* * Make sure all delayed rcu free are flushed before we * destroy caches. */ rcu_barrier(); - kmem_cache_destroy(xfs_bui_zone); - kmem_cache_destroy(xfs_bud_zone); - kmem_cache_destroy(xfs_cui_zone); - kmem_cache_destroy(xfs_cud_zone); - kmem_cache_destroy(xfs_rui_zone); - kmem_cache_destroy(xfs_rud_zone); - kmem_cache_destroy(xfs_icreate_zone); - kmem_cache_destroy(xfs_ili_zone); - kmem_cache_destroy(xfs_inode_zone); - kmem_cache_destroy(xfs_efi_zone); - kmem_cache_destroy(xfs_efd_zone); - kmem_cache_destroy(xfs_buf_item_zone); - kmem_cache_destroy(xfs_trans_zone); - kmem_cache_destroy(xfs_ifork_zone); - kmem_cache_destroy(xfs_da_state_zone); - kmem_cache_destroy(xfs_btree_cur_zone); - kmem_cache_destroy(xfs_bmap_free_item_zone); - kmem_cache_destroy(xfs_log_ticket_zone); + kmem_cache_destroy(xfs_bui_cache); + kmem_cache_destroy(xfs_bud_cache); + kmem_cache_destroy(xfs_cui_cache); + kmem_cache_destroy(xfs_cud_cache); + kmem_cache_destroy(xfs_rui_cache); + kmem_cache_destroy(xfs_rud_cache); + kmem_cache_destroy(xfs_icreate_cache); + kmem_cache_destroy(xfs_ili_cache); + kmem_cache_destroy(xfs_inode_cache); + kmem_cache_destroy(xfs_efi_cache); + kmem_cache_destroy(xfs_efd_cache); + kmem_cache_destroy(xfs_buf_item_cache); + kmem_cache_destroy(xfs_trans_cache); + kmem_cache_destroy(xfs_ifork_cache); + kmem_cache_destroy(xfs_da_state_cache); + kmem_cache_destroy(xfs_btree_cur_cache); + kmem_cache_destroy(xfs_bmap_free_item_cache); + kmem_cache_destroy(xfs_log_ticket_cache); }
STATIC int __init @@ -2271,13 +2271,13 @@ init_xfs_fs(void) if (error) goto out;
- error = xfs_init_zones(); + error = xfs_init_caches(); if (error) goto out_destroy_hp;
error = xfs_init_workqueues(); if (error) - goto out_destroy_zones; + goto out_destroy_caches;
error = xfs_mru_cache_init(); if (error) @@ -2352,8 +2352,8 @@ init_xfs_fs(void) xfs_mru_cache_uninit(); out_destroy_wq: xfs_destroy_workqueues(); - out_destroy_zones: - xfs_destroy_zones(); + out_destroy_caches: + xfs_destroy_caches(); out_destroy_hp: xfs_cpu_hotplug_destroy(); out: @@ -2376,7 +2376,7 @@ exit_xfs_fs(void) xfs_buf_terminate(); xfs_mru_cache_uninit(); xfs_destroy_workqueues(); - xfs_destroy_zones(); + xfs_destroy_caches(); xfs_uuid_table_free(); xfs_cpu_hotplug_destroy(); } diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index fa5132bf9180..36a894a2128a 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -26,7 +26,7 @@ #include "xfs_icache.h" #include "xfs_buf_item.h"
-struct kmem_cache *xfs_trans_zone; +struct kmem_cache *xfs_trans_cache;
#if defined(CONFIG_TRACEPOINTS) static void @@ -77,7 +77,7 @@ xfs_trans_free( if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT)) sb_end_intwrite(tp->t_mountp->m_super); xfs_trans_free_dqinfo(tp); - kmem_cache_free(xfs_trans_zone, tp); + kmem_cache_free(xfs_trans_cache, tp); }
/* @@ -96,7 +96,7 @@ xfs_trans_dup(
trace_xfs_trans_dup(tp, _RET_IP_);
- ntp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL); + ntp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
/* * Initialize the new transaction structure. @@ -264,7 +264,7 @@ xfs_trans_alloc( * by doing GFP_KERNEL allocations inside sb_start_intwrite(). */ retry: - tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL); + tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL); if (!(flags & XFS_TRANS_NO_WRITECOUNT)) sb_start_intwrite(mp->m_super); xfs_trans_set_context(tp); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index a9ea901d7449..731af3f04c83 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -244,7 +244,7 @@ void xfs_trans_buf_set_type(struct xfs_trans *, struct xfs_buf *, void xfs_trans_buf_copy_type(struct xfs_buf *dst_bp, struct xfs_buf *src_bp);
-extern struct kmem_cache *xfs_trans_zone; +extern struct kmem_cache *xfs_trans_cache;
static inline struct xfs_log_item * xfs_trans_item_relog( diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index d91a8ecf01ff..e29b77a71664 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -866,7 +866,7 @@ STATIC void xfs_trans_alloc_dqinfo( xfs_trans_t *tp) { - tp->t_dqinfo = kmem_cache_zalloc(xfs_qm_dqtrxzone, + tp->t_dqinfo = kmem_cache_zalloc(xfs_dqtrx_cache, GFP_KERNEL | __GFP_NOFAIL); }
@@ -876,6 +876,6 @@ xfs_trans_free_dqinfo( { if (!tp->t_dqinfo) return; - kmem_cache_free(xfs_qm_dqtrxzone, tp->t_dqinfo); + kmem_cache_free(xfs_dqtrx_cache, tp->t_dqinfo); tp->t_dqinfo = NULL; }
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.15-rc4
commit 9e253954acf53227f33d307f5ac5ff94c1ca5880
category: bugfix
bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Rearrange these structs to reduce the number of unused padding bytes. This saves eight bytes for each of the three structs changed here, so each is now an even power of two in size: the rmap and bmap intents are 64 bytes and the refcount intent is 32 bytes.
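To make the saving concrete, here is a minimal userspace sketch of the refcount intent layout before and after the reorder. It is illustrative only: it assumes an LP64 ABI and models the kernel's xfs_fsblock_t as uint64_t, xfs_extlen_t as uint32_t, and struct list_head as two pointers.

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's struct list_head: two pointers, 16 bytes on LP64. */
struct list_head { struct list_head *next, *prev; };

/*
 * Old field order: the 8-byte startblock forces 4 padding bytes after the
 * 4-byte type, and the trailing 4-byte blockcount forces 4 more bytes of
 * tail padding, for 40 bytes total.
 */
struct refc_intent_before {
	struct list_head	ri_list;
	int			ri_type;
	uint64_t		ri_startblock;
	uint32_t		ri_blockcount;
};

/*
 * New field order: the two 4-byte fields share one 8-byte slot, so both
 * padding holes disappear and the struct shrinks to 32 bytes.
 */
struct refc_intent_after {
	struct list_head	ri_list;
	int			ri_type;
	uint32_t		ri_blockcount;
	uint64_t		ri_startblock;
};

int main(void)
{
	printf("before: %zu bytes\n", sizeof(struct refc_intent_before)); /* 40 */
	printf("after:  %zu bytes\n", sizeof(struct refc_intent_after));  /* 32 */
	return 0;
}

The same trick pairs the whichfork field with the enum type field in the bmap and rmap intents below.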
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.h     | 2 +-
 fs/xfs/libxfs/xfs_refcount.h | 2 +-
 fs/xfs/libxfs/xfs_rmap.h     | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 05c5db07d9fa..aa6f45a7ee69 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -258,8 +258,8 @@ enum xfs_bmap_intent_type { struct xfs_bmap_intent { struct list_head bi_list; enum xfs_bmap_intent_type bi_type; - struct xfs_inode *bi_owner; int bi_whichfork; + struct xfs_inode *bi_owner; struct xfs_bmbt_irec bi_bmap; };
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 209795539c8d..d43e276b6883 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -25,8 +25,8 @@ enum xfs_refcount_intent_type { struct xfs_refcount_intent { struct list_head ri_list; enum xfs_refcount_intent_type ri_type; - xfs_fsblock_t ri_startblock; xfs_extlen_t ri_blockcount; + xfs_fsblock_t ri_startblock; };
void xfs_refcount_increase_extent(struct xfs_trans *tp, diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index abe633403fd1..8a4f106c9ff4 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -156,8 +156,8 @@ enum xfs_rmap_intent_type { struct xfs_rmap_intent { struct list_head ri_list; enum xfs_rmap_intent_type ri_type; - uint64_t ri_owner; int ri_whichfork; + uint64_t ri_owner; struct xfs_bmbt_irec ri_bmap; };
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.15-rc4
commit f3c799c22c661e181c71a0d9914fc923023f65fb
category: bugfix
bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Create slab caches for the high-level structures that coordinate deferred intent items, since they're used fairly heavily.
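For reference, each intent type now follows the same slab lifecycle. Below is a minimal kernel-style sketch of that pattern, with a hypothetical foo_intent standing in for the real rmap/refcount/bmap intents; the foo_* helper names are invented for illustration, but kmem_cache_create/kmem_cache_alloc/kmem_cache_free/kmem_cache_destroy are the real APIs used in the hunks that follow.

#include <linux/init.h>
#include <linux/list.h>
#include <linux/slab.h>

struct foo_intent {
	struct list_head	fi_list;
	int			fi_type;
};

static struct kmem_cache *foo_intent_cache;

int __init foo_intent_init_cache(void)
{
	/* One fixed-size cache per intent type, created at module init. */
	foo_intent_cache = kmem_cache_create("foo_intent",
			sizeof(struct foo_intent), 0, 0, NULL);
	return foo_intent_cache != NULL ? 0 : -ENOMEM;
}

static struct foo_intent *foo_intent_alloc(void)
{
	/*
	 * __GFP_NOFAIL: deferred work items are queued inside dirty
	 * transactions, which cannot back out of an allocation failure.
	 */
	return kmem_cache_alloc(foo_intent_cache, GFP_NOFS | __GFP_NOFAIL);
}

static void foo_intent_free(struct foo_intent *fi)
{
	kmem_cache_free(foo_intent_cache, fi);
}

void foo_intent_destroy_cache(void)
{
	/*
	 * kmem_cache_destroy(NULL) is a no-op, so teardown stays safe even
	 * if initialization failed partway through.
	 */
	kmem_cache_destroy(foo_intent_cache);
	foo_intent_cache = NULL;
}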
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Conflicts:
	fs/xfs/libxfs/xfs_defer.c
	fs/xfs/xfs_super.c
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c     | 21 ++++++++++--
 fs/xfs/libxfs/xfs_bmap.h     |  5 +++
 fs/xfs/libxfs/xfs_defer.c    | 65 +++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_defer.h    |  3 ++
 fs/xfs/libxfs/xfs_refcount.c | 23 +++++++++++--
 fs/xfs/libxfs/xfs_refcount.h |  5 +++
 fs/xfs/libxfs/xfs_rmap.c     | 21 +++++++++++-
 fs/xfs/libxfs/xfs_rmap.h     |  5 +++
 fs/xfs/xfs_bmap_item.c       |  4 +--
 fs/xfs/xfs_refcount_item.c   |  4 +--
 fs/xfs/xfs_rmap_item.c       |  4 +--
 fs/xfs/xfs_super.c           | 12 ++++++-
 12 files changed, 156 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 180c212c862b..19761b909128 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -36,7 +36,7 @@ #include "xfs_icache.h" #include "xfs_iomap.h"
- +struct kmem_cache *xfs_bmap_intent_cache; struct kmem_cache *xfs_bmap_free_item_cache;
/* @@ -6137,7 +6137,7 @@ __xfs_bmap_add( bmap->br_blockcount, bmap->br_state);
- bi = kmem_alloc(sizeof(struct xfs_bmap_intent), KM_NOFS); + bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&bi->bi_list); bi->bi_type = type; bi->bi_owner = ip; @@ -6259,3 +6259,20 @@ xfs_bmap_validate_extent( return __this_address; return NULL; } + +int __init +xfs_bmap_intent_init_cache(void) +{ + xfs_bmap_intent_cache = kmem_cache_create("xfs_bmap_intent", + sizeof(struct xfs_bmap_intent), + 0, 0, NULL); + + return xfs_bmap_intent_cache != NULL ? 0 : -ENOMEM; +} + +void +xfs_bmap_intent_destroy_cache(void) +{ + kmem_cache_destroy(xfs_bmap_intent_cache); + xfs_bmap_intent_cache = NULL; +} diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index aa6f45a7ee69..6903013e14e5 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -291,4 +291,9 @@ int xfs_bmapi_remap(struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t bno, xfs_filblks_t len, xfs_fsblock_t startblock, int flags);
+extern struct kmem_cache *xfs_bmap_intent_cache; + +int __init xfs_bmap_intent_init_cache(void); +void xfs_bmap_intent_destroy_cache(void); + #endif /* __XFS_BMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 168b8798625f..9d0abdac423f 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -18,6 +18,11 @@ #include "xfs_trace.h" #include "xfs_icache.h" #include "xfs_log.h" +#include "xfs_rmap.h" +#include "xfs_refcount.h" +#include "xfs_bmap.h" + +static struct kmem_cache *xfs_defer_pending_cache;
/* * Deferred Operations in XFS @@ -349,7 +354,7 @@ xfs_defer_cancel_list( ops->cancel_item(pwi); } ASSERT(dfp->dfp_count == 0); - kmem_free(dfp); + kmem_cache_free(xfs_defer_pending_cache, dfp); } }
@@ -446,7 +451,7 @@ xfs_defer_finish_one(
/* Done with the dfp, free it. */ list_del(&dfp->dfp_list); - kmem_free(dfp); + kmem_cache_free(xfs_defer_pending_cache, dfp); out: if (ops->finish_cleanup) ops->finish_cleanup(tp, state, error); @@ -580,8 +585,8 @@ xfs_defer_add( dfp = NULL; } if (!dfp) { - dfp = kmem_alloc(sizeof(struct xfs_defer_pending), - KM_NOFS); + dfp = kmem_cache_zalloc(xfs_defer_pending_cache, + GFP_NOFS | __GFP_NOFAIL); dfp->dfp_type = type; dfp->dfp_intent = NULL; dfp->dfp_done = NULL; @@ -750,3 +755,55 @@ xfs_defer_ops_continue(
kmem_free(dfc); } + +static inline int __init +xfs_defer_init_cache(void) +{ + xfs_defer_pending_cache = kmem_cache_create("xfs_defer_pending", + sizeof(struct xfs_defer_pending), + 0, 0, NULL); + + return xfs_defer_pending_cache != NULL ? 0 : -ENOMEM; +} + +static inline void +xfs_defer_destroy_cache(void) +{ + kmem_cache_destroy(xfs_defer_pending_cache); + xfs_defer_pending_cache = NULL; +} + +/* Set up caches for deferred work items. */ +int __init +xfs_defer_init_item_caches(void) +{ + int error; + + error = xfs_defer_init_cache(); + if (error) + return error; + error = xfs_rmap_intent_init_cache(); + if (error) + goto err; + error = xfs_refcount_intent_init_cache(); + if (error) + goto err; + error = xfs_bmap_intent_init_cache(); + if (error) + goto err; + + return 0; +err: + xfs_defer_destroy_item_caches(); + return error; +} + +/* Destroy all the deferred work item caches, if they've been allocated. */ +void +xfs_defer_destroy_item_caches(void) +{ + xfs_bmap_intent_destroy_cache(); + xfs_refcount_intent_destroy_cache(); + xfs_rmap_intent_destroy_cache(); + xfs_defer_destroy_cache(); +} diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index df0312679ca7..edc23cc324de 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -101,4 +101,7 @@ void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp, struct xfs_inode **captured_ipp); void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
+int __init xfs_defer_init_item_caches(void); +void xfs_defer_destroy_item_caches(void); + #endif /* __XFS_DEFER_H__ */ diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 1ab15910e402..ed21426f9a44 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -23,6 +23,8 @@ #include "xfs_refcount.h" #include "xfs_rmap.h"
+struct kmem_cache *xfs_refcount_intent_cache; + /* Allowable refcount adjustment amounts. */ enum xfs_refc_adjust_op { XFS_REFCOUNT_ADJUST_INCREASE = 1, @@ -1241,8 +1243,8 @@ __xfs_refcount_add( type, XFS_FSB_TO_AGBNO(tp->t_mountp, startblock), blockcount);
- ri = kmem_alloc(sizeof(struct xfs_refcount_intent), - KM_NOFS); + ri = kmem_cache_alloc(xfs_refcount_intent_cache, + GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&ri->ri_list); ri->ri_type = type; ri->ri_startblock = startblock; @@ -1787,3 +1789,20 @@ xfs_refcount_has_record(
return xfs_btree_has_record(cur, &low, &high, exists); } + +int __init +xfs_refcount_intent_init_cache(void) +{ + xfs_refcount_intent_cache = kmem_cache_create("xfs_refc_intent", + sizeof(struct xfs_refcount_intent), + 0, 0, NULL); + + return xfs_refcount_intent_cache != NULL ? 0 : -ENOMEM; +} + +void +xfs_refcount_intent_destroy_cache(void) +{ + kmem_cache_destroy(xfs_refcount_intent_cache); + xfs_refcount_intent_cache = NULL; +} diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index d43e276b6883..d86a1afca0b7 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -76,4 +76,9 @@ extern void xfs_refcount_btrec_to_irec(union xfs_btree_rec *rec, extern int xfs_refcount_insert(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, int *stat);
+extern struct kmem_cache *xfs_refcount_intent_cache; + +int __init xfs_refcount_intent_init_cache(void); +void xfs_refcount_intent_destroy_cache(void); + #endif /* __XFS_REFCOUNT_H__ */ diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index e6caa25c9e18..64bfb4f319be 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -22,6 +22,8 @@ #include "xfs_error.h" #include "xfs_inode.h"
+struct kmem_cache *xfs_rmap_intent_cache; + /* * Lookup the first record less than or equal to [bno, len, owner, offset] * in the btree given by cur. @@ -2487,7 +2489,7 @@ __xfs_rmap_add( bmap->br_blockcount, bmap->br_state);
- ri = kmem_alloc(sizeof(struct xfs_rmap_intent), KM_NOFS); + ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&ri->ri_list); ri->ri_type = type; ri->ri_owner = owner; @@ -2781,3 +2783,20 @@ const struct xfs_owner_info XFS_RMAP_OINFO_REFC = { const struct xfs_owner_info XFS_RMAP_OINFO_COW = { .oi_owner = XFS_RMAP_OWN_COW, }; + +int __init +xfs_rmap_intent_init_cache(void) +{ + xfs_rmap_intent_cache = kmem_cache_create("xfs_rmap_intent", + sizeof(struct xfs_rmap_intent), + 0, 0, NULL); + + return xfs_rmap_intent_cache != NULL ? 0 : -ENOMEM; +} + +void +xfs_rmap_intent_destroy_cache(void) +{ + kmem_cache_destroy(xfs_rmap_intent_cache); + xfs_rmap_intent_cache = NULL; +} diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 8a4f106c9ff4..9befe4894359 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -212,4 +212,9 @@ extern const struct xfs_owner_info XFS_RMAP_OINFO_INODES; extern const struct xfs_owner_info XFS_RMAP_OINFO_REFC; extern const struct xfs_owner_info XFS_RMAP_OINFO_COW;
+extern struct kmem_cache *xfs_rmap_intent_cache; + +int __init xfs_rmap_intent_init_cache(void); +void xfs_rmap_intent_destroy_cache(void); + #endif /* __XFS_RMAP_H__ */ diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index ec3c3c43b4f9..a68ed8f36b47 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -387,7 +387,7 @@ xfs_bmap_update_finish_item( bmap->bi_bmap.br_blockcount = count; return -EAGAIN; } - kmem_free(bmap); + kmem_cache_free(xfs_bmap_intent_cache, bmap); return error; }
@@ -407,7 +407,7 @@ xfs_bmap_update_cancel_item( struct xfs_bmap_intent *bmap;
bmap = container_of(item, struct xfs_bmap_intent, bi_list); - kmem_free(bmap); + kmem_cache_free(xfs_bmap_intent_cache, bmap); }
const struct xfs_defer_op_type xfs_bmap_update_defer_type = { diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 09efa723ddfa..1d3036fcfbf4 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -386,7 +386,7 @@ xfs_refcount_update_finish_item( refc->ri_blockcount = new_aglen; return -EAGAIN; } - kmem_free(refc); + kmem_cache_free(xfs_refcount_intent_cache, refc); return error; }
@@ -406,7 +406,7 @@ xfs_refcount_update_cancel_item( struct xfs_refcount_intent *refc;
refc = container_of(item, struct xfs_refcount_intent, ri_list); - kmem_free(refc); + kmem_cache_free(xfs_refcount_intent_cache, refc); }
const struct xfs_defer_op_type xfs_refcount_update_defer_type = { diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 52e3f983821a..aafbb6589ebc 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -404,7 +404,7 @@ xfs_rmap_update_finish_item( rmap->ri_bmap.br_startoff, rmap->ri_bmap.br_startblock, rmap->ri_bmap.br_blockcount, rmap->ri_bmap.br_state, state); - kmem_free(rmap); + kmem_cache_free(xfs_rmap_intent_cache, rmap); return error; }
@@ -424,7 +424,7 @@ xfs_rmap_update_cancel_item( struct xfs_rmap_intent *rmap;
rmap = container_of(item, struct xfs_rmap_intent, ri_list); - kmem_free(rmap); + kmem_cache_free(xfs_rmap_intent_cache, rmap); }
const struct xfs_defer_op_type xfs_rmap_update_defer_type = { diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 935cd48df9a2..b5021e9c3aaf 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -36,6 +36,7 @@ #include "xfs_bmap_item.h" #include "xfs_reflink.h" #include "xfs_pwork.h" +#include "xfs_defer.h"
#include <linux/magic.h> #include <linux/fs_context.h> @@ -1991,6 +1992,8 @@ MODULE_ALIAS_FS("xfs"); STATIC int __init xfs_init_caches(void) { + int error; + xfs_log_ticket_cache = kmem_cache_create("xfs_log_ticket", sizeof(struct xlog_ticket), 0, 0, NULL); @@ -2009,11 +2012,15 @@ xfs_init_caches(void) if (!xfs_btree_cur_cache) goto out_destroy_bmap_free_item_cache;
+ error = xfs_defer_init_item_caches(); + if (error) + goto out_destroy_btree_cur_cache; + xfs_da_state_cache = kmem_cache_create("xfs_da_state", sizeof(struct xfs_da_state), 0, 0, NULL); if (!xfs_da_state_cache) - goto out_destroy_btree_cur_cache; + goto out_destroy_defer_item_cache;
xfs_ifork_cache = kmem_cache_create("xfs_ifork", sizeof(struct xfs_ifork), @@ -2143,6 +2150,8 @@ xfs_init_caches(void) kmem_cache_destroy(xfs_ifork_cache); out_destroy_da_state_cache: kmem_cache_destroy(xfs_da_state_cache); + out_destroy_defer_item_cache: + xfs_defer_destroy_item_caches(); out_destroy_btree_cur_cache: kmem_cache_destroy(xfs_btree_cur_cache); out_destroy_bmap_free_item_cache: @@ -2177,6 +2186,7 @@ xfs_destroy_caches(void) kmem_cache_destroy(xfs_ifork_cache); kmem_cache_destroy(xfs_da_state_cache); kmem_cache_destroy(xfs_btree_cur_cache); + xfs_defer_destroy_item_caches(); kmem_cache_destroy(xfs_bmap_free_item_cache); kmem_cache_destroy(xfs_log_ticket_cache); }
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.15-rc4
commit c201d9ca5392b20f04882848a071025b0e194c17
category: bugfix
bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs_bmap_add_free isn't a block mapping function; it schedules deferred freeing operations for a later point in a compound transaction chain. While it's primarily used by bunmapi, its use has expanded beyond that. Move it to xfs_alloc.c and rename the function since it's now general freeing functionality. Bring the slab cache bits in line with the way we handle the other intent items.
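A caller-side sketch of the renamed helpers, assuming a transaction tp and an extent that has already been unmapped. This is illustrative only; the wrapper function here is hypothetical, but the signatures match the hunks below, and a NULL owner info selects XFS_RMAP_OINFO_SKIP_UPDATE.

/*
 * Defer freeing an unmapped extent until the transaction chain commits;
 * the bool on the double-underscore variant controls whether discard is
 * skipped for the freed blocks.
 */
static void free_unmapped_extent(
	struct xfs_trans	*tp,
	xfs_fsblock_t		bno,
	xfs_filblks_t		len,
	bool			nodiscard)
{
	if (nodiscard)
		__xfs_free_extent_later(tp, bno, len, NULL, true);
	else
		xfs_free_extent_later(tp, bno, len, NULL);
}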
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Conflicts:
	fs/xfs/libxfs/xfs_ag.c
	fs/xfs/libxfs/xfs_alloc.c
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/xfs_super.c
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c      | 71 ++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_alloc.h      | 32 +++++++++++++++
 fs/xfs/libxfs/xfs_bmap.c       | 55 +-------------------------
 fs/xfs/libxfs/xfs_bmap.h       | 28 --------------
 fs/xfs/libxfs/xfs_bmap_btree.c |  2 +-
 fs/xfs/libxfs/xfs_defer.c      |  5 +++
 fs/xfs/libxfs/xfs_ialloc.c     |  4 +-
 fs/xfs/libxfs/xfs_refcount.c   |  6 +--
 fs/xfs/xfs_extfree_item.c      |  6 +--
 fs/xfs/xfs_reflink.c           |  2 +-
 fs/xfs/xfs_super.c             | 11 +----
 11 files changed, 117 insertions(+), 105 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 9d1097331b49..d06097660d89 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -27,7 +27,7 @@ #include "xfs_ag_resv.h" #include "xfs_bmap.h"
-extern struct kmem_cache *xfs_bmap_free_item_cache; +struct kmem_cache *xfs_extfree_item_cache;
struct workqueue_struct *xfs_alloc_wq;
@@ -2449,7 +2449,7 @@ xfs_agfl_reset(
/* * Defer an AGFL block free. This is effectively equivalent to - * xfs_bmap_add_free() with some special handling particular to AGFL blocks. + * xfs_free_extent_later() with some special handling particular to AGFL blocks. * * Deferring AGFL frees helps prevent log reservation overruns due to too many * allocation operations in a transaction. AGFL frees are prone to this problem @@ -2468,10 +2468,10 @@ xfs_defer_agfl_block( struct xfs_mount *mp = tp->t_mountp; struct xfs_extent_free_item *new; /* new element */
- ASSERT(xfs_bmap_free_item_cache != NULL); + ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
- new = kmem_cache_alloc(xfs_bmap_free_item_cache, + new = kmem_cache_alloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); new->xefi_blockcount = 1; @@ -2483,6 +2483,52 @@ xfs_defer_agfl_block( xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list); }
+/* + * Add the extent to the list of extents to be free at transaction end. + * The list is maintained sorted (by block number). + */ +void +__xfs_free_extent_later( + struct xfs_trans *tp, + xfs_fsblock_t bno, + xfs_filblks_t len, + const struct xfs_owner_info *oinfo, + bool skip_discard) +{ + struct xfs_extent_free_item *new; /* new element */ +#ifdef DEBUG + struct xfs_mount *mp = tp->t_mountp; + xfs_agnumber_t agno; + xfs_agblock_t agbno; + + ASSERT(bno != NULLFSBLOCK); + ASSERT(len > 0); + ASSERT(len <= MAXEXTLEN); + ASSERT(!isnullstartblock(bno)); + agno = XFS_FSB_TO_AGNO(mp, bno); + agbno = XFS_FSB_TO_AGBNO(mp, bno); + ASSERT(agno < mp->m_sb.sb_agcount); + ASSERT(agbno < mp->m_sb.sb_agblocks); + ASSERT(len < mp->m_sb.sb_agblocks); + ASSERT(agbno + len <= mp->m_sb.sb_agblocks); +#endif + ASSERT(xfs_extfree_item_cache != NULL); + + new = kmem_cache_alloc(xfs_extfree_item_cache, + GFP_KERNEL | __GFP_NOFAIL); + new->xefi_startblock = bno; + new->xefi_blockcount = (xfs_extlen_t)len; + if (oinfo) + new->xefi_oinfo = *oinfo; + else + new->xefi_oinfo = XFS_RMAP_OINFO_SKIP_UPDATE; + new->xefi_skip_discard = skip_discard; + trace_xfs_bmap_free_defer(tp->t_mountp, + XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, + XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); +} + /* * Decide whether to use this allocation group for this allocation. * If so, fix up the btree freelist's size. @@ -3459,3 +3505,20 @@ xfs_agfl_walk(
return 0; } + +int __init +xfs_extfree_intent_init_cache(void) +{ + xfs_extfree_item_cache = kmem_cache_create("xfs_extfree_intent", + sizeof(struct xfs_extent_free_item), + 0, 0, NULL); + + return xfs_extfree_item_cache != NULL ? 0 : -ENOMEM; +} + +void +xfs_extfree_intent_destroy_cache(void) +{ + kmem_cache_destroy(xfs_extfree_item_cache); + xfs_extfree_item_cache = NULL; +} diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 9cf9b4b593ca..36c346a0f608 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -246,4 +246,36 @@ xfs_buf_to_agfl_bno( return bp->b_addr; }
+void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, + xfs_filblks_t len, const struct xfs_owner_info *oinfo, + bool skip_discard); + +/* + * List of extents to be free "later". + * The list is kept sorted on xbf_startblock. + */ +struct xfs_extent_free_item { + struct list_head xefi_list; + xfs_fsblock_t xefi_startblock;/* starting fs block number */ + xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ + bool xefi_skip_discard; + struct xfs_owner_info xefi_oinfo; /* extent owner */ +}; + +static inline void +xfs_free_extent_later( + struct xfs_trans *tp, + xfs_fsblock_t bno, + xfs_filblks_t len, + const struct xfs_owner_info *oinfo) +{ + __xfs_free_extent_later(tp, bno, len, oinfo, false); +} + + +extern struct kmem_cache *xfs_extfree_item_cache; + +int __init xfs_extfree_intent_init_cache(void); +void xfs_extfree_intent_destroy_cache(void); + #endif /* __XFS_ALLOC_H__ */ diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 19761b909128..2387d6c3add6 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -37,7 +37,6 @@ #include "xfs_iomap.h"
struct kmem_cache *xfs_bmap_intent_cache; -struct kmem_cache *xfs_bmap_free_item_cache;
/* * Miscellaneous helper functions @@ -518,56 +517,6 @@ xfs_bmap_validate_ret( #define xfs_bmap_validate_ret(bno,len,flags,mval,onmap,nmap) do { } while (0) #endif /* DEBUG */
-/* - * bmap free list manipulation functions - */ - -/* - * Add the extent to the list of extents to be free at transaction end. - * The list is maintained sorted (by block number). - */ -void -__xfs_bmap_add_free( - struct xfs_trans *tp, - xfs_fsblock_t bno, - xfs_filblks_t len, - const struct xfs_owner_info *oinfo, - bool skip_discard) -{ - struct xfs_extent_free_item *new; /* new element */ -#ifdef DEBUG - struct xfs_mount *mp = tp->t_mountp; - xfs_agnumber_t agno; - xfs_agblock_t agbno; - - ASSERT(bno != NULLFSBLOCK); - ASSERT(len > 0); - ASSERT(len <= MAXEXTLEN); - ASSERT(!isnullstartblock(bno)); - agno = XFS_FSB_TO_AGNO(mp, bno); - agbno = XFS_FSB_TO_AGBNO(mp, bno); - ASSERT(agno < mp->m_sb.sb_agcount); - ASSERT(agbno < mp->m_sb.sb_agblocks); - ASSERT(len < mp->m_sb.sb_agblocks); - ASSERT(agbno + len <= mp->m_sb.sb_agblocks); -#endif - ASSERT(xfs_bmap_free_item_cache != NULL); - - new = kmem_cache_alloc(xfs_bmap_free_item_cache, - GFP_KERNEL | __GFP_NOFAIL); - new->xefi_startblock = bno; - new->xefi_blockcount = (xfs_extlen_t)len; - if (oinfo) - new->xefi_oinfo = *oinfo; - else - new->xefi_oinfo = XFS_RMAP_OINFO_SKIP_UPDATE; - new->xefi_skip_discard = skip_discard; - trace_xfs_bmap_free_defer(tp->t_mountp, - XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, - XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); -} - /* * Inode fork format manipulation functions */ @@ -622,7 +571,7 @@ xfs_bmap_btree_to_extents( if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error; xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - xfs_bmap_add_free(cur->bc_tp, cbno, 1, &oinfo); + xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); ip->i_d.di_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); xfs_trans_binval(tp, cbp); @@ -5239,7 +5188,7 @@ xfs_bmap_del_extent_real( if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { - __xfs_bmap_add_free(tp, del->br_startblock, + __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, (bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN); diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 6903013e14e5..76a40a47abdf 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -13,8 +13,6 @@ struct xfs_inode; struct xfs_mount; struct xfs_trans;
-extern struct kmem_cache *xfs_bmap_free_item_cache; - /* * Argument structure for xfs_bmap_alloc. */ @@ -44,19 +42,6 @@ struct xfs_bmalloca { int flags; };
-/* - * List of extents to be free "later". - * The list is kept sorted on xbf_startblock. - */ -struct xfs_extent_free_item -{ - xfs_fsblock_t xefi_startblock;/* starting fs block number */ - xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ - bool xefi_skip_discard; - struct list_head xefi_list; - struct xfs_owner_info xefi_oinfo; /* extent owner */ -}; - #define XFS_BMAP_MAX_NMAP 4
/* @@ -189,9 +174,6 @@ int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd); int xfs_bmap_set_attrforkoff(struct xfs_inode *ip, int size, int *version); void xfs_bmap_local_to_extents_empty(struct xfs_trans *tp, struct xfs_inode *ip, int whichfork); -void __xfs_bmap_add_free(struct xfs_trans *tp, xfs_fsblock_t bno, - xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard); void xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork); int xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip, xfs_extlen_t len, xfs_fileoff_t *unused, int whichfork); @@ -240,16 +222,6 @@ int xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, struct xfs_bmbt_irec *new, int *logflagsp);
-static inline void -xfs_bmap_add_free( - struct xfs_trans *tp, - xfs_fsblock_t bno, - xfs_filblks_t len, - const struct xfs_owner_info *oinfo) -{ - __xfs_bmap_add_free(tp, bno, len, oinfo, false); -} - enum xfs_bmap_intent_type { XFS_BMAP_MAP = 1, XFS_BMAP_UNMAP, diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 65762ae9b40a..945a9c2f751a 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -286,7 +286,7 @@ xfs_bmbt_free_block( struct xfs_owner_info oinfo;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - xfs_bmap_add_free(cur->bc_tp, fsbno, 1, &oinfo); + xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 9d0abdac423f..3da95a951504 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -21,6 +21,7 @@ #include "xfs_rmap.h" #include "xfs_refcount.h" #include "xfs_bmap.h" +#include "xfs_alloc.h"
static struct kmem_cache *xfs_defer_pending_cache;
@@ -789,6 +790,9 @@ xfs_defer_init_item_caches(void) if (error) goto err; error = xfs_bmap_intent_init_cache(); + if (error) + goto err; + error = xfs_extfree_intent_init_cache(); if (error) goto err;
@@ -802,6 +806,7 @@ xfs_defer_init_item_caches(void) void xfs_defer_destroy_item_caches(void) { + xfs_extfree_intent_destroy_cache(); xfs_bmap_intent_destroy_cache(); xfs_refcount_intent_destroy_cache(); xfs_rmap_intent_destroy_cache(); diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index b8b7f4d14329..7b3a86fe5db5 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1868,7 +1868,7 @@ xfs_difree_inode_chunk(
if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ - xfs_bmap_add_free(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), + xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES); return; @@ -1913,7 +1913,7 @@ xfs_difree_inode_chunk(
ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); - xfs_bmap_add_free(tp, XFS_AGB_TO_FSB(mp, agno, agbno), + xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno), contigblk, &XFS_RMAP_OINFO_INODES);
/* reset range to current bit and carry on... */ diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index ed21426f9a44..4714b50626b3 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -975,7 +975,7 @@ xfs_refcount_adjust_extents( fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.agno, tmp.rc_startblock); - xfs_bmap_add_free(cur->bc_tp, fsbno, + xfs_free_extent_later(cur->bc_tp, fsbno, tmp.rc_blockcount, oinfo); }
@@ -1020,7 +1020,7 @@ xfs_refcount_adjust_extents( fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.agno, ext.rc_startblock); - xfs_bmap_add_free(cur->bc_tp, fsbno, ext.rc_blockcount, + xfs_free_extent_later(cur->bc_tp, fsbno, ext.rc_blockcount, oinfo); }
@@ -1749,7 +1749,7 @@ xfs_refcount_recover_cow_leftovers( rr->rr_rrec.rc_blockcount);
/* Free the block. */ - xfs_bmap_add_free(tp, fsb, rr->rr_rrec.rc_blockcount, NULL); + xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL);
error = xfs_trans_commit(tp); if (error) diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 4bcdc363ec12..5211b91883b1 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -487,7 +487,7 @@ xfs_extent_free_finish_item( free->xefi_startblock, free->xefi_blockcount, &free->xefi_oinfo, free->xefi_skip_discard); - kmem_cache_free(xfs_bmap_free_item_cache, free); + kmem_cache_free(xfs_extfree_item_cache, free); return error; }
@@ -507,7 +507,7 @@ xfs_extent_free_cancel_item( struct xfs_extent_free_item *free;
free = container_of(item, struct xfs_extent_free_item, xefi_list); - kmem_cache_free(xfs_bmap_free_item_cache, free); + kmem_cache_free(xfs_extfree_item_cache, free); }
const struct xfs_defer_op_type xfs_extent_free_defer_type = { @@ -569,7 +569,7 @@ xfs_agfl_free_finish_item( extp->ext_len = free->xefi_blockcount; efdp->efd_next_extent++;
- kmem_cache_free(xfs_bmap_free_item_cache, free); + kmem_cache_free(xfs_extfree_item_cache, free); return error; }
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 19b1285f6ae9..e9aaf730f784 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -484,7 +484,7 @@ xfs_reflink_cancel_cow_blocks( xfs_refcount_free_cow_extent(*tpp, del.br_startblock, del.br_blockcount);
- xfs_bmap_add_free(*tpp, del.br_startblock, + xfs_free_extent_later(*tpp, del.br_startblock, del.br_blockcount, NULL);
/* Roll the transaction */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index b5021e9c3aaf..502fb08bfd38 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2000,17 +2000,11 @@ xfs_init_caches(void) if (!xfs_log_ticket_cache) goto out;
- xfs_bmap_free_item_cache = kmem_cache_create("xfs_bmap_free_item", - sizeof(struct xfs_extent_free_item), - 0, 0, NULL); - if (!xfs_bmap_free_item_cache) - goto out_destroy_log_ticket_cache; - xfs_btree_cur_cache = kmem_cache_create("xfs_btree_cur", sizeof(struct xfs_btree_cur), 0, 0, NULL); if (!xfs_btree_cur_cache) - goto out_destroy_bmap_free_item_cache; + goto out_destroy_log_ticket_cache;
error = xfs_defer_init_item_caches(); if (error) @@ -2154,8 +2148,6 @@ xfs_init_caches(void) xfs_defer_destroy_item_caches(); out_destroy_btree_cur_cache: kmem_cache_destroy(xfs_btree_cur_cache); - out_destroy_bmap_free_item_cache: - kmem_cache_destroy(xfs_bmap_free_item_cache); out_destroy_log_ticket_cache: kmem_cache_destroy(xfs_log_ticket_cache); out: @@ -2187,7 +2179,6 @@ xfs_destroy_caches(void) kmem_cache_destroy(xfs_da_state_cache); kmem_cache_destroy(xfs_btree_cur_cache); xfs_defer_destroy_item_caches(); - kmem_cache_destroy(xfs_bmap_free_item_cache); kmem_cache_destroy(xfs_log_ticket_cache); }
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.15-rc4 commit b3b5ff412ab04afd99173bb12d3cc146ee478ae7 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We only use EFIs to free metadata blocks -- not regular data/attr fork extents. Remove all the fields that we never use, for a net reduction of 16 bytes.
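For a concrete picture of where the 16 bytes go, here is a standalone sketch (not kernel code; names are simplified and LP64 type widths are assumed) comparing the two layouts: dropping the embedded owner info and the padded bool in favour of a packed owner/flags pair shrinks the structure from 56 to 40 bytes.

/*
 * Standalone layout sketch; simplified names, LP64 type widths assumed.
 * Not kernel code.
 */
#include <stdint.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Mirrors struct xfs_owner_info: owner, file offset, flags. */
struct owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;
	unsigned int	oi_flags;
};

struct old_xefi {			/* layout before this patch: 56 bytes */
	uint64_t	startblock;
	uint32_t	blockcount;
	_Bool		skip_discard;
	struct list_head list;
	struct owner_info oinfo;
};

struct new_xefi {			/* layout after this patch: 40 bytes */
	struct list_head list;
	uint64_t	owner;
	uint64_t	startblock;
	uint32_t	blockcount;
	unsigned int	flags;		/* SKIP_DISCARD, ATTR_FORK, BMBT_BLOCK */
};

int main(void)
{
	printf("old %zu bytes, new %zu bytes\n",
	       sizeof(struct old_xefi), sizeof(struct new_xefi));
	return 0;
}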
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++++--------- fs/xfs/libxfs/xfs_alloc.h | 8 ++++++-- fs/xfs/xfs_extfree_item.c | 13 ++++++++++--- 3 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index d06097660d89..1079c98e166c 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2471,12 +2471,11 @@ xfs_defer_agfl_block( ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
- new = kmem_cache_alloc(xfs_extfree_item_cache, + new = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); new->xefi_blockcount = 1; - new->xefi_oinfo = *oinfo; - new->xefi_skip_discard = false; + new->xefi_owner = oinfo->oi_owner;
trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
@@ -2514,15 +2513,23 @@ __xfs_free_extent_later( #endif ASSERT(xfs_extfree_item_cache != NULL);
- new = kmem_cache_alloc(xfs_extfree_item_cache, + new = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = bno; new->xefi_blockcount = (xfs_extlen_t)len; - if (oinfo) - new->xefi_oinfo = *oinfo; - else - new->xefi_oinfo = XFS_RMAP_OINFO_SKIP_UPDATE; - new->xefi_skip_discard = skip_discard; + if (skip_discard) + new->xefi_flags |= XFS_EFI_SKIP_DISCARD; + if (oinfo) { + ASSERT(oinfo->oi_offset == 0); + + if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK) + new->xefi_flags |= XFS_EFI_ATTR_FORK; + if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK) + new->xefi_flags |= XFS_EFI_BMBT_BLOCK; + new->xefi_owner = oinfo->oi_owner; + } else { + new->xefi_owner = XFS_RMAP_OWN_NULL; + } trace_xfs_bmap_free_defer(tp->t_mountp, XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 36c346a0f608..b65e933ebabe 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -256,12 +256,16 @@ void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, */ struct xfs_extent_free_item { struct list_head xefi_list; + uint64_t xefi_owner; xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ - bool xefi_skip_discard; - struct xfs_owner_info xefi_oinfo; /* extent owner */ + unsigned int xefi_flags; };
+#define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ +#define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ +#define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */ + static inline void xfs_free_extent_later( struct xfs_trans *tp, diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 5211b91883b1..445dc083873b 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -479,14 +479,20 @@ xfs_extent_free_finish_item( struct list_head *item, struct xfs_btree_cur **state) { + struct xfs_owner_info oinfo = { }; struct xfs_extent_free_item *free; int error;
free = container_of(item, struct xfs_extent_free_item, xefi_list); + oinfo.oi_owner = free->xefi_owner; + if (free->xefi_flags & XFS_EFI_ATTR_FORK) + oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; + if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) + oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; error = xfs_trans_free_extent(tp, EFD_ITEM(done), free->xefi_startblock, free->xefi_blockcount, - &free->xefi_oinfo, free->xefi_skip_discard); + &oinfo, free->xefi_flags & XFS_EFI_SKIP_DISCARD); kmem_cache_free(xfs_extfree_item_cache, free); return error; } @@ -530,6 +536,7 @@ xfs_agfl_free_finish_item( struct list_head *item, struct xfs_btree_cur **state) { + struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_efd_log_item *efdp = EFD_ITEM(done); struct xfs_extent_free_item *free; @@ -544,13 +551,13 @@ xfs_agfl_free_finish_item( ASSERT(free->xefi_blockcount == 1); agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock); agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock); + oinfo.oi_owner = free->xefi_owner;
trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, free->xefi_blockcount);
error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp); if (!error) - error = xfs_free_agfl_block(tp, agno, agbno, agbp, - &free->xefi_oinfo); + error = xfs_free_agfl_block(tp, agno, agbno, agbp, &oinfo);
/* * Mark the transaction dirty, even on error. This ensures the
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v5.15-rc4 commit c04c51c524697cd68d668d595f8ebc381ffe426b category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The owner info parameter is always NULL, so get rid of the parameter.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Conflicts: fs/xfs/libxfs/xfs_refcount.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_refcount.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 4714b50626b3..5b39759af829 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -917,8 +917,7 @@ xfs_refcount_adjust_extents( struct xfs_btree_cur *cur, xfs_agblock_t *agbno, xfs_extlen_t *aglen, - enum xfs_refc_adjust_op adj, - struct xfs_owner_info *oinfo) + enum xfs_refc_adjust_op adj) { struct xfs_refcount_irec ext, tmp; int error; @@ -976,7 +975,7 @@ xfs_refcount_adjust_extents( cur->bc_ag.agno, tmp.rc_startblock); xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, oinfo); + tmp.rc_blockcount, NULL); }
(*agbno) += tmp.rc_blockcount; @@ -1020,8 +1019,8 @@ xfs_refcount_adjust_extents( fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.agno, ext.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, ext.rc_blockcount, - oinfo); + xfs_free_extent_later(cur->bc_tp, fsbno, + ext.rc_blockcount, NULL); }
skip: @@ -1049,8 +1048,7 @@ xfs_refcount_adjust( xfs_extlen_t aglen, xfs_agblock_t *new_agbno, xfs_extlen_t *new_aglen, - enum xfs_refc_adjust_op adj, - struct xfs_owner_info *oinfo) + enum xfs_refc_adjust_op adj) { bool shape_changed; int shape_changes = 0; @@ -1093,8 +1091,7 @@ xfs_refcount_adjust( cur->bc_ag.refc.shape_changes++;
/* Now that we've taken care of the ends, adjust the middle extents */ - error = xfs_refcount_adjust_extents(cur, new_agbno, new_aglen, - adj, oinfo); + error = xfs_refcount_adjust_extents(cur, new_agbno, new_aglen, adj); if (error) goto out_error;
@@ -1193,12 +1190,12 @@ xfs_refcount_finish_one( switch (type) { case XFS_REFCOUNT_INCREASE: error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno, - new_len, XFS_REFCOUNT_ADJUST_INCREASE, NULL); + new_len, XFS_REFCOUNT_ADJUST_INCREASE); *new_fsb = XFS_AGB_TO_FSB(mp, agno, new_agbno); break; case XFS_REFCOUNT_DECREASE: error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno, - new_len, XFS_REFCOUNT_ADJUST_DECREASE, NULL); + new_len, XFS_REFCOUNT_ADJUST_DECREASE); *new_fsb = XFS_AGB_TO_FSB(mp, agno, new_agbno); break; case XFS_REFCOUNT_ALLOC_COW:
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion from mainline-v6.2-rc6 commit 72ba455599ad13d08c29dafa22a32360e07b1961 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters.
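The shape of the change, as a hedged standalone sketch with simplified types and invented names (free_extent_unpacked/free_extent are stand-ins, not the kernel signatures): the structure is passed through and unpacked once at the consumer instead of at every call site.

/*
 * Hedged sketch of the refactor; simplified types, invented names.
 */
#include <stdint.h>
#include <stdio.h>

struct extent_free_item {
	uint64_t	startblock;
	uint32_t	blockcount;
	unsigned int	flags;
};

/* Before: each call site unpacks the item into scalar arguments. */
static int free_extent_unpacked(uint64_t start, uint32_t len,
				unsigned int flags)
{
	printf("free [%llu, +%u), flags %#x\n",
	       (unsigned long long)start, len, flags);
	return 0;
}

/* After: pass the item itself; unpack once, in one place. */
static int free_extent(const struct extent_free_item *xefi)
{
	return free_extent_unpacked(xefi->startblock, xefi->blockcount,
				    xefi->flags);
}

int main(void)
{
	struct extent_free_item xefi = { .startblock = 1024, .blockcount = 8 };

	return free_extent(&xefi);
}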
Signed-off-by: Darrick J. Wong djwong@kernel.org
Conflicts: fs/xfs/xfs_extfree_item.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_extfree_item.c | 53 +++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 445dc083873b..100a21fe0527 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -361,23 +361,30 @@ static int xfs_trans_free_extent( struct xfs_trans *tp, struct xfs_efd_log_item *efdp, - xfs_fsblock_t start_block, - xfs_extlen_t ext_len, - const struct xfs_owner_info *oinfo, - bool skip_discard) + struct xfs_extent_free_item *free) { + struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_extent *extp; uint next_extent; - xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, start_block); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, + free->xefi_startblock); xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, - start_block); + free->xefi_startblock); int error;
- trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, ext_len); + oinfo.oi_owner = free->xefi_owner; + if (free->xefi_flags & XFS_EFI_ATTR_FORK) + oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; + if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) + oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; + + trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, + free->xefi_blockcount);
- error = __xfs_free_extent(tp, start_block, ext_len, - oinfo, XFS_AG_RESV_NONE, skip_discard); + error = __xfs_free_extent(tp, free->xefi_startblock, + free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + free->xefi_flags & XFS_EFI_SKIP_DISCARD); /* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: @@ -391,8 +398,8 @@ xfs_trans_free_extent( next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); extp = &(efdp->efd_format.efd_extents[next_extent]); - extp->ext_start = start_block; - extp->ext_len = ext_len; + extp->ext_start = free->xefi_startblock; + extp->ext_len = free->xefi_blockcount; efdp->efd_next_extent++;
return error; @@ -479,20 +486,12 @@ xfs_extent_free_finish_item( struct list_head *item, struct xfs_btree_cur **state) { - struct xfs_owner_info oinfo = { }; struct xfs_extent_free_item *free; int error;
free = container_of(item, struct xfs_extent_free_item, xefi_list); - oinfo.oi_owner = free->xefi_owner; - if (free->xefi_flags & XFS_EFI_ATTR_FORK) - oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; - if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) - oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; - error = xfs_trans_free_extent(tp, EFD_ITEM(done), - free->xefi_startblock, - free->xefi_blockcount, - &oinfo, free->xefi_flags & XFS_EFI_SKIP_DISCARD); + + error = xfs_trans_free_extent(tp, EFD_ITEM(done), free); kmem_cache_free(xfs_extfree_item_cache, free); return error; } @@ -630,10 +629,16 @@ xfs_efi_item_recover( efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { + struct xfs_extent_free_item fake = { + .xefi_owner = XFS_RMAP_OWN_UNKNOWN, + }; + extp = &efip->efi_format.efi_extents[i]; - error = xfs_trans_free_extent(tp, efdp, extp->ext_start, - extp->ext_len, - &XFS_RMAP_OINFO_ANY_OWNER, false); + + fake.xefi_startblock = extp->ext_start; + fake.xefi_blockcount = extp->ext_len; + + error = xfs_trans_free_extent(tp, efdp, &fake); if (error) goto abort_error;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc5 commit 3148ebf2c0782340946732bfaf3073d23ac833fa category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the agfl or the indexing in the AGF has been corrupted, getting a block form the AGFL could return an invalid block number. If this happens, bad things happen. Check the agbno we pull off the AGFL and return -EFSCORRUPTED if we find somethign bad.
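A minimal standalone model of the new check (demo_verify_agbno and its bounds are illustrative assumptions; the real code uses xfs_verify_agbno(), which knows the actual AG geometry):

/*
 * Minimal standalone model of the AGFL agbno check; bounds are
 * illustrative, not the kernel's exact geometry handling.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xfs_agblock_t;

/* A valid AG block lies inside the AG and does not point at the
 * static metadata headers at the front of the AG. */
static bool demo_verify_agbno(xfs_agblock_t agbno, xfs_agblock_t ag_blocks,
			      xfs_agblock_t first_usable)
{
	return agbno >= first_usable && agbno < ag_blocks;
}

int main(void)
{
	xfs_agblock_t ag_blocks = 65536, first_usable = 4;

	/* A corrupt AGFL entry pointing past the AG fails the check... */
	printf("%d\n", demo_verify_agbno(70000, ag_blocks, first_usable));
	/* ...while a sane block number passes. */
	printf("%d\n", demo_verify_agbno(1234, ag_blocks, first_usable));
	return 0;
}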
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com
Conflicts: fs/xfs/libxfs/xfs_alloc.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 1079c98e166c..7ffccfc1320c 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2748,6 +2748,10 @@ xfs_alloc_get_freelist( */ agfl_bno = xfs_buf_to_agfl_bno(agflbp); bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]); + if (XFS_IS_CORRUPT(tp->t_mountp, + !xfs_verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))) + return -EFSCORRUPTED; + be32_add_cpu(&agf->agf_flfirst, 1); xfs_trans_brelse(tp, agflbp); if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc5 commit 7dfee17b13e5024c5c0ab1911859ded4182de3e5 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Bad things happen in deferred extent freeing operations if they are passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around....
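The verifier this series adds for the non-AGFL case, xfs_verify_fsbext(), leans on a compact overflow guard; the standalone demo below shows why the single comparison fsbno + len > fsbno rejects both a zero-length extent and a range that wraps past 2^64 (fsbext_range_ok is an invented name for illustration):

/* Standalone demonstration of the overflow guard. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* fsbext_range_ok() is an invented name; the real check lives in
 * xfs_verify_fsbext(). One unsigned compare rejects len == 0 and
 * 64-bit wraparound alike. */
static bool fsbext_range_ok(uint64_t fsbno, uint64_t len)
{
	return fsbno + len > fsbno;
}

int main(void)
{
	assert(!fsbext_range_ok(100, 0));		/* empty extent */
	assert(!fsbext_range_ok(UINT64_MAX - 4, 10));	/* wraps past 2^64 */
	assert(fsbext_range_ok(100, 8));		/* sane extent */
	return 0;
}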
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com
Conflicts: fs/xfs/libxfs/xfs_ag.c fs/xfs/libxfs/xfs_alloc.c fs/xfs/libxfs/xfs_bmap.c fs/xfs/libxfs/xfs_bmap_btree.c fs/xfs/libxfs/xfs_ialloc.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 18 ++++++++++++++---- fs/xfs/libxfs/xfs_alloc.h | 6 +++--- fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++-- fs/xfs/libxfs/xfs_bmap_btree.c | 5 ++++- fs/xfs/libxfs/xfs_ialloc.c | 24 ++++++++++++++++-------- fs/xfs/libxfs/xfs_refcount.c | 13 ++++++++++--- fs/xfs/libxfs/xfs_types.c | 23 +++++++++++++++++++++++ fs/xfs/libxfs/xfs_types.h | 2 ++ fs/xfs/xfs_reflink.c | 4 +++- 9 files changed, 83 insertions(+), 22 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 7ffccfc1320c..ebfaf5f48010 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2458,7 +2458,7 @@ xfs_agfl_reset( * the real allocation can proceed. Deferring the free disconnects freeing up * the AGFL slot from freeing the block. */ -STATIC void +static int xfs_defer_agfl_block( struct xfs_trans *tp, xfs_agnumber_t agno, @@ -2477,16 +2477,20 @@ xfs_defer_agfl_block( new->xefi_blockcount = 1; new->xefi_owner = oinfo->oi_owner;
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, new->xefi_startblock))) + return -EFSCORRUPTED; + trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list); + return 0; }
/* * Add the extent to the list of extents to be free at transaction end. * The list is maintained sorted (by block number). */ -void +int __xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, @@ -2495,8 +2499,8 @@ __xfs_free_extent_later( bool skip_discard) { struct xfs_extent_free_item *new; /* new element */ -#ifdef DEBUG struct xfs_mount *mp = tp->t_mountp; +#ifdef DEBUG xfs_agnumber_t agno; xfs_agblock_t agbno;
@@ -2513,6 +2517,9 @@ __xfs_free_extent_later( #endif ASSERT(xfs_extfree_item_cache != NULL);
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len))) + return -EFSCORRUPTED; + new = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = bno; @@ -2534,6 +2541,7 @@ __xfs_free_extent_later( XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); + return 0; }
/* @@ -2648,7 +2656,9 @@ xfs_alloc_fix_freelist( goto out_agbp_relse;
/* defer agfl frees */ - xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo); + error = xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo); + if (error) + goto out_agbp_relse; }
targs.tp = tp; diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index b65e933ebabe..0fd02e889216 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -246,7 +246,7 @@ xfs_buf_to_agfl_bno( return bp->b_addr; }
-void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, +int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, bool skip_discard);
@@ -266,14 +266,14 @@ struct xfs_extent_free_item { #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */
-static inline void +static inline int xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo) { - __xfs_free_extent_later(tp, bno, len, oinfo, false); + return __xfs_free_extent_later(tp, bno, len, oinfo, false); }
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 2387d6c3add6..3eb1d31df6e1 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -570,8 +570,12 @@ xfs_bmap_btree_to_extents( cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error; + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + if (error) + return error; + ip->i_d.di_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); xfs_trans_binval(tp, cbp); @@ -5188,10 +5192,12 @@ xfs_bmap_del_extent_real( if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { - __xfs_free_extent_later(tp, del->br_startblock, + error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, (bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN); + if (error) + goto done; } }
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 945a9c2f751a..021afb43c56b 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -284,9 +284,12 @@ xfs_bmbt_free_block( struct xfs_trans *tp = cur->bc_tp; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); struct xfs_owner_info oinfo; + int error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + if (error) + return error; ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 7b3a86fe5db5..d26f966b0901 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1851,7 +1851,7 @@ xfs_dialloc( * might be sparse and only free the regions that are allocated as part of the * chunk. */ -STATIC void +static int xfs_difree_inode_chunk( struct xfs_trans *tp, xfs_agnumber_t agno, @@ -1868,10 +1868,10 @@ xfs_difree_inode_chunk(
if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ - xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); - return; + return xfs_free_extent_later(tp, + XFS_AGB_TO_FSB(mp, agno, sagbno), + M_IGEO(mp)->ialloc_blks, + &XFS_RMAP_OINFO_INODES); }
/* holemask is only 16-bits (fits in an unsigned long) */ @@ -1888,6 +1888,8 @@ xfs_difree_inode_chunk( XFS_INOBT_HOLEMASK_BITS); nextbit = startidx + 1; while (startidx < XFS_INOBT_HOLEMASK_BITS) { + int error; + nextbit = find_next_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS, nextbit); /* @@ -1913,8 +1915,11 @@ xfs_difree_inode_chunk(
ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); - xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + error = xfs_free_extent_later(tp, + XFS_AGB_TO_FSB(mp, agno, agbno), + contigblk, &XFS_RMAP_OINFO_INODES); + if (error) + return error;
/* reset range to current bit and carry on... */ startidx = endidx = nextbit; @@ -1922,6 +1927,7 @@ xfs_difree_inode_chunk( next: nextbit++; } + return 0; }
STATIC int @@ -2021,7 +2027,9 @@ xfs_difree_inobt( goto error0; }
- xfs_difree_inode_chunk(tp, agno, &rec); + error = xfs_difree_inode_chunk(tp, agno, &rec); + if (error) + goto error0; } else { xic->deleted = false;
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 5b39759af829..0e1a9f047b12 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -974,8 +974,10 @@ xfs_refcount_adjust_extents( fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.agno, tmp.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, + error = xfs_free_extent_later(cur->bc_tp, fsbno, tmp.rc_blockcount, NULL); + if (error) + goto out_error; }
(*agbno) += tmp.rc_blockcount; @@ -1019,8 +1021,10 @@ xfs_refcount_adjust_extents( fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.agno, ext.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, + error = xfs_free_extent_later(cur->bc_tp, fsbno, ext.rc_blockcount, NULL); + if (error) + goto out_error; }
skip: @@ -1746,7 +1750,10 @@ xfs_refcount_recover_cow_leftovers( rr->rr_rrec.rc_blockcount);
/* Free the block. */ - xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL); + error = xfs_free_extent_later(tp, fsb, + rr->rr_rrec.rc_blockcount, NULL); + if (error) + goto out_trans;
error = xfs_trans_commit(tp); if (error) diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c index 86cb4b764897..19eabad06f92 100644 --- a/fs/xfs/libxfs/xfs_types.c +++ b/fs/xfs/libxfs/xfs_types.c @@ -61,6 +61,29 @@ xfs_verify_fsbno( return xfs_verify_agbno(mp, agno, XFS_FSB_TO_AGBNO(mp, fsbno)); }
+/* + * Verify that a data device extent is fully contained inside the filesystem, + * does not cross an AG boundary, and does not point at static metadata. + */ +bool +xfs_verify_fsbext( + struct xfs_mount *mp, + xfs_fsblock_t fsbno, + xfs_fsblock_t len) +{ + if (fsbno + len <= fsbno) + return false; + + if (!xfs_verify_fsbno(mp, fsbno)) + return false; + + if (!xfs_verify_fsbno(mp, fsbno + len - 1)) + return false; + + return XFS_FSB_TO_AGNO(mp, fsbno) == + XFS_FSB_TO_AGNO(mp, fsbno + len - 1); +} + /* Calculate the first and last possible inode number in an AG. */ void xfs_agino_range( diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 1ce06173c2f5..f76fa8498749 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -186,6 +186,8 @@ bool xfs_verify_agbno(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno); bool xfs_verify_fsbno(struct xfs_mount *mp, xfs_fsblock_t fsbno);
+bool xfs_verify_fsbext(struct xfs_mount *mp, xfs_fsblock_t fsbno, + xfs_fsblock_t len); void xfs_agino_range(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agino_t *first, xfs_agino_t *last); bool xfs_verify_agino(struct xfs_mount *mp, xfs_agnumber_t agno, diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index e9aaf730f784..e20aa87fed21 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -484,8 +484,10 @@ xfs_reflink_cancel_cow_blocks( xfs_refcount_free_cow_extent(*tpp, del.br_startblock, del.br_blockcount);
- xfs_free_extent_later(*tpp, del.br_startblock, + error = xfs_free_extent_later(*tpp, del.br_startblock, del.br_blockcount, NULL); + if (error) + break;
/* Roll the transaction */ error = xfs_defer_finish(tpp);
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit 939bd50dfbe7c17d958a62208e8b584442759bf5 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
XFS has strict metadata ordering requirements. One of the things it does is maintain the commit order of items from transaction commit through the CIL and into the AIL. That is, if a transaction logs item A before item B in a modification, then they will be inserted into the CIL in the order {A, B}. These items are then written into the iclog during checkpointing in the order {A, B}. When the checkpoint commits, they are supposed to be inserted into the AIL in the order {A, B}, and when they are pushed from the AIL, they are pushed in the order {A, B}.
If we crash, log recovery then replays the two items from the checkpoint in the order {A, B}, resulting in the objects the items apply to being queued for writeback at the end of the checkpoint in the order {A, B}. This means recovery behaves the same way as the runtime code.
In places, we have subtle dependencies on this ordering being maintained. One of these places is intent recovery from the log. It assumes that recovering an intent will result in a non-intent object being the first thing that is modified in the recovery transaction, and so when the transaction commits and the journal flushes, the first object inserted into the AIL beyond the intent recovery range will be a non-intent item. It uses the transition from intent items to non-intent items to stop the recovery pass.
A recent log recovery issue indicated that an intent was appearing as the first item in the AIL beyond the recovery range, hence breaking the end of recovery detection that exists.
Tracing indicated insertion of the items into the AIL was apparently occurring in the right order (the intent was last in the commit item list), but the intent was appearing first in the AIL. IOWs, the order of items in the AIL was {D,C,B,A}, not {A,B,C,D}, and bulk insertion was reversing the order of the items in the batch of items being inserted.
Lucky for us, all the items fed to bulk insertion have the same LSN, so the reversal of order does not affect the log head/tail tracking that is based on the contents of the AIL. It only impacts code that has implicit, subtle dependencies on object order, and AFAICT only the intent recovery loop is affected by it.
Make sure bulk AIL insertion does not reorder items incorrectly.
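The root cause is visible with nothing more than the two list primitives: list_add() pushes each element of a batch at the head, so a batch walked in commit order comes out reversed, while list_add_tail() preserves the order. A self-contained demo with minimal kernel-style list ops (item names invented):

/*
 * Standalone demo of the ordering bug. Minimal doubly-linked list ops
 * modeled on the kernel's <linux/list.h>; item names are invented.
 */
#include <stdio.h>

struct node { const char *name; struct node *next, *prev; };

static void list_add(struct node *new, struct node *head)	/* head insert */
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void list_add_tail(struct node *new, struct node *head)	/* tail insert */
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

static void print_list(struct node *head, const char *how)
{
	for (struct node *n = head->next; n != head; n = n->next)
		printf("%s ", n->name);
	printf("(%s)\n", how);
}

int main(void)
{
	struct node head = { "head", &head, &head };
	struct node items[4] = { { "A" }, { "B" }, { "C" }, { "D" } };

	/* Walk the batch in commit order, inserting at the head: reversed. */
	for (int i = 0; i < 4; i++)
		list_add(&items[i], &head);
	print_list(&head, "list_add reverses the batch");

	head.next = head.prev = &head;		/* reset */

	/* Tail insertion preserves commit order: A B C D. */
	for (int i = 0; i < 4; i++)
		list_add_tail(&items[i], &head);
	print_list(&head, "list_add_tail preserves it");
	return 0;
}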
Fixes: 0e57f6a36f9b ("xfs: bulk AIL insertion during transaction commit") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_trans_ail.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index d3a97a028560..0f9d4527dc7d 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -822,7 +822,7 @@ xfs_trans_ail_update_bulk( trace_xfs_ail_insert(lip, 0, lsn); } lip->li_lsn = lsn; - list_add(&lip->li_ail, &tmp); + list_add_tail(&lip->li_ail, &tmp); }
if (!list_empty(&tmp))
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit b742d7b4f0e03df25c2a772adcded35044b625ca category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Btrees that aren't freespace management trees use the normal extent allocation and freeing routines for their blocks. Hence when a btree block is freed, a direct call to xfs_free_extent() is made and the extent is immediately freed. This puts all of the free space management btrees under this path, so we are stacking btrees on btrees in the call stack. The inobt, finobt and refcount btrees all do this.
However, the bmap btree does not do this - it calls xfs_free_extent_later() to defer the extent free operation via an XEFI and hence it gets processed in deferred operation processing during the commit of the primary transaction (i.e. via intent chaining).
We need to change xfs_free_extent() to behave in a non-blocking manner so that we can avoid deadlocks with busy extents near ENOSPC in transactions that free multiple extents. Inserting or removing a record from a btree can cause a multi-level tree merge operation and that will free multiple blocks from the btree in a single transaction. i.e. we can call xfs_free_extent() multiple times, and hence the btree manipulation transaction is vulnerable to this busy extent deadlock vector.
To fix this, convert all the remaining callers of xfs_free_extent() to use xfs_free_extent_later() to queue XEFIs and hence defer processing of the extent frees to a context that can be safely restarted if a deadlock condition is detected.
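As a toy model of what the converted callers now do (all names are stand-ins, and the real XEFIs are logged as intents rather than kept on a bare list): frees are queued on the transaction in O(1) without blocking, and only run at commit time, outside the btree modification call stack where a restart is safe.

/*
 * Toy model of deferring frees to commit time; names are stand-ins
 * and there is no intent logging here, just a per-transaction list.
 */
#include <stdio.h>
#include <stdlib.h>

struct deferred_free {
	unsigned long long	start;
	unsigned int		len;
	struct deferred_free	*next;
};

struct toy_trans {
	struct deferred_free	*defer_head;
};

/* Analogue of xfs_free_extent_later(): queue only, never blocks. */
static void free_extent_later(struct toy_trans *tp,
			      unsigned long long start, unsigned int len)
{
	struct deferred_free *df = malloc(sizeof(*df));

	if (!df)
		exit(1);
	df->start = start;
	df->len = len;
	df->next = tp->defer_head;
	tp->defer_head = df;
}

/* Analogue of deferred-op processing at transaction commit. */
static void trans_commit(struct toy_trans *tp)
{
	while (tp->defer_head) {
		struct deferred_free *df = tp->defer_head;

		tp->defer_head = df->next;
		printf("freeing [%llu, +%u) at commit\n", df->start, df->len);
		free(df);
	}
}

int main(void)
{
	struct toy_trans tp = { NULL };

	/* A multi-level btree merge frees several blocks... */
	free_extent_later(&tp, 100, 1);
	free_extent_later(&tp, 205, 1);
	/* ...but nothing is actually freed until commit. */
	trans_commit(&tp);
	return 0;
}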
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Conflicts: fs/xfs/libxfs/xfs_ag.c fs/xfs/libxfs/xfs_alloc.c fs/xfs/libxfs/xfs_ialloc_btree.c fs/xfs/libxfs/xfs_refcount_btree.c fs/xfs/xfs_extfree_item.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 4 ++++ fs/xfs/libxfs/xfs_alloc.h | 8 +++++--- fs/xfs/libxfs/xfs_bmap.c | 8 +++++--- fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++- fs/xfs/libxfs/xfs_ialloc.c | 8 ++++---- fs/xfs/libxfs/xfs_ialloc_btree.c | 6 ++++-- fs/xfs/libxfs/xfs_refcount.c | 9 ++++++--- fs/xfs/libxfs/xfs_refcount_btree.c | 9 ++------- fs/xfs/xfs_extfree_item.c | 3 ++- fs/xfs/xfs_reflink.c | 3 ++- 10 files changed, 36 insertions(+), 25 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index ebfaf5f48010..2e7d442c117b 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2476,6 +2476,7 @@ xfs_defer_agfl_block( new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); new->xefi_blockcount = 1; new->xefi_owner = oinfo->oi_owner; + new->xefi_agresv = XFS_AG_RESV_AGFL;
if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, new->xefi_startblock))) return -EFSCORRUPTED; @@ -2496,6 +2497,7 @@ __xfs_free_extent_later( xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type, bool skip_discard) { struct xfs_extent_free_item *new; /* new element */ @@ -2516,6 +2518,7 @@ __xfs_free_extent_later( ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif ASSERT(xfs_extfree_item_cache != NULL); + ASSERT(type != XFS_AG_RESV_AGFL);
if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len))) return -EFSCORRUPTED; @@ -2524,6 +2527,7 @@ __xfs_free_extent_later( GFP_KERNEL | __GFP_NOFAIL); new->xefi_startblock = bno; new->xefi_blockcount = (xfs_extlen_t)len; + new->xefi_agresv = type; if (skip_discard) new->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 0fd02e889216..cfacb88e22d3 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -248,7 +248,7 @@ xfs_buf_to_agfl_bno(
int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard); + enum xfs_ag_resv_type type, bool skip_discard);
/* * List of extents to be free "later". @@ -260,6 +260,7 @@ struct xfs_extent_free_item { xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ unsigned int xefi_flags; + enum xfs_ag_resv_type xefi_agresv; };
#define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ @@ -271,9 +272,10 @@ xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, - const struct xfs_owner_info *oinfo) + const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type) { - return __xfs_free_extent_later(tp, bno, len, oinfo, false); + return __xfs_free_extent_later(tp, bno, len, oinfo, type, false); }
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 3eb1d31df6e1..c5d784daed8b 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -572,7 +572,8 @@ xfs_bmap_btree_to_extents( return error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error;
@@ -5194,8 +5195,9 @@ xfs_bmap_del_extent_real( } else { error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, - (bflags & XFS_BMAPI_NODISCARD) || - del->br_state == XFS_EXT_UNWRITTEN); + XFS_AG_RESV_NONE, + ((bflags & XFS_BMAPI_NODISCARD) || + del->br_state == XFS_EXT_UNWRITTEN)); if (error) goto done; } diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 021afb43c56b..4118c4c1443a 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -287,7 +287,8 @@ xfs_bmbt_free_block( int error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error; ip->i_d.di_nblocks--; diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index d26f966b0901..1861fc71f028 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1870,8 +1870,8 @@ xfs_difree_inode_chunk( /* not sparse, calculate extent info directly */ return xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); + M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES, + XFS_AG_RESV_NONE); }
/* holemask is only 16-bits (fits in an unsigned long) */ @@ -1916,8 +1916,8 @@ xfs_difree_inode_chunk( ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); error = xfs_free_extent_later(tp, - XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + XFS_AGB_TO_FSB(mp, agno, agbno), contigblk, + &XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE); if (error) return error;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index 0733270062f6..5150010ca4c6 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -154,9 +154,11 @@ __xfs_inobt_free_block( struct xfs_buf *bp, enum xfs_ag_resv_type resv) { + xfs_fsblock_t fsbno; + xfs_inobt_mod_blockcount(cur, -1); - return xfs_free_extent(cur->bc_tp, - XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)), 1, + fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)); + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_INOBT, resv); }
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 0e1a9f047b12..4191548b4736 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -975,7 +975,8 @@ xfs_refcount_adjust_extents( cur->bc_ag.agno, tmp.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL); + tmp.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; } @@ -1022,7 +1023,8 @@ xfs_refcount_adjust_extents( cur->bc_ag.agno, ext.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL); + ext.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; } @@ -1751,7 +1753,8 @@ xfs_refcount_recover_cow_leftovers(
/* Free the block. */ error = xfs_free_extent_later(tp, fsb, - rr->rr_rrec.rc_blockcount, NULL); + rr->rr_rrec.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_trans;
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index 59b1fee4f680..3e0d82f1802b 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -103,18 +103,13 @@ xfs_refcountbt_free_block( struct xfs_buf *agbp = cur->bc_ag.agbp; struct xfs_agf *agf = agbp->b_addr; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); - int error;
trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.agno, XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1); be32_add_cpu(&agf->agf_refcount_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS); - error = xfs_free_extent(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_REFC, - XFS_AG_RESV_METADATA); - if (error) - return error; - - return error; + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, + &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA); }
STATIC int diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 100a21fe0527..c9c648a618d6 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -383,7 +383,7 @@ xfs_trans_free_extent( free->xefi_blockcount);
error = __xfs_free_extent(tp, free->xefi_startblock, - free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + free->xefi_blockcount, &oinfo, free->xefi_agresv, free->xefi_flags & XFS_EFI_SKIP_DISCARD); /* * Mark the transaction dirty, even on error. This ensures the @@ -631,6 +631,7 @@ xfs_efi_item_recover( for (i = 0; i < efip->efi_format.efi_nextents; i++) { struct xfs_extent_free_item fake = { .xefi_owner = XFS_RMAP_OWN_UNKNOWN, + .xefi_agresv = XFS_AG_RESV_NONE, };
extp = &efip->efi_format.efi_extents[i]; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index e20aa87fed21..551cc92b049d 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -485,7 +485,8 @@ xfs_reflink_cancel_cow_blocks( del.br_blockcount);
error = xfs_free_extent_later(*tpp, del.br_startblock, - del.br_blockcount, NULL); + del.br_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) break;
hulk inclusion category: bugfix bugzilla: 187526, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
This reverts "xfs: propagate the return value of xfs_log_force() to avoid soft lockup" to avoid conflicts with mainline patches applied in subsequent rounds; the mainline patches solve the same problem that the reverted patch addressed.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 22 ++++++---------------- fs/xfs/xfs_extent_busy.c | 6 ++---- fs/xfs/xfs_extent_busy.h | 2 +- 3 files changed, 9 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 2e7d442c117b..ce3dcd57e756 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1629,11 +1629,8 @@ xfs_alloc_ag_vextent_near( if (!acur.len) { if (acur.busy) { trace_xfs_alloc_near_busy(args); - error = xfs_extent_busy_flush(args->mp, args->pag, + xfs_extent_busy_flush(args->mp, args->pag, acur.busy_gen); - if (error) - goto out; - goto restart; } trace_xfs_alloc_size_neither(args); @@ -1736,14 +1733,11 @@ xfs_alloc_ag_vextent_size( * Make it unbusy by forcing the log out and * retrying. */ - trace_xfs_alloc_size_busy(args); - error = xfs_extent_busy_flush(args->mp, - args->pag, busy_gen); - if (error) - goto error0; - xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); + trace_xfs_alloc_size_busy(args); + xfs_extent_busy_flush(args->mp, + args->pag, busy_gen); goto restart; } } @@ -1825,13 +1819,9 @@ xfs_alloc_ag_vextent_size( args->len = rlen; if (rlen < args->minlen) { if (busy) { - trace_xfs_alloc_size_busy(args); - error = xfs_extent_busy_flush(args->mp, args->pag, - busy_gen); - if (error) - goto error0; - xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); + trace_xfs_alloc_size_busy(args); + xfs_extent_busy_flush(args->mp, args->pag, busy_gen); goto restart; } goto out_nominleft; diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index ea3cee00149a..26680444969c 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -579,7 +579,7 @@ xfs_extent_busy_clear( /* * Flush out all busy extents for this AG. */ -int +void xfs_extent_busy_flush( struct xfs_mount *mp, struct xfs_perag *pag, @@ -590,7 +590,7 @@ xfs_extent_busy_flush(
error = xfs_log_force(mp, XFS_LOG_SYNC); if (error) - return error; + return;
do { prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE); @@ -600,8 +600,6 @@ xfs_extent_busy_flush( } while (1);
finish_wait(&pag->pagb_wait, &wait); - - return 0; }
void diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index 7099f4bb358c..8aea07100092 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -50,7 +50,7 @@ bool xfs_extent_busy_trim(struct xfs_alloc_arg *args, xfs_agblock_t *bno, xfs_extlen_t *len, unsigned *busy_gen);
-int +void xfs_extent_busy_flush(struct xfs_mount *mp, struct xfs_perag *pag, unsigned busy_gen);
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit 6a2a9d776c4ae24a797e25eed2b9f7f33f756296 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
To avoid blocking in xfs_extent_busy_flush() when freeing extents and the only busy extents are held by the current transaction, we need to pass the XFS_ALLOC_FLAG_FREEING flag context all the way into xfs_extent_busy_flush().
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Conflicts: fs/xfs/libxfs/xfs_alloc.c fs/xfs/libxfs/xfs_alloc.h Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 90 +++++++++++++++++++++------------------ fs/xfs/libxfs/xfs_alloc.h | 2 +- fs/xfs/xfs_extent_busy.c | 3 +- fs/xfs/xfs_extent_busy.h | 2 +- 4 files changed, 52 insertions(+), 45 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index ce3dcd57e756..11de77ff02b1 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -37,8 +37,8 @@ struct workqueue_struct *xfs_alloc_wq; #define XFSA_FIXUP_CNT_OK 2
STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *); -STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *); -STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *); +STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *, uint32_t); +STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *, uint32_t);
/* * Size of the AGFL. For CRC-enabled filesystes we steal a couple of slots in @@ -1127,7 +1127,8 @@ xfs_alloc_ag_vextent_small( */ STATIC int /* error */ xfs_alloc_ag_vextent( - xfs_alloc_arg_t *args) /* argument structure for allocation */ + xfs_alloc_arg_t *args, /* argument structure for allocation */ + uint32_t alloc_flags) { int error=0;
@@ -1143,10 +1144,10 @@ xfs_alloc_ag_vextent( args->wasfromfl = 0; switch (args->type) { case XFS_ALLOCTYPE_THIS_AG: - error = xfs_alloc_ag_vextent_size(args); + error = xfs_alloc_ag_vextent_size(args, alloc_flags); break; case XFS_ALLOCTYPE_NEAR_BNO: - error = xfs_alloc_ag_vextent_near(args); + error = xfs_alloc_ag_vextent_near(args, alloc_flags); break; case XFS_ALLOCTYPE_THIS_BNO: error = xfs_alloc_ag_vextent_exact(args); @@ -1554,7 +1555,8 @@ xfs_alloc_ag_vextent_lastblock( */ STATIC int xfs_alloc_ag_vextent_near( - struct xfs_alloc_arg *args) + struct xfs_alloc_arg *args, + uint32_t alloc_flags) { struct xfs_alloc_cur acur = {}; int error; /* error code */ @@ -1630,7 +1632,7 @@ xfs_alloc_ag_vextent_near( if (acur.busy) { trace_xfs_alloc_near_busy(args); xfs_extent_busy_flush(args->mp, args->pag, - acur.busy_gen); + acur.busy_gen, alloc_flags); goto restart; } trace_xfs_alloc_size_neither(args); @@ -1653,21 +1655,22 @@ xfs_alloc_ag_vextent_near( * and of the form k * prod + mod unless there's nothing that large. * Return the starting a.g. block, or NULLAGBLOCK if we can't do it. */ -STATIC int /* error */ +static int xfs_alloc_ag_vextent_size( - xfs_alloc_arg_t *args) /* allocation argument structure */ + struct xfs_alloc_arg *args, + uint32_t alloc_flags) { - struct xfs_agf *agf = args->agbp->b_addr; - xfs_btree_cur_t *bno_cur; /* cursor for bno btree */ - xfs_btree_cur_t *cnt_cur; /* cursor for cnt btree */ - int error; /* error result */ - xfs_agblock_t fbno; /* start of found freespace */ - xfs_extlen_t flen; /* length of found freespace */ - int i; /* temp status variable */ - xfs_agblock_t rbno; /* returned block number */ - xfs_extlen_t rlen; /* length of returned extent */ - bool busy; - unsigned busy_gen; + struct xfs_agf *agf = args->agbp->b_addr; + struct xfs_btree_cur *bno_cur; + struct xfs_btree_cur *cnt_cur; + xfs_agblock_t fbno; /* start of found freespace */ + xfs_extlen_t flen; /* length of found freespace */ + xfs_agblock_t rbno; /* returned block number */ + xfs_extlen_t rlen; /* length of returned extent */ + bool busy; + unsigned busy_gen; + int error; + int i;
restart: /* @@ -1736,8 +1739,8 @@ xfs_alloc_ag_vextent_size( xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, - args->pag, busy_gen); + xfs_extent_busy_flush(args->mp, args->pag, + busy_gen, alloc_flags); goto restart; } } @@ -1821,7 +1824,8 @@ xfs_alloc_ag_vextent_size( if (busy) { xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, busy_gen); + xfs_extent_busy_flush(args->mp, args->pag, busy_gen, + alloc_flags); goto restart; } goto out_nominleft; @@ -2545,7 +2549,7 @@ __xfs_free_extent_later( int /* error */ xfs_alloc_fix_freelist( struct xfs_alloc_arg *args, /* allocation argument structure */ - int flags) /* XFS_ALLOC_FLAG_... */ + uint32_t alloc_flags) { struct xfs_mount *mp = args->mp; struct xfs_perag *pag = args->pag; @@ -2561,7 +2565,7 @@ xfs_alloc_fix_freelist( ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
if (!pag->pagf_init) { - error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp); + error = xfs_alloc_read_agf(mp, tp, args->agno, alloc_flags, &agbp); if (error) { /* Couldn't lock the AGF so skip this AG. */ if (error == -EAGAIN) @@ -2576,8 +2580,8 @@ xfs_alloc_fix_freelist( * point */ if (pag->pagf_metadata && (args->datatype & XFS_ALLOC_USERDATA) && - (flags & XFS_ALLOC_FLAG_TRYLOCK)) { - ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING)); + (alloc_flags & XFS_ALLOC_FLAG_TRYLOCK)) { + ASSERT(!(alloc_flags & XFS_ALLOC_FLAG_FREEING)); goto out_agbp_relse; }
@@ -2587,7 +2591,7 @@ xfs_alloc_fix_freelist( * transaction if needed. */ need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs); - if (!xfs_alloc_space_available(args, need, flags | + if (!xfs_alloc_space_available(args, need, alloc_flags | XFS_ALLOC_FLAG_CHECK)) goto out_agbp_relse;
@@ -2596,7 +2600,7 @@ xfs_alloc_fix_freelist( * Can fail if we're not blocking on locks, and it's held. */ if (!agbp) { - error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp); + error = xfs_alloc_read_agf(mp, tp, args->agno, alloc_flags, &agbp); if (error) { /* Couldn't lock the AGF so skip this AG. */ if (error == -EAGAIN) @@ -2611,7 +2615,7 @@ xfs_alloc_fix_freelist(
/* If there isn't enough total space or single-extent, reject it. */ need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs); - if (!xfs_alloc_space_available(args, need, flags)) + if (!xfs_alloc_space_available(args, need, alloc_flags)) goto out_agbp_relse;
/* @@ -2640,11 +2644,12 @@ xfs_alloc_fix_freelist( */ memset(&targs, 0, sizeof(targs)); /* struct copy below */ - if (flags & XFS_ALLOC_FLAG_NORMAP) + if (alloc_flags & XFS_ALLOC_FLAG_NORMAP) targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE; else targs.oinfo = XFS_RMAP_OINFO_AG; - while (!(flags & XFS_ALLOC_FLAG_NOSHRINK) && pag->pagf_flcount > need) { + while (!(alloc_flags & XFS_ALLOC_FLAG_NOSHRINK) && + pag->pagf_flcount > need) { error = xfs_alloc_get_freelist(tp, agbp, &bno, 0); if (error) goto out_agbp_relse; @@ -2673,7 +2678,7 @@ xfs_alloc_fix_freelist( targs.resv = XFS_AG_RESV_AGFL;
/* Allocate as many blocks as possible at once. */ - error = xfs_alloc_ag_vextent(&targs); + error = xfs_alloc_ag_vextent(&targs, alloc_flags); if (error) goto out_agflbp_relse;
@@ -2683,7 +2688,7 @@ xfs_alloc_fix_freelist( * on a completely full ag. */ if (targs.agbno == NULLAGBLOCK) { - if (flags & XFS_ALLOC_FLAG_FREEING) + if (alloc_flags & XFS_ALLOC_FLAG_FREEING) break; goto out_agflbp_relse; } @@ -3130,7 +3135,7 @@ xfs_alloc_vextent( { xfs_agblock_t agsize; /* allocation group size */ int error; - int flags; /* XFS_ALLOC_FLAG_... locking flags */ + uint32_t alloc_flags; /* XFS_ALLOC_FLAG_... locking flags */ struct xfs_mount *mp; /* mount structure pointer */ xfs_agnumber_t sagno; /* starting allocation group number */ xfs_alloctype_t type; /* input allocation type */ @@ -3183,7 +3188,7 @@ xfs_alloc_vextent( break; } args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno); - if ((error = xfs_alloc_ag_vextent(args))) + if ((error = xfs_alloc_ag_vextent(args, 0))) goto error0; break; case XFS_ALLOCTYPE_START_BNO: @@ -3212,13 +3217,13 @@ xfs_alloc_vextent( args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno); args->type = XFS_ALLOCTYPE_THIS_AG; sagno = 0; - flags = 0; + alloc_flags = 0; } else { /* * Start with the given allocation group. */ args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno); - flags = XFS_ALLOC_FLAG_TRYLOCK; + alloc_flags = XFS_ALLOC_FLAG_TRYLOCK; } /* * Loop over allocation groups twice; first time with @@ -3226,7 +3231,7 @@ xfs_alloc_vextent( */ for (;;) { args->pag = xfs_perag_get(mp, args->agno); - error = xfs_alloc_fix_freelist(args, flags); + error = xfs_alloc_fix_freelist(args, alloc_flags); if (error) { trace_xfs_alloc_vextent_nofix(args); goto error0; @@ -3235,7 +3240,8 @@ xfs_alloc_vextent( * If we get a buffer back then the allocation will fly. */ if (args->agbp) { - if ((error = xfs_alloc_ag_vextent(args))) + if ((error = xfs_alloc_ag_vextent(args, + alloc_flags))) goto error0; break; } @@ -3266,13 +3272,13 @@ xfs_alloc_vextent( * or switch to non-trylock mode. */ if (args->agno == sagno) { - if (flags == 0) { + if (alloc_flags == 0) { args->agbno = NULLAGBLOCK; trace_xfs_alloc_vextent_allfailed(args); break; }
- flags = 0; + alloc_flags = 0; if (type == XFS_ALLOCTYPE_START_BNO) { args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno); diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index cfacb88e22d3..8435d02622cc 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -211,7 +211,7 @@ int xfs_alloc_read_agfl(struct xfs_mount *mp, struct xfs_trans *tp, xfs_agnumber_t agno, struct xfs_buf **bpp); int xfs_free_agfl_block(struct xfs_trans *, xfs_agnumber_t, xfs_agblock_t, struct xfs_buf *, struct xfs_owner_info *); -int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags); +int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, uint32_t alloc_flags); int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno, struct xfs_buf **agbp);
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index 26680444969c..914011cf8c14 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -583,7 +583,8 @@ void xfs_extent_busy_flush( struct xfs_mount *mp, struct xfs_perag *pag, - unsigned busy_gen) + unsigned busy_gen, + uint32_t alloc_flags) { DEFINE_WAIT (wait); int error; diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index 8aea07100092..d70d395ce642 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -52,7 +52,7 @@ xfs_extent_busy_trim(struct xfs_alloc_arg *args, xfs_agblock_t *bno,
void xfs_extent_busy_flush(struct xfs_mount *mp, struct xfs_perag *pag, - unsigned busy_gen); + unsigned busy_gen, uint32_t alloc_flags);
void xfs_extent_busy_wait_all(struct xfs_mount *mp);
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit 0853b5de42b471a92f4ff128a8757b87427d2431 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Extent freeing needs to be able to avoid a busy extent deadlock when the transaction itself holds the only busy extents in the allocation group. This may occur if we have an EFI that contains multiple extents to be freed, and freeing the second extent requires the space released by the first extent free in order to expand the AGFL. If we block on the busy extent at this point, we deadlock.
We hold a dirty transaction that contains an entire atomic extent free operation within it, so if we can abort the extent free operation and commit the progress that we've made, the busy extent can be resolved by a log force. Hence we can restart the aborted extent free with a new transaction and continue to make progress without risking deadlocks.
To enable this, we need the EFI processing code to be able to handle an -EAGAIN error to tell it to commit the current transaction and retry again. This mechanism is already built into the defer ops processing (used by the refcount btree modification intents), so there's relatively little handling we need to add to the EFI code to enable this.
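To illustrate the contract the EFI code is opting into, here is a minimal sketch of a defer ops finish loop that honours -EAGAIN. This is a hypothetical simplification, not the real implementation: example_defer_finish() and example_finish_one() are invented names, and xfs_defer_trans_roll() stands in for the real transaction-rolling logic.

/*
 * Hypothetical sketch only: on -EAGAIN from an item's finish callback,
 * commit the progress made so far by rolling the transaction (which
 * moves the busy extents to the CIL so a log force can resolve them),
 * then retry the item under the new transaction. Simplified from the
 * shape of xfs_defer_finish_noroll(); not its code.
 */
static int
example_defer_finish(
	struct xfs_trans	**tpp,
	struct list_head	*dop_pending)
{
	int			error;

	while (!list_empty(dop_pending)) {
		error = example_finish_one(*tpp, dop_pending);
		if (error == -EAGAIN) {
			/* commit progress; relog a new intent and retry */
			error = xfs_defer_trans_roll(tpp);
			if (error)
				return error;
			continue;
		}
		if (error)
			return error;
	}
	return 0;
}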
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/xfs_extfree_item.c --- fs/xfs/xfs_extfree_item.c | 68 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index c9c648a618d6..24ff79d1a123 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -352,6 +352,34 @@ xfs_trans_get_efd( return efdp; }
+/* + * Fill the EFD with all extents from the EFI when we need to roll the + * transaction and continue with a new EFI. + * + * This simply copies all the extents in the EFI to the EFD rather than make + * assumptions about which extents in the EFI have already been processed. We + * currently keep the xefi list in the same order as the EFI extent list, but + * that may not always be the case. Copying everything avoids leaving a landmine + * where we fail to cancel all the extents in an EFI if the xefi list is + * processed in a different order to the extents in the EFI. + */ +static void +xfs_efd_from_efi( + struct xfs_efd_log_item *efdp) +{ + struct xfs_efi_log_item *efip = efdp->efd_efip; + uint i; + + ASSERT(efip->efi_format.efi_nextents > 0); + ASSERT(efdp->efd_next_extent < efip->efi_format.efi_nextents); + + for (i = 0; i < efip->efi_format.efi_nextents; i++) { + efdp->efd_format.efd_extents[i] = + efip->efi_format.efi_extents[i]; + } + efdp->efd_next_extent = efip->efi_format.efi_nextents; +} + /* * Free an extent and log it to the EFD. Note that the transaction is marked * dirty regardless of whether the extent free succeeds or fails to support the @@ -395,6 +423,17 @@ xfs_trans_free_extent( tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
+ /* + * If we need a new transaction to make progress, the caller will log a + * new EFI with the current contents. It will also log an EFD to cancel + * the existing EFI, and so we need to copy all the unprocessed extents + * in this EFI to the EFD so this works correctly. + */ + if (error == -EAGAIN) { + xfs_efd_from_efi(efdp); + return error; + } + next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); extp = &(efdp->efd_format.efd_extents[next_extent]); @@ -492,6 +531,14 @@ xfs_extent_free_finish_item( free = container_of(item, struct xfs_extent_free_item, xefi_list);
error = xfs_trans_free_extent(tp, EFD_ITEM(done), free); + + /* + * Don't free the XEFI if we need a new transaction to complete + * processing of it. + */ + if (error == -EAGAIN) + return error; + kmem_cache_free(xfs_extfree_item_cache, free); return error; } @@ -606,6 +653,7 @@ xfs_efi_item_recover( xfs_fsblock_t startblock_fsb; int i; int error = 0; + bool requeue_only = false;
/* * First check the validity of the extents described by the @@ -639,7 +687,25 @@ xfs_efi_item_recover( fake.xefi_startblock = extp->ext_start; fake.xefi_blockcount = extp->ext_len;
- error = xfs_trans_free_extent(tp, efdp, &fake); + if (!requeue_only) + error = xfs_trans_free_extent(tp, efdp, &fake); + + /* + * If we can't free the extent without potentially deadlocking, + * requeue the rest of the extents to a new transaction so that + * they get run again later with a new transaction context. + */ + if (error == -EAGAIN || requeue_only) { + error = xfs_free_extent_later(tp, fake.xefi_startblock, + fake.xefi_blockcount, + &XFS_RMAP_OINFO_ANY_OWNER, + fake.xefi_agresv); + if (!error) { + requeue_only = true; + continue; + } + } + if (error) goto abort_error;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit 8ebbf262d4684e035af5e7aa2a71cab636673a9b category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the current transaction holds a busy extent and we are trying to allocate a new extent to fix up the free list, we can deadlock if the AG is entirely empty except for the busy extent held by the transaction.
This can occur at runtime processing an XEFI with multiple extents in this path:
__schedule+0x22f at ffffffff81f75e8f
schedule+0x46 at ffffffff81f76366
xfs_extent_busy_flush+0x69 at ffffffff81477d99
xfs_alloc_ag_vextent_size+0x16a at ffffffff8141711a
xfs_alloc_ag_vextent+0x19b at ffffffff81417edb
xfs_alloc_fix_freelist+0x22f at ffffffff8141896f
xfs_free_extent_fix_freelist+0x6a at ffffffff8141939a
__xfs_free_extent+0x99 at ffffffff81419499
xfs_trans_free_extent+0x3e at ffffffff814a6fee
xfs_extent_free_finish_item+0x24 at ffffffff814a70d4
xfs_defer_finish_noroll+0x1f7 at ffffffff81441407
xfs_defer_finish+0x11 at ffffffff814417e1
xfs_itruncate_extents_flags+0x13d at ffffffff8148b7dd
xfs_inactive_truncate+0xb9 at ffffffff8148bb89
xfs_inactive+0x227 at ffffffff8148c4f7
xfs_fs_destroy_inode+0xb8 at ffffffff81496898
destroy_inode+0x3b at ffffffff8127d2ab
do_unlinkat+0x1d1 at ffffffff81270df1
do_syscall_64+0x40 at ffffffff81f6b5f0
entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8200007c
This can also happen in log recovery when processing an EFI with multiple extents through this path:
context_switch() kernel/sched/core.c:3881
__schedule() kernel/sched/core.c:5111
schedule() kernel/sched/core.c:5186
xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598
xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641
xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828
xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362
xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029
__xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067
xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370
xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626
xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605
xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893
xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824
xfs_log_mount_finish() fs/xfs/xfs_log.c:764
xfs_mountfs() fs/xfs/xfs_mount.c:978
xfs_fs_fill_super() fs/xfs/xfs_super.c:1908
mount_bdev() fs/super.c:1417
xfs_fs_mount() fs/xfs/xfs_super.c:1985
legacy_get_tree() fs/fs_context.c:647
vfs_get_tree() fs/super.c:1547
do_new_mount() fs/namespace.c:2843
do_mount() fs/namespace.c:3163
ksys_mount() fs/namespace.c:3372
__do_sys_mount() fs/namespace.c:3386
__se_sys_mount() fs/namespace.c:3383
__x64_sys_mount() fs/namespace.c:3383
do_syscall_64() arch/x86/entry/common.c:296
entry_SYSCALL_64() arch/x86/entry/entry_64.S:180
To avoid this deadlock, we should not block in xfs_extent_busy_flush() if we hold a busy extent in the current transaction.
Now that the EFI processing code can handle requeuing a partially completed EFI, we can detect this situation in xfs_extent_busy_flush() and return -EAGAIN rather than going to sleep forever. The -EAGAIN gets propagated back out to the xfs_trans_free_extent() context, where the EFD is populated and the transaction is rolled, thereby moving the busy extents into the CIL.
At this point, we can retry the extent free operation again with a clean transaction. If we hit the same "all free extents are busy" situation when trying to fix up the free list, we can safely call xfs_extent_busy_flush() and wait for the busy extents to resolve and wake us. At this point, the allocation search can make progress again and we can fix up the free list.
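Condensed into a sketch (trimmed from the hunks below, not a drop-in replacement), the flush path then makes its blocking decision in this order:

	error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
	if (error)
		return error;

	/* Avoid deadlocks on uncommitted busy extents. */
	if (!list_empty(&tp->t_busy)) {
		if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH)
			return 0;	/* caller retries once, unblocked */
		if (busy_gen != READ_ONCE(pag->pagb_gen))
			return 0;	/* progress was made; retry */
		if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
			return -EAGAIN;	/* commit held busy extents first */
	}
	/* Otherwise it is safe to sleep until the busy generation changes. */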
This deadlock was first reported by Chandan in mid-2021, but I couldn't make myself understood during review, and didn't have time to fix it myself.
It was reported again in March 2023, and again I have found myself unable to explain the complexities of the solution needed during review.
As such, I don't have hours more time to waste trying to get the fix written the way it needs to be written, so I'm just doing it myself. This patchset is largely based on Wengang Wang's last patch, but with all the unnecessary stuff removed, split up into multiple patches and cleaned up somewhat.
Reported-by: Chandan Babu R chandanrlinux@gmail.com Reported-by: Wengang Wang wen.gang.wang@oracle.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 68 ++++++++++++++++++++++++++++----------- fs/xfs/libxfs/xfs_alloc.h | 11 ++++--- fs/xfs/xfs_extent_busy.c | 33 ++++++++++++++++--- fs/xfs/xfs_extent_busy.h | 6 ++-- 4 files changed, 88 insertions(+), 30 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 11de77ff02b1..0c99287d0d89 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1575,6 +1575,8 @@ xfs_alloc_ag_vextent_near( if (args->agbno > args->max_agbno) args->agbno = args->max_agbno;
+ /* Retry once quickly if we find busy extents before blocking. */ + alloc_flags |= XFS_ALLOC_FLAG_TRYFLUSH; restart: len = 0;
@@ -1630,9 +1632,20 @@ xfs_alloc_ag_vextent_near( */ if (!acur.len) { if (acur.busy) { + /* + * Our only valid extents must have been busy. Flush and + * retry the allocation again. If we get an -EAGAIN + * error, we're being told that a deadlock was avoided + * and the current transaction needs committing before + * the allocation can be retried. + */ trace_xfs_alloc_near_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, - acur.busy_gen, alloc_flags); + error = xfs_extent_busy_flush(args->tp, args->pag, + acur.busy_gen, alloc_flags); + if (error) + goto out; + + alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH; goto restart; } trace_xfs_alloc_size_neither(args); @@ -1672,6 +1685,8 @@ xfs_alloc_ag_vextent_size( int error; int i;
+ /* Retry once quickly if we find busy extents before blocking. */ + alloc_flags |= XFS_ALLOC_FLAG_TRYFLUSH; restart: /* * Allocate and initialize a cursor for the by-size btree. @@ -1730,19 +1745,25 @@ xfs_alloc_ag_vextent_size( error = xfs_btree_increment(cnt_cur, 0, &i); if (error) goto error0; - if (i == 0) { - /* - * Our only valid extents must have been busy. - * Make it unbusy by forcing the log out and - * retrying. - */ - xfs_btree_del_cursor(cnt_cur, - XFS_BTREE_NOERROR); - trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, - busy_gen, alloc_flags); - goto restart; - } + if (i) + continue; + + /* + * Our only valid extents must have been busy. Flush and + * retry the allocation again. If we get an -EAGAIN + * error, we're being told that a deadlock was avoided + * and the current transaction needs committing before + * the allocation can be retried. + */ + trace_xfs_alloc_size_busy(args); + error = xfs_extent_busy_flush(args->tp, args->pag, + busy_gen, alloc_flags); + if (error) + goto error0; + + alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH; + xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); + goto restart; } }
@@ -1822,10 +1843,21 @@ xfs_alloc_ag_vextent_size( args->len = rlen; if (rlen < args->minlen) { if (busy) { - xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); + /* + * Our only valid extents must have been busy. Flush and + * retry the allocation again. If we get an -EAGAIN + * error, we're being told that a deadlock was avoided + * and the current transaction needs committing before + * the allocation can be retried. + */ trace_xfs_alloc_size_busy(args); - xfs_extent_busy_flush(args->mp, args->pag, busy_gen, - alloc_flags); + error = xfs_extent_busy_flush(args->tp, args->pag, + busy_gen, alloc_flags); + if (error) + goto error0; + + alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH; + xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); goto restart; } goto out_nominleft; diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 8435d02622cc..6ae1ade607cd 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -38,11 +38,12 @@ typedef unsigned int xfs_alloctype_t; /* * Flags for xfs_alloc_fix_freelist. */ -#define XFS_ALLOC_FLAG_TRYLOCK 0x00000001 /* use trylock for buffer locking */ -#define XFS_ALLOC_FLAG_FREEING 0x00000002 /* indicate caller is freeing extents*/ -#define XFS_ALLOC_FLAG_NORMAP 0x00000004 /* don't modify the rmapbt */ -#define XFS_ALLOC_FLAG_NOSHRINK 0x00000008 /* don't shrink the freelist */ -#define XFS_ALLOC_FLAG_CHECK 0x00000010 /* test only, don't modify args */ +#define XFS_ALLOC_FLAG_TRYLOCK (1U << 0) /* use trylock for buffer locking */ +#define XFS_ALLOC_FLAG_FREEING (1U << 1) /* indicate caller is freeing extents*/ +#define XFS_ALLOC_FLAG_NORMAP (1U << 2) /* don't modify the rmapbt */ +#define XFS_ALLOC_FLAG_NOSHRINK (1U << 3) /* don't shrink the freelist */ +#define XFS_ALLOC_FLAG_CHECK (1U << 4) /* test only, don't modify args */ +#define XFS_ALLOC_FLAG_TRYFLUSH (1U << 5) /* don't wait in busy extent flush */
/* * Argument structure for xfs_alloc routines. diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index 914011cf8c14..3b71e1714514 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -578,10 +578,21 @@ xfs_extent_busy_clear(
/* * Flush out all busy extents for this AG. + * + * If the current transaction is holding busy extents, the caller may not want + * to wait for committed busy extents to resolve. If we are being told just to + * try a flush or progress has been made since we last skipped a busy extent, + * return immediately to allow the caller to try again. + * + * If we are freeing extents, we might actually be holding the only free extents + * in the transaction busy list and the log force won't resolve that situation. + * In this case, we must return -EAGAIN to avoid a deadlock by informing the + * caller it needs to commit the busy extents it holds before retrying the + * extent free operation. */ -void +int xfs_extent_busy_flush( - struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_perag *pag, unsigned busy_gen, uint32_t alloc_flags) @@ -589,10 +600,23 @@ xfs_extent_busy_flush( DEFINE_WAIT (wait); int error;
- error = xfs_log_force(mp, XFS_LOG_SYNC); + error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC); if (error) - return; + return error; + + /* Avoid deadlocks on uncommitted busy extents. */ + if (!list_empty(&tp->t_busy)) { + if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH) + return 0; + + if (busy_gen != READ_ONCE(pag->pagb_gen)) + return 0; + + if (alloc_flags & XFS_ALLOC_FLAG_FREEING) + return -EAGAIN; + }
+ /* Wait for committed busy extents to resolve. */ do { prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE); if (busy_gen != READ_ONCE(pag->pagb_gen)) @@ -601,6 +625,7 @@ xfs_extent_busy_flush( } while (1);
finish_wait(&pag->pagb_wait, &wait); + return 0; }
void diff --git a/fs/xfs/xfs_extent_busy.h b/fs/xfs/xfs_extent_busy.h index d70d395ce642..f145d5320519 100644 --- a/fs/xfs/xfs_extent_busy.h +++ b/fs/xfs/xfs_extent_busy.h @@ -50,9 +50,9 @@ bool xfs_extent_busy_trim(struct xfs_alloc_arg *args, xfs_agblock_t *bno, xfs_extlen_t *len, unsigned *busy_gen);
-void -xfs_extent_busy_flush(struct xfs_mount *mp, struct xfs_perag *pag, - unsigned busy_gen, uint32_t alloc_flags); +int +xfs_extent_busy_flush(struct xfs_trans *tp, struct xfs_perag *pag, + unsigned busy_gen, uint32_t alloc_flags);
void xfs_extent_busy_wait_all(struct xfs_mount *mp);
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit f1e1765aad7de7a8b8102044fc6a44684bc36180 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If the journal geometry results in a sector or log stripe unit validation problem, it indicates that we cannot set the log up to safely write to the journal. In these cases, we must abort the mount because the corruption needs external intervention to resolve. Similarly, a journal that is too large cannot be written to safely, either, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations overrunning the available log space and the system hanging waiting for space it can never provide. This is purely a runtime hang issue, not a corruption issue as per the first cases listed above. We abort mounts if the log is too small for V5 filesystems, but we must allow v4 filesystems to mount because, historically, there was no log size validity checking and so some systems may still be out there with undersized logs.
The problem is that on V4 filesystems, when we discover a log geometry problem, we skip all the remaining checks and then allow the log to continue mounting. This means that if one of the log size checks fails, we skip the log stripe unit check; i.e. we allow the mount because a "non-fatal" geometry is violated, and then fail to check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a new check for the two log sector size geometry variables having the same values. This will prevent any attempt to mount a log that has invalid or inconsistent geometries long before we attempt to mount the log.
However, for the minimum log size checks, we can only do that once we've set up the log and calculated all the iclog sizes and roundoffs. Hence this needs to remain in the log mount code after the log has been initialised. It is also the only case where we should allow a v4 filesystem to continue running, so leave that handling in place, too.
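Condensed, the division of labour after this patch looks roughly like the sketch below (trimmed from the diff; the minimum size check is the only one left at log mount time):

	/* xfs_validate_sb_common(): fatal for all filesystems, v4 and v5 */
	if (sbp->sb_logblocks > XFS_MAX_LOG_BLOCKS)
		return -EFSCORRUPTED;		/* log too large */
	if (XFS_FSB_TO_B(mp, sbp->sb_logblocks) > XFS_MAX_LOG_BYTES)
		return -EFSCORRUPTED;		/* log too large in bytes */
	/* ... sector size and stripe unit consistency checks ... */

	/* xfs_log_mount(): needs the iclog geometry, so it stays here */
	min_logfsbs = xfs_log_calc_minimum_size(mp);
	if (mp->m_sb.sb_logblocks < min_logfsbs) {
		xfs_warn(mp,
	"Log size %d blocks too small, minimum size is %d blocks",
			 mp->m_sb.sb_logblocks, min_logfsbs);
		if (xfs_has_crc(mp)) {		/* fatal on v5 only */
			error = -EINVAL;
			goto out_free_log;
		}
		/* v4: warn and continue for historical compatibility */
	}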
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_sb.c | 56 +++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_log.c | 47 +++++++++++------------------------ 2 files changed, 70 insertions(+), 33 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index af313e9bf55e..c099ccf2787d 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -419,7 +419,6 @@ xfs_validate_sb_common( sbp->sb_inodelog < XFS_DINODE_MIN_LOG || sbp->sb_inodelog > XFS_DINODE_MAX_LOG || sbp->sb_inodesize != (1 << sbp->sb_inodelog) || - sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE || sbp->sb_inopblock != howmany(sbp->sb_blocksize,sbp->sb_inodesize) || XFS_FSB_TO_B(mp, sbp->sb_agblocks) < XFS_MIN_AG_BYTES || XFS_FSB_TO_B(mp, sbp->sb_agblocks) > XFS_MAX_AG_BYTES || @@ -437,6 +436,61 @@ xfs_validate_sb_common( return -EFSCORRUPTED; }
+ /* + * Logs that are too large are not supported at all. Reject them + * outright. Logs that are too small are tolerated on v4 filesystems, + * but we can only check that when mounting the log. Hence we skip + * those checks here. + */ + if (sbp->sb_logblocks > XFS_MAX_LOG_BLOCKS) { + xfs_notice(mp, + "Log size 0x%x blocks too large, maximum size is 0x%llx blocks", + sbp->sb_logblocks, XFS_MAX_LOG_BLOCKS); + return -EFSCORRUPTED; + } + + if (XFS_FSB_TO_B(mp, sbp->sb_logblocks) > XFS_MAX_LOG_BYTES) { + xfs_warn(mp, + "log size 0x%llx bytes too large, maximum size is 0x%llx bytes", + XFS_FSB_TO_B(mp, sbp->sb_logblocks), + XFS_MAX_LOG_BYTES); + return -EFSCORRUPTED; + } + + /* + * Do not allow filesystems with corrupted log sector or stripe units to + * be mounted. We cannot safely size the iclogs or write to the log if + * the log stripe unit is not valid. + */ + if (sbp->sb_versionnum & XFS_SB_VERSION_SECTORBIT) { + if (sbp->sb_logsectsize != (1U << sbp->sb_logsectlog)) { + xfs_notice(mp, + "log sector size in bytes/log2 (0x%x/0x%x) must match", + sbp->sb_logsectsize, 1U << sbp->sb_logsectlog); + return -EFSCORRUPTED; + } + } else if (sbp->sb_logsectsize || sbp->sb_logsectlog) { + xfs_notice(mp, + "log sector size in bytes/log2 (0x%x/0x%x) are not zero", + sbp->sb_logsectsize, sbp->sb_logsectlog); + return -EFSCORRUPTED; + } + + if (sbp->sb_logsunit > 1) { + if (sbp->sb_logsunit % sbp->sb_blocksize) { + xfs_notice(mp, + "log stripe unit 0x%x bytes must be a multiple of block size", + sbp->sb_logsunit); + return -EFSCORRUPTED; + } + if (sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE) { + xfs_notice(mp, + "log stripe unit 0x%x bytes over maximum size (0x%x bytes)", + sbp->sb_logsunit, XLOG_MAX_RECORD_BSIZE); + return -EFSCORRUPTED; + } + } + /* Validate the realtime geometry; stolen from xfs_repair */ if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE || sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) { diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index c9bfafd0b9f5..059f0c61f94c 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -613,7 +613,6 @@ xfs_log_mount( int num_bblks) { struct xlog *log; - bool fatal = xfs_has_crc(mp); int error = 0; int min_logfsbs;
@@ -635,53 +634,37 @@ xfs_log_mount( mp->m_log = log;
/* - * Validate the given log space and drop a critical message via syslog - * if the log size is too small that would lead to some unexpected - * situations in transaction log space reservation stage. + * Now that we have set up the log and its internal geometry + * parameters, we can validate the given log space and drop a critical + * message via syslog if the log size is too small. A log that is too + * small can lead to unexpected situations in transaction log space + * reservation stage. The superblock verifier has already validated all + * the other log geometry constraints, so we don't have to check those + * here. * - * Note: we can't just reject the mount if the validation fails. This - * would mean that people would have to downgrade their kernel just to - * remedy the situation as there is no way to grow the log (short of - * black magic surgery with xfs_db). + * Note: For v4 filesystems, we can't just reject the mount if the + * validation fails. This would mean that people would have to + * downgrade their kernel just to remedy the situation as there is no + * way to grow the log (short of black magic surgery with xfs_db). * - * We can, however, reject mounts for CRC format filesystems, as the + * We can, however, reject mounts for V5 format filesystems, as the * mkfs binary being used to make the filesystem should never create a * filesystem with a log that is too small. */ min_logfsbs = xfs_log_calc_minimum_size(mp); - if (mp->m_sb.sb_logblocks < min_logfsbs) { xfs_warn(mp, "Log size %d blocks too small, minimum size is %d blocks", mp->m_sb.sb_logblocks, min_logfsbs); - error = -EINVAL; - } else if (mp->m_sb.sb_logblocks > XFS_MAX_LOG_BLOCKS) { - xfs_warn(mp, - "Log size %d blocks too large, maximum size is %lld blocks", - mp->m_sb.sb_logblocks, XFS_MAX_LOG_BLOCKS); - error = -EINVAL; - } else if (XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) > XFS_MAX_LOG_BYTES) { - xfs_warn(mp, - "log size %lld bytes too large, maximum size is %lld bytes", - XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks), - XFS_MAX_LOG_BYTES); - error = -EINVAL; - } else if (mp->m_sb.sb_logsunit > 1 && - mp->m_sb.sb_logsunit % mp->m_sb.sb_blocksize) { - xfs_warn(mp, - "log stripe unit %u bytes must be a multiple of block size", - mp->m_sb.sb_logsunit); - error = -EINVAL; - fatal = true; - } - if (error) { + /* * Log check errors are always fatal on v5; or whenever bad * metadata leads to a crash. */ - if (fatal) { + if (xfs_has_crc(mp)) { xfs_crit(mp, "AAIEEE! Log failed size checks. Abort!"); ASSERT(0); + error = -EINVAL; goto out_free_log; } xfs_crit(mp, "Log size out of supported range.");
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit edd8276dd70279c29d412d99b99c2c0cac1b2cdd category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The AGF verifier does not check that the AGF length field is within known good bounds. This has never been checked by runtime kernel code (i.e. the lack of verification goes back to 1993) yet we assume in many places that it is correct and verify other metadata against it.
Add length verification to the AGF verifier. The length of the AGF must be equal to the size of the AG specified in the superblock, unless it is the last AG in the filesystem. In that case, it must be less than or equal to sb->sb_agblocks and greater than XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs operation will allow to exist.
This requires a bit of rework of the verifier function. We want to verify metadata before we use it to verify other metadata. Hence we need to verify the AGF sequence numbers before using them to verify the length of the AGF. Then we can verify the AGF length before we verify the AGFL fields. Then we can verify other fields that are bounds limited by the AGF length.
And, finally, by calculating agf_length only once into a local variable, we can collapse repeated "if (xfs_has_foo() &&" conditional checks into single checks. This makes the code much easier to follow as all the checks for a given feature are obviously in the same place.
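The resulting shape of the verifier can be sketched as follows (condensed from the diff below): identity first, then the length, then everything that is bounded by it.

	uint32_t	agf_length = be32_to_cpu(agf->agf_length);

	/* 1. Validate the sequence number before trusting anything else. */
	if (bp->b_pag && be32_to_cpu(agf->agf_seqno) != bp->b_pag->pag_agno)
		return __this_address;

	/* 2. Only the last AG may be shorter than sb_agblocks. */
	if (agf_length != mp->m_sb.sb_agblocks) {
		if (agf_length < XFS_MIN_AG_BLOCKS ||
		    agf_length > mp->m_sb.sb_agblocks)
			return __this_address;
	}

	/* 3. Now AGFL indices and per-feature fields can use agf_length. */
	if (xfs_has_rmapbt(mp)) {
		if (be32_to_cpu(agf->agf_rmap_blocks) > agf_length)
			return __this_address;
		/* ... level checks grouped in the same branch ... */
	}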
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org
Conflicts: fs/xfs/libxfs/xfs_alloc.c Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 89 ++++++++++++++++++++++++--------------- 1 file changed, 56 insertions(+), 33 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 0c99287d0d89..4a139583a905 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2939,6 +2939,7 @@ xfs_agf_verify( { struct xfs_mount *mp = bp->b_mount; struct xfs_agf *agf = bp->b_addr; + uint32_t agf_length = be32_to_cpu(agf->agf_length);
if (xfs_has_crc(mp)) { if (!uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid)) @@ -2950,18 +2951,49 @@ xfs_agf_verify( if (!xfs_verify_magic(bp, agf->agf_magicnum)) return __this_address;
- if (!(XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) && - be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) && - be32_to_cpu(agf->agf_flfirst) < xfs_agfl_size(mp) && - be32_to_cpu(agf->agf_fllast) < xfs_agfl_size(mp) && - be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp))) + if (!XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum))) return __this_address;
- if (be32_to_cpu(agf->agf_length) > mp->m_sb.sb_dblocks) + /* + * Both agf_seqno and agf_length need to be validated before anything else + * block number related in the AGF or AGFL can be checked. + * + * During growfs operations, the perag is not fully initialised, + * so we can't use it for any useful checking. growfs ensures we can't + * use it by using uncached buffers that don't have the perag attached + * so we can detect and avoid this problem. + */ + if (bp->b_pag && be32_to_cpu(agf->agf_seqno) != bp->b_pag->pag_agno) + return __this_address; + + /* + * Only the last AGF in the filesystem is allowed to be shorter + * than the AG size recorded in the superblock. + */ + if (agf_length != mp->m_sb.sb_agblocks) { + /* + * During growfs, the new last AGF can get here before we + * have updated the superblock. Give it a pass on the seqno + * check. + */ + if (bp->b_pag && + be32_to_cpu(agf->agf_seqno) != mp->m_sb.sb_agcount - 1) + return __this_address; + if (agf_length < XFS_MIN_AG_BLOCKS) + return __this_address; + if (agf_length > mp->m_sb.sb_agblocks) + return __this_address; + } + + if (be32_to_cpu(agf->agf_flfirst) >= xfs_agfl_size(mp)) + return __this_address; + if (be32_to_cpu(agf->agf_fllast) >= xfs_agfl_size(mp)) + return __this_address; + if (be32_to_cpu(agf->agf_flcount) > xfs_agfl_size(mp)) return __this_address;
if (be32_to_cpu(agf->agf_freeblks) < be32_to_cpu(agf->agf_longest) || - be32_to_cpu(agf->agf_freeblks) > be32_to_cpu(agf->agf_length)) + be32_to_cpu(agf->agf_freeblks) > agf_length) return __this_address;
if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]) < 1 || @@ -2970,37 +3002,28 @@ xfs_agf_verify( be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS) return __this_address;
- if (xfs_has_rmapbt(mp) && - (be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) < 1 || - be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)) - return __this_address; - - if (xfs_has_rmapbt(mp) && - be32_to_cpu(agf->agf_rmap_blocks) > be32_to_cpu(agf->agf_length)) + if (xfs_has_lazysbcount(mp) && + be32_to_cpu(agf->agf_btreeblks) > agf_length) return __this_address;
- /* - * during growfs operations, the perag is not fully initialised, - * so we can't use it for any useful checking. growfs ensures we can't - * use it by using uncached buffers that don't have the perag attached - * so we can detect and avoid this problem. - */ - if (bp->b_pag && be32_to_cpu(agf->agf_seqno) != bp->b_pag->pag_agno) - return __this_address; + if (xfs_has_rmapbt(mp)) { + if (be32_to_cpu(agf->agf_rmap_blocks) > agf_length) + return __this_address;
- if (xfs_has_lazysbcount(mp) && - be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length)) - return __this_address; + if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) < 1 || + be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > + mp->m_rmap_maxlevels) + return __this_address; + }
- if (xfs_has_reflink(mp) && - be32_to_cpu(agf->agf_refcount_blocks) > - be32_to_cpu(agf->agf_length)) - return __this_address; + if (xfs_has_reflink(mp)) { + if (be32_to_cpu(agf->agf_refcount_blocks) > agf_length) + return __this_address;
- if (xfs_has_reflink(mp) && - (be32_to_cpu(agf->agf_refcount_level) < 1 || - be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)) - return __this_address; + if (be32_to_cpu(agf->agf_refcount_level) < 1 || + be32_to_cpu(agf->agf_refcount_level) > mp->m_refc_maxlevels) + return __this_address; + }
return NULL;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.4-rc6 commit 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 category: bugfix bugzilla: 188883, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The block number validation needs to happen before we allocate (and then leak) the xefi. Found by Coverity via an xfsprogs libxfs scan.
[djwong: This also fixes the type of the @agbno argument.]
Fixes: 7dfee17b13e5 ("xfs: validate block number being freed before adding to xefi") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/libxfs/xfs_alloc.c --- fs/xfs/libxfs/xfs_alloc.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 4a139583a905..53ea47fb2cc3 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2488,25 +2488,26 @@ static int xfs_defer_agfl_block( struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_fsblock_t agbno, + xfs_agblock_t agbno, struct xfs_owner_info *oinfo) { struct xfs_mount *mp = tp->t_mountp; struct xfs_extent_free_item *new; /* new element */ + xfs_fsblock_t fsbno = XFS_AGB_TO_FSB(mp, agno, agbno);
ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, fsbno))) + return -EFSCORRUPTED; + new = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); - new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); + new->xefi_startblock = fsbno; new->xefi_blockcount = 1; new->xefi_owner = oinfo->oi_owner; new->xefi_agresv = XFS_AG_RESV_AGFL;
- if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, new->xefi_startblock))) - return -EFSCORRUPTED; - trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list);
hulk inclusion category: bugfix bugzilla: 188734, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
The following error occurred during an IO fault injection test:
XFS: Assertion failed: xlog_is_shutdown(lip->li_log), file: fs/xfs/xfs_inode_item.c, line: 748
commit "3c4cb76bce43 xfs: xfs_trans_commit() path must check for log shutdown" fix a problem that dirty transaction was canceled before log shutdown, because of the log is still running, it result dirty and unlogged inode item that isn't in the AIL in memory that can be flushed to disk via writeback clustering.
xfs_trans_cancel() has the same problem: if a shutdown races with xfs_trans_cancel() and we have shut down the filesystem but not the log, we will still cancel the dirty transaction before the log shuts down. Hence xfs_trans_cancel() needs to check the log state for shutdown, not the mount state.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_trans.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 36a894a2128a..609ddada6fd7 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -985,7 +985,7 @@ xfs_trans_cancel( * progress, so we only need to check against the mount shutdown state * here. */ - if (dirty && !xfs_is_shutdown(mp)) { + if (dirty && !(xfs_is_shutdown(mp) && xlog_is_shutdown(log))) { XFS_ERROR_REPORT("xfs_trans_cancel", XFS_ERRLEVEL_LOW, mp); xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); }
From: Colin Ian King colin.i.king@gmail.com
mainline inclusion from mainline-v6.4-rc6 commit 347eb95b27eb97bebdc3ea7de23558216f4e2c90 category: bugfix bugzilla: 189095, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Pointers drop_leaf and save_leaf are initialized with values that are never read; they are re-assigned later on just before they are used. Remove the redundant early initializations and keep the later assignments at the point where they are used. Cleans up two clang scan build warnings:
fs/xfs/libxfs/xfs_attr_leaf.c:2288:29: warning: Value stored to 'drop_leaf' during its initialization is never read [deadcode.DeadStores] fs/xfs/libxfs/xfs_attr_leaf.c:2289:29: warning: Value stored to 'save_leaf' during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King colin.i.king@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_attr_leaf.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 852019a7feda..70e30fdbedfd 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -2251,8 +2251,6 @@ xfs_attr3_leaf_unbalance(
trace_xfs_attr_leaf_unbalance(state->args);
- drop_leaf = drop_blk->bp->b_addr; - save_leaf = save_blk->bp->b_addr; xfs_attr3_leaf_hdr_from_disk(state->args->geo, &drophdr, drop_leaf); xfs_attr3_leaf_hdr_from_disk(state->args->geo, &savehdr, save_leaf); entry = xfs_attr3_leaf_entryp(drop_leaf);
From: Colin Ian King colin.i.king@gmail.com
mainline inclusion from mainline-v6.1-rc1 commit fc93812c725068e6a491ce574f058a4530130c00 category: bugfix bugzilla: 189095, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The assignment to pointer lip is not really required; the pointer lip is redundant and can be removed.
Cleans up clang-scan warning: warning: Although the value stored to 'lip' is used in the enclosing expression, the value is never actually read from 'lip' [deadcode.DeadStores]
Signed-off-by: Colin Ian King colin.i.king@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_trans_ail.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 0f9d4527dc7d..11a938652f22 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -730,11 +730,10 @@ void xfs_ail_push_all_sync( struct xfs_ail *ailp) { - struct xfs_log_item *lip; DEFINE_WAIT(wait);
spin_lock(&ailp->ail_lock); - while ((lip = xfs_ail_max(ailp)) != NULL) { + while (xfs_ail_max(ailp) != NULL) { prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE); wake_up_process(ailp->ail_task); spin_unlock(&ailp->ail_lock);
From: Jiapeng Chong jiapeng.chong@linux.alibaba.com
mainline inclusion from mainline-v5.13-rc4 commit 9673261c32dc2f30863b803374b726a72d16b07c category: bugfix bugzilla: 189095, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Variable busy is set to false, but this value is never read as it is either overwritten or not used later on; hence it is a redundant assignment and can be removed.
Clean up the following clang-analyzer warning:
fs/xfs/libxfs/xfs_alloc.c:1679:2: warning: Value stored to 'busy' is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Jiapeng Chong jiapeng.chong@linux.alibaba.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 53ea47fb2cc3..fc60013ffba9 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1694,7 +1694,6 @@ xfs_alloc_ag_vextent_size( cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp, args->agno, XFS_BTNUM_CNT); bno_cur = NULL; - busy = false;
/* * Look for an entry >= maxlen+alignment-1 blocks.
hulk inclusion category: bugfix bugzilla: 189177, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
The following corruption is reported when log recovery:
XFS (loop0): Metadata corruption detected at xfs_agf_verify+0x386/0xb70, xfs_agf block 0x64001
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
00000000: 58 41 47 46 00 00 00 01 00 00 00 02 00 00 4b 88  XAGF..........K.
00000010: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 01  ................
00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04  ................
00000030: 00 00 00 04 00 00 4b 7e 00 00 4b 7e 00 00 00 00  ......K~..K~....
00000040: 3a 9d 97 6d b5 a0 42 13 a3 b3 7f 28 2a ac 3f e8  :..m..B....(*.?.
00000050: 00 00 00 00 00 00 00 01 00 00 00 05 00 00 00 01  ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0xbc9/0xe60 (fs/xfs/xfs_buf.c:1593). Shutting down filesystem.
XFS (loop0): Please unmount the filesystem and rectify the problem(s)
XFS (loop0): log mount/recovery failed: error -117
XFS (loop0): log mount failed
This problem occurs during AGF write verification, when the AG's agf_length is smaller than sb_agblocks and it is not the last AG. Consider the following situation: the filesystem has 3 AGs, the last of which is not full size, and the filesystem is then grown twice. The first growfs only adds some blocks to AG2. The second growfs extends AG2 to the full AG size and adds an AG3.
pre growfs:
|----------|----------|-----|
First growfs:
|----------|----------|--------|
Second growfs:
|----------|----------|----------|-------|
    AG0        AG1        AG2        AG3
During each growfs, the AGF in AG2 and the superblock are both modified. If the superblock has already been written to the metadata but the AGF has not when a shutdown occurs, then on the next mount the superblock changes from the second growfs will have taken effect in memory, and recovering the AGF from the first growfs will report the problem above.
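The fix below exempts buffers being replayed by log recovery from the strict length check; a condensed sketch of the intent:

	/*
	 * Buffers written back during log recovery carry _XBF_LOGRECOVERY,
	 * and the in-memory superblock may already reflect a later growfs
	 * than the AGF being recovered, so skip the strict per-AG length
	 * checks for them.
	 */
	if (agf_length != mp->m_sb.sb_agblocks &&
	    !(bp->b_flags & _XBF_LOGRECOVERY)) {
		/* ... last-AG seqno and length bounds checks as before ... */
	}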
Fixes: edd8276dd702 ("xfs: AGF length has never been bounds checked") Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index fc60013ffba9..433798f5d935 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2970,7 +2970,8 @@ xfs_agf_verify( * Only the last AGF in the filesystem is allowed to be shorter * than the AG size recorded in the superblock. */ - if (agf_length != mp->m_sb.sb_agblocks) { + if (agf_length != mp->m_sb.sb_agblocks && + !(bp->b_flags & _XBF_LOGRECOVERY)) { /* * During growfs, the new last AGF can get here before we * have updated the superblock. Give it a pass on the seqno
hulk inclusion category: bugfix bugzilla: 189076, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
While performing an IO fault injection test, I caught the following data corruption report:
XFS (dm-0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x79c/0x1130
CPU: 3 PID: 33 Comm: kworker/3:0 Not tainted 6.5.0-rc7-next-20230825-00001-g7f8666926889 #214
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
Workqueue: xfs-inodegc/dm-0 xfs_inodegc_worker
Call Trace:
 <TASK>
 dump_stack_lvl+0x50/0x70
 xfs_corruption_error+0x134/0x150
 xfs_free_ag_extent+0x7d3/0x1130
 __xfs_free_extent+0x201/0x3c0
 xfs_trans_free_extent+0x29b/0xa10
 xfs_extent_free_finish_item+0x2a/0xb0
 xfs_defer_finish_noroll+0x8d1/0x1b40
 xfs_defer_finish+0x21/0x200
 xfs_itruncate_extents_flags+0x1cb/0x650
 xfs_free_eofblocks+0x18f/0x250
 xfs_inactive+0x485/0x570
 xfs_inodegc_worker+0x207/0x530
 process_scheduled_works+0x24a/0xe10
 worker_thread+0x5ac/0xc60
 kthread+0x2cd/0x3c0
 ret_from_fork+0x4a/0x80
 ret_from_fork_asm+0x11/0x20
 </TASK>
XFS (dm-0): Corruption detected. Unmount and run xfs_repair
Analysis of the disk image showed that the corruption was triggered by an extent being recorded in both the inode and the AGF btrees. After a long period of reproduction and analysis, we found that the root cause of the problem was an AGF btree block that was not recovered.
Consider the following scenario: Transaction A and Transaction B are in the same record, so Transaction A and Transaction B share the same LSN1. If the buf item in Transaction A has been recovered, then the buf item in Transaction B cannot be recovered, because log recovery skips items whose metadata LSN is >= the current LSN of the item being recovered. If there is also an inode item in Transaction B that records Extent X, then Extent X will be recorded in both the inode and the AGF btree block after Transaction B is recovered.
|------------Record (LSN1)------------------|---Record (LSN2)---|
|----------Trans A------------|-------------Trans B-------------|
|    Buf Item(Extent X)       | Buf Item / Inode item(Extent X) |
|    Extent X is freed        |     Extent X is allocated       |
Since commit 12818d24db8a ("xfs: rework log recovery to submit buffers on LSN boundaries") was introduced, log recovery submits buffers on LSN boundaries. The above problem is avoided on the normal paths, but that is not guaranteed on the abnormal paths. Consider the following flow: if an error is encountered after recovering the buf item in Transaction A and before recovering the buf item in Transaction B, buffers that have already been added to buffer_list will still be submitted, which violates the rule of submitting on LSN boundaries. The buf item in Transaction B then cannot be recovered on the next mount because the current LSN equals the metadata LSN.
xlog_do_recovery_pass
  xlog_recover_process
    xlog_recover_process_data
      ...
      xlog_recover_buf_commit_pass2
        xlog_recover_do_reg_buffer      //recover buf item in Trans A
        xfs_buf_delwri_queue(bp, buffer_list)
      ...
  ====> Encountered error and returned
      ...
      xlog_recover_buf_commit_pass2
        xlog_recover_do_reg_buffer      //recover buf item in Trans B
        xfs_buf_delwri_queue(bp, buffer_list)
  if (!list_empty(&buffer_list))
    xfs_buf_delwri_submit(&buffer_list);  //submit regardless of error
In order to make sure that buffers are submitted on LSN boundaries on the abnormal paths as well, we need to check the error status before submitting the buffers that have been added from the last record processed.
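The change below implements that check; as a sketch, the end of the recovery pass then becomes:

	/*
	 * If recovery already failed, shut the filesystem down before
	 * flushing the buffers queued since the last record boundary, so
	 * they are failed rather than written and the on-disk metadata
	 * LSNs never get ahead of items that still need replay.
	 */
	if (!list_empty(&buffer_list)) {
		if (error)
			xfs_force_shutdown(log->l_mp, SHUTDOWN_META_IO_ERROR);
		error2 = xfs_buf_delwri_submit(&buffer_list);
	}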
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_log_recover.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 1ce9700ea68c..331f568834a1 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3193,8 +3193,12 @@ xlog_do_recovery_pass( * Submit buffers that have been added from the last record processed, * regardless of error status. */ - if (!list_empty(&buffer_list)) + if (!list_empty(&buffer_list)) { + if (error) + xfs_force_shutdown(log->l_mp, SHUTDOWN_META_IO_ERROR); + error2 = xfs_buf_delwri_submit(&buffer_list); + }
if (error && first_bad) *first_bad = rhead_blk;
From: Guo Xuenan guoxuenan@huawei.com
hulk inclusion category: bugfix bugzilla: 188788, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
In DEBUG mode we may do sparse inode allocations randomly, but we forget to set the remaining space correctly so the inode btree can split. This is fine in most cases; only in DEBUG mode, when the AG is running out of space, can it cause trouble.
Fixes: 1cdadee11f8d ("xfs: randomly do sparse inode allocations in DEBUG mode") Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_ialloc.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 1861fc71f028..0c2b7a6601a0 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -768,6 +768,8 @@ xfs_ialloc_ag_alloc( args.alignment = args.mp->m_sb.sb_spino_align; args.prod = 1;
+ /* Allow space for the inode btree to split */ + args.minleft = igeo->inobt_maxlevels; args.minlen = igeo->ialloc_min_blks; args.maxlen = args.minlen;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.3-rc1 commit d5753847b216db0e553e8065aa825cfe497ad143 category: bugfix bugzilla: 188788, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When we enter xfs_bmbt_alloc_block() without having first allocated a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing something like unwritten extent conversion, the transaction block reservation is used as the minleft value.
This works for operations like unwritten extent conversion, but it assumes that the block reservation is only for a BMBT split. This is not always true, and it sometimes results in larger than necessary minleft values being set. We only actually need enough space for a btree split, something we already handle correctly in xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.
We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate the number of blocks a BMBT split on this inode is going to require, not use the transaction block reservation that contains the maximum number of blocks this transaction may consume in it...
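Condensed from the diff below, the allocation setup in xfs_bmbt_alloc_block() then becomes (sketch, trimmed):

	if (args.fsbno == NULLFSBLOCK) {
		args.fsbno = be64_to_cpu(start->l);
		args.type = XFS_ALLOCTYPE_START_BNO;
		/*
		 * Size minleft from the worst-case bmbt split on this
		 * inode, not from the transaction block reservation.
		 */
		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
				cur->bc_ino.whichfork);
	}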
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_bmap.h | 2 ++ fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++---------- 3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index c5d784daed8b..cecfa5280b51 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4199,7 +4199,7 @@ xfs_bmapi_convert_unwritten( return 0; }
-static inline xfs_extlen_t +xfs_extlen_t xfs_bmapi_minleft( struct xfs_trans *tp, struct xfs_inode *ip, diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 76a40a47abdf..39848a18700c 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -221,6 +221,8 @@ int xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, struct xfs_bmbt_irec *new, int *logflagsp); +xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip, + int fork);
enum xfs_bmap_intent_type { XFS_BMAP_MAP = 1, diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 4118c4c1443a..a6f7cd114222 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -211,18 +211,16 @@ xfs_bmbt_alloc_block( if (args.fsbno == NULLFSBLOCK) { args.fsbno = be64_to_cpu(start->l); args.type = XFS_ALLOCTYPE_START_BNO; + /* - * Make sure there is sufficient room left in the AG to - * complete a full tree split for an extent insert. If - * we are converting the middle part of an extent then - * we may need space for two tree splits. - * - * We are relying on the caller to make the correct block - * reservation for this operation to succeed. If the - * reservation amount is insufficient then we may fail a - * block allocation here and corrupt the filesystem. + * If we are coming here from something like unwritten extent + * conversion, there has been no data extent allocation already + * done, so we have to ensure that we attempt to locate the + * entire set of bmbt allocations in the same AG, as + * xfs_bmapi_write() would have reserved. */ - args.minleft = args.tp->t_blk_res; + args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip, + cur->bc_ino.whichfork); } else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) { args.type = XFS_ALLOCTYPE_START_BNO; } else { @@ -246,6 +244,7 @@ xfs_bmbt_alloc_block( * successful activate the lowspace algorithm. */ args.fsbno = 0; + args.minleft = 0; args.type = XFS_ALLOCTYPE_FIRST_AG; error = xfs_alloc_vextent(&args); if (error)
From: yangerkun yangerkun@huawei.com
hulk inclusion category: bugfix bugzilla: 188788, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
Two fixups for the same AG may happen within exactly one transaction, and the consumption of the AGFL after the first fixup may trigger a failure of the second fixup, which is unintended behaviour and then shuts xfs down [1][2].
Gao Xiang described one solution, reserving more blocks at the first fixup, but it has some logical errors:
- we may see postallocs as 1 at the first fixup and 0 at the second, which can trigger pointless AGFL filling or shortening
- the case above (postallocs 1 first, 0 second) shows we may need to shorten the AGFL, but xfs_alloc_fix_freelist() can only free AGFL blocks after the freespace checks succeed. Besides, filling or shortening the AGFL won't change fdblocks, so we can end up seeing free fdblocks (or resblocks) while the AG fixup still rejects us, and then xfs can shut down too
- once postallocs equals 1, it also changes the logic of xfs_alloc_ag_max_usable(), which changes the block allocation behaviour (found by checking each AG's free blocks after fallocating a huge file)
- once postallocs equals 1, we reserve 2 * xfs_alloc_min_freelist(), but sometimes that is not enough once the bnobt/cntbt grow and the second fixup needs a larger reserve
This patch fixes all the bugs above by using m_ag_maxlevels to reserve more blocks, and adapts xfs_alloc_set_aside()/xfs_alloc_ag_max_usable() to match the larger reserve. Note that we only reserve more; we do not fill or shorten the AGFL according to that reserve.
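As a worked example with hypothetical geometry (m_ag_maxlevels = 5, rmapbt enabled with m_rmap_maxlevels = 9, sb_agcount = 4), the new reserve works out as:

	/* per-AG second-fixup reserve, per xfs_ag_fixup_aside() below */
	xfs_extlen_t	aside = 2 * 5 + 9;	/* = 19 blocks */

	/* global set-aside per the adapted xfs_alloc_set_aside() */
	unsigned int	set_aside = 4 * (XFS_ALLOC_AGFL_RESERVE + 4 + 19);

So each AG holds back 19 extra blocks from the free space accounting, enough that a second fixup of the same AG in the same transaction cannot run out of space.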
[1] https://www.spinics.net/lists/linux-xfs/msg66440.html [2] https://lore.kernel.org/linux-xfs/20221228133204.4021519-1-guoxuenan@huawei....
Fixes: 53f85096f93e ("xfs: account extra freespace btree splits for multiple allocations") Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 41 ++++++++++++++++++++++++++++++++++----- fs/xfs/xfs_mount.c | 9 +++++++++ 2 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 433798f5d935..d4d7a99114c7 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -81,6 +81,25 @@ xfs_prealloc_blocks( return XFS_IBT_BLOCK(mp) + 1; }
+/* + * Twice fixup for the same ag may happen within exact one tp, and the consume + * of agfl after first fixup may trigger second fixup's failure, then xfs will + * shutdown. To avoid that, we reserve blocks which can satisfy the second + * fixup. + */ +xfs_extlen_t +xfs_ag_fixup_aside( + struct xfs_mount *mp) +{ + xfs_extlen_t ret; + + ret = 2 * mp->m_ag_maxlevels; + if (xfs_has_rmapbt(mp)) + ret += mp->m_rmap_maxlevels; + + return ret; +} + /* * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of * AGF buffer (PV 947395), we place constraints on the relationship among @@ -95,12 +114,15 @@ xfs_prealloc_blocks( * * We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to handle a * potential split of the file's bmap btree. + * + * Besides, comment for xfs_ag_fixup_aside show why we reserve more blocks. */ unsigned int xfs_alloc_set_aside( struct xfs_mount *mp) { - return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE + 4); + return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE + + 4 + xfs_ag_fixup_aside(mp)); }
/* @@ -133,6 +155,8 @@ xfs_alloc_ag_max_usable( if (xfs_has_reflink(mp)) blocks++; /* refcount root block */
+ blocks += xfs_ag_fixup_aside(mp); + return mp->m_sb.sb_agblocks - blocks; }
@@ -2591,6 +2615,7 @@ xfs_alloc_fix_freelist( struct xfs_alloc_arg targs; /* local allocation arguments */ xfs_agblock_t bno; /* freelist block */ xfs_extlen_t need; /* total blocks needed in freelist */ + xfs_extlen_t minfree; int error = 0;
/* deferred ops (AGFL block frees) require permanent transactions */ @@ -2622,8 +2647,11 @@ xfs_alloc_fix_freelist( * blocks to perform multiple allocations from a single AG and * transaction if needed. */ - need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs); - if (!xfs_alloc_space_available(args, need, alloc_flags | + minfree = need = xfs_alloc_min_freelist(mp, pag); + if (args->postallocs) + minfree += xfs_ag_fixup_aside(mp); + + if (!xfs_alloc_space_available(args, minfree, alloc_flags | XFS_ALLOC_FLAG_CHECK)) goto out_agbp_relse;
@@ -2646,8 +2674,11 @@ xfs_alloc_fix_freelist( xfs_agfl_reset(tp, agbp, pag);
/* If there isn't enough total space or single-extent, reject it. */ - need = xfs_alloc_min_freelist(mp, pag) * (1 + args->postallocs); - if (!xfs_alloc_space_available(args, need, alloc_flags)) + minfree = need = xfs_alloc_min_freelist(mp, pag); + if (args->postallocs) + minfree += xfs_ag_fixup_aside(mp); + + if (!xfs_alloc_space_available(args, minfree, alloc_flags)) goto out_agbp_relse;
/* diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 3f7044611286..04b347fe1b59 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -778,6 +778,15 @@ xfs_mountfs( xfs_rmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp);
+ /* + * We now need m_ag_maxlevels/m_rmap_maxlevels to initialize + * m_alloc_set_aside/m_ag_max_usable. And when we first do the + * init in xfs_sb_mount_common, m_alloc_set_aside/m_ag_max_usable + * still equals to 0. Redo it now. + */ + mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); + mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); + /* * Check if sb_agblocks is aligned at stripe boundary. If sb_agblocks * is NOT aligned turn off m_dalign since allocator alignment is within
From: yangerkun yangerkun@huawei.com
hulk inclusion category: bugfix bugzilla: 188788, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
xfs_alloc_space_available() needs to make sure that the longest free extent can satisfy both the extra blocks needed for the AGFL fixup and the blocks actually needed for the subsequent allocation. Otherwise the AGFL filling may consume blocks within the longest free extent, and the remaining free extents may be unable to satisfy the actual block allocation.
The 'min_free' argument describes the number of blocks we actually need to fill the AGFL up to. But after commit aa1ab9a77d89 ("xfs: fix xfs shutdown since we reserve more blocks in agfl fixup"), 'min_free' also includes the blocks reserved for the second fixup, even though the AGFL is never filled with those blocks.
What we actually want is just to reserve more blocks to satisfy the second fixup; we will never fill the AGFL with that many blocks. Hence the argument passed to xfs_alloc_longest_free_extent() should not account for postalloc (see the toy model below). Fix the agflcount calculation in xfs_alloc_space_available() the same way.
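A toy model of the bug shape may help: the longest-extent check must use the real AGFL fill amount ('need'), because the second-fixup reserve only has to exist somewhere in the AG and is never carved out of the extent being checked. This is not the kernel's exact arithmetic; all numbers are invented for illustration.

#include <stdio.h>

int main(void)
{
	unsigned int longest = 12;	/* longest free extent in the AG */
	unsigned int need = 4;		/* blocks the AGFL fill really takes */
	unsigned int aside = 8;		/* second-fixup reserve, never filled */
	unsigned int alloc_len = 6;	/* contiguous blocks the caller wants */

	/* Buggy: treat the reserve as if it came out of the longest extent. */
	unsigned int usable_buggy = longest - (need + aside);
	/* Fixed: only the real AGFL fill shortens the longest extent. */
	unsigned int usable_fixed = longest - need;

	printf("buggy: usable %u < %u -> spurious failure\n",
	       usable_buggy, alloc_len);
	printf("fixed: usable %u >= %u -> allocation can proceed\n",
	       usable_fixed, alloc_len);
	return 0;
}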
Fixes: aa1ab9a77d89 ("xfs: fix xfs shutdown since we reserve more blocks in agfl fixup") Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_alloc.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index d4d7a99114c7..b901eedf3bb8 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2339,6 +2339,7 @@ xfs_alloc_min_freelist( static bool xfs_alloc_space_available( struct xfs_alloc_arg *args, + xfs_extlen_t need, xfs_extlen_t min_free, int flags) { @@ -2355,7 +2356,7 @@ xfs_alloc_space_available(
/* do we have enough contiguous free space for the allocation? */ alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop; - longest = xfs_alloc_longest_free_extent(pag, min_free, reservation); + longest = xfs_alloc_longest_free_extent(pag, need, reservation); if (longest < alloc_len) return false;
@@ -2364,7 +2365,7 @@ xfs_alloc_space_available( * account extra agfl blocks because we are about to defer free them, * making them unavailable until the current transaction commits. */ - agflcount = min_t(xfs_extlen_t, pag->pagf_flcount, min_free); + agflcount = min_t(xfs_extlen_t, pag->pagf_flcount, need); available = (int)(pag->pagf_freeblks + agflcount - reservation - min_free - args->minleft); if (available < (int)max(args->total, alloc_len)) @@ -2651,7 +2652,7 @@ xfs_alloc_fix_freelist( if (args->postallocs) minfree += xfs_ag_fixup_aside(mp);
- if (!xfs_alloc_space_available(args, minfree, alloc_flags | + if (!xfs_alloc_space_available(args, need, minfree, alloc_flags | XFS_ALLOC_FLAG_CHECK)) goto out_agbp_relse;
@@ -2678,7 +2679,7 @@ xfs_alloc_fix_freelist( if (args->postallocs) minfree += xfs_ag_fixup_aside(mp);
- if (!xfs_alloc_space_available(args, minfree, alloc_flags)) + if (!xfs_alloc_space_available(args, need, minfree, alloc_flags)) goto out_agbp_relse;
/*
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.1-rc4 commit 198dd8aedee6a7d2de0dfa739f9a008a938f6848 category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs_buffered_write_iomap_end() has a comment about the safety of punching delalloc extents based on holding the IOLOCK_EXCL. This comment is wrong, and punching delalloc extents is not race free.
When we punch out a delalloc extent after a write failure in xfs_buffered_write_iomap_end(), we punch out the page cache with truncate_pagecache_range() before we punch out the delalloc extents. At this point, we only hold the IOLOCK_EXCL, so there is nothing stopping mmap() write faults racing with this cleanup operation, reinstantiating a folio over the range we are about to punch and hence requiring the delalloc extent to be kept.
If this race condition is hit, we can end up with a dirty page in the page cache that has no delalloc extent or space reservation backing it. This leads to bad things happening at writeback time.
To avoid this race condition, we need the page cache truncation to be atomic w.r.t. the extent manipulation. We can do this by holding the mapping->invalidate_lock exclusively across this operation - this will prevent new pages from being inserted into the page cache whilst we are removing the pages and the backing extent and space reservation.
Taking the mapping->invalidate_lock exclusively in the buffered write IO path is safe - it naturally nests inside the IOLOCK (see truncate and fallocate paths). iomap_zero_range() can be called from under the mapping->invalidate_lock (from the truncate path via either xfs_zero_eof() or xfs_truncate_page()), but iomap_zero_iter() will not instantiate new delalloc pages (because it skips holes) and hence will not ever need to punch out delalloc extents on failure.
Fix the locking issue, and clean up the code logic a little to avoid unnecessary work if we didn't allocate the delalloc extent or wrote the entire region we allocated.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/xfs_iomap.c --- fs/xfs/xfs_iomap.c | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 892a9ea714ab..d96c003a29d3 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1138,6 +1138,10 @@ xfs_buffered_write_iomap_end( written = 0; }
+ /* If we didn't reserve the blocks, we're not allowed to punch them. */ + if (!(iomap->flags & IOMAP_F_NEW)) + return 0; + /* * start_fsb refers to the first unused block after a short write. If * nothing was written, round offset down to point at the first block in @@ -1149,27 +1153,28 @@ xfs_buffered_write_iomap_end( start_fsb = XFS_B_TO_FSB(mp, offset + written); end_fsb = XFS_B_TO_FSB(mp, offset + length);
+ /* Nothing to do if we've written the entire delalloc extent */ + if (start_fsb >= end_fsb) + return 0; + /* - * Trim delalloc blocks if they were allocated by this write and we - * didn't manage to write the whole range. - * - * We don't need to care about racing delalloc as we hold i_mutex - * across the reserve/allocate/unreserve calls. If there are delalloc - * blocks in the range, they are ours. + * Lock the mapping to avoid races with page faults re-instantiating + * folios and dirtying them via ->page_mkwrite between the page cache + * truncation and the delalloc extent removal. Failing to do this can + * leave dirty pages with no space reservation in the cache. */ - if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) { - truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb), - XFS_FSB_TO_B(mp, end_fsb) - 1); - - error = xfs_bmap_punch_delalloc_range(ip, start_fsb, - end_fsb - start_fsb); - if (error && !xfs_is_shutdown(mp)) { - xfs_alert(mp, "%s: unable to clean up ino %lld", - __func__, ip->i_ino); - return error; - } + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); + truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb), + XFS_FSB_TO_B(mp, end_fsb) - 1); + + error = xfs_bmap_punch_delalloc_range(ip, start_fsb, + end_fsb - start_fsb); + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); + if (error && !xfs_is_shutdown(mp)) { + xfs_alert(mp, "%s: unable to clean up ino %lld", + __func__, ip->i_ino); + return error; } - return 0; }
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.1-rc4 commit b71f889c18ada210a97aa3eb5e00c0de552234c6 category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs_buffered_write_iomap_end() currently converts the byte ranges passed to it to filesystem blocks to pass them to the bmap code to punch out delalloc blocks, but then has to convert filesystem blocks back to byte ranges for page cache truncate.
We're about to make the page cache truncate go away and replace it with a page cache walk, so having to convert everything to/from/to filesystem blocks is messy and error-prone. It is much easier to pass around byte ranges and convert to page indexes and/or filesystem blocks only where those units are needed.
In preparation for the page cache walk being added, add a helper that converts byte ranges to filesystem blocks and calls xfs_bmap_punch_delalloc_range() and convert xfs_buffered_write_iomap_end() to calculate limits in byte ranges.
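A standalone demonstration of the rounding arithmetic described above; the block size and offsets are made-up examples, and the helpers are reduced power-of-two versions of the kernel macros.

#include <stdio.h>

#define round_down(x, y) ((x) & ~((long long)(y) - 1))
#define round_up(x, y)   (((x) + (y) - 1) & ~((long long)(y) - 1))

int main(void)
{
	long long blocksize = 4096;
	long long offset = 10000, length = 20000;
	long long written, start_byte, end_byte;

	/* The end is always rounded up to a block boundary. */
	end_byte = round_up(offset + length, blocksize);

	/* Short write: only 5000 of 20000 bytes made it into the cache. */
	written = 5000;
	start_byte = round_up(offset + written, blocksize);
	printf("short write: punch [%lld, %lld)\n", start_byte, end_byte);

	/* Nothing written: round down to the first block of the range. */
	written = 0;
	start_byte = round_down(offset, blocksize);
	printf("no write:    punch [%lld, %lld)\n", start_byte, end_byte);
	return 0;
}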
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/xfs_iomap.c --- fs/xfs/xfs_iomap.c | 39 +++++++++++++++++++++++++-------------- 1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index d96c003a29d3..1ddc1a33effd 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1111,6 +1111,20 @@ xfs_buffered_write_iomap_begin( return error; }
+static int +xfs_buffered_write_delalloc_punch( + struct inode *inode, + loff_t start_byte, + loff_t end_byte) +{ + struct xfs_mount *mp = XFS_M(inode->i_sb); + xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte); + xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte); + + return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb, + end_fsb - start_fsb); +} + static int xfs_buffered_write_iomap_end( struct inode *inode, @@ -1120,10 +1134,10 @@ xfs_buffered_write_iomap_end( unsigned flags, struct iomap *iomap) { + struct xfs_mount *mp = XFS_M(inode->i_sb); struct xfs_inode *ip = XFS_I(inode); - struct xfs_mount *mp = ip->i_mount; - xfs_fileoff_t start_fsb; - xfs_fileoff_t end_fsb; + loff_t start_byte; + loff_t end_byte; int error = 0;
if (iomap->type != IOMAP_DELALLOC) @@ -1148,13 +1162,13 @@ xfs_buffered_write_iomap_end( * the range. */ if (unlikely(!written)) - start_fsb = XFS_B_TO_FSBT(mp, offset); + start_byte = round_down(offset, mp->m_sb.sb_blocksize); else - start_fsb = XFS_B_TO_FSB(mp, offset + written); - end_fsb = XFS_B_TO_FSB(mp, offset + length); + start_byte = round_up(offset + written, mp->m_sb.sb_blocksize); + end_byte = round_up(offset + length, mp->m_sb.sb_blocksize);
/* Nothing to do if we've written the entire delalloc extent */ - if (start_fsb >= end_fsb) + if (start_byte >= end_byte) return 0;
/* @@ -1164,15 +1178,12 @@ xfs_buffered_write_iomap_end( * leave dirty pages with no space reservation in the cache. */ xfs_ilock(ip, XFS_MMAPLOCK_EXCL); - truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb), - XFS_FSB_TO_B(mp, end_fsb) - 1); - - error = xfs_bmap_punch_delalloc_range(ip, start_fsb, - end_fsb - start_fsb); + truncate_pagecache_range(inode, start_byte, end_byte - 1); + error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte); xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); if (error && !xfs_is_shutdown(mp)) { - xfs_alert(mp, "%s: unable to clean up ino %lld", - __func__, ip->i_ino); + xfs_alert(mp, "%s: unable to clean up ino 0x%llx", + __func__, XFS_I(inode)->i_ino); return error; } return 0;
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.1-rc4 commit 9c7babf94a0d686b552e53aded8d4703d1b8b92b category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Because that's what Christoph wants for this error handling path only XFS uses.
It requires a new iomap export for handling errors over delalloc ranges. This is basically the XFS code as it stands, but even though Christoph wants this as iomap functionality, we still have to call it from the filesystem specific ->iomap_end callback, and call into the iomap code with yet another filesystem specific callback to punch the delalloc extent within the defined ranges.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/xfs_iomap.c fs/iomap/buffered-io.c include/linux/iomap.h --- fs/xfs/xfs_iomap.c | 83 +++++++++++++++++++++++++++++++--------------- 1 file changed, 57 insertions(+), 26 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 1ddc1a33effd..f4203e9d7635 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1114,58 +1114,62 @@ xfs_buffered_write_iomap_begin( static int xfs_buffered_write_delalloc_punch( struct inode *inode, - loff_t start_byte, - loff_t end_byte) + loff_t offset, + loff_t length) { struct xfs_mount *mp = XFS_M(inode->i_sb); - xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte); - xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte); + xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, offset); + xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length);
return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb, end_fsb - start_fsb); }
+/* + * When a short write occurs, the filesystem may need to remove reserved space + * that was allocated in ->iomap_begin from it's ->iomap_end method. For + * filesystems that use delayed allocation, we need to punch out delalloc + * extents from the range that are not dirty in the page cache. As the write can + * race with page faults, there can be dirty pages over the delalloc extent + * outside the range of a short write but still within the delalloc extent + * allocated for this iomap. + * + * This function uses [start_byte, end_byte) intervals (i.e. open ended) to + * simplify range iterations, but converts them back to {offset,len} tuples for + * the punch callback. + */ static int -xfs_buffered_write_iomap_end( - struct inode *inode, - loff_t offset, - loff_t length, - ssize_t written, - unsigned flags, - struct iomap *iomap) +xfs_iomap_file_buffered_write_punch_delalloc( + struct inode *inode, + struct iomap *iomap, + loff_t pos, + loff_t length, + ssize_t written, + int (*punch)(struct inode *inode, loff_t pos, loff_t length)) { - struct xfs_mount *mp = XFS_M(inode->i_sb); struct xfs_inode *ip = XFS_I(inode); loff_t start_byte; loff_t end_byte; + int blocksize = i_blocksize(inode); int error = 0;
if (iomap->type != IOMAP_DELALLOC) return 0;
- /* - * Behave as if the write failed if drop writes is enabled. Set the NEW - * flag to force delalloc cleanup. - */ - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) { - iomap->flags |= IOMAP_F_NEW; - written = 0; - } - /* If we didn't reserve the blocks, we're not allowed to punch them. */ if (!(iomap->flags & IOMAP_F_NEW)) return 0;
/* - * start_fsb refers to the first unused block after a short write. If + * start_byte refers to the first unused block after a short write. If * nothing was written, round offset down to point at the first block in * the range. */ if (unlikely(!written)) - start_byte = round_down(offset, mp->m_sb.sb_blocksize); + start_byte = round_down(pos, blocksize); else - start_byte = round_up(offset + written, mp->m_sb.sb_blocksize); - end_byte = round_up(offset + length, mp->m_sb.sb_blocksize); + start_byte = round_up(pos + written, blocksize); + end_byte = round_up(pos + length, blocksize);
/* Nothing to do if we've written the entire delalloc extent */ if (start_byte >= end_byte) @@ -1179,8 +1183,35 @@ xfs_buffered_write_iomap_end( */ xfs_ilock(ip, XFS_MMAPLOCK_EXCL); truncate_pagecache_range(inode, start_byte, end_byte - 1); - error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte); + error = punch(inode, start_byte, end_byte - start_byte); xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); + + return error; +} + +static int +xfs_buffered_write_iomap_end( + struct inode *inode, + loff_t offset, + loff_t length, + ssize_t written, + unsigned flags, + struct iomap *iomap) +{ + struct xfs_mount *mp = XFS_M(inode->i_sb); + int error; + + /* + * Behave as if the write failed if drop writes is enabled. Set the NEW + * flag to force delalloc cleanup. + */ + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) { + iomap->flags |= IOMAP_F_NEW; + written = 0; + } + + error = xfs_iomap_file_buffered_write_punch_delalloc(inode, iomap, offset, + length, written, &xfs_buffered_write_delalloc_punch); if (error && !xfs_is_shutdown(mp)) { xfs_alert(mp, "%s: unable to clean up ino 0x%llx", __func__, XFS_I(inode)->i_ino);
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.1-rc4 commit f43dc4dc3eff028b5ddddd99f3a66c5a6bdd4e78 category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
iomap_file_buffered_write_punch_delalloc() currently invalidates the page cache over the unused range of the delalloc extent that was allocated. While the write allocated the delalloc extent, it does not own it exclusively as the write does not hold any locks that prevent either writeback or mmap page faults from changing the state of either the page cache or the extent state backing this range.
Whilst xfs_bmap_punch_delalloc_range() already handles races in extent conversion - it will only punch out delalloc extents and it ignores any other type of extent - the page cache truncate does not discriminate between data written by this write or some other task. As a result, truncating the page cache can result in data corruption if the write races with mmap modifications to the file over the same range.
generic/346 exercises this workload, and if we randomly fail writes (as will happen when iomap gets stale iomap detection later in the patchset), it will randomly corrupt the file data because it removes data written by mmap() in the same page as the write() that failed.
Hence we do not want to punch out the page cache over the range of the extent we failed to write to - what we actually need to do is detect the ranges that have dirty data in cache over them and *not punch them out*.
To do this, we have to walk the page cache over the range of the delalloc extent we want to remove. This is made complex by the fact we have to handle partially up-to-date folios correctly and this can happen even when the FSB size == PAGE_SIZE because we now support multi-page folios in the page cache.
Because we are only interested in discovering the edges of data ranges in the page cache (i.e. hole-data boundaries) we can make use of mapping_seek_hole_data() to find those transitions in the page cache. As we hold the invalidate_lock, we know that the boundaries are not going to change while we walk the range. This interface is also byte-based and is sub-page block aware, so we can find the data ranges in the cache based on byte offsets rather than page, folio or fs block sized chunks. This greatly simplifies the logic of finding dirty cached ranges in the page cache.
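mapping_seek_hole_data() is the in-kernel counterpart of lseek() with SEEK_DATA/SEEK_HOLE, so the shape of the walk can be sketched from userspace. A minimal analogue, assuming a filesystem that supports those seek modes (error handling trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	off_t pos = 0, data, hole;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;

	/*
	 * Walk the file as a series of [data, hole) intervals - the same
	 * open-ended form the kernel-side scan iterates with.
	 */
	for (;;) {
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			break;		/* ENXIO: no data beyond pos */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data: [%lld, %lld)\n", (long long)data,
		       (long long)hole);
		pos = hole;
	}
	close(fd);
	return 0;
}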
Once we've identified a range that contains cached data, we can then iterate the range folio by folio. This allows us to determine if the data is dirty and hence perform the correct delalloc extent punching operations. The seek interface we use to iterate data ranges will give us sub-folio start/end granularity, so we may end up looking up the same folio multiple times as the seek interface iterates across each discontiguous data region in the folio.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com
Conflicts: fs/xfs/xfs_iomap.c fs/iomap/buffered-io.c mm/filemap.c --- fs/xfs/xfs_iomap.c | 205 +++++++++++++++++++++++++++++++++++++++++---- mm/filemap.c | 1 + 2 files changed, 190 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index f4203e9d7635..25b968e85774 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1125,6 +1125,174 @@ xfs_buffered_write_delalloc_punch( end_fsb - start_fsb); }
+/* + * Scan the data range passed to us for dirty page cache folios. If we find a + * dirty folio, punch out the preceding range and update the offset from which + * the next punch will start. + * + * We can punch out storage reservations under clean pages because they either + * contain data that has been written back - in which case the delalloc punch + * over that range is a no-op - or they have been read faults in which case they + * contain zeroes and we can remove the delalloc backing range and any new + * writes to those pages will do the normal hole filling operation... + * + * This makes the logic simple: we only need to keep the delalloc extents + * over the dirty ranges of the page cache. + * + * This function uses [start_byte, end_byte) intervals (i.e. open ended) to + * simplify range iterations. + */ +static int +xfs_iomap_write_delalloc_scan( + struct inode *inode, + loff_t *punch_start_byte, + loff_t start_byte, + loff_t end_byte, + int (*punch)(struct inode *inode, loff_t offset, loff_t length)) +{ + while (start_byte < end_byte) { + struct page *page; + + /* grab locked page */ + page = find_lock_page(inode->i_mapping, + start_byte >> PAGE_SHIFT); + + if (!page) { + start_byte = ALIGN_DOWN(start_byte, PAGE_SIZE) + + PAGE_SIZE; + continue; + } + + /* if dirty, punch up to offset */ + if (PageDirty(page)) { + if (start_byte > *punch_start_byte) { + int error; + + error = punch(inode, *punch_start_byte, + start_byte - *punch_start_byte); + if (error) { + unlock_page(page); + put_page(page); + return error; + } + } + + /* + * Make sure the next punch start is correctly bound to + * the end of this data range, not the end of the folio. + */ + *punch_start_byte = min_t(loff_t, end_byte, + (page->index + 1) << PAGE_SHIFT); + } + + /* move offset to start of next folio in range */ + start_byte = (page->index + 1) << PAGE_SHIFT; + unlock_page(page); + put_page(page); + } + return 0; +} + +/* + * Punch out all the delalloc blocks in the range given except for those that + * have dirty data still pending in the page cache - those are going to be + * written and so must still retain the delalloc backing for writeback. + * + * As we are scanning the page cache for data, we don't need to reimplement the + * wheel - mapping_seek_hole_data() does exactly what we need to identify the + * start and end of data ranges correctly even for sub-folio block sizes. This + * byte range based iteration is especially convenient because it means we + * don't have to care about variable size folios, nor where the start or end of + * the data range lies within a folio, if they lie within the same folio or even + * if there are multiple discontiguous data ranges within the folio. + * + * It should be noted that mapping_seek_hole_data() is not aware of EOF, and so + * can return data ranges that exist in the cache beyond EOF. e.g. a page fault + * spanning EOF will initialise the post-EOF data to zeroes and mark it up to + * date. A write page fault can then mark it dirty. If we then fail a write() + * beyond EOF into that up to date cached range, we allocate a delalloc block + * beyond EOF and then have to punch it out. Because the range is up to date, + * mapping_seek_hole_data() will return it, and we will skip the punch because + * the folio is dirty. This is incorrect - we always need to punch out delalloc + * beyond EOF in this case as writeback will never write back and convert that + * delalloc block beyond EOF.
Hence we limit the cached data scan range to EOF, + * resulting in always punching out the range from the EOF to the end of the + * range the iomap spans. + * + * Intervals are of the form [start_byte, end_byte) (i.e. open ended) because it + * matches the intervals returned by mapping_seek_hole_data(). i.e. SEEK_DATA + * returns the start of a data range (start_byte), and SEEK_HOLE(start_byte) + * returns the end of the data range (data_end). Using closed intervals would + * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose + * the code to subtle off-by-one bugs.... + */ +static int +xfs_iomap_write_delalloc_release( + struct inode *inode, + loff_t start_byte, + loff_t end_byte, + int (*punch)(struct inode *inode, loff_t pos, loff_t length)) +{ + struct xfs_inode *ip = XFS_I(inode); + loff_t punch_start_byte = start_byte; + loff_t scan_end_byte = min(i_size_read(inode), end_byte); + int error = 0; + + /* + * Lock the mapping to avoid races with page faults re-instantiating + * folios and dirtying them via ->page_mkwrite whilst we walk the + * cache and perform delalloc extent removal. Failing to do this can + * leave dirty pages with no space reservation in the cache. + */ + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); + while (start_byte < scan_end_byte) { + loff_t data_end; + + start_byte = mapping_seek_hole_data(inode->i_mapping, + start_byte, scan_end_byte, SEEK_DATA); + /* + * If there is no more data to scan, all that is left is to + * punch out the remaining range. + */ + if (start_byte == -ENXIO || start_byte == scan_end_byte) + break; + if (start_byte < 0) { + error = start_byte; + goto out_unlock; + } + WARN_ON_ONCE(start_byte < punch_start_byte); + WARN_ON_ONCE(start_byte > scan_end_byte); + + /* + * We find the end of this contiguous cached data range by + * seeking from start_byte to the beginning of the next hole. + */ + data_end = mapping_seek_hole_data(inode->i_mapping, start_byte, + scan_end_byte, SEEK_HOLE); + if (data_end < 0) { + error = data_end; + goto out_unlock; + } + WARN_ON_ONCE(data_end <= start_byte); + WARN_ON_ONCE(data_end > scan_end_byte); + + error = xfs_iomap_write_delalloc_scan(inode, &punch_start_byte, + start_byte, data_end, punch); + if (error) + goto out_unlock; + + /* The next data search starts at the end of this one. */ + start_byte = data_end; + } + + if (punch_start_byte < end_byte) + error = punch(inode, punch_start_byte, + end_byte - punch_start_byte); +out_unlock: + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); + return error; +} + /* * When a short write occurs, the filesystem may need to remove reserved space * that was allocated in ->iomap_begin from it's ->iomap_end method. For @@ -1135,8 +1303,25 @@ xfs_buffered_write_delalloc_punch( * allocated for this iomap. * * This function uses [start_byte, end_byte) intervals (i.e. open ended) to - * simplify range iterations, but converts them back to {offset,len} tuples for - * the punch callback. + * simplify range iterations. + * + * The punch() callback *must* only punch delalloc extents in the range passed + * to it. It must skip over all other types of extents in the range and leave + * them completely unchanged. It must do this punch atomically with respect to + * other extent modifications. + * + * The punch() callback may be called with a folio locked to prevent writeback + * extent allocation racing at the edge of the range we are currently punching. 
+ * The locked folio may or may not cover the range being punched, so it is not + * safe for the punch() callback to lock folios itself. + * + * Lock order is: + * + * inode->i_rwsem (shared or exclusive) + * inode->i_mapping->invalidate_lock (exclusive) + * folio_lock() + * ->punch + * internal filesystem allocation lock */ static int xfs_iomap_file_buffered_write_punch_delalloc( @@ -1147,11 +1332,9 @@ xfs_iomap_file_buffered_write_punch_delalloc( ssize_t written, int (*punch)(struct inode *inode, loff_t pos, loff_t length)) { - struct xfs_inode *ip = XFS_I(inode); loff_t start_byte; loff_t end_byte; int blocksize = i_blocksize(inode); - int error = 0;
if (iomap->type != IOMAP_DELALLOC) return 0; @@ -1175,18 +1358,8 @@ xfs_iomap_file_buffered_write_punch_delalloc( if (start_byte >= end_byte) return 0;
- /* - * Lock the mapping to avoid races with page faults re-instantiating - * folios and dirtying them via ->page_mkwrite between the page cache - * truncation and the delalloc extent removal. Failing to do this can - * leave dirty pages with no space reservation in the cache. - */ - xfs_ilock(ip, XFS_MMAPLOCK_EXCL); - truncate_pagecache_range(inode, start_byte, end_byte - 1); - error = punch(inode, start_byte, end_byte - start_byte); - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); - - return error; + return xfs_iomap_write_delalloc_release(inode, start_byte, end_byte, + punch); }
static int diff --git a/mm/filemap.c b/mm/filemap.c index dbd5aa13fc2d..fd4aae06ff15 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2855,6 +2855,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start, return end; return start; } +EXPORT_SYMBOL(mapping_seek_hole_data);
#ifdef CONFIG_MMU #define MMAP_LOTSAMISS (100)
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.1-rc4 commit 7348b322332d8602a4133f0b861334ea021b134a category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
All the callers of xfs_bmap_punch_delalloc_range() jump through hoops to convert a byte range to filesystem blocks before calling xfs_bmap_punch_delalloc_range(). Instead, pass the byte range to xfs_bmap_punch_delalloc_range() and have it do the conversion to filesystem blocks internally.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org
Conflicts: fs/xfs/xfs_aops.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_aops.c | 15 +++++---------- fs/xfs/xfs_bmap_util.c | 10 ++++++---- fs/xfs/xfs_bmap_util.h | 2 +- fs/xfs/xfs_iomap.c | 8 ++------ 4 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index fe8c19814f1d..aa2e166a4fdc 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -150,9 +150,8 @@ xfs_end_ioend( if (unlikely(error)) { if (ioend->io_flags & IOMAP_F_SHARED) { xfs_reflink_cancel_cow_range(ip, offset, size, true); - xfs_bmap_punch_delalloc_range(ip, - XFS_B_TO_FSBT(mp, offset), - XFS_B_TO_FSB(mp, size)); + xfs_bmap_punch_delalloc_range(ip, offset, + offset + size); } goto done; } @@ -516,12 +515,8 @@ xfs_discard_page( struct page *page, loff_t fileoff) { - struct inode *inode = page->mapping->host; - struct xfs_inode *ip = XFS_I(inode); + struct xfs_inode *ip = XFS_I(page->mapping->host); struct xfs_mount *mp = ip->i_mount; - unsigned int pageoff = offset_in_page(fileoff); - xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, fileoff); - xfs_fileoff_t pageoff_fsb = XFS_B_TO_FSBT(mp, pageoff); int error;
if (xfs_is_shutdown(mp)) @@ -531,8 +526,8 @@ xfs_discard_page( "page discard on page "PTR_FMT", inode 0x%llx, offset %llu.", page, ip->i_ino, fileoff);
- error = xfs_bmap_punch_delalloc_range(ip, start_fsb, - i_blocks_per_page(inode, page) - pageoff_fsb); + error = xfs_bmap_punch_delalloc_range(ip, fileoff, + round_up(fileoff, thp_size(page))); if (error && !xfs_is_shutdown(mp)) xfs_alert(mp, "page discard unable to remove delalloc mapping."); } diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a44416597880..edf62092125c 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -551,11 +551,13 @@ xfs_getbmap( int xfs_bmap_punch_delalloc_range( struct xfs_inode *ip, - xfs_fileoff_t start_fsb, - xfs_fileoff_t length) + xfs_off_t start_byte, + xfs_off_t end_byte) { + struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = &ip->i_df; - xfs_fileoff_t end_fsb = start_fsb + length; + xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte); + xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte); struct xfs_bmbt_irec got, del; struct xfs_iext_cursor icur; int error = 0; @@ -568,7 +570,7 @@ xfs_bmap_punch_delalloc_range(
while (got.br_startoff + got.br_blockcount > start_fsb) { del = got; - xfs_trim_extent(&del, start_fsb, length); + xfs_trim_extent(&del, start_fsb, end_fsb - start_fsb);
/* * A delete can push the cursor forward. Step back to the diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 9f993168b55b..ca9000d38b90 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -31,7 +31,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap) #endif /* CONFIG_XFS_RT */
int xfs_bmap_punch_delalloc_range(struct xfs_inode *ip, - xfs_fileoff_t start_fsb, xfs_fileoff_t length); + xfs_off_t start_byte, xfs_off_t end_byte);
struct kgetbmap { __s64 bmv_offset; /* file offset of segment in blocks */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 25b968e85774..ae96ac52c974 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1117,12 +1117,8 @@ xfs_buffered_write_delalloc_punch( loff_t offset, loff_t length) { - struct xfs_mount *mp = XFS_M(inode->i_sb); - xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, offset); - xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length); - - return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb, - end_fsb - start_fsb); + return xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, + offset + length); }
/*
From: Dave Chinner dchinner@redhat.com
mainline inclusion from mainline-v6.3-rc1 commit 8ac5b996bf5199f15b7687ceae989f8b2a410dda category: bugfix bugzilla: 189079, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The recent writeback corruption fixes changed the code in xfs_discard_folio() to calculate a byte range for punching delalloc extents. A mistake was made in using round_up(pos) for the end offset, because when pos points at the first byte of a block, it does not get rounded up to point at the end byte of the block. Hence the punch range is short, and this leads to unexpected behaviour in certain cases in xfs_bmap_punch_delalloc_range.
e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so there is no previous extent and it rounds up the punch to the end of the delalloc extent it found at offset 0, not the end of the range given to xfs_bmap_punch_delalloc_range().
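Concrete numbers make the off-by-one obvious. A standalone illustration, assuming 4096-byte blocks and pages (round_up is a reduced power-of-two version of the kernel macro):

#include <stdio.h>

#define round_up(x, y) (((x) + (y) - 1) & ~((long long)(y) - 1))

int main(void)
{
	long long blocksize = 4096, pagesize = 4096, page_offset = 0;

	/* pos mid-block: round_up yields the end of the block, as intended. */
	printf("pos=1: buggy end = %lld\n", round_up(1, blocksize));
	/* pos at the first byte of a block: round_up does not move it. */
	printf("pos=0: buggy end = %lld -> empty punch range [0, 0)\n",
	       round_up(0, blocksize));
	/* Fixed: the end is always the first byte of the next page. */
	printf("pos=0: fixed end = %lld\n", page_offset + pagesize);
	return 0;
}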
Fix this by handling the zero block offset case correctly.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030 Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/ Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range") Reported-by: Pengfei Xu pengfei.xu@intel.com Found-by: Brian Foster bfoster@redhat.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org
Conflicts: fs/xfs/xfs_aops.c
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_aops.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index aa2e166a4fdc..ef7db9defef1 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -500,15 +500,17 @@ xfs_prepare_ioend( }
/* - * If the page has delalloc blocks on it, we need to punch them out before we - * invalidate the page. If we don't, we leave a stale delalloc mapping on the - * inode that can trip up a later direct I/O read operation on the same region. + * If the folio has delalloc blocks on it, the caller is asking us to punch them + * out. If we don't, we can leave a stale delalloc mapping covered by a clean + * page that needs to be dirtied again before the delalloc mapping can be + * converted. This stale delalloc mapping can trip up a later direct I/O read + * operation on the same region. * * We prevent this by truncating away the delalloc regions on the page. Because * they are delalloc, we can do this without needing a transaction. Indeed - if * we get ENOSPC errors, we have to be able to do this truncation without a - * transaction as there is no space left for block reservation (typically why we - * see a ENOSPC in writeback). + * transaction as there is no space left for block reservation (typically why + * we see a ENOSPC in writeback). */ static void xfs_discard_page( @@ -526,8 +528,13 @@ xfs_discard_page( "page discard on page "PTR_FMT", inode 0x%llx, offset %llu.", page, ip->i_ino, fileoff);
+ /* + * The end of the punch range is always the offset of the first + * byte of the next page. Hence the end offset is only dependent on the + * page itself and not the start offset that is passed in. + */ error = xfs_bmap_punch_delalloc_range(ip, fileoff, - round_up(fileoff, thp_size(page))); + page_offset(page) + thp_size(page)); if (error && !xfs_is_shutdown(mp)) xfs_alert(mp, "page discard unable to remove delalloc mapping."); }
From: yangerkun yangerkun@huawei.com
hulk inclusion category: bugfix bugzilla: 189240, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
If we free the same inode twice in xfs_difree_inobt, ir_freecount no longer matches ir_free (ir_freecount is incremented twice, but ir_free only changes once), and a later xfs_inobt_get_rec will complain about the mismatch between ir_freecount and ir_free. If that xfs_inobt_get_rec call happens while processing the AGI unlinked lists, the xfs mount fails.
We have not yet found the root cause of the double free, but we should reject it anyway so the mistake does not spread.
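The bookkeeping mismatch is mechanical, and the added check catches it at the moment of the second free. A toy userspace reproduction of the record logic (64 inodes per chunk, as in XFS; the rest is invented for illustration):

#include <stdint.h>
#include <stdio.h>

#define INOBT_MASK(off) ((uint64_t)1 << (off))

struct inobt_rec { uint64_t ir_free; int ir_freecount; };

static int difree(struct inobt_rec *rec, int off)
{
	/* The check the patch adds in place of the ASSERT(). */
	if (rec->ir_free & INOBT_MASK(off))
		return -1;	/* -EFSCORRUPTED: inode is already free */
	rec->ir_free |= INOBT_MASK(off);
	rec->ir_freecount++;
	return 0;
}

int main(void)
{
	struct inobt_rec rec = { 0, 0 };

	difree(&rec, 3);
	if (difree(&rec, 3) < 0)
		printf("double free of inode 3 rejected\n");
	printf("ir_freecount=%d, free bits=%d (kept consistent)\n",
	       rec.ir_freecount, __builtin_popcountll(rec.ir_free));
	return 0;
}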
Signed-off-by: yangerkun yangerkun@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_ialloc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 0c2b7a6601a0..a1319f9c8ed8 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1989,7 +1989,11 @@ xfs_difree_inobt( */ off = agino - rec.ir_startino; ASSERT(off >= 0 && off < XFS_INODES_PER_CHUNK); - ASSERT(!(rec.ir_free & XFS_INOBT_MASK(off))); + + if (XFS_IS_CORRUPT(mp, rec.ir_free & XFS_INOBT_MASK(off))) { + error = -EFSCORRUPTED; + goto error0; + } /* * Mark the inode free & increment the count. */
hulk inclusion category: bugfix bugzilla: 189076, https://gitee.com/openeuler/kernel/issues/I76JSK CVE: NA
--------------------------------
While performing the io fault injection test, I caught the following data corruption report:
XFS (dm-6): Internal error ltbno + ltlen > bno at line 1976 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x40b/0x930 [xfs]
CPU: 7 PID: 184267 Comm: kworker/7:1 Kdump: loaded Tainted: G O 5.10.0-136.12.0.86.h1179.eulerosv2r12.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20211121_093514-szxrtosci10000 04/01/2014
Workqueue: xfs-inodegc/dm-6 xfs_inodegc_worker [xfs]
Call Trace:
 dump_stack+0x57/0x6e
 xfs_corruption_error+0x81/0x90 [xfs]
 xfs_free_ag_extent+0x43c/0x930 [xfs]
 xfs_free_agfl_block+0x3b/0xd0 [xfs]
 xfs_agfl_free_finish_item+0x14c/0x160 [xfs]
 xfs_defer_finish_one+0xd5/0x220 [xfs]
 xfs_defer_finish_noroll+0xb5/0x210 [xfs]
 xfs_defer_finish+0x11/0x70 [xfs]
 xfs_itruncate_extents_flags+0xc1/0x240 [xfs]
 xfs_inactive_truncate+0xab/0xe0 [xfs]
 xfs_inactive+0x154/0x170 [xfs]
 xfs_inodegc_inactivate+0x16/0x50 [xfs]
 xfs_inodegc_worker+0xa0/0x110 [xfs]
 process_one_work+0x1b5/0x350
 worker_thread+0x49/0x310
 kthread+0xfe/0x140
 ret_from_fork+0x22/0x30
XFS (dm-6): Corruption detected. Unmount and run xfs_repair
After analyzing the disk image, I found that the cause of the problem was that the transactions were not replayed.
The problem arises because the iclog buffer IO completion updates l_last_sync_lsn with its own LSN. Transactions can be large enough to span many iclogs, and only the commit iclog updates l_last_sync_lsn. Since the l_icloglock is released between the last_sync_lsn update and the insertion of the items into the AIL, if the AIL is empty in the meantime, the new iclog takes the last_sync_lsn as its tail lsn. If that new iclog is written to disk and a shutdown occurs, the current iclog cannot be replayed on the next mount.
xlog_state_done_syncing
  xlog_state_do_callback
    spin_lock(&log->l_icloglock);
    xlog_state_do_iclog_callbacks
      xlog_state_iodone_process_iclog
        xlog_state_set_callback
          last_sync_lsn = iclog->ic_header.h_lsn
      spin_unlock(&log->l_icloglock);
            ====> AIL is empty, the new iclog gets last_sync_lsn as tail lsn
      xlog_cil_process_committed
        xlog_cil_committed
          xfs_trans_committed_bulk(ctx->start_lsn)
            xfs_log_item_batch_insert(commit_lsn)
      xlog_state_clean_iclog(log, iclog)
      spin_lock(&log->l_icloglock);
    spin_unlock(&log->l_icloglock);
The fix is simple: update l_last_sync_lsn with the start lsn of the iclog's first ctx when the commit iclog buffer IO completes. Even if the above race happens, the iclog can still be replayed.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_log.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 059f0c61f94c..9a363f787ba4 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2684,19 +2684,22 @@ xlog_get_lowest_lsn( static void xlog_state_set_callback( struct xlog *log, - struct xlog_in_core *iclog, - xfs_lsn_t header_lsn) + struct xlog_in_core *iclog) { + struct xfs_cil_ctx *ctx; + trace_xlog_iclog_callback(iclog, _RET_IP_); iclog->ic_state = XLOG_STATE_CALLBACK;
- ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - header_lsn) <= 0); - - if (list_empty_careful(&iclog->ic_callbacks)) + ctx = list_first_entry_or_null(&iclog->ic_callbacks, + struct xfs_cil_ctx, iclog_entry); + if (!ctx) return;
- atomic64_set(&log->l_last_sync_lsn, header_lsn); + ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), + ctx->start_lsn) <= 0); + + atomic64_set(&log->l_last_sync_lsn, ctx->start_lsn); xlog_grant_push_ail(log, 0); }
@@ -2731,7 +2734,7 @@ xlog_state_iodone_process_iclog( lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) return false; - xlog_state_set_callback(log, iclog, header_lsn); + xlog_state_set_callback(log, iclog); return false; default: /*
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/2340 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/G...