This patch set fix two attr inactive problems:
Guo Xuenan (1): xfs: force shutdown xfs when xfs_attr_inactive fails
Zhang Yi (2): xfs: factor out __xfs_da3_node_read() xfs: atomic drop extent entries when inactiving attr
fs/xfs/libxfs/xfs_da_btree.c | 5 +-- fs/xfs/libxfs/xfs_da_btree.h | 15 ++++++-- fs/xfs/xfs_attr_inactive.c | 66 ++++++++++++++++++++++++++---------- 3 files changed, 65 insertions(+), 21 deletions(-)
From: Guo Xuenan guoxuenan@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8RKY8 CVE: NA
--------------------------------
I observed the following evidence of a leak while xfs_inactive failed. Especially in Debug mode, when xfs_attr_inactive failed, current exception path handling rudely clear inode attr fork, and if the inode is recycled then assertion will accur, if not, which may also lead to memory leak.
Since xfs_attr_inactive is supposed to clean up the attribute fork when the inode is being freed. While it removes the in-memory attribute fork even in the event of truncate attribute fork extents failure, then some attr data may left in memory and never be released. By avoiding blukstat ioctl concurrent access this inode, force shutdown xfs when this situation occurs.
The following script reliably replays the bug described above. ``` DISK=vdb MP=/mnt/$DISK DEV=/dev/$DISK nfiles=10 xattr_val="this is xattr value." while true do pidof fsstress | xargs kill -9 umount $MP df | grep $MP || break sleep 2 done
mkdir -p ${MP} && mkfs.xfs -f $DEV && mount $DEV $MP echo 0 > /sys/fs/xfs/$DISK/errortag/bmapifmt
cd $MP; touch $(seq 1 $nfiles); cd $OLDPWD for n in `seq 1 $nfiles`; do for j in `seq 1 20`; do setfattr -n user.${j} -v "$xattr_val" $MP/$n done done fsstress -d $MP -z -f bulkstat=200 -S c -l 1000 -p 8 & /usr/bin/rm $MP/* echo 3 > /sys/fs/xfs/$DISK/errortag/bmapifmt ```
Assertion in the kernel log as follows:
XFS (vdb): Mounting V5 Filesystem bd1b6c38-599a-43b3-8194-a584bebec4ca XFS (vdb): Ending clean mount xfs filesystem being mounted at /mnt/vdb supports timestamps until 2038 (0x7fffffff) XFS (vdb): Injecting error (false) at file fs/xfs/libxfs/xfs_bmap.c, line 3887, on filesystem "vdb" XFS: Assertion failed: ip->i_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 2277 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 74 Comm: kworker/3:1 Not tainted 6.3.0-rc6-00127-g71deb8a5658c #569 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Workqueue: xfs-inodegc/vdb xfs_inodegc_worker RIP: 0010:assfail+0x8c/0x90 Code: 80 3d 37 27 3b 0a 00 75 1c e8 a0 b0 20 ff 0f 0b 5b 5d 41 5c 41 5d c3 48 c7 c7 30 25 64 8c e8 fb d8 66 ff eb db e8 84 b0 20 ff <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 fd 53 48 63 de e8 6e b0 20 ff RSP: 0018:ffff888101b17b20 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffffff8444eea0 RCX: 0000000000000000 RDX: ffff888101b08040 RSI: ffffffff8228fe1c RDI: ffffffff844510c0 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888101b177ff R10: ffffed1020362eff R11: 0000000000000001 R12: ffffffff8444f720 R13: 00000000000008e5 R14: ffff888155279800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8883edd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001612e18 CR3: 000000017bab5005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> xfs_ifree+0xea6/0x1310 xfs_inactive_ifree.isra.0+0x1ab/0x460 xfs_inactive+0x41f/0x710 xfs_inodegc_worker+0x22e/0x500 process_one_work+0x6d1/0xfe0 worker_thread+0x5b9/0xf60 kthread+0x287/0x330 ret_from_fork+0x1f/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:assfail+0x8c/0x90 Code: 80 3d 37 27 3b 0a 00 75 1c e8 a0 b0 20 ff 0f 0b 5b 5d 41 5c 41 5d c3 48 c7 c7 30 25 64 8c e8 fb d8 66 ff eb db e8 84 b0 20 ff <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 fd 53 48 63 de e8 6e b0 20 ff RSP: 0018:ffff888101b17b20 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffffff8444eea0 RCX: 0000000000000000 RDX: ffff888101b08040 RSI: ffffffff8228fe1c RDI: ffffffff844510c0 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888101b177ff R10: ffffed1020362eff R11: 0000000000000001 R12: ffffffff8444f720 R13: 00000000000008e5 R14: ffff888155279800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8883edd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001612e18 CR3: 000000017bab5005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554
Fixes: 6dfe5a049f2d ("xfs: xfs_attr_inactive leaves inconsistent attr fork state behind") Signed-off-by: Guo Xuenan guoxuenan@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_attr_inactive.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 89c7a9f4f930..f03fa219e1cb 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -366,7 +366,7 @@ xfs_attr_inactive( if (dp->i_af.if_nextents > 0) { error = xfs_attr3_root_inactive(&trans, dp); if (error) - goto out_cancel; + goto out_shutdown;
error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0); if (error) @@ -380,6 +380,8 @@ xfs_attr_inactive( xfs_iunlock(dp, lock_mode); return error;
+out_shutdown: + xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR); out_cancel: xfs_trans_cancel(trans); out_destroy_fork:
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8RKU4 CVE: NA
--------------------------------
Factor out a wrapper __xfs_da3_node_read() from xfs_da3_node_read() which could pass flags parameter.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/libxfs/xfs_da_btree.c | 5 +++-- fs/xfs/libxfs/xfs_da_btree.h | 15 +++++++++++++-- 2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c index e576560b46e9..2e4e5aa22403 100644 --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -378,16 +378,17 @@ xfs_da3_node_set_type( }
int -xfs_da3_node_read( +__xfs_da3_node_read( struct xfs_trans *tp, struct xfs_inode *dp, xfs_dablk_t bno, + unsigned int flags, struct xfs_buf **bpp, int whichfork) { int error;
- error = xfs_da_read_buf(tp, dp, bno, 0, bpp, whichfork, + error = xfs_da_read_buf(tp, dp, bno, flags, bpp, whichfork, &xfs_da3_node_buf_ops); if (error || !*bpp || !tp) return error; diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h index ffa3df5b2893..6d1da641f4f0 100644 --- a/fs/xfs/libxfs/xfs_da_btree.h +++ b/fs/xfs/libxfs/xfs_da_btree.h @@ -194,11 +194,22 @@ int xfs_da3_path_shift(xfs_da_state_t *state, xfs_da_state_path_t *path, */ int xfs_da3_blk_link(xfs_da_state_t *state, xfs_da_state_blk_t *old_blk, xfs_da_state_blk_t *new_blk); -int xfs_da3_node_read(struct xfs_trans *tp, struct xfs_inode *dp, - xfs_dablk_t bno, struct xfs_buf **bpp, int whichfork); +int __xfs_da3_node_read(struct xfs_trans *tp, struct xfs_inode *dp, + xfs_dablk_t bno, unsigned int flags, + struct xfs_buf **bpp, int whichfork); int xfs_da3_node_read_mapped(struct xfs_trans *tp, struct xfs_inode *dp, xfs_daddr_t mappedbno, struct xfs_buf **bpp, int whichfork); +static inline int +xfs_da3_node_read( + struct xfs_trans *tp, + struct xfs_inode *dp, + xfs_dablk_t bno, + struct xfs_buf **bpp, + int whichfork) +{ + return __xfs_da3_node_read(tp, dp, bno, 0, bpp, whichfork); +}
/* * Utility routines.
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8RKU4 CVE: NA
--------------------------------
When inactiving an unlinked inode and it's attrs, if xlog is shutdown either during or just after the process of recurse deleting attribute nodes/leafs in xfs_attr3_root_inactive(), the log will records some buffer cancel items, but doesn't contain the corresponding extent entries and inode updates, this is incomplete and inconsistent. Because of the inactiving process is not completed and the unlinked inode is still in the agi_unlinked table, it will continue to be inactived after replaying the log on the next mount, the attr node/leaf blocks' created record before the cancel items could not be replayed but the inode does. So we could get corrupted data when reading the canceled blocks.
XFS (pmem0): Metadata corruption detected at xfs_da3_node_read_verify+0x53/0x220, xfs_da3_node block 0x78 XFS (pmem0): Unmount and run xfs_repair XFS (pmem0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (pmem0): metadata I/O error in "xfs_da_read_buf+0x104/0x190" at daddr 0x78 len 8 error 117
In order to fix the issue, we need to remove the extent entries, update inode and attr btree atomically when staling attr node/leaf blocks. And note that we may also need to log and update the parent attr node entry when removing child or leaf attr block. Fortunately, it doesn't have to be so complicated, we could leave the removed entres as holes and skip them if we need to do re-inactiving, the whole node tree will be removed completely in the end.
Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_attr_inactive.c | 62 ++++++++++++++++++++++++++++---------- 1 file changed, 46 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index f03fa219e1cb..9de68b6a148c 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -23,6 +23,7 @@ #include "xfs_quota.h" #include "xfs_dir2.h" #include "xfs_error.h" +#include "xfs_defer.h"
/* * Invalidate any incore buffers associated with this remote attribute value @@ -139,7 +140,8 @@ xfs_attr3_node_inactive( xfs_daddr_t parent_blkno, child_blkno; struct xfs_buf *child_bp; struct xfs_da3_icnode_hdr ichdr; - int error, i; + int error, i, done; + xfs_filblks_t count = mp->m_attr_geo->fsbcount;
/* * Since this code is recursive (gasp!) we must protect ourselves. @@ -172,10 +174,13 @@ xfs_attr3_node_inactive( * traversal of the tree so we may deal with many blocks * before we come back to this one. */ - error = xfs_da3_node_read(*trans, dp, child_fsb, &child_bp, - XFS_ATTR_FORK); + error = __xfs_da3_node_read(*trans, dp, child_fsb, + XFS_DABUF_MAP_HOLE_OK, &child_bp, + XFS_ATTR_FORK); if (error) return error; + if (!child_bp) + goto next_entry;
/* save for re-read later */ child_blkno = xfs_buf_daddr(child_bp); @@ -207,14 +212,32 @@ xfs_attr3_node_inactive( * Remove the subsidiary block from the cache and from the log. */ error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, - child_blkno, - XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, - &child_bp); + child_blkno, XFS_FSB_TO_BB(mp, count), + 0, &child_bp); if (error) return error; + + error = xfs_bunmapi(*trans, dp, child_fsb, count, + XFS_BMAPI_ATTRFORK, 0, &done); + if (error) { + xfs_trans_brelse(*trans, child_bp); + return error; + } xfs_trans_binval(*trans, child_bp); + + error = xfs_defer_finish(trans); + if (error) + return error; child_bp = NULL;
+ /* + * Atomically commit the whole invalidate stuff. + */ + error = xfs_trans_roll_inode(trans, dp); + if (error) + return error; + +next_entry: /* * If we're not done, re-read the parent to get the next * child block number. @@ -232,12 +255,6 @@ xfs_attr3_node_inactive( xfs_trans_brelse(*trans, bp); bp = NULL; } - /* - * Atomically commit the whole invalidate stuff. - */ - error = xfs_trans_roll_inode(trans, dp); - if (error) - return error; }
return 0; @@ -258,7 +275,8 @@ xfs_attr3_root_inactive( struct xfs_da_blkinfo *info; struct xfs_buf *bp; xfs_daddr_t blkno; - int error; + xfs_filblks_t count = mp->m_attr_geo->fsbcount; + int error, done;
/* * Read block 0 to see what we have to work with. @@ -266,8 +284,9 @@ xfs_attr3_root_inactive( * the extents in reverse order the extent containing * block 0 must still be there. */ - error = xfs_da3_node_read(*trans, dp, 0, &bp, XFS_ATTR_FORK); - if (error) + error = __xfs_da3_node_read(*trans, dp, 0, XFS_DABUF_MAP_HOLE_OK, + &bp, XFS_ATTR_FORK); + if (error || !bp) return error; blkno = xfs_buf_daddr(bp);
@@ -298,7 +317,7 @@ xfs_attr3_root_inactive( * Invalidate the incore copy of the root block. */ error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, blkno, - XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, &bp); + XFS_FSB_TO_BB(mp, count), 0, &bp); if (error) return error; error = bp->b_error; @@ -306,7 +325,17 @@ xfs_attr3_root_inactive( xfs_trans_brelse(*trans, bp); return error; } + + error = xfs_bunmapi(*trans, dp, 0, count, XFS_BMAPI_ATTRFORK, 0, &done); + if (error) { + xfs_trans_brelse(*trans, bp); + return error; + } xfs_trans_binval(*trans, bp); /* remove from cache */ + + error = xfs_defer_finish(trans); + if (error) + return error; /* * Commit the invalidate and start the next transaction. */ @@ -368,6 +397,7 @@ xfs_attr_inactive( if (error) goto out_shutdown;
+ /* Remove the potential leftover remote attr blocks. */ error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0); if (error) goto out_cancel;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/3602 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/R...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/3602 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/R...