From: Zhang Yi yi.zhang@huawei.com
mainline inclusion from mainline-v6.11-rc1 commit d048945150b798147b324f05f7e8c857772b0d3f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When unaligned truncate down a big realtime file, xfs_truncate_page() only zeros out the tail EOF block, __xfs_bunmapi() should split the tail written extent and convert the later one that beyond EOF block to unwritten, but it couldn't work as expected now since the reserved block is zero in xfs_setattr_size(), this could expose stale data just after commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")'.
If we truncate file that contains a large enough written extent:
|< rxext >|< rtext >| ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW ^ (new EOF) ^ old EOF
Since we only zeros out the tail of the EOF block, and xfs_itruncate_extents()->..->__xfs_bunmapi() unmap the whole ailgned extents, it becomes this state:
|< rxext >| ...WWWzWWWWWWWWWWWWW ^ new EOF
Then if we do an extending write like this, the blocks in the previous tail extent becomes stale:
|< rxext >| ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW ^ old EOF ^ append start ^ new EOF
Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for big realtime inode.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Link: https://lore.kernel.org/r/20240618142112.1315279-2-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Conflicts: fs/xfs/xfs_iops.c [ 3fed24fffc76dd("xfs: Replace xfs_isilocked with xfs_assert_ilocked") is not applied. ] Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zhihao Cheng chengzhihao@huaweicloud.com --- fs/xfs/xfs_iops.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index b8ec045708c3..caba648e0ed2 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -17,6 +17,8 @@ #include "xfs_da_btree.h" #include "xfs_attr.h" #include "xfs_trans.h" +#include "xfs_trans_space.h" +#include "xfs_bmap_btree.h" #include "xfs_trace.h" #include "xfs_icache.h" #include "xfs_symlink.h" @@ -794,6 +796,7 @@ xfs_setattr_size( struct xfs_trans *tp; int error; uint lock_flags = 0; + uint resblks = 0; bool did_zeroing = false;
ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); @@ -901,7 +904,17 @@ xfs_setattr_size( return error; }
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); + /* + * For realtime inode with more than one block rtextsize, we need the + * block reservation for bmap btree block allocations/splits that can + * happen since it could split the tail written extent and convert the + * right beyond EOF one to unwritten. + */ + if (xfs_inode_has_bigrtalloc(ip)) + resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0); + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, + 0, 0, &tp); if (error) return error;