From: Zhihao Cheng chengzhihao1@huawei.com
Zhihao Cheng (1):
  ext4: dax: fix overflowing extents beyond inode size when partially writing

yangerkun (1):
  ext4: dax: keep orphan list before truncate overflow allocated blocks

 fs/ext4/file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/11910 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/E...
From: Zhihao Cheng chengzhihao1@huawei.com
mainline inclusion
from mainline-v6.12-rc1
commit dda898d7ffe85931f9cca6d702a51f33717c501e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAU1DM
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
dax_iomap_rw() does two things in each iteration: it maps written blocks and copies user data into them. If the process is killed by the user (see the signal handling in dax_iomap_iter()), only the length already copied is returned and added to the inode size. The written extents can then extend beyond the inode size, and fsck fails. For example:
dd if=/dev/urandom of=file bs=4M count=1
 dax_iomap_rw
  iomap_iter // round 1
   ext4_iomap_begin
    ext4_iomap_alloc // allocate 0~2M extents(written flag)
  dax_iomap_iter // copy 2M data
  iomap_iter // round 2
   iomap_iter_advance
    iter->pos += iter->processed // iter->pos = 2M
   ext4_iomap_begin
    ext4_iomap_alloc // allocate 2~4M extents(written flag)
  dax_iomap_iter
   fatal_signal_pending
  done = iter->pos - iocb->ki_pos // done = 2M
 ext4_handle_inode_extension
  ext4_update_inode_size // inode size = 2M
fsck reports: Inode 13, i_size is 2097152, should be 4194304. Fix?
Fix the problem by truncating extents if the written length is smaller than expected.
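The effect of the changed predicate can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_inode, old_need_trunc(), new_dax_need_trunc() and model_extending_write() are hypothetical names that do not exist in the kernel, the write is assumed to start at offset 0, and truncation is modelled as simply clamping the allocated range to i_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_inode {
	long long i_size;    /* size recorded in the inode */
	long long alloc_end; /* end of the extents allocated with the written flag */
};

/* Pre-patch check in the cleanup helper: truncate only on error. */
static bool old_need_trunc(ssize_t ret)
{
	return ret < 0;
}

/* Post-patch check on the DAX path: truncate on any short write. */
static bool new_dax_need_trunc(ssize_t written, size_t count)
{
	return written < (ssize_t)count;
}

/*
 * Model of an extending write at offset 0 (an assumption of this sketch):
 * 'mapped' bytes of extents were allocated, but only 'written' bytes were
 * copied before a fatal signal interrupted the loop.
 */
static struct model_inode model_extending_write(size_t mapped, ssize_t written,
						bool need_trunc)
{
	struct model_inode inode = { .i_size = 0, .alloc_end = (long long)mapped };

	assert(written >= 0 && (size_t)written <= mapped);
	inode.i_size = written;                 /* size update step */
	if (need_trunc)
		inode.alloc_end = inode.i_size; /* drop extents past i_size */
	return inode;
}
```

With count = 4M and written = 2M, the old predicate leaves alloc_end at 4M while i_size is 2M, which is the mismatch fsck reports; the new predicate brings alloc_end back to i_size.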
Fixes: 776722e85d3b ("ext4: DAX iomap write support")
CC: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219136
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
Reviewed-by: Jan Kara jack@suse.cz
Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com
Link: https://patch.msgid.link/20240809121532.2105494-1-chengzhihao@huaweicloud.co...
Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/ext4/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 602349fc0bf0..ffa6bb50d23e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -311,10 +311,10 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
  * Clean up the inode after DIO or DAX extending write has completed and the
  * inode size has been updated using ext4_handle_inode_extension().
  */
-static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count)
+static void ext4_inode_extension_cleanup(struct inode *inode, bool need_trunc)
 {
 	lockdep_assert_held_write(&inode->i_rwsem);
-	if (count < 0) {
+	if (need_trunc) {
 		ext4_truncate_failed_write(inode);
 		/*
 		 * If the truncate operation failed early, then the inode may
@@ -559,7 +559,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		 * writeback of delalloc blocks.
 		 */
 		WARN_ON_ONCE(ret == -EIOCBQUEUED);
-		ext4_inode_extension_cleanup(inode, ret);
+		ext4_inode_extension_cleanup(inode, ret < 0);
 	}
 
 out:
@@ -643,7 +643,7 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	if (extend) {
 		ret = ext4_handle_inode_extension(inode, offset, ret);
-		ext4_inode_extension_cleanup(inode, ret);
+		ext4_inode_extension_cleanup(inode, ret < (ssize_t)count);
 	}
 out:
 	inode_unlock(inode);
From: yangerkun yangerkun@huawei.com
mainline inclusion
from mainline-v6.12-rc1
commit 59efe53e380ee305ec11378233adb6aaebe1856c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAU1DM
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Any extending write for ext4 requires the inode to be placed on the orphan list before the actual write. In addition, the inode can be removed from the orphan list only after all writes have completed. Otherwise we would leave allocated blocks beyond i_disksize if we could not copy all the data into the allocated blocks, and e2fsck would complain.
Currently, direct IO and buffered IO comply with this logic (buffered IO truncates all overflow allocated blocks that have not been written successfully, and direct IO truncates all allocated blocks when an error occurs). However, DAX write breaks this, since it removes the inode from the orphan list by calling ext4_handle_inode_extension() unconditionally during an extending write.
Add an argument that tells ext4_handle_inode_extension() whether the write completed in full. If it did not, leave the inode on the orphan list; the subsequent ext4_inode_extension_cleanup() then truncates the overflow allocated blocks and removes the inode from the orphan list.
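The ordering described above can be sketched as a small userspace model. Everything in it is an assumption made for illustration: struct model_inode and the model_* helpers are hypothetical, journaling is not modelled, and the inode is assumed to be linked (i_nlink = 1) and already on the orphan list when the write starts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace model; these types and helpers are not kernel APIs. */
struct model_inode {
	unsigned int i_nlink;
	bool on_orphan_list;          /* current orphan list membership */
	bool truncated;               /* overflow blocks were truncated */
	bool orphan_during_truncate;  /* membership at the moment of truncation */
};

/* Models the patched size-update step: only a full write leaves the list here. */
static ssize_t model_handle_extension(struct model_inode *inode,
				      ssize_t written, ssize_t count)
{
	if (written == count && inode->i_nlink)
		inode->on_orphan_list = false;
	return written;
}

/* Models the cleanup step: truncate first, then leave the orphan list. */
static void model_cleanup(struct model_inode *inode, bool need_trunc)
{
	if (!need_trunc)
		return;
	inode->orphan_during_truncate = inode->on_orphan_list;
	inode->truncated = true;
	if (inode->i_nlink)
		inode->on_orphan_list = false;
}

/* Models the DAX extending-write tail: size update followed by cleanup. */
static struct model_inode model_dax_extending_write(ssize_t written, size_t count)
{
	struct model_inode inode = {
		.i_nlink = 1,
		.on_orphan_list = true, /* assumed: added before the write */
	};
	ssize_t ret = model_handle_extension(&inode, written, (ssize_t)count);

	model_cleanup(&inode, ret < (ssize_t)count);
	return inode;
}
```

In this model a short write (for example 2M of 4M) keeps the inode on the orphan list until the truncation has run, whereas a full write removes it in the size-update step and skips the truncation.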
Signed-off-by: yangerkun yangerkun@huawei.com
Reviewed-by: Jan Kara jack@suse.cz
Link: https://patch.msgid.link/20240829110222.126685-1-yangerkun@huaweicloud.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/ext4/file.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index ffa6bb50d23e..90c8f5d675a5 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -283,7 +283,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
-					   ssize_t count)
+					   ssize_t written, ssize_t count)
 {
 	handle_t *handle;
 
@@ -292,7 +292,7 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
-	if (ext4_update_inode_size(inode, offset + count)) {
+	if (ext4_update_inode_size(inode, offset + written)) {
 		int ret = ext4_mark_inode_dirty(handle, inode);
 		if (unlikely(ret)) {
 			ext4_journal_stop(handle);
@@ -300,11 +300,11 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
 		}
 	}
 
-	if (inode->i_nlink)
+	if ((written == count) && inode->i_nlink)
 		ext4_orphan_del(handle, inode);
 	ext4_journal_stop(handle);
 
-	return count;
+	return written;
 }
 
 /*
@@ -370,7 +370,7 @@ static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size,
 	if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize) &&
 	    pos + size <= i_size_read(inode))
 		return size;
-	return ext4_handle_inode_extension(inode, pos, size);
+	return ext4_handle_inode_extension(inode, pos, size, size);
 }
 
 static const struct iomap_dio_ops ext4_dio_write_ops = {
@@ -642,7 +642,7 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops);
 
 	if (extend) {
-		ret = ext4_handle_inode_extension(inode, offset, ret);
+		ret = ext4_handle_inode_extension(inode, offset, ret, count);
 		ext4_inode_extension_cleanup(inode, ret < (ssize_t)count);
 	}
 out: