[PATCH OLK-6.6 0/7] ext4: Some zeroing fixes

Baokun Li (2):
  ext4: goto right label 'out_mmap_sem' in ext4_setattr()
  iomap: do not interrupt IOMAP_ZERO

Brian Foster (1):
  mm: zero range of eof folio exposed by inode size extension

Matthew Wilcox (Oracle) (1):
  mm: convert pagecache_isize_extended to use a folio

Yongjian Sun (2):
  ext4: do not always order data when partial zeroing out a block
  ext4: fix potential memory exposure issues during truncate in iomap mode

Zhang Yi (1):
  jbd2: fix off-by-one while erasing journal

 fs/ext4/inode.c        | 130 ++++++++++++++++++++++++++++++-----------
 fs/iomap/buffered-io.c |   7 ++-
 fs/jbd2/journal.c      |  15 ++---
 mm/truncate.c          |  51 ++++++++++------
 4 files changed, 141 insertions(+), 62 deletions(-)

--
2.46.1

mainline inclusion
from mainline-v6.15-rc1
commit 7e91ae31e2d264155dfd102101afc2de7bd74a64
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Otherwise, if ext4_inode_attach_jinode() fails, a hung task will happen
because filemap_invalidate_unlock() isn't called to unlock
mapping->invalidate_lock. Like this:

EXT4-fs error (device sda) in ext4_setattr:5557: Out of memory
INFO: task fsstress:374 blocked for more than 122 seconds.
      Not tainted 6.14.0-rc1-next-20250206-xfstests-dirty #726
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fsstress state:D stack:0 pid:374 tgid:374 ppid:373
     task_flags:0x440140 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x2c9/0x7f0
 schedule+0x27/0xa0
 schedule_preempt_disabled+0x15/0x30
 rwsem_down_read_slowpath+0x278/0x4c0
 down_read+0x59/0xb0
 page_cache_ra_unbounded+0x65/0x1b0
 filemap_get_pages+0x124/0x3e0
 filemap_read+0x114/0x3d0
 vfs_read+0x297/0x360
 ksys_read+0x6c/0xe0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: c7fc0366c656 ("ext4: partial zero eof block on unaligned inode size extension")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20250213112247.3168709-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bceb4c64754a..fb61ccc38c76 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6059,7 +6059,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		    oldsize & (inode->i_sb->s_blocksize - 1)) {
 			error = ext4_inode_attach_jinode(inode);
 			if (error)
-				goto err_out;
+				goto out_mmap_sem;
 		}
 
 		handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
--
2.46.1
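The one-line fix restores a simple invariant: every failure path taken
after filemap_invalidate_lock() must still reach
filemap_invalidate_unlock(). A minimal sketch of that discipline
(hypothetical code, not the actual ext4_setattr();
attach_jinode_sketch() is a stand-in for ext4_inode_attach_jinode()):

	#include <linux/fs.h>

	/* Stand-in for ext4_inode_attach_jinode(); may fail with -ENOMEM. */
	static int attach_jinode_sketch(struct inode *inode);

	static int setattr_size_sketch(struct inode *inode)
	{
		struct address_space *mapping = inode->i_mapping;
		int error;

		/* Blocks page faults and buffered reads on this mapping. */
		filemap_invalidate_lock(mapping);

		error = attach_jinode_sketch(inode);
		if (error)
			goto out_mmap_sem;	/* the buggy 'goto err_out'
						 * skipped the unlock below */

		/* ... size update and truncation work under the lock ... */

	out_mmap_sem:
		filemap_invalidate_unlock(mapping);
		return error;
	}

In the hung-task trace above, page_cache_ra_unbounded() is exactly such
a blocked reader: it sits in down_read() on the invalidate_lock that
the failed setattr never released.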

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.10-rc1 commit 2ebe90dab9808a15e5d1c973e7d3ddaee05ddbd3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Remove four hidden calls to compound_head(). Also exit early if the filesystem block size is >= PAGE_SIZE instead of just equal to PAGE_SIZE. Link: https://lkml.kernel.org/r/20240405180038.2618624-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> --- mm/truncate.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index fa9728073eeb..4cf6ed75d6a9 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -765,15 +765,15 @@ EXPORT_SYMBOL(truncate_setsize); * @from: original inode size * @to: new inode size * - * Handle extension of inode size either caused by extending truncate or by - * write starting after current i_size. We mark the page straddling current - * i_size RO so that page_mkwrite() is called on the nearest write access to - * the page. This way filesystem can be sure that page_mkwrite() is called on - * the page before user writes to the page via mmap after the i_size has been - * changed. + * Handle extension of inode size either caused by extending truncate or + * by write starting after current i_size. We mark the page straddling + * current i_size RO so that page_mkwrite() is called on the first + * write access to the page. The filesystem will update its per-block + * information before user writes to the page via mmap after the i_size + * has been changed. * * The function must be called after i_size is updated so that page fault - * coming after we unlock the page will already see the new i_size. + * coming after we unlock the folio will already see the new i_size. * The function must be called while we still hold i_rwsem - this not only * makes sure i_size is stable but also that userspace cannot observe new * i_size value before we are prepared to store mmap writes at new inode size. @@ -782,31 +782,29 @@ void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to) { int bsize = i_blocksize(inode); loff_t rounded_from; - struct page *page; - pgoff_t index; + struct folio *folio; WARN_ON(to > inode->i_size); - if (from >= to || bsize == PAGE_SIZE) + if (from >= to || bsize >= PAGE_SIZE) return; /* Page straddling @from will not have any hole block created? */ rounded_from = round_up(from, bsize); if (to <= rounded_from || !(rounded_from & (PAGE_SIZE - 1))) return; - index = from >> PAGE_SHIFT; - page = find_lock_page(inode->i_mapping, index); - /* Page not cached? Nothing to do */ - if (!page) + folio = filemap_lock_folio(inode->i_mapping, from / PAGE_SIZE); + /* Folio not cached? Nothing to do */ + if (IS_ERR(folio)) return; /* - * See clear_page_dirty_for_io() for details why set_page_dirty() + * See folio_clear_dirty_for_io() for details why folio_mark_dirty() * is needed. 
*/ - if (page_mkclean(page)) - set_page_dirty(page); - unlock_page(page); - put_page(page); + if (folio_mkclean(folio)) + folio_mark_dirty(folio); + folio_unlock(folio); + folio_put(folio); } EXPORT_SYMBOL(pagecache_isize_extended); -- 2.46.1
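The conversion is mostly mechanical, but one API difference is easy to
miss: find_lock_page() reports a cache miss by returning NULL, while
filemap_lock_folio() returns ERR_PTR(-ENOENT). A side-by-side sketch of
the two idioms (hypothetical helper names; the APIs come from
<linux/pagemap.h> and <linux/rmap.h>):

	#include <linux/pagemap.h>
	#include <linux/rmap.h>

	/* Old idiom: page-based, NULL means "not cached". */
	static void mkclean_eof_page(struct address_space *mapping,
				     pgoff_t index)
	{
		struct page *page = find_lock_page(mapping, index);

		if (!page)			/* not cached, nothing to do */
			return;
		if (page_mkclean(page))		/* write-protected mapped ptes */
			set_page_dirty(page);
		unlock_page(page);
		put_page(page);
	}

	/* New idiom: folio-based, a cache miss is an ERR_PTR. */
	static void mkclean_eof_folio(struct address_space *mapping,
				      pgoff_t index)
	{
		struct folio *folio = filemap_lock_folio(mapping, index);

		if (IS_ERR(folio))		/* not cached, nothing to do */
			return;
		if (folio_mkclean(folio))
			folio_mark_dirty(folio);
		folio_unlock(folio);
		folio_put(folio);
	}

This is why the miss check in the diff changes from "if (!page)" to
"if (IS_ERR(folio))".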

From: Brian Foster <bfoster@redhat.com>

mainline inclusion
from mainline-v6.13-rc1
commit 52aecaee1c26409ebafe080293e0841884f6e9fb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

On some filesystems, it is currently possible to create a transient
data inconsistency between pagecache and on-disk state. For example, on
a 1k block size ext4 filesystem:

$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
	-c "truncate 8k" -c "fiemap -v" -c "pread -v 2k 16" <file>
...
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..3]:          17410..17413         4   0x1
   1: [4..15]:         hole                12
00000800:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 2k 16" <file>
00000800:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

This allocates and writes two 1k blocks, map writes to the post-eof
portion of the (4k) eof folio, extends the file, and then shows that
the post-eof data is not cleared before the file size is extended. The
result is pagecache with a clean and uptodate folio over a hole that
returns non-zero data. Once reclaimed, pagecache begins to return valid
data.

Some filesystems avoid this problem by flushing the EOF folio before
inode size extension. This triggers writeback time partial post-eof
zeroing. XFS explicitly zeroes newly exposed file ranges via
iomap_zero_range(), but this includes a hack to flush dirty but
hole-backed folios, which means writeback actually does the zeroing in
this particular case as well. bcachefs explicitly flushes the eof folio
on truncate extension to the same effect, but doesn't handle the
analogous write extension case (i.e., replace "truncate 8k" with
"pwrite 4k 4k" in the above example command to reproduce the same
problem on bcachefs). btrfs doesn't seem to support subpage block
sizes.

The two main options to avoid this behavior are to either flush or do
the appropriate zeroing during size extending operations. Zeroing is
only required when the size change exposes ranges of the file that
haven't been directly written, such as a write or truncate that starts
beyond the current eof. The pagecache_isize_extended() helper is
already used for this particular scenario. It currently cleans any
pte's for the eof folio to ensure preexisting mappings fault and allow
the filesystem to take action based on the updated inode size. This is
required to ensure the folio is fully backed by allocated blocks, for
example, but this also happens to be the same scenario where zeroing is
required.

Update pagecache_isize_extended() to zero the post-eof range of the eof
folio if it is dirty at the time of the size change, since writeback
now won't have the chance. If non-dirty, the folio has either not been
written or the post-eof portion was zeroed by writeback.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20240919160741.208162-3-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 mm/truncate.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/truncate.c b/mm/truncate.c
index 4cf6ed75d6a9..1557a0503f8e 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -803,6 +803,21 @@ void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
 	 */
 	if (folio_mkclean(folio))
 		folio_mark_dirty(folio);
+
+	/*
+	 * The post-eof range of the folio must be zeroed before it is exposed
+	 * to the file. Writeback normally does this, but since i_size has been
+	 * increased we handle it here.
+	 */
+	if (folio_test_dirty(folio)) {
+		unsigned int offset, end;
+
+		offset = from - folio_pos(folio);
+		end = min_t(unsigned int, to - folio_pos(folio),
+			    folio_size(folio));
+		folio_zero_segment(folio, offset, end);
+	}
+
 	folio_unlock(folio);
 	folio_put(folio);
 }
--
2.46.1
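To make the new zeroing concrete, plug in the reproducer's numbers
(1k blocks, eof folio at offset 0 with folio_size() == 4096, size
extended from 2k to 8k, so from == 2048 and to == 8192):

  offset = from - folio_pos(folio)                 = 2048 - 0 = 2048
  end    = min(to - folio_pos(folio), folio_size(folio))
         = min(8192 - 0, 4096)                     = 4096

folio_zero_segment(folio, 2048, 4096) therefore clears exactly the
mwrite'd post-eof bytes while the folio is still dirty, so the 'X' data
can no longer be read back over the hole.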

From: Yongjian Sun <sunyongjian1@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2

--------------------------------

When zeroing out a partial block during a partial truncate, a zero
range, or a hole punch, ordering the data is only necessary for the
partial truncate, because only there is there a risk of exposing stale
data. Consider a scenario in which a crash occurs just after the
i_disksize transaction has been submitted but before the zeroed data
is written out: the tail block will retain stale data, which could be
exposed by the next expand truncate operation. Partial zero range and
hole punch do not have this risk. Therefore, move
ext4_jbd2_inode_add_write() out to ext4_truncate() and order the data
only for the partial truncate.

Fixes: 5721968224e0 ("ext4: implement zero_range iomap path")
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/inode.c | 53 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fb61ccc38c76..fbbc6ea06eb6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4230,7 +4230,9 @@ void ext4_set_aops(struct inode *inode)
 }
 
 static int __ext4_block_zero_page_range(handle_t *handle,
-		struct address_space *mapping, loff_t from, loff_t length)
+					struct address_space *mapping,
+					loff_t from, loff_t length,
+					bool *did_zero)
 {
 	ext4_fsblk_t index = from >> PAGE_SHIFT;
 	unsigned offset = from & (PAGE_SIZE-1);
@@ -4310,14 +4312,16 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 
 	if (ext4_should_journal_data(inode)) {
 		err = ext4_dirty_journalled_data(handle, bh);
+		if (err)
+			goto unlock;
 	} else {
 		err = 0;
 		mark_buffer_dirty(bh);
-		if (ext4_should_order_data(inode))
-			err = ext4_jbd2_inode_add_write(handle, inode, from,
-					length);
 	}
 
+	if (did_zero)
+		*did_zero = true;
+
 unlock:
 	folio_unlock(folio);
 	folio_put(folio);
@@ -4339,7 +4343,9 @@ static int ext4_iomap_zero_range(struct inode *inode,
  * that corresponds to 'from'
  */
 static int ext4_block_zero_page_range(handle_t *handle,
-		struct address_space *mapping, loff_t from, loff_t length)
+				      struct address_space *mapping,
+				      loff_t from, loff_t length,
+				      bool *did_zero)
 {
 	struct inode *inode = mapping->host;
 	unsigned offset = from & (PAGE_SIZE-1);
@@ -4359,7 +4365,8 @@ static int ext4_block_zero_page_range(handle_t *handle,
 	} else if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) {
 		return ext4_iomap_zero_range(inode, from, length);
 	}
-	return __ext4_block_zero_page_range(handle, mapping, from, length);
+	return __ext4_block_zero_page_range(handle, mapping, from, length,
+					    did_zero);
 }
 
 /*
@@ -4369,12 +4376,15 @@ static int ext4_block_zero_page_range(handle_t *handle,
 * of that block so it doesn't yield old data if the file is later grown.
 */
 static int ext4_block_truncate_page(handle_t *handle,
-		struct address_space *mapping, loff_t from)
+				    struct address_space *mapping, loff_t from,
+				    loff_t *zero_len)
 {
 	unsigned offset = from & (PAGE_SIZE-1);
 	unsigned length;
 	unsigned blocksize;
 	struct inode *inode = mapping->host;
+	bool did_zero = false;
+	int ret;
 
 	/* If we are processing an encrypted inode during orphan list handling */
 	if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode))
@@ -4383,7 +4393,13 @@ static int ext4_block_truncate_page(handle_t *handle,
 	blocksize = inode->i_sb->s_blocksize;
 	length = blocksize - (offset & (blocksize - 1));
 
-	return ext4_block_zero_page_range(handle, mapping, from, length);
+	ret = ext4_block_zero_page_range(handle, mapping, from, length,
+					 &did_zero);
+	if (ret)
+		return ret;
+
+	if (zero_len)
+		*zero_len = length;
+	return 0;
 }
 
 int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
@@ -4406,13 +4422,14 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 	if (start == end &&
 	    (partial_start || (partial_end != sb->s_blocksize - 1))) {
 		err = ext4_block_zero_page_range(handle, mapping,
-						 lstart, length);
+						 lstart, length, NULL);
 		return err;
 	}
 	/* Handle partial zero out on the start of the range */
 	if (partial_start) {
 		err = ext4_block_zero_page_range(handle, mapping,
-						 lstart, sb->s_blocksize);
+						 lstart, sb->s_blocksize,
+						 NULL);
 		if (err)
 			return err;
 	}
@@ -4420,7 +4437,7 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 	if (partial_end != sb->s_blocksize - 1)
 		err = ext4_block_zero_page_range(handle, mapping,
 						 byte_end - partial_end,
-						 partial_end + 1);
+						 partial_end + 1, NULL);
 
 	return err;
 }
@@ -4715,6 +4732,7 @@ int ext4_truncate(struct inode *inode)
 	int err = 0, err2;
 	handle_t *handle;
 	struct address_space *mapping = inode->i_mapping;
+	loff_t zero_len = 0;
 
 	/*
 	 * There is a possibility that we're either freeing the inode
@@ -4758,7 +4776,15 @@ int ext4_truncate(struct inode *inode)
 	}
 
 	if (inode->i_size & (inode->i_sb->s_blocksize - 1))
-		ext4_block_truncate_page(handle, mapping, inode->i_size);
+		ext4_block_truncate_page(handle, mapping, inode->i_size,
+					 &zero_len);
+
+	if (zero_len && ext4_should_order_data(inode)) {
+		err = ext4_jbd2_inode_add_write(handle, inode, inode->i_size,
+						zero_len);
+		if (err)
+			goto out_stop;
+	}
 
 	/*
 	 * We add the inode to the orphan list, so that if this
@@ -6081,7 +6107,8 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 					inode_set_ctime_current(inode));
 			if (oldsize & (inode->i_sb->s_blocksize - 1))
 				ext4_block_truncate_page(handle,
-					inode->i_mapping, oldsize);
+							 inode->i_mapping,
+							 oldsize, NULL);
 		}
 
 		if (shrink)
--
2.46.1
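The crash window that ordering closes can be drawn as a timeline
(illustrative, assuming data=ordered mode and an unaligned tail block
at the old i_size):

  Without ordering the zeroed tail:
    1. tail block is zeroed in the page cache (not yet on disk)
    2. the transaction updating i_disksize commits
    -- crash --
    3. the next expand truncate exposes the tail block: stale data

  With ext4_jbd2_inode_add_write(handle, inode, inode->i_size, zero_len):
    jbd2 forces the [i_size, i_size + zero_len) range to disk before
    the committing transaction completes, so step 3 can only ever
    observe zeroes.

Per the commit message, zero range and hole punch never have this
exposure window, which is why they can skip the ordering.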

From: Yongjian Sun <sunyongjian1@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2

--------------------------------

Since we do not order data in iomap mode, zeroed data must be written
out before the transaction that updates i_disksize is committed.
Otherwise, stale data may be left over in the last block, which could
be exposed during the next expand truncate operation.

After writing the zeroes we need to wait for the I/O, so the new path
calls filemap_write_and_wait_range(). However, doing so under a running
handle would introduce a hung task: we cannot wait for I/O to complete
while holding a running handle, because the end-I/O path may itself
wait for this handle to stop if the running transaction has begun to
commit or the journal is running out of space. So move the call to
ext4_block_truncate_page() in ext4_truncate() to before the handle is
started, and pass a NULL handle.

When zeroing out a partial block in __ext4_block_zero_page_range(), a
handle only needs to be started when the passed-in handle is NULL and
the inode is in data=journal mode, because only then must the zeroed
data block be logged; the other journaling modes do not need it.
Therefore, postpone starting the handle for partial truncate, zero
range, and hole punch, in preparation for the buffered write iomap
conversion.

Fixes: 5721968224e0 ("ext4: implement zero_range iomap path")
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Conflicts:
	fs/ext4/inode.c
[Move the code that waits for the data to land on disk into
ext4_block_truncate_page().]
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/inode.c | 101 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 69 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fbbc6ea06eb6..59f46934928d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4242,12 +4242,22 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 	struct buffer_head *bh;
 	struct folio *folio;
 	int err = 0;
+	bool orig_handle_valid = true;
+
+	if (ext4_should_journal_data(inode) && handle == NULL) {
+		handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
+		if (IS_ERR(handle))
+			return PTR_ERR(handle);
+		orig_handle_valid = false;
+	}
 
 	folio = __filemap_get_folio(mapping, from >> PAGE_SHIFT,
 				    FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
 				    mapping_gfp_constraint(mapping, ~__GFP_FS));
-	if (IS_ERR(folio))
-		return PTR_ERR(folio);
+	if (IS_ERR(folio)) {
+		err = PTR_ERR(folio);
+		goto out;
+	}
 
 	blocksize = inode->i_sb->s_blocksize;
 
@@ -4300,22 +4310,24 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 			}
 		}
 	}
+
 	if (ext4_should_journal_data(inode)) {
 		BUFFER_TRACE(bh, "get write access");
 		err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
 						    EXT4_JTR_NONE);
 		if (err)
 			goto unlock;
-	}
 
-	folio_zero_range(folio, offset, length);
-	BUFFER_TRACE(bh, "zeroed end of block");
+		folio_zero_range(folio, offset, length);
+		BUFFER_TRACE(bh, "zeroed end of block");
 
-	if (ext4_should_journal_data(inode)) {
 		err = ext4_dirty_journalled_data(handle, bh);
 		if (err)
 			goto unlock;
 	} else {
 		err = 0;
+		folio_zero_range(folio, offset, length);
+		BUFFER_TRACE(bh, "zeroed end of block");
+
 		mark_buffer_dirty(bh);
 	}
 
@@ -4325,13 +4337,16 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 unlock:
 	folio_unlock(folio);
 	folio_put(folio);
+out:
+	if (ext4_should_journal_data(inode) && orig_handle_valid == false)
+		ext4_journal_stop(handle);
 	return err;
 }
 
-static int ext4_iomap_zero_range(struct inode *inode,
-				 loff_t from, loff_t length)
+static int ext4_iomap_zero_range(struct inode *inode, loff_t from,
+				 loff_t length, bool *did_zero)
 {
-	return iomap_zero_range(inode, from, length, NULL,
+	return iomap_zero_range(inode, from, length, did_zero,
 				&ext4_iomap_buffered_read_ops);
 }
 
@@ -4363,7 +4378,7 @@ static int ext4_block_zero_page_range(handle_t *handle,
 		return dax_zero_range(inode, from, length, NULL,
 				      &ext4_iomap_ops);
 	} else if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) {
-		return ext4_iomap_zero_range(inode, from, length);
+		return ext4_iomap_zero_range(inode, from, length, did_zero);
 	}
 	return __ext4_block_zero_page_range(handle, mapping, from, length,
 					    did_zero);
@@ -4375,16 +4390,15 @@ static int ext4_block_zero_page_range(handle_t *handle,
 * This required during truncate. We need to physically zero the tail end
 * of that block so it doesn't yield old data if the file is later grown.
 */
-static int ext4_block_truncate_page(handle_t *handle,
-				    struct address_space *mapping, loff_t from,
-				    loff_t *zero_len)
+static loff_t ext4_block_truncate_page(struct address_space *mapping,
+				       loff_t from)
 {
 	unsigned offset = from & (PAGE_SIZE-1);
 	unsigned length;
 	unsigned blocksize;
 	struct inode *inode = mapping->host;
 	bool did_zero = false;
-	int ret;
+	int err;
 
 	/* If we are processing an encrypted inode during orphan list handling */
 	if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode))
@@ -4393,13 +4407,28 @@ static int ext4_block_truncate_page(handle_t *handle,
 	blocksize = inode->i_sb->s_blocksize;
 	length = blocksize - (offset & (blocksize - 1));
 
-	ret = ext4_block_zero_page_range(handle, mapping, from, length,
-					 &did_zero);
-	if (ret)
-		return ret;
+	err = ext4_block_zero_page_range(NULL, mapping, from, length,
+					 &did_zero);
+	if (err)
+		return err;
 
-	if (zero_len)
-		*zero_len = length;
-	return 0;
+	/*
+	 * An inode on the iomap buffered I/O path does not order data, so
+	 * zeroed data must be written out before the transaction that
+	 * updates i_disksize is committed. Otherwise, stale data may
+	 * remain in the last block, which could be exposed during the
+	 * next expand truncate operation.
+	 */
+	if (length && ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) {
+		loff_t zero_end = inode->i_size + length;
+
+		err = filemap_write_and_wait_range(mapping,
+						   inode->i_size, zero_end - 1);
+		if (err)
+			return err;
+	}
+
+	return length;
 }
 
 int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
@@ -4762,6 +4791,12 @@ int ext4_truncate(struct inode *inode)
 		err = ext4_inode_attach_jinode(inode);
 		if (err)
 			goto out_trace;
+
+		zero_len = ext4_block_truncate_page(mapping, inode->i_size);
+		if (zero_len < 0) {
+			err = zero_len;
+			goto out_trace;
+		}
 	}
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
@@ -4775,10 +4810,6 @@ int ext4_truncate(struct inode *inode)
 		goto out_trace;
 	}
 
-	if (inode->i_size & (inode->i_sb->s_blocksize - 1))
-		ext4_block_truncate_page(handle, mapping, inode->i_size,
-					 &zero_len);
-
 	if (zero_len && ext4_should_order_data(inode)) {
 		err = ext4_jbd2_inode_add_write(handle, inode, inode->i_size,
 						zero_len);
@@ -6088,6 +6119,18 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 				goto out_mmap_sem;
 		}
 
+		/* Tail zero the EOF folio on truncate up. */
+		if (!shrink && oldsize & (inode->i_sb->s_blocksize - 1)) {
+			loff_t zero_len;
+
+			zero_len = ext4_block_truncate_page(inode->i_mapping,
+							    oldsize);
+			if (zero_len < 0) {
+				error = zero_len;
+				goto out_mmap_sem;
+			}
+		}
+
 		handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
 		if (IS_ERR(handle)) {
 			error = PTR_ERR(handle);
@@ -6098,18 +6141,12 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			orphan = 1;
 		}
 		/*
-		 * Update c/mtime and tail zero the EOF folio on
-		 * truncate up. ext4_truncate() handles the shrink case
-		 * below.
+		 * Update c/mtime on truncate up, ext4_truncate() will
+		 * update c/mtime in shrink case below.
 		 */
-		if (!shrink) {
+		if (!shrink)
 			inode_set_mtime_to_ts(inode,
 					      inode_set_ctime_current(inode));
-			if (oldsize & (inode->i_sb->s_blocksize - 1))
-				ext4_block_truncate_page(handle,
-							 inode->i_mapping,
-							 oldsize, NULL);
-		}
 
 		if (shrink)
 			ext4_fc_track_range(handle, inode,
--
2.46.1
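The hung task that motivates moving the flush can be summarized as a
wait cycle (illustrative):

  truncate task : holds handle H, waits for zeroed-folio writeback I/O
  end-I/O path  : needs to start a handle, waits for the committing
                  transaction (or for journal space)
  jbd2 commit   : waits for every live handle, including H, to stop

Calling ext4_block_truncate_page(), and with it
filemap_write_and_wait_range(), before ext4_journal_start() breaks the
cycle: the wait for I/O now happens with no handle held.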

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2

--------------------------------

ext4 sometimes fails generic/269 and generic/270 in buffered_iomap
mode. These tests run fsstress and ENOSPC hitters in parallel. If the
fsstress process is killed while a direct write is failing with ENOSPC,
the pending fatal signal may interrupt zero_range, as follows:

ext4_dio_write_iter
  ext4_orphan_add(inode)
  iomap_dio_rw
    return -ENOSPC;
  ext4_inode_extension_cleanup
    ext4_truncate_failed_write(inode)
      ext4_truncate(inode)
        ext4_block_truncate_page(mapping, inode->i_size)
          ext4_block_zero_page_range
            ext4_iomap_zero_range
              iomap_zero_range
                iomap_zero_iter
                  iomap_write_begin
                    if (fatal_signal_pending(current))
                            return -EINTR;
  ext4_orphan_del(NULL, inode)

The inode is then removed from the orphan list without
ext4_ext_truncate() having run, which corrupts the file system. A later
fsck finds the file system inconsistent, causing the test case to fail.

Since zero range operations typically involve small amounts of data and
are frequently used to prevent the exposure of stale data, avoid
interrupting IOMAP_ZERO by signals within iomap_write_begin().

Fixes: a85e54b5fdda ("ext4: partial zero eof block on unaligned inode size extension")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/iomap/buffered-io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0c06e97d08de..c994b2f058c3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -844,7 +844,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 	if (srcmap != &iter->iomap)
 		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
-	if (fatal_signal_pending(current))
+	/*
+	 * Zero range operations typically involve small amounts of data
+	 * and are frequently used to prevent the exposure of stale data.
+	 * Therefore, do not interrupt it here.
+	 */
+	if (iter->flags != IOMAP_ZERO && fatal_signal_pending(current))
 		return -EINTR;
 
 	if (!mapping_large_folio_support(iter->inode->i_mapping))
--
2.46.1
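For reference, the equality test identifies zeroing because
iomap_zero_range() builds its iterator with flags set to exactly
IOMAP_ZERO and no other bits; a sketch of the v6.6-era code in
fs/iomap/buffered-io.c:

	int
	iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
			bool *did_zero, const struct iomap_ops *ops)
	{
		struct iomap_iter iter = {
			.inode	= inode,
			.pos	= pos,
			.len	= len,
			.flags	= IOMAP_ZERO,	/* no other flag bits set */
		};
		/* ... loop calling iomap_zero_iter() elided ... */
	}

If IOMAP_ZERO were ever combined with other flag bits, the bitwise form

	if (!(iter->flags & IOMAP_ZERO) && fatal_signal_pending(current))
		return -EINTR;

would be the more defensive way to write the same check.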

From: Zhang Yi <yi.zhang@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5WC2

--------------------------------

In __jbd2_journal_erase(), the block_stop parameter includes the last
block of a contiguous region; however, the calculation of byte_stop is
incorrect, as it does not account for the bytes in that last block.
Consequently, the page cache is not cleared properly, which
occasionally causes the ext4/050 test to fail. Since the inclusive
semantics of block_stop complicate the calculation, fix the incorrect
byte_stop by making both block_stop and byte_stop use exclusive
semantics. This fixes a failure in fstests ext4/050.

Fixes: 01d5d96542fd ("ext4: add discard/zeroout flags to journal flush")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/jbd2/journal.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 98014cdcb714..68d3141a2f36 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2000,17 +2000,15 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 			return err;
 		}
 
-		if (block_start == ~0ULL) {
-			block_start = phys_block;
-			block_stop = block_start - 1;
-		}
+		if (block_start == ~0ULL)
+			block_stop = block_start = phys_block;
 
 		/*
 		 * last block not contiguous with current block,
 		 * process last contiguous region and return to this block on
 		 * next loop
 		 */
-		if (phys_block != block_stop + 1) {
+		if (phys_block != block_stop) {
 			block--;
 		} else {
 			block_stop++;
@@ -2029,11 +2027,10 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		 */
 		byte_start = block_start * journal->j_blocksize;
 		byte_stop = block_stop * journal->j_blocksize;
-		byte_count = (block_stop - block_start + 1) *
-				journal->j_blocksize;
+		byte_count = (block_stop - block_start) * journal->j_blocksize;
 
 		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
-				byte_start, byte_stop);
+				byte_start, byte_stop - 1);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
 			err = blkdev_issue_discard(journal->j_dev,
@@ -2048,7 +2045,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		}
 
 		if (unlikely(err != 0)) {
-			pr_err("JBD2: (error %d) unable to wipe journal at physical blocks %llu - %llu",
+			pr_err("JBD2: (error %d) unable to wipe journal at physical blocks [%llu, %llu)",
 				err, block_start, block_stop);
 			return err;
 		}
--
2.46.1
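A worked example shows the off-by-one. Take j_blocksize = 4096 and a
contiguous region of physical blocks 100..103 (byte_start = 409600):

  Old (inclusive) semantics: block_stop = 103
    byte_stop = 103 * 4096 = 421888
    truncate_inode_pages_range(mapping, 409600, 421888)
      only the first byte of block 103 falls inside the range, so most
      of the last block's page cache survives
    byte_count = (103 - 100 + 1) * 4096 = 16384 (this part was correct)

  New (exclusive) semantics: block_stop = 104
    byte_stop = 104 * 4096 = 425984
    truncate_inode_pages_range(mapping, 409600, 425983)
      the inclusive end byte_stop - 1 now covers all four blocks
    byte_count = (104 - 100) * 4096 = 16384 (unchanged)

The [%llu, %llu) form of the error message matches the new exclusive
block_stop.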

FeedBack: The patch(es) you sent to the kernel@openeuler.org mailing
list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/15839
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZU6...