In ext4_iomap_write_begin(), we speed up the mapping check by testing the folio dirty bit. If the folio is dirty, it has just been written and must have a corresponding allocated block or delalloc extent, so we don't need to map it again when writing to the same place; this speeds up repeated overwrites of the same folio considerably. However, we only check whether the entire folio is dirty, so the optimization does not yet work when there is more than one block per folio.
This series fixes the sub-folio handling in iomap and extends the improvement by checking the partial blocks within one folio.
Zhang Yi (6):
  iomap: correct the range of a partial dirty clear
  iomap: support invalidating partial folios
  iomap: advance the ifs allocation if we have more than one block per folio
  iomap: correct the dirty length in page mkwrite
  iomap: add iomap_is_fully_dirty()
  ext4: improve sub-folio check in ext4_iomap_write_begin()
 fs/ext4/inode.c        |  2 +-
 fs/iomap/buffered-io.c | 64 ++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h  |  1 +
 3 files changed, 57 insertions(+), 10 deletions(-)
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
The block range calculation in ifs_clear_range_dirty() is incorrect when partially clearing a range in a folio: we must not clear the dirty bit of the first or last block if the start or end offset is not blocksize-aligned. This has not caused any issue so far because we always clear a whole folio in iomap_writepage_map()->iomap_clear_range_dirty(). Fix it by rounding the first block up and the last block down, and correcting the calculation of nr_blks accordingly.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 049b3a6f9b08..db191503cfcd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -136,11 +136,14 @@ static void ifs_clear_range_dirty(struct folio *folio,
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
-	unsigned int first_blk = (off >> inode->i_blkbits);
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
+	unsigned int last_blk = (off + len) >> inode->i_blkbits;
+	unsigned int nr_blks = last_blk - first_blk;
 	unsigned long flags;
 
+	if (!nr_blks)
+		return;
+
 	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
 	spin_unlock_irqrestore(&ifs->state_lock, flags);
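A quick worked example of the corrected math (a standalone user-space
sketch, not part of the patch; it assumes 1K blocks, i.e. i_blkbits == 10,
in a 4K folio). Partially clearing [512, 2560) must only touch block 1,
because blocks 0 and 2 each remain partially dirty:

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int i_blkbits = 10;			/* 1K blocks */
	unsigned int blocksize = 1U << i_blkbits;
	size_t off = 512, len = 2048;			/* clear [512, 2560) */

	/* Old (wrong): rounds both edges outwards, so the partially
	 * covered edge blocks 0 and 2 would lose their dirty bits. */
	unsigned int old_first = off >> i_blkbits;		/* 0 */
	unsigned int old_last = (off + len - 1) >> i_blkbits;	/* 2 */
	unsigned int old_nr = old_last - old_first + 1;		/* 3 */

	/* New: round the start up and the end down to block bounds,
	 * clearing only blocks entirely inside the range. */
	unsigned int first = DIV_ROUND_UP(off, blocksize);	/* 1 */
	unsigned int last = (off + len) >> i_blkbits;		/* 2 */
	unsigned int nr = last - first;				/* 1 */

	printf("old: %u block(s) from block %u\n", old_nr, old_first);
	printf("new: %u block(s) from block %u\n", nr, first);
	return 0;
}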
LGTM
On 2024/7/13 16:45, Zhang Yi wrote:
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
Currently, iomap_invalidate_folio() can only invalidate an entire folio. If we truncate a partial folio on a filesystem with blocksize < folio size, the dirty bits of the truncated/punched blocks are left behind, and the writeback process will try to map the invalid hole range. Fortunately this has not triggered any real problem yet, since ->map() will fix up the length. Fix it by supporting invalidation of partial folios.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index db191503cfcd..41185c7d4e7c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -627,6 +627,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 		WARN_ON_ONCE(folio_test_writeback(folio));
 		folio_cancel_dirty(folio);
 		ifs_free(folio);
+	} else {
+		iomap_clear_range_dirty(folio, offset, len);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
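To illustrate (again a user-space sketch, assuming 1K blocks in a 4K
folio): punching out bytes [1024, 3072) of a fully dirty folio should
leave only blocks 0 and 3 dirty, which is what the new else branch
achieves through iomap_clear_range_dirty():

#include <stdio.h>

int main(void)
{
	unsigned int i_blkbits = 10;		/* 1K blocks, 4K folio */
	unsigned long dirty = 0xf;		/* all four blocks dirty */
	size_t offset = 1024, len = 2048;	/* invalidate [1024, 3072) */

	/* Same edge rounding as the fixed ifs_clear_range_dirty():
	 * only blocks fully inside the range lose their dirty bit. */
	unsigned int first = (offset + (1UL << i_blkbits) - 1) >> i_blkbits;
	unsigned int last = (offset + len) >> i_blkbits;

	for (unsigned int i = first; i < last; i++)
		dirty &= ~(1UL << i);

	printf("dirty bitmap after invalidate: 0x%lx\n", dirty);	/* 0x9 */
	return 0;
}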
LGTM
On 2024/7/13 16:45, Zhang Yi wrote:
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
Currently we allocate the ifs when writing back dirty folios in iomap_writepage_map(), if i_blocks_per_folio() is larger than one. As a result, a buffered write covering an entire folio does not attach an ifs until writeback starts, and if we partially truncate that folio in the meantime, iomap_invalidate_folio() cannot clear the corresponding blocks' dirty bits as expected. Fix this by moving the ifs allocation forward into __iomap_write_begin().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 41185c7d4e7c..e76b79efc0df 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -682,6 +682,12 @@ int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;
 
+	if (nr_blocks > 1) {
+		ifs = ifs_alloc(iter->inode, folio, iter->flags);
+		if ((iter->flags & IOMAP_NOWAIT) && !ifs)
+			return -EAGAIN;
+	}
+
 	/*
	 * If the write or zeroing completely overlaps the current folio, then
	 * entire folio will be dirtied so there is no need for
@@ -693,10 +699,6 @@ int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	    pos + len >= folio_pos(folio) + folio_size(folio))
 		return 0;
 
-	ifs = ifs_alloc(iter->inode, folio, iter->flags);
-	if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
-		return -EAGAIN;
-
 	if (folio_test_uptodate(folio))
 		return 0;
 	folio_clear_error(folio);
@@ -1925,7 +1927,12 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	WARN_ON_ONCE(end_pos <= pos);
 
 	if (i_blocks_per_folio(inode, folio) > 1) {
-		if (!ifs) {
+		/*
+		 * This should not happen since we always allocate ifs in
+		 * iomap_folio_mkwrite_iter() and, when there is more than
+		 * one block per folio, in __iomap_write_begin().
+		 */
+		if (WARN_ON_ONCE(!ifs)) {
 			ifs = ifs_alloc(inode, folio, 0);
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
 		}
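The key point is the ordering. The entry of __iomap_write_begin() after
this patch looks roughly like this (a simplified sketch; the sub-block
read-in loop and error paths are omitted, and the full-overwrite
condition is abbreviated to the part visible in the hunk above):

	/* Attach per-block state first, so that a later partial truncate
	 * can clear individual dirty bits even when this write takes the
	 * shortcut below and dirties the whole folio. */
	if (nr_blocks > 1) {
		ifs = ifs_alloc(iter->inode, folio, iter->flags);
		if ((iter->flags & IOMAP_NOWAIT) && !ifs)
			return -EAGAIN;
	}

	/* Only then take the "full overwrite" shortcut that skips
	 * zeroing or reading in the rest of the folio. */
	if (pos <= folio_pos(folio) &&
	    pos + len >= folio_pos(folio) + folio_size(folio))
		return 0;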
On 2024/7/13 16:45, Zhang Yi wrote:
> 	if (i_blocks_per_folio(inode, folio) > 1) {
> -		if (!ifs) {
> +		/*
> +		 * This should not happen since we always allocate ifs in
> +		 * iomap_folio_mkwrite_iter() and, when there is more than
> +		 * one block per folio, in __iomap_write_begin().
> +		 */
> +		if (WARN_ON_ONCE(!ifs)) {
How is this guaranteed in the page mkwrite path?
> 			ifs = ifs_alloc(inode, folio, 0);
> 			iomap_set_range_dirty(folio, 0, end_pos - pos);
> 		}
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
When doing page mkwrite, iomap_folio_mkwrite_iter() dirties the entire folio via folio_mark_dirty() even when the mapped length is shorter than one folio. On filesystems with more than one block per folio, however, we should only set the corresponding blocks' dirty bits according to iomap_length(), so open code folio_mark_dirty() and pass the correct length.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e76b79efc0df..b89ed06c6f86 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1488,7 +1488,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
 		block_commit_write(&folio->page, 0, length);
 	} else {
 		WARN_ON_ONCE(!folio_test_uptodate(folio));
-		folio_mark_dirty(folio);
+
+		ifs_alloc(iter->inode, folio, 0);
+		iomap_set_range_dirty(folio, 0, length);
+		filemap_dirty_folio(iter->inode->i_mapping, folio);
 	}
 
 	return length;
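Annotated, the new else branch splits the work folio_mark_dirty() used
to do into its two halves (same code as above, comments added for
exposition only):

	} else {
		WARN_ON_ONCE(!folio_test_uptodate(folio));

		/* Make sure per-block dirty state exists ... */
		ifs_alloc(iter->inode, folio, 0);
		/* ... dirty only the blocks covered by this mapping ... */
		iomap_set_range_dirty(folio, 0, length);
		/* ... and still mark the folio itself dirty in the page
		 * cache so that writeback picks it up. */
		filemap_dirty_folio(iter->inode->i_mapping, folio);
	}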
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
Add a new helper, iomap_is_fully_dirty(), to check whether all blocks that correspond to the specified part of a folio are dirty.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 31 +++++++++++++++++++++++++++++++
 include/linux/iomap.h  |  1 +
 2 files changed, 32 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b89ed06c6f86..205f504b2dfb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -180,6 +180,37 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
 		ifs_set_range_dirty(folio, ifs, off, len);
 }
 
+/*
+ * iomap_is_fully_dirty checks whether blocks within a folio are
+ * dirty or not.
+ *
+ * Returns true if all blocks which correspond to the specified part
+ * of the folio are dirty.
+ */
+bool iomap_is_fully_dirty(struct folio *folio, size_t from, size_t count)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned first, last, i;
+
+	if (!ifs)
+		return folio_test_dirty(folio);
+
+	/* Caller's range may extend past the end of this folio */
+	count = min(folio_size(folio) - from, count);
+
+	/* First and last blocks in range within folio */
+	first = from >> inode->i_blkbits;
+	last = (from + count - 1) >> inode->i_blkbits;
+
+	for (i = first; i <= last; i++)
+		if (!ifs_block_is_dirty(folio, ifs, i))
+			return false;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(iomap_is_fully_dirty);
+
 static struct iomap_folio_state *ifs_alloc(struct inode *inode,
 		struct folio *folio, unsigned int flags)
 {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index daf0a86ba377..1391aa97b08c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -269,6 +269,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
+bool iomap_is_fully_dirty(struct folio *folio, size_t from, size_t count);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
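A caller sketch (this mirrors how the final patch of the series uses
the helper; the surrounding names are taken from that patch):

	/* Fast path: if every block covered by this write is already
	 * dirty, a block mapping or delalloc extent must already exist,
	 * so the mapping step can be skipped entirely. */
	if (iomap_is_fully_dirty(folio, offset_in_folio(folio, pos), len))
		goto out;

Note the min() clamp in the helper: like iomap_is_partially_uptodate(),
a range extending past the end of the folio is truncated to the folio,
so callers don't need to clamp the length themselves.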
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IACNS4
CVE: NA
--------------------------------
In ext4_iomap_write_begin(), we speed up the mapping check by testing the folio dirty bit. If the folio is dirty, it has just been written and must have a corresponding allocated block or delalloc extent, so we don't need to map it again when writing to the same place; this speeds up repeated overwrites of the same folio considerably.
However, we only check whether the entire folio is dirty, so the optimization does not yet work when there is more than one block per folio. Now that iomap handles sub-folio state properly, extend the improvement by checking the partial blocks within one folio.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 642522ace038..9ed6014246dc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3988,7 +3988,7 @@ static int ext4_iomap_write_begin(struct file *file,
 
 	WARN_ON_ONCE(pos + len > folio_pos(folio) + folio_size(folio));
 
-	if (folio_test_dirty(folio) && (i_blocks_per_folio(inode, folio) == 1))
+	if (iomap_is_fully_dirty(folio, offset_in_folio(folio, pos), len))
 		goto out;
 
 	do {
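An example of what this buys (a user-space model of the new check,
assuming 1K blocks in a 4K folio): the old test never fired here
because i_blocks_per_folio() == 4, while the new test takes the fast
path as soon as all blocks touched by the write are dirty:

#include <stdbool.h>
#include <stdio.h>

/* Models iomap_is_fully_dirty() over a tiny bitmap (1K blocks). */
static bool fully_dirty(unsigned long bitmap, size_t from, size_t count)
{
	unsigned int first = from >> 10;
	unsigned int last = (from + count - 1) >> 10;

	for (unsigned int i = first; i <= last; i++)
		if (!(bitmap & (1UL << i)))
			return false;
	return true;
}

int main(void)
{
	unsigned long dirty = 1UL << 1;		/* only block 1 is dirty */

	/* Repeated 1K overwrite of block 1: fast path taken. */
	printf("%d\n", fully_dirty(dirty, 1024, 1024));	/* 1 */
	/* Write spilling into the clean block 2: must map again. */
	printf("%d\n", fully_dirty(dirty, 1024, 2048));	/* 0 */
	return 0;
}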
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/9904
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/I...