Christoph Hellwig (2):
  iomap: rename the flags variable in __iomap_dio_rw
  iomap: pass a flags argument to iomap_dio_rw

Jens Axboe (3):
  iomap: cleanup up iomap_dio_bio_end_io()
  iomap: use an unsigned type for IOMAP_DIO_* defines
  iomap: add IOMAP_DIO_INLINE_COMP

Zhihao Cheng (2):
  iomap: Enable IOMAP_DIO_INLINE_COMP in write paths
  ext4: Optimize endio process for DIO overwrites

 fs/btrfs/inode.c      |  4 +--
 fs/ext4/file.c        | 14 +++++---
 fs/gfs2/file.c        |  7 ++--
 fs/iomap/direct-io.c  | 84 ++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_file.c     |  5 ++-
 fs/zonefs/super.c     |  4 +--
 include/linux/iomap.h | 10 ++++--
 7 files changed, 81 insertions(+), 47 deletions(-)
From: Jens Axboe axboe@kernel.dk
mainline inclusion
from mainline-v6.6-rc1
commit 3486237c6fe8d0e5024f9c48bfe73843b1bd8284
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Make the logic a bit easier to follow:
1) Add a release_bio out path, as everybody needs to touch that, and
   have our bio ref check jump there if it's non-zero.
2) Add a kiocb local variable.
3) Add comments for each of the three conditions (sync, inline, or
   async workqueue punt).
No functional changes in this patch.
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Conflicts:
	fs/iomap/direct-io.c
[ 3e08773c3841 ("block: switch polling to be bio based") is not applied. ]
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/iomap/direct-io.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..4cedab21efe3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -154,25 +154,41 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	struct kiocb *iocb = dio->iocb;
 
 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
+	if (!atomic_dec_and_test(&dio->ref))
+		goto release_bio;
 
-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
+	/*
+	 * Synchronous dio, task itself will handle any completion work
+	 * that needs after IO. All we need to do is wake the task.
+	 */
+	if (dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
 
-			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			iomap_dio_complete_work(&dio->aio.work);
-		}
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+		goto release_bio;
 	}
 
+	/* Read completion can always complete inline. */
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		iomap_dio_complete_work(&dio->aio.work);
+		goto release_bio;
+	}
+
+	/*
+	 * Async DIO completion that requires filesystem level completion work
+	 * gets punted to a work queue to complete as the operation may require
+	 * more IO to be issued to finalise filesystem metadata changes or
+	 * guarantee data integrity.
+	 */
+	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
+		   &dio->aio.work);
+release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {
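The early `goto release_bio` relies on the reference count: every bio end_io drops one reference, and only the caller that drops the last one goes on to pick a completion path. A minimal userspace sketch of that pattern follows, using C11 atomics in place of the kernel's `atomic_dec_and_test()`. The `demo_*` names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct iomap_dio: only what the sketch needs. */
struct demo_dio {
	atomic_int ref;		/* one reference per in-flight bio */
	int completions;	/* how many times completion work ran */
};

/*
 * Userspace analogue of atomic_dec_and_test(): decrement the counter and
 * report whether it reached zero.
 */
static bool demo_dec_and_test(atomic_int *v)
{
	int old = atomic_fetch_sub(v, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	return old == 1;
}

/* Called once per finished bio; only the last caller completes the dio. */
static bool demo_bio_end_io(struct demo_dio *dio)
{
	if (!demo_dec_and_test(&dio->ref))
		return false;	/* other bios still in flight: only release */

	dio->completions++;	/* wake waiter, complete inline, or punt */
	return true;
}
```

With three bios in flight, the first two calls return `false` and the third returns `true`, so the completion work runs exactly once regardless of the order in which the bios finish.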
From: Jens Axboe axboe@kernel.dk
mainline inclusion
from mainline-v6.6-rc1
commit 44842f647346cac4063b2bb8e9476fad09e363e7
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
IOMAP_DIO_DIRTY shifts by 31 bits, which makes UBSAN unhappy. Clean up all the defines by making the shifted value an unsigned value.
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reported-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/iomap/direct-io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 4cedab21efe3..7b0458d4d794 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -18,10 +18,10 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
-#define IOMAP_DIO_WRITE_FUA	(1 << 28)
-#define IOMAP_DIO_NEED_SYNC	(1 << 29)
-#define IOMAP_DIO_WRITE		(1 << 30)
-#define IOMAP_DIO_DIRTY		(1 << 31)
+#define IOMAP_DIO_WRITE_FUA	(1U << 28)
+#define IOMAP_DIO_NEED_SYNC	(1U << 29)
+#define IOMAP_DIO_WRITE		(1U << 30)
+#define IOMAP_DIO_DIRTY		(1U << 31)
 
 struct iomap_dio {
 	struct kiocb		*iocb;
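For background on why the `U` suffix matters: in ISO C, `1 << 31` shifts a one into the sign bit of a 32-bit `int`, and the result is not representable in that signed type, so the behaviour is undefined and the sanitizer's shift check flags it. With `1U` the shift happens in `unsigned int`, where it is well defined. The sketch below uses hypothetical `DEMO_*` names and assumes a 32-bit `unsigned int`, which it checks at compile time.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption of this sketch: unsigned int has at least 32 value bits. */
static_assert(sizeof(unsigned int) * CHAR_BIT >= 32,
	      "sketch assumes a 32-bit unsigned int");

/* Demo copies of the private flags, shifted as unsigned values. */
#define DEMO_DIO_WRITE_FUA	(1U << 28)
#define DEMO_DIO_NEED_SYNC	(1U << 29)
#define DEMO_DIO_WRITE		(1U << 30)
#define DEMO_DIO_DIRTY		(1U << 31)	/* well defined: unsigned shift */

/* Set the top bit in an unsigned flags word. */
static unsigned int demo_set_dirty(unsigned int flags)
{
	return flags | DEMO_DIO_DIRTY;
}

/* Test the top bit; no sign extension is involved anywhere. */
static bool demo_is_dirty(unsigned int flags)
{
	return (flags & DEMO_DIO_DIRTY) != 0;
}
```

Keeping all four defines unsigned, rather than only the one that shifts by 31, keeps the flag expressions in a single type.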
From: Jens Axboe axboe@kernel.dk
mainline inclusion
from mainline-v6.6-rc1
commit 7b3c14d1a96bf63c078c3bbfe5573fb964e80b95
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Rather than gate whether or not we need to punt a dio completion to a workqueue on whether the IO is a write or not, add an explicit flag for it. For now we treat them the same: reads always set the flag and async writes do not.
No functional changes in this patch.
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Conflicts:
	fs/iomap/direct-io.c
[ 3a0be38cc84d("iomap: treat a write through cache the same as FUA") is not applied.
  a6d3d49587d1("iomap: switch __iomap_dio_rw to use iomap_iter") is not applied. ]
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/iomap/direct-io.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 7b0458d4d794..bb9a08235fa4 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -18,6 +18,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_INLINE_COMP	(1U << 27)
 #define IOMAP_DIO_WRITE_FUA	(1U << 28)
 #define IOMAP_DIO_NEED_SYNC	(1U << 29)
 #define IOMAP_DIO_WRITE		(1U << 30)
@@ -173,8 +174,10 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		goto release_bio;
 	}
 
-	/* Read completion can always complete inline. */
-	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+	/*
+	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
+	 */
+	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
 		iomap_dio_complete_work(&dio->aio.work);
 		goto release_bio;
 	}
@@ -471,6 +474,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
+		/* reads can always complete inline */
+		dio->flags |= IOMAP_DIO_INLINE_COMP;
+
 		if (pos >= dio->i_size)
 			goto out_free_dio;
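As a rough model of the resulting decision in iomap_dio_bio_end_io(), the sketch below maps the two inputs (whether a task is waiting, and the dio flags) to one of the three completion paths. It is a simplification under stated assumptions: the `demo_*` names are hypothetical, and the real function also handles errors, page dirtying and the bio release.

```c
#include <assert.h>
#include <stdbool.h>

/* Demo copies of two private flags from this series. */
#define DEMO_DIO_INLINE_COMP	(1U << 27)
#define DEMO_DIO_WRITE		(1U << 30)

static_assert((DEMO_DIO_INLINE_COMP & DEMO_DIO_WRITE) == 0,
	      "flags must occupy distinct bits");

/* The three ways the last bio completion can finish a dio. */
enum demo_path {
	DEMO_WAKE_WAITER,	/* sync dio: the submitter completes it */
	DEMO_COMPLETE_INLINE,	/* complete directly in end_io context */
	DEMO_PUNT_TO_WORKQUEUE,	/* defer to s_dio_done_wq */
};

/* Flags as set up at submission time by this patch: reads complete inline. */
static unsigned int demo_initial_flags(bool is_read)
{
	return is_read ? DEMO_DIO_INLINE_COMP : DEMO_DIO_WRITE;
}

/* Path selection once the last reference has been dropped. */
static enum demo_path demo_pick_path(bool wait_for_completion,
				     unsigned int flags)
{
	if (wait_for_completion)
		return DEMO_WAKE_WAITER;
	if (flags & DEMO_DIO_INLINE_COMP)
		return DEMO_COMPLETE_INLINE;
	return DEMO_PUNT_TO_WORKQUEUE;
}
```

Because reads always carry the flag and async writes never do at this point in the series, the outcome is the same as with the old `IOMAP_DIO_WRITE` test, which matches the "no functional changes" statement.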
From: Christoph Hellwig hch@lst.de
mainline inclusion
from mainline-v5.12-rc1
commit 5724be5de88f5f6863d44c859f42f70d5cc667ed
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Rename flags to iomap_flags to make the usage a little more clear.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Brian Foster bfoster@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/iomap/direct-io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bb9a08235fa4..3207e04ec80c 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -446,7 +446,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	loff_t end = iocb->ki_pos + count - 1, ret = 0;
-	unsigned int flags = IOMAP_DIRECT;
+	unsigned int iomap_flags = IOMAP_DIRECT;
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
@@ -483,7 +483,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		if (iter_is_iovec(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
-		flags |= IOMAP_WRITE;
+		iomap_flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
 		/* for data sync or sync, we need sync completion processing */
@@ -505,7 +505,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			ret = -EAGAIN;
 			goto out_free_dio;
 		}
-		flags |= IOMAP_NOWAIT;
+		iomap_flags |= IOMAP_NOWAIT;
 	}
 
 	ret = filemap_write_and_wait_range(mapping, pos, end);
@@ -536,7 +536,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 
 	blk_start_plug(&plug);
 	do {
-		ret = iomap_apply(inode, pos, count, flags, ops, dio,
+		ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
From: Christoph Hellwig hch@lst.de
mainline inclusion
from mainline-v5.12-rc1
commit 2f63296578cad1ae681152d5b2122a4595195f16
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Pass a set of flags to iomap_dio_rw instead of the boolean wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag replaces the wait_for_completion, but only needs to be passed when the iocb isn't synchronous to start with to simplify the callers.
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Brian Foster bfoster@redhat.com
[djwong: rework xfs_file.c so that we can push iomap changes separately]
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Conflicts:
	fs/btrfs/inode.c
	fs/xfs/xfs_file.c
	fs/zonefs/super.c
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/btrfs/inode.c      |  4 ++--
 fs/ext4/file.c        |  5 ++---
 fs/gfs2/file.c        |  7 ++-----
 fs/iomap/direct-io.c  | 11 +++++------
 fs/xfs/xfs_file.c     |  5 ++---
 fs/zonefs/super.c     |  4 ++--
 include/linux/iomap.h | 10 ++++++++--
 7 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d87a53613e34..b12fc82e34ba 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8084,10 +8084,10 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 */
 	if (current->journal_info)
 		ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops,
-				   &btrfs_sync_dops, is_sync_kiocb(iocb));
+				   &btrfs_sync_dops, 0);
 	else
 		ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops,
-				   &btrfs_dio_ops, is_sync_kiocb(iocb));
+				   &btrfs_dio_ops, 0);
 
 	if (ret == -ENOTBLK)
 		ret = 0;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index e474e064d65c..661141038e75 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -75,8 +75,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		return generic_file_read_iter(iocb, to);
 	}
 
-	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL,
-			   is_sync_kiocb(iocb));
+	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0);
 	inode_unlock_shared(inode);
 
 	file_accessed(iocb->ki_filp);
@@ -573,7 +572,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ilock_shared)
 		iomap_ops = &ext4_iomap_overwrite_ops;
 	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
-			   is_sync_kiocb(iocb) || unaligned_io || extend);
+			   (unaligned_io || extend) ? IOMAP_DIO_FORCE_WAIT : 0);
 	if (ret == -ENOTBLK)
 		ret = 0;
 
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 55a8eb3c1963..24ab28f02004 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -798,9 +798,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	if (ret)
 		goto out_uninit;
 
-	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
-			   is_sync_kiocb(iocb));
-
+	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0);
 	gfs2_glock_dq(gh);
 out_uninit:
 	gfs2_holder_uninit(gh);
@@ -834,8 +832,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	if (offset + len > i_size_read(&ip->i_inode))
 		goto out;
 
-	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
-			   is_sync_kiocb(iocb));
+	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0);
 	if (ret == -ENOTBLK)
 		ret = 0;
 out:
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 3207e04ec80c..d4cf2481ecf4 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -439,13 +439,15 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
 struct iomap_dio *
 __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		bool wait_for_completion)
+		unsigned int dio_flags)
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = file_inode(iocb->ki_filp);
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	loff_t end = iocb->ki_pos + count - 1, ret = 0;
+	bool wait_for_completion =
+		is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
 	unsigned int iomap_flags = IOMAP_DIRECT;
 	struct blk_plug plug;
 	struct iomap_dio *dio;
@@ -453,9 +455,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (!count)
 		return NULL;
 
-	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
-		return ERR_PTR(-EIO);
-
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
 	if (!dio)
 		return ERR_PTR(-ENOMEM);
@@ -620,11 +619,11 @@ EXPORT_SYMBOL_GPL(__iomap_dio_rw);
 ssize_t
 iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		bool wait_for_completion)
+		unsigned int dio_flags)
 {
 	struct iomap_dio *dio;
 
-	dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion);
+	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags);
 	if (IS_ERR_OR_NULL(dio))
 		return PTR_ERR_OR_ZERO(dio);
 	return iomap_dio_complete(dio);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9f52365995c7..52643eac5d46 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -263,8 +263,7 @@ xfs_file_dio_aio_read(
 	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
 	if (ret)
 		return ret;
-	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL,
-			is_sync_kiocb(iocb));
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -651,7 +650,7 @@ xfs_file_dio_aio_write(
 	 */
 	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
 			   &xfs_dio_write_ops,
-			   is_sync_kiocb(iocb) || unaligned_io);
+			   unaligned_io ? IOMAP_DIO_FORCE_WAIT : 0);
 out:
 	if (iolock)
 		xfs_iunlock(ip, iolock);
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 1e53976d5975..bcdc14e99bb1 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -932,7 +932,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 		ret = zonefs_file_dio_append(iocb, from);
 	else
 		ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
-				   &zonefs_write_dio_ops, sync);
+				   &zonefs_write_dio_ops, 0);
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
@@ -1067,7 +1067,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		}
 		file_accessed(iocb->ki_filp);
 		ret = iomap_dio_rw(iocb, to, &zonefs_read_iomap_ops,
-				   &zonefs_read_dio_ops, is_sync_kiocb(iocb));
+				   &zonefs_read_dio_ops, 0);
 	} else {
 		ret = generic_file_read_iter(iocb, to);
 		if (ret == -EIO)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 78520f28806a..2476ec96b4e5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -291,12 +291,18 @@ struct iomap_dio_ops {
 		      struct bio *bio, loff_t file_offset);
 };
 
+/*
+ * Wait for the I/O to complete in iomap_dio_rw even if the kiocb is not
+ * synchronous.
+ */
+#define IOMAP_DIO_FORCE_WAIT	(1 << 0)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		bool wait_for_completion);
+		unsigned int dio_flags);
 struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
-		bool wait_for_completion);
+		unsigned int dio_flags);
 ssize_t iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
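To illustrate the interface change, the sketch below models how the boolean is now derived inside __iomap_dio_rw() and what a caller shaped like the xfs write path passes. It is a userspace approximation: the `demo_*` and `DEMO_*` names are hypothetical, and `is_sync_kiocb()` is reduced to a plain boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Demo copy of the new public flag. */
#define DEMO_DIO_FORCE_WAIT	(1U << 0)

/* Bits 27..31 are used by the private flags in direct-io.c. */
#define DEMO_DIO_PRIVATE_MASK	0xF8000000U

static_assert((DEMO_DIO_FORCE_WAIT & DEMO_DIO_PRIVATE_MASK) == 0,
	      "public flag must not overlap the private flags");

/*
 * How the wait decision is derived after the patch: a synchronous kiocb
 * always waits, an asynchronous one waits only if the caller forces it.
 */
static bool demo_wait_for_completion(bool is_sync_kiocb,
				     unsigned int dio_flags)
{
	return is_sync_kiocb || (dio_flags & DEMO_DIO_FORCE_WAIT);
}

/* Caller side, shaped like the xfs write path in the diff above. */
static unsigned int demo_write_dio_flags(bool unaligned_io)
{
	return unaligned_io ? DEMO_DIO_FORCE_WAIT : 0;
}
```

Since the sync check moved into the callee, callers with no special requirement simply pass `0`, which is why most call sites in the diff get shorter.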
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
--------------------------------
It is more efficient to run a quick endio process (e.g. the non-sync overwrite case) in irq context than to start a worker for it. Enable IOMAP_DIO_INLINE_COMP in the write paths so that a write DIO can be finished inline (in irq context), which can be used for the non-sync overwrite case. Also skip invalidating pages if the DIO is finished inline, which keeps the same logic as dio_bio_end_aio in the non-sync overwrite case.
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/iomap/direct-io.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index d4cf2481ecf4..6cd4357a9781 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -110,7 +110,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	 * zeros from unwritten extents.
 	 */
 	if (!dio->error && dio->size &&
-	    (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
+	    (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages &&
+	    !(dio->flags & IOMAP_DIO_INLINE_COMP)) {
 		int err;
 		err = invalidate_inode_pages2_range(inode->i_mapping,
 				offset >> PAGE_SHIFT,
@@ -124,8 +125,10 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	 * If this is a DSYNC write, make sure we push it to stable storage now
 	 * that we've written data.
 	 */
-	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
+	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC)) {
+		WARN_ON_ONCE(dio->flags & IOMAP_DIO_INLINE_COMP);
 		ret = generic_write_sync(iocb, ret);
+	}
 
 	kfree(dio);
 
@@ -488,6 +491,10 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		/* for data sync or sync, we need sync completion processing */
 		if (iocb->ki_flags & IOCB_DSYNC)
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
+		else if (dio_flags & IOMAP_DIO_INLINE_COMP) {
+			/* writes could complete inline */
+			dio->flags |= IOMAP_DIO_INLINE_COMP;
+		}
 
 		/*
 		 * For datasync only writes, we optimistically try using FUA for
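The two rules this patch adds on the write side can be modelled as pure functions: a DSYNC write never gets the inline flag even if the caller asks for it, and page cache invalidation is skipped when the inline flag is set. The following is a hedged sketch with hypothetical `demo_*` names. The kernel's `WARN_ON_ONCE()` is approximated with `assert()`, and fields such as `dio->error` or `nrpages` are passed as plain parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Demo copies of the private flags involved. */
#define DEMO_DIO_INLINE_COMP	(1U << 27)
#define DEMO_DIO_NEED_SYNC	(1U << 29)
#define DEMO_DIO_WRITE		(1U << 30)

/* Flags a write ends up with, given DSYNC-ness and the caller's request. */
static unsigned int demo_write_flags(bool dsync, unsigned int caller_flags)
{
	unsigned int flags = DEMO_DIO_WRITE;

	if (dsync)
		flags |= DEMO_DIO_NEED_SYNC;	/* needs a sync at completion */
	else if (caller_flags & DEMO_DIO_INLINE_COMP)
		flags |= DEMO_DIO_INLINE_COMP;	/* may finish in irq context */

	/* Stands in for the WARN_ON_ONCE() in the patch: never both at once. */
	assert(!((flags & DEMO_DIO_NEED_SYNC) &&
		 (flags & DEMO_DIO_INLINE_COMP)));
	return flags;
}

/* Page cache invalidation is skipped for inline-completed writes. */
static bool demo_should_invalidate(unsigned int flags, int error, size_t size,
				   unsigned long nrpages)
{
	return !error && size && (flags & DEMO_DIO_WRITE) && nrpages &&
	       !(flags & DEMO_DIO_INLINE_COMP);
}
```

Checking DSYNC first means the inline request is dropped for writes that need a sync at completion, so those keep going through the workqueue.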
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I90ZB5
CVE: NA
--------------------------------
In the DIO overwrite case there is no need to convert unwritten extents, and ext4_handle_inode_extension() can be skipped, which means that the endio process can be executed in irq context. Since commit 240930fb7e6b5 ("ext4: dio take shared inode lock when overwriting preallocated blocks") provides a way to tell whether an overwrite is happening, just do nothing in the endio process when a DIO overwrite happens. This patch lets ext4 process endio in irq context in the DIO overwrite case, which brings a performance improvement in the following fio test on an x86 physical machine with NVMe when the irq and fio run on the same CPU:
Test: fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=2G -numjobs=1 -overwrite=1 -time_based -runtime=60 -group_reporting -filename=/test/test -name=Rand_write_Testing --cpus_allowed=1
before: 953 MiB/s after: 1350 MiB/s, ~41% perf improvement.
Suggested-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com
---
 fs/ext4/file.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 661141038e75..66ed9288c75c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -478,8 +478,10 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	loff_t offset = iocb->ki_pos;
 	size_t count = iov_iter_count(from);
 	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
+	const struct iomap_dio_ops *iomap_dops = &ext4_dio_write_ops;
 	bool extend = false, unaligned_io = false;
 	bool ilock_shared = true;
+	int dio_flags = 0;
 
 	/*
 	 * We initially start with shared inode lock unless it is
@@ -569,10 +571,13 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ext4_journal_stop(handle);
 	}
 
-	if (ilock_shared)
+	if (ilock_shared) {
 		iomap_ops = &ext4_iomap_overwrite_ops;
-	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
-			   (unaligned_io || extend) ? IOMAP_DIO_FORCE_WAIT : 0);
+		iomap_dops = NULL;
+		dio_flags = IOMAP_DIO_INLINE_COMP;
+	} else if (unaligned_io || extend)
+		dio_flags |= IOMAP_DIO_FORCE_WAIT;
+	ret = iomap_dio_rw(iocb, from, iomap_ops, iomap_dops, dio_flags);
 	if (ret == -ENOTBLK)
 		ret = 0;
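The selection logic added to ext4_dio_write_iter() can be summarised as a small decision function: a write done under the shared inode lock is treated as an overwrite, drops the end_io ops and asks for inline completion, while an unaligned or extending write forces a wait. The sketch below is a hypothetical userspace model. `struct demo_choice` and the `demo_*` names are illustrative only, and the pointer to the dio ops is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Demo copies of the two flags the ext4 write path chooses between. */
#define DEMO_DIO_FORCE_WAIT	(1U << 0)
#define DEMO_DIO_INLINE_COMP	(1U << 27)

static_assert((DEMO_DIO_FORCE_WAIT & DEMO_DIO_INLINE_COMP) == 0,
	      "flags must occupy distinct bits");

/* What the write path hands to iomap_dio_rw() in this model. */
struct demo_choice {
	bool use_end_io_ops;	/* false models iomap_dops = NULL */
	unsigned int dio_flags;
};

static struct demo_choice demo_ext4_choose(bool ilock_shared,
					   bool unaligned_io, bool extend)
{
	struct demo_choice c = { .use_end_io_ops = true, .dio_flags = 0 };

	if (ilock_shared) {
		/* Overwrite: no endio work needed, finish inline. */
		c.use_end_io_ops = false;
		c.dio_flags = DEMO_DIO_INLINE_COMP;
	} else if (unaligned_io || extend) {
		/* Completion must be waited for by the submitter. */
		c.dio_flags |= DEMO_DIO_FORCE_WAIT;
	}
	return c;
}
```

In this model the two flags are never requested together, because the inline branch is taken only when the shared lock is held.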
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5701 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/O...