[PATCH OLK-6.6 0/4] ext4: improve buffered write with more than one blocks_per_folio part 2

Drop unnecessary state_lock in ifs, which could improve some (~10%) buffer write performance. I tested it through UnixBench on my x86 and arm64 virtual machine with 50GB ramdisk & xfs filesystem, the results shows below. 1. UnixBench Test on xfs: ./Run -i 1 -c 1 fstime-w Before: x86 File Write 1024 bufsize 2000 maxblocks 524708.0 KBps arm64 File Write 1024 bufsize 2000 maxblocks 801965.0 KBps After: x86 File Write 1024 bufsize 2000 maxblocks 571315.0 KBps arm64 File Write 1024 bufsize 2000 maxblocks 876077.0 KBps 2. Libmacro test in ext4: Before: Running: pwrite_u1k# bin/pwrite -E -C 200 -L -S -W -N pwrite_u1k -s 1k -I 500 -f /var/tmp/libmicro.1240/data prc thr usecs/call samples errors cnt/samp size pwrite_u1k 1 1 0.86016 167 0 200 1024 After: Running: pwrite_u1k# bin/pwrite -E -C 200 -L -S -W -N pwrite_u1k -s 1k -I 500 -f /var/tmp/libmicro.1957/data prc thr usecs/call samples errors cnt/samp size pwrite_u1k 1 1 0.7699 190 0 200 1024 Thanks, Yi. Matthew Wilcox (Oracle) (2): iomap: hold state_lock over call to ifs_set_range_uptodate() iomap: protect read_bytes_pending with the state_lock Zhang Yi (2): iomap: drop unnecessary state_lock when setting ifs uptodate bits iomap: drop unnecessary state_lock when changing ifs dirty bits fs/iomap/buffered-io.c | 73 +++++++++++++++++++++++++----------------- 1 file changed, 43 insertions(+), 30 deletions(-) -- 2.39.2

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.7-rc1 commit 279d5fc3227f04ef2c6125e5c440e7952173a89a category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Patch series "Add folio_end_read", v2. The core of this patchset is the new folio_end_read() call which filesystems can use when finishing a page cache read instead of separate calls to mark the folio uptodate and unlock it. As an illustration of its use, I converted ext4, iomap & mpage; more can be converted. I think that's useful by itself, but the interesting optimisation is that we can implement that with a single XOR instruction that sets the uptodate bit, clears the lock bit, tests the waiter bit and provides a write memory barrier. That removes one memory barrier and one atomic instruction from each page read, which seems worth doing. That's in patch 15. The last two patches could be a separate series, but basically we can do the same thing with the writeback flag that we do with the unlock flag; clear it and test the waiters bit at the same time. This patch (of 17): This is really preparation for the next patch, but it lets us call folio_mark_uptodate() in just one place instead of two. Link: https://lkml.kernel.org/r/20231004165317.1061855-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231004165317.1061855-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/iomap/buffered-io.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 205f504b2dfb..bb96ac4c408f 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -57,30 +57,32 @@ static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs, return test_bit(block, ifs->state); } -static void ifs_set_range_uptodate(struct folio *folio, +static bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs, size_t off, size_t len) { struct inode *inode = folio->mapping->host; unsigned int first_blk = off >> inode->i_blkbits; unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; unsigned int nr_blks = last_blk - first_blk + 1; - unsigned long flags; - spin_lock_irqsave(&ifs->state_lock, flags); bitmap_set(ifs->state, first_blk, nr_blks); - if (ifs_is_fully_uptodate(folio, ifs)) - folio_mark_uptodate(folio); - spin_unlock_irqrestore(&ifs->state_lock, flags); + return ifs_is_fully_uptodate(folio, ifs); } static void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len) { struct iomap_folio_state *ifs = folio->private; + unsigned long flags; + bool uptodate = true; - if (ifs) - ifs_set_range_uptodate(folio, ifs, off, len); - else + if (ifs) { + spin_lock_irqsave(&ifs->state_lock, flags); + uptodate = ifs_set_range_uptodate(folio, ifs, off, len); + spin_unlock_irqrestore(&ifs->state_lock, flags); + } + + if (uptodate) folio_mark_uptodate(folio); } -- 2.39.2

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v6.7-rc1 commit f45b494e2a24d86afd79cab7c343b414c5213447 category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Perform one atomic operation (acquiring the spinlock) instead of two (spinlock & atomic_sub) per read completion. Link: https://lkml.kernel.org/r/20231004165317.1061855-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/iomap/buffered-io.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index bb96ac4c408f..b61b3e7ed3bd 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -29,9 +29,9 @@ typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length); * and I/O completions. */ struct iomap_folio_state { - atomic_t read_bytes_pending; - atomic_t write_bytes_pending; spinlock_t state_lock; + unsigned int read_bytes_pending; + atomic_t write_bytes_pending; /* * Each block has two bits in this bitmap: @@ -255,7 +255,7 @@ static void ifs_free(struct folio *folio) if (!ifs) return; - WARN_ON_ONCE(atomic_read(&ifs->read_bytes_pending)); + WARN_ON_ONCE(ifs->read_bytes_pending != 0); WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending)); WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) != folio_test_uptodate(folio)); @@ -322,19 +322,29 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio, *lenp = plen; } -static void iomap_finish_folio_read(struct folio *folio, size_t offset, +static void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len, int error) { struct iomap_folio_state *ifs = folio->private; + bool uptodate = !error; + bool finished = true; - if (unlikely(error)) { - folio_clear_uptodate(folio); - folio_set_error(folio); - } else { - iomap_set_range_uptodate(folio, offset, len); + if (ifs) { + unsigned long flags; + + spin_lock_irqsave(&ifs->state_lock, flags); + if (!error) + uptodate = ifs_set_range_uptodate(folio, ifs, off, len); + ifs->read_bytes_pending -= len; + finished = !ifs->read_bytes_pending; + spin_unlock_irqrestore(&ifs->state_lock, flags); } - if (!ifs || atomic_sub_and_test(len, &ifs->read_bytes_pending)) + if (error) + folio_set_error(folio); + if (uptodate) + folio_mark_uptodate(folio); + if (finished) folio_unlock(folio); } @@ -432,8 +442,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter, } ctx->cur_folio_in_bio = true; - if (ifs) - atomic_add(plen, &ifs->read_bytes_pending); + if (ifs) { + spin_lock_irq(&ifs->state_lock); + ifs->read_bytes_pending += plen; + spin_unlock_irq(&ifs->state_lock); + } sector = iomap_sector(iomap, pos); if (!ctx->bio || -- 2.39.2

hulk inclusion category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA -------------------------------- Commit '1cea335d1db1 ("iomap: fix sub-page uptodate handling")' fix a race issue when submitting multiple read bios for a page spans more than one file system block(which names state_lock now) by adding a spinlock to make the page uptodate synchronous. However, the race condition only happened between the read I/O submitting and completeing threads, it's sufficient to use page lock to protect other paths. So there is no need to add this spinlock in the write page uptodating path, drop it could reduce some unnecessary locking overhead. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> --- fs/iomap/buffered-io.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index b61b3e7ed3bd..a577fcdf20a6 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -73,14 +73,10 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len) { struct iomap_folio_state *ifs = folio->private; - unsigned long flags; bool uptodate = true; - if (ifs) { - spin_lock_irqsave(&ifs->state_lock, flags); + if (ifs) uptodate = ifs_set_range_uptodate(folio, ifs, off, len); - spin_unlock_irqrestore(&ifs->state_lock, flags); - } if (uptodate) folio_mark_uptodate(folio); @@ -437,7 +433,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter, if (iomap_block_needs_zeroing(iter, pos)) { folio_zero_range(folio, poff, plen); - iomap_set_range_uptodate(folio, poff, plen); + if (ifs) { + /* + * Hold state_lock to protect from submitting multiple + * bios for the same locked folio and then get multiple + * callbacks in parallel. + */ + spin_lock_irq(&ifs->state_lock); + iomap_set_range_uptodate(folio, poff, plen); + spin_unlock_irq(&ifs->state_lock); + } else { + folio_mark_uptodate(folio); + } goto done; } -- 2.39.2

hulk inclusion category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA -------------------------------- Hold the state_lock when set and clear ifs dirty bits is unnecessary since both paths are protected under folio lock, so it's safe to drop the state_lock, which could reduce some unnecessary locking overhead and improve the buffer write performance a bit. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> --- fs/iomap/buffered-io.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index a577fcdf20a6..31257176584e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -137,14 +137,8 @@ static void ifs_clear_range_dirty(struct folio *folio, unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode)); unsigned int last_blk = (off + len) >> inode->i_blkbits; unsigned int nr_blks = last_blk - first_blk; - unsigned long flags; - if (!nr_blks) - return; - - spin_lock_irqsave(&ifs->state_lock, flags); bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks); - spin_unlock_irqrestore(&ifs->state_lock, flags); } static void iomap_clear_range_dirty(struct folio *folio, size_t off, size_t len) @@ -163,11 +157,8 @@ static void ifs_set_range_dirty(struct folio *folio, unsigned int first_blk = (off >> inode->i_blkbits); unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; unsigned int nr_blks = last_blk - first_blk + 1; - unsigned long flags; - spin_lock_irqsave(&ifs->state_lock, flags); bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); - spin_unlock_irqrestore(&ifs->state_lock, flags); } static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len) -- 2.39.2

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/10409 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/T... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/10409 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/T...
participants (2)
-
patchwork bot
-
Zhang Yi