Drop the unnecessary state_lock in ifs, which improves buffer write performance by about 10%. I tested this with UnixBench on my x86 and arm64 virtual machines, using a 50GB ramdisk with an xfs filesystem; the results are shown below.
1. UnixBench test on xfs: ./Run -i 1 -c 1 fstime-w

Before:
x86    File Write 1024 bufsize 2000 maxblocks    524708.0 KBps
arm64  File Write 1024 bufsize 2000 maxblocks    801965.0 KBps

After:
x86    File Write 1024 bufsize 2000 maxblocks    571315.0 KBps
arm64  File Write 1024 bufsize 2000 maxblocks    876077.0 KBps
2. libMicro test on ext4:

Before:
Running: pwrite_u1k# bin/pwrite -E -C 200 -L -S -W -N pwrite_u1k -s 1k -I 500 -f /var/tmp/libmicro.1240/data
           prc thr   usecs/call   samples  errors  cnt/samp  size
pwrite_u1k   1   1      0.86016       167       0       200  1024

After:
Running: pwrite_u1k# bin/pwrite -E -C 200 -L -S -W -N pwrite_u1k -s 1k -I 500 -f /var/tmp/libmicro.1957/data
           prc thr   usecs/call   samples  errors  cnt/samp  size
pwrite_u1k   1   1       0.7699       190       0       200  1024
Thanks,
Yi
Matthew Wilcox (Oracle) (2):
  iomap: hold state_lock over call to ifs_set_range_uptodate()
  iomap: protect read_bytes_pending with the state_lock

Zhang Yi (2):
  iomap: drop unnecessary state_lock when setting ifs uptodate bits
  iomap: drop unnecessary state_lock when changing ifs dirty bits

 fs/iomap/buffered-io.c | 73 +++++++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 30 deletions(-)
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion from mainline-v6.7-rc1 commit 279d5fc3227f04ef2c6125e5c440e7952173a89a category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Add folio_end_read", v2.
The core of this patchset is the new folio_end_read() call which filesystems can use when finishing a page cache read instead of separate calls to mark the folio uptodate and unlock it. As an illustration of its use, I converted ext4, iomap & mpage; more can be converted.
I think that's useful by itself, but the interesting optimisation is that we can implement that with a single XOR instruction that sets the uptodate bit, clears the lock bit, tests the waiter bit and provides a write memory barrier. That removes one memory barrier and one atomic instruction from each page read, which seems worth doing. That's in patch 15.
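For readers who want to picture the combined operation, here is a rough sketch of what such a folio_end_read() can look like (reconstructed from memory of the mainline series, so treat the helper names folio_xor_flags_has_waiters() and folio_wake_bit() as assumptions rather than quotations of patch 15):

void folio_end_read(struct folio *folio, bool success)
{
	/* PG_locked is known set and PG_uptodate known clear here, so
	 * XOR-ing this mask sets the uptodate bit and clears the lock
	 * bit in a single atomic operation. */
	unsigned long mask = 1 << PG_locked;

	if (likely(success))
		mask |= 1 << PG_uptodate;
	/* The same XOR samples the waiters bit, so the slow wakeup
	 * path is only taken when someone is actually waiting. */
	if (folio_xor_flags_has_waiters(folio, mask))
		folio_wake_bit(folio, PG_locked);
}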
The last two patches could be a separate series, but basically we can do the same thing with the writeback flag that we do with the unlock flag; clear it and test the waiters bit at the same time.
This patch (of 17):
This is really preparation for the next patch, but it lets us call folio_mark_uptodate() in just one place instead of two.
Link: https://lkml.kernel.org/r/20231004165317.1061855-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231004165317.1061855-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 fs/iomap/buffered-io.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 205f504b2dfb..bb96ac4c408f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -57,30 +57,32 @@ static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
 	return test_bit(block, ifs->state);
 }
 
-static void ifs_set_range_uptodate(struct folio *folio,
+static bool ifs_set_range_uptodate(struct folio *folio,
 		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned int first_blk = off >> inode->i_blkbits;
 	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
 	unsigned int nr_blks = last_blk - first_blk + 1;
-	unsigned long flags;
 
-	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_set(ifs->state, first_blk, nr_blks);
-	if (ifs_is_fully_uptodate(folio, ifs))
-		folio_mark_uptodate(folio);
-	spin_unlock_irqrestore(&ifs->state_lock, flags);
+	return ifs_is_fully_uptodate(folio, ifs);
 }
 
 static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		size_t len)
 {
 	struct iomap_folio_state *ifs = folio->private;
+	unsigned long flags;
+	bool uptodate = true;
 
-	if (ifs)
-		ifs_set_range_uptodate(folio, ifs, off, len);
-	else
+	if (ifs) {
+		spin_lock_irqsave(&ifs->state_lock, flags);
+		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+		spin_unlock_irqrestore(&ifs->state_lock, flags);
+	}
+
+	if (uptodate)
 		folio_mark_uptodate(folio);
 }
From: "Matthew Wilcox (Oracle)" willy@infradead.org
mainline inclusion from mainline-v6.7-rc1 commit f45b494e2a24d86afd79cab7c343b414c5213447 category: perf bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Perform one atomic operation (acquiring the spinlock) instead of two (spinlock & atomic_sub) per read completion.
Link: https://lkml.kernel.org/r/20231004165317.1061855-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 fs/iomap/buffered-io.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bb96ac4c408f..b61b3e7ed3bd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -29,9 +29,9 @@ typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
  * and I/O completions.
  */
 struct iomap_folio_state {
-	atomic_t		read_bytes_pending;
-	atomic_t		write_bytes_pending;
 	spinlock_t		state_lock;
+	unsigned int		read_bytes_pending;
+	atomic_t		write_bytes_pending;
 
 	/*
 	 * Each block has two bits in this bitmap:
@@ -255,7 +255,7 @@ static void ifs_free(struct folio *folio)
 
 	if (!ifs)
 		return;
-	WARN_ON_ONCE(atomic_read(&ifs->read_bytes_pending));
+	WARN_ON_ONCE(ifs->read_bytes_pending != 0);
 	WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending));
 	WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
 			folio_test_uptodate(folio));
@@ -322,19 +322,29 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	*lenp = plen;
 }
 
-static void iomap_finish_folio_read(struct folio *folio, size_t offset,
+static void iomap_finish_folio_read(struct folio *folio, size_t off,
 		size_t len, int error)
 {
 	struct iomap_folio_state *ifs = folio->private;
+	bool uptodate = !error;
+	bool finished = true;
 
-	if (unlikely(error)) {
-		folio_clear_uptodate(folio);
-		folio_set_error(folio);
-	} else {
-		iomap_set_range_uptodate(folio, offset, len);
+	if (ifs) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&ifs->state_lock, flags);
+		if (!error)
+			uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+		ifs->read_bytes_pending -= len;
+		finished = !ifs->read_bytes_pending;
+		spin_unlock_irqrestore(&ifs->state_lock, flags);
 	}
 
-	if (!ifs || atomic_sub_and_test(len, &ifs->read_bytes_pending))
+	if (error)
+		folio_set_error(folio);
+	if (uptodate)
+		folio_mark_uptodate(folio);
+	if (finished)
 		folio_unlock(folio);
 }
 
@@ -432,8 +442,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	}
 
 	ctx->cur_folio_in_bio = true;
-	if (ifs)
-		atomic_add(plen, &ifs->read_bytes_pending);
+	if (ifs) {
+		spin_lock_irq(&ifs->state_lock);
+		ifs->read_bytes_pending += plen;
+		spin_unlock_irq(&ifs->state_lock);
+	}
 
 	sector = iomap_sector(iomap, pos);
 	if (!ctx->bio ||
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR
CVE: NA
--------------------------------
Commit 1cea335d1db1 ("iomap: fix sub-page uptodate handling") fixed a race when submitting multiple read bios for a page that spans more than one filesystem block, by adding a spinlock (which is now called state_lock) to make the page uptodate update synchronous. However, that race can only happen between the read I/O submitting and completing threads; the page (folio) lock is sufficient to protect all other paths. So there is no need to take this spinlock in the write path when setting the page uptodate, and dropping it there removes some unnecessary locking overhead.
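To make the race concrete, here is a small userspace illustration (plain C with pthreads, not kernel code; the file name and the setter() helper are invented for the demo) of the lost update that state_lock prevents when two read-completion threads do a plain read-modify-write on a shared bitmap word, as a lockless bitmap_set() would:

/* race-demo.c - build with: cc -O2 -pthread race-demo.c
 * Two threads set disjoint bits in one shared bitmap using a plain
 * read-modify-write; without a lock, updates are occasionally lost.
 */
#include <pthread.h>
#include <stdio.h>

#define BITS 1024
static unsigned long bitmap[BITS / 64];

static void *setter(void *arg)
{
	unsigned long parity = (unsigned long)arg;

	/* thread 0 sets the even bits, thread 1 the odd bits */
	for (unsigned long b = parity; b < BITS; b += 2)
		bitmap[b / 64] |= 1UL << (b % 64);	/* non-atomic RMW */
	return NULL;
}

int main(void)
{
	long lost = 0;

	for (int round = 0; round < 10000; round++) {
		pthread_t a, b;
		int set = 0;

		for (int i = 0; i < BITS / 64; i++)
			bitmap[i] = 0;
		pthread_create(&a, NULL, setter, (void *)0UL);
		pthread_create(&b, NULL, setter, (void *)1UL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		for (int i = 0; i < BITS / 64; i++)
			set += __builtin_popcountl(bitmap[i]);
		lost += BITS - set;	/* the two threads together should set every bit */
	}
	printf("bits lost to the race: %ld\n", lost);
	return 0;
}

In the kernel, the same pattern occurs when two read bios for one folio complete on different CPUs, which is why the read completion path keeps taking state_lock while the folio-locked write paths can safely skip it.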
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b61b3e7ed3bd..a577fcdf20a6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -73,14 +73,10 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		size_t len)
 {
 	struct iomap_folio_state *ifs = folio->private;
-	unsigned long flags;
 	bool uptodate = true;
 
-	if (ifs) {
-		spin_lock_irqsave(&ifs->state_lock, flags);
+	if (ifs)
 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
-		spin_unlock_irqrestore(&ifs->state_lock, flags);
-	}
 
 	if (uptodate)
 		folio_mark_uptodate(folio);
@@ -437,7 +433,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
-		iomap_set_range_uptodate(folio, poff, plen);
+		if (ifs) {
+			/*
+			 * Hold state_lock to protect from submitting multiple
+			 * bios for the same locked folio and then get multiple
+			 * callbacks in parallel.
+			 */
+			spin_lock_irq(&ifs->state_lock);
+			iomap_set_range_uptodate(folio, poff, plen);
+			spin_unlock_irq(&ifs->state_lock);
+		} else {
+			folio_mark_uptodate(folio);
+		}
 		goto done;
 	}
hulk inclusion
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/IAGIZR
CVE: NA
--------------------------------
Holding the state_lock while setting and clearing the ifs dirty bits is unnecessary, since both paths run under the folio lock. It is therefore safe to drop the state_lock, which removes some unnecessary locking overhead and improves buffer write performance a bit.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 9 ---------
 1 file changed, 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a577fcdf20a6..31257176584e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -137,14 +137,8 @@ static void ifs_clear_range_dirty(struct folio *folio,
 	unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
 	unsigned int last_blk = (off + len) >> inode->i_blkbits;
 	unsigned int nr_blks = last_blk - first_blk;
-	unsigned long flags;
 
-	if (!nr_blks)
-		return;
-
-	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
-	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
 
 static void iomap_clear_range_dirty(struct folio *folio, size_t off, size_t len)
@@ -163,11 +157,8 @@ static void ifs_set_range_dirty(struct folio *folio,
 	unsigned int first_blk = (off >> inode->i_blkbits);
 	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
 	unsigned int nr_blks = last_blk - first_blk + 1;
-	unsigned long flags;
 
-	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
-	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
 
 static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
FeedBack: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/10409
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/T...