[PATCH OLK-6.6 0/3] some readahead improvements

Jan Kara (3):
  readahead: make sure sync readahead reads needed page
  readahead: don't shorten readahead window in read_pages()
  readahead: properly shorten readahead when falling back to
    do_page_cache_ra()

 mm/readahead.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

--
2.25.1

From: Jan Kara <jack@suse.cz>

mainline inclusion
from mainline-v6.11-rc1
commit 8051b82a0be05751e41be1dfa4201c131e589450
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB9L9C
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

-------------------------------------------

Patch series "mm: Fix various readahead quirks".

While internally testing the performance of recent kernels, we noticed
quite variable readahead performance arising from various quirks in the
readahead code, so I went on a cleaning spree there.  This is a batch of
patches resulting from that.  A quick test in my test VM with the
following fio job file:

[global]
direct=0
ioengine=sync
invalidate=1
blocksize=4k
size=10g
readwrite=read

[reader]
numjobs=1

shows that this patch series improves the throughput from a variable
310-340 MB/s to a rather stable 350 MB/s.  As a side effect these
cleanups also address the issue noticed by Bruz Zhang [1].

[1] https://lore.kernel.org/all/20240618114941.5935-1-zhangpengpeng0808@gmail.co...

Zhang Peng reported:

: I test this batch of patch with fio, it indeed has a huge speedup
: in sequential read when block size is 4KiB. The result as follow,
: for async read, iodepth is set to 128, and other settings
: are self-evident.
:
: casename                upstream   withFix    speedup
: ----------------        --------   --------   -------
: randread-4k-sync        48991      47
: seqread-4k-sync         1162758    14229
: seqread-1024k-sync      1460208    1452522
: randread-4k-libaio      47467      4730
: randread-4k-posixaio    49190      49512
: seqread-4k-libaio       1085932    1234635
: seqread-1024k-libaio    1423341    1402214    -1
: seqread-4k-posixaio     1165084    1369613    1
: seqread-1024k-posixaio  1435422    1408808    -1.8

This patch (of 10):

page_cache_sync_ra() is called when a folio we want to read is not in
the page cache.  It is expected that it creates the folio (and perhaps
the following folios as well) and submits reads for them unless some
error happens.  However, if index == ra->start + ra->size,
ondemand_readahead() will treat the call as another async readahead hit.
Thus ra->start will be advanced and we create pages and queue reads from
ra->start + ra->size onwards.  Consequently the page at 'index' is not
created and filemap_get_pages() always has to go through the
filemap_create_folio() path.

This behavior has particularly unfortunate consequences when we have two
IO threads sequentially reading from a shared file (as is the case when
NFS serves sequential reads).  In that case what can happen is:

suppose ra->size == ra->async_size == 128, ra->start = 512

T1                                      T2
reads 128 pages at index 512
  - hits async readahead mark
    filemap_readahead()
      ondemand_readahead()
        if (index == expected ...)
          ra->start = 512 + 128 = 640
          ra->size = 128
          ra->async_size = 128
          page_cache_ra_order()
            blocks in ra_alloc_folio()
                                        reads 128 pages at index 640
                                          - no page found
                                            page_cache_sync_readahead()
                                              ondemand_readahead()
                                                if (index == expected ...)
                                                  ra->start = 640 + 128 = 768
                                                  ra->size = 128
                                                  ra->async_size = 128
                                                  page_cache_ra_order()
                                                    submits reads from 768
                                          - still no page found at index 640
                                            filemap_create_folio()
                                          - goes on to index 641
                                            page_cache_sync_readahead()
                                              ondemand_readahead()
                                                - finds ra is confused,
                                                  trims it to a small size
  finds pages were already
  inserted

And as a result read performance suffers.

Fix the problem by triggering the async readahead case in
ondemand_readahead() only if we are calling the function because we hit
the readahead marker.  In any other case we need to read the folio at
'index' and thus we cannot really use the current ra state.

Note that the above situation could be viewed as a special case of
file->f_ra state corruption.  In fact, two threads reading through the
shared file can also seemingly corrupt file->f_ra in interesting ways
due to concurrent access.  I never saw that in practice and the fix is
going to be much more complex, so for now at least fix this practical
problem while we ponder the theoretically correct solution.

Link: https://lkml.kernel.org/r/20240625100859.15507-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240625101909.12234-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Zhang Peng <zhangpengpeng0808@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/readahead.c
[context conflict.]
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/readahead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 87609bd117..5aba3a9973 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -597,7 +597,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	 */
 	expected = round_down(ra->start + ra->size - ra->async_size,
 			1UL << order);
-	if (index == expected || index == (ra->start + ra->size)) {
+	if (folio && index == expected) {
		ra->start += ra->size;
		/*
		 * In the case of MADV_HUGEPAGE, the actual size might exceed
--
2.25.1
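To make the effect of the one-line change above more concrete, here is a
standalone userspace sketch (hypothetical, heavily simplified struct and
function names; not the kernel source) that models which page the queued
reads start from in the T2 step of the race described in the changelog,
with and without the fix:

/*
 * Standalone userspace sketch -- NOT kernel code.  Struct and function
 * names are hypothetical simplifications of struct file_ra_state and
 * ondemand_readahead().  It only models which page the queued reads
 * start from in the T2 step of the race above: T1 has already advanced
 * the window to start = 640, and T2 now misses page 640.
 */
#include <stdbool.h>
#include <stdio.h>

struct ra_state {
	unsigned long start;		/* first page of the current window */
	unsigned long size;		/* pages in the window */
	unsigned long async_size;	/* pages left when the async mark is hit */
};

/*
 * Returns the page index the queued reads would start from.  'have_folio'
 * stands in for ondemand_readahead()'s 'folio' argument, which is only
 * set when we got here because a folio carrying the readahead mark was
 * found (the async case), not on a sync cache miss.
 */
static unsigned long reads_start_at(struct ra_state *ra, unsigned long index,
				    bool have_folio, bool fixed)
{
	unsigned long expected = ra->start + ra->size - ra->async_size;
	bool async_hit = fixed ? (have_folio && index == expected)
			       : (index == expected ||
				  index == ra->start + ra->size);

	if (async_hit) {
		ra->start += ra->size;	/* window jumps past 'index' */
		return ra->start;
	}
	/* sync case: the new window starts at the missing page itself */
	ra->start = index;
	return ra->start;
}

int main(void)
{
	struct ra_state ra = { .start = 640, .size = 128, .async_size = 128 };
	unsigned long miss = 640;	/* T2's sync miss, folio == NULL */

	printf("old: reads queued from page %lu (page 640 still missing)\n",
	       reads_start_at(&ra, miss, false, false));

	ra = (struct ra_state){ .start = 640, .size = 128, .async_size = 128 };
	printf("fixed: reads queued from page %lu\n",
	       reads_start_at(&ra, miss, false, true));
	return 0;
}

Built with any C99 compiler, the old variant reports reads queued from
page 768 while page 640 stays missing, whereas the fixed variant starts
the reads at page 640 itself.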

From: Jan Kara <jack@suse.cz>

mainline inclusion
from mainline-v6.14-rc1
commit 7a1eb89f79188ec32599064eca7c14d42260760a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBRWBB
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

Patch series "readahead: Reintroduce fix for improper RA window sizing".

This small patch series reintroduces a fix of readahead window confusion
(and thus read throughput reduction) when page_cache_ra_order() ends up
failing due to folios already present in the page cache.  After thinking
about this for a while I have ended up with a dumb fix that just rechecks
whether we have something to read before calling do_page_cache_ra().
This fixes the problem reported in [1].  I still think it doesn't make
much sense to update the readahead window size in read_pages(), so patch
1 removes that, but the real fix in patch 2 does not depend on it.

[1] https://lore.kernel.org/all/49648605-d800-4859-be49-624bbe60519d@gmail.com

This patch (of 2):

When the ->readahead callback doesn't read all requested pages,
read_pages() shortens the readahead window (ra->size).  However, we
don't know why the pages were not read or what the appropriate window
size would be, so don't try to second-guess the filesystem.  If it needs
a different readahead window, it should set one manually, similarly to
how a filesystem can use readahead_expand() when it wants the window
grown.

Link: https://lkml.kernel.org/r/20241204181016.15273-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20241204181016.15273-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/readahead.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 5aba3a9973..7d36a29e1d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -164,20 +164,10 @@ static void read_pages(struct readahead_control *rac)
 
 	if (aops->readahead) {
 		aops->readahead(rac);
-		/*
-		 * Clean up the remaining folios.  The sizes in ->ra
-		 * may be used to size the next readahead, so make sure
-		 * they accurately reflect what happened.
-		 */
+		/* Clean up the remaining folios. */
 		while ((folio = readahead_folio(rac)) != NULL) {
-			unsigned long nr = folio_nr_pages(folio);
-
			folio_get(folio);
-			rac->ra->size -= nr;
-			if (rac->ra->async_size >= nr) {
-				rac->ra->async_size -= nr;
-				filemap_remove_folio(folio);
-			}
+			filemap_remove_folio(folio);
			folio_unlock(folio);
			folio_put(folio);
		}
--
2.25.1
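To illustrate why the removed bookkeeping could hurt, here is a small
standalone userspace model (hypothetical names, not kernel code; it only
tracks the two window fields) of how folios left unread by ->readahead()
used to shrink the state that seeds the next readahead window, and how
that state is left untouched after this patch:

/*
 * Standalone userspace model -- NOT kernel code.  It only tracks the two
 * window fields of a hypothetical ra_state to contrast the old and new
 * cleanup behaviour of read_pages().
 */
#include <stdio.h>

struct ra_state {
	unsigned long size;		/* pages in the readahead window */
	unsigned long async_size;	/* async part of the window */
};

/* Old behaviour: every leftover page shrank the window that seeds the
 * next readahead, even though we don't know why it was left unread. */
static void cleanup_old(struct ra_state *ra, unsigned long leftover)
{
	ra->size -= leftover;
	if (ra->async_size >= leftover)
		ra->async_size -= leftover;
}

/* New behaviour: leftovers are simply dropped from the page cache and the
 * window is left alone; a filesystem that wants a different window can
 * adjust it itself (readahead_expand() already exists for growing it). */
static void cleanup_new(struct ra_state *ra, unsigned long leftover)
{
	(void)ra;
	(void)leftover;
}

int main(void)
{
	struct ra_state ra = { .size = 128, .async_size = 64 };

	cleanup_old(&ra, 32);	/* 32 of the 128 pages were not read */
	printf("old: next window seeded from size=%lu async_size=%lu\n",
	       ra.size, ra.async_size);

	ra = (struct ra_state){ .size = 128, .async_size = 64 };
	cleanup_new(&ra, 32);
	printf("new: next window seeded from size=%lu async_size=%lu\n",
	       ra.size, ra.async_size);
	return 0;
}

Run as-is, the old variant reports a shrunken size/async_size pair that
would feed the next window, while the new variant leaves both values
unchanged.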

From: Jan Kara <jack@suse.cz>

mainline inclusion
from mainline-v6.14-rc1
commit d5ea5e5e50dffd13fccedb690a0a1f27be56191a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBRWBB
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

--------------------------------

When we succeed in creating some folios in page_cache_ra_order() but
then need to fall back to single page folios, we don't shorten the
amount to read passed to do_page_cache_ra() by the amount we've already
read.  This results in reading more than intended and also in placing
another readahead mark in the middle of the readahead window, which
confuses the readahead code.  Fix the problem by properly reducing the
number of pages to read.  Unlike the previous attempt at this fix
(commit 7c877586da31), which had to be reverted, we are now careful to
check that there is indeed something left to read so that we don't
submit a negative-sized readahead.

Link: https://lkml.kernel.org/r/20241204181016.15273-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/readahead.c
[context conflict, 26cfdb395eef ("readahead: allocate folios with
mapping_min_order in readahead") not merged]
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 mm/readahead.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 7d36a29e1d..69757a6249 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -487,7 +487,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		struct file_ra_state *ra, unsigned int new_order)
 {
 	struct address_space *mapping = ractl->mapping;
-	pgoff_t index = readahead_index(ractl);
+	pgoff_t start = readahead_index(ractl);
+	pgoff_t index = start;
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
@@ -546,12 +547,18 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	/*
 	 * If there were already pages in the page cache, then we may have
 	 * left some gaps.  Let the regular readahead code take care of this
-	 * situation.
+	 * situation below.
 	 */
 	if (!err)
 		return;
 fallback:
-	do_page_cache_ra(ractl, ra->size, ra->async_size);
+	/*
+	 * ->readahead() may have updated readahead window size so we have to
+	 * check there's still something to read.
+	 */
+	if (ra->size > index - start)
+		do_page_cache_ra(ractl, ra->size - (index - start),
+				ra->async_size);
 }
 
 /*
--
2.25.1
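As a standalone illustration of the new fallback sizing (hypothetical
helper name, not kernel code), the sketch below computes the number of
pages handed to do_page_cache_ra() from the window size and from how far
page_cache_ra_order() got before falling back, returning 0 when there is
nothing left to read:

/*
 * Standalone sketch -- NOT kernel code, the helper name is hypothetical.
 * 'start' is where page_cache_ra_order() began, 'index' is how far it got
 * before falling back, and 'ra_size' is the window size, which
 * ->readahead() may have changed in the meantime.
 */
#include <stdio.h>

static unsigned long fallback_nr_pages(unsigned long ra_size,
				       unsigned long start,
				       unsigned long index)
{
	unsigned long already_read = index - start;

	/* Nothing left to read: avoid the unsigned underflow that would
	 * otherwise produce a huge ("negative") readahead request. */
	if (ra_size <= already_read)
		return 0;
	return ra_size - already_read;
}

int main(void)
{
	/* 32-page window, 24 pages already covered by large folios. */
	printf("fallback reads %lu pages\n", fallback_nr_pages(32, 512, 536));
	/* Window already fully covered: no fallback I/O is submitted. */
	printf("fallback reads %lu pages\n", fallback_nr_pages(32, 512, 544));
	return 0;
}

With the example numbers, a 32-page window of which 24 pages are already
covered leaves an 8-page fallback read, and a fully covered window
submits nothing.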

FeedBack:
The patch(es) which you have sent to the kernel@openeuler.org mailing list
have been converted to a pull request successfully!
Pull request link:
https://gitee.com/openeuler/kernel/pulls/15470
Mailing list address:
https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/L7B...