[PATCH OLK-6.6 v2 00/55] fuse: support large folios

Changes since v1:
 * Add patch 55 to fix read-write inconsistency issue.

Port the fuse large folios support patches to OLK6.6.

This code hasn't been system tested and is just for exploration and
verification, not for merging into the release.

Amir Goldstein (1):
  fuse: allocate ff->release_args only if release is needed

Baokun Li (1):
  fuse: avoid writing more pages than max_pages

Bernd Schubert (1):
  fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure

David Howells (1):
  iov_iter: Provide copy_folio_from_iter()

Joanne Koong (32):
  fuse: convert fuse_writepages_fill() to use a folio for its tmp page
  fuse: drop unused fuse_mount arg in fuse_writepage_finish()
  fuse: refactor finished writeback stats updates into helper function
  fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
  fuse: add support in virtio for requests using folios
  fuse: convert cuse to use folios
  fuse: convert readlink to use folios
  fuse: convert readdir to use folios
  fuse: convert reads to use folios
  fuse: convert writes (non-writeback) to use folios
  fuse: convert ioctls to use folios
  fuse: convert retrieves to use folios
  fuse: revert back to __readahead_folio() for readahead
  mm/writeback: add folio_mark_dirty_lock()
  fuse: move initialization of fuse_file to fuse_writepages() instead of in callback
  fuse: move fuse file initialization to wpa allocation time
  fuse: refactor out shared logic in fuse_writepages_fill() and fuse_writepage_locked()
  fuse: convert writebacks to use folios
  fuse: convert direct io to use folios
  fuse: remove pages for requests and exclusively use folios
  fuse: fix direct io folio offset and length calculation
  fuse: support copying large folios
  fuse: support large folios for retrieves
  fuse: refactor fuse_fill_write_pages()
  fuse: support large folios for writethrough writes
  fuse: support large folios for folio reads
  fuse: support large folios for symlinks
  fuse: support large folios for stores
  fuse: support large folios for queued writes
  fuse: support large folios for readahead
  fuse: optimize direct io large folios processing
  fuse: enable dynamic configuration of fuse max pages limit (FUSE_MAX_MAX_PAGES)

Josef Bacik (10):
  fuse: use fuse_range_is_writeback() instead of iterating pages
  fuse: convert readahead to use folios
  fuse: convert fuse_send_write_pages to use folios
  fuse: convert fuse_fill_write_pages to use folios
  fuse: convert fuse_page_mkwrite to use folios
  fuse: convert fuse_do_readpage to use folios
  fuse: convert fuse_writepage_need_send to take a folio
  fuse: convert fuse_retrieve to use folios
  fuse: convert fuse_notify_store to use folios
  fuse: use the folio based vmstat helpers

Lei Huang (1):
  fuse: Fix missing FOLL_PIN for direct-io

Matthew Wilcox (Oracle) (6):
  fuse: Remove fuse_writepage
  fuse: Convert fuse_writepage_locked to take a folio
  mm: add folio_end_read()
  fuse: Convert fuse_readpages_end() to use folio_end_read()
  fuse: Convert fuse_write_end() to use a folio
  fuse: Convert fuse_write_begin() to use a folio

Miklos Szeredi (1):
  fuse: clear PG_uptodate when using a stolen page

Vivek Kasireddy (1):
  mm/gup: introduce unpin_folio/unpin_folios helpers

 Documentation/admin-guide/sysctl/fs.rst |  10 +
 fs/fuse/Makefile                        |   1 +
 fs/fuse/cuse.c                          |  29 +-
 fs/fuse/dev.c                           | 163 ++---
 fs/fuse/dir.c                           |  28 +-
 fs/fuse/file.c                          | 774 +++++++++++++-----------
 fs/fuse/fuse_i.h                        |  51 +-
 fs/fuse/inode.c                         |  11 +-
 fs/fuse/ioctl.c                         |  33 +-
 fs/fuse/readdir.c                       |  18 +-
 fs/fuse/sysctl.c                        |  40 ++
 fs/fuse/virtio_fs.c                     |  52 +-
 include/linux/mm.h                      |   3 +
 include/linux/pagemap.h                 |   1 +
 include/linux/uio.h                     |   6 +
 mm/filemap.c                            |  22 +
 mm/folio-compat.c                       |   6 +
 mm/gup.c                                |  47 ++
 mm/page-writeback.c                     |  22 +-
 mm/util.c                               |   1 +
 20 files changed, 789 insertions(+), 529 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The writepage operation is deprecated as it leads to worse performance
under high memory pressure due to folios being written out in LRU order
rather than sequentially within a file. Use filemap_migrate_folio() to
support dirty folio migration instead of writepage.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 30 +-----------------------------
 1 file changed, 1 insertion(+), 29 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fca2be898336..f57341f6e867 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2040,34 +2040,6 @@ static int fuse_writepage_locked(struct page *page)
 	return error;
 }
 
-static int fuse_writepage(struct page *page, struct writeback_control *wbc)
-{
-	struct fuse_conn *fc = get_fuse_conn(page->mapping->host);
-	int err;
-
-	if (fuse_page_is_writeback(page->mapping->host, page->index)) {
-		/*
-		 * ->writepages() should be called for sync() and friends. We
-		 * should only get here on direct reclaim and then we are
-		 * allowed to skip a page which is already in flight
-		 */
-		WARN_ON(wbc->sync_mode == WB_SYNC_ALL);
-
-		redirty_page_for_writepage(wbc, page);
-		unlock_page(page);
-		return 0;
-	}
-
-	if (wbc->sync_mode == WB_SYNC_NONE &&
-	    fc->num_background >= fc->congestion_threshold)
-		return AOP_WRITEPAGE_ACTIVATE;
-
-	err = fuse_writepage_locked(page);
-	unlock_page(page);
-
-	return err;
-}
-
 struct fuse_fill_wb_data {
 	struct fuse_writepage_args *wpa;
 	struct fuse_file *ff;
@@ -3258,10 +3230,10 @@ static const struct file_operations fuse_file_operations = {
 static const struct address_space_operations fuse_file_aops = {
 	.read_folio	= fuse_read_folio,
 	.readahead	= fuse_readahead,
-	.writepage	= fuse_writepage,
 	.writepages	= fuse_writepages,
 	.launder_folio	= fuse_launder_folio,
 	.dirty_folio	= filemap_dirty_folio,
+	.migrate_folio	= filemap_migrate_folio,
 	.bmap		= fuse_bmap,
 	.direct_IO	= fuse_direct_IO,
 	.write_begin	= fuse_write_begin,
-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The one remaining caller of fuse_writepage_locked() already has a folio,
so convert this function entirely. Saves a few calls to compound_head()
but no attempt is made to support large folios in this patch.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Conflicts:
	mm/util.c
---
 fs/fuse/file.c | 30 +++++++++++++++---------------
 mm/util.c      |  1 +
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f57341f6e867..3f4ac7a497ac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1976,26 +1976,26 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
 	rcu_read_unlock();
 }
 
-static int fuse_writepage_locked(struct page *page)
+static int fuse_writepage_locked(struct folio *folio)
 {
-	struct address_space *mapping = page->mapping;
+	struct address_space *mapping = folio->mapping;
 	struct inode *inode = mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_writepage_args *wpa;
 	struct fuse_args_pages *ap;
-	struct page *tmp_page;
+	struct folio *tmp_folio;
 	int error = -ENOMEM;
 
-	set_page_writeback(page);
+	folio_start_writeback(folio);
 
 	wpa = fuse_writepage_args_alloc();
 	if (!wpa)
 		goto err;
 	ap = &wpa->ia.ap;
 
-	tmp_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
-	if (!tmp_page)
+	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
+	if (!tmp_folio)
 		goto err_free;
 
 	error = -EIO;
@@ -2004,21 +2004,21 @@ static int fuse_writepage_locked(struct page *page)
 		goto err_nofile;
 
 	fuse_writepage_add_to_bucket(fc, wpa);
-	fuse_write_args_fill(&wpa->ia, wpa->ia.ff, page_offset(page), 0);
+	fuse_write_args_fill(&wpa->ia, wpa->ia.ff, folio_pos(folio), 0);
 
-	copy_highpage(tmp_page, page);
+	folio_copy(tmp_folio, folio);
 	wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
 	wpa->next = NULL;
 	ap->args.in_pages = true;
 	ap->num_pages = 1;
-	ap->pages[0] = tmp_page;
+	ap->pages[0] = &tmp_folio->page;
 	ap->descs[0].offset = 0;
 	ap->descs[0].length = PAGE_SIZE;
 	ap->args.end = fuse_writepage_end;
 	wpa->inode = inode;
 
 	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
-	inc_node_page_state(tmp_page, NR_WRITEBACK_TEMP);
+	node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP);
 
 	spin_lock(&fi->lock);
 	tree_insert(&fi->writepages, wpa);
@@ -2026,17 +2026,17 @@ static int fuse_writepage_locked(struct page *page)
 	fuse_flush_writepages(inode);
 	spin_unlock(&fi->lock);
 
-	end_page_writeback(page);
+	folio_end_writeback(folio);
 
 	return 0;
 
 err_nofile:
-	__free_page(tmp_page);
+	folio_put(tmp_folio);
 err_free:
 	kfree(wpa);
 err:
-	mapping_set_error(page->mapping, error);
-	end_page_writeback(page);
+	mapping_set_error(folio->mapping, error);
+	folio_end_writeback(folio);
 	return error;
 }
 
@@ -2402,7 +2402,7 @@ static int fuse_launder_folio(struct folio *folio)
 
 		/* Serialize with pending writeback for the same page */
 		fuse_wait_on_page_writeback(inode, folio->index);
-		err = fuse_writepage_locked(&folio->page);
+		err = fuse_writepage_locked(folio);
 		if (!err)
 			fuse_wait_on_page_writeback(inode, folio->index);
 	}
diff --git a/mm/util.c b/mm/util.c
index f3d6751b2f2a..f9517c0c5eb6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -814,6 +814,7 @@ void folio_copy(struct folio *dst, struct folio *src)
 		cond_resched();
 	}
 }
+EXPORT_SYMBOL(folio_copy);
 
 int folio_mc_copy(struct folio *dst, struct folio *src)
 {
-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Provide a function for filesystems to call when they have finished
reading an entire folio.

Link: https://lkml.kernel.org/r/20231004165317.1061855-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e44e377661f2..e3a9f36ae7f0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1216,6 +1216,7 @@ static inline void wait_on_page_locked(struct page *page)
 	folio_wait_locked(page_folio(page));
 }
 
+void folio_end_read(struct folio *folio, bool success);
 void wait_on_page_writeback(struct page *page);
 void folio_wait_writeback(struct folio *folio);
 int folio_wait_writeback_killable(struct folio *folio);
diff --git a/mm/filemap.c b/mm/filemap.c
index 77a8947b8e5e..b916692df123 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1542,6 +1542,28 @@ void folio_unlock(struct folio *folio)
 }
 EXPORT_SYMBOL(folio_unlock);
 
+/**
+ * folio_end_read - End read on a folio.
+ * @folio: The folio.
+ * @success: True if all reads completed successfully.
+ *
+ * When all reads against a folio have completed, filesystems should
+ * call this function to let the pagecache know that no more reads
+ * are outstanding. This will unlock the folio and wake up any thread
+ * sleeping on the lock. The folio will also be marked uptodate if all
+ * reads succeeded.
+ *
+ * Context: May be called from interrupt or process context. May not be
+ * called from NMI context.
+ */
+void folio_end_read(struct folio *folio, bool success)
+{
+	if (likely(success))
+		folio_mark_uptodate(folio);
+	folio_unlock(folio);
+}
+EXPORT_SYMBOL(folio_end_read);
+
 /**
  * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
  * @folio: The folio.
-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Nobody checks the error flag on fuse folios, so stop setting it.
Optimise the (optional) setting of the uptodate flag and clearing of the
lock flag by using folio_end_read().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3f4ac7a497ac..88d3928d07a9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -921,14 +921,10 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	}
 
 	for (i = 0; i < ap->num_pages; i++) {
-		struct page *page = ap->pages[i];
+		struct folio *folio = page_folio(ap->pages[i]);
 
-		if (!err)
-			SetPageUptodate(page);
-		else
-			SetPageError(page);
-		unlock_page(page);
-		put_page(page);
+		folio_end_read(folio, !err);
+		folio_put(folio);
 	}
 	if (ia->ff)
 		fuse_file_put(ia->ff, false, false);
-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Convert the passed page to a folio and operate on that. Replaces five
calls to compound_head() with one.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fuse/file.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 88d3928d07a9..1a0167da9828 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2363,29 +2363,30 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 		loff_t pos, unsigned len, unsigned copied,
 		struct page *page, void *fsdata)
 {
-	struct inode *inode = page->mapping->host;
+	struct folio *folio = page_folio(page);
+	struct inode *inode = folio->mapping->host;
 
 	/* Haven't copied anything? Skip zeroing, size extending, dirtying. */
 	if (!copied)
 		goto unlock;
 
 	pos += copied;
-	if (!PageUptodate(page)) {
+	if (!folio_test_uptodate(folio)) {
 		/* Zero any unwritten bytes at the end of the page */
 		size_t endoff = pos & ~PAGE_MASK;
 		if (endoff)
-			zero_user_segment(page, endoff, PAGE_SIZE);
-		SetPageUptodate(page);
+			folio_zero_segment(folio, endoff, PAGE_SIZE);
+		folio_mark_uptodate(folio);
 	}
 
 	if (pos > inode->i_size)
 		i_size_write(inode, pos);
 
-	set_page_dirty(page);
+	folio_mark_dirty(folio);
 
 unlock:
-	unlock_page(page);
-	put_page(page);
+	folio_unlock(folio);
+	folio_put(folio);
 
 	return copied;
 }
-- 
2.46.1

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Fetch a folio from the page cache instead of a page and use it
throughout removing several calls to compound_head() and supporting
large folios (in this function). We still have to convert back to a
page for calling internal fuse functions, but hopefully they will be
converted soon.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fuse/file.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a0167da9828..1ab829cc2d16 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2320,41 +2320,42 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct fuse_conn *fc = get_fuse_conn(file_inode(file));
-	struct page *page;
+	struct folio *folio;
 	loff_t fsize;
 	int err = -ENOMEM;
 
 	WARN_ON(!fc->writeback_cache);
 
-	page = grab_cache_page_write_begin(mapping, index);
-	if (!page)
+	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+			mapping_gfp_mask(mapping));
+	if (IS_ERR(folio))
 		goto error;
 
-	fuse_wait_on_page_writeback(mapping->host, page->index);
+	fuse_wait_on_page_writeback(mapping->host, folio->index);
 
-	if (PageUptodate(page) || len == PAGE_SIZE)
+	if (folio_test_uptodate(folio) || len >= folio_size(folio))
 		goto success;
 	/*
-	 * Check if the start this page comes after the end of file, in which
-	 * case the readpage can be optimized away.
+	 * Check if the start of this folio comes after the end of file,
+	 * in which case the readpage can be optimized away.
 	 */
 	fsize = i_size_read(mapping->host);
-	if (fsize <= (pos & PAGE_MASK)) {
-		size_t off = pos & ~PAGE_MASK;
+	if (fsize <= folio_pos(folio)) {
+		size_t off = offset_in_folio(folio, pos);
 		if (off)
-			zero_user_segment(page, 0, off);
+			folio_zero_segment(folio, 0, off);
 		goto success;
 	}
-	err = fuse_do_readpage(file, page);
+	err = fuse_do_readpage(file, &folio->page);
 	if (err)
 		goto cleanup;
 success:
-	*pagep = page;
+	*pagep = &folio->page;
 	return 0;
 
 cleanup:
-	unlock_page(page);
-	put_page(page);
+	folio_unlock(folio);
+	folio_put(folio);
 error:
 	return err;
 }
-- 
2.46.1
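
Reviewer note: the branch order in the converted fuse_write_begin() can be tried out in userspace with the rough sketch below. Everything here is a stand-in invented for illustration (struct model_folio, enum wb_action, model_write_begin()); none of it is kernel API. It also assumes the folio size is a power of two, so the offset within the folio reduces to a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the few folio fields the logic needs. */
struct model_folio {
	uint64_t pos;	/* byte offset of the folio within the file */
	size_t size;	/* folio size in bytes, assumed a power of two */
	bool uptodate;
};

enum wb_action { WB_NO_READ, WB_ZERO_HEAD, WB_READ };

/*
 * Follows the branch order of the patched function:
 *  1. folio already uptodate, or the write covers at least the whole folio;
 *  2. folio starts at or beyond EOF: zero [0, off) instead of reading;
 *  3. otherwise the folio has to be read first.
 * *zero_len receives the number of head bytes to zero (0 if none).
 */
enum wb_action model_write_begin(const struct model_folio *f, uint64_t fsize,
				 uint64_t pos, size_t len, size_t *zero_len)
{
	*zero_len = 0;
	if (f->uptodate || len >= f->size)
		return WB_NO_READ;
	if (fsize <= f->pos) {
		/* Same idea as offset_in_folio(): pos relative to the folio. */
		*zero_len = (size_t)(pos & (f->size - 1));
		return *zero_len ? WB_ZERO_HEAD : WB_NO_READ;
	}
	return WB_READ;
}
```

For a 16 KiB folio at file offset 16384 and an 8 KiB file, a 50-byte write at offset 16484 only needs the first 100 bytes of the folio zeroed; no read request is sent.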

From: Miklos Szeredi <mszeredi@redhat.com>

Originally when a stolen page was inserted into fuse's page cache by
fuse_try_move_page(), it would be marked uptodate. Then
fuse_readpages_end() would call SetPageUptodate() again on the already
uptodate page.

Commit 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use
folio_end_read()") changed that by replacing the SetPageUptodate() +
unlock_page() combination with folio_end_read(), which does mostly the
same, except it sets the uptodate flag with an xor operation, which in
the above scenario resulted in the uptodate flag being cleared, which in
turn resulted in EIO being returned on the read.

Fix by clearing PG_uptodate instead of setting it in
fuse_try_move_page(), conforming to the expectation of folio_end_read().

Reported-by: Jürg Billeter <j@bitron.ch>
Debugged-by: Matthew Wilcox <willy@infradead.org>
Fixes: 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()")
Cc: <stable@vger.kernel.org> # v6.10
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dev.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8573d79ef29c..06f0912c836a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -773,7 +773,6 @@ static int fuse_check_folio(struct folio *folio)
 	    (folio->flags & PAGE_FLAGS_CHECK_AT_PREP &
 	     ~(1 << PG_locked |
 	       1 << PG_referenced |
-	       1 << PG_uptodate |
 	       1 << PG_lru |
 	       1 << PG_active |
 	       1 << PG_workingset |
@@ -818,9 +817,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	newfolio = page_folio(buf->page);
 
-	if (!folio_test_uptodate(newfolio))
-		folio_mark_uptodate(newfolio);
-
+	folio_clear_uptodate(newfolio);
 	folio_clear_mappedtodisk(newfolio);
 
 	if (fuse_check_folio(newfolio) != 0)
-- 
2.46.1
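
Reviewer note: the xor behaviour described in the commit message can be seen in a small userspace model. This is only a sketch of the bit arithmetic. The flag values and the helper name model_end_read_xor() are made up, and it does not model atomicity or waiter wakeups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's PG_* values differ. */
#define M_LOCKED	(1u << 0)
#define M_UPTODATE	(1u << 1)

/*
 * Xor-style read completion as the commit message describes it: a single
 * xor clears the lock bit and, on success, flips the uptodate bit. The
 * result has uptodate set only if that bit was clear on entry.
 */
unsigned int model_end_read_xor(unsigned int flags, bool success)
{
	unsigned int mask = M_LOCKED;

	if (success)
		mask |= M_UPTODATE;
	return flags ^ mask;
}
```

Passing M_LOCKED | M_UPTODATE with success set returns 0: the read succeeded, yet the folio ends up not uptodate. That is the stolen-page case the patch avoids by clearing the flag before the read completes.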

From: Joanne Koong <joannelkoong@gmail.com>

To pave the way for refactoring out the shared logic in
fuse_writepages_fill() and fuse_writepage_locked(), this change converts
the temporary page in fuse_writepages_fill() to use the folio API.

This is similar to the change in commit e0887e095a80 ("fuse: Convert
fuse_writepage_locked to take a folio"), which converted the tmp page in
fuse_writepage_locked() to use the folio API.

inc_node_page_state() is intentionally preserved here instead of
converting to node_stat_add_folio() since it is updating the stat of the
underlying page and to better maintain API symmetry with
dec_node_page_stat() in fuse_writepage_finish_stat().

No functional changes added.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1ab829cc2d16..3c34e0c14a12 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2187,7 +2187,7 @@ static int fuse_writepages_fill(struct folio *folio,
 	struct inode *inode = data->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct page *tmp_page;
+	struct folio *tmp_folio;
 	int err;
 
 	if (!data->ff) {
@@ -2203,8 +2203,8 @@ static int fuse_writepages_fill(struct folio *folio,
 	}
 
 	err = -ENOMEM;
-	tmp_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
-	if (!tmp_page)
+	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
+	if (!tmp_folio)
 		goto out_unlock;
 
 	/*
@@ -2224,7 +2224,7 @@ static int fuse_writepages_fill(struct folio *folio,
 		err = -ENOMEM;
 		wpa = fuse_writepage_args_alloc();
 		if (!wpa) {
-			__free_page(tmp_page);
+			folio_put(tmp_folio);
 			goto out_unlock;
 		}
 		fuse_writepage_add_to_bucket(fc, wpa);
@@ -2242,14 +2242,14 @@ static int fuse_writepages_fill(struct folio *folio,
 	}
 	folio_start_writeback(folio);
 
-	copy_highpage(tmp_page, &folio->page);
-	ap->pages[ap->num_pages] = tmp_page;
+	folio_copy(tmp_folio, folio);
+	ap->pages[ap->num_pages] = &tmp_folio->page;
 	ap->descs[ap->num_pages].offset = 0;
 	ap->descs[ap->num_pages].length = PAGE_SIZE;
 	data->orig_pages[ap->num_pages] = &folio->page;
 
 	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
-	inc_node_page_state(tmp_page, NR_WRITEBACK_TEMP);
+	inc_node_page_state(&tmp_folio->page, NR_WRITEBACK_TEMP);
 
 	err = 0;
 	if (data->wpa) {
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

fuse_send_readpages() waits for writeback on each page. This can be
replaced by a single call to fuse_range_is_writeback().

[SzM: split this off from "fuse: convert readahead to use folios"]

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3c34e0c14a12..b77643e013f6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -971,12 +971,17 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 static void fuse_readahead(struct readahead_control *rac)
 {
 	struct inode *inode = rac->mapping->host;
+	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int i, max_pages, nr_pages = 0;
+	pgoff_t first = readahead_index(rac);
+	pgoff_t last = first + readahead_count(rac) - 1;
 
 	if (fuse_is_bad(inode))
 		return;
 
+	wait_event(fi->page_waitq, !fuse_range_is_writeback(inode, first, last));
+
 	max_pages = min_t(unsigned int, fc->max_pages,
 			fc->max_read / PAGE_SIZE);
 
@@ -1003,8 +1008,6 @@ static void fuse_readahead(struct readahead_control *rac)
 		ap = &ia->ap;
 		nr_pages = __readahead_batch(rac, ap->pages, nr_pages);
 		for (i = 0; i < nr_pages; i++) {
-			fuse_wait_on_page_writeback(inode,
-						    readahead_index(rac) + i);
 			ap->descs[i].length = PAGE_SIZE;
 		}
 		ap->num_pages = nr_pages;
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

Currently we're using the __readahead_batch() helper which populates our
fuse_args_pages->pages array with pages. Convert this to use the newer
folio based pattern which is to call readahead_folio() to get the next
folio in the read ahead batch. I've updated the code to use things like
folio_size() and to take into account larger folio sizes, but this is
purely to make that eventual work easier to do, we currently will not
get large folios so this is more future proofing than actual support.

[SzM: remove check for readahead_folio() won't return NULL (at least for
now) so remove ugly assign in conditional.]

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b77643e013f6..bdee23a9edaa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -924,7 +924,6 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 		struct folio *folio = page_folio(ap->pages[i]);
 
 		folio_end_read(folio, !err);
-		folio_put(folio);
 	}
 	if (ia->ff)
 		fuse_file_put(ia->ff, false, false);
@@ -973,7 +972,7 @@ static void fuse_readahead(struct readahead_control *rac)
 	struct inode *inode = rac->mapping->host;
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	unsigned int i, max_pages, nr_pages = 0;
+	unsigned int max_pages, nr_pages;
 	pgoff_t first = readahead_index(rac);
 	pgoff_t last = first + readahead_count(rac) - 1;
 
@@ -985,9 +984,22 @@ static void fuse_readahead(struct readahead_control *rac)
 	max_pages = min_t(unsigned int, fc->max_pages,
 			fc->max_read / PAGE_SIZE);
 
-	for (;;) {
+	/*
+	 * This is only accurate the first time through, since readahead_folio()
+	 * doesn't update readahead_count() from the previous folio until the
+	 * next call. Grab nr_pages here so we know how many pages we're going
+	 * to have to process. This means that we will exit here with
+	 * readahead_count() == folio_nr_pages(last_folio), but we will have
+	 * consumed all of the folios, and read_pages() will call
+	 * readahead_folio() again which will clean up the rac.
+	 */
+	nr_pages = readahead_count(rac);
+
+	while (nr_pages) {
 		struct fuse_io_args *ia;
 		struct fuse_args_pages *ap;
+		struct folio *folio;
+		unsigned cur_pages = min(max_pages, nr_pages);
 
 		if (fc->num_background >= fc->congestion_threshold &&
 		    rac->ra->async_size >= readahead_count(rac))
@@ -997,21 +1009,19 @@ static void fuse_readahead(struct readahead_control *rac)
 			 */
 			break;
 
-		nr_pages = readahead_count(rac) - nr_pages;
-		if (nr_pages > max_pages)
-			nr_pages = max_pages;
-		if (nr_pages == 0)
-			break;
-		ia = fuse_io_alloc(NULL, nr_pages);
+		ia = fuse_io_alloc(NULL, cur_pages);
 		if (!ia)
 			return;
 		ap = &ia->ap;
-		nr_pages = __readahead_batch(rac, ap->pages, nr_pages);
-		for (i = 0; i < nr_pages; i++) {
-			ap->descs[i].length = PAGE_SIZE;
+
+		while (ap->num_pages < cur_pages) {
+			folio = readahead_folio(rac);
+			ap->pages[ap->num_pages] = &folio->page;
+			ap->descs[ap->num_pages].length = folio_size(folio);
+			ap->num_pages++;
 		}
-		ap->num_pages = nr_pages;
 		fuse_send_readpages(ia, rac->file);
+		nr_pages -= cur_pages;
 	}
 }
 
-- 
2.46.1
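
Reviewer note: the new outer loop snapshots the page count once and then carves it into requests of at most max_pages. The userspace sketch below reproduces just that arithmetic. model_split_readahead() is a hypothetical helper, and it assumes every folio is a single page, which matches the commit message's statement that large folios are not expected yet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split nr_pages into requests of at most max_pages each, mirroring
 * "cur_pages = min(max_pages, nr_pages); ...; nr_pages -= cur_pages".
 * Writes up to cap batch sizes into out[] and returns how many batches
 * were produced. Returns 0 when max_pages is 0 so the loop cannot spin.
 */
size_t model_split_readahead(unsigned int nr_pages, unsigned int max_pages,
			     unsigned int *out, size_t cap)
{
	size_t n = 0;

	if (max_pages == 0)
		return 0;
	while (nr_pages && n < cap) {
		unsigned int cur = nr_pages < max_pages ? nr_pages : max_pages;

		out[n++] = cur;
		nr_pages -= cur;
	}
	return n;
}
```

With 70 pages of readahead and a limit of 32 pages per request, the model produces batches of 32, 32 and 6 pages.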

From: Josef Bacik <josef@toxicpanda.com>

Convert this to grab the folio from the fuse_args_pages and use the
appropriate folio related functions.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bdee23a9edaa..fd5868aad5a8 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1154,23 +1154,23 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	offset = ap->descs[0].offset;
 	count = ia->write.out.size;
 	for (i = 0; i < ap->num_pages; i++) {
-		struct page *page = ap->pages[i];
+		struct folio *folio = page_folio(ap->pages[i]);
 
 		if (err) {
-			ClearPageUptodate(page);
+			folio_clear_uptodate(folio);
 		} else {
-			if (count >= PAGE_SIZE - offset)
-				count -= PAGE_SIZE - offset;
+			if (count >= folio_size(folio) - offset)
+				count -= folio_size(folio) - offset;
 			else {
 				if (short_write)
-					ClearPageUptodate(page);
+					folio_clear_uptodate(folio);
 				count = 0;
 			}
 			offset = 0;
 		}
 		if (ia->write.page_locked && (i == ap->num_pages - 1))
-			unlock_page(page);
-		put_page(page);
+			folio_unlock(folio);
+		folio_put(folio);
 	}
 
 	return err;
-- 
2.46.1
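
Reviewer note: the per-folio accounting in this loop is easy to get wrong once folio sizes vary, so here is a userspace model of it. model_write_done() and the keep[] array are hypothetical. keep[i] == false means the loop would clear the uptodate flag of folio i, and short_write is passed in rather than derived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk the folio sizes the way the patched loop does. count is the number
 * of bytes the server reported as written, offset is the starting offset
 * inside the first folio. On return keep[i] is false for every folio whose
 * uptodate flag the loop would clear.
 */
void model_write_done(const size_t *sizes, size_t n, size_t offset,
		      size_t count, bool short_write, bool err, bool *keep)
{
	for (size_t i = 0; i < n; i++) {
		keep[i] = true;
		if (err) {
			keep[i] = false;
			continue;
		}
		if (count >= sizes[i] - offset) {
			/* This folio was written in full. */
			count -= sizes[i] - offset;
		} else {
			/* Partially written or not written at all. */
			if (short_write)
				keep[i] = false;
			count = 0;
		}
		offset = 0;
	}
}
```

For three 4096-byte folios, a starting offset of 100 and a short write of 5000 bytes, the first folio is fully covered (3996 bytes), the second only partially, and the third not at all, so only the first keeps its uptodate flag.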

From: Josef Bacik <josef@toxicpanda.com>

Convert this to grab the folio directly, and update all the helpers to
use the folio related functions.

Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fd5868aad5a8..8ea5ee96f77d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1192,7 +1192,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 
 	do {
 		size_t tmp;
-		struct page *page;
+		struct folio *folio;
 		pgoff_t index = pos >> PAGE_SHIFT;
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset,
 				     iov_iter_count(ii));
@@ -1204,25 +1204,27 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		if (fault_in_iov_iter_readable(ii, bytes))
 			break;
 
-		err = -ENOMEM;
-		page = grab_cache_page_write_begin(mapping, index);
-		if (!page)
+		folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+					    mapping_gfp_mask(mapping));
+		if (IS_ERR(folio)) {
+			err = PTR_ERR(folio);
 			break;
+		}
 
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		tmp = copy_page_from_iter_atomic(page, offset, bytes, ii);
-		flush_dcache_page(page);
+		tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
+		flush_dcache_folio(folio);
 
 		if (!tmp) {
-			unlock_page(page);
-			put_page(page);
+			folio_unlock(folio);
+			folio_put(folio);
 			goto again;
 		}
 
 		err = 0;
-		ap->pages[ap->num_pages] = page;
+		ap->pages[ap->num_pages] = &folio->page;
 		ap->descs[ap->num_pages].length = tmp;
 		ap->num_pages++;
 
@@ -1234,10 +1236,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 
 		/* If we copied full page, mark it uptodate */
 		if (tmp == PAGE_SIZE)
-			SetPageUptodate(page);
+			folio_mark_uptodate(folio);
 
-		if (PageUptodate(page)) {
-			unlock_page(page);
+		if (folio_test_uptodate(folio)) {
+			folio_unlock(folio);
 		} else {
 			ia->write.page_locked = true;
 			break;
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

Convert this to grab the folio directly, and update all the helpers to
use the folio related functions.

Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8ea5ee96f77d..028b885e3f13 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -463,6 +463,16 @@ static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
 	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
 }
 
+static void fuse_wait_on_folio_writeback(struct inode *inode,
+					 struct folio *folio)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	pgoff_t last = folio_next_index(folio) - 1;
+
+	wait_event(fi->page_waitq,
+		   !fuse_range_is_writeback(inode, folio_index(folio), last));
+}
+
 /*
  * Wait for all pending writepages on the inode to finish.
  *
@@ -2451,17 +2461,17 @@ static void fuse_vma_close(struct vm_area_struct *vma)
  */
 static vm_fault_t fuse_page_mkwrite(struct vm_fault *vmf)
 {
-	struct page *page = vmf->page;
+	struct folio *folio = page_folio(vmf->page);
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 
 	file_update_time(vmf->vma->vm_file);
-	lock_page(page);
-	if (page->mapping != inode->i_mapping) {
-		unlock_page(page);
+	folio_lock(folio);
+	if (folio->mapping != inode->i_mapping) {
+		folio_unlock(folio);
 		return VM_FAULT_NOPAGE;
 	}
 
-	fuse_wait_on_page_writeback(inode, page->index);
+	fuse_wait_on_folio_writeback(inode, folio);
 	return VM_FAULT_LOCKED;
 }
 
-- 
2.46.1
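
Reviewer note: the reason for waiting on the whole folio range instead of a single index shows up in a small userspace model. The names below (struct model_range, model_folio_last(), model_range_is_writeback()) are invented for illustration, and the linear scan is only a stand-in for however the driver tracks in-flight writeback ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical closed range [first, last] of page indices under writeback. */
struct model_range {
	unsigned long first;
	unsigned long last;
};

/* Last page index a folio covers, i.e. folio_next_index(folio) - 1. */
unsigned long model_folio_last(unsigned long index, unsigned long nr_pages)
{
	return index + nr_pages - 1;
}

/* True if [first, last] overlaps any in-flight range (linear scan). */
bool model_range_is_writeback(const struct model_range *wb, size_t n,
			      unsigned long first, unsigned long last)
{
	for (size_t i = 0; i < n; i++)
		if (wb[i].first <= last && first <= wb[i].last)
			return true;
	return false;
}
```

With writeback in flight on pages 10 and 11, checking only index 8 of a four-page folio starting at index 8 reports nothing, while checking the folio's full range 8..11 catches the overlap.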

From: Josef Bacik <josef@toxicpanda.com>

Now that the buffered write path is using folios, convert
fuse_do_readpage() to take a folio instead of a page, update it to use
the appropriate folio helpers, and update the callers to pass in the
folio directly instead of a page.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 028b885e3f13..e615e82f2a18 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -844,12 +844,13 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
 	}
 }
 
-static int fuse_do_readpage(struct file *file, struct page *page)
+static int fuse_do_readfolio(struct file *file, struct folio *folio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = folio->mapping->host;
 	struct fuse_mount *fm = get_fuse_mount(inode);
-	loff_t pos = page_offset(page);
+	loff_t pos = folio_pos(folio);
 	struct fuse_page_desc desc = { .length = PAGE_SIZE };
+	struct page *page = &folio->page;
 	struct fuse_io_args ia = {
 		.ap.args.page_zeroing = true,
 		.ap.args.out_pages = true,
@@ -861,11 +862,11 @@ static int fuse_do_readpage(struct file *file, struct page *page)
 	u64 attr_ver;
 
 	/*
-	 * Page writeback can extend beyond the lifetime of the
-	 * page-cache page, so make sure we read a properly synced
-	 * page.
+	 * With the temporary pages that are used to complete writeback, we can
+	 * have writeback that extends beyond the lifetime of the folio. So
+	 * make sure we read a properly synced folio.
 	 */
-	fuse_wait_on_page_writeback(inode, page->index);
+	fuse_wait_on_folio_writeback(inode, folio);
 
 	attr_ver = fuse_get_attr_version(fm->fc);
 
@@ -883,25 +884,24 @@ static int fuse_do_readpage(struct file *file, struct page *page)
 	if (res < desc.length)
 		fuse_short_read(inode, attr_ver, res, &ia.ap);
 
-	SetPageUptodate(page);
+	folio_mark_uptodate(folio);
 
 	return 0;
 }
 
 static int fuse_read_folio(struct file *file, struct folio *folio)
 {
-	struct page *page = &folio->page;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = folio->mapping->host;
 	int err;
 
 	err = -EIO;
 	if (fuse_is_bad(inode))
 		goto out;
 
-	err = fuse_do_readpage(file, page);
+	err = fuse_do_readfolio(file, folio);
 	fuse_invalidate_atime(inode);
  out:
-	unlock_page(page);
+	folio_unlock(folio);
 	return err;
 }
 
@@ -2371,7 +2371,7 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 			folio_zero_segment(folio, 0, off);
 		goto success;
 	}
-	err = fuse_do_readpage(file, &folio->page);
+	err = fuse_do_readfolio(file, folio);
 	if (err)
 		goto cleanup;
 success:
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

fuse_writepage_need_send() is called by fuse_writepages_fill(), which
already has a folio. Change fuse_writepage_need_send() to take a folio
instead, add a helper to check if the folio range is under writeback and
use this, as well as the appropriate folio helpers in the rest of the
function. Update fuse_writepages_fill() to pass in the folio directly.

Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e615e82f2a18..f2cb3b68d45c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -463,14 +463,19 @@ static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
 	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
 }
 
+static inline bool fuse_folio_is_writeback(struct inode *inode,
+					   struct folio *folio)
+{
+	pgoff_t last = folio_next_index(folio) - 1;
+	return fuse_range_is_writeback(inode, folio_index(folio), last);
+}
+
 static void fuse_wait_on_folio_writeback(struct inode *inode,
 					 struct folio *folio)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
-	pgoff_t last = folio_next_index(folio) - 1;
 
-	wait_event(fi->page_waitq,
-		   !fuse_range_is_writeback(inode, folio_index(folio), last));
+	wait_event(fi->page_waitq, !fuse_folio_is_writeback(inode, folio));
 }
 
 /*
@@ -2169,7 +2174,7 @@ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
 	return false;
 }
 
-static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
+static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 				     struct fuse_args_pages *ap,
 				     struct fuse_fill_wb_data *data)
 {
@@ -2181,7 +2186,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
 	 * the pages are faulted with get_user_pages(), and then after the read
 	 * completed.
 	 */
-	if (fuse_page_is_writeback(data->inode, page->index))
+	if (fuse_folio_is_writeback(data->inode, folio))
 		return true;
 
 	/* Reached max pages */
@@ -2193,7 +2198,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
 		return true;
 
 	/* Discontinuity */
-	if (data->orig_pages[ap->num_pages - 1]->index + 1 != page->index)
+	if (data->orig_pages[ap->num_pages - 1]->index + 1 != folio_index(folio))
 		return true;
 
 	/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2222,7 +2227,7 @@ static int fuse_writepages_fill(struct folio *folio,
 		goto out_unlock;
 	}
 
-	if (wpa && fuse_writepage_need_send(fc, &folio->page, ap, data)) {
+	if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
 		fuse_writepages_send(data);
 		data->wpa = NULL;
 	}
-- 
2.46.1
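
Reviewer note: the inclusive range that fuse_folio_is_writeback() hands to fuse_range_is_writeback() can be modelled in user space. The sketch below is not kernel code. `model_folio`, `wb_range` and every `model_*` function are hypothetical stand-ins, and the set of in-flight requests is assumed to be a plain array rather than the per-inode tree used in fs/fuse/file.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: first page index and page count. */
struct model_folio {
	unsigned long index;
	unsigned long nr_pages;
};

/* Hypothetical in-flight writeback request covering pages [first, last]. */
struct wb_range {
	unsigned long first;
	unsigned long last;
};

/* Index one past the folio, which is what folio_next_index() yields. */
static unsigned long model_folio_next_index(const struct model_folio *f)
{
	assert(f->nr_pages > 0);
	return f->index + f->nr_pages;
}

/* True if any in-flight request overlaps the inclusive range [from, to]. */
static bool model_range_is_writeback(const struct wb_range *wb, size_t n,
				     unsigned long from, unsigned long to)
{
	for (size_t i = 0; i < n; i++)
		if (wb[i].first <= to && from <= wb[i].last)
			return true;
	return false;
}

/* Same shape as the new helper: check [index, next_index - 1]. */
static bool model_folio_is_writeback(const struct wb_range *wb, size_t n,
				     const struct model_folio *f)
{
	unsigned long last = model_folio_next_index(f) - 1;

	return model_range_is_writeback(wb, n, f->index, last);
}
```

With single-page folios the range collapses to one index, so the check behaves like the old per-page test; the range form only starts to matter once a folio spans several pages.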

From: Joanne Koong <joannelkoong@gmail.com>

Drop the unused "struct fuse_mount *fm" arg in fuse_writepage_finish().

No functional changes added.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f2cb3b68d45c..a73d51abe844 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1722,8 +1722,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 	kfree(wpa);
 }
 
-static void fuse_writepage_finish(struct fuse_mount *fm,
-				  struct fuse_writepage_args *wpa)
+static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
 {
 	struct fuse_args_pages *ap = &wpa->ia.ap;
 	struct inode *inode = wpa->inode;
@@ -1782,7 +1781,7 @@ __acquires(fi->lock)
 out_free:
 	fi->writectr--;
 	rb_erase(&wpa->writepages_entry, &fi->writepages);
-	fuse_writepage_finish(fm, wpa);
+	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
 
 	/* After rb_erase() aux request list is private */
@@ -1918,7 +1917,7 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
 	}
 	fi->writectr--;
-	fuse_writepage_finish(fm, wpa);
+	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
 	fuse_writepage_free(wpa);
 }
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Move the logic for updating the bdi and page stats for a finished
writeback into a separate helper function, where it can be called from
both fuse_writepage_finish() and fuse_writepage_add() (in the case where
there is already an auxiliary write request for the page).

No functional changes added.

Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a73d51abe844..094528337dba 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1722,19 +1722,25 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 	kfree(wpa);
 }
 
+static void fuse_writepage_finish_stat(struct inode *inode, struct page *page)
+{
+	struct backing_dev_info *bdi = inode_to_bdi(inode);
+
+	dec_wb_stat(&bdi->wb, WB_WRITEBACK);
+	dec_node_page_state(page, NR_WRITEBACK_TEMP);
+	wb_writeout_inc(&bdi->wb);
+}
+
 static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
 {
 	struct fuse_args_pages *ap = &wpa->ia.ap;
 	struct inode *inode = wpa->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
 	int i;
 
-	for (i = 0; i < ap->num_pages; i++) {
-		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-		dec_node_page_state(ap->pages[i], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
-	}
+	for (i = 0; i < ap->num_pages; i++)
+		fuse_writepage_finish_stat(inode, ap->pages[i]);
+
 	wake_up(&fi->page_waitq);
 }
 
@@ -1786,14 +1792,9 @@ __acquires(fi->lock)
 
 	/* After rb_erase() aux request list is private */
 	for (aux = wpa->next; aux; aux = next) {
-		struct backing_dev_info *bdi = inode_to_bdi(aux->inode);
-
 		next = aux->next;
 		aux->next = NULL;
-
-		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-		dec_node_page_state(aux->ia.ap.pages[0], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		fuse_writepage_finish_stat(aux->inode, aux->ia.ap.pages[0]);
 		fuse_writepage_free(aux);
 	}
 
@@ -2162,11 +2163,7 @@ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
 	spin_unlock(&fi->lock);
 
 	if (tmp) {
-		struct backing_dev_info *bdi = inode_to_bdi(new_wpa->inode);
-
-		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-		dec_node_page_state(new_ap->pages[0], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		fuse_writepage_finish_stat(new_wpa->inode, new_ap->pages[0]);
 		fuse_writepage_free(new_wpa);
 	}
 
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

We're just looking for pages in a mapping, use a folio and the folio
lookup function directly instead of using the page helper.

Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 06f0912c836a..606ee9f1a640 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1706,15 +1706,15 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	index = outarg->offset >> PAGE_SHIFT;
 
 	while (num && ap->num_pages < num_pages) {
-		struct page *page;
+		struct folio *folio;
 		unsigned int this_num;
 
-		page = find_get_page(mapping, index);
-		if (!page)
+		folio = filemap_get_folio(mapping, index);
+		if (IS_ERR(folio))
 			break;
 
 		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
-		ap->pages[ap->num_pages] = page;
+		ap->pages[ap->num_pages] = &folio->page;
 		ap->descs[ap->num_pages].offset = offset;
 		ap->descs[ap->num_pages].length = this_num;
 		ap->num_pages++;
-- 
2.46.1

From: Josef Bacik <josef@toxicpanda.com>

This function creates pages in an inode and copies data into them,
update the function to use a folio instead of a page, and use the
appropriate folio helpers.

[SzM: use filemap_grab_folio()]
[Hau Tao: The third argument of folio_zero_range() should be the length
 to be zeroed, not the total length. Fix it by using
 folio_zero_segment() instead in fuse_notify_store()]

Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dev.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 606ee9f1a640..a693b7242383 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1604,24 +1604,25 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 
 	num = outarg.size;
 	while (num) {
+		struct folio *folio;
 		struct page *page;
 		unsigned int this_num;
 
-		err = -ENOMEM;
-		page = find_or_create_page(mapping, index,
-					   mapping_gfp_mask(mapping));
-		if (!page)
+		folio = filemap_grab_folio(mapping, index);
+		err = PTR_ERR(folio);
+		if (IS_ERR(folio))
 			goto out_iput;
 
-		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
+		page = &folio->page;
+		this_num = min_t(unsigned, num, folio_size(folio) - offset);
 		err = fuse_copy_page(cs, &page, offset, this_num, 0);
-		if (!PageUptodate(page) && !err && offset == 0 &&
-		    (this_num == PAGE_SIZE || file_size == end)) {
-			zero_user_segment(page, this_num, PAGE_SIZE);
-			SetPageUptodate(page);
+		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
+		    (this_num == folio_size(folio) || file_size == end)) {
+			folio_zero_segment(folio, this_num, folio_size(folio));
+			folio_mark_uptodate(folio);
 		}
-		unlock_page(page);
-		put_page(page);
+		folio_unlock(folio);
+		folio_put(folio);
 
 		if (err)
 			goto out_iput;
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

This adds support in struct fuse_args_pages and fuse_copy_pages() for
using folios instead of pages for transferring data. Both folios and
pages must be supported right now in struct fuse_args_pages and
fuse_copy_pages() until all request types have been converted to use
folios. Once all have been converted, then
struct fuse_args_pages and fuse_copy_pages() will only support folios.

Right now in fuse, all folios are one page (large folios are not yet
supported). As such, copying folio->page is sufficient for copying
the entire folio in fuse_copy_pages().

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dev.c    | 40 ++++++++++++++++++++++++++++++++--------
 fs/fuse/fuse_i.h | 22 +++++++++++++++++++---
 2 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a693b7242383..0345f187c493 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -977,17 +977,41 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
 	struct fuse_req *req = cs->req;
 	struct fuse_args_pages *ap = container_of(req->args, typeof(*ap), args);
 
+	if (ap->uses_folios) {
+		for (i = 0; i < ap->num_folios && (nbytes || zeroing); i++) {
+			int err;
+			unsigned int offset = ap->folio_descs[i].offset;
+			unsigned int count = min(nbytes, ap->folio_descs[i].length);
+			struct page *orig, *pagep;
 
-	for (i = 0; i < ap->num_pages && (nbytes || zeroing); i++) {
-		int err;
-		unsigned int offset = ap->descs[i].offset;
-		unsigned int count = min(nbytes, ap->descs[i].length);
+			orig = pagep = &ap->folios[i]->page;
 
-		err = fuse_copy_page(cs, &ap->pages[i], offset, count, zeroing);
-		if (err)
-			return err;
+			err = fuse_copy_page(cs, &pagep, offset, count, zeroing);
+			if (err)
+				return err;
+
+			nbytes -= count;
+
+			/*
+			 * fuse_copy_page may have moved a page from a pipe
+			 * instead of copying into our given page, so update
+			 * the folios if it was replaced.
+			 */
+			if (pagep != orig)
+				ap->folios[i] = page_folio(pagep);
+		}
+	} else {
+		for (i = 0; i < ap->num_pages && (nbytes || zeroing); i++) {
+			int err;
+			unsigned int offset = ap->descs[i].offset;
+			unsigned int count = min(nbytes, ap->descs[i].length);
 
-		nbytes -= count;
+			err = fuse_copy_page(cs, &ap->pages[i], offset, count, zeroing);
+			if (err)
+				return err;
+
+			nbytes -= count;
+		}
 	}
 	return 0;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1bb136bcbe9e..eb9e50e96866 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -269,6 +269,12 @@ struct fuse_page_desc {
 	unsigned int offset;
 };
 
+/** FUSE folio descriptor */
+struct fuse_folio_desc {
+	unsigned int length;
+	unsigned int offset;
+};
+
 struct fuse_args {
 	uint64_t nodeid;
 	uint32_t opcode;
@@ -296,9 +302,19 @@ struct fuse_args {
 
 struct fuse_args_pages {
 	struct fuse_args args;
-	struct page **pages;
-	struct fuse_page_desc *descs;
-	unsigned int num_pages;
+	union {
+		struct {
+			struct page **pages;
+			struct fuse_page_desc *descs;
+			unsigned int num_pages;
+		};
+		struct {
+			struct folio **folios;
+			struct fuse_folio_desc *folio_descs;
+			unsigned int num_folios;
+		};
+	};
+	bool uses_folios;
 };
 
 #define FUSE_ARGS(args) struct fuse_args args = {}
-- 
2.46.1
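
Reviewer note: the transitional layout in this patch is a tagged union, where `uses_folios` tells every consumer which member is valid. A minimal user-space sketch of that pattern follows. All `model_*` names are hypothetical, a single descriptor type is assumed for both variants for brevity (the patch keeps two), and `model_total_length()` is an illustrative consumer, not a function from fs/fuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page;	/* opaque, hypothetical */
struct model_folio;	/* opaque, hypothetical */

struct model_desc {
	unsigned int length;
	unsigned int offset;
};

/* Both variants share storage; uses_folios selects the valid one. */
struct model_args_pages {
	union {
		struct {
			struct model_page **pages;
			struct model_desc *descs;
			unsigned int num_pages;
		};
		struct {
			struct model_folio **folios;
			struct model_desc *folio_descs;
			unsigned int num_folios;
		};
	};
	bool uses_folios;
};

/* Sum the descriptor lengths of whichever representation is active. */
static size_t model_total_length(const struct model_args_pages *ap)
{
	size_t total = 0;
	unsigned int i;

	if (ap->uses_folios) {
		assert(ap->num_folios == 0 || ap->folio_descs);
		for (i = 0; i < ap->num_folios; i++)
			total += ap->folio_descs[i].length;
	} else {
		assert(ap->num_pages == 0 || ap->descs);
		for (i = 0; i < ap->num_pages; i++)
			total += ap->descs[i].length;
	}
	return total;
}
```

Every consumer has to branch on the flag, which is the cost the commit message accepts until all request types have been converted.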

From: Joanne Koong <joannelkoong@gmail.com>

Until all requests have been converted to use folios instead of pages,
virtio will need to support both types. Once all requests have been
converted, then virtio will support just folios.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/virtio_fs.c | 87 +++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 31 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index ba816b50c1d4..f974c7896783 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -624,6 +624,7 @@ static void virtio_fs_request_complete(struct fuse_req *req,
 	struct fuse_args_pages *ap;
 	unsigned int len, i, thislen;
 	struct page *page;
+	struct folio *folio;
 
 	/*
 	 * TODO verify that server properly follows FUSE protocol
@@ -635,15 +636,29 @@ static void virtio_fs_request_complete(struct fuse_req *req,
 	if (args->out_pages && args->page_zeroing) {
 		len = args->out_args[args->out_numargs - 1].size;
 		ap = container_of(args, typeof(*ap), args);
-		for (i = 0; i < ap->num_pages; i++) {
-			thislen = ap->descs[i].length;
-			if (len < thislen) {
-				WARN_ON(ap->descs[i].offset);
-				page = ap->pages[i];
-				zero_user_segment(page, len, thislen);
-				len = 0;
-			} else {
-				len -= thislen;
+		if (ap->uses_folios) {
+			for (i = 0; i < ap->num_folios; i++) {
+				thislen = ap->folio_descs[i].length;
+				if (len < thislen) {
+					WARN_ON(ap->folio_descs[i].offset);
+					folio = ap->folios[i];
+					folio_zero_segment(folio, len, thislen);
+					len = 0;
+				} else {
+					len -= thislen;
+				}
+			}
+		} else {
+			for (i = 0; i < ap->num_pages; i++) {
+				thislen = ap->descs[i].length;
+				if (len < thislen) {
+					WARN_ON(ap->descs[i].offset);
+					page = ap->pages[i];
+					zero_user_segment(page, len, thislen);
+					len = 0;
+				} else {
+					len -= thislen;
+				}
 			}
 		}
 	}
@@ -1132,16 +1147,22 @@ __releases(fiq->lock)
 }
 
 /* Count number of scatter-gather elements required */
-static unsigned int sg_count_fuse_pages(struct fuse_page_desc *page_descs,
-				       unsigned int num_pages,
-				       unsigned int total_len)
+static unsigned int sg_count_fuse_pages(struct fuse_args_pages *ap,
+					unsigned int total_len)
 {
 	unsigned int i;
 	unsigned int this_len;
 
-	for (i = 0; i < num_pages && total_len; i++) {
-		this_len = min(page_descs[i].length, total_len);
-		total_len -= this_len;
+	if (ap->uses_folios) {
+		for (i = 0; i < ap->num_folios && total_len; i++) {
+			this_len = min(ap->folio_descs[i].length, total_len);
+			total_len -= this_len;
+		}
+	} else {
+		for (i = 0; i < ap->num_pages && total_len; i++) {
+			this_len = min(ap->descs[i].length, total_len);
+			total_len -= this_len;
+		}
 	}
 
 	return i;
@@ -1159,8 +1180,7 @@ static unsigned int sg_count_fuse_req(struct fuse_req *req)
 
 	if (args->in_pages) {
 		size = args->in_args[args->in_numargs - 1].size;
-		total_sgs += sg_count_fuse_pages(ap->descs, ap->num_pages,
-						 size);
+		total_sgs += sg_count_fuse_pages(ap, size);
 	}
 
 	if (!test_bit(FR_ISREPLY, &req->flags))
@@ -1173,28 +1193,35 @@ static unsigned int sg_count_fuse_req(struct fuse_req *req)
 
 	if (args->out_pages) {
 		size = args->out_args[args->out_numargs - 1].size;
-		total_sgs += sg_count_fuse_pages(ap->descs, ap->num_pages,
-						 size);
+		total_sgs += sg_count_fuse_pages(ap, size);
 	}
 
 	return total_sgs;
 }
 
-/* Add pages to scatter-gather list and return number of elements used */
+/* Add pages/folios to scatter-gather list and return number of elements used */
 static unsigned int sg_init_fuse_pages(struct scatterlist *sg,
-				       struct page **pages,
-				       struct fuse_page_desc *page_descs,
-				       unsigned int num_pages,
+				       struct fuse_args_pages *ap,
 				       unsigned int total_len)
 {
 	unsigned int i;
 	unsigned int this_len;
 
-	for (i = 0; i < num_pages && total_len; i++) {
-		sg_init_table(&sg[i], 1);
-		this_len = min(page_descs[i].length, total_len);
-		sg_set_page(&sg[i], pages[i], this_len, page_descs[i].offset);
-		total_len -= this_len;
+	if (ap->uses_folios) {
+		for (i = 0; i < ap->num_folios && total_len; i++) {
+			sg_init_table(&sg[i], 1);
+			this_len = min(ap->folio_descs[i].length, total_len);
+			sg_set_folio(&sg[i], ap->folios[i], this_len,
+				     ap->folio_descs[i].offset);
+			total_len -= this_len;
+		}
+	} else {
+		for (i = 0; i < ap->num_pages && total_len; i++) {
+			sg_init_table(&sg[i], 1);
+			this_len = min(ap->descs[i].length, total_len);
+			sg_set_page(&sg[i], ap->pages[i], this_len, ap->descs[i].offset);
+			total_len -= this_len;
+		}
 	}
 
 	return i;
@@ -1218,9 +1245,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 		sg_init_one(&sg[total_sgs++], argbuf, len);
 
 	if (argpages)
-		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
-						ap->pages, ap->descs,
-						ap->num_pages,
+		total_sgs += sg_init_fuse_pages(&sg[total_sgs], ap,
 						args[numargs - 1].size);
 
 	if (len_used)
-- 
2.46.1
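
Reviewer note: the tail-zeroing loop in virtio_fs_request_complete() keeps the same arithmetic in both branches. The user-space sketch below reproduces only that arithmetic, assuming each folio is a plain byte buffer. `model_desc` and `model_zero_tail()` are hypothetical names, and the WARN_ON on a non-zero descriptor offset is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: a byte buffer standing in for one folio. */
struct model_desc {
	unsigned char *buf;
	unsigned int length;
};

/*
 * 'len' is the number of bytes the server actually returned. Everything
 * past that point in the described buffers is zeroed, following the loop
 * structure of virtio_fs_request_complete().
 */
static void model_zero_tail(struct model_desc *descs, size_t n,
			    unsigned int len)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int thislen = descs[i].length;

		assert(descs[i].buf != NULL);
		if (len < thislen) {
			/* Short buffer: zero [len, thislen), then len stays 0. */
			memset(descs[i].buf + len, 0, thislen - len);
			len = 0;
		} else {
			/* Fully covered by returned data: consume it. */
			len -= thislen;
		}
	}
}
```

Once `len` reaches zero, every later buffer is zeroed in full, which is why the loop does not stop at the first short buffer.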

From: Joanne Koong <joannelkoong@gmail.com>

Convert cuse requests to use a folio instead of a page.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/cuse.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index b6cad106c37e..6018af98dd08 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -303,8 +303,8 @@ struct cuse_init_args {
 	struct fuse_args_pages ap;
 	struct cuse_init_in in;
 	struct cuse_init_out out;
-	struct page *page;
-	struct fuse_page_desc desc;
+	struct folio *folio;
+	struct fuse_folio_desc desc;
 };
 
 /**
@@ -322,7 +322,7 @@ static void cuse_process_init_reply(struct fuse_mount *fm,
 	struct fuse_args_pages *ap = &ia->ap;
 	struct cuse_conn *cc = fc_to_cc(fc), *pos;
 	struct cuse_init_out *arg = &ia->out;
-	struct page *page = ap->pages[0];
+	struct folio *folio = ap->folios[0];
 	struct cuse_devinfo devinfo = { };
 	struct device *dev;
 	struct cdev *cdev;
@@ -339,7 +339,7 @@ static void cuse_process_init_reply(struct fuse_mount *fm,
 	/* parse init reply */
 	cc->unrestricted_ioctl = arg->flags & CUSE_UNRESTRICTED_IOCTL;
 
-	rc = cuse_parse_devinfo(page_address(page), ap->args.out_args[1].size,
+	rc = cuse_parse_devinfo(folio_address(folio), ap->args.out_args[1].size,
 				&devinfo);
 	if (rc)
 		goto err;
@@ -407,7 +407,7 @@ static void cuse_process_init_reply(struct fuse_mount *fm,
 	kobject_uevent(&dev->kobj, KOBJ_ADD);
 out:
 	kfree(ia);
-	__free_page(page);
+	folio_put(folio);
 	return;
 
 err_cdev:
@@ -425,7 +425,7 @@ static void cuse_process_init_reply(struct fuse_mount *fm,
 static int cuse_send_init(struct cuse_conn *cc)
 {
 	int rc;
-	struct page *page;
+	struct folio *folio;
 	struct fuse_mount *fm = &cc->fm;
 	struct cuse_init_args *ia;
 	struct fuse_args_pages *ap;
@@ -433,13 +433,14 @@ static int cuse_send_init(struct cuse_conn *cc)
 	BUILD_BUG_ON(CUSE_INIT_INFO_MAX > PAGE_SIZE);
 
 	rc = -ENOMEM;
-	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!page)
+
+	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+	if (!folio)
 		goto err;
 
 	ia = kzalloc(sizeof(*ia), GFP_KERNEL);
 	if (!ia)
-		goto err_free_page;
+		goto err_free_folio;
 
 	ap = &ia->ap;
 	ia->in.major = FUSE_KERNEL_VERSION;
@@ -455,18 +456,19 @@ static int cuse_send_init(struct cuse_conn *cc)
 	ap->args.out_args[1].size = CUSE_INIT_INFO_MAX;
 	ap->args.out_argvar = true;
 	ap->args.out_pages = true;
-	ap->num_pages = 1;
-	ap->pages = &ia->page;
-	ap->descs = &ia->desc;
-	ia->page = page;
+	ap->uses_folios = true;
+	ap->num_folios = 1;
+	ap->folios = &ia->folio;
+	ap->folio_descs = &ia->desc;
+	ia->folio = folio;
 	ia->desc.length = ap->args.out_args[1].size;
 	ap->args.end = cuse_process_init_reply;
 
 	rc = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
 	if (rc) {
 		kfree(ia);
-err_free_page:
-		__free_page(page);
+err_free_folio:
+		folio_put(folio);
 	}
 err:
 	return rc;
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Convert readlink requests to use a folio instead of a page.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dir.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 95f9913a3537..bebe32d2421c 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1561,14 +1561,15 @@ static int fuse_permission(struct mnt_idmap *idmap,
 	return err;
 }
 
-static int fuse_readlink_page(struct inode *inode, struct page *page)
+static int fuse_readlink_page(struct inode *inode, struct folio *folio)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
-	struct fuse_page_desc desc = { .length = PAGE_SIZE - 1 };
+	struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
 	struct fuse_args_pages ap = {
-		.num_pages = 1,
-		.pages = &page,
-		.descs = &desc,
+		.uses_folios = true,
+		.num_folios = 1,
+		.folios = &folio,
+		.folio_descs = &desc,
 	};
 	char *link;
 	ssize_t res;
@@ -1590,7 +1591,7 @@ static int fuse_readlink_page(struct inode *inode, struct page *page)
 	if (WARN_ON(res >= PAGE_SIZE))
 		return -EIO;
 
-	link = page_address(page);
+	link = folio_address(folio);
 	link[res] = '\0';
 
 	return 0;
@@ -1600,7 +1601,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 				 struct delayed_call *callback)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct page *page;
+	struct folio *folio;
 	int err;
 
 	err = -EIO;
@@ -1614,20 +1615,20 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 	if (!dentry)
 		goto out_err;
 
-	page = alloc_page(GFP_KERNEL);
+	folio = folio_alloc(GFP_KERNEL, 0);
 	err = -ENOMEM;
-	if (!page)
+	if (!folio)
 		goto out_err;
 
-	err = fuse_readlink_page(inode, page);
+	err = fuse_readlink_page(inode, folio);
 	if (err) {
-		__free_page(page);
+		folio_put(folio);
 		goto out_err;
 	}
 
-	set_delayed_call(callback, page_put_link, page);
+	set_delayed_call(callback, page_put_link, &folio->page);
 
-	return page_address(page);
+	return folio_address(folio);
 
 out_err:
 	return ERR_PTR(err);
@@ -2172,7 +2173,7 @@ void fuse_init_dir(struct inode *inode)
 
 static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
 {
-	int err = fuse_readlink_page(folio->mapping->host, &folio->page);
+	int err = fuse_readlink_page(folio->mapping->host, folio);
 
 	if (!err)
 		folio_mark_uptodate(folio);
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Convert readdir requests to use a folio instead of a page.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/readdir.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 9e6d587b3e67..32604211403c 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -331,24 +331,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 {
 	int plus;
 	ssize_t res;
-	struct page *page;
+	struct folio *folio;
 	struct inode *inode = file_inode(file);
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	struct fuse_io_args ia = {};
 	struct fuse_args_pages *ap = &ia.ap;
-	struct fuse_page_desc desc = { .length = PAGE_SIZE };
+	struct fuse_folio_desc desc = { .length = PAGE_SIZE };
 	u64 attr_version = 0;
 	bool locked;
 
-	page = alloc_page(GFP_KERNEL);
-	if (!page)
+	folio = folio_alloc(GFP_KERNEL, 0);
+	if (!folio)
 		return -ENOMEM;
 
 	plus = fuse_use_readdirplus(inode, ctx);
 	ap->args.out_pages = true;
-	ap->num_pages = 1;
-	ap->pages = &page;
-	ap->descs = &desc;
+	ap->uses_folios = true;
+	ap->num_folios = 1;
+	ap->folios = &folio;
+	ap->folio_descs = &desc;
 	if (plus) {
 		attr_version = fuse_get_attr_version(fm->fc);
 		fuse_read_args_fill(&ia, file, ctx->pos, PAGE_SIZE,
@@ -367,15 +368,15 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 			if (ff->open_flags & FOPEN_CACHE_DIR)
 				fuse_readdir_cache_end(file, ctx->pos);
 		} else if (plus) {
-			res = parse_dirplusfile(page_address(page), res,
+			res = parse_dirplusfile(folio_address(folio), res,
 						file, ctx, attr_version);
 		} else {
-			res = parse_dirfile(page_address(page), res, file,
+			res = parse_dirfile(folio_address(folio), res, file,
 					    ctx);
 		}
 	}
 
-	__free_page(page);
+	folio_put(folio);
 	fuse_invalidate_atime(inode);
 	return res;
 }
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Convert read requests to use folios instead of pages.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c   | 67 ++++++++++++++++++++++++++++++++----------------
 fs/fuse/fuse_i.h | 12 +++++++++
 2 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 094528337dba..fd617647e113 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -739,12 +739,37 @@ static struct fuse_io_args *fuse_io_alloc(struct fuse_io_priv *io,
 	return ia;
 }
 
+static struct fuse_io_args *fuse_io_folios_alloc(struct fuse_io_priv *io,
+						 unsigned int nfolios)
+{
+	struct fuse_io_args *ia;
+
+	ia = kzalloc(sizeof(*ia), GFP_KERNEL);
+	if (ia) {
+		ia->io = io;
+		ia->ap.uses_folios = true;
+		ia->ap.folios = fuse_folios_alloc(nfolios, GFP_KERNEL,
+						  &ia->ap.folio_descs);
+		if (!ia->ap.folios) {
+			kfree(ia);
+			ia = NULL;
+		}
+	}
+	return ia;
+}
+
 static void fuse_io_free(struct fuse_io_args *ia)
 {
 	kfree(ia->ap.pages);
 	kfree(ia);
 }
 
+static void fuse_io_folios_free(struct fuse_io_args *ia)
+{
+	kfree(ia->ap.folios);
+	kfree(ia);
+}
+
 static void fuse_aio_complete_req(struct fuse_mount *fm, struct fuse_args *args,
 				  int err)
 {
@@ -844,7 +869,7 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
 	 * reached the client fs yet. So the hole is not present there.
 	 */
 	if (!fc->writeback_cache) {
-		loff_t pos = page_offset(ap->pages[0]) + num_read;
+		loff_t pos = folio_pos(ap->folios[0]) + num_read;
 		fuse_read_update_size(inode, pos, attr_ver);
 	}
 }
@@ -854,14 +879,14 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	struct inode *inode = folio->mapping->host;
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	loff_t pos = folio_pos(folio);
-	struct fuse_page_desc desc = { .length = PAGE_SIZE };
-	struct page *page = &folio->page;
+	struct fuse_folio_desc desc = { .length = PAGE_SIZE };
 	struct fuse_io_args ia = {
 		.ap.args.page_zeroing = true,
 		.ap.args.out_pages = true,
-		.ap.num_pages = 1,
-		.ap.pages = &page,
-		.ap.descs = &desc,
+		.ap.uses_folios = true,
+		.ap.num_folios = 1,
+		.ap.folios = &folio,
+		.ap.folio_descs = &desc,
 	};
 	ssize_t res;
 	u64 attr_ver;
@@ -920,8 +945,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	size_t num_read = args->out_args[0].size;
 	struct address_space *mapping = NULL;
 
-	for (i = 0; mapping == NULL && i < ap->num_pages; i++)
-		mapping = ap->pages[i]->mapping;
+	for (i = 0; mapping == NULL && i < ap->num_folios; i++)
+		mapping = ap->folios[i]->mapping;
 
 	if (mapping) {
 		struct inode *inode = mapping->host;
@@ -935,15 +960,12 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 		fuse_invalidate_atime(inode);
 	}
 
-	for (i = 0; i < ap->num_pages; i++) {
-		struct folio *folio = page_folio(ap->pages[i]);
-
-		folio_end_read(folio, !err);
-	}
+	for (i = 0; i < ap->num_folios; i++)
+		folio_end_read(ap->folios[i], !err);
 	if (ia->ff)
 		fuse_file_put(ia->ff, false, false);
 
-	fuse_io_free(ia);
+	fuse_io_folios_free(ia);
 }
 
 static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
@@ -951,8 +973,9 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 	struct fuse_file *ff = file->private_data;
 	struct fuse_mount *fm = ff->fm;
 	struct fuse_args_pages *ap = &ia->ap;
-	loff_t pos = page_offset(ap->pages[0]);
-	size_t count = ap->num_pages << PAGE_SHIFT;
+	loff_t pos = folio_pos(ap->folios[0]);
+	/* Currently, all folios in FUSE are one page */
+	size_t count = ap->num_folios << PAGE_SHIFT;
 	ssize_t res;
 	int err;
 
@@ -963,7 +986,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 	/* Don't overflow end offset */
 	if (pos + (count - 1) == LLONG_MAX) {
 		count--;
-		ap->descs[ap->num_pages - 1].length--;
+		ap->folio_descs[ap->num_folios - 1].length--;
 	}
 	WARN_ON((loff_t) (pos + count) < 0);
 
@@ -1024,16 +1047,16 @@ static void fuse_readahead(struct readahead_control *rac)
 			 */
 			break;
 
-		ia = fuse_io_alloc(NULL, cur_pages);
+		ia = fuse_io_folios_alloc(NULL, cur_pages);
 		if (!ia)
 			return;
 		ap = &ia->ap;
 
-		while (ap->num_pages < cur_pages) {
+		while (ap->num_folios < cur_pages) {
 			folio = readahead_folio(rac);
-			ap->pages[ap->num_pages] = &folio->page;
-			ap->descs[ap->num_pages].length = folio_size(folio);
-			ap->num_pages++;
+			ap->folios[ap->num_folios] = folio;
+			ap->folio_descs[ap->num_folios].length = folio_size(folio);
+			ap->num_folios++;
 		}
 		fuse_send_readpages(ia, rac->file);
 		nr_pages -= cur_pages;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index eb9e50e96866..2b05359ad736 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -992,6 +992,18 @@ static inline struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags,
 	return pages;
 }
 
+static inline struct folio **fuse_folios_alloc(unsigned int nfolios, gfp_t flags,
+					       struct fuse_folio_desc **desc)
+{
+	struct folio **folios;
+
+	folios = kzalloc(nfolios * (sizeof(struct folio *) +
+				    sizeof(struct fuse_folio_desc)), flags);
+	*desc = (void *) (folios + nfolios);
+
+	return folios;
+}
+
 static inline void fuse_page_descs_length_init(struct fuse_page_desc *descs,
 					       unsigned int index,
 					       unsigned int nr_pages)
-- 
2.46.1
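
Reviewer note: fuse_folios_alloc() places the folio pointer array and the descriptor array in one allocation, which is why fuse_io_folios_free() only needs a single kfree() of `ap.folios`. The user-space sketch below models that layout, assuming calloc() in place of kzalloc(). `model_folio`, `model_folio_desc` and `model_folios_alloc()` are hypothetical names. Unlike the kernel helper, the sketch leaves `*desc` untouched when the allocation fails.

```c
#include <assert.h>
#include <stdlib.h>

struct model_folio;	/* opaque, hypothetical */

struct model_folio_desc {
	unsigned int length;
	unsigned int offset;
};

/*
 * One zeroed allocation holding 'n' folio pointers followed by 'n'
 * descriptors. Returns NULL on failure. A single free() of the returned
 * pointer releases both arrays.
 */
static struct model_folio **model_folios_alloc(unsigned int n,
					       struct model_folio_desc **desc)
{
	struct model_folio **folios;

	assert(n > 0);
	folios = calloc(n, sizeof(*folios) + sizeof(**desc));
	if (!folios)
		return NULL;

	/* The descriptors start right after the last pointer slot. */
	*desc = (struct model_folio_desc *)(folios + n);
	return folios;
}
```

The descriptor pointer is derived from the folio array, so callers must never free it separately.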

From: Joanne Koong <joannelkoong@gmail.com>

Convert non-writeback write requests to use folios instead of pages.

No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c   | 33 ++++++++++++++++++---------------
 fs/fuse/fuse_i.h |  2 +-
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fd617647e113..8cd4c847c273 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1176,8 +1176,8 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	bool short_write;
 	int err;
 
-	for (i = 0; i < ap->num_pages; i++)
-		fuse_wait_on_page_writeback(inode, ap->pages[i]->index);
+	for (i = 0; i < ap->num_folios; i++)
+		fuse_wait_on_folio_writeback(inode, ap->folios[i]);
 
 	fuse_write_args_fill(ia, ff, pos, count);
 	ia->write.in.flags = fuse_write_flags(iocb);
@@ -1189,10 +1189,10 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 		err = -EIO;
 
 	short_write = ia->write.out.size < count;
-	offset = ap->descs[0].offset;
+	offset = ap->folio_descs[0].offset;
 	count = ia->write.out.size;
-	for (i = 0; i < ap->num_pages; i++) {
-		struct folio *folio = page_folio(ap->pages[i]);
+	for (i = 0; i < ap->num_folios; i++) {
+		struct folio *folio = ap->folios[i];
 
 		if (err) {
 			folio_clear_uptodate(folio);
@@ -1206,7 +1206,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 			}
 			offset = 0;
 		}
-		if (ia->write.page_locked && (i == ap->num_pages - 1))
+		if (ia->write.folio_locked && (i == ap->num_folios - 1))
 			folio_unlock(folio);
 		folio_put(folio);
 	}
@@ -1222,11 +1222,12 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 	struct fuse_args_pages *ap = &ia->ap;
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
+	unsigned int nr_pages = 0;
 	size_t count = 0;
 	int err;
 
 	ap->args.in_pages = true;
-	ap->descs[0].offset = offset;
+	ap->folio_descs[0].offset = offset;
 
 	do {
 		size_t tmp;
@@ -1262,9 +1263,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		}
 
 		err = 0;
-		ap->pages[ap->num_pages] = &folio->page;
-		ap->descs[ap->num_pages].length = tmp;
-		ap->num_pages++;
+		ap->folios[ap->num_folios] = folio;
+		ap->folio_descs[ap->num_folios].length = tmp;
+		ap->num_folios++;
+		nr_pages++;
 
 		count += tmp;
 		pos += tmp;
@@ -1279,13 +1281,13 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		if (folio_test_uptodate(folio)) {
 			folio_unlock(folio);
 		} else {
-			ia->write.page_locked = true;
+			ia->write.folio_locked = true;
 			break;
 		}
 		if (!fc->big_writes)
 			break;
 	} while (iov_iter_count(ii) && count < fc->max_write &&
-		 ap->num_pages < max_pages && offset == 0);
+		 nr_pages < max_pages && offset == 0);
 
 	return count > 0 ? count : err;
 }
@@ -1319,8 +1321,9 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 		unsigned int nr_pages = fuse_wr_pages(pos, iov_iter_count(ii),
 						      fc->max_pages);
 
-		ap->pages = fuse_pages_alloc(nr_pages, GFP_KERNEL, &ap->descs);
-		if (!ap->pages) {
+		ap->uses_folios = true;
+		ap->folios = fuse_folios_alloc(nr_pages, GFP_KERNEL, &ap->folio_descs);
+		if (!ap->folios) {
 			err = -ENOMEM;
 			break;
 		}
@@ -1342,7 +1345,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 				err = -EIO;
 			}
 		}
-		kfree(ap->pages);
+		kfree(ap->folios);
 	} while (!err && iov_iter_count(ii));
 
 	fuse_write_update_attr(inode, pos, res);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 2b05359ad736..c8043bf5724b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1063,7 +1063,7 @@ struct fuse_io_args {
 		struct {
 			struct fuse_write_in in;
 			struct fuse_write_out out;
-			bool page_locked;
+			bool folio_locked;
 		} write;
 	};
 	struct fuse_args_pages ap;
-- 
2.46.1

From: David Howells <dhowells@redhat.com>

Provide a copy_folio_from_iter() wrapper.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20240814203850.2240469-14-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 include/linux/uio.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 56249514c82d..1b75ef411ad2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -189,6 +189,12 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 	return copy_page_to_iter(&folio->page, offset, bytes, i);
 }
 
+static inline size_t copy_folio_from_iter(struct folio *folio, size_t offset,
+					   size_t bytes, struct iov_iter *i)
+{
+	return copy_page_from_iter(&folio->page, offset, bytes, i);
+}
+
 static inline size_t copy_folio_from_iter_atomic(struct folio *folio, size_t offset,
 		size_t bytes, struct iov_iter *i)
 {
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com> Convert ioctl requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/fuse_i.h | 10 ++++++++++ fs/fuse/ioctl.c | 32 ++++++++++++++++---------------- 2 files changed, 26 insertions(+), 16 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index c8043bf5724b..7f573394f2ba 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1014,6 +1014,16 @@ static inline void fuse_page_descs_length_init(struct fuse_page_desc *descs, descs[i].length = PAGE_SIZE - descs[i].offset; } +static inline void fuse_folio_descs_length_init(struct fuse_folio_desc *descs, + unsigned int index, + unsigned int nr_folios) +{ + int i; + + for (i = index; i < index + nr_folios; i++) + descs[i].length = PAGE_SIZE - descs[i].offset; +} + static inline void fuse_sync_bucket_dec(struct fuse_sync_bucket *bucket) { /* Need RCU protection to prevent use after free after the decrement */ diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c index 726640fa439e..dc3e7c8ff97b 100644 --- a/fs/fuse/ioctl.c +++ b/fs/fuse/ioctl.c @@ -201,12 +201,12 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, BUILD_BUG_ON(sizeof(struct fuse_ioctl_iovec) * FUSE_IOCTL_MAX_IOV > PAGE_SIZE); err = -ENOMEM; - ap.pages = fuse_pages_alloc(fm->fc->max_pages, GFP_KERNEL, &ap.descs); + ap.folios = fuse_folios_alloc(fm->fc->max_pages, GFP_KERNEL, &ap.folio_descs); iov_page = (struct iovec *) __get_free_page(GFP_KERNEL); - if (!ap.pages || !iov_page) + if (!ap.folios || !iov_page) goto out; - fuse_page_descs_length_init(ap.descs, 0, fm->fc->max_pages); + fuse_folio_descs_length_init(ap.folio_descs, 0, fm->fc->max_pages); /* * If restricted, initialize IO parameters as encoded in @cmd. 
@@ -244,14 +244,14 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, err = -ENOMEM; if (max_pages > fm->fc->max_pages) goto out; - while (ap.num_pages < max_pages) { - ap.pages[ap.num_pages] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM); - if (!ap.pages[ap.num_pages]) + ap.uses_folios = true; + while (ap.num_folios < max_pages) { + ap.folios[ap.num_folios] = folio_alloc(GFP_KERNEL | __GFP_HIGHMEM, 0); + if (!ap.folios[ap.num_folios]) goto out; - ap.num_pages++; + ap.num_folios++; } - /* okay, let's send it to the client */ ap.args.opcode = FUSE_IOCTL; ap.args.nodeid = ff->nodeid; @@ -265,8 +265,8 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, err = -EFAULT; iov_iter_init(&ii, ITER_SOURCE, in_iov, in_iovs, in_size); - for (i = 0; iov_iter_count(&ii) && !WARN_ON(i >= ap.num_pages); i++) { - c = copy_page_from_iter(ap.pages[i], 0, PAGE_SIZE, &ii); + for (i = 0; iov_iter_count(&ii) && !WARN_ON(i >= ap.num_folios); i++) { + c = copy_folio_from_iter(ap.folios[i], 0, PAGE_SIZE, &ii); if (c != PAGE_SIZE && iov_iter_count(&ii)) goto out; } @@ -304,7 +304,7 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, in_iovs + out_iovs > FUSE_IOCTL_MAX_IOV) goto out; - vaddr = kmap_local_page(ap.pages[0]); + vaddr = kmap_local_folio(ap.folios[0], 0); err = fuse_copy_ioctl_iovec(fm->fc, iov_page, vaddr, transferred, in_iovs + out_iovs, (flags & FUSE_IOCTL_COMPAT) != 0); @@ -332,17 +332,17 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, err = -EFAULT; iov_iter_init(&ii, ITER_DEST, out_iov, out_iovs, transferred); - for (i = 0; iov_iter_count(&ii) && !WARN_ON(i >= ap.num_pages); i++) { - c = copy_page_to_iter(ap.pages[i], 0, PAGE_SIZE, &ii); + for (i = 0; iov_iter_count(&ii) && !WARN_ON(i >= ap.num_folios); i++) { + c = copy_folio_to_iter(ap.folios[i], 0, PAGE_SIZE, &ii); if (c != PAGE_SIZE && iov_iter_count(&ii)) goto out; } err = 0; out: free_page((unsigned long) iov_page); - 
while (ap.num_pages) - __free_page(ap.pages[--ap.num_pages]); - kfree(ap.pages); + while (ap.num_folios) + folio_put(ap.folios[--ap.num_folios]); + kfree(ap.folios); return err ? err : outarg.result; } -- 2.46.1

From: Joanne Koong <joannelkoong@gmail.com> Convert retrieve requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/dev.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 0345f187c493..504f5917f285 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1678,7 +1678,7 @@ static void fuse_retrieve_end(struct fuse_mount *fm, struct fuse_args *args, struct fuse_retrieve_args *ra = container_of(args, typeof(*ra), ap.args); - release_pages(ra->ap.pages, ra->ap.num_pages); + release_pages(ra->ap.folios, ra->ap.num_folios); kfree(ra); } @@ -1692,7 +1692,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, unsigned int num; unsigned int offset; size_t total_len = 0; - unsigned int num_pages; + unsigned int num_pages, cur_pages = 0; struct fuse_conn *fc = fm->fc; struct fuse_retrieve_args *ra; size_t args_size = sizeof(*ra); @@ -1711,15 +1711,16 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT; num_pages = min(num_pages, fc->max_pages); - args_size += num_pages * (sizeof(ap->pages[0]) + sizeof(ap->descs[0])); + args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->folio_descs[0])); ra = kzalloc(args_size, GFP_KERNEL); if (!ra) return -ENOMEM; ap = &ra->ap; - ap->pages = (void *) (ra + 1); - ap->descs = (void *) (ap->pages + num_pages); + ap->folios = (void *) (ra + 1); + ap->folio_descs = (void *) (ap->folios + num_pages); + ap->uses_folios = true; args = &ap->args; args->nodeid = outarg->nodeid; @@ -1730,7 +1731,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, index = outarg->offset >> PAGE_SHIFT; - while (num && ap->num_pages < num_pages) { + while (num && cur_pages < num_pages) { struct folio 
*folio; unsigned int this_num; @@ -1739,10 +1740,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, break; this_num = min_t(unsigned, num, PAGE_SIZE - offset); - ap->pages[ap->num_pages] = &folio->page; - ap->descs[ap->num_pages].offset = offset; - ap->descs[ap->num_pages].length = this_num; - ap->num_pages++; + ap->folios[ap->num_folios] = folio; + ap->folio_descs[ap->num_folios].offset = offset; + ap->folio_descs[ap->num_folios].length = this_num; + ap->num_folios++; + cur_pages++; offset = 0; num -= this_num; -- 2.46.1
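Note (illustration only, not part of the patch): fuse_retrieve() first rounds the byte range up to a page count and then walks it one page-sized piece at a time, where only the first piece starts at a non-zero offset. The arithmetic can be sketched in userspace as below. SKETCH_PAGE_SHIFT, struct chunk, pages_spanned() and split_into_chunks() are hypothetical names, a 4 KiB page is assumed, and the page cache lookup and max_pages clamp are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

struct chunk {
	unsigned int offset;	/* byte offset inside the page */
	unsigned int length;	/* bytes taken from the page */
};

/* Pages needed to cover num bytes starting offset bytes into the first page. */
static unsigned int pages_spanned(unsigned int num, unsigned int offset)
{
	return (num + offset + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/*
 * Split num bytes into per-page chunks, mirroring the retrieve loop.
 * offset must be smaller than SKETCH_PAGE_SIZE. Returns the number of
 * chunks written to out, at most max_chunks.
 */
static unsigned int split_into_chunks(unsigned int num, unsigned int offset,
				      struct chunk *out, unsigned int max_chunks)
{
	unsigned int n = 0;

	assert(offset < SKETCH_PAGE_SIZE);
	while (num && n < max_chunks) {
		unsigned int room = SKETCH_PAGE_SIZE - offset;
		unsigned int this_num = num < room ? num : room;

		out[n].offset = offset;
		out[n].length = this_num;
		n++;

		offset = 0;	/* every later page is used from its start */
		num -= this_num;
	}
	return n;
}
```

For example, 10000 bytes starting 100 bytes into a page span three pages and split into pieces of 3996, 4096 and 1908 bytes.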

From: Joanne Koong <joannelkoong@gmail.com>

In commit 3eab9d7bc2f4 ("fuse: convert readahead to use folios"), the logic was converted to use the new folio readahead code, which drops the reference on the folio once it is locked, relying on an inferred reference on the folio. Previously we held a reference on the folio for the entire duration of the readpages call.

This is fine in general. However, in the splice pipe response case, where we remove the old folio and splice in the new folio (see fuse_try_move_page()), we assume that there is a reference held on the folio for ap->folios, which is no longer the case.

To fix this, revert back to __readahead_folio(), which allows us to hold the reference on the folio for the duration of readpages until either we drop the reference ourselves in fuse_readpages_end() or the reference is dropped after it's replaced in the page cache in the splice case. This will fix the UAF bug that was reported.

Link: https://lore.kernel.org/linux-fsdevel/2f681f48-00f5-4e09-8431-2b3dbfaa881e@h... 
Fixes: 3eab9d7bc2f4 ("fuse: convert readahead to use folios") Reported-by: Christian Heusel <christian@heusel.eu> Closes: https://lore.kernel.org/all/2f681f48-00f5-4e09-8431-2b3dbfaa881e@heusel.eu/ Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/110 Reported-by: Mantas Mikulėnas <grawity@gmail.com> Closes: https://lore.kernel.org/all/34feb867-09e2-46e4-aa31-d9660a806d1a@gmail.com/ Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1236660 Cc: <stable@vger.kernel.org> # v6.13 Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/dev.c | 6 ++++++ fs/fuse/file.c | 13 +++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 504f5917f285..57790734ad48 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -785,6 +785,12 @@ static int fuse_check_folio(struct folio *folio) return 0; } +/* + * Attempt to steal a page from the splice() pipe and move it into the + * pagecache. If successful, the pointer in @pagep will be updated. The + * folio that was originally in @pagep will lose a reference and the new + * folio returned in @pagep will carry a reference. 
+ */ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) { int err; diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 8cd4c847c273..a97dd602ce5d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -960,8 +960,10 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args, fuse_invalidate_atime(inode); } - for (i = 0; i < ap->num_folios; i++) + for (i = 0; i < ap->num_folios; i++) { folio_end_read(ap->folios[i], !err); + folio_put(ap->folios[i]); + } if (ia->ff) fuse_file_put(ia->ff, false, false); @@ -1053,7 +1055,14 @@ static void fuse_readahead(struct readahead_control *rac) ap = &ia->ap; while (ap->num_folios < cur_pages) { - folio = readahead_folio(rac); + /* + * This returns a folio with a ref held on it. + * The ref needs to be held until the request is + * completed, since the splice case (see + * fuse_try_move_page()) drops the ref after it's + * replaced in the page cache. + */ + folio = __readahead_folio(rac); ap->folios[ap->num_folios] = folio; ap->folio_descs[ap->num_folios].length = folio_size(folio); ap->num_folios++; -- 2.46.1
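Note (illustration only, not part of the patch): the reference accounting described in the commit message can be followed with a toy counter. This is a deliberately simplified model based only on that description. It tracks nothing but the original folio, it assumes one page cache reference plus one reference taken by readahead, and toy_readahead() and enum toy_completion are hypothetical names with no kernel counterpart.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_completion {
	TOY_COMPLETE_NORMAL,		/* data copied into the folio */
	TOY_COMPLETE_SPLICE_MOVE	/* folio replaced in the page cache */
};

/*
 * Returns the final reference count of the original folio.
 * 1 means it is still owned by the page cache, 0 means it was freed
 * exactly once, and a negative value means one put too many, which is
 * the use-after-free situation.
 */
static int toy_readahead(bool request_holds_ref, enum toy_completion how)
{
	int refs = 2;	/* assumption: page cache ref + readahead ref */

	if (!request_holds_ref)
		refs--;	/* readahead_folio() style: ref dropped on hand-out */

	if (how == TOY_COMPLETE_SPLICE_MOVE) {
		refs--;	/* old folio leaves the page cache */
		refs--;	/* splice path drops the ref it assumes the request holds */
	} else if (request_holds_ref) {
		refs--;	/* completion handler drops the request's ref */
	}
	return refs;
}
```

In this model only the combination of "request holds no reference" and "splice move" goes negative, which matches the case the patch fixes by switching to __readahead_folio() and adding folio_put() to fuse_readpages_end().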

From: Lei Huang <lei.huang@linux.intel.com>

Our user space filesystem relies on fuse to provide a POSIX interface. In our test, a known string is written into a file and the content is read back later to verify that the correct data is returned. In rare cases we observed wrong data in the read buffer, although the correct data was stored in our filesystem.

The fuse kernel module calls iov_iter_get_pages2() to get the physical pages of the user-space read buffer passed to read(). These pages are not pinned, so page migration is not prevented. When page migration occurs, the consequences are twofold:

1) Applications do not receive correct data in the read buffer.
2) The fuse kernel module writes data into the wrong place.

Using iov_iter_extract_pages() to pin the pages fixes the issue in our test.

An auxiliary variable "struct page **pt_pages" is used in the patch to prepare the 2nd parameter for iov_iter_extract_pages(), since iov_iter_get_pages2() uses a different type for the 2nd parameter.

[SzM] add iov_iter_extract_will_pin(ii) and unpin only if true.
Signed-off-by: Lei Huang <lei.huang@linux.intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/file.c fs/fuse/fuse_i.h --- fs/fuse/file.c | 15 ++++++++++----- fs/fuse/fuse_i.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index a97dd602ce5d..bb001a50da74 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -648,7 +648,8 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap, ssize_t nres, for (i = 0; i < ap->num_pages; i++) { if (should_dirty) set_page_dirty_lock(ap->pages[i]); - put_page(ap->pages[i]); + if (ap->args.is_pinned) + unpin_user_page(ap->pages[i]); } if (nres > 0 && ap->args.invalidate_vmap) @@ -1472,10 +1473,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, while (nbytes < *nbytesp && ap->num_pages < max_pages) { unsigned npages; size_t start; - ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages], - *nbytesp - nbytes, - max_pages - ap->num_pages, - &start); + struct page **pt_pages; + + pt_pages = &ap->pages[ap->num_pages]; + ret = iov_iter_extract_pages(ii, &pt_pages, + *nbytesp - nbytes, + max_pages - ap->num_pages, + 0, &start); if (ret < 0) break; @@ -1496,6 +1500,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, flush_kernel_vmap_range(ap->args.vmap_base, nbytes); ap->args.invalidate_vmap = !write && flush_or_invalidate; + ap->args.is_pinned = iov_iter_extract_will_pin(ii); ap->args.user_pages = true; if (write) ap->args.in_pages = true; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 7f573394f2ba..59e26a4f07f9 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -293,6 +293,7 @@ struct fuse_args { bool may_block:1; bool is_ext:1; bool invalidate_vmap:1; + bool is_pinned:1; struct fuse_in_arg in_args[3]; struct fuse_arg out_args[2]; void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error); -- 2.46.1

From: Joanne Koong <joannelkoong@gmail.com> Add a new convenience helper folio_mark_dirty_lock() that grabs the folio lock before calling folio_mark_dirty(). Refactor set_page_dirty_lock() to directly use folio_mark_dirty_lock(). Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: mm/folio-compat.c --- include/linux/mm.h | 1 + mm/folio-compat.c | 6 ++++++ mm/page-writeback.c | 22 +++++++++++----------- 3 files changed, 18 insertions(+), 11 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2e6ef9532fc3..9f04e6f8e117 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2559,6 +2559,7 @@ struct kvec; struct page *get_dump_page(unsigned long addr); bool folio_mark_dirty(struct folio *folio); +bool folio_mark_dirty_lock(struct folio *folio); bool set_page_dirty(struct page *page); int set_page_dirty_lock(struct page *page); diff --git a/mm/folio-compat.c b/mm/folio-compat.c index a546271db69b..cde4a40f6645 100644 --- a/mm/folio-compat.c +++ b/mm/folio-compat.c @@ -64,6 +64,12 @@ int __set_page_dirty_nobuffers(struct page *page) } EXPORT_SYMBOL(__set_page_dirty_nobuffers); +int set_page_dirty_lock(struct page *page) +{ + return folio_mark_dirty_lock(page_folio(page)); +} +EXPORT_SYMBOL(set_page_dirty_lock); + bool clear_page_dirty_for_io(struct page *page) { return folio_clear_dirty_for_io(page_folio(page)); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 903933ed0d56..a2bb0d42c7cb 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2808,25 +2808,25 @@ bool folio_mark_dirty(struct folio *folio) EXPORT_SYMBOL(folio_mark_dirty); /* - * set_page_dirty() is racy if the caller has no reference against - * page->mapping->host, and if the page is unlocked. This is because another - * CPU could truncate the page off the mapping and then free the mapping. 
+ * folio_mark_dirty() is racy if the caller has no reference against + * folio->mapping->host, and if the folio is unlocked. This is because another + * CPU could truncate the folio off the mapping and then free the mapping. * - * Usually, the page _is_ locked, or the caller is a user-space process which + * Usually, the folio _is_ locked, or the caller is a user-space process which * holds a reference on the inode by having an open file. * - * In other cases, the page should be locked before running set_page_dirty(). + * In other cases, the folio should be locked before running folio_mark_dirty(). */ -int set_page_dirty_lock(struct page *page) +bool folio_mark_dirty_lock(struct folio *folio) { - int ret; + bool ret; - lock_page(page); - ret = set_page_dirty(page); - unlock_page(page); + folio_lock(folio); + ret = folio_mark_dirty(folio); + folio_unlock(folio); return ret; } -EXPORT_SYMBOL(set_page_dirty_lock); +EXPORT_SYMBOL(folio_mark_dirty_lock); /* * This cancels just the dirty bit on the kernel page itself, it does NOT -- 2.46.1

From: Vivek Kasireddy <vivek.kasireddy@intel.com>

Patch series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios", v16.

Currently, some drivers (e.g., Udmabuf) that want to longterm-pin the pages/folios associated with a memfd do so by simply taking a reference on them. This is not desirable because the pages/folios may reside in the Movable zone or a CMA block.

Therefore, having drivers use the memfd_pin_folios() API ensures that the folios are appropriately pinned via FOLL_PIN for longterm DMA.

This patchset also introduces a few helpers and converts the Udmabuf driver to use folios and the memfd_pin_folios() API to longterm-pin the folios for DMA. Two new Udmabuf selftests are also included to test the driver and the new API.

This patch (of 9):

These helpers are the folio versions of unpin_user_page/unpin_user_pages. They are currently only useful for unpinning folios pinned by memfd_pin_folios() or other associated routines. However, they could find new uses in the future, when more and more folio-only helpers are added to GUP.

We should probably sanity check the folio as part of unpin, similar to how it is done in unpin_user_page/unpin_user_pages, but we cannot cleanly do that at the moment without also checking the subpage. Therefore, sanity checking needs to be added to these routines once we have a way to determine if any given folio is anon-exclusive (via a per folio AnonExclusive flag).

Link: https://lkml.kernel.org/r/20240624063952.1572359-1-vivek.kasireddy@intel.com Link: https://lkml.kernel.org/r/20240624063952.1572359-2-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mm.h | 2 ++ mm/gup.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 9f04e6f8e117..e9a6b4dd6a58 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1620,11 +1620,13 @@ static inline void put_page(struct page *page) #define GUP_PIN_COUNTING_BIAS (1U << 10) void unpin_user_page(struct page *page); +void unpin_folio(struct folio *folio); void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, bool make_dirty); void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages, bool make_dirty); void unpin_user_pages(struct page **pages, unsigned long npages); +void unpin_folios(struct folio **folios, unsigned long nfolios); static inline bool is_cow_mapping(vm_flags_t flags) { diff --git a/mm/gup.c b/mm/gup.c index a31af5e12427..a7399aa7a281 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -188,6 +188,19 @@ void 
unpin_user_page(struct page *page) } EXPORT_SYMBOL(unpin_user_page); +/** + * unpin_folio() - release a dma-pinned folio + * @folio: pointer to folio to be released + * + * Folios that were pinned via memfd_pin_folios() or other similar routines + * must be released either using unpin_folio() or unpin_folios(). + */ +void unpin_folio(struct folio *folio) +{ + gup_put_folio(folio, 1, FOLL_PIN); +} +EXPORT_SYMBOL_GPL(unpin_folio); + /** * folio_add_pin - Try to get an additional pin on a pinned folio * @folio: The folio to be pinned @@ -400,6 +413,40 @@ void unpin_user_pages(struct page **pages, unsigned long npages) } EXPORT_SYMBOL(unpin_user_pages); +/** + * unpin_folios() - release an array of gup-pinned folios. + * @folios: array of folios to be marked dirty and released. + * @nfolios: number of folios in the @folios array. + * + * For each folio in the @folios array, release the folio using gup_put_folio. + * + * Please see the unpin_folio() documentation for details. + */ +void unpin_folios(struct folio **folios, unsigned long nfolios) +{ + unsigned long i = 0, j; + + /* + * If this WARN_ON() fires, then the system *might* be leaking folios + * (by leaving them pinned), but probably not. More likely, gup/pup + * returned a hard -ERRNO error to the caller, who erroneously passed + * it here. + */ + if (WARN_ON(IS_ERR_VALUE(nfolios))) + return; + + while (i < nfolios) { + for (j = i + 1; j < nfolios; j++) + if (folios[i] != folios[j]) + break; + + if (folios[i]) + gup_put_folio(folios[i], j - i, FOLL_PIN); + i = j; + } +} +EXPORT_SYMBOL_GPL(unpin_folios); + /* * Set the MMF_HAS_PINNED if not set yet; after set it'll be there for the mm's * lifecycle. Avoid setting the bit unless necessary, or it might cause write -- 2.46.1
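Note (illustration only, not part of the patch): unpin_folios() above scans the array for runs of identical consecutive folio pointers and releases each run with one call that carries the run length, skipping NULL entries. The same scan can be sketched in userspace by recording the runs instead of releasing them. struct run and collect_runs() are hypothetical names, and plain pointers stand in for folios.

```c
#include <assert.h>
#include <stddef.h>

struct run {
	const void *obj;	/* the repeated pointer */
	unsigned long count;	/* how many consecutive times it appeared */
};

/*
 * Record every run of identical consecutive non-NULL pointers in objs.
 * out must have room for n entries. Returns the number of runs found.
 * In unpin_folios() each run becomes one gup_put_folio(folio, count, ...)
 * call instead of an entry in an array.
 */
static unsigned long collect_runs(const void *const *objs, unsigned long n,
				  struct run *out)
{
	unsigned long i = 0, j, nruns = 0;

	while (i < n) {
		/* advance j to the first entry that differs from objs[i] */
		for (j = i + 1; j < n; j++)
			if (objs[i] != objs[j])
				break;

		if (objs[i]) {
			out[nruns].obj = objs[i];
			out[nruns].count = j - i;
			nruns++;
		}
		i = j;
	}
	return nruns;
}
```

Only adjacent duplicates are merged, so the same pointer appearing again after a different entry starts a new run.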

From: Amir Goldstein <amir73il@gmail.com> This removes the need to pass the isdir argument to fuse_file_put(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/dir.c | 2 +- fs/fuse/file.c | 69 +++++++++++++++++++++++++++--------------------- fs/fuse/fuse_i.h | 2 +- 3 files changed, 41 insertions(+), 32 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index bebe32d2421c..a163faefa9c2 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -634,7 +634,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, goto out_err; err = -ENOMEM; - ff = fuse_file_alloc(fm); + ff = fuse_file_alloc(fm, true); if (!ff) goto out_put_forget_req; diff --git a/fs/fuse/file.c b/fs/fuse/file.c index bb001a50da74..e0c96e37596f 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -55,7 +55,7 @@ struct fuse_release_args { struct inode *inode; }; -struct fuse_file *fuse_file_alloc(struct fuse_mount *fm) +struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release) { struct fuse_file *ff; @@ -64,11 +64,13 @@ struct fuse_file *fuse_file_alloc(struct fuse_mount *fm) return NULL; ff->fm = fm; - ff->release_args = kzalloc(sizeof(*ff->release_args), - GFP_KERNEL_ACCOUNT); - if (!ff->release_args) { - kfree(ff); - return NULL; + if (release) { + ff->release_args = kzalloc(sizeof(*ff->release_args), + GFP_KERNEL_ACCOUNT); + if (!ff->release_args) { + kfree(ff); + return NULL; + } } INIT_LIST_HEAD(&ff->write_entry); @@ -104,14 +106,14 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args, kfree(ra); } -static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir) +static void fuse_file_put(struct fuse_file *ff, bool sync) { if (refcount_dec_and_test(&ff->count)) { - struct fuse_args *args = &ff->release_args->args; + struct fuse_release_args *ra = ff->release_args; + struct fuse_args *args = (ra ? &ra->args : NULL); - if (isdir ? 
ff->fm->fc->no_opendir : ff->fm->fc->no_open) { - /* Do nothing when client does not implement 'open' */ - fuse_release_end(ff->fm, args, 0); + if (!args) { + /* Do nothing when server does not implement 'open' */ } else if (sync) { fuse_simple_request(ff->fm, args); fuse_release_end(ff->fm, args, 0); @@ -131,15 +133,16 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid, struct fuse_conn *fc = fm->fc; struct fuse_file *ff; int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN; + bool open = isdir ? !fc->no_opendir : !fc->no_open; - ff = fuse_file_alloc(fm); + ff = fuse_file_alloc(fm, open); if (!ff) return ERR_PTR(-ENOMEM); ff->fh = 0; /* Default for no-open */ ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0); - if (isdir ? !fc->no_opendir : !fc->no_open) { + if (open) { struct fuse_open_out outarg; int err; @@ -147,11 +150,13 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid, if (!err) { ff->fh = outarg.fh; ff->open_flags = outarg.open_flags; - } else if (err != -ENOSYS) { fuse_file_free(ff); return ERR_PTR(err); } else { + /* No release needed */ + kfree(ff->release_args); + ff->release_args = NULL; if (isdir) fc->no_opendir = 1; else @@ -273,7 +278,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) } static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, - unsigned int flags, int opcode) + unsigned int flags, int opcode, bool sync) { struct fuse_conn *fc = ff->fm->fc; struct fuse_release_args *ra = ff->release_args; @@ -291,6 +296,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, wake_up_interruptible_all(&ff->poll_wait); + if (!ra) + return; + ra->inarg.fh = ff->fh; ra->inarg.flags = flags; ra->args.in_numargs = 1; @@ -300,6 +308,13 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff, ra->args.nodeid = ff->nodeid; ra->args.force = true; ra->args.nocreds = true; + + /* + * Hold inode until release is 
finished. + * From fuse_sync_release() the refcount is 1 and everything's + * synchronous, so we are fine with not doing igrab() here. + */ + ra->inode = sync ? NULL : igrab(&fi->inode); } void fuse_file_release(struct inode *inode, struct fuse_file *ff, @@ -309,14 +324,12 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff, struct fuse_release_args *ra = ff->release_args; int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE; - fuse_prepare_release(fi, ff, open_flags, opcode); + fuse_prepare_release(fi, ff, open_flags, opcode, false); - if (ff->flock) { + if (ra && ff->flock) { ra->inarg.release_flags |= FUSE_RELEASE_FLOCK_UNLOCK; ra->inarg.lock_owner = fuse_lock_owner_id(ff->fm->fc, id); } - /* Hold inode until release is finished */ - ra->inode = igrab(inode); /* * Normally this will send the RELEASE request, however if @@ -327,7 +340,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff, * synchronous RELEASE is allowed (and desirable) in this case * because the server can be trusted not to screw up. 
*/ - fuse_file_put(ff, ff->fm->fc->destroy, isdir); + fuse_file_put(ff, ff->fm->fc->destroy); } void fuse_release_common(struct file *file, bool isdir) @@ -362,12 +375,8 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff, unsigned int flags) { WARN_ON(refcount_read(&ff->count) > 1); - fuse_prepare_release(fi, ff, flags, FUSE_RELEASE); - /* - * iput(NULL) is a no-op and since the refcount is 1 and everything's - * synchronous, we are fine with not doing igrab() here" - */ - fuse_file_put(ff, true, false); + fuse_prepare_release(fi, ff, flags, FUSE_RELEASE, true); + fuse_file_put(ff, true); } EXPORT_SYMBOL_GPL(fuse_sync_release); @@ -966,7 +975,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args, folio_put(ap->folios[i]); } if (ia->ff) - fuse_file_put(ia->ff, false, false); + fuse_file_put(ia->ff, false); fuse_io_folios_free(ia); } @@ -1756,7 +1765,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) __free_page(ap->pages[i]); if (wpa->ia.ff) - fuse_file_put(wpa->ia.ff, false, false); + fuse_file_put(wpa->ia.ff, false); kfree(ap->pages); kfree(wpa); @@ -2004,7 +2013,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc) ff = __fuse_write_file_get(fi); err = fuse_flush_times(inode, ff); if (ff) - fuse_file_put(ff, false, false); + fuse_file_put(ff, false); return err; } @@ -2370,7 +2379,7 @@ static int fuse_writepages(struct address_space *mapping, fuse_writepages_send(&data); } if (data.ff) - fuse_file_put(data.ff, false, false); + fuse_file_put(data.ff, false); kfree(data.orig_pages); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 59e26a4f07f9..90c44b3576d2 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1091,7 +1091,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos, */ int fuse_open_common(struct inode *inode, struct file *file, bool isdir); -struct fuse_file *fuse_file_alloc(struct fuse_mount *fm); +struct fuse_file 
*fuse_file_alloc(struct fuse_mount *fm, bool release); void fuse_file_free(struct fuse_file *ff); void fuse_finish_open(struct inode *inode, struct file *file); -- 2.46.1
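Note (illustration only, not part of the patch): after this change the presence of ff->release_args itself records whether a RELEASE request is needed, so the final put no longer has to be told whether the file is a directory. A minimal userspace sketch of that pattern follows. struct toy_file, struct toy_release_args, toy_file_alloc() and toy_file_put() are hypothetical names, a plain int replaces refcount_t, and sending the request is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_release_args {
	int opcode;
};

struct toy_file {
	int count;				/* reference count */
	struct toy_release_args *release_args;	/* NULL: no release needed */
};

/* Allocate a file handle. The release arguments exist only if requested. */
static struct toy_file *toy_file_alloc(bool release)
{
	struct toy_file *ff = calloc(1, sizeof(*ff));

	if (!ff)
		return NULL;
	if (release) {
		ff->release_args = calloc(1, sizeof(*ff->release_args));
		if (!ff->release_args) {
			free(ff);
			return NULL;
		}
	}
	ff->count = 1;
	return ff;
}

/*
 * Drop one reference. Returns true only when this was the last reference
 * and a release request would have been sent, i.e. release_args was set.
 */
static bool toy_file_put(struct toy_file *ff)
{
	bool would_send;

	assert(ff->count > 0);
	if (--ff->count)
		return false;

	would_send = ff->release_args != NULL;
	free(ff->release_args);		/* free(NULL) is a no-op */
	free(ff);
	return would_send;
}
```

The design choice mirrored here is that the decision is made once, at allocation or when open returns -ENOSYS, and every later code path only checks the pointer.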

From: Joanne Koong <joannelkoong@gmail.com> Prior to this change, data->ff is checked and if not initialized then initialized in the fuse_writepages_fill() callback, which gets called for every dirty page in the address space mapping. This logic is better placed in the main fuse_writepages() caller where data.ff is initialized before walking the dirty pages. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/file.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e0c96e37596f..8dbf2d9fadb1 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2265,13 +2265,6 @@ static int fuse_writepages_fill(struct folio *folio, struct folio *tmp_folio; int err; - if (!data->ff) { - err = -EIO; - data->ff = fuse_write_file_get(fi); - if (!data->ff) - goto out_unlock; - } - if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) { fuse_writepages_send(data); data->wpa = NULL; @@ -2350,13 +2343,13 @@ static int fuse_writepages(struct address_space *mapping, struct writeback_control *wbc) { struct inode *inode = mapping->host; + struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_fill_wb_data data; int err; - err = -EIO; if (fuse_is_bad(inode)) - goto out; + return -EIO; if (wbc->sync_mode == WB_SYNC_NONE && fc->num_background >= fc->congestion_threshold) @@ -2364,7 +2357,9 @@ static int fuse_writepages(struct address_space *mapping, data.inode = inode; data.wpa = NULL; - data.ff = NULL; + data.ff = fuse_write_file_get(fi); + if (!data.ff) + return -EIO; err = -ENOMEM; data.orig_pages = kcalloc(fc->max_pages, @@ -2378,11 +2373,10 @@ static int fuse_writepages(struct address_space *mapping, WARN_ON(!data.wpa->ia.ap.num_pages); fuse_writepages_send(&data); } - if (data.ff) - fuse_file_put(data.ff, false); 
kfree(data.orig_pages); out: + fuse_file_put(data.ff, false); return err; } -- 2.46.1

From: Joanne Koong <joannelkoong@gmail.com> Before this change, wpa->ia.ff is initialized with an acquired reference on the fuse file right before the writeback request is submitted. If there are auxiliary writebacks, the initialization and reference acquisition must also happen before the auxiliary writeback request is submitted. To make the logic simpler and to pave the way for a subsequent refactoring of fuse_writepages_fill() and fuse_writepage_locked(), this change initializes and acquires wpa->ia.ff when the wpa is allocated. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/file.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 8dbf2d9fadb1..26c8d68bf164 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1764,8 +1764,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) for (i = 0; i < ap->num_pages; i++) __free_page(ap->pages[i]); - if (wpa->ia.ff) - fuse_file_put(wpa->ia.ff, false); + fuse_file_put(wpa->ia.ff, false); kfree(ap->pages); kfree(wpa); @@ -1938,7 +1937,6 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args, wpa->next = next->next; next->next = NULL; - next->ia.ff = fuse_file_get(wpa->ia.ff); tree_insert(&fi->writepages, next); /* @@ -2157,7 +2155,6 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data) int num_pages = wpa->ia.ap.num_pages; int i; - wpa->ia.ff = fuse_file_get(data->ff); spin_lock(&fi->lock); list_add_tail(&wpa->queue_entry, &fi->queued_writes); fuse_flush_writepages(inode); @@ -2302,6 +2299,7 @@ static int fuse_writepages_fill(struct folio *folio, ap = &wpa->ia.ap; fuse_write_args_fill(&wpa->ia, data->ff, folio_pos(folio), 0); wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE; + wpa->ia.ff = fuse_file_get(data->ff); wpa->next = NULL; ap->args.in_pages = true; 
ap->args.end = fuse_writepage_end; -- 2.46.1
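
Note on the patch above: the idea of taking the file reference when the request is allocated, so that the free path can drop it unconditionally, can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: struct file_ref, struct request, file_get(), file_put(), request_alloc() and request_free() are hypothetical stand-ins, not the fuse structures or helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fuse_file; not the kernel struct. */
struct file_ref {
	int count;
};

/* Hypothetical stand-in for a writeback request that pins a file. */
struct request {
	struct file_ref *ff;
};

static struct file_ref *file_get(struct file_ref *ff)
{
	ff->count++;
	return ff;
}

static void file_put(struct file_ref *ff)
{
	assert(ff->count > 0);
	ff->count--;
}

/* The reference is taken at allocation time, not at submission time. */
static struct request *request_alloc(struct file_ref *ff)
{
	struct request *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;
	req->ff = file_get(ff);
	return req;
}

/* Every allocated request owns a reference, so no NULL check is needed. */
static void request_free(struct request *req)
{
	file_put(req->ff);
	free(req);
}
```

With this ordering, a request that is freed before it is ever submitted (for example an auxiliary request that gets merged away) still balances its get with a put.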

From: Joanne Koong <joannelkoong@gmail.com> This change refactors the shared logic in fuse_writepages_fill() and fuse_writepage_locked() into two separate helper functions, fuse_writepage_args_page_fill() and fuse_writepage_args_setup(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Conflicts: fs/fuse/file.c --- fs/fuse/file.c | 103 +++++++++++++++++++++++++++---------------------- 1 file changed, 57 insertions(+), 46 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 26c8d68bf164..f402735b7515 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2049,49 +2049,77 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc, rcu_read_unlock(); } +static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio, + struct folio *tmp_folio, uint32_t page_index) +{ + struct inode *inode = folio->mapping->host; + struct fuse_args_pages *ap = &wpa->ia.ap; + + folio_copy(tmp_folio, folio); + + ap->pages[page_index] = &tmp_folio->page; + ap->descs[page_index].offset = 0; + ap->descs[page_index].length = PAGE_SIZE; + + inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); + inc_node_page_state(&tmp_folio->page, NR_WRITEBACK_TEMP); +} + +static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio, + struct fuse_file *ff) +{ + struct inode *inode = folio->mapping->host; + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_writepage_args *wpa; + struct fuse_args_pages *ap; + + wpa = fuse_writepage_args_alloc(); + if (!wpa) + return NULL; + + fuse_writepage_add_to_bucket(fc, wpa); + fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio), 0); + wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE; + wpa->inode = inode; + wpa->ia.ff = ff; + + ap = &wpa->ia.ap; + ap->args.in_pages = true; + ap->args.end = fuse_writepage_end; + + return wpa; +} + static int 
fuse_writepage_locked(struct folio *folio) { struct address_space *mapping = folio->mapping; struct inode *inode = mapping->host; - struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_writepage_args *wpa; struct fuse_args_pages *ap; struct folio *tmp_folio; + struct fuse_file *ff; int error = -ENOMEM; - folio_start_writeback(folio); - - wpa = fuse_writepage_args_alloc(); - if (!wpa) - goto err; - ap = &wpa->ia.ap; - tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0); if (!tmp_folio) - goto err_free; + goto err; error = -EIO; - wpa->ia.ff = fuse_write_file_get(fi); - if (!wpa->ia.ff) + ff = fuse_write_file_get(fi); + if (!ff) goto err_nofile; - fuse_writepage_add_to_bucket(fc, wpa); - fuse_write_args_fill(&wpa->ia, wpa->ia.ff, folio_pos(folio), 0); + wpa = fuse_writepage_args_setup(folio, ff); + error = -ENOMEM; + if (!wpa) + goto err_writepage_args; - folio_copy(tmp_folio, folio); - wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE; - wpa->next = NULL; - ap->args.in_pages = true; + ap = &wpa->ia.ap; ap->num_pages = 1; - ap->pages[0] = &tmp_folio->page; - ap->descs[0].offset = 0; - ap->descs[0].length = PAGE_SIZE; - ap->args.end = fuse_writepage_end; - wpa->inode = inode; - inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); - node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP); + folio_start_writeback(folio); + fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0); spin_lock(&fi->lock); tree_insert(&fi->writepages, wpa); @@ -2103,13 +2131,12 @@ static int fuse_writepage_locked(struct folio *folio) return 0; +err_writepage_args: + fuse_file_put(ff, false); err_nofile: folio_put(tmp_folio); -err_free: - kfree(wpa); err: mapping_set_error(folio->mapping, error); - folio_end_writeback(folio); return error; } @@ -2287,36 +2314,20 @@ static int fuse_writepages_fill(struct folio *folio, */ if (data->wpa == NULL) { err = -ENOMEM; - wpa = fuse_writepage_args_alloc(); + wpa = fuse_writepage_args_setup(folio, data->ff); if 
(!wpa) { folio_put(tmp_folio); goto out_unlock; } - fuse_writepage_add_to_bucket(fc, wpa); - + fuse_file_get(wpa->ia.ff); data->max_pages = 1; - ap = &wpa->ia.ap; - fuse_write_args_fill(&wpa->ia, data->ff, folio_pos(folio), 0); - wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE; - wpa->ia.ff = fuse_file_get(data->ff); - wpa->next = NULL; - ap->args.in_pages = true; - ap->args.end = fuse_writepage_end; - ap->num_pages = 0; - wpa->inode = inode; } folio_start_writeback(folio); - folio_copy(tmp_folio, folio); - ap->pages[ap->num_pages] = &tmp_folio->page; - ap->descs[ap->num_pages].offset = 0; - ap->descs[ap->num_pages].length = PAGE_SIZE; + fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_pages); data->orig_pages[ap->num_pages] = &folio->page; - inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); - inc_node_page_state(&tmp_folio->page, NR_WRITEBACK_TEMP); - err = 0; if (data->wpa) { /* -- 2.46.1

From: Josef Bacik <josef@toxicpanda.com> In order to make it easier to switch to folios in fuse_args_pages, update the places where we update the vmstat counters for writeback to use the folio-related helpers. On the inc side this is easy, as we already have the folio; on the dec side we have to page_folio() the pages for now. Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/file.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index f402735b7515..af5cab2b8dd2 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1770,12 +1770,12 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) kfree(wpa); } -static void fuse_writepage_finish_stat(struct inode *inode, struct page *page) +static void fuse_writepage_finish_stat(struct inode *inode, struct folio *folio) { struct backing_dev_info *bdi = inode_to_bdi(inode); dec_wb_stat(&bdi->wb, WB_WRITEBACK); - dec_node_page_state(page, NR_WRITEBACK_TEMP); + node_stat_sub_folio(folio, NR_WRITEBACK_TEMP); wb_writeout_inc(&bdi->wb); } @@ -1787,7 +1787,7 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa) int i; for (i = 0; i < ap->num_pages; i++) - fuse_writepage_finish_stat(inode, ap->pages[i]); + fuse_writepage_finish_stat(inode, page_folio(ap->pages[i])); wake_up(&fi->page_waitq); } @@ -1842,7 +1842,8 @@ __acquires(fi->lock) for (aux = wpa->next; aux; aux = next) { next = aux->next; aux->next = NULL; - fuse_writepage_finish_stat(aux->inode, aux->ia.ap.pages[0]); + fuse_writepage_finish_stat(aux->inode, + page_folio(aux->ia.ap.pages[0])); fuse_writepage_free(aux); } @@ -2062,7 +2063,7 @@ static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struc ap->descs[page_index].length = PAGE_SIZE; inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); - inc_node_page_state(&tmp_folio->page, NR_WRITEBACK_TEMP); 
+ node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP); } static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio, @@ -2236,7 +2237,8 @@ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa, spin_unlock(&fi->lock); if (tmp) { - fuse_writepage_finish_stat(new_wpa->inode, new_ap->pages[0]); + fuse_writepage_finish_stat(new_wpa->inode, + page_folio(new_ap->pages[0])); fuse_writepage_free(new_wpa); } -- 2.46.1

From: Joanne Koong <joannelkoong@gmail.com> Convert writeback requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/file.c | 126 +++++++++++++++++++++++++------------------------ 1 file changed, 64 insertions(+), 62 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index af5cab2b8dd2..1f3fa35e3af8 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -425,7 +425,7 @@ static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi, wpa = rb_entry(n, struct fuse_writepage_args, writepages_entry); WARN_ON(get_fuse_inode(wpa->inode) != fi); curr_index = wpa->ia.write.in.offset >> PAGE_SHIFT; - if (idx_from >= curr_index + wpa->ia.ap.num_pages) + if (idx_from >= curr_index + wpa->ia.ap.num_folios) n = n->rb_right; else if (idx_to < curr_index) n = n->rb_left; @@ -1761,12 +1761,12 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) if (wpa->bucket) fuse_sync_bucket_dec(wpa->bucket); - for (i = 0; i < ap->num_pages; i++) - __free_page(ap->pages[i]); + for (i = 0; i < ap->num_folios; i++) + folio_put(ap->folios[i]); fuse_file_put(wpa->ia.ff, false); - kfree(ap->pages); + kfree(ap->folios); kfree(wpa); } @@ -1786,8 +1786,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa) struct fuse_inode *fi = get_fuse_inode(inode); int i; - for (i = 0; i < ap->num_pages; i++) - fuse_writepage_finish_stat(inode, page_folio(ap->pages[i])); + for (i = 0; i < ap->num_folios; i++) + fuse_writepage_finish_stat(inode, ap->folios[i]); wake_up(&fi->page_waitq); } @@ -1802,7 +1802,8 @@ __acquires(fi->lock) struct fuse_inode *fi = get_fuse_inode(wpa->inode); struct fuse_write_in *inarg = &wpa->ia.write.in; struct fuse_args *args = &wpa->ia.ap.args; - __u64 data_size = wpa->ia.ap.num_pages * PAGE_SIZE; + /* Currently, all folios in FUSE are one page */ + __u64 data_size = 
wpa->ia.ap.num_folios * PAGE_SIZE; int err; fi->writectr++; @@ -1843,7 +1844,7 @@ __acquires(fi->lock) next = aux->next; aux->next = NULL; fuse_writepage_finish_stat(aux->inode, - page_folio(aux->ia.ap.pages[0])); + aux->ia.ap.folios[0]); fuse_writepage_free(aux); } @@ -1878,11 +1879,11 @@ static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root, struct fuse_writepage_args *wpa) { pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT; - pgoff_t idx_to = idx_from + wpa->ia.ap.num_pages - 1; + pgoff_t idx_to = idx_from + wpa->ia.ap.num_folios - 1; struct rb_node **p = &root->rb_node; struct rb_node *parent = NULL; - WARN_ON(!wpa->ia.ap.num_pages); + WARN_ON(!wpa->ia.ap.num_folios); while (*p) { struct fuse_writepage_args *curr; pgoff_t curr_index; @@ -1893,7 +1894,7 @@ static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root, WARN_ON(curr->inode != wpa->inode); curr_index = curr->ia.write.in.offset >> PAGE_SHIFT; - if (idx_from >= curr_index + curr->ia.ap.num_pages) + if (idx_from >= curr_index + curr->ia.ap.num_folios) p = &(*p)->rb_right; else if (idx_to < curr_index) p = &(*p)->rb_left; @@ -2025,9 +2026,10 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void) wpa = kzalloc(sizeof(*wpa), GFP_NOFS); if (wpa) { ap = &wpa->ia.ap; - ap->num_pages = 0; - ap->pages = fuse_pages_alloc(1, GFP_NOFS, &ap->descs); - if (!ap->pages) { + ap->num_folios = 0; + ap->uses_folios = true; + ap->folios = fuse_folios_alloc(1, GFP_NOFS, &ap->folio_descs); + if (!ap->folios) { kfree(wpa); wpa = NULL; } @@ -2051,16 +2053,16 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc, } static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio, - struct folio *tmp_folio, uint32_t page_index) + struct folio *tmp_folio, uint32_t folio_index) { struct inode *inode = folio->mapping->host; struct fuse_args_pages *ap = &wpa->ia.ap; folio_copy(tmp_folio, folio); - ap->pages[page_index] = 
&tmp_folio->page; - ap->descs[page_index].offset = 0; - ap->descs[page_index].length = PAGE_SIZE; + ap->folios[folio_index] = tmp_folio; + ap->folio_descs[folio_index].offset = 0; + ap->folio_descs[folio_index].length = PAGE_SIZE; inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP); @@ -2117,7 +2119,7 @@ static int fuse_writepage_locked(struct folio *folio) goto err_writepage_args; ap = &wpa->ia.ap; - ap->num_pages = 1; + ap->num_folios = 1; folio_start_writeback(folio); fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0); @@ -2145,32 +2147,32 @@ struct fuse_fill_wb_data { struct fuse_writepage_args *wpa; struct fuse_file *ff; struct inode *inode; - struct page **orig_pages; - unsigned int max_pages; + struct folio **orig_folios; + unsigned int max_folios; }; static bool fuse_pages_realloc(struct fuse_fill_wb_data *data) { struct fuse_args_pages *ap = &data->wpa->ia.ap; struct fuse_conn *fc = get_fuse_conn(data->inode); - struct page **pages; - struct fuse_page_desc *descs; - unsigned int npages = min_t(unsigned int, - max_t(unsigned int, data->max_pages * 2, - FUSE_DEFAULT_MAX_PAGES_PER_REQ), + struct folio **folios; + struct fuse_folio_desc *descs; + unsigned int nfolios = min_t(unsigned int, + max_t(unsigned int, data->max_folios * 2, + FUSE_DEFAULT_MAX_PAGES_PER_REQ), fc->max_pages); - WARN_ON(npages <= data->max_pages); + WARN_ON(nfolios <= data->max_folios); - pages = fuse_pages_alloc(npages, GFP_NOFS, &descs); - if (!pages) + folios = fuse_folios_alloc(nfolios, GFP_NOFS, &descs); + if (!folios) return false; - memcpy(pages, ap->pages, sizeof(struct page *) * ap->num_pages); - memcpy(descs, ap->descs, sizeof(struct fuse_page_desc) * ap->num_pages); - kfree(ap->pages); - ap->pages = pages; - ap->descs = descs; - data->max_pages = npages; + memcpy(folios, ap->folios, sizeof(struct folio *) * ap->num_folios); + memcpy(descs, ap->folio_descs, sizeof(struct fuse_folio_desc) * ap->num_folios); + 
kfree(ap->folios); + ap->folios = folios; + ap->folio_descs = descs; + data->max_folios = nfolios; return true; } @@ -2180,7 +2182,7 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data) struct fuse_writepage_args *wpa = data->wpa; struct inode *inode = data->inode; struct fuse_inode *fi = get_fuse_inode(inode); - int num_pages = wpa->ia.ap.num_pages; + int num_folios = wpa->ia.ap.num_folios; int i; spin_lock(&fi->lock); @@ -2188,8 +2190,8 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data) fuse_flush_writepages(inode); spin_unlock(&fi->lock); - for (i = 0; i < num_pages; i++) - end_page_writeback(data->orig_pages[i]); + for (i = 0; i < num_folios; i++) + folio_end_writeback(data->orig_folios[i]); } /* @@ -2200,15 +2202,15 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data) * swapping the new temp page with the old one. */ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa, - struct page *page) + struct folio *folio) { struct fuse_inode *fi = get_fuse_inode(new_wpa->inode); struct fuse_writepage_args *tmp; struct fuse_writepage_args *old_wpa; struct fuse_args_pages *new_ap = &new_wpa->ia.ap; - WARN_ON(new_ap->num_pages != 0); - new_ap->num_pages = 1; + WARN_ON(new_ap->num_folios != 0); + new_ap->num_folios = 1; spin_lock(&fi->lock); old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa); @@ -2222,9 +2224,9 @@ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa, WARN_ON(tmp->inode != new_wpa->inode); curr_index = tmp->ia.write.in.offset >> PAGE_SHIFT; - if (curr_index == page->index) { - WARN_ON(tmp->ia.ap.num_pages != 1); - swap(tmp->ia.ap.pages[0], new_ap->pages[0]); + if (curr_index == folio->index) { + WARN_ON(tmp->ia.ap.num_folios != 1); + swap(tmp->ia.ap.folios[0], new_ap->folios[0]); break; } } @@ -2238,7 +2240,7 @@ static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa, if (tmp) { fuse_writepage_finish_stat(new_wpa->inode, - page_folio(new_ap->pages[0])); + folio); 
fuse_writepage_free(new_wpa); } @@ -2249,7 +2251,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio, struct fuse_args_pages *ap, struct fuse_fill_wb_data *data) { - WARN_ON(!ap->num_pages); + WARN_ON(!ap->num_folios); /* * Being under writeback is unlikely but possible. For example direct @@ -2261,19 +2263,19 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio, return true; /* Reached max pages */ - if (ap->num_pages == fc->max_pages) + if (ap->num_folios == fc->max_pages) return true; /* Reached max write bytes */ - if ((ap->num_pages + 1) * PAGE_SIZE > fc->max_write) + if ((ap->num_folios + 1) * PAGE_SIZE > fc->max_write) return true; /* Discontinuity */ - if (data->orig_pages[ap->num_pages - 1]->index + 1 != folio_index(folio)) + if (data->orig_folios[ap->num_folios - 1]->index + 1 != folio_index(folio)) return true; /* Need to grow the pages array? If so, did the expansion fail? */ - if (ap->num_pages == data->max_pages && !fuse_pages_realloc(data)) + if (ap->num_folios == data->max_folios && !fuse_pages_realloc(data)) return true; return false; @@ -2310,7 +2312,7 @@ static int fuse_writepages_fill(struct folio *folio, * This is ensured by holding the page lock in page_mkwrite() while * checking fuse_page_is_writeback(). We already hold the page lock * since clear_page_dirty_for_io() and keep it held until we add the - * request to the fi->writepages list and increment ap->num_pages. + * request to the fi->writepages list and increment ap->num_folios. * After this fuse_page_is_writeback() will indicate that the page is * under writeback, so we can release the page lock. 
*/ @@ -2322,13 +2324,13 @@ static int fuse_writepages_fill(struct folio *folio, goto out_unlock; } fuse_file_get(wpa->ia.ff); - data->max_pages = 1; + data->max_folios = 1; ap = &wpa->ia.ap; } folio_start_writeback(folio); - fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_pages); - data->orig_pages[ap->num_pages] = &folio->page; + fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_folios); + data->orig_folios[ap->num_folios] = folio; err = 0; if (data->wpa) { @@ -2337,9 +2339,9 @@ static int fuse_writepages_fill(struct folio *folio, * fuse_page_is_writeback(). */ spin_lock(&fi->lock); - ap->num_pages++; + ap->num_folios++; spin_unlock(&fi->lock); - } else if (fuse_writepage_add(wpa, &folio->page)) { + } else if (fuse_writepage_add(wpa, folio)) { data->wpa = wpa; } else { folio_end_writeback(folio); @@ -2373,19 +2375,19 @@ static int fuse_writepages(struct address_space *mapping, return -EIO; err = -ENOMEM; - data.orig_pages = kcalloc(fc->max_pages, - sizeof(struct page *), - GFP_NOFS); - if (!data.orig_pages) + data.orig_folios = kcalloc(fc->max_pages, + sizeof(struct folio *), + GFP_NOFS); + if (!data.orig_folios) goto out; err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data); if (data.wpa) { - WARN_ON(!data.wpa->ia.ap.num_pages); + WARN_ON(!data.wpa->ia.ap.num_folios); fuse_writepages_send(&data); } - kfree(data.orig_pages); + kfree(data.orig_folios); out: fuse_file_put(data.ff, false); return err; -- 2.46.1
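
Note on the patch above: fuse_pages_realloc() keeps its growth policy after the rename. The new capacity is double the current one, raised to at least FUSE_DEFAULT_MAX_PAGES_PER_REQ and capped at fc->max_pages. A minimal userspace sketch of that arithmetic follows. It assumes FUSE_DEFAULT_MAX_PAGES_PER_REQ is 32 (please check fuse_i.h in the target tree), and next_capacity(), min_u() and max_u() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumed value of FUSE_DEFAULT_MAX_PAGES_PER_REQ; verify against fuse_i.h. */
#define DEFAULT_MAX_PER_REQ 32u

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Mirrors min_t(max_t(cur * 2, default), conn_max) from the diff.
 * cur is the current array capacity, conn_max models fc->max_pages.
 */
static unsigned int next_capacity(unsigned int cur, unsigned int conn_max)
{
	unsigned int next;

	/* The sketch only models the growing case (cur below the cap). */
	assert(cur < conn_max);
	next = min_u(max_u(cur * 2, DEFAULT_MAX_PER_REQ), conn_max);
	/* Same invariant the kernel code checks with WARN_ON(). */
	assert(next > cur);
	return next;
}
```

For example, with a connection limit of 256 the capacity would go from 1 to 32, then 64, 128 and 256 under these assumptions.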

From: Joanne Koong <joannelkoong@gmail.com> Convert direct io requests to use folios instead of pages. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/file.c | 80 +++++++++++++++++++++--------------------------- fs/fuse/fuse_i.h | 22 ------------- 2 files changed, 35 insertions(+), 67 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 1f3fa35e3af8..94bb0abc2f5b 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -654,11 +654,11 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap, ssize_t nres, { unsigned int i; - for (i = 0; i < ap->num_pages; i++) { + for (i = 0; i < ap->num_folios; i++) { if (should_dirty) - set_page_dirty_lock(ap->pages[i]); + folio_mark_dirty_lock(ap->folios[i]); if (ap->args.is_pinned) - unpin_user_page(ap->pages[i]); + unpin_folio(ap->folios[i]); } if (nres > 0 && ap->args.invalidate_vmap) @@ -731,24 +731,6 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos) kref_put(&io->refcnt, fuse_io_release); } -static struct fuse_io_args *fuse_io_alloc(struct fuse_io_priv *io, - unsigned int npages) -{ - struct fuse_io_args *ia; - - ia = kzalloc(sizeof(*ia), GFP_KERNEL); - if (ia) { - ia->io = io; - ia->ap.pages = fuse_pages_alloc(npages, GFP_KERNEL, - &ia->ap.descs); - if (!ia->ap.pages) { - kfree(ia); - ia = NULL; - } - } - return ia; -} - static struct fuse_io_args *fuse_io_folios_alloc(struct fuse_io_priv *io, unsigned int nfolios) { @@ -768,12 +750,6 @@ static struct fuse_io_args *fuse_io_folios_alloc(struct fuse_io_priv *io, return ia; } -static void fuse_io_free(struct fuse_io_args *ia) -{ - kfree(ia->ap.pages); - kfree(ia); -} - static void fuse_io_folios_free(struct fuse_io_args *ia) { kfree(ia->ap.folios); @@ -810,7 +786,7 @@ static void fuse_aio_complete_req(struct fuse_mount *fm, struct fuse_args *args, fuse_release_user_pages(&ia->ap, err ?: nres, 
io->should_dirty); fuse_aio_complete(io, err, pos); - fuse_io_free(ia); + fuse_io_folios_free(ia); } static ssize_t fuse_async_req_send(struct fuse_mount *fm, @@ -1450,6 +1426,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, bool use_pages_for_kvec_io) { bool flush_or_invalidate = false; + unsigned int nr_pages = 0; size_t nbytes = 0; /* # bytes already packed in req */ ssize_t ret = 0; @@ -1479,15 +1456,23 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, } } - while (nbytes < *nbytesp && ap->num_pages < max_pages) { - unsigned npages; + /* + * Until there is support for iov_iter_extract_folios(), we have to + * manually extract pages using iov_iter_extract_pages() and then + * copy that to a folios array. + */ + struct page **pages = kzalloc(max_pages * sizeof(struct page *), + GFP_KERNEL); + if (!pages) + return -ENOMEM; + + while (nbytes < *nbytesp && nr_pages < max_pages) { + unsigned nfolios, i; size_t start; - struct page **pt_pages; - pt_pages = &ap->pages[ap->num_pages]; - ret = iov_iter_extract_pages(ii, &pt_pages, + ret = iov_iter_extract_pages(ii, &pages, *nbytesp - nbytes, - max_pages - ap->num_pages, + max_pages - nr_pages, 0, &start); if (ret < 0) break; @@ -1495,15 +1480,20 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, nbytes += ret; ret += start; - npages = DIV_ROUND_UP(ret, PAGE_SIZE); + /* Currently, all folios in FUSE are one page */ + nfolios = DIV_ROUND_UP(ret, PAGE_SIZE); - ap->descs[ap->num_pages].offset = start; - fuse_page_descs_length_init(ap->descs, ap->num_pages, npages); + ap->folio_descs[ap->num_folios].offset = start; + fuse_folio_descs_length_init(ap->folio_descs, ap->num_folios, nfolios); + for (i = 0; i < nfolios; i++) + ap->folios[i + ap->num_folios] = page_folio(pages[i]); - ap->num_pages += npages; - ap->descs[ap->num_pages - 1].length -= + ap->num_folios += nfolios; + ap->folio_descs[ap->num_folios - 1].length -= 
(PAGE_SIZE - ret) & (PAGE_SIZE - 1); + nr_pages += nfolios; } + kfree(pages); if (write && flush_or_invalidate) flush_kernel_vmap_range(ap->args.vmap_base, nbytes); @@ -1543,14 +1533,14 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, bool fopen_direct_io = ff->open_flags & FOPEN_DIRECT_IO; max_pages = iov_iter_npages(iter, fc->max_pages); - ia = fuse_io_alloc(io, max_pages); + ia = fuse_io_folios_alloc(io, max_pages); if (!ia) return -ENOMEM; if (fopen_direct_io && fc->direct_io_allow_mmap) { res = filemap_write_and_wait_range(mapping, pos, pos + count - 1); if (res) { - fuse_io_free(ia); + fuse_io_folios_free(ia); return res; } } @@ -1565,7 +1555,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, if (fopen_direct_io && write) { res = invalidate_inode_pages2_range(mapping, idx_from, idx_to); if (res) { - fuse_io_free(ia); + fuse_io_folios_free(ia); return res; } } @@ -1592,7 +1582,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, if (!io->async || nres < 0) { fuse_release_user_pages(&ia->ap, nres, io->should_dirty); - fuse_io_free(ia); + fuse_io_folios_free(ia); } ia = NULL; if (nres < 0) { @@ -1611,13 +1601,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, } if (count) { max_pages = iov_iter_npages(iter, fc->max_pages); - ia = fuse_io_alloc(io, max_pages); + ia = fuse_io_folios_alloc(io, max_pages); if (!ia) break; } } if (ia) - fuse_io_free(ia); + fuse_io_folios_free(ia); if (res > 0) *ppos = pos; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 90c44b3576d2..87bedbca529d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -981,18 +981,6 @@ static inline bool fuse_is_bad(struct inode *inode) return unlikely(test_bit(FUSE_I_BAD, &get_fuse_inode(inode)->state)); } -static inline struct page **fuse_pages_alloc(unsigned int npages, gfp_t flags, - struct fuse_page_desc **desc) -{ - struct page **pages; - - pages = kzalloc(npages * (sizeof(struct page *) + 
- sizeof(struct fuse_page_desc)), flags); - *desc = (void *) (pages + npages); - - return pages; -} - static inline struct folio **fuse_folios_alloc(unsigned int nfolios, gfp_t flags, struct fuse_folio_desc **desc) { @@ -1005,16 +993,6 @@ static inline struct folio **fuse_folios_alloc(unsigned int nfolios, gfp_t flags return folios; } -static inline void fuse_page_descs_length_init(struct fuse_page_desc *descs, - unsigned int index, - unsigned int nr_pages) -{ - int i; - - for (i = index; i < index + nr_pages; i++) - descs[i].length = PAGE_SIZE - descs[i].offset; -} - static inline void fuse_folio_descs_length_init(struct fuse_folio_desc *descs, unsigned int index, unsigned int nr_folios) -- 2.46.1
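
Note on the patch above: the descriptor arithmetic in fuse_get_user_pages() is compact. The first descriptor gets the start offset, every descriptor length is initialised to PAGE_SIZE minus its offset, and the last length is trimmed by (PAGE_SIZE - ret) & (PAGE_SIZE - 1), where ret already includes the start offset. The following userspace sketch reproduces that calculation for one extraction. It assumes a 4096-byte page, and struct desc and fill_descs() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch. */
#define PAGE_SZ ((size_t)4096)

/* Hypothetical stand-in for struct fuse_folio_desc. */
struct desc {
	unsigned int offset;
	unsigned int length;
};

/*
 * Describe `bytes` bytes that begin `start` bytes into the first page.
 * Returns the number of single-page descriptors filled in.
 */
static unsigned int fill_descs(struct desc *descs, size_t start, size_t bytes)
{
	size_t span = start + bytes;	/* corresponds to ret += start */
	unsigned int n = (unsigned int)((span + PAGE_SZ - 1) / PAGE_SZ);
	unsigned int i;

	assert(start < PAGE_SZ && bytes > 0);

	/* Only the first descriptor carries a non-zero offset. */
	for (i = 0; i < n; i++) {
		descs[i].offset = (i == 0) ? (unsigned int)start : 0;
		descs[i].length = (unsigned int)(PAGE_SZ - descs[i].offset);
	}

	/* Trim the unused tail of the last page; zero if span is page aligned. */
	descs[n - 1].length -= (unsigned int)((PAGE_SZ - span) & (PAGE_SZ - 1));
	return n;
}
```

As a worked case, start = 100 and bytes = 5000 give two descriptors with lengths 3996 and 1004, which sum to the 5000 bytes extracted.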

From: Joanne Koong <joannelkoong@gmail.com> All fuse requests use folios instead of pages for transferring data. Remove pages from the requests and exclusively use folios. No functional changes. [SzM: rename back folio_descs -> descs, etc.] Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/fuse/cuse.c | 3 +- fs/fuse/dev.c | 57 ++++++++++----------------- fs/fuse/dir.c | 3 +- fs/fuse/file.c | 58 +++++++++++++--------------- fs/fuse/fuse_i.h | 22 ++--------- fs/fuse/ioctl.c | 5 +-- fs/fuse/readdir.c | 3 +- fs/fuse/virtio_fs.c | 93 +++++++++++++++++---------------------------- 8 files changed, 90 insertions(+), 154 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 6018af98dd08..40f3e640e055 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -456,10 +456,9 @@ static int cuse_send_init(struct cuse_conn *cc) ap->args.out_args[1].size = CUSE_INIT_INFO_MAX; ap->args.out_argvar = true; ap->args.out_pages = true; - ap->uses_folios = true; ap->num_folios = 1; ap->folios = &ia->folio; - ap->folio_descs = &ia->desc; + ap->descs = &ia->desc; ia->folio = folio; ia->desc.length = ap->args.out_args[1].size; ap->args.end = cuse_process_init_reply; diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 57790734ad48..875d7ecd2fde 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -983,41 +983,27 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, struct fuse_req *req = cs->req; struct fuse_args_pages *ap = container_of(req->args, typeof(*ap), args); - if (ap->uses_folios) { - for (i = 0; i < ap->num_folios && (nbytes || zeroing); i++) { - int err; - unsigned int offset = ap->folio_descs[i].offset; - unsigned int count = min(nbytes, ap->folio_descs[i].length); - struct page *orig, *pagep; + for (i = 0; i < ap->num_folios && (nbytes || zeroing); i++) { + int err; + unsigned int offset = ap->descs[i].offset; + unsigned int count = min(nbytes, 
ap->descs[i].length); + struct page *orig, *pagep; - orig = pagep = &ap->folios[i]->page; + orig = pagep = &ap->folios[i]->page; - err = fuse_copy_page(cs, &pagep, offset, count, zeroing); - if (err) - return err; - - nbytes -= count; - - /* - * fuse_copy_page may have moved a page from a pipe - * instead of copying into our given page, so update - * the folios if it was replaced. - */ - if (pagep != orig) - ap->folios[i] = page_folio(pagep); - } - } else { - for (i = 0; i < ap->num_pages && (nbytes || zeroing); i++) { - int err; - unsigned int offset = ap->descs[i].offset; - unsigned int count = min(nbytes, ap->descs[i].length); + err = fuse_copy_page(cs, &pagep, offset, count, zeroing); + if (err) + return err; - err = fuse_copy_page(cs, &ap->pages[i], offset, count, zeroing); - if (err) - return err; + nbytes -= count; - nbytes -= count; - } + /* + * fuse_copy_page may have moved a page from a pipe instead of + * copying into our given page, so update the folios if it was + * replaced. 
+ */ + if (pagep != orig) + ap->folios[i] = page_folio(pagep); } return 0; } @@ -1717,7 +1703,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT; num_pages = min(num_pages, fc->max_pages); - args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->folio_descs[0])); + args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0])); ra = kzalloc(args_size, GFP_KERNEL); if (!ra) @@ -1725,8 +1711,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, ap = &ra->ap; ap->folios = (void *) (ra + 1); - ap->folio_descs = (void *) (ap->folios + num_pages); - ap->uses_folios = true; + ap->descs = (void *) (ap->folios + num_pages); args = &ap->args; args->nodeid = outarg->nodeid; @@ -1747,8 +1732,8 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode, this_num = min_t(unsigned, num, PAGE_SIZE - offset); ap->folios[ap->num_folios] = folio; - ap->folio_descs[ap->num_folios].offset = offset; - ap->folio_descs[ap->num_folios].length = this_num; + ap->descs[ap->num_folios].offset = offset; + ap->descs[ap->num_folios].length = this_num; ap->num_folios++; cur_pages++; diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index a163faefa9c2..b50a091b1dfb 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1566,10 +1566,9 @@ static int fuse_readlink_page(struct inode *inode, struct folio *folio) struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 }; struct fuse_args_pages ap = { - .uses_folios = true, .num_folios = 1, .folios = &folio, - .folio_descs = &desc, + .descs = &desc, }; char *link; ssize_t res; diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 94bb0abc2f5b..5eb1571960f5 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -731,7 +731,7 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos) kref_put(&io->refcnt, fuse_io_release); } -static struct fuse_io_args *fuse_io_folios_alloc(struct 
fuse_io_priv *io, +static struct fuse_io_args *fuse_io_alloc(struct fuse_io_priv *io, unsigned int nfolios) { struct fuse_io_args *ia; @@ -739,9 +739,8 @@ static struct fuse_io_args *fuse_io_folios_alloc(struct fuse_io_priv *io, ia = kzalloc(sizeof(*ia), GFP_KERNEL); if (ia) { ia->io = io; - ia->ap.uses_folios = true; ia->ap.folios = fuse_folios_alloc(nfolios, GFP_KERNEL, - &ia->ap.folio_descs); + &ia->ap.descs); if (!ia->ap.folios) { kfree(ia); ia = NULL; @@ -750,7 +749,7 @@ static struct fuse_io_args *fuse_io_folios_alloc(struct fuse_io_priv *io, return ia; } -static void fuse_io_folios_free(struct fuse_io_args *ia) +static void fuse_io_free(struct fuse_io_args *ia) { kfree(ia->ap.folios); kfree(ia); @@ -786,7 +785,7 @@ static void fuse_aio_complete_req(struct fuse_mount *fm, struct fuse_args *args, fuse_release_user_pages(&ia->ap, err ?: nres, io->should_dirty); fuse_aio_complete(io, err, pos); - fuse_io_folios_free(ia); + fuse_io_free(ia); } static ssize_t fuse_async_req_send(struct fuse_mount *fm, @@ -869,10 +868,9 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio) struct fuse_io_args ia = { .ap.args.page_zeroing = true, .ap.args.out_pages = true, - .ap.uses_folios = true, .ap.num_folios = 1, .ap.folios = &folio, - .ap.folio_descs = &desc, + .ap.descs = &desc, }; ssize_t res; u64 attr_ver; @@ -953,7 +951,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args, if (ia->ff) fuse_file_put(ia->ff, false); - fuse_io_folios_free(ia); + fuse_io_free(ia); } static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file) @@ -974,7 +972,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file) /* Don't overflow end offset */ if (pos + (count - 1) == LLONG_MAX) { count--; - ap->folio_descs[ap->num_folios - 1].length--; + ap->descs[ap->num_folios - 1].length--; } WARN_ON((loff_t) (pos + count) < 0); @@ -1035,7 +1033,7 @@ static void fuse_readahead(struct readahead_control *rac) */ break; - 
ia = fuse_io_folios_alloc(NULL, cur_pages); + ia = fuse_io_alloc(NULL, cur_pages); if (!ia) return; ap = &ia->ap; @@ -1050,7 +1048,7 @@ static void fuse_readahead(struct readahead_control *rac) */ folio = __readahead_folio(rac); ap->folios[ap->num_folios] = folio; - ap->folio_descs[ap->num_folios].length = folio_size(folio); + ap->descs[ap->num_folios].length = folio_size(folio); ap->num_folios++; } fuse_send_readpages(ia, rac->file); @@ -1184,7 +1182,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia, err = -EIO; short_write = ia->write.out.size < count; - offset = ap->folio_descs[0].offset; + offset = ap->descs[0].offset; count = ia->write.out.size; for (i = 0; i < ap->num_folios; i++) { struct folio *folio = ap->folios[i]; @@ -1222,7 +1220,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia, int err; ap->args.in_pages = true; - ap->folio_descs[0].offset = offset; + ap->descs[0].offset = offset; do { size_t tmp; @@ -1259,7 +1257,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia, err = 0; ap->folios[ap->num_folios] = folio; - ap->folio_descs[ap->num_folios].length = tmp; + ap->descs[ap->num_folios].length = tmp; ap->num_folios++; nr_pages++; @@ -1316,8 +1314,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii) unsigned int nr_pages = fuse_wr_pages(pos, iov_iter_count(ii), fc->max_pages); - ap->uses_folios = true; - ap->folios = fuse_folios_alloc(nr_pages, GFP_KERNEL, &ap->folio_descs); + ap->folios = fuse_folios_alloc(nr_pages, GFP_KERNEL, &ap->descs); if (!ap->folios) { err = -ENOMEM; break; @@ -1483,13 +1480,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, /* Currently, all folios in FUSE are one page */ nfolios = DIV_ROUND_UP(ret, PAGE_SIZE); - ap->folio_descs[ap->num_folios].offset = start; - fuse_folio_descs_length_init(ap->folio_descs, ap->num_folios, nfolios); + ap->descs[ap->num_folios].offset = start; + fuse_folio_descs_length_init(ap->descs, 
ap->num_folios, nfolios); for (i = 0; i < nfolios; i++) ap->folios[i + ap->num_folios] = page_folio(pages[i]); ap->num_folios += nfolios; - ap->folio_descs[ap->num_folios - 1].length -= + ap->descs[ap->num_folios - 1].length -= (PAGE_SIZE - ret) & (PAGE_SIZE - 1); nr_pages += nfolios; } @@ -1533,14 +1530,14 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, bool fopen_direct_io = ff->open_flags & FOPEN_DIRECT_IO; max_pages = iov_iter_npages(iter, fc->max_pages); - ia = fuse_io_folios_alloc(io, max_pages); + ia = fuse_io_alloc(io, max_pages); if (!ia) return -ENOMEM; if (fopen_direct_io && fc->direct_io_allow_mmap) { res = filemap_write_and_wait_range(mapping, pos, pos + count - 1); if (res) { - fuse_io_folios_free(ia); + fuse_io_free(ia); return res; } } @@ -1555,7 +1552,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, if (fopen_direct_io && write) { res = invalidate_inode_pages2_range(mapping, idx_from, idx_to); if (res) { - fuse_io_folios_free(ia); + fuse_io_free(ia); return res; } } @@ -1582,7 +1579,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, if (!io->async || nres < 0) { fuse_release_user_pages(&ia->ap, nres, io->should_dirty); - fuse_io_folios_free(ia); + fuse_io_free(ia); } ia = NULL; if (nres < 0) { @@ -1601,13 +1598,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, } if (count) { max_pages = iov_iter_npages(iter, fc->max_pages); - ia = fuse_io_folios_alloc(io, max_pages); + ia = fuse_io_alloc(io, max_pages); if (!ia) break; } } if (ia) - fuse_io_folios_free(ia); + fuse_io_free(ia); if (res > 0) *ppos = pos; @@ -2017,8 +2014,7 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void) if (wpa) { ap = &wpa->ia.ap; ap->num_folios = 0; - ap->uses_folios = true; - ap->folios = fuse_folios_alloc(1, GFP_NOFS, &ap->folio_descs); + ap->folios = fuse_folios_alloc(1, GFP_NOFS, &ap->descs); if (!ap->folios) { kfree(wpa); wpa = NULL; @@ -2051,8 +2047,8 
@@ static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struc folio_copy(tmp_folio, folio); ap->folios[folio_index] = tmp_folio; - ap->folio_descs[folio_index].offset = 0; - ap->folio_descs[folio_index].length = PAGE_SIZE; + ap->descs[folio_index].offset = 0; + ap->descs[folio_index].length = PAGE_SIZE; inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK); node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP); @@ -2158,10 +2154,10 @@ static bool fuse_pages_realloc(struct fuse_fill_wb_data *data) return false; memcpy(folios, ap->folios, sizeof(struct folio *) * ap->num_folios); - memcpy(descs, ap->folio_descs, sizeof(struct fuse_folio_desc) * ap->num_folios); + memcpy(descs, ap->descs, sizeof(struct fuse_folio_desc) * ap->num_folios); kfree(ap->folios); ap->folios = folios; - ap->folio_descs = descs; + ap->descs = descs; data->max_folios = nfolios; return true; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 87bedbca529d..51e95eac47ff 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -263,12 +263,6 @@ struct fuse_arg { void *value; }; -/** FUSE page descriptor */ -struct fuse_page_desc { - unsigned int length; - unsigned int offset; -}; - /** FUSE folio descriptor */ struct fuse_folio_desc { unsigned int length; @@ -303,19 +297,9 @@ struct fuse_args { struct fuse_args_pages { struct fuse_args args; - union { - struct { - struct page **pages; - struct fuse_page_desc *descs; - unsigned int num_pages; - }; - struct { - struct folio **folios; - struct fuse_folio_desc *folio_descs; - unsigned int num_folios; - }; - }; - bool uses_folios; + struct folio **folios; + struct fuse_folio_desc *descs; + unsigned int num_folios; }; #define FUSE_ARGS(args) struct fuse_args args = {} diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c index dc3e7c8ff97b..27115c618e94 100644 --- a/fs/fuse/ioctl.c +++ b/fs/fuse/ioctl.c @@ -201,12 +201,12 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, BUILD_BUG_ON(sizeof(struct fuse_ioctl_iovec) 
* FUSE_IOCTL_MAX_IOV > PAGE_SIZE); err = -ENOMEM; - ap.folios = fuse_folios_alloc(fm->fc->max_pages, GFP_KERNEL, &ap.folio_descs); + ap.folios = fuse_folios_alloc(fm->fc->max_pages, GFP_KERNEL, &ap.descs); iov_page = (struct iovec *) __get_free_page(GFP_KERNEL); if (!ap.folios || !iov_page) goto out; - fuse_folio_descs_length_init(ap.folio_descs, 0, fm->fc->max_pages); + fuse_folio_descs_length_init(ap.descs, 0, fm->fc->max_pages); /* * If restricted, initialize IO parameters as encoded in @cmd. @@ -244,7 +244,6 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, err = -ENOMEM; if (max_pages > fm->fc->max_pages) goto out; - ap.uses_folios = true; while (ap.num_folios < max_pages) { ap.folios[ap.num_folios] = folio_alloc(GFP_KERNEL | __GFP_HIGHMEM, 0); if (!ap.folios[ap.num_folios]) diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c index 32604211403c..70dbdc4725e0 100644 --- a/fs/fuse/readdir.c +++ b/fs/fuse/readdir.c @@ -346,10 +346,9 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx) plus = fuse_use_readdirplus(inode, ctx); ap->args.out_pages = true; - ap->uses_folios = true; ap->num_folios = 1; ap->folios = &folio; - ap->folio_descs = &desc; + ap->descs = &desc; if (plus) { attr_version = fuse_get_attr_version(fm->fc); fuse_read_args_fill(&ia, file, ctx->pos, PAGE_SIZE, diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index f974c7896783..591fb2beacde 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -623,7 +623,6 @@ static void virtio_fs_request_complete(struct fuse_req *req, struct fuse_args *args; struct fuse_args_pages *ap; unsigned int len, i, thislen; - struct page *page; struct folio *folio; /* @@ -636,29 +635,15 @@ static void virtio_fs_request_complete(struct fuse_req *req, if (args->out_pages && args->page_zeroing) { len = args->out_args[args->out_numargs - 1].size; ap = container_of(args, typeof(*ap), args); - if (ap->uses_folios) { - for (i = 0; i < ap->num_folios; i++) { - 
thislen = ap->folio_descs[i].length; - if (len < thislen) { - WARN_ON(ap->folio_descs[i].offset); - folio = ap->folios[i]; - folio_zero_segment(folio, len, thislen); - len = 0; - } else { - len -= thislen; - } - } - } else { - for (i = 0; i < ap->num_pages; i++) { - thislen = ap->descs[i].length; - if (len < thislen) { - WARN_ON(ap->descs[i].offset); - page = ap->pages[i]; - zero_user_segment(page, len, thislen); - len = 0; - } else { - len -= thislen; - } + for (i = 0; i < ap->num_folios; i++) { + thislen = ap->descs[i].length; + if (len < thislen) { + WARN_ON(ap->descs[i].offset); + folio = ap->folios[i]; + folio_zero_segment(folio, len, thislen); + len = 0; + } else { + len -= thislen; } } } @@ -1147,22 +1132,16 @@ __releases(fiq->lock) } /* Count number of scatter-gather elements required */ -static unsigned int sg_count_fuse_pages(struct fuse_args_pages *ap, - unsigned int total_len) +static unsigned int sg_count_fuse_folios(struct fuse_folio_desc *folio_descs, + unsigned int num_folios, + unsigned int total_len) { unsigned int i; unsigned int this_len; - if (ap->uses_folios) { - for (i = 0; i < ap->num_folios && total_len; i++) { - this_len = min(ap->folio_descs[i].length, total_len); - total_len -= this_len; - } - } else { - for (i = 0; i < ap->num_pages && total_len; i++) { - this_len = min(ap->descs[i].length, total_len); - total_len -= this_len; - } + for (i = 0; i < num_folios && total_len; i++) { + this_len = min(folio_descs[i].length, total_len); + total_len -= this_len; } return i; @@ -1180,7 +1159,8 @@ static unsigned int sg_count_fuse_req(struct fuse_req *req) if (args->in_pages) { size = args->in_args[args->in_numargs - 1].size; - total_sgs += sg_count_fuse_pages(ap, size); + total_sgs += sg_count_fuse_folios(ap->descs, ap->num_folios, + size); } if (!test_bit(FR_ISREPLY, &req->flags)) @@ -1193,35 +1173,28 @@ static unsigned int sg_count_fuse_req(struct fuse_req *req) if (args->out_pages) { size = args->out_args[args->out_numargs - 1].size; - 
total_sgs += sg_count_fuse_pages(ap, size); + total_sgs += sg_count_fuse_folios(ap->descs, ap->num_folios, + size); } return total_sgs; } -/* Add pages/folios to scatter-gather list and return number of elements used */ -static unsigned int sg_init_fuse_pages(struct scatterlist *sg, - struct fuse_args_pages *ap, - unsigned int total_len) +/* Add folios to scatter-gather list and return number of elements used */ +static unsigned int sg_init_fuse_folios(struct scatterlist *sg, + struct folio **folios, + struct fuse_folio_desc *folio_descs, + unsigned int num_folios, + unsigned int total_len) { unsigned int i; unsigned int this_len; - if (ap->uses_folios) { - for (i = 0; i < ap->num_folios && total_len; i++) { - sg_init_table(&sg[i], 1); - this_len = min(ap->folio_descs[i].length, total_len); - sg_set_folio(&sg[i], ap->folios[i], this_len, - ap->folio_descs[i].offset); - total_len -= this_len; - } - } else { - for (i = 0; i < ap->num_pages && total_len; i++) { - sg_init_table(&sg[i], 1); - this_len = min(ap->descs[i].length, total_len); - sg_set_page(&sg[i], ap->pages[i], this_len, ap->descs[i].offset); - total_len -= this_len; - } + for (i = 0; i < num_folios && total_len; i++) { + sg_init_table(&sg[i], 1); + this_len = min(folio_descs[i].length, total_len); + sg_set_folio(&sg[i], folios[i], this_len, folio_descs[i].offset); + total_len -= this_len; } return i; @@ -1245,8 +1218,10 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg, sg_init_one(&sg[total_sgs++], argbuf, len); if (argpages) - total_sgs += sg_init_fuse_pages(&sg[total_sgs], ap, - args[numargs - 1].size); + total_sgs += sg_init_fuse_folios(&sg[total_sgs], + ap->folios, ap->descs, + ap->num_folios, + args[numargs - 1].size); if (len_used) *len_used = len; -- 2.46.1
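The folio-only counting helper introduced above can be modelled outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct folio_desc`, `min_uint()` and `sg_count_folios()` are hypothetical stand-ins for `struct fuse_folio_desc`, `min()` and `sg_count_fuse_folios()`. It shows that one scatter-gather element is counted per descriptor until the payload length is used up.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct fuse_folio_desc (same two fields). */
struct folio_desc {
	unsigned int length;
	unsigned int offset;
};

/* Local replacement for the kernel's min() macro (assumption). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Count the scatter-gather elements needed to carry total_len bytes:
 * one element per folio descriptor, stopping once the payload is
 * exhausted or the descriptors run out.
 */
static unsigned int sg_count_folios(const struct folio_desc *descs,
				    unsigned int num_folios,
				    unsigned int total_len)
{
	unsigned int i;

	assert(descs != NULL || num_folios == 0);

	for (i = 0; i < num_folios && total_len; i++)
		total_len -= min_uint(descs[i].length, total_len);

	return i;
}
```

With three 4096-byte descriptors, a 5000-byte payload needs two elements, and a zero-length payload needs none.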

From: Joanne Koong <joannelkoong@gmail.com>

For the direct io case, the pages from userspace may be part of a huge
folio, even if all folios in the page cache for fuse are small.

Fix the logic for calculating the offset and length of the folio for
the direct io case, which currently incorrectly assumes that all folios
encountered are one page size.

Fixes: 3b97c3652d91 ("fuse: convert direct io to use folios")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5eb1571960f5..a1bf0ad5ffeb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1476,18 +1476,22 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 
 		nbytes += ret;
 
-		ret += start;
-		/* Currently, all folios in FUSE are one page */
-		nfolios = DIV_ROUND_UP(ret, PAGE_SIZE);
-
-		ap->descs[ap->num_folios].offset = start;
-		fuse_folio_descs_length_init(ap->descs, ap->num_folios, nfolios);
-		for (i = 0; i < nfolios; i++)
-			ap->folios[i + ap->num_folios] = page_folio(pages[i]);
-
-		ap->num_folios += nfolios;
-		ap->descs[ap->num_folios - 1].length -=
-			(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
+		nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
+
+		for (i = 0; i < nfolios; i++) {
+			struct folio *folio = page_folio(pages[i]);
+			unsigned int offset = start +
+				(folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
+			unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
+
+			ap->descs[ap->num_folios].offset = offset;
+			ap->descs[ap->num_folios].length = len;
+			ap->folios[ap->num_folios] = folio;
+			start = 0;
+			ret -= len;
+			ap->num_folios++;
+		}
+
 		nr_pages += nfolios;
 	}
 	kfree(pages);
-- 
2.46.1
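The per-page descriptor arithmetic in this fix can be checked with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `fill_dio_descs()` is a hypothetical helper, `page_idx[i]` stands in for `folio_page_idx(folio, pages[i])`, and a 4 KiB page size is assumed. Each pinned page gets its own descriptor, whose offset is expressed relative to the start of the containing folio.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12u			/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Userspace stand-in for struct fuse_folio_desc. */
struct folio_desc {
	unsigned int length;
	unsigned int offset;
};

/*
 * Fill one descriptor per pinned page.
 *   page_idx[i]: index of page i inside its folio (hypothetical input,
 *                replaces folio_page_idx() in the kernel code)
 *   start:       byte offset into the first page
 *   ret:         number of bytes pinned
 * Returns the number of descriptors written.
 */
static unsigned int fill_dio_descs(struct folio_desc *descs,
				   const unsigned int *page_idx,
				   unsigned int start, unsigned int ret)
{
	unsigned int nfolios = (ret + start + PAGE_SIZE - 1) / PAGE_SIZE;
	unsigned int i;

	assert(start < PAGE_SIZE);

	for (i = 0; i < nfolios; i++) {
		unsigned int avail = PAGE_SIZE - start;
		unsigned int len = ret < avail ? ret : avail;

		/* Offset is relative to the folio, not to the page. */
		descs[i].offset = start + (page_idx[i] << PAGE_SHIFT);
		descs[i].length = len;
		start = 0;	/* only the first page starts mid-page */
		ret -= len;
	}
	return nfolios;
}
```

For example, 5000 bytes starting 100 bytes into page 3 of a large folio yield two descriptors: 3996 bytes at folio offset 12388, then 1004 bytes at folio offset 16384.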

From: Bernd Schubert <bschubert@ddn.com>

In fuse_get_user_pages(), set *nbytesp to 0 when struct page **pages
allocation fails. This prevents the caller (fuse_direct_io) from making
incorrect assumptions that could lead to NULL pointer dereferences
when processing the request reply.

Previously, *nbytesp was left unmodified on allocation failure, which
could cause issues if the caller assumed pages had been added to
ap->descs[] when they hadn't.

Reported-by: syzbot+87b8e6ed25dbc41759f7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87b8e6ed25dbc41759f7
Fixes: 3b97c3652d91 ("fuse: convert direct io to use folios")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Tested-by: Dmitry Antipov <dmantipov@yandex.ru>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a1bf0ad5ffeb..f25874a4b5de 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1460,8 +1460,10 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	 */
 	struct page **pages = kzalloc(max_pages * sizeof(struct page *),
 				      GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
+	if (!pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	while (nbytes < *nbytesp && nr_pages < max_pages) {
 		unsigned nfolios, i;
@@ -1507,6 +1509,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	else
 		ap->args.out_pages = true;
 
+out:
 	*nbytesp = nbytes;
 
 	return ret < 0 ? ret : 0;
-- 
2.46.1
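The effect of routing the failure path through the common exit label can be illustrated with a userspace model. This is only a sketch: `get_user_pages_model()` and its `fail_alloc` flag are hypothetical, `calloc()` stands in for `kzalloc()`, and the pinning step is reduced to "all requested bytes were pinned". The point is that the in/out byte count is always written back, so it reads 0 when the allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for fuse_get_user_pages(). *nbytesp holds the
 * number of bytes requested on entry and the number of bytes actually
 * pinned on return. fail_alloc simulates the allocation failure.
 */
static int get_user_pages_model(size_t *nbytesp, size_t max_pages,
				int fail_alloc)
{
	size_t nbytes = 0;
	int ret = 0;
	void **pages = fail_alloc ? NULL : calloc(max_pages, sizeof(*pages));

	if (!pages) {
		ret = -ENOMEM;
		goto out;	/* fall through to the write-back below */
	}

	/* Simplification: pretend every requested byte was pinned. */
	nbytes = *nbytesp;
	free(pages);

out:
	/* Always report how much was really set up, even on failure. */
	*nbytesp = nbytes;

	return ret < 0 ? ret : 0;
}
```

A caller that passes in 8192 and hits the failure path sees `-ENOMEM` together with a byte count of 0, rather than a stale 8192.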

From: Joanne Koong <joannelkoong@gmail.com> Currently, all folios associated with fuse are one page size. As part of the work to enable large folios, this commit adds support for copying to/from folios larger than one page size. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> --- fs/fuse/dev.c | 86 ++++++++++++++++++++++----------------------------- 1 file changed, 37 insertions(+), 49 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 875d7ecd2fde..377097a1d8c8 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -652,7 +652,7 @@ struct fuse_copy_state { struct page *pg; unsigned len; unsigned offset; - unsigned move_pages:1; + unsigned move_folios:1; }; static void fuse_copy_init(struct fuse_copy_state *cs, int write, @@ -791,10 +791,10 @@ static int fuse_check_folio(struct folio *folio) * folio that was originally in @pagep will lose a reference and the new * folio returned in @pagep will carry a reference. */ -static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) +static int fuse_try_move_folio(struct fuse_copy_state *cs, struct folio **foliop) { int err; - struct folio *oldfolio = page_folio(*pagep); + struct folio *oldfolio = *foliop; struct folio *newfolio; struct pipe_buffer *buf = cs->pipebufs; @@ -815,7 +815,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) cs->pipebufs++; cs->nr_segs--; - if (cs->len != PAGE_SIZE) + if (cs->len != folio_size(oldfolio)) goto out_fallback; if (!pipe_buf_try_steal(cs->pipe, buf)) @@ -861,7 +861,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) if (test_bit(FR_ABORTED, &cs->req->flags)) err = -ENOENT; else - *pagep = &newfolio->page; + *foliop = newfolio; spin_unlock(&cs->req->waitq.lock); if (err) { @@ -894,8 +894,8 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) goto out_put_old; } -static int fuse_ref_page(struct fuse_copy_state *cs, 
struct page *page, - unsigned offset, unsigned count) +static int fuse_ref_folio(struct fuse_copy_state *cs, struct folio *folio, + unsigned offset, unsigned count) { struct pipe_buffer *buf; int err; @@ -903,17 +903,17 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; - get_page(page); + folio_get(folio); err = unlock_request(cs->req); if (err) { - put_page(page); + folio_put(folio); return err; } fuse_copy_finish(cs); buf = cs->pipebufs; - buf->page = page; + buf->page = &folio->page; buf->offset = offset; buf->len = count; @@ -925,20 +925,21 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, } /* - * Copy a page in the request to/from the userspace buffer. Must be + * Copy a folio in the request to/from the userspace buffer. Must be * done atomically */ -static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, - unsigned offset, unsigned count, int zeroing) +static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, + unsigned offset, unsigned count, int zeroing) { int err; - struct page *page = *pagep; + struct folio *folio = *foliop; + size_t size = folio_size(folio); - if (page && zeroing && count < PAGE_SIZE) - clear_highpage(page); + if (folio && zeroing && count < size) + folio_zero_range(folio, 0, size); while (count) { - if (cs->write && cs->pipebufs && page) { + if (cs->write && cs->pipebufs && folio) { /* * Can't control lifetime of pipe buffers, so always * copy user pages. 
@@ -948,12 +949,12 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, if (err) return err; } else { - return fuse_ref_page(cs, page, offset, count); + return fuse_ref_folio(cs, folio, offset, count); } } else if (!cs->len) { - if (cs->move_pages && page && - offset == 0 && count == PAGE_SIZE) { - err = fuse_try_move_page(cs, pagep); + if (cs->move_folios && folio && + offset == 0 && count == folio_size(folio)) { + err = fuse_try_move_folio(cs, foliop); if (err <= 0) return err; } else { @@ -962,22 +963,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, return err; } } - if (page) { - void *mapaddr = kmap_local_page(page); - void *buf = mapaddr + offset; + if (folio) { + void *mapaddr = kmap_local_folio(folio, offset); + void *buf = mapaddr; offset += fuse_copy_do(cs, &buf, &count); kunmap_local(mapaddr); } else offset += fuse_copy_do(cs, NULL, &count); } - if (page && !cs->write) - flush_dcache_page(page); + if (folio && !cs->write) + flush_dcache_folio(folio); return 0; } -/* Copy pages in the request to/from userspace buffer */ -static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, - int zeroing) +/* Copy folios in the request to/from userspace buffer */ +static int fuse_copy_folios(struct fuse_copy_state *cs, unsigned nbytes, + int zeroing) { unsigned i; struct fuse_req *req = cs->req; @@ -987,23 +988,12 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, int err; unsigned int offset = ap->descs[i].offset; unsigned int count = min(nbytes, ap->descs[i].length); - struct page *orig, *pagep; - - orig = pagep = &ap->folios[i]->page; - err = fuse_copy_page(cs, &pagep, offset, count, zeroing); + err = fuse_copy_folio(cs, &ap->folios[i], offset, count, zeroing); if (err) return err; nbytes -= count; - - /* - * fuse_copy_page may have moved a page from a pipe instead of - * copying into our given page, so update the folios if it was - * replaced. 
- */ - if (pagep != orig) - ap->folios[i] = page_folio(pagep); } return 0; } @@ -1033,7 +1023,7 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, for (i = 0; !err && i < numargs; i++) { struct fuse_arg *arg = &args[i]; if (i == numargs - 1 && argpages) - err = fuse_copy_pages(cs, arg->size, zeroing); + err = fuse_copy_folios(cs, arg->size, zeroing); else err = fuse_copy_one(cs, arg->value, arg->size); } @@ -1621,7 +1611,6 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, num = outarg.size; while (num) { struct folio *folio; - struct page *page; unsigned int this_num; folio = filemap_grab_folio(mapping, index); @@ -1629,9 +1618,8 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, if (IS_ERR(folio)) goto out_iput; - page = &folio->page; this_num = min_t(unsigned, num, folio_size(folio) - offset); - err = fuse_copy_page(cs, &page, offset, this_num, 0); + err = fuse_copy_folio(cs, &folio, offset, this_num, 0); if (!folio_test_uptodate(folio) && !err && offset == 0 && (this_num == folio_size(folio) || file_size == end)) { folio_zero_segment(folio, this_num, folio_size(folio)); @@ -1795,8 +1783,8 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size, static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code, unsigned int size, struct fuse_copy_state *cs) { - /* Don't try to move pages (yet) */ - cs->move_pages = 0; + /* Don't try to move folios (yet) */ + cs->move_folios = 0; switch (code) { case FUSE_NOTIFY_POLL: @@ -1934,7 +1922,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, spin_unlock(&fpq->lock); cs->req = req; if (!req->args->page_replace) - cs->move_pages = 0; + cs->move_folios = 0; if (oh.error) err = nbytes != sizeof(oh) ? 
-EINVAL : 0; @@ -2053,7 +2041,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, cs.pipe = pipe; if (flags & SPLICE_F_MOVE) - cs.move_pages = 1; + cs.move_folios = 1; ret = fuse_dev_do_write(fud, &cs, len); -- 2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for retrieves.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/dev.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 377097a1d8c8..a535e954f1ae 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1672,7 +1672,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	unsigned int num;
 	unsigned int offset;
 	size_t total_len = 0;
-	unsigned int num_pages, cur_pages = 0;
+	unsigned int num_pages;
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_retrieve_args *ra;
 	size_t args_size = sizeof(*ra);
@@ -1690,6 +1690,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 
 	num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	num_pages = min(num_pages, fc->max_pages);
+	num = min(num, num_pages << PAGE_SHIFT);
 
 	args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0]));
 
@@ -1710,25 +1711,29 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 
 	index = outarg->offset >> PAGE_SHIFT;
 
-	while (num && cur_pages < num_pages) {
+	while (num) {
 		struct folio *folio;
-		unsigned int this_num;
+		unsigned int folio_offset;
+		unsigned int nr_bytes;
+		unsigned int nr_pages;
 
 		folio = filemap_get_folio(mapping, index);
 		if (IS_ERR(folio))
 			break;
 
-		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		nr_bytes = min(folio_size(folio) - folio_offset, num);
+		nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
 		ap->folios[ap->num_folios] = folio;
-		ap->descs[ap->num_folios].offset = offset;
-		ap->descs[ap->num_folios].length = this_num;
+		ap->descs[ap->num_folios].offset = folio_offset;
+		ap->descs[ap->num_folios].length = nr_bytes;
 		ap->num_folios++;
-		cur_pages++;
 
 		offset = 0;
-		num -= this_num;
-		total_len += this_num;
-		index++;
+		num -= nr_bytes;
+		total_len += nr_bytes;
+		index += nr_pages;
 	}
 	ra->inarg.offset = outarg->offset;
 	ra->inarg.size = total_len;
-- 
2.46.1
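The loop in this patch advances by whole folios rather than single pages. A userspace sketch of that walk follows. It rests on assumptions that are not in the patch: `split_range()` is a hypothetical helper, pages are 4 KiB, and every folio is taken to be naturally aligned with the same power-of-two page count (`pages_per_folio`), which replaces the `filemap_get_folio()` lookup and `folio->index`.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12u			/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Userspace stand-in for struct fuse_folio_desc. */
struct folio_desc {
	unsigned int length;
	unsigned int offset;
};

/*
 * Split num bytes, starting offset bytes into page `index`, into one
 * descriptor per folio. Folios are modelled as aligned runs of
 * pages_per_folio pages (hypothetical uniform layout). The max_descs
 * bound is an addition of this sketch.
 */
static unsigned int split_range(struct folio_desc *descs,
				unsigned int max_descs,
				unsigned long index, unsigned int offset,
				unsigned int num,
				unsigned int pages_per_folio)
{
	unsigned int n = 0;

	assert(offset < PAGE_SIZE);
	assert(pages_per_folio && !(pages_per_folio & (pages_per_folio - 1)));

	while (num && n < max_descs) {
		/* First page index of the folio containing `index`. */
		unsigned long first =
			index & ~((unsigned long)pages_per_folio - 1);
		unsigned int folio_bytes = pages_per_folio << PAGE_SHIFT;
		unsigned int folio_offset =
			((unsigned int)(index - first) << PAGE_SHIFT) + offset;
		unsigned int room = folio_bytes - folio_offset;
		unsigned int nr_bytes = room < num ? room : num;
		unsigned int nr_pages =
			(offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;

		descs[n].offset = folio_offset;
		descs[n].length = nr_bytes;
		n++;

		offset = 0;		/* later folios start at byte 0 */
		num -= nr_bytes;
		index += nr_pages;	/* skip the pages just covered */
	}
	return n;
}
```

With 16 KiB folios, a 20000-byte range starting 100 bytes into page 2 produces two descriptors: 8092 bytes at folio offset 8292, then 11908 bytes at offset 0 of the next folio. With single-page folios the same code degenerates to the old page-at-a-time behaviour.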

From: Joanne Koong <joannelkoong@gmail.com>

Refactor the logic in fuse_fill_write_pages() for copying out write
data. This will make the future change for supporting large folios for
writes easier. No functional changes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f25874a4b5de..3c0d0eb33d1a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1215,21 +1215,21 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 	struct fuse_args_pages *ap = &ia->ap;
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
-	unsigned int nr_pages = 0;
 	size_t count = 0;
+	unsigned int num;
 	int err;
 
+	num = min(iov_iter_count(ii), fc->max_write);
+	num = min(num, max_pages << PAGE_SHIFT);
+
 	ap->args.in_pages = true;
 	ap->descs[0].offset = offset;
 
-	do {
+	while (num) {
 		size_t tmp;
 		struct folio *folio;
 		pgoff_t index = pos >> PAGE_SHIFT;
-		size_t bytes = min_t(size_t, PAGE_SIZE - offset,
-				     iov_iter_count(ii));
-
-		bytes = min_t(size_t, bytes, fc->max_write - count);
+		unsigned int bytes = min(PAGE_SIZE - offset, num);
 
  again:
 		err = -EFAULT;
@@ -1259,10 +1259,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		ap->folios[ap->num_folios] = folio;
 		ap->descs[ap->num_folios].length = tmp;
 		ap->num_folios++;
-		nr_pages++;
 
 		count += tmp;
 		pos += tmp;
+		num -= tmp;
 		offset += tmp;
 		if (offset == PAGE_SIZE)
 			offset = 0;
@@ -1279,8 +1279,9 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		}
 		if (!fc->big_writes)
 			break;
-	} while (iov_iter_count(ii) && count < fc->max_write &&
-		 nr_pages < max_pages && offset == 0);
+		if (offset != 0)
+			break;
+	}
 
 	return count > 0 ? count : err;
 }
-- 
2.46.1
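The refactored loop computes a byte budget once, up front, and then consumes it page by page. A userspace sketch of that shape follows. It is an illustration under assumptions, not the kernel function: `plan_write_chunks()` and `min_size()` are hypothetical, pages are 4 KiB, every copy is assumed to succeed in full (so `tmp == bytes`), and the `max_chunks` bound is added only to keep the model's output array safe.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12u			/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Local replacement for the kernel's min() macro (assumption). */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Record the byte count chosen for each loop iteration in chunk[] and
 * return the number of iterations.
 *   iter_count: bytes left in the source iterator
 *   max_write:  per-request byte limit
 *   max_pages:  per-request page limit
 *   offset:     byte offset into the first page
 */
static unsigned int plan_write_chunks(size_t iter_count, size_t max_write,
				      unsigned int max_pages,
				      unsigned int offset,
				      size_t *chunk, unsigned int max_chunks)
{
	/* Budget is fixed before the loop, as in the refactored code. */
	size_t num = min_size(iter_count, max_write);
	unsigned int n = 0;

	assert(offset < PAGE_SIZE);
	num = min_size(num, (size_t)max_pages << PAGE_SHIFT);

	while (num && n < max_chunks) {
		size_t bytes = min_size(PAGE_SIZE - offset, num);

		chunk[n++] = bytes;
		num -= bytes;
		offset = (unsigned int)((offset + bytes) & (PAGE_SIZE - 1));

		/* Stop once a chunk ends in the middle of a page. */
		if (offset != 0)
			break;
	}
	return n;
}
```

For a page-aligned 10000-byte write with `max_pages` of 2, the budget is clamped to 8192 bytes and consumed as two 4096-byte chunks.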

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for writethrough
writes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3c0d0eb33d1a..fb35ee7b222a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1212,6 +1212,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 				     struct iov_iter *ii, loff_t pos,
 				     unsigned int max_pages)
 {
+	size_t max_folio_size = mapping_max_folio_size(mapping);
 	struct fuse_args_pages *ap = &ia->ap;
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
@@ -1223,17 +1224,17 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 	num = min(num, max_pages << PAGE_SHIFT);
 
 	ap->args.in_pages = true;
-	ap->descs[0].offset = offset;
 
 	while (num) {
 		size_t tmp;
 		struct folio *folio;
 		pgoff_t index = pos >> PAGE_SHIFT;
-		unsigned int bytes = min(PAGE_SIZE - offset, num);
+		unsigned int bytes;
+		unsigned int folio_offset;
 
  again:
 		err = -EFAULT;
-		if (fault_in_iov_iter_readable(ii, bytes))
+		if (fault_in_iov_iter_readable(ii, max_folio_size) == max_folio_size)
 			break;
 
 		folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
@@ -1246,7 +1247,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
-		tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		bytes = min(folio_size(folio) - folio_offset, num);
+
+		tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
 		flush_dcache_folio(folio);
 
 		if (!tmp) {
@@ -1257,6 +1261,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 
 		err = 0;
 		ap->folios[ap->num_folios] = folio;
+		ap->descs[ap->num_folios].offset = folio_offset;
 		ap->descs[ap->num_folios].length = tmp;
 		ap->num_folios++;
 
@@ -1264,11 +1269,11 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		pos += tmp;
 		num -= tmp;
 		offset += tmp;
-		if (offset == PAGE_SIZE)
+		if (offset == folio_size(folio))
 			offset = 0;
 
-		/* If we copied full page, mark it uptodate */
-		if (tmp == PAGE_SIZE)
+		/* If we copied full folio, mark it uptodate */
+		if (tmp == folio_size(folio))
 			folio_mark_uptodate(folio);
 
 		if (folio_test_uptodate(folio)) {
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for folio reads into
the page cache.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fb35ee7b222a..2e69323e3e80 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -864,7 +864,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	struct inode *inode = folio->mapping->host;
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	loff_t pos = folio_pos(folio);
-	struct fuse_folio_desc desc = { .length = PAGE_SIZE };
+	struct fuse_folio_desc desc = { .length = folio_size(folio) };
 	struct fuse_io_args ia = {
 		.ap.args.page_zeroing = true,
 		.ap.args.out_pages = true,
-- 
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Support large folios for symlinks and change the name from
fuse_readlink_page() to fuse_readlink_folio().

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/dir.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index b50a091b1dfb..f140b609e6a0 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1561,10 +1561,10 @@ static int fuse_permission(struct mnt_idmap *idmap,
 	return err;
 }

-static int fuse_readlink_page(struct inode *inode, struct folio *folio)
+static int fuse_readlink_folio(struct inode *inode, struct folio *folio)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
-	struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
+	struct fuse_folio_desc desc = { .length = folio_size(folio) - 1 };
 	struct fuse_args_pages ap = {
 		.num_folios = 1,
 		.folios = &folio,
@@ -1619,7 +1619,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 	if (!folio)
 		goto out_err;

-	err = fuse_readlink_page(inode, folio);
+	err = fuse_readlink_folio(inode, folio);
 	if (err) {
 		folio_put(folio);
 		goto out_err;
@@ -2172,7 +2172,7 @@ void fuse_init_dir(struct inode *inode)

 static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
 {
-	int err = fuse_readlink_page(folio->mapping->host, folio);
+	int err = fuse_readlink_folio(folio->mapping->host, folio);

 	if (!err)
 		folio_mark_uptodate(folio);
--
2.46.1

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for stores.
Also change variable naming from "this_num" to "nr_bytes".

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/dev.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a535e954f1ae..69033de18c90 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1611,18 +1611,23 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 	num = outarg.size;
 	while (num) {
 		struct folio *folio;
-		unsigned int this_num;
+		unsigned int folio_offset;
+		unsigned int nr_bytes;
+		unsigned int nr_pages;

 		folio = filemap_grab_folio(mapping, index);
 		err = PTR_ERR(folio);
 		if (IS_ERR(folio))
 			goto out_iput;

-		this_num = min_t(unsigned, num, folio_size(folio) - offset);
-		err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		nr_bytes = min_t(unsigned, num, folio_size(folio) - folio_offset);
+		nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+		err = fuse_copy_folio(cs, &folio, folio_offset, nr_bytes, 0);
 		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
-		    (this_num == folio_size(folio) || file_size == end)) {
-			folio_zero_segment(folio, this_num, folio_size(folio));
+		    (nr_bytes == folio_size(folio) || file_size == end)) {
+			folio_zero_segment(folio, nr_bytes, folio_size(folio));
 			folio_mark_uptodate(folio);
 		}
 		folio_unlock(folio);
@@ -1631,9 +1636,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 		if (err)
 			goto out_iput;

-		num -= this_num;
+		num -= nr_bytes;
 		offset = 0;
-		index++;
+		index += nr_pages;
 	}

 	err = 0;
--
2.46.1
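
Note on the store loop above: the point of `nr_pages` is that the page-cache index has to advance by every page a copy touched, not by one. A rough userspace model of the loop follows. It is a sketch under stated assumptions: 4 KiB pages, every folio has the same order and is naturally aligned (so the folio's first index is the current index rounded down), and `toy_store_steps()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/*
 * Walk a store of num bytes starting at file position pos and return how
 * many folio copies it takes. *end_index receives the index after the
 * last page touched.
 */
static unsigned int toy_store_steps(unsigned long long pos, size_t num,
				    unsigned int order, unsigned long *end_index)
{
	unsigned long index = (unsigned long)(pos >> TOY_PAGE_SHIFT);
	size_t offset = (size_t)(pos & (TOY_PAGE_SIZE - 1));
	size_t folio_size = TOY_PAGE_SIZE << order;
	unsigned int steps = 0;

	assert(end_index);
	while (num) {
		/* assumption: folios are naturally aligned */
		unsigned long folio_index = index & ~((1UL << order) - 1);
		size_t folio_offset = ((index - folio_index) << TOY_PAGE_SHIFT) + offset;
		size_t room = folio_size - folio_offset;
		size_t nr_bytes = num < room ? num : room;
		size_t nr_pages = (offset + nr_bytes + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;

		num -= nr_bytes;
		offset = 0;
		index += nr_pages;	/* skip every page this copy touched */
		steps++;
	}
	*end_index = index;
	return steps;
}
```

Storing 40000 bytes at byte 100 of page 5 takes 10 copies with single-page folios and 3 copies with 16 KiB folios; both walks end at index 15.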

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for queued writes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2e69323e3e80..ceeb9a4a2036 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1800,11 +1800,14 @@ __acquires(fi->lock)
 {
 	struct fuse_writepage_args *aux, *next;
 	struct fuse_inode *fi = get_fuse_inode(wpa->inode);
+	struct fuse_args_pages *ap = &wpa->ia.ap;
 	struct fuse_write_in *inarg = &wpa->ia.write.in;
-	struct fuse_args *args = &wpa->ia.ap.args;
-	/* Currently, all folios in FUSE are one page */
-	__u64 data_size = wpa->ia.ap.num_folios * PAGE_SIZE;
-	int err;
+	struct fuse_args *args = &ap->args;
+	__u64 data_size = 0;
+	int err, i;
+
+	for (i = 0; i < ap->num_folios; i++)
+		data_size += ap->descs[i].length;

 	fi->writectr++;
 	if (inarg->offset + data_size <= size) {
--
2.46.1
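
Note on the data_size change above: once folios can span more than one page, `num_folios * PAGE_SIZE` no longer gives the payload size, so the patch sums the per-folio lengths instead. A small illustrative sketch, where `struct toy_desc` is a hypothetical stand-in for `struct fuse_folio_desc` and `toy_data_size()` is not a kernel function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fuse_folio_desc. */
struct toy_desc {
	unsigned int offset;
	unsigned int length;
};

/* Payload size of a request: the sum of the per-folio lengths. */
static unsigned long long toy_data_size(const struct toy_desc *descs,
					unsigned int num_folios)
{
	unsigned long long data_size = 0;
	unsigned int i;

	assert(descs || !num_folios);
	for (i = 0; i < num_folios; i++)
		data_size += descs[i].length;
	return data_size;
}
```

With three folios of 16384, 4096 and 1000 bytes, the sum is 21480 bytes, whereas the old one-page-per-folio estimate would have been 3 * 4096 = 12288.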

From: Joanne Koong <joannelkoong@gmail.com>

Add support for folios larger than one page size for readahead.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ceeb9a4a2036..f1e4563cfa77 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -954,14 +954,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	fuse_io_free(ia);
 }

-static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+				unsigned int count)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_mount *fm = ff->fm;
 	struct fuse_args_pages *ap = &ia->ap;
 	loff_t pos = folio_pos(ap->folios[0]);
-	/* Currently, all folios in FUSE are one page */
-	size_t count = ap->num_folios << PAGE_SHIFT;
 	ssize_t res;
 	int err;

@@ -999,6 +998,7 @@ static void fuse_readahead(struct readahead_control *rac)
 	unsigned int max_pages, nr_pages;
 	pgoff_t first = readahead_index(rac);
 	pgoff_t last = first + readahead_count(rac) - 1;
+	struct folio *folio = NULL;

 	if (fuse_is_bad(inode))
 		return;
@@ -1022,8 +1022,8 @@ static void fuse_readahead(struct readahead_control *rac)
 	while (nr_pages) {
 		struct fuse_io_args *ia;
 		struct fuse_args_pages *ap;
-		struct folio *folio;
 		unsigned cur_pages = min(max_pages, nr_pages);
+		unsigned int pages = 0;

 		if (fc->num_background >= fc->congestion_threshold &&
 		    rac->ra->async_size >= readahead_count(rac))
@@ -1038,21 +1038,27 @@ static void fuse_readahead(struct readahead_control *rac)
 			return;
 		ap = &ia->ap;

-		while (ap->num_folios < cur_pages) {
-			/*
-			 * This returns a folio with a ref held on it.
-			 * The ref needs to be held until the request is
-			 * completed, since the splice case (see
-			 * fuse_try_move_page()) drops the ref after it's
-			 * replaced in the page cache.
-			 */
-			folio = __readahead_folio(rac);
+		while (pages < cur_pages) {
+			unsigned int folio_pages;
+
+			if (!folio)
+				folio = __readahead_folio(rac);
+
+			folio_pages = folio_nr_pages(folio);
+			if (folio_pages > cur_pages - pages) {
+				if (!pages)
+					return;
+				break;
+			}
+
 			ap->folios[ap->num_folios] = folio;
 			ap->descs[ap->num_folios].length = folio_size(folio);
 			ap->num_folios++;
+			pages += folio_pages;
+			folio = NULL;
 		}
-		fuse_send_readpages(ia, rac->file);
-		nr_pages -= cur_pages;
+		fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
+		nr_pages -= pages;
 	}
 }

--
2.46.1
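
Note on the readahead loop above: requests are now filled by page count rather than by folio count, and a folio that does not fit in the remaining budget is carried over to the next request. The sketch below models only that batching decision in userspace. It is an illustration under assumptions, not the kernel logic: `toy_readahead_batches()` is a hypothetical helper, folios are reduced to their page counts, and request allocation is not modelled.

```c
#include <assert.h>

/*
 * Split a run of folios into requests of at most max_pages pages each.
 * folio_pages[i] is the page count of the i-th folio (assumed >= 1).
 * Returns the number of requests; batch_pages[k] is the size of request k.
 */
static unsigned int toy_readahead_batches(const unsigned int *folio_pages,
					  unsigned int nr_folios,
					  unsigned int max_pages,
					  unsigned int *batch_pages,
					  unsigned int max_batches)
{
	unsigned int nr_pages = 0, nr_batches = 0, next = 0, i;

	assert(folio_pages && batch_pages);
	for (i = 0; i < nr_folios; i++)
		nr_pages += folio_pages[i];

	while (nr_pages && nr_batches < max_batches) {
		unsigned int cur_pages = max_pages < nr_pages ? max_pages : nr_pages;
		unsigned int pages = 0;

		while (pages < cur_pages) {
			if (folio_pages[next] > cur_pages - pages)
				break;	/* does not fit: carry it to the next request */
			pages += folio_pages[next++];
		}
		if (!pages)
			break;	/* a single folio exceeds max_pages: stop, as the patch does */
		batch_pages[nr_batches++] = pages;
		nr_pages -= pages;
	}
	return nr_batches;
}
```

With folios of 4, 4, 2, 1 and 8 pages and max_pages = 8, the model issues three requests of 8, 3 and 8 pages: the 8-page folio cannot join the second request, so that request goes out short.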

From: Joanne Koong <joannelkoong@gmail.com>

Optimize processing folios larger than one page size for the direct io
case. If contiguous pages are part of the same folio, collate the
processing instead of processing each page in the folio separately.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/fuse/file.c | 52 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f1e4563cfa77..23c09556b211 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1478,7 +1478,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	}

 	while (nbytes < *nbytesp && nr_pages < max_pages) {
-		unsigned nfolios, i;
+		struct folio *prev_folio = NULL;
+		unsigned npages, i;
 		size_t start;

 		ret = iov_iter_extract_pages(ii, &pages,
@@ -1490,23 +1491,48 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,

 		nbytes += ret;

-		nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
+		npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);

-		for (i = 0; i < nfolios; i++) {
+		/*
+		 * We must check each extracted page. We can't assume every page
+		 * in a large folio is used. For example, userspace may mmap() a
+		 * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
+		 * a large folio, in which case the extracted pages could be
+		 *
+		 * folio A page 0
+		 * folio A page 1
+		 * folio B page 0
+		 * folio A page 3
+		 *
+		 * where folio A belongs to the file and folio B is an anonymous
+		 * COW page.
+		 */
+		for (i = 0; i < npages && ret; i++) {
 			struct folio *folio = page_folio(pages[i]);
-			unsigned int offset = start +
-				(folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
-			unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
+			unsigned int offset;
+			unsigned int len;
+
+			WARN_ON(!folio);
+
+			len = min_t(unsigned int, ret, PAGE_SIZE - start);
+
+			if (folio == prev_folio && pages[i] != pages[i - 1]) {
+				WARN_ON(ap->folios[ap->num_folios - 1] != folio);
+				ap->descs[ap->num_folios - 1].length += len;
+				WARN_ON(ap->descs[ap->num_folios - 1].length > folio_size(folio));
+			} else {
+				offset = start + (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
+				ap->descs[ap->num_folios].offset = offset;
+				ap->descs[ap->num_folios].length = len;
+				ap->folios[ap->num_folios] = folio;
+				start = 0;
+				ap->num_folios++;
+				prev_folio = folio;
+			}

-			ap->descs[ap->num_folios].offset = offset;
-			ap->descs[ap->num_folios].length = len;
-			ap->folios[ap->num_folios] = folio;
-			start = 0;
 			ret -= len;
-			ap->num_folios++;
 		}
-
-		nr_pages += nfolios;
+		nr_pages += npages;
 	}
 	kfree(pages);

--
2.46.1
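
Note on the collation above: the A0, A1, B0, A3 case from the code comment can be replayed in userspace. This is a sketch under assumptions rather than the kernel code: pages are 4 KiB, `struct toy_page` identifies a page by a folio id and its index inside that folio, `struct toy_desc` stands in for one folio/descriptor pair, and `toy_collate()` is a hypothetical helper that handles a single extraction. It mirrors the patch's condition, which checks only that the folio matches the previous entry and that the page differs from the previous page.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

struct toy_page { int folio; unsigned int idx; };	/* folio id, page index in it */
struct toy_desc { int folio; size_t offset; size_t length; };

/*
 * Collate extracted pages into per-folio entries. start is the byte
 * offset into the first page, ret the number of bytes extracted.
 * Returns the number of entries written to descs.
 */
static unsigned int toy_collate(const struct toy_page *pages, unsigned int npages,
				size_t start, size_t ret, struct toy_desc *descs)
{
	unsigned int nr_descs = 0, i;

	assert(start < TOY_PAGE_SIZE);
	for (i = 0; i < npages && ret; i++) {
		size_t room = TOY_PAGE_SIZE - start;
		size_t len = ret < room ? ret : room;
		int same_folio = nr_descs && descs[nr_descs - 1].folio == pages[i].folio;
		int same_page = i && pages[i].folio == pages[i - 1].folio &&
				pages[i].idx == pages[i - 1].idx;

		if (same_folio && !same_page) {
			descs[nr_descs - 1].length += len;	/* extend the previous entry */
		} else {
			descs[nr_descs].folio = pages[i].folio;
			descs[nr_descs].offset = start + ((size_t)pages[i].idx << TOY_PAGE_SHIFT);
			descs[nr_descs].length = len;
			nr_descs++;
			start = 0;
		}
		ret -= len;
	}
	return nr_descs;
}
```

For the sequence in the comment, the model produces three entries: folio A at offset 0 with 8192 bytes, folio B with 4096 bytes, and folio A again at offset 12288 with 4096 bytes.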

From: Joanne Koong <joannelkoong@gmail.com>

Introduce the capability to dynamically configure the max pages limit
(FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators
to dynamically set the maximum number of pages that can be used for
servicing requests in fuse.

Previously, this was gated by FUSE_MAX_MAX_PAGES, which is statically set
to 256 pages. One result of this is that the buffer size for a write
request is limited to 1 MiB on a 4k-page system.

The default value for this sysctl is the original limit (256 pages).

$ sysctl -a | grep max_pages_limit
fs.fuse.max_pages_limit = 256

$ sysctl -n fs.fuse.max_pages_limit
256

$ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit
1024

$ sysctl -n fs.fuse.max_pages_limit
1024

$ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit
65535

$ sysctl -n fs.fuse.max_pages_limit
65535

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Conflicts:
	fs/fuse/Makefile
	fs/fuse/fuse_i.h
	fs/fuse/ioctl.c
---
 Documentation/admin-guide/sysctl/fs.rst | 10 +++++++
 fs/fuse/Makefile                        |  1 +
 fs/fuse/fuse_i.h                        | 14 +++++++--
 fs/fuse/inode.c                         | 11 ++++++-
 fs/fuse/ioctl.c                         |  2 ++
 fs/fuse/sysctl.c                        | 40 +++++++++++++++++++++++++
 6 files changed, 74 insertions(+), 4 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index a321b84eccaa..297228f9f299 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -332,3 +332,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
 on a 64-bit one.
 The current default value for ``max_user_watches`` is 4% of the
 available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
+setting/getting the maximum number of pages that can be used for servicing
+requests in FUSE.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35c058d..184c199e0487 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o

 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
+fuse-$(CONFIG_SYSCTL) += sysctl.o

 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 51e95eac47ff..3f4be114896a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -36,9 +36,6 @@
 /** Default max number of pages that can be used in a single read request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32

-/** Maximum of max_pages received in init_out */
-#define FUSE_MAX_MAX_PAGES 256
-
 /** Bias for fi->writectr, meaning new writepages must not be sent */
 #define FUSE_NOWRITE INT_MIN

@@ -48,6 +45,9 @@
 /** Number of dentries for each connection in the control filesystem */
 #define FUSE_CTL_NUM_DENTRIES 5

+/** Maximum of max_pages received in init_out */
+extern unsigned int fuse_max_pages_limit;
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;

@@ -1372,4 +1372,12 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);

+#ifdef CONFIG_SYSCTL
+extern int fuse_sysctl_register(void);
+extern void fuse_sysctl_unregister(void);
+#else
+#define fuse_sysctl_register()		(0)
+#define fuse_sysctl_unregister()	do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 735abf426a06..c83513a02f89 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -35,6 +35,8 @@ DEFINE_MUTEX(fuse_mutex);

 static int set_global_limit(const char *val, const struct kernel_param *kp);

+unsigned int fuse_max_pages_limit = 256;
+
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
 		  &max_user_bgreq, 0644);
@@ -944,7 +946,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 	fc->user_ns = get_user_ns(user_ns);
 	fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
-	fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
+	fc->max_pages_limit = fuse_max_pages_limit;

 	INIT_LIST_HEAD(&fc->mounts);
 	list_add(&fm->fc_entry, &fc->mounts);
@@ -2014,8 +2016,14 @@ static int __init fuse_fs_init(void)
 	if (err)
 		goto out3;

+	err = fuse_sysctl_register();
+	if (err)
+		goto out4;
+
 	return 0;

+ out4:
+	unregister_filesystem(&fuse_fs_type);
  out3:
 	unregister_fuseblk();
  out2:
@@ -2026,6 +2034,7 @@ static int __init fuse_fs_init(void)

 static void fuse_fs_cleanup(void)
 {
+	fuse_sysctl_unregister();
 	unregister_filesystem(&fuse_fs_type);
 	unregister_fuseblk();

diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 27115c618e94..33e9558395a3 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -9,6 +9,8 @@
 #include <linux/compat.h>
 #include <linux/fileattr.h>

+#define FUSE_VERITY_ENABLE_ARG_MAX_PAGES 256
+
 static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args,
 			       struct fuse_ioctl_out *outarg)
 {
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
new file mode 100644
index 000000000000..b272bb333005
--- /dev/null
+++ b/fs/fuse/sysctl.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/fs/fuse/fuse_sysctl.c
+ *
+ * Sysctl interface to fuse parameters
+ */
+#include <linux/sysctl.h>
+
+#include "fuse_i.h"
+
+static struct ctl_table_header *fuse_table_header;
+
+/* Bound by fuse_init_out max_pages, which is a u16 */
+static unsigned int sysctl_fuse_max_pages_limit = 65535;
+
+static struct ctl_table fuse_sysctl_table[] = {
+	{
+		.procname	= "max_pages_limit",
+		.data		= &fuse_max_pages_limit,
+		.maxlen		= sizeof(fuse_max_pages_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= &sysctl_fuse_max_pages_limit,
+	},
+};
+
+int fuse_sysctl_register(void)
+{
+	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
+	if (!fuse_table_header)
+		return -ENOMEM;
+	return 0;
+}
+
+void fuse_sysctl_unregister(void)
+{
+	unregister_sysctl_table(fuse_table_header);
+	fuse_table_header = NULL;
+}
--
2.46.1
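
Note on the limits above: the shell session shows writes of 0 and 65536 being rejected while 1 through 65535 are accepted, and the commit message ties 256 pages to a 1 MiB write buffer on a 4k-page system. The sketch below models that observable behaviour in userspace. It is not how `proc_douintvec_minmax` is implemented; `toy_set_max_pages_limit()` and `toy_max_request_bytes()` are hypothetical helpers, and the byte figure ignores request headers.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_PAGES_LIMIT_MIN 1U	/* extra1 = SYSCTL_ONE in the patch */
#define TOY_MAX_PAGES_LIMIT_MAX 65535U	/* extra2: largest value a u16 holds */

static unsigned int toy_max_pages_limit = 256;	/* default from the patch */

/* Accept val only if it lies in [1, 65535]; otherwise leave the limit alone. */
static int toy_set_max_pages_limit(unsigned long val)
{
	if (val < TOY_MAX_PAGES_LIMIT_MIN || val > TOY_MAX_PAGES_LIMIT_MAX)
		return -EINVAL;
	toy_max_pages_limit = (unsigned int)val;
	return 0;
}

/* Largest request payload in bytes for a given page shift. */
static unsigned long long toy_max_request_bytes(unsigned int page_shift)
{
	return (unsigned long long)toy_max_pages_limit << page_shift;
}
```

At the default of 256 pages and a page shift of 12, the payload cap is 256 * 4096 = 1048576 bytes (1 MiB); at the upper bound of 65535 pages it is 268431360 bytes, just under 256 MiB.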

Avoid filling more pages than max_pages in fuse_fill_write_pages().
Overfilling leads to data loss and causes the xfstests generic/522
test case to fail on fuse.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/fuse/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 23c09556b211..989e19bdc3a0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1222,6 +1222,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 	struct fuse_args_pages *ap = &ia->ap;
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
+	unsigned int nr_pages = 0;
 	size_t count = 0;
 	unsigned int num;
 	int err;
@@ -1270,6 +1271,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		ap->descs[ap->num_folios].offset = folio_offset;
 		ap->descs[ap->num_folios].length = tmp;
 		ap->num_folios++;
+		nr_pages += 1 << folio_order(folio);

 		count += tmp;
 		pos += tmp;
@@ -1292,6 +1294,8 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 			break;
 		if (offset != 0)
 			break;
+		if (nr_pages >= max_pages)
+			break;
 	}

 	return count > 0 ? count : err;
--
2.46.1

Feedback: The patch(es) that you sent to the kernel@openeuler.org mailing list have been successfully converted to a pull request.
Pull request link: https://gitee.com/openeuler/kernel/pulls/15457
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/JDH...
Participants (2): Baokun Li, patchwork bot