From: Josef Bacik <josef@toxicpanda.com>
stable inclusion
from stable-v6.6.24
commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q8ZZ
CVE: CVE-2024-35784
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
commit b0ad381fa7690244802aed119b478b4bdafc31dd upstream.
While working on the patchset to remove extent locking, I got a lockdep splat with fiemap and pagefaulting with my new extent lock replacement.
This deadlock exists with our normal code; we just don't have lockdep annotations for the extent locking, so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration, we have the chance of pagefaulting. Because we hold the extent lock for the entire range, we could mkwrite into a range in the file that we have mmap'ed. This would deadlock with the following stack trace:
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock and verified that the deadlock exists with our existing locking.
To fix this, simply don't take the extent lock for the entire duration of the fiemap. This is safe in general because we keep track of where we are when we're searching the tree, so if an ordered extent updates in the middle of our fiemap call we'll still emit the correct extents, because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the delalloc stuff can change during writeback we want to lock the extent range so we have a consistent view of delalloc at the time we're checking to see if we need to set the delalloc flag.
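The narrowed locking described above can be illustrated with a simplified sketch (this is not the actual patch hunk; it condenses the loop body using the lock_extent()/unlock_extent() helpers named in the diff, with surrounding code and error handling omitted):

```c
/*
 * Sketch only: hold the extent lock just around the delalloc lookup,
 * not across the whole fiemap iteration, so a page fault taken while
 * copying an extent to user space can acquire the lock in
 * btrfs_page_mkwrite() without deadlocking against fiemap.
 */
lockstart = round_down(offset, sectorsize);
lockend = round_up(offset + len, sectorsize);

lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
em = btrfs_get_extent_fiemap(inode, offset, len); /* delalloc query */
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);

/* fiemap_fill_next_extent()/copy_to_user() now run with no extent lock held. */
```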
With this patch applied we no longer deadlock with my testcase.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
	fs/btrfs/extent_io.c
[The key to fixing the patch is the fiemap_process_hole function, which
locks only before querying delalloc. Earlier versions do not have this
function, and adaptation requires a lot of refactoring patches. So do
something similar directly in the get_extent_skip_holes function, which
contains the logic to query delalloc.]
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 fs/btrfs/extent_io.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 685a375bb6af..0d7843323930 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4532,11 +4532,30 @@ static struct extent_map *get_extent_skip_holes(struct btrfs_inode *inode,
 		return NULL;
 
 	while (1) {
+		struct extent_state *cached_state = NULL;
+		u64 lockstart;
+		u64 lockend;
+
 		len = last - offset;
 		if (len == 0)
 			break;
 		len = ALIGN(len, sectorsize);
+		lockstart = round_down(offset, sectorsize);
+		lockend = round_up(offset + len, sectorsize);
+
+		/*
+		 * We are only locking for the delalloc range because that's the
+		 * only thing that can change here. With fiemap we have a lock
+		 * on the inode, so no buffered or direct writes can happen.
+		 *
+		 * However mmaps and normal page writeback will cause this to
+		 * change arbitrarily. We have to lock the extent lock here to
+		 * make sure that nobody messes with the tree while we're doing
+		 * btrfs_find_delalloc_in_range.
+		 */
+		lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 		em = btrfs_get_extent_fiemap(inode, offset, len);
+		unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 		if (IS_ERR_OR_NULL(em))
 			return em;
@@ -4679,7 +4698,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	u64 isize = i_size_read(&inode->vfs_inode);
 	struct btrfs_key found_key;
 	struct extent_map *em = NULL;
-	struct extent_state *cached_state = NULL;
 	struct btrfs_path *path;
 	struct btrfs_root *root = inode->root;
 	struct fiemap_cache cache = { 0 };
@@ -4758,9 +4776,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		last_for_get_extent = isize;
 	}
 
-	lock_extent_bits(&inode->io_tree, start, start + len - 1,
-			 &cached_state);
-
 	em = get_extent_skip_holes(inode, start, last_for_get_extent);
 	if (!em)
 		goto out;
@@ -4871,9 +4886,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	ret = emit_last_fiemap_cache(fieinfo, &cache);
 	free_extent_map(em);
 out:
-	unlock_extent_cached(&inode->io_tree, start, start + len - 1,
-			     &cached_state);
-
 out_free_ulist:
 	btrfs_free_path(path);
 	ulist_free(roots);