From: Dave Chinner dchinner@redhat.com
mainline inclusion
from mainline-v5.17-rc6
commit a9a4bc8c76d747aa40b30e2dfc176c781f353a08
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
After 963 iterations of generic/530, it deadlocked during recovery on a pinned inode cluster buffer like so:
    XFS (pmem1): Starting recovery (logdev: internal)
    INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
          Not tainted 5.17.0-rc6-dgc+ #975
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/8:0 state:D stack:13024 pid:306037 ppid: 2 flags:0x00004000
    Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     schedule_timeout+0x114/0x160
     __down+0x99/0xf0
     down+0x5e/0x70
     xfs_buf_lock+0x36/0xf0
     xfs_buf_find+0x418/0x850
     xfs_buf_get_map+0x47/0x380
     xfs_buf_read_map+0x54/0x240
     xfs_trans_read_buf_map+0x1bd/0x490
     xfs_imap_to_bp+0x4f/0x70
     xfs_iunlink_map_ino+0x66/0xd0
     xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
     xfs_iunlink_remove_inode+0xf2/0x1d0
     xfs_inactive_ifree+0x1a3/0x900
     xfs_inode_unlink+0xcc/0x210
     xfs_inodegc_worker+0x1ac/0x2f0
     process_one_work+0x1ac/0x390
     worker_thread+0x56/0x3c0
     kthread+0xf6/0x120
     ret_from_fork+0x1f/0x30
     </TASK>
    task:mount state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     schedule_timeout+0x114/0x160
     __down+0x99/0xf0
     down+0x5e/0x70
     xfs_buf_lock+0x36/0xf0
     xfs_buf_find+0x418/0x850
     xfs_buf_get_map+0x47/0x380
     xfs_buf_read_map+0x54/0x240
     xfs_trans_read_buf_map+0x1bd/0x490
     xfs_imap_to_bp+0x4f/0x70
     xfs_iget+0x300/0xb40
     xlog_recover_process_one_iunlink+0x4c/0x170
     xlog_recover_process_iunlinks.isra.0+0xee/0x130
     xlog_recover_finish+0x57/0x110
     xfs_log_mount_finish+0xfc/0x1e0
     xfs_mountfs+0x540/0x910
     xfs_fs_fill_super+0x495/0x850
     get_tree_bdev+0x171/0x270
     xfs_fs_get_tree+0x15/0x20
     vfs_get_tree+0x24/0xc0
     path_mount+0x304/0xba0
     __x64_sys_mount+0x108/0x140
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
     </TASK>
    task:xfsaild/pmem1 state:D stack:14544 pid:324525 ppid: 2 flags:0x00004000
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     io_schedule+0x4b/0x80
     xfs_buf_wait_unpin+0x9e/0xf0
     __xfs_buf_submit+0x14a/0x230
     xfs_buf_delwri_submit_buffers+0x107/0x280
     xfs_buf_delwri_submit_nowait+0x10/0x20
     xfsaild+0x27e/0x9d0
     kthread+0xf6/0x120
     ret_from_fork+0x1f/0x30
We have the mount process waiting on an inode cluster buffer read, inodegc doing unlink waiting on the same inode cluster buffer, and the AIL push thread blocked in writeback waiting for the inode cluster buffer to become unpinned.
What has happened here is that the AIL push thread has raced with the inodegc process modifying, committing and pinning the inode cluster buffer, at this point in xfs_buf_delwri_submit_buffers():
	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;
Basically, the AIL has found the buffer wasn't pinned and got the lock without blocking, but then the buffer was pinned. This implies the processing here was pre-empted between the pin check and the lock, because the pin count can only be increased while holding the buffer locked. Hence when it has gone to submit the IO, it has blocked waiting for the buffer to be unpinned.
With all executing threads now waiting on the buffer to be unpinned, we normally get out of situations like this via the background log worker issuing a log force which will unpin stuck buffers like this. But at this point in recovery, we haven't started the log worker. In fact, the first thing we do after processing intents and unlinked inodes is *start the log worker*. IOWs, we start it too late to have it break deadlocks like this.
Avoid this and any other similar deadlock vectors in intent and unlinked inode recovery by starting the log worker before we recover intents and unlinked inodes. This part of recovery runs as though the filesystem is fully active, so we really should have the same infrastructure running as we normally do at runtime.
Signed-off-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/xfs_log.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 92c5d6ef47d6..e154c0d44f9c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -782,10 +782,9 @@ xfs_log_mount_finish(
 	 * mount failure occurs.
 	 */
 	mp->m_super->s_flags |= SB_ACTIVE;
+	xfs_log_work_queue(mp);
 	if (xlog_recovery_needed(log))
 		error = xlog_recover_finish(log);
-	if (!error)
-		xfs_log_work_queue(mp);
 	mp->m_super->s_flags &= ~SB_ACTIVE;
 	evict_inodes(mp->m_super);
From: Dave Chinner dchinner@redhat.com
mainline inclusion
from mainline-v5.17-rc6
commit dbd0f5299302f8506637592e2373891a748c6990
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
AIL flushing can get stuck here:
[316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
[316649.007807]       Not tainted 5.17.0-rc6-dgc+ #975
[316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[316649.011720] task:xfsaild/pmem1 state:D stack:14544 pid:324525 ppid: 2 flags:0x00004000
[316649.014112] Call Trace:
[316649.014841]  <TASK>
[316649.015492]  __schedule+0x30d/0x9e0
[316649.017745]  schedule+0x55/0xd0
[316649.018681]  io_schedule+0x4b/0x80
[316649.019683]  xfs_buf_wait_unpin+0x9e/0xf0
[316649.021850]  __xfs_buf_submit+0x14a/0x230
[316649.023033]  xfs_buf_delwri_submit_buffers+0x107/0x280
[316649.024511]  xfs_buf_delwri_submit_nowait+0x10/0x20
[316649.025931]  xfsaild+0x27e/0x9d0
[316649.028283]  kthread+0xf6/0x120
[316649.030602]  ret_from_fork+0x1f/0x30
in the situation where flushing gets preempted between the unpin check and the buffer trylock under nowait conditions:
	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;
This means submission is stuck until something else triggers a log force to unpin the buffer.
To get onto the delwri list to begin with, the buffer pin state has already been checked, and hence it's relatively rare we get a race between flushing and encountering a pinned buffer in delwri submission to begin with. Further, to increase the pin count the buffer has to be locked, so the only way we can hit this race without failing the trylock is to be preempted between the pincount check seeing zero and the trylock being run.
Hence to avoid this problem, just invert the order of trylock vs pin check. We shouldn't hit that many pinned buffers here, so optimising away the trylock for pinned buffers should not matter for performance at all.
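The reordering can be illustrated outside the kernel with a minimal userspace sketch. Everything here is hypothetical: struct toy_buf and the toy_buf_*() helpers are stand-ins invented for illustration, not the real struct xfs_buf or its locking primitives. The only property carried over from the description above is that the pin count can only be raised while the lock is held, so a pin count read under the lock cannot change until the lock is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a buffer; not the real struct xfs_buf. */
struct toy_buf {
	atomic_bool	locked;		/* true while the buffer lock is held */
	atomic_int	pin_count;	/* only raised while the lock is held */
};

static void toy_buf_init(struct toy_buf *bp)
{
	atomic_init(&bp->locked, false);
	atomic_init(&bp->pin_count, 0);
}

/* Non-blocking lock attempt: true if we took the lock. */
static bool toy_buf_trylock(struct toy_buf *bp)
{
	return !atomic_exchange(&bp->locked, true);
}

static void toy_buf_unlock(struct toy_buf *bp)
{
	atomic_store(&bp->locked, false);
}

/*
 * Nowait candidate check in the fixed order: take the lock first, then
 * look at the pin count. Returns true with the lock held if the buffer
 * is safe to submit, false with the lock not held otherwise.
 */
static bool toy_buf_lock_if_unpinned(struct toy_buf *bp)
{
	if (!toy_buf_trylock(bp))
		return false;
	if (atomic_load(&bp->pin_count) > 0) {
		/* Pinned: drop the lock again and skip this buffer. */
		toy_buf_unlock(bp);
		return false;
	}
	return true;
}
```

With the check done in this order there is no window between "saw pin count zero" and "got the lock" in which another thread could lock, pin and unlock the buffer.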
Signed-off-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/xfs_buf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 21d1d3d14e3b..90a88bfc1f61 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2149,12 +2149,13 @@ xfs_buf_delwri_submit_buffers(
 	blk_start_plug(&plug);
 	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
 		if (!wait_list) {
+			if (!xfs_buf_trylock(bp))
+				continue;
 			if (xfs_buf_ispinned(bp)) {
+				xfs_buf_unlock(bp);
 				pinned++;
 				continue;
 			}
-			if (!xfs_buf_trylock(bp))
-				continue;
 		} else {
 			xfs_buf_lock(bp);
 		}
From: Dave Chinner dchinner@redhat.com
mainline inclusion
from mainline-v5.17-rc6
commit 941fbdfd6dd0f1d7961c28123b5460912f678cb5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
xfs_ail_push_all_sync() has a loop like this:
	while max_ail_lsn {
		prepare_to_wait(ail_empty)
		target = max_ail_lsn
		wake_up(ail_task);
		schedule()
	}
Which is designed to sleep until the AIL is emptied. When xfs_ail_update_finish() moves the tail of the log, it does:
	if (list_empty(&ailp->ail_head))
		wake_up_all(&ailp->ail_empty);
So it will only wake up the sync push waiter when the AIL goes empty. If, by the time the push waiter has woken, the AIL has more in it, it will reset the target, wake the push task and go back to sleep.
The problem here is that if the AIL is having items added to it when xfs_ail_push_all_sync() is called, then they may get inserted into the AIL at an LSN higher than the target LSN. At this point, xfsaild_push() will see that the target is X, the item LSNs are (X+N) and skip over them, hence never pushing them out.
The result of this is that the AIL will not get emptied by the AIL push thread, hence xfs_ail_update_finish() will never see the AIL being empty even if it moves the tail. Hence xfs_ail_push_all_sync() never gets woken and hence cannot update the push target to capture the items beyond the current target LSN.
This is a TOCTOU type of issue so the way to avoid it is to not use the push target at all for sync pushes. We know that a sync push is being requested by the fact the ail_empty wait queue is active, hence the xfsaild can just set the target to max_ail_lsn on every push where it sees the wait queue active. Hence we will no longer leave items on the AIL that are beyond the LSN sampled at the start of a sync push.
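The effect of a stale target can be modelled with a small userspace sketch. This is a toy under stated assumptions, not the kernel code: struct toy_ail and the toy_ail_*() helpers are hypothetical, the "AIL" is a flat array of LSNs, and the sync_waiter flag stands in for waitqueue_active(&ailp->ail_empty). A push pass simply removes every item at or below the chosen target.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_AIL_MAX 16

/* Hypothetical toy model of the AIL; not the real struct xfs_ail. */
struct toy_ail {
	long long	items[TOY_AIL_MAX];	/* LSNs of items still in the list */
	size_t		count;
	long long	target;		/* models ailp->ail_target */
	bool		sync_waiter;	/* models an active ail_empty wait queue */
};

/* Append an item; returns false if the toy list is full. */
static bool toy_ail_insert(struct toy_ail *ail, long long lsn)
{
	if (ail->count == TOY_AIL_MAX)
		return false;
	ail->items[ail->count++] = lsn;
	return true;
}

/* Highest LSN currently in the list, or 0 if the list is empty. */
static long long toy_ail_max_lsn(const struct toy_ail *ail)
{
	long long max = 0;

	for (size_t i = 0; i < ail->count; i++)
		if (ail->items[i] > max)
			max = ail->items[i];
	return max;
}

/*
 * One push pass: drop every item at or below the target. When
 * follow_sync_waiter is set and a sync waiter is present, the stored
 * target is ignored and the current maximum LSN is used instead.
 */
static void toy_ail_push(struct toy_ail *ail, bool follow_sync_waiter)
{
	long long target = ail->target;
	size_t kept = 0;

	if (follow_sync_waiter && ail->sync_waiter && ail->count > 0)
		target = toy_ail_max_lsn(ail);

	for (size_t i = 0; i < ail->count; i++)
		if (ail->items[i] > target)
			ail->items[kept++] = ail->items[i];
	ail->count = kept;
}
```

If the target is sampled as 20 and an item is then inserted at LSN 30, a push with follow_sync_waiter false leaves that item behind, while a push with it true empties the list.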
Signed-off-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Chandan Babu R chandan.babu@oracle.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/xfs_trans_ail.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 69aac416e2ce..241e39558bc1 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -448,10 +448,22 @@ xfsaild_push(

 	spin_lock(&ailp->ail_lock);

-	/* barrier matches the ail_target update in xfs_ail_push() */
-	smp_rmb();
-	target = ailp->ail_target;
-	ailp->ail_target_prev = target;
+	/*
+	 * If we have a sync push waiter, we always have to push till the AIL is
+	 * empty. Update the target to point to the end of the AIL so that
+	 * capture updates that occur after the sync push waiter has gone to
+	 * sleep.
+	 */
+	if (waitqueue_active(&ailp->ail_empty)) {
+		lip = xfs_ail_max(ailp);
+		if (lip)
+			target = lip->li_lsn;
+	} else {
+		/* barrier matches the ail_target update in xfs_ail_push() */
+		smp_rmb();
+		target = ailp->ail_target;
+		ailp->ail_target_prev = target;
+	}

 	/* we're done if the AIL is empty or our push has reached the end */
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
@@ -724,7 +736,6 @@ xfs_ail_push_all_sync(
 	spin_lock(&ailp->ail_lock);
 	while ((lip = xfs_ail_max(ailp)) != NULL) {
 		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
-		ailp->ail_target = lip->li_lsn;
 		wake_up_process(ailp->ail_task);
 		spin_unlock(&ailp->ail_lock);
 		schedule();
From: Zheng Yongjun zhengyongjun3@huawei.com
mainline inclusion
from mainline-v5.10-rc5
commit 1189686e5440041057f8cc21a7c1d13bb6642cb9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Replace a comma between expression statements by a semicolon.
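As background on why a stray comma is worth cleaning up even when behaviour is unchanged: in C the comma operator joins two expressions into a single expression statement. In straight-line code, as in the hunk this patch touches, both calls still run in order. The hazard only appears if such a line ends up under an unbraced if or loop. The sketch below is a standalone illustration with made-up names (step(), with_comma(), with_semicolon()); it is not taken from the XFS sources.

```c
/* Counts how many times step() has run since the last reset. */
static int calls;

static int step(void)
{
	return ++calls;
}

/*
 * The trailing comma fuses both calls into ONE statement, so both are
 * controlled by the unbraced if, despite what the layout suggests.
 */
static int with_comma(int cond)
{
	calls = 0;
	if (cond)
		step(),
	step();
	return calls;
}

/* With a semicolon only the first call is controlled by the if. */
static int with_semicolon(int cond)
{
	calls = 0;
	if (cond)
		step();
	step();
	return calls;
}
```

With cond == 0, with_comma() runs step() zero times while with_semicolon() runs it once; with cond != 0 both run it twice.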
Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com
Reviewed-by: Brian Foster bfoster@redhat.com
Reviewed-by: Eric Sandeen sandeen@redhat.com
Reviewed-by: Darrick J. Wong darrick.wong@oracle.com
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_btree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 24c7d30e41df..e925946235ab 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4075,7 +4075,7 @@ xfs_btree_delrec(
 	 * surviving block, and log it.
 	 */
 	xfs_btree_set_numrecs(left, lrecs + rrecs);
-	xfs_btree_get_sibling(cur, right, &cptr, XFS_BB_RIGHTSIB),
+	xfs_btree_get_sibling(cur, right, &cptr, XFS_BB_RIGHTSIB);
 	xfs_btree_set_sibling(cur, left, &cptr, XFS_BB_RIGHTSIB);
 	xfs_btree_log_block(cur, lbp, XFS_BB_NUMRECS | XFS_BB_RIGHTSIB);
From: Brian Foster bfoster@redhat.com
mainline inclusion
from mainline-v5.11-rc4
commit 06058bc40534530e617e5623775c53bb24f032cb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Freed extents are marked busy from the point the freeing transaction commits until the associated CIL context is checkpointed to the log. This prevents reuse and overwrite of recently freed blocks before the changes are committed to disk, which can lead to corruption after a crash. The exception to this rule is that metadata allocation is allowed to reuse busy extents because metadata changes are also logged.
As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS has allowed modification or complete invalidation of outstanding busy extents for metadata allocations. This implementation assumes that use of the associated extent is imminent, which is not always the case. For example, the trimmed extent might not satisfy the minimum length of the allocation request, or the allocation algorithm might be involved in a search for the optimal result based on locality.
generic/019 reproduces a corruption caused by this scenario. First, a metadata block (usually a bmbt or symlink block) is freed from an inode. A subsequent bmbt split on an unrelated inode attempts a near mode allocation request that invalidates the busy block during the search, but does not ultimately allocate it. Due to the busy state invalidation, the block is no longer considered busy to subsequent allocation. A direct I/O write request immediately allocates the block and writes to it. Finally, the filesystem crashes while in a state where the initial metadata block free had not committed to the on-disk log. After recovery, the original metadata block is in its original location as expected, but has been corrupted by the aforementioned dio.
This demonstrates that it is fundamentally unsafe to modify busy extent state for extents that are not guaranteed to be allocated. This applies to pretty much all of the code paths that currently trim busy extents for one reason or another. Therefore to address this problem, drop the reuse mechanism from the busy extent trim path. This code already knows how to return partial non-busy ranges of the targeted free extent and higher level code tracks the busy state of the allocation attempt. If a block allocation fails where one or more candidate extents is busy, we force the log and retry the allocation.
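The remaining trim behaviour, returning a non-busy part of the candidate extent without touching busy state, can be sketched in a few lines of userspace C. This is a deliberately simplified toy and an assumption on several counts: struct toy_extent and toy_busy_trim() are hypothetical names, only a single busy extent is considered, and the choice of "largest remaining piece" is an illustrative policy rather than the heuristics of the real xfs_extent_busy_trim().

```c
#include <stdint.h>

/* Hypothetical extent: start block and length in blocks. */
struct toy_extent {
	uint32_t	bno;
	uint32_t	len;
};

/*
 * Trim a free extent against one busy extent and return the largest
 * piece that does not overlap it. The busy extent is const: trimming
 * never modifies or invalidates busy state. A zero-length result means
 * the whole candidate is busy.
 */
static struct toy_extent
toy_busy_trim(struct toy_extent free_ext, const struct toy_extent *busy)
{
	uint32_t fend = free_ext.bno + free_ext.len;
	uint32_t bend = busy->bno + busy->len;
	struct toy_extent left = { free_ext.bno, 0 };
	struct toy_extent right = { fend, 0 };

	/* No overlap: the whole free extent is usable. */
	if (busy->bno >= fend || bend <= free_ext.bno)
		return free_ext;

	/* Non-busy blocks in front of the busy range, if any. */
	if (busy->bno > free_ext.bno)
		left.len = busy->bno - free_ext.bno;

	/* Non-busy blocks after the busy range, if any. */
	if (bend < fend) {
		right.bno = bend;
		right.len = fend - bend;
	}

	return left.len >= right.len ? left : right;
}
```

A caller that gets back a piece too short for its request would then fall back to the path the message describes: note that a busy extent was encountered, force the log, and retry.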
Signed-off-by: Brian Foster bfoster@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Chandan Babu R chandanrlinux@gmail.com
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/xfs_extent_busy.c | 14 --------------
 1 file changed, 14 deletions(-)
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index 5c2695a42de1..a4075685d9eb 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -344,7 +344,6 @@ xfs_extent_busy_trim(
 	ASSERT(*len > 0);

 	spin_lock(&args->pag->pagb_lock);
-restart:
 	fbno = *bno;
 	flen = *len;
 	rbp = args->pag->pagb_tree.rb_node;
@@ -363,19 +362,6 @@ xfs_extent_busy_trim(
 			continue;
 		}

-		/*
-		 * If this is a metadata allocation, try to reuse the busy
-		 * extent instead of trimming the allocation.
-		 */
-		if (!(args->datatype & XFS_ALLOC_USERDATA) &&
-		    !(busyp->flags & XFS_EXTENT_BUSY_DISCARDED)) {
-			if (!xfs_extent_busy_update_extent(args->mp, args->pag,
-							  busyp, fbno, flen,
-							  false))
-				goto restart;
-			continue;
-		}
-
 		if (bbno <= fbno) {
 			/* start overlap */
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.19-rc5
commit 732436ef916b4f338d672ea56accfdb11e8d0732
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We're about to make this logic do a bit more, so convert the macro to a static inline function for better typechecking and fewer shouty macros. No functional changes here.
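For readers unfamiliar with the pattern, the kind of conversion described here looks roughly like the sketch below. It is a self-contained toy, not the XFS definition: the toy_* types, the TOY_*_FORK constants and both accessors are hypothetical, and returning NULL for an unknown fork is a choice made for this example only.

```c
#include <stddef.h>

/* Hypothetical fork selectors and types for illustration only. */
enum { TOY_DATA_FORK, TOY_ATTR_FORK, TOY_COW_FORK };

struct toy_ifork {
	int	if_format;
};

struct toy_inode {
	struct toy_ifork	i_df;		/* data fork, embedded */
	struct toy_ifork	*i_afp;		/* attr fork, may be NULL */
	struct toy_ifork	*i_cowfp;	/* CoW fork, may be NULL */
};

/* Macro form: arguments are untyped and any other value means "CoW". */
#define TOY_IFORK_PTR(ip, w) \
	((w) == TOY_DATA_FORK ? &(ip)->i_df : \
	 ((w) == TOY_ATTR_FORK ? (ip)->i_afp : (ip)->i_cowfp))

/*
 * Function form: the compiler checks that ip really is a
 * struct toy_inode pointer, and there is an obvious place to handle
 * unexpected fork values or to add further logic later.
 */
static inline struct toy_ifork *
toy_ifork_ptr(struct toy_inode *ip, int whichfork)
{
	switch (whichfork) {
	case TOY_DATA_FORK:
		return &ip->i_df;
	case TOY_ATTR_FORK:
		return ip->i_afp;
	case TOY_COW_FORK:
		return ip->i_cowfp;
	default:
		return NULL;	/* toy choice for an unknown fork */
	}
}
```

For the three valid fork values both forms return the same pointer, which is what makes a call-site conversion like the one in this patch mechanical.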
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Dave Chinner dchinner@redhat.com

conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/scrub/bmap.c
	fs/xfs/scrub/symlink.c
	fs/xfs/xfs_inode.c
	fs/xfs/xfs_ioctl.c
	fs/xfs/xfs_qm.c
	fs/xfs/xfs_reflink.c

Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_attr_leaf.c      |  2 +-
 fs/xfs/libxfs/xfs_bmap.c           | 70 +++++++++++++++---------------
 fs/xfs/libxfs/xfs_bmap_btree.c     |  8 ++--
 fs/xfs/libxfs/xfs_btree.c          |  4 +-
 fs/xfs/libxfs/xfs_dir2_block.c     |  2 +-
 fs/xfs/libxfs/xfs_dir2_sf.c        |  2 +-
 fs/xfs/libxfs/xfs_inode_fork.c     | 16 +++----
 fs/xfs/libxfs/xfs_inode_fork.h     |  6 ---
 fs/xfs/libxfs/xfs_symlink_remote.c |  2 +-
 fs/xfs/scrub/bmap.c                | 14 +++---
 fs/xfs/scrub/dabtree.c             |  2 +-
 fs/xfs/scrub/dir.c                 |  2 +-
 fs/xfs/scrub/quota.c               |  2 +-
 fs/xfs/scrub/symlink.c             |  2 +-
 fs/xfs/xfs_bmap_util.c             |  4 +-
 fs/xfs/xfs_dir2_readdir.c          |  2 +-
 fs/xfs/xfs_icache.c                |  2 +-
 fs/xfs/xfs_inode.c                 |  6 +--
 fs/xfs/xfs_inode.h                 | 18 ++++++++
 fs/xfs/xfs_iomap.c                 |  4 +-
 fs/xfs/xfs_qm.c                    |  2 +-
 fs/xfs/xfs_reflink.c               |  6 +--
 22 files changed, 95 insertions(+), 83 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index d043527d8409..8d5748d5eb58 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1022,7 +1022,7 @@ xfs_attr_shortform_verify(
 	int64_t				size;

 	ASSERT(ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL);
-	ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
+	ifp = xfs_ifork_ptr(ip, XFS_ATTR_FORK);
 	sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data;
 	size = ifp->if_bytes;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 97dbb8af9fa0..a19549e2920a 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -120,7 +120,7 @@ xfs_bmbt_lookup_first( */ static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
return whichfork != XFS_COW_FORK && ifp->if_format == XFS_DINODE_FMT_EXTENTS && @@ -132,7 +132,7 @@ static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork) */ static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
return whichfork != XFS_COW_FORK && ifp->if_format == XFS_DINODE_FMT_BTREE && @@ -318,7 +318,7 @@ xfs_bmap_check_leaf_extents( int whichfork) /* data or attr fork */ { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *block; /* current btree block */ xfs_fsblock_t bno; /* block # of "block" */ xfs_buf_t *bp; /* buffer for "block" */ @@ -587,7 +587,7 @@ xfs_bmap_btree_to_extents( int *logflagsp, /* inode logging flags */ int whichfork) /* data or attr fork */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_block *rblock = ifp->if_broot; struct xfs_btree_block *cblock;/* child btree block */ @@ -666,7 +666,7 @@ xfs_bmap_extents_to_btree(
mp = ip->i_mount; ASSERT(whichfork != XFS_COW_FORK); - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(ifp->if_format == XFS_DINODE_FMT_EXTENTS);
/* @@ -796,7 +796,7 @@ xfs_bmap_local_to_extents_empty( struct xfs_inode *ip, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
ASSERT(whichfork != XFS_COW_FORK); ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL); @@ -838,7 +838,7 @@ xfs_bmap_local_to_extents( * So sending the data fork of a regular inode is invalid. */ ASSERT(!(S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK)); - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL);
if (!ifp->if_bytes) { @@ -1168,7 +1168,7 @@ xfs_iread_bmbt_block( xfs_extnum_t num_recs; xfs_extnum_t j; int whichfork = cur->bc_ino.whichfork; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
block = xfs_btree_get_block(cur, level, &bp);
@@ -1216,7 +1216,7 @@ xfs_iread_extents( int whichfork) { struct xfs_iread_state ir; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_cur *cur; int error; @@ -1264,7 +1264,7 @@ xfs_bmap_first_unused( xfs_fileoff_t *first_unused, /* unused block */ int whichfork) /* data or attr fork */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; struct xfs_iext_cursor icur; xfs_fileoff_t lastaddr = 0; @@ -1313,7 +1313,7 @@ xfs_bmap_last_before( xfs_fileoff_t *last_block, /* last block */ int whichfork) /* data or attr fork */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; struct xfs_iext_cursor icur; int error; @@ -1349,7 +1349,7 @@ xfs_bmap_last_extent( struct xfs_bmbt_irec *rec, int *is_empty) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_iext_cursor icur; int error;
@@ -1417,7 +1417,7 @@ xfs_bmap_last_offset( xfs_fileoff_t *last_block, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec rec; int is_empty; int error; @@ -1448,7 +1448,7 @@ xfs_bmap_one_block( struct xfs_inode *ip, /* incore inode */ int whichfork) /* data or attr fork */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); int rval; /* return value */ struct xfs_bmbt_irec s; /* internal version of extent */ struct xfs_iext_cursor icur; @@ -1483,7 +1483,7 @@ xfs_bmap_add_extent_delay_real( int whichfork) { struct xfs_mount *mp = bma->ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(bma->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); struct xfs_bmbt_irec *new = &bma->got; int error; /* error return value */ int i; /* temp state */ @@ -2049,7 +2049,7 @@ xfs_bmap_add_extent_unwritten_real( *logflagsp = 0;
cur = *curp; - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork);
ASSERT(!isnullstartblock(new->br_startblock));
@@ -2574,7 +2574,7 @@ xfs_bmap_add_extent_hole_delay( int state = xfs_bmap_fork_to_state(whichfork); xfs_filblks_t temp; /* temp for indirect calculations */
- ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(isnullstartblock(new->br_startblock));
/* @@ -2710,7 +2710,7 @@ xfs_bmap_add_extent_hole_real( int *logflagsp, int flags) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_cur *cur = *curp; int error; /* error return value */ @@ -3863,7 +3863,7 @@ xfs_bmapi_read( { struct xfs_mount *mp = ip->i_mount; int whichfork = xfs_bmapi_whichfork(flags); - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; xfs_fileoff_t obno; xfs_fileoff_t end; @@ -3958,7 +3958,7 @@ xfs_bmapi_reserve_delalloc( int eof) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); xfs_extlen_t alen; xfs_extlen_t indlen; int error; @@ -4081,7 +4081,7 @@ xfs_bmapi_allocate( { struct xfs_mount *mp = bma->ip->i_mount; int whichfork = xfs_bmapi_whichfork(bma->flags); - struct xfs_ifork *ifp = XFS_IFORK_PTR(bma->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); int tmp_logflags = 0; int error;
@@ -4175,7 +4175,7 @@ xfs_bmapi_convert_unwritten( int flags) { int whichfork = xfs_bmapi_whichfork(flags); - struct xfs_ifork *ifp = XFS_IFORK_PTR(bma->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); int tmp_logflags = 0; int error;
@@ -4252,7 +4252,7 @@ xfs_bmapi_minleft( struct xfs_inode *ip, int fork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, fork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, fork);
if (tp && tp->t_firstblock != NULLFSBLOCK) return 0; @@ -4273,7 +4273,7 @@ xfs_bmapi_finish( int whichfork, int error) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(bma->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork);
if ((bma->logflags & xfs_ilog_fext(whichfork)) && ifp->if_format != XFS_DINODE_FMT_EXTENTS) @@ -4312,7 +4312,7 @@ xfs_bmapi_write( }; struct xfs_mount *mp = ip->i_mount; int whichfork = xfs_bmapi_whichfork(flags); - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); xfs_fileoff_t end; /* end of mapped file region */ bool eof = false; /* after the end of extents */ int error; /* error return */ @@ -4495,7 +4495,7 @@ xfs_bmapi_convert_delalloc( struct iomap *iomap, unsigned int *seq) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); struct xfs_bmalloca bma = { NULL }; @@ -4629,7 +4629,7 @@ xfs_bmapi_remap( int whichfork = xfs_bmapi_whichfork(flags); int logflags = 0, error;
- ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(len > 0); ASSERT(len <= (xfs_filblks_t)MAXEXTLEN); ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); @@ -4788,7 +4788,7 @@ xfs_bmap_del_extent_delay( struct xfs_bmbt_irec *del) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec new; int64_t da_old, da_new, da_diff = 0; xfs_fileoff_t del_endoff, got_endoff; @@ -4915,7 +4915,7 @@ xfs_bmap_del_extent_cow( struct xfs_bmbt_irec *del) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); struct xfs_bmbt_irec new; xfs_fileoff_t del_endoff, got_endoff; int state = BMAP_COWFORK; @@ -5013,7 +5013,7 @@ xfs_bmap_del_extent_real( mp = ip->i_mount; XFS_STATS_INC(mp, xs_del_exlist);
- ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(del->br_blockcount > 0); xfs_iext_get_extent(ifp, icur, &got); ASSERT(got.br_startoff <= del->br_startoff); @@ -5300,7 +5300,7 @@ __xfs_bunmapi(
whichfork = xfs_bmapi_whichfork(flags); ASSERT(whichfork != XFS_COW_FORK); - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp))) return -EFSCORRUPTED; if (XFS_FORCED_SHUTDOWN(mp)) @@ -5659,7 +5659,7 @@ xfs_bmse_merge( struct xfs_btree_cur *cur, int *logflags) /* output */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec new; xfs_filblks_t blockcount; int error, i; @@ -5780,7 +5780,7 @@ xfs_bmap_collapse_extents( { int whichfork = XFS_DATA_FORK; struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got, prev; struct xfs_iext_cursor icur; @@ -5897,7 +5897,7 @@ xfs_bmap_insert_extents( { int whichfork = XFS_DATA_FORK; struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got, next; struct xfs_iext_cursor icur; @@ -5999,7 +5999,7 @@ xfs_bmap_split_extent( xfs_fileoff_t split_fsb) { int whichfork = XFS_DATA_FORK; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got; struct xfs_bmbt_irec new; /* split extent */ diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index ecec604e6e4d..af79506d5f0f 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -302,7 +302,7 @@ xfs_bmbt_get_minrecs( if (level == cur->bc_nlevels - 1) { struct xfs_ifork *ifp;
- ifp = XFS_IFORK_PTR(cur->bc_ino.ip, + ifp = xfs_ifork_ptr(cur->bc_ino.ip, cur->bc_ino.whichfork);
return xfs_bmbt_maxrecs(cur->bc_mp, @@ -320,7 +320,7 @@ xfs_bmbt_get_maxrecs( if (level == cur->bc_nlevels - 1) { struct xfs_ifork *ifp;
- ifp = XFS_IFORK_PTR(cur->bc_ino.ip, + ifp = xfs_ifork_ptr(cur->bc_ino.ip, cur->bc_ino.whichfork);
return xfs_bmbt_maxrecs(cur->bc_mp, @@ -548,7 +548,7 @@ xfs_bmbt_init_cursor( struct xfs_inode *ip, /* inode owning the btree */ int whichfork) /* data or attr fork */ { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur; ASSERT(whichfork != XFS_COW_FORK);
@@ -636,7 +636,7 @@ xfs_bmbt_change_owner(
ASSERT(tp || buffer_list); ASSERT(!(tp && buffer_list)); - ASSERT(XFS_IFORK_PTR(ip, whichfork)->if_format == XFS_DINODE_FMT_BTREE); + ASSERT(xfs_ifork_ptr(ip, whichfork)->if_format == XFS_DINODE_FMT_BTREE);
cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork); if (!cur) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index e925946235ab..ea713f2e57b2 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -650,7 +650,7 @@ xfs_btree_ifork_ptr(
if (cur->bc_flags & XFS_BTREE_STAGING) return cur->bc_ino.ifake->if_fork; - return XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork); + return xfs_ifork_ptr(cur->bc_ino.ip, cur->bc_ino.whichfork); }
/* @@ -3475,7 +3475,7 @@ xfs_btree_kill_iroot( { int whichfork = cur->bc_ino.whichfork; struct xfs_inode *ip = cur->bc_ino.ip; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *block; struct xfs_btree_block *cblock; union xfs_btree_key *kp; diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c index 5b59d3f7746b..f0b33e5cd7a3 100644 --- a/fs/xfs/libxfs/xfs_dir2_block.c +++ b/fs/xfs/libxfs/xfs_dir2_block.c @@ -1071,7 +1071,7 @@ xfs_dir2_sf_to_block( struct xfs_trans *tp = args->trans; struct xfs_inode *dp = args->dp; struct xfs_mount *mp = dp->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(dp, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(dp, XFS_DATA_FORK); struct xfs_da_geometry *geo = args->geo; xfs_dir2_db_t blkno; /* dir-relative block # (0) */ xfs_dir2_data_hdr_t *hdr; /* block header */ diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c index 8c4f76bba88b..8c0ad7435b45 100644 --- a/fs/xfs/libxfs/xfs_dir2_sf.c +++ b/fs/xfs/libxfs/xfs_dir2_sf.c @@ -710,7 +710,7 @@ xfs_dir2_sf_verify( struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); struct xfs_dir2_sf_hdr *sfp; struct xfs_dir2_sf_entry *sfep; struct xfs_dir2_sf_entry *next_sfep; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 8d48716547e5..d7f77303cbd4 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -34,7 +34,7 @@ xfs_init_local_fork( const void *data, int64_t size) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); int mem_size = size, real_size = 0; bool zero_terminate;
@@ -104,7 +104,7 @@ xfs_iformat_extents( int whichfork) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); int state = xfs_bmap_fork_to_state(whichfork); int nex = XFS_DFORK_NEXTENTS(dip, whichfork); int size = nex * sizeof(xfs_bmbt_rec_t); @@ -176,7 +176,7 @@ xfs_iformat_btree( int size; int level;
- ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork); size = XFS_BMAP_BROOT_SPACE(mp, dfp); nrecs = be16_to_cpu(dfp->bb_numrecs); @@ -365,7 +365,7 @@ xfs_iroot_realloc( return; }
- ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); if (rec_diff > 0) { /* * If there wasn't any memory allocated before, just @@ -476,7 +476,7 @@ xfs_idata_realloc( int64_t byte_diff, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); int64_t new_size = ifp->if_bytes + byte_diff;
ASSERT(new_size >= 0); @@ -541,7 +541,7 @@ xfs_iextents_copy( int whichfork) { int state = xfs_bmap_fork_to_state(whichfork); - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_iext_cursor icur; struct xfs_bmbt_irec rec; int64_t copied = 0; @@ -593,7 +593,7 @@ xfs_iflush_fork(
if (!iip) return; - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); /* * This can happen if we gave up in iformat in an error path, * for the attribute fork. @@ -736,7 +736,7 @@ xfs_iext_count_may_overflow( int whichfork, int nr_to_add) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); uint64_t max_exts; uint64_t nr_exts;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 9e2137cd7372..dd2eeb31abdd 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -102,12 +102,6 @@ struct xfs_ifork { #define XFS_IFORK_Q(ip) ((ip)->i_d.di_forkoff != 0) #define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3))
-#define XFS_IFORK_PTR(ip,w) \ - ((w) == XFS_DATA_FORK ? \ - &(ip)->i_df : \ - ((w) == XFS_ATTR_FORK ? \ - (ip)->i_afp : \ - (ip)->i_cowfp)) #define XFS_IFORK_DSIZE(ip) \ (XFS_IFORK_Q(ip) ? XFS_IFORK_BOFF(ip) : XFS_LITINO((ip)->i_mount)) #define XFS_IFORK_ASIZE(ip) \ diff --git a/fs/xfs/libxfs/xfs_symlink_remote.c b/fs/xfs/libxfs/xfs_symlink_remote.c index 594bc447a7dd..1c1b39ac262e 100644 --- a/fs/xfs/libxfs/xfs_symlink_remote.c +++ b/fs/xfs/libxfs/xfs_symlink_remote.c @@ -204,7 +204,7 @@ xfs_failaddr_t xfs_symlink_shortform_verify( struct xfs_inode *ip) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); char *sfp = (char *)ifp->if_u1.if_data; int size = ifp->if_bytes; char *endp = sfp + size; diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 694af54e83c5..5ebee6aba503 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -398,7 +398,7 @@ xchk_bmapbt_rec( struct xfs_inode *ip = bs->cur->bc_ino.ip; struct xfs_buf *bp = NULL; struct xfs_btree_block *block; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, info->whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, info->whichfork); uint64_t owner; int i;
@@ -447,7 +447,7 @@ xchk_bmap_btree( struct xchk_bmap_info *info) { struct xfs_owner_info oinfo; - struct xfs_ifork *ifp = XFS_IFORK_PTR(sc->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); struct xfs_mount *mp = sc->mp; struct xfs_inode *ip = sc->ip; struct xfs_btree_cur *cur; @@ -499,7 +499,7 @@ xchk_bmap_check_rmap( return 0;
/* Now look up the bmbt record. */ - ifp = XFS_IFORK_PTR(sc->ip, sbcri->whichfork); + ifp = xfs_ifork_ptr(sc->ip, sbcri->whichfork); if (!ifp) { xchk_fblock_set_corrupt(sc, sbcri->whichfork, rec->rm_offset); @@ -587,7 +587,7 @@ xchk_bmap_check_rmaps( struct xfs_scrub *sc, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(sc->ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); xfs_agnumber_t agno; bool zero_size; int error; @@ -601,7 +601,7 @@ xchk_bmap_check_rmaps( if (XFS_IS_REALTIME_INODE(sc->ip) && whichfork == XFS_DATA_FORK) return 0;
- ASSERT(XFS_IFORK_PTR(sc->ip, whichfork) != NULL); + ASSERT(xfs_ifork_ptr(sc->ip, whichfork) != NULL);
/* * Only do this for complex maps that are in btree format, or for @@ -646,7 +646,7 @@ xchk_bmap( struct xchk_bmap_info info = { NULL }; struct xfs_mount *mp = sc->mp; struct xfs_inode *ip = sc->ip; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); xfs_fileoff_t endoff; struct xfs_iext_cursor icur; int error = 0; @@ -716,7 +716,7 @@ xchk_bmap(
/* Scrub extent records. */ info.lastoff = 0; - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork); for_each_xfs_iext(ifp, &icur, &irec) { if (xchk_should_terminate(sc, &error) || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) diff --git a/fs/xfs/scrub/dabtree.c b/fs/xfs/scrub/dabtree.c index 9f0dbb47c82c..9f631c3356f6 100644 --- a/fs/xfs/scrub/dabtree.c +++ b/fs/xfs/scrub/dabtree.c @@ -482,7 +482,7 @@ xchk_da_btree( int error;
/* Skip short format data structures; no btree to scan. */ - if (!xfs_ifork_has_extents(XFS_IFORK_PTR(sc->ip, whichfork))) + if (!xfs_ifork_has_extents(xfs_ifork_ptr(sc->ip, whichfork))) return 0;
/* Set up initial da state. */ diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c index 178b3455a170..2ee70f38f294 100644 --- a/fs/xfs/scrub/dir.c +++ b/fs/xfs/scrub/dir.c @@ -663,7 +663,7 @@ xchk_directory_blocks( { struct xfs_bmbt_irec got; struct xfs_da_args args; - struct xfs_ifork *ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK); struct xfs_mount *mp = sc->mp; xfs_fileoff_t leaf_lblk; xfs_fileoff_t free_lblk; diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c index ef50fde9b622..b5cfa75819d6 100644 --- a/fs/xfs/scrub/quota.c +++ b/fs/xfs/scrub/quota.c @@ -186,7 +186,7 @@ xchk_quota_data_fork(
/* Check for data fork problems that apply only to quota files. */ max_dqid_off = ((xfs_dqid_t)-1) / qi->qi_dqperchunk; - ifp = XFS_IFORK_PTR(sc->ip, XFS_DATA_FORK); + ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK); for_each_xfs_iext(ifp, &icur, &irec) { if (xchk_should_terminate(sc, &error)) break; diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c index c08be5ede066..3d5e79af872a 100644 --- a/fs/xfs/scrub/symlink.c +++ b/fs/xfs/scrub/symlink.c @@ -42,7 +42,7 @@ xchk_symlink(
if (!S_ISLNK(VFS_I(ip)->i_mode)) return -ENOENT; - ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); len = ip->i_d.di_size;
/* Plausible size? */ diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index df004890c2a3..7d444915b8f7 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -212,7 +212,7 @@ xfs_bmap_count_blocks( xfs_filblks_t *count) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur; xfs_extlen_t btblocks = 0; int error; @@ -397,7 +397,7 @@ xfs_getbmap( whichfork = XFS_COW_FORK; else whichfork = XFS_DATA_FORK; - ifp = XFS_IFORK_PTR(ip, whichfork); + ifp = xfs_ifork_ptr(ip, whichfork);
xfs_ilock(ip, XFS_IOLOCK_SHARED); switch (whichfork) { diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c index 66deddd5e296..60ee63c25f62 100644 --- a/fs/xfs/xfs_dir2_readdir.c +++ b/fs/xfs/xfs_dir2_readdir.c @@ -247,7 +247,7 @@ xfs_dir2_leaf_readbuf( struct xfs_inode *dp = args->dp; struct xfs_buf *bp = NULL; struct xfs_da_geometry *geo = args->geo; - struct xfs_ifork *ifp = XFS_IFORK_PTR(dp, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(dp, XFS_DATA_FORK); struct xfs_bmbt_irec map; struct blk_plug plug; xfs_dir2_off_t new_off; diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index cc1e8bd4ae51..aaedc28639d9 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1808,7 +1808,7 @@ xfs_check_delalloc( struct xfs_inode *ip, int whichfork) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; struct xfs_iext_cursor icur;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f8dfb83492cc..78f495046748 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1454,8 +1454,8 @@ xfs_itruncate_clear_reflink_flags(
if (!xfs_is_reflink_inode(ip)) return; - dfork = XFS_IFORK_PTR(ip, XFS_DATA_FORK); - cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK); + dfork = xfs_ifork_ptr(ip, XFS_DATA_FORK); + cfork = xfs_ifork_ptr(ip, XFS_COW_FORK); if (dfork->if_bytes == 0 && cfork->if_bytes == 0) ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; if (cfork->if_bytes == 0) @@ -1802,7 +1802,7 @@ xfs_inode_needs_inactive( struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *cow_ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); + struct xfs_ifork *cow_ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
/* * If the inode is already free, then there can be nothing diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index ba0f57dd5392..5424fa756bf3 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -67,6 +67,24 @@ typedef struct xfs_inode { struct list_head i_ioend_list; } xfs_inode_t;
+static inline struct xfs_ifork *
+xfs_ifork_ptr(
+	struct xfs_inode	*ip,
+	int			whichfork)
+{
+	switch (whichfork) {
+	case XFS_DATA_FORK:
+		return &ip->i_df;
+	case XFS_ATTR_FORK:
+		return ip->i_afp;
+	case XFS_COW_FORK:
+		return ip->i_cowfp;
+	default:
+		ASSERT(0);
+		return NULL;
+	}
+}
+
 /* Convert from vfs inode to xfs inode */
 static inline struct xfs_inode *XFS_I(struct inode *inode)
 {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 31c553a49241..186fba31de71 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -153,7 +153,7 @@ xfs_iomap_eof_align_last_fsb(
 	struct xfs_inode	*ip,
 	xfs_fileoff_t		end_fsb)
 {
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
 	xfs_extlen_t		extsz = xfs_get_extsz_hint(ip);
 	xfs_extlen_t		align = xfs_eof_alignment(ip);
 	struct xfs_bmbt_irec	irec;
@@ -361,7 +361,7 @@ xfs_iomap_prealloc_size(
 	struct xfs_iext_cursor	ncur = *icur;
 	struct xfs_bmbt_irec	prev, got;
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	int64_t			freesp;
 	xfs_fsblock_t		qblocks;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 0a6c8cc8e997..1015a3cf0cea 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1156,7 +1156,7 @@ xfs_qm_dqusage_adjust(
 	ASSERT(ip->i_delayed_blks == 0);
if (XFS_IS_REALTIME_INODE(ip)) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
if (!(ifp->if_flags & XFS_IFEXTENTS)) { error = xfs_iread_extents(tp, ip, XFS_DATA_FORK); diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 5fc128bc7939..5d91c3c6de79 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -449,7 +449,7 @@ xfs_reflink_cancel_cow_blocks( xfs_fileoff_t end_fsb, bool cancel_real) { - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); struct xfs_bmbt_irec got, del; struct xfs_iext_cursor icur; int error = 0; @@ -590,7 +590,7 @@ xfs_reflink_end_cow_extent( struct xfs_iext_cursor icur; struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; - struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); xfs_filblks_t rlen; unsigned int resblks; int error; @@ -1394,7 +1394,7 @@ xfs_reflink_inode_has_shared_extents( bool found; int error;
- ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); + ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); if (!(ifp->if_flags & XFS_IFEXTENTS)) { error = xfs_iread_extents(tp, ip, XFS_DATA_FORK); if (error)
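For readers skimming the mechanical conversion above, here is a minimal userspace sketch of what changes when the XFS_IFORK_PTR macro becomes the xfs_ifork_ptr() inline function. All demo_* and DEMO_* names are hypothetical stand-ins, not the kernel definitions. The only behavioural difference it tries to show is the one visible in the diff: the old ternary macro treats any value that is not the data or attr fork as the CoW fork, while the switch-based helper returns NULL for an unknown selector (the kernel version also fires ASSERT(0) there).

```c
#include <stddef.h>

/* Hypothetical fork selectors standing in for XFS_{DATA,ATTR,COW}_FORK. */
enum { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

/* Simplified stand-ins for struct xfs_ifork and struct xfs_inode. */
struct demo_ifork {
	int			nextents;
};

struct demo_inode {
	struct demo_ifork	i_df;		/* data fork, embedded */
	struct demo_ifork	*i_afp;		/* attr fork, allocated on demand */
	struct demo_ifork	*i_cowfp;	/* CoW fork, allocated on demand */
};

/* Old style: nested ternaries, any unknown selector falls through to CoW. */
#define DEMO_IFORK_PTR(ip, w) \
	((w) == DEMO_DATA_FORK ? &(ip)->i_df : \
	 ((w) == DEMO_ATTR_FORK ? (ip)->i_afp : (ip)->i_cowfp))

/* New style: typed arguments and an explicit default case. */
static inline struct demo_ifork *
demo_ifork_ptr(
	struct demo_inode	*ip,
	int			whichfork)
{
	switch (whichfork) {
	case DEMO_DATA_FORK:
		return &ip->i_df;
	case DEMO_ATTR_FORK:
		return ip->i_afp;
	case DEMO_COW_FORK:
		return ip->i_cowfp;
	default:
		/* The kernel helper additionally does ASSERT(0) here. */
		return NULL;
	}
}
```

With a bogus selector such as 99, the macro hands back the CoW fork pointer while the function returns NULL, so a caller's existing NULL check catches the mistake.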
From: "Darrick J. Wong" <djwong@kernel.org>
mainline inclusion
from mainline-v5.19-rc5
commit 2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Syzkaller reported a UAF bug a while back:
==================================================================
BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted 5.15.0-0.30.3-20220406_1406 #3 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106 print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459 xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127 xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159 xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36 __vfs_getxattr+0xdf/0x13d fs/xattr.c:399 cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300 security_inode_need_killpriv+0x4c/0x97 security/security.c:1408 dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912 dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908 do_truncate+0xc3/0x1e0 fs/open.c:56 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 RIP: 0033:0x7f7ef4bb753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0 RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0 </TASK>
Allocated by task 2953: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467 kasan_slab_alloc include/linux/kasan.h:254 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3213 [inline] slab_alloc mm/slub.c:3221 [inline] kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226 kmem_cache_zalloc include/linux/slab.h:711 [inline] xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287 xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098 xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_setxattr+0x11b/0x177 fs/xattr.c:180 __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214 __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275 vfs_setxattr+0x154/0x33d fs/xattr.c:301 setxattr+0x216/0x29f fs/xattr.c:575 __do_sys_fsetxattr fs/xattr.c:632 [inline] __se_sys_fsetxattr fs/xattr.c:621 [inline] __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0
Freed by task 2949: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track+0x1c/0x21 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:1700 [inline] slab_free_freelist_hook mm/slub.c:1726 [inline] slab_free mm/slub.c:3492 [inline] kmem_cache_free+0xdc/0x3ce mm/slub.c:3508 xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773 xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822 xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413 xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684 xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_removexattr+0x106/0x16a fs/xattr.c:468 cap_inode_killpriv+0x24/0x47 security/commoncap.c:324 security_inode_killpriv+0x54/0xa1 security/security.c:1414 setattr_prepare+0x1a6/0x897 fs/attr.c:146 xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682 xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065 xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093 notify_change+0xae5/0x10a1 fs/attr.c:410 do_truncate+0x134/0x1e0 fs/open.c:64 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0
The buggy address belongs to the object at ffff88802cec9188
 which belongs to the cache xfs_ifork of size 40
The buggy address is located 20 bytes inside of
 40-byte region [ffff88802cec9188, ffff88802cec91b0)
The buggy address belongs to the page:
page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2cec9
flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
 ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
>ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                            ^
 ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
 ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
==================================================================
The root cause of this bug is the unlocked access to xfs_inode.i_afp from the getxattr code paths while trying to determine which ILOCK mode to use to stabilize the xattr data. Unfortunately, the VFS does not acquire i_rwsem when vfs_getxattr (or listxattr) calls into the filesystem, which means that getxattr can race with a removexattr that's tearing down the attr fork and crash:
xfs_attr_set:                                   xfs_attr_get:
xfs_attr_fork_remove:                           xfs_ilock_attr_map_shared:

xfs_idestroy_fork(ip->i_afp);
kmem_cache_free(xfs_ifork_cache, ip->i_afp);

                                                if (ip->i_afp &&

ip->i_afp = NULL;

                                                    xfs_need_iread_extents(ip->i_afp))
                                                <KABOOM>

ip->i_forkoff = 0;
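The interleaving above can be replayed deterministically in userspace. The following is only a sketch under stated assumptions: every demo_* name is hypothetical, the two "threads" are ordinary function calls made in the racy order, and the free is modelled by a flag on a stack object so that the program itself never touches freed memory. It illustrates why the reader's NULL check does not help: the pointer it sampled was already stale when it was read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the separately allocated attr fork. */
struct demo_fork {
	bool			freed;	/* models kmem_cache_free() having run */
};

/* Hypothetical stand-in for the inode holding a pointer to the fork. */
struct demo_inode {
	struct demo_fork	*afp;
};

/* Writer, step 1: destroy and "free" the fork; the pointer is still set. */
static void demo_writer_free(struct demo_inode *ip)
{
	ip->afp->freed = true;
}

/* Writer, step 2: only now is the pointer cleared. */
static void demo_writer_clear(struct demo_inode *ip)
{
	ip->afp = NULL;
}

/* Reader, step 1: the unlocked "if (ip->i_afp && ..." sample. */
static struct demo_fork *demo_reader_sample(struct demo_inode *ip)
{
	return ip->afp;
}

/* Reader, step 2: reports whether the dereference would hit freed memory. */
static bool demo_reader_would_uaf(const struct demo_fork *seen)
{
	return seen != NULL && seen->freed;
}

/* Replay the four steps in the order shown in the diagram. */
static bool demo_replay_race(void)
{
	struct demo_fork	fork = { false };
	struct demo_inode	ino = { &fork };
	struct demo_fork	*seen;

	demo_writer_free(&ino);			/* free happens first */
	seen = demo_reader_sample(&ino);	/* non-NULL, but stale */
	demo_writer_clear(&ino);		/* too late for the reader */
	return demo_reader_would_uaf(seen);
}
```

In the real kernel the second step of the reader touches memory that has gone back to the slab cache, which is what KASAN flags in the report above.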
Regrettably, the VFS is much more lax about i_rwsem and getxattr than is immediately obvious -- not only does it not guarantee that we hold i_rwsem, it actually doesn't guarantee that we *don't* hold it either. The getxattr system call won't acquire the lock before calling XFS, but the file capabilities code calls getxattr with and without i_rwsem held to determine if the "security.capabilities" xattr is set on the file.
Fixing the VFS locking requires a treewide investigation into every code path that could touch an xattr and what i_rwsem state it expects or sets up. That could take years or even prove impossible; fortunately, we can fix this UAF problem inside XFS.
An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to ensure that i_forkoff is always zeroed before i_afp is set to null and changed the read paths to use smp_rmb before accessing i_forkoff and i_afp, which avoided these UAF problems. However, the patch author was too busy dealing with other problems in the meantime, and by the time he came back to this issue, the situation had changed a bit.
On a modern system with selinux, each inode will always have at least one xattr for the selinux label, so it doesn't make much sense to keep incurring the extra pointer dereference. Furthermore, Allison's upcoming parent pointer patchset will also cause nearly every inode in the filesystem to have extended attributes. Therefore, make the inode attribute fork structure part of struct xfs_inode, at a cost of 40 more bytes.
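A minimal sketch of the layout change described in the paragraph above, using hypothetical demo_* types rather than the real struct xfs_inode. It only aims to show the structural point: once the attr fork is a member of the inode, the fork's storage lives exactly as long as the inode, so there is no separately freed object for a lockless reader to chase.

```c
#include <stddef.h>

/* Hypothetical, much reduced stand-in for struct xfs_ifork. */
struct demo_fork {
	int			format;
	int			nextents;
};

/* Before: the attr fork is a separate allocation reached via a pointer. */
struct demo_inode_old {
	struct demo_fork	data;
	struct demo_fork	*attr;	/* may be freed while a reader looks */
};

/* After: the attr fork is embedded, like the data fork already was. */
struct demo_inode_new {
	struct demo_fork	data;
	struct demo_fork	attr;	/* same lifetime as the inode */
};

/* Accessor for the embedded fork: an address computation, no load. */
static inline struct demo_fork *
demo_attr_fork(struct demo_inode_new *ip)
{
	return &ip->attr;
}

/* Returns 1 if the attr fork's bytes lie entirely inside the inode. */
static inline int
demo_attr_fork_is_embedded(const struct demo_inode_new *ip)
{
	const char	*base = (const char *)ip;
	const char	*fork = (const char *)&ip->attr;

	return fork >= base &&
	       fork + sizeof(ip->attr) <= base + sizeof(*ip);
}
```

The trade-off is the one the commit message names: every inode pays for the fork's bytes whether or not it has extended attributes.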
This patch adds a clunky if_present field where necessary to maintain the existing logic of xattr fork null pointer testing in the existing codebase. The next patch switches the logic over to XFS_IFORK_Q and it all goes away.
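The if_present bookkeeping can be sketched as follows. This is loosely modelled on the xfs_ifork_init_attr() and xfs_ifork_zap_attr() helpers that appear later in this patch, but the demo_* names, the reduced field set, and the enum values are assumptions made for illustration. The flag takes over the role that "i_afp != NULL" used to play.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical format codes; the real ones are XFS_DINODE_FMT_*. */
enum demo_fmt { DEMO_FMT_LOCAL = 1, DEMO_FMT_EXTENTS = 2 };

/* Reduced stand-in for struct xfs_ifork with the new presence flag. */
struct demo_fork {
	void			*data;		/* inline data, if any */
	unsigned int		nextents;	/* # of extents in this fork */
	signed char		format;		/* format of this fork */
	signed char		present;	/* 1 if present */
};

struct demo_inode {
	struct demo_fork	af;		/* embedded attr fork */
};

/* Mark the embedded attr fork as present and set its initial state. */
static void demo_fork_init_attr(struct demo_inode *ip, enum demo_fmt format,
				unsigned int nextents)
{
	assert(!ip->af.present);

	ip->af.present = 1;
	ip->af.format = (signed char)format;
	ip->af.nextents = nextents;
}

/* Reset the fork; callers must already have released its contents. */
static void demo_fork_zap_attr(struct demo_inode *ip)
{
	assert(ip->af.present);
	assert(ip->af.data == NULL);

	memset(&ip->af, 0, sizeof(ip->af));
	ip->af.format = DEMO_FMT_EXTENTS;
}

/* Presence test that replaces the old NULL pointer check. */
static int demo_inode_hasattr(const struct demo_inode *ip)
{
	if (!ip->af.present)
		return 0;
	if (ip->af.format == DEMO_FMT_EXTENTS && ip->af.nextents == 0)
		return 0;
	return 1;
}
```

Zapping clears the flag by zeroing the structure in place, so no memory is returned to an allocator and a concurrent reader can at worst see a fork that reports itself as absent.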
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
conflicts:
	fs/xfs/libxfs/xfs_attr.c
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_attr_leaf.c
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_inode_buf.c
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/xfs_attr_inactive.c
	fs/xfs/xfs_attr_list.c
	fs/xfs/xfs_icache.c
	fs/xfs/xfs_inode.c
	fs/xfs/xfs_inode.h
	fs/xfs/xfs_inode_item.c
	fs/xfs/xfs_itable.c
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 fs/xfs/libxfs/xfs_attr.c       | 29 +++++++++++---------
 fs/xfs/libxfs/xfs_attr_leaf.c  | 25 +++++++++---------
 fs/xfs/libxfs/xfs_bmap.c       |  9 +++----
 fs/xfs/libxfs/xfs_inode_buf.c  |  6 ++---
 fs/xfs/libxfs/xfs_inode_fork.c | 48 +++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_inode_fork.h |  5 ++++
 fs/xfs/xfs_attr_inactive.c     |  9 +++----
 fs/xfs/xfs_attr_list.c         | 10 +++----
 fs/xfs/xfs_bmap_util.c         |  8 +++---
 fs/xfs/xfs_icache.c            | 10 ++++---
 fs/xfs/xfs_inode.c             | 15 ++++++-----
 fs/xfs/xfs_inode.h             |  6 +++--
 fs/xfs/xfs_inode_item.c        | 46 ++++++++++++++++----------------
 fs/xfs/xfs_ioctl.c             |  2 +-
 fs/xfs/xfs_iomap.c             |  4 +--
 fs/xfs/xfs_itable.c            |  2 +-
 16 files changed, 132 insertions(+), 102 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 5ce192d2a426..5dc6650f6513 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -63,9 +63,12 @@ int xfs_inode_hasattr( struct xfs_inode *ip) { - if (!XFS_IFORK_Q(ip) || - (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS && - ip->i_afp->if_nextents == 0)) + if (!XFS_IFORK_Q(ip)) + return 0; + if (!ip->i_af.if_present) + return 0; + if (ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS && + ip->i_af.if_nextents == 0) return 0; return 1; } @@ -87,7 +90,7 @@ xfs_attr_get_ilocked( if (!xfs_inode_hasattr(args->dp)) return -ENOATTR;
- if (args->dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) + if (args->dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) return xfs_attr_shortform_getvalue(args); if (xfs_bmap_one_block(args->dp, XFS_ATTR_FORK)) return xfs_attr_leaf_get(args); @@ -183,7 +186,7 @@ xfs_attr_try_sf_addname( /* * Build initial attribute list (if required). */ - if (dp->i_afp->if_format == XFS_DINODE_FMT_EXTENTS) + if (dp->i_af.if_format == XFS_DINODE_FMT_EXTENTS) xfs_attr_shortform_create(args);
error = xfs_attr_shortform_addname(args); @@ -211,9 +214,9 @@ static inline bool xfs_attr_is_shortform( struct xfs_inode *ip) { - return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL || - (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS && - ip->i_afp->if_nextents == 0); + return ip->i_af.if_format == XFS_DINODE_FMT_LOCAL || + (ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS && + ip->i_af.if_nextents == 0); }
/* @@ -342,8 +345,8 @@ xfs_has_attr( if (!xfs_inode_hasattr(dp)) return -ENOATTR;
- if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) { - ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); + if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) { + ASSERT(dp->i_af.if_flags & XFS_IFINLINE); return xfs_attr_sf_findname(args, NULL, NULL); }
@@ -371,8 +374,8 @@ xfs_attr_remove_args(
if (!xfs_inode_hasattr(dp)) { error = -ENOATTR; - } else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) { - ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); + } else if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) { + ASSERT(dp->i_af.if_flags & XFS_IFINLINE); error = xfs_attr_shortform_remove(args); } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) { error = xfs_attr_leaf_removename(args); @@ -527,7 +530,7 @@ static inline int xfs_attr_sf_totsize(struct xfs_inode *dp) { struct xfs_attr_shortform *sf;
- sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data; + sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data; return be16_to_cpu(sf->hdr.totsize); }
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 8d5748d5eb58..378767bdf833 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -645,7 +645,7 @@ xfs_attr_shortform_create( struct xfs_da_args *args) { struct xfs_inode *dp = args->dp; - struct xfs_ifork *ifp = dp->i_afp; + struct xfs_ifork *ifp = &dp->i_af; struct xfs_attr_sf_hdr *hdr;
trace_xfs_attr_sf_create(args); @@ -687,7 +687,7 @@ xfs_attr_sf_findname( int end; int i;
- sf = (struct xfs_attr_shortform *)args->dp->i_afp->if_u1.if_data; + sf = (struct xfs_attr_shortform *)args->dp->i_af.if_u1.if_data; sfe = &sf->list[0]; end = sf->hdr.count; for (i = 0; i < end; sfe = xfs_attr_sf_nextentry(sfe), @@ -732,7 +732,7 @@ xfs_attr_shortform_add( mp = dp->i_mount; dp->i_d.di_forkoff = forkoff;
- ifp = dp->i_afp; + ifp = &dp->i_af; ASSERT(ifp->if_flags & XFS_IFINLINE); sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data; if (xfs_attr_sf_findname(args, &sfe, NULL) == -EEXIST) @@ -765,11 +765,10 @@ xfs_attr_fork_remove( struct xfs_inode *ip, struct xfs_trans *tp) { - ASSERT(ip->i_afp->if_nextents == 0); + ASSERT(ip->i_af.if_nextents == 0);
- xfs_idestroy_fork(ip->i_afp); - kmem_cache_free(xfs_ifork_zone, ip->i_afp); - ip->i_afp = NULL; + xfs_idestroy_fork(&ip->i_af); + xfs_ifork_zap_attr(ip); ip->i_d.di_forkoff = 0; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); } @@ -793,7 +792,7 @@ xfs_attr_shortform_remove(
dp = args->dp; mp = dp->i_mount; - sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data; + sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
error = xfs_attr_sf_findname(args, &sfe, &base); if (error != -EEXIST) @@ -850,7 +849,7 @@ xfs_attr_shortform_lookup(xfs_da_args_t *args)
trace_xfs_attr_sf_lookup(args);
- ifp = args->dp->i_afp; + ifp = &args->dp->i_af; ASSERT(ifp->if_flags & XFS_IFINLINE); sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data; sfe = &sf->list[0]; @@ -878,8 +877,8 @@ xfs_attr_shortform_getvalue( struct xfs_attr_sf_entry *sfe; int i;
- ASSERT(args->dp->i_afp->if_flags == XFS_IFINLINE); - sf = (struct xfs_attr_shortform *)args->dp->i_afp->if_u1.if_data; + ASSERT(args->dp->i_af.if_flags == XFS_IFINLINE); + sf = (struct xfs_attr_shortform *)args->dp->i_af.if_u1.if_data; sfe = &sf->list[0]; for (i = 0; i < sf->hdr.count; sfe = xfs_attr_sf_nextentry(sfe), i++) { @@ -913,7 +912,7 @@ xfs_attr_shortform_to_leaf( trace_xfs_attr_sf_to_leaf(args);
dp = args->dp; - ifp = dp->i_afp; + ifp = &dp->i_af; sf = (struct xfs_attr_shortform *)ifp->if_u1.if_data; size = be16_to_cpu(sf->hdr.totsize); tmpbuffer = kmem_alloc(size, 0); @@ -1021,7 +1020,7 @@ xfs_attr_shortform_verify( int i; int64_t size;
- ASSERT(ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL); + ASSERT(ip->i_af.if_format == XFS_DINODE_FMT_LOCAL); ifp = xfs_ifork_ptr(ip, XFS_ATTR_FORK); sfp = (struct xfs_attr_shortform *)ifp->if_u1.if_data; size = ifp->if_bytes; diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index a19549e2920a..eaee170fc990 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1089,13 +1089,10 @@ xfs_bmap_add_attrfork( error = xfs_bmap_set_attrforkoff(ip, size, &version); if (error) goto trans_cancel; - ASSERT(ip->i_afp == NULL); + ASSERT(!ip->i_af.if_present);
- ip->i_afp = kmem_cache_zalloc(xfs_ifork_zone, - GFP_KERNEL | __GFP_NOFAIL); - - ip->i_afp->if_format = XFS_DINODE_FMT_EXTENTS; - ip->i_afp->if_flags = XFS_IFEXTENTS; + xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); + ip->i_af.if_flags = XFS_IFEXTENTS; logflags = 0; switch (ip->i_df.if_format) { case XFS_DINODE_FMT_LOCAL: diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 37cf8909fada..fd354f847954 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -198,7 +198,7 @@ xfs_inode_from_disk( xfs_failaddr_t fa;
ASSERT(ip->i_cowfp == NULL); - ASSERT(ip->i_afp == NULL); + ASSERT(!ip->i_af.if_present);
fa = xfs_dinode_verify(ip->i_mount, ip->i_ino, from); if (fa) { @@ -327,9 +327,9 @@ xfs_inode_to_disk( to->di_nblocks = cpu_to_be64(from->di_nblocks); to->di_extsize = cpu_to_be32(from->di_extsize); to->di_nextents = cpu_to_be32(xfs_ifork_nextents(&ip->i_df)); - to->di_anextents = cpu_to_be16(xfs_ifork_nextents(ip->i_afp)); + to->di_anextents = cpu_to_be16(xfs_ifork_nextents(&ip->i_af)); to->di_forkoff = from->di_forkoff; - to->di_aformat = xfs_ifork_format(ip->i_afp); + to->di_aformat = xfs_ifork_format(&ip->i_af); to->di_dmevmask = cpu_to_be32(from->di_dmevmask); to->di_dmstate = cpu_to_be16(from->di_dmstate); to->di_flags = cpu_to_be16(from->di_flags); diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index d7f77303cbd4..1a2aa3f8308a 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -281,6 +281,32 @@ xfs_dfork_attr_shortform_size( return be16_to_cpu(atp->hdr.totsize); }
+void
+xfs_ifork_init_attr(
+	struct xfs_inode	*ip,
+	enum xfs_dinode_fmt	format,
+	xfs_extnum_t		nextents)
+{
+	ASSERT(!ip->i_af.if_present);
+
+	ip->i_af.if_present = 1;
+	ip->i_af.if_format = format;
+	ip->i_af.if_nextents = nextents;
+}
+
+void
+xfs_ifork_zap_attr(
+	struct xfs_inode	*ip)
+{
+	ASSERT(ip->i_af.if_present);
+	ASSERT(ip->i_af.if_broot == NULL);
+	ASSERT(ip->i_af.if_u1.if_data == NULL);
+	ASSERT(ip->i_af.if_height == 0);
+
+	memset(&ip->i_af, 0, sizeof(struct xfs_ifork));
+	ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS;
+}
+
 int
 xfs_iformat_attr_fork(
 	struct xfs_inode	*ip,
@@ -292,13 +318,10 @@ xfs_iformat_attr_fork(
 	 * Initialize the extent count early, as the per-format routines may
 	 * depend on it.
 	 */
-	ip->i_afp = kmem_cache_zalloc(xfs_ifork_zone, GFP_NOFS | __GFP_NOFAIL);
-	ip->i_afp->if_format = dip->di_aformat;
-	if (unlikely(ip->i_afp->if_format == 0)) /* pre IRIX 6.2 file system */
-		ip->i_afp->if_format = XFS_DINODE_FMT_EXTENTS;
-	ip->i_afp->if_nextents = be16_to_cpu(dip->di_anextents);
+	xfs_ifork_init_attr(ip, dip->di_aformat,
+			be16_to_cpu(dip->di_anextents));
- switch (ip->i_afp->if_format) { + switch (ip->i_af.if_format) { case XFS_DINODE_FMT_LOCAL: error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, xfs_dfork_attr_shortform_size(dip)); @@ -318,10 +341,8 @@ xfs_iformat_attr_fork( break; }
- if (error) { - kmem_cache_free(xfs_ifork_zone, ip->i_afp); - ip->i_afp = NULL; - } + if (error) + xfs_ifork_zap_attr(ip); return error; }
@@ -660,7 +681,7 @@ xfs_iext_state_to_fork( if (state & BMAP_COWFORK) return ip->i_cowfp; else if (state & BMAP_ATTRFORK) - return ip->i_afp; + return &ip->i_af; return &ip->i_df; }
@@ -676,6 +697,7 @@ xfs_ifork_init_cow(
ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_zone, GFP_NOFS | __GFP_NOFAIL); + ip->i_cowfp->if_present = 1; ip->i_cowfp->if_flags = XFS_IFEXTENTS; ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS; } @@ -712,10 +734,10 @@ int xfs_ifork_verify_local_attr( struct xfs_inode *ip) { - struct xfs_ifork *ifp = ip->i_afp; + struct xfs_ifork *ifp = &ip->i_af; xfs_failaddr_t fa;
- if (!ifp) + if (!ifp->if_present) fa = __this_address; else fa = xfs_attr_shortform_verify(ip); diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index dd2eeb31abdd..5c87ab4638fd 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -25,6 +25,7 @@ struct xfs_ifork { unsigned char if_flags; /* per-fork flags */ int8_t if_format; /* format of this fork */ xfs_extnum_t if_nextents; /* # of extents in this fork */ + int8_t if_present; /* 1 if present */ };
/* @@ -135,6 +136,10 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp) return ifp->if_format; }
+ +void xfs_ifork_zap_attr(struct xfs_inode *ip); +void xfs_ifork_init_attr(struct xfs_inode *ip, enum xfs_dinode_fmt format, + xfs_extnum_t nextents); struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
int xfs_iformat_data_fork(struct xfs_inode *, struct xfs_dinode *); diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 3e78091d4255..72cdf9b58dbd 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -365,7 +365,7 @@ xfs_attr_inactive( * removal below. */ if (xfs_inode_hasattr(dp) && - dp->i_afp->if_format != XFS_DINODE_FMT_LOCAL) { + dp->i_af.if_format != XFS_DINODE_FMT_LOCAL) { error = xfs_attr3_root_inactive(&trans, dp); if (error) goto out_cancel; @@ -386,10 +386,9 @@ xfs_attr_inactive( xfs_trans_cancel(trans); out_destroy_fork: /* kill the in-core attr fork before we drop the inode lock */ - if (dp->i_afp) { - xfs_idestroy_fork(dp->i_afp); - kmem_cache_free(xfs_ifork_zone, dp->i_afp); - dp->i_afp = NULL; + if (dp->i_af.if_present) { + xfs_idestroy_fork(&dp->i_af); + xfs_ifork_zap_attr(dp); } if (lock_mode) xfs_iunlock(dp, lock_mode); diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c index 8f8837fe21cf..391d9d4558ba 100644 --- a/fs/xfs/xfs_attr_list.c +++ b/fs/xfs/xfs_attr_list.c @@ -60,8 +60,8 @@ xfs_attr_shortform_list( int sbsize, nsbuf, count, i; int error = 0;
- ASSERT(dp->i_afp != NULL); - sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data; + ASSERT(dp->i_af.if_present); + sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data; ASSERT(sf != NULL); if (!sf->hdr.count) return 0; @@ -79,7 +79,7 @@ xfs_attr_shortform_list( */ if (context->bufsize == 0 || (XFS_ISRESET_CURSOR(cursor) && - (dp->i_afp->if_bytes + sf->hdr.count * 16) < context->bufsize)) { + (dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) { for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) { if (XFS_IS_CORRUPT(context->dp->i_mount, !xfs_attr_namecheck(sfe->nameval, @@ -120,7 +120,7 @@ xfs_attr_shortform_list( for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) { if (unlikely( ((char *)sfe < (char *)sf) || - ((char *)sfe >= ((char *)sf + dp->i_afp->if_bytes)))) { + ((char *)sfe >= ((char *)sf + dp->i_af.if_bytes)))) { XFS_CORRUPTION_ERROR("xfs_attr_shortform_list", XFS_ERRLEVEL_LOW, context->dp->i_mount, sfe, @@ -512,7 +512,7 @@ xfs_attr_list_ilocked( */ if (!xfs_inode_hasattr(dp)) return 0; - if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) + if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) return xfs_attr_shortform_list(context); if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) return xfs_attr_leaf_list(context); diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 7d444915b8f7..fc883ee63b8d 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1439,15 +1439,15 @@ xfs_swap_extent_forks( /* * Count the number of extended attribute blocks */ - if (XFS_IFORK_Q(ip) && ip->i_afp->if_nextents > 0 && - ip->i_afp->if_format != XFS_DINODE_FMT_LOCAL) { + if (XFS_IFORK_Q(ip) && ip->i_af.if_nextents > 0 && + ip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &junk, &aforkblks); if (error) return error; } - if (XFS_IFORK_Q(tip) && tip->i_afp->if_nextents > 0 && - tip->i_afp->if_format != XFS_DINODE_FMT_LOCAL) { + if (XFS_IFORK_Q(tip) && tip->i_af.if_nextents > 0 && + 
tip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK, &junk, &taforkblks); if (error) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index aaedc28639d9..94dc0eddc6e6 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -105,9 +105,11 @@ xfs_inode_alloc( ip->i_ino = ino; ip->i_mount = mp; memset(&ip->i_imap, 0, sizeof(struct xfs_imap)); - ip->i_afp = NULL; ip->i_cowfp = NULL; + memset(&ip->i_af, 0, sizeof(ip->i_af)); + ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS; memset(&ip->i_df, 0, sizeof(ip->i_df)); + ip->i_df.if_present = 1; ip->i_flags = 0; ip->i_delayed_blks = 0; memset(&ip->i_d, 0, sizeof(ip->i_d)); @@ -135,9 +137,9 @@ xfs_inode_free_callback( break; }
- if (ip->i_afp) { - xfs_idestroy_fork(ip->i_afp); - kmem_cache_free(xfs_ifork_zone, ip->i_afp); + if (ip->i_af.if_present) { + xfs_idestroy_fork(&ip->i_af); + xfs_ifork_zap_attr(ip); } if (ip->i_cowfp) { xfs_idestroy_fork(ip->i_cowfp); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 78f495046748..bac87fd204b7 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -124,9 +124,9 @@ xfs_ilock_attr_map_shared( { uint lock_mode = XFS_ILOCK_SHARED;
- if (ip->i_afp && - ip->i_afp->if_format == XFS_DINODE_FMT_BTREE && - (ip->i_afp->if_flags & XFS_IFEXTENTS) == 0) + if (ip->i_af.if_present && + ip->i_af.if_format == XFS_DINODE_FMT_BTREE && + (ip->i_af.if_flags & XFS_IFEXTENTS) == 0) lock_mode = XFS_ILOCK_EXCL; xfs_ilock(ip, lock_mode); return lock_mode; @@ -1927,7 +1927,7 @@ xfs_inactive( goto out; }
- ASSERT(!ip->i_afp); + ASSERT(!ip->i_af.if_present); ASSERT(ip->i_d.di_forkoff == 0);
/* @@ -3607,13 +3607,13 @@ xfs_iflush( goto flush_out; } } - if (XFS_TEST_ERROR(ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp) > + if (XFS_TEST_ERROR(ip->i_df.if_nextents + xfs_ifork_nextents(&ip->i_af) > ip->i_d.di_nblocks, mp, XFS_ERRTAG_IFLUSH_5)) { xfs_alert_tag(mp, XFS_PTAG_IFLUSH, "%s: detected corrupt incore inode %Lu, " "total extents = %d, nblocks = %Ld, ptr "PTR_FMT, __func__, ip->i_ino, - ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp), + ip->i_df.if_nextents + xfs_ifork_nextents(&ip->i_af), ip->i_d.di_nblocks, ip); goto flush_out; } @@ -3644,7 +3644,8 @@ xfs_iflush( if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL && xfs_ifork_verify_local_data(ip)) goto flush_out; - if (ip->i_afp && ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL && + if (ip->i_af.if_present && + ip->i_af.if_format == XFS_DINODE_FMT_LOCAL && xfs_ifork_verify_local_attr(ip)) goto flush_out;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 5424fa756bf3..e4e5a8dda0f3 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -33,9 +33,9 @@ typedef struct xfs_inode { struct xfs_imap i_imap; /* location for xfs_imap() */
/* Extent information. */ - struct xfs_ifork *i_afp; /* attribute fork pointer */ struct xfs_ifork *i_cowfp; /* copy on write extents */ struct xfs_ifork i_df; /* data fork */ + struct xfs_ifork i_af; /* attribute fork */
/* Transaction and locking information. */ struct xfs_inode_log_item *i_itemp; /* logging information */ @@ -76,7 +76,9 @@ xfs_ifork_ptr( case XFS_DATA_FORK: return &ip->i_df; case XFS_ATTR_FORK: - return ip->i_afp; + if (!ip->i_af.if_present) + return NULL; + return &ip->i_af; case XFS_COW_FORK: return ip->i_cowfp; default: diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index fec0a75e8121..2d54498e0150 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -91,11 +91,11 @@ xfs_inode_item_attr_fork_size( { struct xfs_inode *ip = iip->ili_inode;
- switch (ip->i_afp->if_format) { + switch (ip->i_af.if_format) { case XFS_DINODE_FMT_EXTENTS: if ((iip->ili_fields & XFS_ILOG_AEXT) && - ip->i_afp->if_nextents > 0 && - ip->i_afp->if_bytes > 0) { + ip->i_af.if_nextents > 0 && + ip->i_af.if_bytes > 0) { /* worst case, doesn't subtract unused space */ *nbytes += XFS_IFORK_ASIZE(ip); *nvecs += 1; @@ -103,15 +103,15 @@ xfs_inode_item_attr_fork_size( break; case XFS_DINODE_FMT_BTREE: if ((iip->ili_fields & XFS_ILOG_ABROOT) && - ip->i_afp->if_broot_bytes > 0) { - *nbytes += ip->i_afp->if_broot_bytes; + ip->i_af.if_broot_bytes > 0) { + *nbytes += ip->i_af.if_broot_bytes; *nvecs += 1; } break; case XFS_DINODE_FMT_LOCAL: if ((iip->ili_fields & XFS_ILOG_ADATA) && - ip->i_afp->if_bytes > 0) { - *nbytes += roundup(ip->i_afp->if_bytes, 4); + ip->i_af.if_bytes > 0) { + *nbytes += roundup(ip->i_af.if_bytes, 4); *nvecs += 1; } break; @@ -241,18 +241,18 @@ xfs_inode_item_format_attr_fork( struct xfs_inode *ip = iip->ili_inode; size_t data_bytes;
- switch (ip->i_afp->if_format) { + switch (ip->i_af.if_format) { case XFS_DINODE_FMT_EXTENTS: iip->ili_fields &= ~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT);
if ((iip->ili_fields & XFS_ILOG_AEXT) && - ip->i_afp->if_nextents > 0 && - ip->i_afp->if_bytes > 0) { + ip->i_af.if_nextents > 0 && + ip->i_af.if_bytes > 0) { struct xfs_bmbt_rec *p;
- ASSERT(xfs_iext_count(ip->i_afp) == - ip->i_afp->if_nextents); + ASSERT(xfs_iext_count(&ip->i_af) == + ip->i_af.if_nextents);
p = xlog_prepare_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_EXT); data_bytes = xfs_iextents_copy(ip, p, XFS_ATTR_FORK); @@ -269,13 +269,13 @@ xfs_inode_item_format_attr_fork( ~(XFS_ILOG_ADATA | XFS_ILOG_AEXT);
if ((iip->ili_fields & XFS_ILOG_ABROOT) && - ip->i_afp->if_broot_bytes > 0) { - ASSERT(ip->i_afp->if_broot != NULL); + ip->i_af.if_broot_bytes > 0) { + ASSERT(ip->i_af.if_broot != NULL);
xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_BROOT, - ip->i_afp->if_broot, - ip->i_afp->if_broot_bytes); - ilf->ilf_asize = ip->i_afp->if_broot_bytes; + ip->i_af.if_broot, + ip->i_af.if_broot_bytes); + ilf->ilf_asize = ip->i_af.if_broot_bytes; ilf->ilf_size++; } else { iip->ili_fields &= ~XFS_ILOG_ABROOT; @@ -286,16 +286,16 @@ xfs_inode_item_format_attr_fork( ~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT);
if ((iip->ili_fields & XFS_ILOG_ADATA) && - ip->i_afp->if_bytes > 0) { + ip->i_af.if_bytes > 0) { /* * Round i_bytes up to a word boundary. * The underlying memory is guaranteed * to be there by xfs_idata_realloc(). */ - data_bytes = roundup(ip->i_afp->if_bytes, 4); - ASSERT(ip->i_afp->if_u1.if_data != NULL); + data_bytes = roundup(ip->i_af.if_bytes, 4); + ASSERT(ip->i_af.if_u1.if_data != NULL); xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL, - ip->i_afp->if_u1.if_data, + ip->i_af.if_u1.if_data, data_bytes); ilf->ilf_asize = (unsigned)data_bytes; ilf->ilf_size++; @@ -360,9 +360,9 @@ xfs_inode_to_log_dinode( to->di_nblocks = from->di_nblocks; to->di_extsize = from->di_extsize; to->di_nextents = xfs_ifork_nextents(&ip->i_df); - to->di_anextents = xfs_ifork_nextents(ip->i_afp); + to->di_anextents = xfs_ifork_nextents(&ip->i_af); to->di_forkoff = from->di_forkoff; - to->di_aformat = xfs_ifork_format(ip->i_afp); + to->di_aformat = xfs_ifork_format(&ip->i_af); to->di_dmevmask = from->di_dmevmask; to->di_dmstate = from->di_dmstate; to->di_flags = from->di_flags; diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index f1de12df880e..79cf806f4e3e 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1113,7 +1113,7 @@ xfs_fill_fsxattr( bool attr, struct fsxattr *fa) { - struct xfs_ifork *ifp = attr ? ip->i_afp : &ip->i_df; + struct xfs_ifork *ifp = attr ? &ip->i_af : &ip->i_df;
simple_fill_fsxattr(fa, xfs_ip2xflags(ip)); fa->fsx_extsize = ip->i_d.di_extsize << ip->i_mount->m_sb.sb_blocklog; diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 186fba31de71..b371b67cc945 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1282,12 +1282,12 @@ xfs_xattr_iomap_begin( lockmode = xfs_ilock_attr_map_shared(ip);
/* if there are no attribute fork or extents, return ENOENT */ - if (!XFS_IFORK_Q(ip) || !ip->i_afp->if_nextents) { + if (!XFS_IFORK_Q(ip) || !ip->i_af.if_nextents) { error = -ENOENT; goto out_unlock; }
- ASSERT(ip->i_afp->if_format != XFS_DINODE_FMT_LOCAL); + ASSERT(ip->i_af.if_format != XFS_DINODE_FMT_LOCAL); error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, &nimaps, XFS_BMAPI_ATTRFORK); out_unlock: diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 3a1e45b64bfa..20f12506e9f4 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -107,7 +107,7 @@ xfs_bulkstat_one_int( buf->bs_extsize_blks = dic->di_extsize; buf->bs_extents = xfs_ifork_nextents(&ip->i_df); xfs_bulkstat_health(ip, buf); - buf->bs_aextents = xfs_ifork_nextents(ip->i_afp); + buf->bs_aextents = xfs_ifork_nextents(&ip->i_af); buf->bs_forkoff = XFS_IFORK_BOFF(ip); buf->bs_version = XFS_BULKSTAT_VERSION_V5;
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.19-rc5
commit e45d7cb2356e6b59fe64da28324025cc6fcd3fbd
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the attribute fork but i_forkoff is zero. This eliminates the ambiguity between i_forkoff and i_af.if_present, which should make it easier to understand the lifetime of attr forks.
While we're at it, remove the if_present checks around calls to xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr forks that have already been torn down.
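The lookup rule described above can be sketched in isolation. This is only an illustration under stated assumptions: `struct demo_inode`, `struct demo_ifork`, `demo_ifork_ptr()` and the `DEMO_*_FORK` constants are hypothetical stand-ins, not the kernel's types, and `forkoff` merely plays the role of the on-disk fork offset. The point it tries to show is that, with the attr fork embedded in the inode, "is there an attr fork" is answered by one field (the fork offset) instead of by a separate presence flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fork selectors; not the kernel's values. */
enum { DEMO_DATA_FORK, DEMO_ATTR_FORK };

struct demo_ifork {
	int8_t		if_format;
	uint32_t	if_nextents;
};

struct demo_inode {
	uint8_t			forkoff;	/* plays the role of di_forkoff */
	struct demo_ifork	i_df;		/* data fork, always valid */
	struct demo_ifork	i_af;		/* attr fork, embedded in the inode */
};

/* Return NULL for the attr fork whenever the fork offset is zero. */
static struct demo_ifork *
demo_ifork_ptr(struct demo_inode *ip, int whichfork)
{
	switch (whichfork) {
	case DEMO_DATA_FORK:
		return &ip->i_df;
	case DEMO_ATTR_FORK:
		if (ip->forkoff == 0)
			return NULL;
		return &ip->i_af;
	default:
		return NULL;
	}
}

/* Small self-check exercising both states of the fork offset. */
void demo_ifork_ptr_check(void)
{
	struct demo_inode ip = { 0 };

	assert(demo_ifork_ptr(&ip, DEMO_ATTR_FORK) == NULL);
	assert(demo_ifork_ptr(&ip, DEMO_DATA_FORK) == &ip.i_df);

	ip.forkoff = 15;
	assert(demo_ifork_ptr(&ip, DEMO_ATTR_FORK) == &ip.i_af);
}
```

Because the embedded fork always exists in memory, callers that go through the lookup helper still see NULL for an inode without an attr fork, which is presumably why the separate if_present flag can go away.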
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Dave Chinner dchinner@redhat.com
conflicts:
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/xfs_icache.c
	fs/xfs/xfs_inode.c
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_attr.c       |  2 --
 fs/xfs/libxfs/xfs_bmap.c       |  1 -
 fs/xfs/libxfs/xfs_inode_buf.c  |  1 -
 fs/xfs/libxfs/xfs_inode_fork.c |  7 +------
 fs/xfs/libxfs/xfs_inode_fork.h |  1 -
 fs/xfs/xfs_attr_inactive.c     | 11 ++++-------
 fs/xfs/xfs_attr_list.c         |  1 -
 fs/xfs/xfs_icache.c            |  8 +++-----
 fs/xfs/xfs_inode.c             |  5 ++---
 fs/xfs/xfs_inode.h             |  2 +-
 10 files changed, 11 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 5dc6650f6513..cbb8bec5291f 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -65,8 +65,6 @@ xfs_inode_hasattr( { if (!XFS_IFORK_Q(ip)) return 0; - if (!ip->i_af.if_present) - return 0; if (ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS && ip->i_af.if_nextents == 0) return 0; diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index eaee170fc990..90f4b99291ed 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1089,7 +1089,6 @@ xfs_bmap_add_attrfork( error = xfs_bmap_set_attrforkoff(ip, size, &version); if (error) goto trans_cancel; - ASSERT(!ip->i_af.if_present);
xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); ip->i_af.if_flags = XFS_IFEXTENTS; diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index fd354f847954..012af77edf36 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -198,7 +198,6 @@ xfs_inode_from_disk( xfs_failaddr_t fa;
ASSERT(ip->i_cowfp == NULL); - ASSERT(!ip->i_af.if_present);
fa = xfs_dinode_verify(ip->i_mount, ip->i_ino, from); if (fa) { diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 1a2aa3f8308a..3460c3d8a959 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -287,9 +287,6 @@ xfs_ifork_init_attr( enum xfs_dinode_fmt format, xfs_extnum_t nextents) { - ASSERT(!ip->i_af.if_present); - - ip->i_af.if_present = 1; ip->i_af.if_format = format; ip->i_af.if_nextents = nextents; } @@ -298,7 +295,6 @@ void xfs_ifork_zap_attr( struct xfs_inode *ip) { - ASSERT(ip->i_af.if_present); ASSERT(ip->i_af.if_broot == NULL); ASSERT(ip->i_af.if_u1.if_data == NULL); ASSERT(ip->i_af.if_height == 0); @@ -697,7 +693,6 @@ xfs_ifork_init_cow(
ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_zone, GFP_NOFS | __GFP_NOFAIL); - ip->i_cowfp->if_present = 1; ip->i_cowfp->if_flags = XFS_IFEXTENTS; ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS; } @@ -737,7 +732,7 @@ xfs_ifork_verify_local_attr( struct xfs_ifork *ifp = &ip->i_af; xfs_failaddr_t fa;
- if (!ifp->if_present) + if (!XFS_IFORK_Q(ip)) fa = __this_address; else fa = xfs_attr_shortform_verify(ip); diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 5c87ab4638fd..8466923d450e 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -25,7 +25,6 @@ struct xfs_ifork { unsigned char if_flags; /* per-fork flags */ int8_t if_format; /* format of this fork */ xfs_extnum_t if_nextents; /* # of extents in this fork */ - int8_t if_present; /* 1 if present */ };
/* diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 72cdf9b58dbd..c882c734f7b5 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -360,12 +360,11 @@ xfs_attr_inactive(
/* * Invalidate and truncate the attribute fork extents. Make sure the - * fork actually has attributes as otherwise the invalidation has no + * fork actually has xattr blocks as otherwise the invalidation has no * blocks to read and returns an error. In this case, just do the fork * removal below. */ - if (xfs_inode_hasattr(dp) && - dp->i_af.if_format != XFS_DINODE_FMT_LOCAL) { + if (dp->i_af.if_nextents > 0) { error = xfs_attr3_root_inactive(&trans, dp); if (error) goto out_cancel; @@ -386,10 +385,8 @@ xfs_attr_inactive( xfs_trans_cancel(trans); out_destroy_fork: /* kill the in-core attr fork before we drop the inode lock */ - if (dp->i_af.if_present) { - xfs_idestroy_fork(&dp->i_af); - xfs_ifork_zap_attr(dp); - } + xfs_idestroy_fork(&dp->i_af); + xfs_ifork_zap_attr(dp); if (lock_mode) xfs_iunlock(dp, lock_mode); return error; diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c index 391d9d4558ba..b43e48fdf120 100644 --- a/fs/xfs/xfs_attr_list.c +++ b/fs/xfs/xfs_attr_list.c @@ -60,7 +60,6 @@ xfs_attr_shortform_list( int sbsize, nsbuf, count, i; int error = 0;
- ASSERT(dp->i_af.if_present); sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data; ASSERT(sf != NULL); if (!sf->hdr.count) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 94dc0eddc6e6..82708548f0c4 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -109,7 +109,6 @@ xfs_inode_alloc( memset(&ip->i_af, 0, sizeof(ip->i_af)); ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS; memset(&ip->i_df, 0, sizeof(ip->i_df)); - ip->i_df.if_present = 1; ip->i_flags = 0; ip->i_delayed_blks = 0; memset(&ip->i_d, 0, sizeof(ip->i_d)); @@ -137,10 +136,9 @@ xfs_inode_free_callback( break; }
- if (ip->i_af.if_present) { - xfs_idestroy_fork(&ip->i_af); - xfs_ifork_zap_attr(ip); - } + xfs_idestroy_fork(&ip->i_af); + xfs_ifork_zap_attr(ip); + if (ip->i_cowfp) { xfs_idestroy_fork(ip->i_cowfp); kmem_cache_free(xfs_ifork_zone, ip->i_cowfp); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index bac87fd204b7..28db32ed0de6 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -124,7 +124,7 @@ xfs_ilock_attr_map_shared( { uint lock_mode = XFS_ILOCK_SHARED;
- if (ip->i_af.if_present && + if (XFS_IFORK_Q(ip) && ip->i_af.if_format == XFS_DINODE_FMT_BTREE && (ip->i_af.if_flags & XFS_IFEXTENTS) == 0) lock_mode = XFS_ILOCK_EXCL; @@ -1927,7 +1927,6 @@ xfs_inactive( goto out; }
-	ASSERT(!ip->i_af.if_present);
 	ASSERT(ip->i_d.di_forkoff == 0);
 	/*
@@ -3644,7 +3643,7 @@ xfs_iflush(
 	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL &&
 	    xfs_ifork_verify_local_data(ip))
 		goto flush_out;
-	if (ip->i_af.if_present &&
+	if (XFS_IFORK_Q(ip) &&
 	    ip->i_af.if_format == XFS_DINODE_FMT_LOCAL &&
 	    xfs_ifork_verify_local_attr(ip))
 		goto flush_out;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index e4e5a8dda0f3..4f5b3764c4d9 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -76,7 +76,7 @@ xfs_ifork_ptr(
 	case XFS_DATA_FORK:
 		return &ip->i_df;
 	case XFS_ATTR_FORK:
-		if (!ip->i_af.if_present)
+		if (!XFS_IFORK_Q(ip))
 			return NULL;
 		return &ip->i_af;
 	case XFS_COW_FORK:
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.19-rc5
commit 932b42c66cb5d0ca9800b128415b4ad6b1952b3e
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Replace this shouty macro with a real C function that has a more descriptive name.
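As a rough illustration of what the conversion buys, the sketch below puts a macro predicate next to an inline function with the same meaning. Everything here is a hypothetical stand-in (`struct toy_inode`, `TOY_IFORK_Q`, `toy_inode_has_attr_fork()`); only the shape mirrors the change in this patch. The macro accepts any expression that happens to have a matching member, while the function makes the compiler check the argument type and gives the predicate a name that says what it tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode with only the field the predicate needs. */
struct toy_inode {
	uint8_t	forkoff;	/* plays the role of di_forkoff */
};

/* Macro form: no type checking, and the name does not say what it tests. */
#define TOY_IFORK_Q(ip)		((ip)->forkoff != 0)

/* Function form: the argument must really be a struct toy_inode pointer. */
static inline bool
toy_inode_has_attr_fork(struct toy_inode *ip)
{
	return ip->forkoff > 0;
}

/* The two forms agree because the fork offset is unsigned. */
void toy_has_attr_fork_check(void)
{
	struct toy_inode ip = { .forkoff = 0 };

	assert(!TOY_IFORK_Q(&ip));
	assert(!toy_inode_has_attr_fork(&ip));

	ip.forkoff = 15;
	assert(TOY_IFORK_Q(&ip));
	assert(toy_inode_has_attr_fork(&ip));
}
```

Since the field is unsigned, `!= 0` and `> 0` select the same inodes, so a conversion of this shape should not change behaviour at any call site.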
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Dave Chinner dchinner@redhat.com
conflicts:
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/scrub/btree.c
	fs/xfs/xfs_inode.c
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_attr.c       |  4 ++--
 fs/xfs/libxfs/xfs_bmap.c       |  4 ++--
 fs/xfs/libxfs/xfs_inode_fork.c |  2 +-
 fs/xfs/libxfs/xfs_inode_fork.h |  5 ++---
 fs/xfs/xfs_attr_inactive.c     |  4 ++--
 fs/xfs/xfs_bmap_util.c         | 10 +++++-----
 fs/xfs/xfs_inode.c             | 10 +++++-----
 fs/xfs/xfs_inode.h             |  7 ++++++-
 fs/xfs/xfs_inode_item.c        |  4 ++--
 fs/xfs/xfs_iomap.c             |  2 +-
 fs/xfs/xfs_iops.c              |  2 +-
 11 files changed, 29 insertions(+), 25 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index cbb8bec5291f..cc016776d21e 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -63,7 +63,7 @@ int xfs_inode_hasattr( struct xfs_inode *ip) { - if (!XFS_IFORK_Q(ip)) + if (!xfs_inode_has_attr_fork(ip)) return 0; if (ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS && ip->i_af.if_nextents == 0) @@ -428,7 +428,7 @@ xfs_attr_set( * If the inode doesn't have an attribute fork, add one. * (inode must not be locked when we call this routine) */ - if (XFS_IFORK_Q(dp) == 0) { + if (xfs_inode_has_attr_fork(dp) == 0) { int sf_size = sizeof(struct xfs_attr_sf_hdr) + xfs_attr_sf_entsize_byname(args->namelen, args->valuelen); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 90f4b99291ed..90ac1dd5d632 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1071,7 +1071,7 @@ xfs_bmap_add_attrfork( int logflags; /* logging flags */ int error; /* error return value */
- ASSERT(XFS_IFORK_Q(ip) == 0); + ASSERT(xfs_inode_has_attr_fork(ip) == 0);
mp = ip->i_mount; ASSERT(!XFS_NOT_DQATTACHED(mp, ip)); @@ -1082,7 +1082,7 @@ xfs_bmap_add_attrfork( rsvd, &tp); if (error) return error; - if (XFS_IFORK_Q(ip)) + if (xfs_inode_has_attr_fork(ip)) goto trans_cancel;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 3460c3d8a959..1c67421c1602 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -732,7 +732,7 @@ xfs_ifork_verify_local_attr( struct xfs_ifork *ifp = &ip->i_af; xfs_failaddr_t fa;
- if (!XFS_IFORK_Q(ip)) + if (!xfs_inode_has_attr_fork(ip)) fa = __this_address; else fa = xfs_attr_shortform_verify(ip); diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 8466923d450e..c54da38bde65 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -99,13 +99,12 @@ struct xfs_ifork { * Fork handling. */
-#define XFS_IFORK_Q(ip)			((ip)->i_d.di_forkoff != 0)
 #define XFS_IFORK_BOFF(ip)		((int)((ip)->i_d.di_forkoff << 3))
 #define XFS_IFORK_DSIZE(ip) \
-	(XFS_IFORK_Q(ip) ? XFS_IFORK_BOFF(ip) : XFS_LITINO((ip)->i_mount))
+	(xfs_inode_has_attr_fork(ip) ? XFS_IFORK_BOFF(ip) : XFS_LITINO((ip)->i_mount))
 #define XFS_IFORK_ASIZE(ip) \
-	(XFS_IFORK_Q(ip) ? XFS_LITINO((ip)->i_mount) - XFS_IFORK_BOFF(ip) : 0)
+	(xfs_inode_has_attr_fork(ip) ? XFS_LITINO((ip)->i_mount) - XFS_IFORK_BOFF(ip) : 0)
 #define XFS_IFORK_SIZE(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		XFS_IFORK_DSIZE(ip) : \
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index c882c734f7b5..10cc601b9e51 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -336,7 +336,7 @@ xfs_attr_inactive(
 	ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
xfs_ilock(dp, lock_mode); - if (!XFS_IFORK_Q(dp)) + if (!xfs_inode_has_attr_fork(dp)) goto out_destroy_fork; xfs_iunlock(dp, lock_mode);
@@ -349,7 +349,7 @@ xfs_attr_inactive( lock_mode = XFS_ILOCK_EXCL; xfs_ilock(dp, lock_mode);
- if (!XFS_IFORK_Q(dp)) + if (!xfs_inode_has_attr_fork(dp)) goto out_cancel;
/* diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index fc883ee63b8d..48d1b5e5851a 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -402,7 +402,7 @@ xfs_getbmap( xfs_ilock(ip, XFS_IOLOCK_SHARED); switch (whichfork) { case XFS_ATTR_FORK: - if (!XFS_IFORK_Q(ip)) + if (!xfs_inode_has_attr_fork(ip)) goto out_unlock_iolock;
max_len = 1LL << 32; @@ -1259,7 +1259,7 @@ xfs_swap_extents_check_format( * extent format... */ if (tifp->if_format == XFS_DINODE_FMT_BTREE) { - if (XFS_IFORK_Q(ip) && + if (xfs_inode_has_attr_fork(ip) && XFS_BMAP_BMDR_SPACE(tifp->if_broot) > XFS_IFORK_BOFF(ip)) return -EINVAL; if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) @@ -1268,7 +1268,7 @@ xfs_swap_extents_check_format(
/* Reciprocal target->temp btree format checks */ if (ifp->if_format == XFS_DINODE_FMT_BTREE) { - if (XFS_IFORK_Q(tip) && + if (xfs_inode_has_attr_fork(tip) && XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > XFS_IFORK_BOFF(tip)) return -EINVAL; if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) @@ -1439,14 +1439,14 @@ xfs_swap_extent_forks( /* * Count the number of extended attribute blocks */ - if (XFS_IFORK_Q(ip) && ip->i_af.if_nextents > 0 && + if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_nextents > 0 && ip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &junk, &aforkblks); if (error) return error; } - if (XFS_IFORK_Q(tip) && tip->i_af.if_nextents > 0 && + if (xfs_inode_has_attr_fork(tip) && tip->i_af.if_nextents > 0 && tip->i_af.if_format != XFS_DINODE_FMT_LOCAL) { error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK, &junk, &taforkblks); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 28db32ed0de6..43edc7dbe6c5 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -124,7 +124,7 @@ xfs_ilock_attr_map_shared( { uint lock_mode = XFS_ILOCK_SHARED;
- if (XFS_IFORK_Q(ip) && + if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_format == XFS_DINODE_FMT_BTREE && (ip->i_af.if_flags & XFS_IFEXTENTS) == 0) lock_mode = XFS_ILOCK_EXCL; @@ -656,7 +656,7 @@ xfs_ip2xflags( { struct xfs_icdinode *dic = &ip->i_d;
- return _xfs_dic2xflags(dic->di_flags, dic->di_flags2, XFS_IFORK_Q(ip)); + return _xfs_dic2xflags(dic->di_flags, dic->di_flags2, xfs_inode_has_attr_fork(ip)); }
/* @@ -1921,7 +1921,7 @@ xfs_inactive( * now. The code calls a routine that recursively deconstructs the * attribute fork. If also blows away the in-core attribute fork. */ - if (XFS_IFORK_Q(ip)) { + if (xfs_inode_has_attr_fork(ip)) { error = xfs_attr_inactive(ip); if (error) goto out; @@ -3643,7 +3643,7 @@ xfs_iflush( if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL && xfs_ifork_verify_local_data(ip)) goto flush_out; - if (XFS_IFORK_Q(ip) && + if (xfs_inode_has_attr_fork(ip) && ip->i_af.if_format == XFS_DINODE_FMT_LOCAL && xfs_ifork_verify_local_attr(ip)) goto flush_out; @@ -3660,7 +3660,7 @@ xfs_iflush( ip->i_d.di_flushiter = 0;
xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK); - if (XFS_IFORK_Q(ip)) + if (xfs_inode_has_attr_fork(ip)) xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK);
/* diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 4f5b3764c4d9..74998e64ac83 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -67,6 +67,11 @@ typedef struct xfs_inode { struct list_head i_ioend_list; } xfs_inode_t;
+static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip)
+{
+	return (ip)->i_d.di_forkoff > 0;
+}
+
 static inline struct xfs_ifork *
 xfs_ifork_ptr(
 	struct xfs_inode	*ip,
@@ -76,7 +81,7 @@ xfs_ifork_ptr(
 	case XFS_DATA_FORK:
 		return &ip->i_df;
 	case XFS_ATTR_FORK:
-		if (!XFS_IFORK_Q(ip))
+		if (!xfs_inode_has_attr_fork(ip))
 			return NULL;
 		return &ip->i_af;
 	case XFS_COW_FORK:
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 2d54498e0150..1f65de0c4436 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -142,7 +142,7 @@ xfs_inode_item_size(
 	xfs_log_dinode_size(ip->i_mount);
xfs_inode_item_data_fork_size(iip, nvecs, nbytes); - if (XFS_IFORK_Q(ip)) + if (xfs_inode_has_attr_fork(ip)) xfs_inode_item_attr_fork_size(iip, nvecs, nbytes); }
@@ -449,7 +449,7 @@ xfs_inode_item_format(
xfs_inode_item_format_core(ip, lv, &vecp); xfs_inode_item_format_data_fork(iip, ilf, lv, &vecp); - if (XFS_IFORK_Q(ip)) { + if (xfs_inode_has_attr_fork(ip)) { xfs_inode_item_format_attr_fork(iip, ilf, lv, &vecp); } else { iip->ili_fields &= diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index b371b67cc945..564d26f9c325 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1282,7 +1282,7 @@ xfs_xattr_iomap_begin( lockmode = xfs_ilock_attr_map_shared(ip);
/* if there are no attribute fork or extents, return ENOENT */ - if (!XFS_IFORK_Q(ip) || !ip->i_af.if_nextents) { + if (!xfs_inode_has_attr_fork(ip) || !ip->i_af.if_nextents) { error = -ENOENT; goto out_unlock; } diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 2bcd5b4c7b73..88e814bb2476 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1341,7 +1341,7 @@ xfs_setup_inode( * If there is no attribute fork no ACL can exist on this inode, * and it can't have any file capabilities attached to it either. */ - if (!XFS_IFORK_Q(ip)) { + if (!xfs_inode_has_attr_fork(ip)) { inode_has_no_xattr(inode); cache_no_acl(inode); }
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.19-rc5
commit c01147d929899f02a0a8b15e406d12784768ca72
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Replace the shouty macros here with typechecked helper functions.
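A hedged sketch of what such helpers might look like, derived only from the macro bodies visible in the previous patch (data fork size is the fork byte offset when an attr fork exists, otherwise the whole literal area; attr fork size is the remainder). The types and names below (`struct sketch_inode`, `litino`, the `sketch_*` functions, the `SKETCH_*_FORK` constants) are hypothetical, and the fallback of 0 for other forks is an assumption, since the tail of the size macro is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fork selectors; not the kernel's values. */
enum { SKETCH_DATA_FORK, SKETCH_ATTR_FORK, SKETCH_COW_FORK };

struct sketch_inode {
	uint8_t		forkoff;	/* fork offset in 8-byte units */
	unsigned int	litino;		/* literal area size in bytes (assumed per-inode here) */
};

static inline bool
sketch_inode_has_attr_fork(struct sketch_inode *ip)
{
	return ip->forkoff > 0;
}

/* Byte offset of the attr fork inside the literal area. */
static inline unsigned int
sketch_inode_fork_boff(struct sketch_inode *ip)
{
	return (unsigned int)ip->forkoff << 3;
}

static inline unsigned int
sketch_inode_data_fork_size(struct sketch_inode *ip)
{
	if (sketch_inode_has_attr_fork(ip))
		return sketch_inode_fork_boff(ip);
	return ip->litino;
}

static inline unsigned int
sketch_inode_attr_fork_size(struct sketch_inode *ip)
{
	if (sketch_inode_has_attr_fork(ip))
		return ip->litino - sketch_inode_fork_boff(ip);
	return 0;
}

static inline unsigned int
sketch_inode_fork_size(struct sketch_inode *ip, int whichfork)
{
	switch (whichfork) {
	case SKETCH_DATA_FORK:
		return sketch_inode_data_fork_size(ip);
	case SKETCH_ATTR_FORK:
		return sketch_inode_attr_fork_size(ip);
	default:
		return 0;	/* assumption: other forks have no inline area */
	}
}

/* Self-check with an arbitrary 336-byte literal area. */
void sketch_fork_size_check(void)
{
	struct sketch_inode ip = { .forkoff = 0, .litino = 336 };

	assert(sketch_inode_data_fork_size(&ip) == 336);
	assert(sketch_inode_attr_fork_size(&ip) == 0);

	ip.forkoff = 15;	/* 15 * 8 = 120 bytes for the data fork */
	assert(sketch_inode_data_fork_size(&ip) == 120);
	assert(sketch_inode_attr_fork_size(&ip) == 216);
}
```

Unlike the macros, each helper evaluates its argument once and rejects a pointer of the wrong type at compile time.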
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Dave Chinner dchinner@redhat.com
conflicts:
	fs/xfs/libxfs/xfs_attr_leaf.c
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/libxfs/xfs_dir2.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/scrub/symlink.c
	fs/xfs/xfs_itable.c
	fs/xfs/xfs_symlink.c
	fs/xfs/xfs_trace.h
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_attr_leaf.c  |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c       |  6 +++---
 fs/xfs/libxfs/xfs_bmap_btree.c |  2 +-
 fs/xfs/libxfs/xfs_dir2.c       |  3 ++-
 fs/xfs/libxfs/xfs_dir2_block.c |  4 ++--
 fs/xfs/libxfs/xfs_dir2_sf.c    |  6 +++---
 fs/xfs/libxfs/xfs_inode_fork.c | 10 +++++-----
 fs/xfs/libxfs/xfs_inode_fork.h | 15 +--------------
 fs/xfs/scrub/symlink.c         |  4 ++--
 fs/xfs/xfs_bmap_util.c         |  4 ++--
 fs/xfs/xfs_inode.h             | 35 ++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode_item.c        |  4 ++--
 fs/xfs/xfs_itable.c            |  2 +-
 fs/xfs/xfs_symlink.c           |  2 +-
 fs/xfs/xfs_trace.h             |  2 +-
 15 files changed, 63 insertions(+), 39 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 378767bdf833..689f1100451c 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -559,7 +559,8 @@ xfs_attr_shortform_bytesfit( * to real extents, or the delalloc conversion will take care of the * literal area rebalancing. */ - if (bytes <= XFS_IFORK_ASIZE(dp)) + if (bytes <= xfs_inode_attr_fork_size(dp)) + return dp->i_d.di_forkoff;
/* diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 90ac1dd5d632..f7a6c212de7d 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -932,7 +932,7 @@ xfs_bmap_add_attrfork_btree( int stat; /* newroot status */
mp = ip->i_mount; - if (ip->i_df.if_broot_bytes <= XFS_IFORK_DSIZE(ip)) + if (ip->i_df.if_broot_bytes <= xfs_inode_data_fork_size(ip)) *flags |= XFS_ILOG_DBROOT; else { cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK); @@ -972,7 +972,7 @@ xfs_bmap_add_attrfork_extents( int error; /* error return value */
if (ip->i_df.if_nextents * sizeof(struct xfs_bmbt_rec) <= - XFS_IFORK_DSIZE(ip)) + xfs_inode_data_fork_size(ip)) return 0; cur = NULL; error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0, flags, @@ -1003,7 +1003,7 @@ xfs_bmap_add_attrfork_local( { struct xfs_da_args dargs; /* args for dir/attr code */
- if (ip->i_df.if_bytes <= XFS_IFORK_DSIZE(ip)) + if (ip->i_df.if_bytes <= xfs_inode_data_fork_size(ip)) return 0;
if (S_ISDIR(VFS_I(ip)->i_mode)) { diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index af79506d5f0f..c71741e2857c 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -566,7 +566,7 @@ xfs_bmbt_init_cursor( if (xfs_sb_version_hascrc(&mp->m_sb)) cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
- cur->bc_ino.forksize = XFS_IFORK_SIZE(ip, whichfork); + cur->bc_ino.forksize = xfs_inode_fork_size(ip, whichfork); cur->bc_ino.ip = ip; cur->bc_ino.allocated = 0; cur->bc_ino.flags = 0; diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index 612a9c5e41b1..74ccf7857e11 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -181,7 +181,8 @@ xfs_dir_isempty( ASSERT(S_ISDIR(VFS_I(dp)->i_mode)); if (dp->i_d.di_size == 0) /* might happen during shutdown. */ return 1; - if (dp->i_d.di_size > XFS_IFORK_DSIZE(dp)) + if (dp->i_d.di_size > xfs_inode_data_fork_size(dp)) + return 0; sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data; return !sfp->count; diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c index f0b33e5cd7a3..56c079707dee 100644 --- a/fs/xfs/libxfs/xfs_dir2_block.c +++ b/fs/xfs/libxfs/xfs_dir2_block.c @@ -842,7 +842,7 @@ xfs_dir2_block_removename( * See if the size as a shortform is good enough. */ size = xfs_dir2_block_sfsize(dp, hdr, &sfh); - if (size > XFS_IFORK_DSIZE(dp)) + if (size > xfs_inode_data_fork_size(dp)) return 0;
/* @@ -1055,7 +1055,7 @@ xfs_dir2_leaf_to_block( * Now see if the resulting block can be shrunken to shortform. */ size = xfs_dir2_block_sfsize(dp, hdr, &sfh); - if (size > XFS_IFORK_DSIZE(dp)) + if (size > xfs_inode_data_fork_size(dp)) return 0;
return xfs_dir2_block_to_sf(args, dbp, size, &sfh); diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c index 8c0ad7435b45..affef6ae7b59 100644 --- a/fs/xfs/libxfs/xfs_dir2_sf.c +++ b/fs/xfs/libxfs/xfs_dir2_sf.c @@ -237,7 +237,7 @@ xfs_dir2_block_sfsize( (i8count ? /* inumber */ count * XFS_INO64_SIZE : count * XFS_INO32_SIZE); - if (size > XFS_IFORK_DSIZE(dp)) + if (size > xfs_inode_data_fork_size(dp)) return size; /* size value is a failure */ } /* @@ -406,7 +406,7 @@ xfs_dir2_sf_addname( * Won't fit as shortform any more (due to size), * or the pick routine says it won't (due to offset values). */ - if (new_isize > XFS_IFORK_DSIZE(dp) || + if (new_isize > xfs_inode_data_fork_size(dp) || (pick = xfs_dir2_sf_addname_pick(args, objchange, &sfep, &offset)) == 0) { /* @@ -1033,7 +1033,7 @@ xfs_dir2_sf_replace_needblock( newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF;
return inum > XFS_DIR2_MAX_SHORT_INUM && - sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp); + sfp->i8count == 0 && newsize > xfs_inode_data_fork_size(dp); }
/* diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 1c67421c1602..53b1b2547f6b 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -412,7 +412,7 @@ xfs_iroot_realloc( (int)new_size); ifp->if_broot_bytes = (int)new_size; ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= - XFS_IFORK_SIZE(ip, whichfork)); + xfs_inode_fork_size(ip, whichfork)); memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t)); return; } @@ -467,7 +467,7 @@ xfs_iroot_realloc( ifp->if_broot_bytes = (int)new_size; if (ifp->if_broot) ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= - XFS_IFORK_SIZE(ip, whichfork)); + xfs_inode_fork_size(ip, whichfork)); return; }
@@ -497,7 +497,7 @@ xfs_idata_realloc( int64_t new_size = ifp->if_bytes + byte_diff;
ASSERT(new_size >= 0); - ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork)); + ASSERT(new_size <= xfs_inode_fork_size(ip, whichfork));
if (byte_diff == 0) return; @@ -626,7 +626,7 @@ xfs_iflush_fork( if ((iip->ili_fields & dataflag[whichfork]) && (ifp->if_bytes > 0)) { ASSERT(ifp->if_u1.if_data != NULL); - ASSERT(ifp->if_bytes <= XFS_IFORK_SIZE(ip, whichfork)); + ASSERT(ifp->if_bytes <= xfs_inode_fork_size(ip, whichfork)); memcpy(cp, ifp->if_u1.if_data, ifp->if_bytes); } break; @@ -647,7 +647,7 @@ xfs_iflush_fork( (ifp->if_broot_bytes > 0)) { ASSERT(ifp->if_broot != NULL); ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= - XFS_IFORK_SIZE(ip, whichfork)); + xfs_inode_fork_size(ip, whichfork)); xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes, (xfs_bmdr_block_t *)cp, XFS_DFORK_SIZE(dip, mp, whichfork)); diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index c54da38bde65..64465472f0f0 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -98,21 +98,8 @@ struct xfs_ifork { /* * Fork handling. */ - -#define XFS_IFORK_BOFF(ip) ((int)((ip)->i_d.di_forkoff << 3)) - -#define XFS_IFORK_DSIZE(ip) \ - (xfs_inode_has_attr_fork(ip) ? XFS_IFORK_BOFF(ip) : XFS_LITINO((ip)->i_mount)) -#define XFS_IFORK_ASIZE(ip) \ - (xfs_inode_has_attr_fork(ip) ? XFS_LITINO((ip)->i_mount) - XFS_IFORK_BOFF(ip) : 0) -#define XFS_IFORK_SIZE(ip,w) \ - ((w) == XFS_DATA_FORK ? \ - XFS_IFORK_DSIZE(ip) : \ - ((w) == XFS_ATTR_FORK ? \ - XFS_IFORK_ASIZE(ip) : \ - 0)) #define XFS_IFORK_MAXEXT(ip, w) \ - (XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t)) + (xfs_inode_fork_size(ip, w) / sizeof(xfs_bmbt_rec_t))
static inline bool xfs_ifork_has_extents(struct xfs_ifork *ifp) { diff --git a/fs/xfs/scrub/symlink.c b/fs/xfs/scrub/symlink.c index 3d5e79af872a..289dda8d7840 100644 --- a/fs/xfs/scrub/symlink.c +++ b/fs/xfs/scrub/symlink.c @@ -53,8 +53,8 @@ xchk_symlink(
/* Inline symlink? */ if (ifp->if_flags & XFS_IFINLINE) { - if (len > XFS_IFORK_DSIZE(ip) || - len > strnlen(ifp->if_u1.if_data, XFS_IFORK_DSIZE(ip))) + if (len > xfs_inode_data_fork_size(ip) || + len > strnlen(ifp->if_u1.if_data, xfs_inode_data_fork_size(ip))) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); goto out; } diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 48d1b5e5851a..539660e62e11 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1260,7 +1260,7 @@ xfs_swap_extents_check_format( */ if (tifp->if_format == XFS_DINODE_FMT_BTREE) { if (xfs_inode_has_attr_fork(ip) && - XFS_BMAP_BMDR_SPACE(tifp->if_broot) > XFS_IFORK_BOFF(ip)) + XFS_BMAP_BMDR_SPACE(tifp->if_broot) > xfs_inode_fork_boff(ip)) return -EINVAL; if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) return -EINVAL; @@ -1269,7 +1269,7 @@ xfs_swap_extents_check_format( /* Reciprocal target->temp btree format checks */ if (ifp->if_format == XFS_DINODE_FMT_BTREE) { if (xfs_inode_has_attr_fork(tip) && - XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > XFS_IFORK_BOFF(tip)) + XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > xfs_inode_fork_boff(tip)) return -EINVAL; if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) return -EINVAL; diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 74998e64ac83..b8a60cc0c251 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -92,6 +92,41 @@ xfs_ifork_ptr( } }
+static inline unsigned int xfs_inode_fork_boff(struct xfs_inode *ip) +{ + return (ip)->i_d.di_forkoff << 3; +} + +static inline unsigned int xfs_inode_data_fork_size(struct xfs_inode *ip) +{ + if (xfs_inode_has_attr_fork(ip)) + return xfs_inode_fork_boff(ip); + + return XFS_LITINO(ip->i_mount); +} + +static inline unsigned int xfs_inode_attr_fork_size(struct xfs_inode *ip) +{ + if (xfs_inode_has_attr_fork(ip)) + return XFS_LITINO(ip->i_mount) - xfs_inode_fork_boff(ip); + return 0; +} + +static inline unsigned int +xfs_inode_fork_size( + struct xfs_inode *ip, + int whichfork) +{ + switch (whichfork) { + case XFS_DATA_FORK: + return xfs_inode_data_fork_size(ip); + case XFS_ATTR_FORK: + return xfs_inode_attr_fork_size(ip); + default: + return 0; + } +} + /* Convert from vfs inode to xfs inode */ static inline struct xfs_inode *XFS_I(struct inode *inode) { diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 1f65de0c4436..cc206dac5908 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -56,7 +56,7 @@ xfs_inode_item_data_fork_size( ip->i_df.if_nextents > 0 && ip->i_df.if_bytes > 0) { /* worst case, doesn't subtract delalloc extents */ - *nbytes += XFS_IFORK_DSIZE(ip); + *nbytes += xfs_inode_data_fork_size(ip); *nvecs += 1; } break; @@ -97,7 +97,7 @@ xfs_inode_item_attr_fork_size( ip->i_af.if_nextents > 0 && ip->i_af.if_bytes > 0) { /* worst case, doesn't subtract unused space */ - *nbytes += XFS_IFORK_ASIZE(ip); + *nbytes += xfs_inode_attr_fork_size(ip); *nvecs += 1; } break; diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 20f12506e9f4..68c3a2b847ea 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -108,7 +108,7 @@ xfs_bulkstat_one_int( buf->bs_extents = xfs_ifork_nextents(&ip->i_df); xfs_bulkstat_health(ip, buf); buf->bs_aextents = xfs_ifork_nextents(&ip->i_af); - buf->bs_forkoff = XFS_IFORK_BOFF(ip); + buf->bs_forkoff = xfs_inode_fork_boff(ip); buf->bs_version = XFS_BULKSTAT_VERSION_V5;
if (xfs_sb_version_has_v3inode(&mp->m_sb)) { diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 7b9cdb1a41ff..447b0fd8f942 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -245,7 +245,7 @@ xfs_symlink( /* * If the symlink will fit into the inode, write it inline. */ - if (pathlen <= XFS_IFORK_DSIZE(ip)) { + if (pathlen <= xfs_inode_data_fork_size(ip)) { xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
ip->i_d.di_size = pathlen; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 3f4cc7e3915c..68fea812d190 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2124,7 +2124,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class, __entry->format = ip->i_df.if_format; __entry->nex = ip->i_df.if_nextents; __entry->broot_size = ip->i_df.if_broot_bytes; - __entry->fork_off = XFS_IFORK_BOFF(ip); + __entry->fork_off = xfs_inode_fork_boff(ip); ), TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %d, " "broot size %d, fork offset %d",
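The arithmetic behind the new inline helpers can be tried out in userspace. What follows is a minimal sketch, not kernel code: `struct demo_inode` and every `demo_*` name are hypothetical stand-ins, the literal area size is passed in directly instead of coming from XFS_LITINO(), and "has an attr fork" is modelled simply as a non-zero fork offset.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields the helpers read; not struct xfs_inode. */
struct demo_inode {
	unsigned int	litino;		/* literal area size in bytes */
	unsigned char	forkoff;	/* attr fork offset in 8-byte units, 0 = no attr fork */
};

enum { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

/* Byte offset of the attr fork inside the literal area. */
static inline unsigned int demo_fork_boff(const struct demo_inode *ip)
{
	return (unsigned int)ip->forkoff << 3;
}

/* The data fork ends where the attr fork begins, or owns the whole area. */
static inline unsigned int demo_data_fork_size(const struct demo_inode *ip)
{
	if (ip->forkoff)
		return demo_fork_boff(ip);
	return ip->litino;
}

/* The attr fork gets whatever is left after the data fork. */
static inline unsigned int demo_attr_fork_size(const struct demo_inode *ip)
{
	if (ip->forkoff) {
		assert(demo_fork_boff(ip) <= ip->litino);
		return ip->litino - demo_fork_boff(ip);
	}
	return 0;
}

/* Dispatch on the fork type; any other fork has no inline space. */
static inline unsigned int demo_fork_size(const struct demo_inode *ip, int whichfork)
{
	switch (whichfork) {
	case DEMO_DATA_FORK:
		return demo_data_fork_size(ip);
	case DEMO_ATTR_FORK:
		return demo_attr_fork_size(ip);
	default:
		return 0;
	}
}
```

With an assumed literal area of 336 bytes and a fork offset of 15, the data fork gets 120 bytes and the attr fork the remaining 216; with a fork offset of 0 the data fork owns all 336 bytes.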
From: Dan Carpenter dan.carpenter@oracle.com
mainline inclusion
from mainline-v5.19-rc5
commit 3f52e016af600982989b5dee958d313c52483c92
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
These NULL checks are no longer needed after commit 2ed5b09b3e8f ("xfs: make inode attribute forks a permanent part of struct xfs_inode").
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_inode_fork.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 53b1b2547f6b..520448e5025d 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -739,8 +739,7 @@ xfs_ifork_verify_local_attr(

 	if (fa) {
 		xfs_inode_verifier_error(ip, -EFSCORRUPTED, "attr fork",
-				ifp ? ifp->if_u1.if_data : NULL,
-				ifp ? ifp->if_bytes : 0, fa);
+				ifp->if_u1.if_data, ifp->if_bytes, fa);
 		return -EFSCORRUPTED;
 	}
From: "Darrick J. Wong" djwong@kernel.org
mainline inclusion
from mainline-v5.19-rc5
commit c78c2d0903183a41beb90c56a923e30f90fa91b9
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
I observed the following evidence of a memory leak while running xfs/399 from the xfs fsck test suite (edited for brevity):
XFS (sde): Metadata corruption detected at xfs_attr_shortform_verify_struct.part.0+0x7b/0xb0 [xfs], inode 0x1172 attr fork XFS: Assertion failed: ip->i_af.if_u1.if_data == NULL, file: fs/xfs/libxfs/xfs_inode_fork.c, line: 315 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 91635 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] CPU: 2 PID: 91635 Comm: xfs_scrub Tainted: G W 5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:assfail+0x46/0x4a [xfs] Call Trace: <TASK> xfs_ifork_zap_attr+0x7c/0xb0 xfs_iformat_attr_fork+0x86/0x110 xfs_inode_from_disk+0x41d/0x480 xfs_iget+0x389/0xd70 xfs_bulkstat_one_int+0x5b/0x540 xfs_bulkstat_iwalk+0x1e/0x30 xfs_iwalk_ag_recs+0xd1/0x160 xfs_iwalk_run_callbacks+0xb9/0x180 xfs_iwalk_ag+0x1d8/0x2e0 xfs_iwalk+0x141/0x220 xfs_bulkstat+0x105/0x180 xfs_ioc_bulkstat.constprop.0.isra.0+0xc5/0x130 xfs_file_ioctl+0xa5f/0xef0 __x64_sys_ioctl+0x82/0xa0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
This newly-added assertion checks that there aren't any incore data structures hanging off the incore fork when we're trying to reset its contents. From the call trace, it is evident that iget was trying to construct an incore inode from the ondisk inode, but the attr fork verifier failed and we were trying to undo all the memory allocations that we had done earlier.
The three assertions in xfs_ifork_zap_attr check that the caller has already called xfs_idestroy_fork, which clearly has not been done here. As the zap function then zeroes the pointers, we've effectively leaked the memory.
The shortest change would have been to insert an extra call to xfs_idestroy_fork, but it makes more sense to bundle the _idestroy_fork call into _zap_attr, since all other callsites call _idestroy_fork immediately prior to calling _zap_attr. IOWs, it eliminates one way to fail.
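The design choice described above, folding the destroy step into the zap step so that no caller can forget it, can be illustrated in userspace. This is a minimal sketch under stated assumptions: `struct demo_fork` and all `demo_*` names are hypothetical, `malloc`/`free` stand in for the kernel allocators, and the format constant is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FMT_EXTENTS 2	/* arbitrary placeholder for the reset format */

/* Hypothetical miniature of an incore fork; not struct xfs_ifork. */
struct demo_fork {
	void	*data;		/* heap buffer hanging off the fork */
	size_t	bytes;
	int	format;
};

/* Release everything the fork owns (the role xfs_idestroy_fork plays). */
static void demo_destroy_fork(struct demo_fork *f)
{
	free(f->data);
	f->data = NULL;
	f->bytes = 0;
}

/*
 * Reset the fork to its empty state. Freeing first means zeroing the
 * pointers can no longer leak memory when a caller skips the destroy step.
 */
static void demo_zap_fork(struct demo_fork *f)
{
	demo_destroy_fork(f);
	memset(f, 0, sizeof(*f));
	f->format = DEMO_FMT_EXTENTS;
}

/*
 * Error path modelled on a verifier failure: allocate the local data,
 * then undo the work with a single zap call. Returns 0 once cleaned up.
 */
static int demo_format_then_fail(struct demo_fork *f, size_t len)
{
	f->data = malloc(len);
	if (!f->data)
		return -1;
	f->bytes = len;

	/* pretend the verifier rejected the contents */
	demo_zap_fork(f);
	assert(f->data == NULL);
	return 0;
}
```

In this shape the error path needs exactly one call, which mirrors the "it eliminates one way to fail" argument in the commit message.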
Note: This change only applies cleanly to 2ed5b09b3e8f, since we just reworked the attr fork lifetime. However, I think this memory leak has existed since 0f45a1b20cd8, since the chain xfs_iformat_attr_fork -> xfs_iformat_local -> xfs_init_local_fork will allocate ifp->if_u1.if_data, but if xfs_ifork_verify_local_attr fails, xfs_iformat_attr_fork will free i_afp without freeing any of the stuff hanging off i_afp. The solution for older kernels I think is to add the missing call to xfs_idestroy_fork just prior to calling kmem_cache_free.
Found by fuzzing a.sfattr.hdr.totsize = lastbit in xfs/399.
Fixes: 2ed5b09b3e8f ("xfs: make inode attribute forks a permanent part of struct xfs_inode")
Probably-Fixes: 0f45a1b20cd8 ("xfs: improve local fork verification")
Signed-off-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Dave Chinner dchinner@redhat.com
conflicts: fs/xfs/libxfs/xfs_attr_leaf.c
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/xfs/libxfs/xfs_attr_leaf.c  | 1 -
 fs/xfs/libxfs/xfs_inode_fork.c | 5 +----
 fs/xfs/xfs_attr_inactive.c     | 1 -
 fs/xfs/xfs_icache.c            | 1 -
 4 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 689f1100451c..b5799a217894 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -768,7 +768,6 @@ xfs_attr_fork_remove(
 {
 	ASSERT(ip->i_af.if_nextents == 0);

-	xfs_idestroy_fork(&ip->i_af);
 	xfs_ifork_zap_attr(ip);
 	ip->i_d.di_forkoff = 0;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 520448e5025d..11000cb8129a 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -295,10 +295,7 @@ void
 xfs_ifork_zap_attr(
 	struct xfs_inode	*ip)
 {
-	ASSERT(ip->i_af.if_broot == NULL);
-	ASSERT(ip->i_af.if_u1.if_data == NULL);
-	ASSERT(ip->i_af.if_height == 0);
-
+	xfs_idestroy_fork(&ip->i_af);
 	memset(&ip->i_af, 0, sizeof(struct xfs_ifork));
 	ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS;
 }
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 10cc601b9e51..0be9a3567c95 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -385,7 +385,6 @@ xfs_attr_inactive(
 	xfs_trans_cancel(trans);
 out_destroy_fork:
 	/* kill the in-core attr fork before we drop the inode lock */
-	xfs_idestroy_fork(&dp->i_af);
 	xfs_ifork_zap_attr(dp);
 	if (lock_mode)
 		xfs_iunlock(dp, lock_mode);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 82708548f0c4..ab220c5dd20c 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -136,7 +136,6 @@ xfs_inode_free_callback(
 		break;
 	}

-	xfs_idestroy_fork(&ip->i_af);
 	xfs_ifork_zap_attr(ip);

 	if (ip->i_cowfp) {
From: John Garry john.garry@huawei.com
mainline inclusion
from mainline-v5.11-rc5
commit e1dc20995cb9fa04b46e8f37113a7203c906d2bf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------------------------
The current check of nvec < minvec for nvec returned from platform_irq_count() will not detect a negative error code in nvec.
This is because minvec is unsigned, and, as such, nvec is promoted to unsigned in that check, which will make it a huge number (if it contained -EPROBE_DEFER).
In practice, an error should not occur in nvec for the only in-tree user, but add a check anyway.
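The promotion problem described above is easy to reproduce outside the kernel. The following is a small sketch, not the driver-core code: the `demo_*` functions are hypothetical, the implicit conversion is written out as an explicit cast so the behaviour is visible, and `DEMO_EPROBE_DEFER` copies the kernel-internal value 517 because userspace `<errno.h>` does not define it.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_EPROBE_DEFER 517	/* kernel-internal errno value, assumed here */

/*
 * Buggy shape: comparing int against unsigned int converts nvec to
 * unsigned, so a negative error code turns into a huge positive number
 * and slips past the "too few vectors" check. The cast spells out what
 * the compiler does implicitly for "nvec < minvec".
 */
static int demo_check_buggy(int nvec, unsigned int minvec)
{
	if ((unsigned int)nvec < minvec)
		return -ENOSPC;
	return 0;
}

/* Fixed shape: reject negative error codes before the unsigned compare. */
static int demo_check_fixed(int nvec, unsigned int minvec)
{
	if (nvec < 0)
		return nvec;
	if ((unsigned int)nvec < minvec)
		return -ENOSPC;
	return 0;
}
```

Passing `-DEMO_EPROBE_DEFER` with `minvec = 2` makes the buggy variant report success, while the fixed variant hands the error code back to the caller.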
Fixes: e15f2fa959f2 ("driver core: platform: Add devm_platform_get_irqs_affinity()")
Reported-by: Dan Carpenter dan.carpenter@oracle.com
Signed-off-by: John Garry john.garry@huawei.com
Signed-off-by: Marc Zyngier maz@kernel.org
Link: https://lore.kernel.org/r/1608561055-231244-1-git-send-email-john.garry@huaw...
Signed-off-by: Zhang Zekun zhangzekun11@huawei.com
Reviewed-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 drivers/base/platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index ea8add164b89..74c97b65048c 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -351,6 +351,8 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
 		return -ERANGE;

 	nvec = platform_irq_count(dev);
+	if (nvec < 0)
+		return nvec;

 	if (nvec < minvec)
 		return -ENOSPC;
From: Saravana Kannan saravanak@google.com
mainline inclusion
from mainline-v5.11-rc1
commit 7008e58c63bc8468e8d16154e25d780198b3ecfc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------------
There's a potential deadlock with the following cycle: wfs_lock --> device_links_lock --> kn->count
Fix this by simply dropping the lock around a list_empty() check that's just exported to a sysfs file. The sysfs file output is an instantaneous check anyway and the lock doesn't really add any protection.
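The cycle reported above can be modelled as a tiny lock-dependency graph. The sketch below is a toy illustration of the idea, not how lockdep is implemented: the `demo_*` names and the three-node graph are hypothetical, and an edge `a -> b` simply records "b was acquired while a was held".

```c
#include <assert.h>
#include <stdbool.h>

enum { WFS_LOCK, DEVICE_LINKS_LOCK, KN_COUNT, NR_LOCKS };

/* edge[a][b] is true when lock b has been acquired while a was held */
struct demo_lockgraph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void demo_add_dep(struct demo_lockgraph *g, int held, int acquired)
{
	assert(held >= 0 && held < NR_LOCKS);
	assert(acquired >= 0 && acquired < NR_LOCKS);
	g->edge[held][acquired] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on the stack, 2 = finished */
static bool demo_visit(const struct demo_lockgraph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: ordering cycle */
		if (state[next] == 0 && demo_visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

/* True if the recorded lock orderings contain a cycle, i.e. can deadlock. */
static bool demo_has_cycle(const struct demo_lockgraph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && demo_visit(g, n, state))
			return true;
	return false;
}
```

Recording `wfs_lock -> device_links_lock` and `device_links_lock -> kn->count` alone yields no cycle. Adding the sysfs show path, which takes `wfs_lock` while `kn->count` is held, closes the loop; dropping that acquisition, as the patch does, removes the edge again.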
Lockdep log:
[   48.808132]
[   48.808132] the existing dependency chain (in reverse order) is:
[   48.809069]
[   48.809069] -> #2 (kn->count){++++}:
[   48.809707]        __kernfs_remove.llvm.7860393000964815146+0x2d4/0x460
[   48.810537]        kernfs_remove_by_name_ns+0x54/0x9c
[   48.811171]        sysfs_remove_file_ns+0x18/0x24
[   48.811762]        device_del+0x2b8/0x5a8
[   48.812269]        __device_link_del+0x98/0xb8
[   48.812829]        device_links_driver_bound+0x210/0x2d8
[   48.813496]        driver_bound+0x44/0xf8
[   48.814000]        really_probe+0x340/0x6e0
[   48.814526]        driver_probe_device+0xb8/0x100
[   48.815117]        device_driver_attach+0x78/0xb8
[   48.815708]        __driver_attach+0xe0/0x194
[   48.816255]        bus_for_each_dev+0xa8/0x11c
[   48.816816]        driver_attach+0x24/0x30
[   48.817331]        bus_add_driver+0x100/0x1e0
[   48.817880]        driver_register+0x78/0x114
[   48.818427]        __platform_driver_register+0x44/0x50
[   48.819089]        0xffffffdbb3227038
[   48.819551]        do_one_initcall+0xd8/0x1e0
[   48.820099]        do_init_module+0xd8/0x298
[   48.820636]        load_module+0x3afc/0x44c8
[   48.821173]        __arm64_sys_finit_module+0xbc/0xf0
[   48.821807]        el0_svc_common+0xbc/0x1d0
[   48.822344]        el0_svc_handler+0x74/0x98
[   48.822882]        el0_svc+0x8/0xc
[   48.823310]
[   48.823310] -> #1 (device_links_lock){+.+.}:
[   48.824036]        __mutex_lock_common+0xe0/0xe44
[   48.824626]        mutex_lock_nested+0x28/0x34
[   48.825185]        device_link_add+0xd4/0x4ec
[   48.825734]        of_link_to_suppliers+0x158/0x204
[   48.826347]        of_fwnode_add_links+0x50/0x64
[   48.826928]        device_link_add_missing_supplier_links+0x90/0x11c
[   48.827725]        fw_devlink_resume+0x58/0x130
[   48.828296]        of_platform_default_populate_init+0xb4/0xd0
[   48.829030]        do_one_initcall+0xd8/0x1e0
[   48.829578]        do_initcall_level+0xb8/0xcc
[   48.830137]        do_basic_setup+0x60/0x7c
[   48.830662]        kernel_init_freeable+0x128/0x1ac
[   48.831275]        kernel_init+0x18/0x29c
[   48.831781]        ret_from_fork+0x10/0x18
[   48.832297]
[   48.832297] -> #0 (wfs_lock){+.+.}:
[   48.832922]        __lock_acquire+0xe04/0x2e20
[   48.833480]        lock_acquire+0xbc/0xec
[   48.833984]        __mutex_lock_common+0xe0/0xe44
[   48.834577]        mutex_lock_nested+0x28/0x34
[   48.835136]        waiting_for_supplier_show+0x3c/0x98
[   48.835781]        dev_attr_show+0x48/0xb4
[   48.836295]        sysfs_kf_seq_show+0xe8/0x184
[   48.836864]        kernfs_seq_show+0x48/0x8c
[   48.837401]        seq_read+0x1c8/0x600
[   48.837884]        kernfs_fop_read+0x68/0x204
[   48.838431]        __vfs_read+0x60/0x214
[   48.838925]        vfs_read+0xbc/0x15c
[   48.839397]        ksys_read+0x78/0xe4
[   48.839869]        __arm64_sys_read+0x1c/0x28
[   48.840416]        el0_svc_common+0xbc/0x1d0
[   48.840953]        el0_svc_handler+0x74/0x98
[   48.841490]        el0_svc+0x8/0xc
[   48.841917]
[   48.841917] other info that might help us debug this:
[   48.841917]
[   48.842920] Chain exists of:
[   48.842920]   wfs_lock --> device_links_lock --> kn->count
[   48.842920]
[   48.844152]  Possible unsafe locking scenario:
[   48.844152]
[   48.844895]        CPU0                    CPU1
[   48.845463]        ----                    ----
[   48.846032]   lock(kn->count);
[   48.846417]                                lock(device_links_lock);
[   48.847203]                                lock(kn->count);
[   48.847902]   lock(wfs_lock);
[   48.848276]
[   48.848276]  *** DEADLOCK ***
Reported-by: Cheng-Jui.Wang@mediatek.com
Signed-off-by: Saravana Kannan saravanak@google.com
Link: https://lore.kernel.org/r/20201104205431.3795207-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Zhang Zekun zhangzekun11@huawei.com
Reviewed-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 drivers/base/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 9a874a58d690..af0024a9aa7c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1092,10 +1092,8 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
 	bool val;

 	device_lock(dev);
-	mutex_lock(&wfs_lock);
 	val = !list_empty(&dev->links.needs_suppliers)
 	      && dev->links.need_for_probe;
-	mutex_unlock(&wfs_lock);
 	device_unlock(dev);
 	return sysfs_emit(buf, "%u\n", val);
 }
From: Mikulas Patocka mpatocka@redhat.com
mainline inclusion
from mainline-v6.3-rc4
commit fb294b1c0ba982144ca467a75e7d01ff26304e2b
category: bugfix
bugzilla: 188393, https://gitee.com/openeuler/kernel/issues/I6JPSH
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
The loop in dmcrypt_write may run for an unbounded amount of time, so it needs a cond_resched() call.
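As a rough userspace analogue of the fix, the sketch below drains a list of pending writes and gives the scheduler a chance to run something else after each one. It is only an illustration: `struct demo_io` and `demo_drain()` are hypothetical, and `sched_yield()` is a loose stand-in for cond_resched(), which in the kernel only reschedules when a reschedule is actually pending.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical pending-write record; not struct dm_crypt_io. */
struct demo_io {
	struct demo_io	*next;
	int		done;
};

/*
 * Drain every queued write. Without the yield, a producer that keeps
 * the list non-empty could keep this loop on the CPU indefinitely.
 * Returns the number of writes processed.
 */
static size_t demo_drain(struct demo_io *head)
{
	size_t n = 0;

	while (head) {
		struct demo_io *io = head;

		head = io->next;
		assert(!io->done);
		io->done = 1;		/* stands in for kcryptd_io_write(io) */
		n++;
		sched_yield();		/* loose analogue of cond_resched() */
	}
	return n;
}
```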
This commit fixes the following warning:
[ 3391.153255][ C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897] ... [ 3391.387210][ C12] Call trace: [ 3391.390338][ C12] blk_attempt_bio_merge.part.6+0x38/0x158 [ 3391.395970][ C12] blk_attempt_plug_merge+0xc0/0x1b0 [ 3391.401085][ C12] blk_mq_submit_bio+0x398/0x550 [ 3391.405856][ C12] submit_bio_noacct+0x308/0x380 [ 3391.410630][ C12] dmcrypt_write+0x1e4/0x208 [dm_crypt] [ 3391.416005][ C12] kthread+0x130/0x138 [ 3391.419911][ C12] ret_from_fork+0x10/0x18
Reported-by: yangerkun yangerkun@huawei.com
Fixes: dc2676210c42 ("dm crypt: offload writes to thread")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
Signed-off-by: Mike Snitzer snitzer@kernel.org
Signed-off-by: yangerkun yangerkun@huawei.com
Reviewed-by: Hou Tao houtao1@huawei.com
Reviewed-by: Yu Kuai yukuai3@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 drivers/md/dm-crypt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..17ddca293965 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
 			io = crypt_io_from_node(rb_first(&write_tree));
 			rb_erase(&io->rb_node, &write_tree);
 			kcryptd_io_write(io);
+			cond_resched();
 		} while (!RB_EMPTY_ROOT(&write_tree));
 		blk_finish_plug(&plug);
 	}
From: Filipe Manana fdmanana@suse.com
mainline inclusion
from mainline-v6.2-rc8
commit 2f1a6be12ab6c8470d5776e68644726c94257c54
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6PQCT
CVE: CVE-2023-1611
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The quota assign ioctl can currently run in parallel with a quota disable ioctl call. The assign ioctl uses the quota root, while the disable ioctl frees that root, and therefore we can have a use-after-free triggered in the assign ioctl, leading to a trace like the following when KASAN is enabled:
[672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
[672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
[672.724][T736]
[672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
[672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[672.727][T736] Call Trace:
[672.728][T736]  <TASK>
[672.728][T736]  dump_stack_lvl+0xd9/0x150
[672.725][T736]  print_report+0xc1/0x5e0
[672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
[672.727][T736]  ? __phys_addr+0xc9/0x150
[672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
[672.722][T736]  kasan_report+0xc0/0xf0
[672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
[672.724][T736]  btrfs_search_slot+0x2962/0x2db0
[672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
[672.722][T736]  ? split_leaf+0x13d0/0x13d0
[672.726][T736]  ? rcu_is_watching+0x12/0xb0
[672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
[672.722][T736]  update_qgroup_status_item+0xf7/0x320
[672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
[672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
[672.730][T736]  ? spin_bug+0x1d0/0x1d0
[672.737][T736]  btrfs_run_qgroups+0x5de/0x840
[672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
[672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
[672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
[672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
[672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
[672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
[672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
[672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
[672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
[672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
[672.732][T736]  ? sigprocmask+0xf2/0x340
[672.737][T736]  ? __fget_files+0x26a/0x480
[672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
[672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
[672.736][T736]  __x64_sys_ioctl+0x198/0x210
[672.736][T736]  do_syscall_64+0x39/0xb0
[672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.739][T736] RIP: 0033:0x4556ad
[672.742][T736]  </TASK>
[672.743][T736]
[672.748][T736] Allocated by task 27677:
[672.743][T736]  kasan_save_stack+0x22/0x40
[672.741][T736]  kasan_set_track+0x25/0x30
[672.741][T736]  __kasan_kmalloc+0xa4/0xb0
[672.749][T736]  btrfs_alloc_root+0x48/0x90
[672.746][T736]  btrfs_create_tree+0x146/0xa20
[672.744][T736]  btrfs_quota_enable+0x461/0x1d20
[672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
[672.747][T736]  __x64_sys_ioctl+0x198/0x210
[672.749][T736]  do_syscall_64+0x39/0xb0
[672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.756][T736]
[672.757][T736] Freed by task 27677:
[672.759][T736]  kasan_save_stack+0x22/0x40
[672.759][T736]  kasan_set_track+0x25/0x30
[672.756][T736]  kasan_save_free_info+0x2e/0x50
[672.751][T736]  ____kasan_slab_free+0x162/0x1c0
[672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
[672.752][T736]  __kmem_cache_free+0xaf/0x2e0
[672.752][T736]  btrfs_put_root+0x1ff/0x2b0
[672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
[672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
[672.756][T736]  __x64_sys_ioctl+0x198/0x210
[672.753][T736]  do_syscall_64+0x39/0xb0
[672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.769][T736]
[672.768][T736] The buggy address belongs to the object at ffff888022ec0000
[672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
[672.769][T736] The buggy address is located 520 bytes inside of
[672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
[672.760][T736]
[672.764][T736] The buggy address belongs to the physical page:
[672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
[672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
[672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
[672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
[672.771][T736] page dumped because: kasan: bad access detected
[672.778][T736] page_owner tracks the page as allocated
[672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
[672.779][T736]  get_page_from_freelist+0x119c/0x2d50
[672.779][T736]  __alloc_pages+0x1cb/0x4a0
[672.776][T736]  alloc_pages+0x1aa/0x270
[672.773][T736]  allocate_slab+0x260/0x390
[672.771][T736]  ___slab_alloc+0xa9a/0x13e0
[672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
[672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
[672.789][T736]  __kmalloc+0x4e/0x1a0
[672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
[672.781][T736]  tomoyo_path_perm+0x22f/0x420
[672.782][T736]  tomoyo_path_unlink+0x92/0xd0
[672.780][T736]  security_path_unlink+0xdb/0x150
[672.788][T736]  do_unlinkat+0x377/0x680
[672.788][T736]  __x64_sys_unlink+0xca/0x110
[672.789][T736]  do_syscall_64+0x39/0xb0
[672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.784][T736] page last free stack trace:
[672.787][T736]  free_pcp_prepare+0x4e5/0x920
[672.787][T736]  free_unref_page+0x1d/0x4e0
[672.784][T736]  __unfreeze_partials+0x17c/0x1a0
[672.797][T736]  qlist_free_all+0x6a/0x180
[672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
[672.797][T736]  __kasan_slab_alloc+0x64/0x90
[672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
[672.799][T736]  getname_flags.part.0+0x50/0x4e0
[672.799][T736]  getname_flags+0x9e/0xe0
[672.792][T736]  vfs_fstatat+0x77/0xb0
[672.791][T736]  __do_sys_newlstat+0x84/0x100
[672.798][T736]  do_syscall_64+0x39/0xb0
[672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.790][T736]
[672.791][T736] Memory state around the buggy address:
[672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.809][T736]                       ^
[672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex before calling btrfs_run_qgroups(), which is what all qgroup ioctls should call.
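The locking rule stated above can be sketched with pthreads. This is a simplified model rather than the btrfs code: `struct demo_fs` and the `demo_*` functions are hypothetical, an `int` on the heap stands in for the quota root, and the `pthread_mutex_trylock()` check is only a crude substitute for lockdep_assert_held() (it shows that somebody holds the mutex, not who).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical miniature of the filesystem state involved in the race. */
struct demo_fs {
	pthread_mutex_t	qgroup_ioctl_lock;
	int		*quota_root;	/* NULL once quotas are disabled */
};

static int demo_fs_init(struct demo_fs *fs)
{
	fs->quota_root = calloc(1, sizeof(*fs->quota_root));
	if (!fs->quota_root)
		return -1;
	return pthread_mutex_init(&fs->qgroup_ioctl_lock, NULL);
}

/* Disable path: frees the quota root while holding the ioctl mutex. */
static void demo_quota_disable(struct demo_fs *fs)
{
	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	free(fs->quota_root);
	fs->quota_root = NULL;
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
}

/*
 * Uses the quota root, so the caller must hold qgroup_ioctl_lock.
 * Returns 1 if the root was updated, 0 if quotas are disabled.
 */
static int demo_run_qgroups(struct demo_fs *fs)
{
	/* crude check that the mutex is held: trylock must fail with EBUSY */
	assert(pthread_mutex_trylock(&fs->qgroup_ioctl_lock) == EBUSY);

	if (!fs->quota_root)
		return 0;
	*fs->quota_root += 1;
	return 1;
}

/* Assign path after the fix: take the mutex around the root access. */
static int demo_qgroup_assign(struct demo_fs *fs)
{
	int ret;

	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	ret = demo_run_qgroups(fs);
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
	return ret;
}
```

Because both paths serialize on the same mutex, the assign path either sees a live root or sees NULL; it can no longer dereference a root that the disable path is freeing concurrently.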
Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com
Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8w...
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo wqu@suse.com
Signed-off-by: Filipe Manana fdmanana@suse.com
Reviewed-by: David Sterba dsterba@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Long Li leo.lilong@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Reviewed-by: Wang Weiyang wangweiyang2@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 fs/btrfs/ioctl.c  |  2 ++
 fs/btrfs/qgroup.c | 11 ++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b5e9bfe884c4..186eaab58722 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4274,7 +4274,9 @@ static long btrfs_ioctl_qgroup_assign(struct file *file, void __user *arg)
 	}

 	/* update qgroup status and info */
+	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	err = btrfs_run_qgroups(trans);
+	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	if (err < 0)
 		btrfs_handle_fs_error(fs_info, err,
 				      "failed to update qgroup status and info");
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a02e38fb696c..dc512a86d7c7 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2747,13 +2747,22 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 }

 /*
- * called from commit_transaction. Writes all changed qgroups to disk.
+ * Writes all changed qgroups to disk.
+ * Called by the transaction commit path and the qgroup assign ioctl.
  */
 int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int ret = 0;

+	/*
+	 * In case we are called from the qgroup assign ioctl, assert that we
+	 * are holding the qgroup_ioctl_lock, otherwise we can race with a quota
+	 * disable operation (ioctl) and access a freed quota root.
+	 */
+	if (trans->transaction->state != TRANS_STATE_COMMIT_DOING)
+		lockdep_assert_held(&fs_info->qgroup_ioctl_lock);
+
 	if (!fs_info->quota_root)
 		return ret;
From: Zhihao Cheng chengzhihao1@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV CVE: NA
Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html
--------------------------------
The following sequence makes ext4 load stale buffer heads, left over from the last failed mount, in a new mount operation:
mount_bdev
 ext4_fill_super
 | ext4_load_and_init_journal
 |  ext4_load_journal
 |   jbd2_journal_load
 |    load_superblock
 |     journal_get_superblock
 |      set_buffer_verified(bh) // buffer head is verified
 |   jbd2_journal_recover // failed caused by EIO
 | goto failed_mount3a // skip 'sb->s_root' initialization
 deactivate_locked_super
  kill_block_super
   generic_shutdown_super
    if (sb->s_root)
    // false, skip ext4_put_super->invalidate_bdev->
    // invalidate_mapping_pages->mapping_evict_folio->
    // filemap_release_folio->try_to_free_buffers, which
    // cannot drop buffer head.
   blkdev_put
    blkdev_put_whole
     if (atomic_dec_and_test(&bdev->bd_openers))
     // false, systemd-udev happens to open the device. Then
     // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
     // truncate_inode_folio->truncate_cleanup_folio->
     // folio_invalidate->block_invalidate_folio->
     // filemap_release_folio->try_to_free_buffers will be skipped,
     // dropping buffer head is missed again.
Second mount:
ext4_fill_super
 ext4_load_and_init_journal
  ext4_load_journal
   ext4_get_journal
    jbd2_journal_init_inode
     journal_init_common
      bh = getblk_unmovable
       bh = __find_get_block // Found stale bh in last failed mounting
      journal->j_sb_buffer = bh
   jbd2_journal_load
    load_superblock
     journal_get_superblock
      if (buffer_verified(bh))
      // true, skip journal->j_format_version = 2, value is 0
    jbd2_journal_recover
     do_one_pass
      next_log_block += count_tags(journal, bh)
      // The 'tag_bytes' calculation in journal_tag_bytes() depends on
      // jbd2_has_feature_csum3(), which returns false because
      // 'j->j_format_version >= 2' is not true, so we get a wrong
      // next_log_block. do_one_pass may then exit early when it finds
      // a block without JBD2_MAGIC_NUMBER at 'next_log_block'.
The filesystem is corrupted at this point: the journal is only partially replayed, and the new journal sequence number has already been used by the previous mount.
invalidate_bdev() can drop all buffer heads even when racing with a plain reader of the block device (e.g. systemd-udev), so fix this by invalidating the bdev in the error handling path of __ext4_fill_super().
A reproducer is available at [Link].
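The effect of the stale 'verified' flag can be sketched in user space. This is a simplified model under stated assumptions, not the ext4/jbd2 code: struct bh_model, struct journal_model and mount_attempt() are hypothetical names, and the only behaviour modelled is that a buffer already marked verified skips the step that sets j_format_version.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cached buffer head of the journal superblock. */
struct bh_model {
    bool cached;    /* still present in the block device's cache */
    bool verified;  /* set once the superblock has been parsed */
};

/* Hypothetical stand-in for the per-mount journal structure. */
struct journal_model {
    int j_format_version;  /* stays 0 until the superblock is parsed */
};

/*
 * One mount attempt: the journal object is always fresh, but the buffer
 * may be a leftover from an earlier, failed attempt.
 */
static int mount_attempt(struct bh_model *bh, bool recovery_fails,
                         bool invalidate_on_error, int *format_version)
{
    struct journal_model journal = { 0 };

    assert(bh && format_version);

    if (!bh->cached) {            /* buffer is read from disk again */
        bh->cached = true;
        bh->verified = false;
    }
    if (!bh->verified) {          /* only unverified buffers are parsed */
        journal.j_format_version = 2;
        bh->verified = true;
    }
    *format_version = journal.j_format_version;

    if (recovery_fails) {
        if (invalidate_on_error)  /* models invalidate_bdev() on the error path */
            bh->cached = false;
        return -1;
    }
    return 0;
}
```

In this model, a failed attempt followed by a second attempt leaves the format version at 0 unless the error path drops the cached buffer.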
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming") Cc: stable@vger.kernel.org # v3.5 Conflicts: fs/ext4/super.c [ a7a79c292ac37("ext4: unify the ext4 super block loading operation") is not applied. 7edfd85b1ffd3("ext4: Completely separate options parsing and sb setup") is not applied. ] Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ext4/super.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3278d46a7d65..ac7e7396be5a 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1144,6 +1144,12 @@ static void ext4_blkdev_remove(struct ext4_sb_info *sbi) struct block_device *bdev; bdev = sbi->s_journal_bdev; if (bdev) { + /* + * Invalidate the journal device's buffers. We don't want them + * floating about in memory - the physical journal device may + * hotswapped, and it breaks the `ro-after' testing code. + */ + invalidate_bdev(bdev); ext4_blkdev_put(bdev); sbi->s_journal_bdev = NULL; } @@ -1284,13 +1290,7 @@ static void ext4_put_super(struct super_block *sb) sync_blockdev(sb->s_bdev); invalidate_bdev(sb->s_bdev); if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) { - /* - * Invalidate the journal device's buffers. We don't want them - * floating about in memory - the physical journal device may - * hotswapped, and it breaks the `ro-after' testing code. - */ sync_blockdev(sbi->s_journal_bdev); - invalidate_bdev(sbi->s_journal_bdev); ext4_blkdev_remove(sbi); }
@@ -5279,6 +5279,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) brelse(bh); ext4_blkdev_remove(sbi); out_fail: + invalidate_bdev(sb->s_bdev); sb->s_fs_info = NULL; kfree(sbi->s_blockgroup_lock); out_free_base:
From: Zhihao Cheng chengzhihao1@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV CVE: NA
Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html
--------------------------------
As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' is always true if sbi->s_journal_bdev exists. The filesystem block device and the journal block device are both opened with 'FMODE_EXCL' mode, so the two cannot be the same device. We can therefore remove the redundant check 'sbi->s_journal_bdev != sb->s_bdev' when 'sbi->s_journal_bdev' exists.
[1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com...
Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ac7e7396be5a..d87a435adc46 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1289,7 +1289,7 @@ static void ext4_put_super(struct super_block *sb)
sync_blockdev(sb->s_bdev); invalidate_bdev(sb->s_bdev); - if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) { + if (sbi->s_journal_bdev) { sync_blockdev(sbi->s_journal_bdev); ext4_blkdev_remove(sbi); }
From: Zhihao Cheng chengzhihao1@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SMBI CVE: NA
Reference: https://www.spinics.net/lists/linux-ext4/msg88386.html
--------------------------------
Following process makes i_disksize exceed i_size:
generic_perform_write
 copied = iov_iter_copy_from_user_atomic(len) // copied < len
 ext4_da_write_end
 | ext4_update_i_disksize
 |  new_i_size = pos + copied;
 |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
 | generic_write_end
 |  copied = block_write_end(copied, len) // copied = 0
 |   if (unlikely(copied < len))
 |    if (!PageUptodate(page))
 |     copied = 0;
 |  if (pos + copied > inode->i_size) // return false
 if (unlikely(copied == 0))
  goto again;
 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
  status = -EFAULT;
  break;
 }
We get i_disksize greater than i_size here, which could trigger WARNING check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:
ext4_dio_write_iter
 iomap_dio_rw
  __iomap_dio_rw // return err, length is not aligned to 512
 ext4_handle_inode_extension
  WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops
 WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
 RIP: 0010:ext4_file_write_iter+0xbc7
 Call Trace:
  vfs_write+0x3b1
  ksys_write+0x77
  do_syscall_64+0x39
Fix it by updating 'copied' value before updating i_disksize just like ext4_write_inline_data_end() does.
A reproducer is available at [Link].
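The ordering problem can be modelled in a few lines of user-space C. This is only a sketch: struct inode_model and da_write_end_model() are hypothetical names, and the conditions under which ext4 really updates i_disksize are reduced to a plain "grow if larger" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the two inode size fields. */
struct inode_model {
    long long i_size;      /* in-memory file size */
    long long i_disksize;  /* size that will be recorded on disk */
};

/*
 * Model of the delayed-allocation write_end step. 'fixed' selects whether
 * 'copied' is clamped before i_disksize is updated. Returns the number of
 * bytes that are finally accounted as written.
 */
static long long da_write_end_model(struct inode_model *inode, long long pos,
                                    long long len, long long copied,
                                    bool page_uptodate, bool fixed)
{
    assert(copied <= len);

    /* The fix: a short copy into a non-uptodate page counts as 0 bytes. */
    if (fixed && copied < len && !page_uptodate)
        copied = 0;

    /* i_disksize update: uses whatever 'copied' holds at this point. */
    if (pos + copied > inode->i_disksize)
        inode->i_disksize = pos + copied;

    /* generic_write_end() part: the same clamp happens here, but too late. */
    if (copied < len && !page_uptodate)
        copied = 0;
    if (pos + copied > inode->i_size)
        inode->i_size = pos + copied;

    return copied;
}
```

With fixed set to false, a short copy of 100 bytes into a non-uptodate page leaves i_disksize at 100 while i_size stays 0; with fixed set to true, both stay 0.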
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209 Fixes: 64769240bd07 ("ext4: Add delayed allocation support in data=writeback mode") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ext4/inode.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a0bf0253485a..fe2e81737ff1 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3084,6 +3084,9 @@ static int ext4_da_write_end(struct file *file, ext4_has_inline_data(inode)) return ext4_write_inline_data_end(inode, pos, len, copied, page);
+ if (unlikely(copied < len) && !PageUptodate(page)) + copied = 0; + start = pos & (PAGE_SIZE - 1); end = start + copied - 1;
From: Zhong Jinghua zhongjinghua@huawei.com
hulk inclusion category: bugfix bugzilla: 188586, https://gitee.com/openeuler/kernel/issues/I6TFPJ CVE: NA
----------------------------------------
A kernel panic can easily be triggered through loop_control_ioctl:
1. syscall(__NR_ioctl, r[1], 0x4c80, 0x80000200000ul);
   Create loop device 0x80000200000ul. In the code this value is used to derive the first_minor number, which ends up as 0, so the created loop device number is 7:0.
2. syscall(__NR_ioctl, r[2], 0x4c80, 0ul);
   Create loop device 0x0ul. Since device 7:0 was already created in step 1, add_disk fails because the major and first_minor numbers are identical.
3. syscall(__NR_ioctl, r[5], 0x4c81, 0ul);
   Delete the device that failed to be created, and the kernel panics.
The panic looks like this:
BUG: KASAN: null-ptr-deref in device_del+0xb3/0x840 drivers/base/core.c:3107
Call Trace:
 kill_device drivers/base/core.c:3079 [inline]
 device_del+0xb3/0x840 drivers/base/core.c:3107
 del_gendisk+0x463/0x5f0 block/genhd.c:971
 loop_remove drivers/block/loop.c:2190 [inline]
 loop_control_ioctl drivers/block/loop.c:2289 [inline]
The call paths are as follows.
Create loop device:
loop_control_ioctl
  loop_add
    add_disk
      device_add_disk
        bdi_register
          bdi_register_va
            device_create
              device_create_groups_vargs
                device_add
                  kfree(dev->p);
                  dev->p = NULL;
Remove loop device:
loop_control_ioctl
  loop_remove
    del_gendisk
      device_del
        kill_device
          if (dev->p->dead) // p is null
Fix it by checking the index taken from parm in loop_add(), so that i << part_shift stays within the minor number range.
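The arithmetic behind step 1 can be checked in user space. The sketch below redefines the dev_t helpers with the kernel's 20-bit minor layout; loop_devt() and loop_index_ok() are hypothetical helper names, and part_shift is assumed to be 0 in the reported case. The ioctl argument 0x80000200000 becomes 0x200000 once it is narrowed to a 32-bit int.

```c
#include <assert.h>
#include <stdbool.h>

/* Same layout as the kernel's dev_t helpers: 20 minor bits. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MKDEV(ma, mi) (((ma) << MINORBITS) | (mi))
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))

#define LOOP_MAJOR 7U

/* Device number that results from using i << part_shift as first_minor. */
static unsigned int loop_devt(unsigned int i, unsigned int part_shift)
{
    assert(part_shift < MINORBITS);
    return MKDEV(LOOP_MAJOR, i << part_shift);
}

/* Mirrors the added check: reject indexes whose first_minor would not fit. */
static bool loop_index_ok(int i, unsigned int part_shift)
{
    assert(part_shift < MINORBITS);
    return !(i > 0 && (unsigned int)i > (MINORMASK >> part_shift));
}
```

Because bit 21 of 0x200000 is already set in 7 << 20, the resulting device number is the same as for index 0, which is why the second creation collides with the first.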
Fixes: 770fe30a46a1 ("loop: add management interface for on-demand device allocation") Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Reviewed-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/block/loop.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 76b96c42f417..60f2a31c4a24 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2084,6 +2084,17 @@ static int loop_add(struct loop_device **l, int i) struct gendisk *disk; int err;
+ /* + * i << part_shift is actually used as the first_minor. + * So here should avoid i << part_shift overflow. + * And, MKDEV() expect that the max bits of + * first_minor is 20. + */ + if (i > 0 && i > MINORMASK >> part_shift) { + err = -EINVAL; + goto out; + } + err = -ENOMEM; lo = kzalloc(sizeof(*lo), GFP_KERNEL); if (!lo) @@ -2097,7 +2108,8 @@ static int loop_add(struct loop_device **l, int i) if (err == -ENOSPC) err = -EEXIST; } else { - err = idr_alloc(&loop_index_idr, lo, 0, 0, GFP_KERNEL); + err = idr_alloc(&loop_index_idr, lo, 0, + (MINORMASK >> part_shift) + 1, GFP_KERNEL); } if (err < 0) goto out_free_dev;
From: Zheng Yejian zhengyejian1@huawei.com
mainline inclusion from mainline-v6.3-rc6 commit 6455b6163d8c680366663cdb8c679514d55fc30c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TJ97
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When a user reads the file 'trace_pipe', the kernel keeps printing the following logs, which warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in rb_get_reader_page(). It looks like an infinite loop in tracing_read_pipe(). This problem has occurred several times on the arm64 platform when testing v5.10 and below.
Call trace:
 rb_get_reader_page+0x248/0x1300
 rb_buffer_peek+0x34/0x160
 ring_buffer_peek+0xbc/0x224
 peek_next_entry+0x98/0xbc
 __find_next_entry+0xc4/0x1c0
 trace_find_next_entry_inc+0x30/0x94
 tracing_read_pipe+0x198/0x304
 vfs_read+0xb4/0x1e0
 ksys_read+0x74/0x100
 __arm64_sys_read+0x24/0x30
 el0_svc_common.constprop.0+0x7c/0x1bc
 do_el0_svc+0x2c/0x94
 el0_svc+0x20/0x30
 el0_sync_handler+0xb0/0xb4
 el0_sync+0x160/0x180
After dumping the vmcore and inspecting the problematic per-CPU ring buffer, I found that tail_page/commit_page/reader_page are on the same page while reader_page->read is obviously abnormal:
 tail_page == commit_page == reader_page == {
   .write = 0x100d20,
   .read = 0x8f9f4805, // Far greater than 0xd20, obviously abnormal!!!
   .entries = 0x10004c,
   .real_end = 0x0,
   .page = {
     .time_stamp = 0x857257416af0,
     .commit = 0xd20, // This page hasn't been fully filled.
     // .data[0...0xd20] seems normal.
   }
 }
The root cause is most likely a race in which the reader and the writer are on the same page and the reader sees an event that has not been fully committed by the writer.
To fix this, add memory barriers to make sure the reader can see the content of what is committed. Since commit a0fcaaed0c46 ("ring-buffer: Fix race between reset page and reading page") has added the read barrier in rb_get_reader_page(), here we just need to add the write barrier.
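The pairing of the two barriers can be sketched with C11 atomics. This is an illustration only, not the ring buffer code: struct page_model, writer_commit() and reader_consume() are hypothetical names, and the C11 release and acquire fences are assumed here as user-space stand-ins for smp_wmb() and smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PAGE_EVENTS 8

/* Hypothetical one-page buffer: 'commit' counts the published events. */
struct page_model {
    int data[PAGE_EVENTS];
    atomic_size_t commit;
};

/* Writer: store the event, then publish it by advancing 'commit'. */
static void writer_commit(struct page_model *p, int event)
{
    size_t idx = atomic_load_explicit(&p->commit, memory_order_relaxed);

    assert(idx < PAGE_EVENTS);
    p->data[idx] = event;
    /* Role of smp_wmb(): order the data store before the commit store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->commit, idx + 1, memory_order_relaxed);
}

/* Reader: copy out events in [read, commit); returns how many were copied. */
static size_t reader_consume(struct page_model *p, size_t read, int *out)
{
    size_t commit = atomic_load_explicit(&p->commit, memory_order_relaxed);
    size_t n = 0;

    /* Role of smp_rmb(): order the commit load before the data loads. */
    atomic_thread_fence(memory_order_acquire);
    while (read < commit)
        out[n++] = p->data[read++];
    return n;
}
```

Without the writer-side fence, a reader on another CPU could observe the new commit value before the event data it covers, which matches the abnormal reader state described above.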
Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyej...
Cc: stable@vger.kernel.org Fixes: 77ae365eca89 ("ring-buffer: make lockless") Suggested-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/trace/ring_buffer.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index c64a654e213e..a6dee283fba8 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2887,6 +2887,10 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer) if (RB_WARN_ON(cpu_buffer, rb_is_reader_page(cpu_buffer->tail_page))) return; + /* + * No need for a memory barrier here, as the update + * of the tail_page did it for this page. + */ local_set(&cpu_buffer->commit_page->page->commit, rb_page_write(cpu_buffer->commit_page)); rb_inc_page(cpu_buffer, &cpu_buffer->commit_page); @@ -2896,6 +2900,8 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer) while (rb_commit_index(cpu_buffer) != rb_page_write(cpu_buffer->commit_page)) {
+ /* Make sure the readers see the content of what is committed. */ + smp_wmb(); local_set(&cpu_buffer->commit_page->page->commit, rb_page_write(cpu_buffer->commit_page)); RB_WARN_ON(cpu_buffer, @@ -4322,7 +4328,12 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
/* * Make sure we see any padding after the write update - * (see rb_reset_tail()) + * (see rb_reset_tail()). + * + * In addition, a writer may be writing on the reader page + * if the page has not been fully filled, so the read barrier + * is also needed to make sure we see the content of what is + * committed by the writer (see rb_set_commit_to_write()). */ smp_rmb();
From: Zheng Wang zyytlz.wz@163.com
stable inclusion from stable-v5.10.176 commit bfeeb3aaad4ee8eaaefe5d9edd9b2ccb5d9b7505 category: bugfix bugzilla: 188641, https://gitee.com/src-openeuler/kernel/issues/I6R4MM CVE: CVE-2023-1670
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e8d20c3ded59a092532513c9bd030d1ea66f5f44 ]
In xirc2ps_probe, local->tx_timeout_task is bound to xirc2ps_tx_timeout_task. When a timeout occurs, xirc_tx_timeout calls schedule_work to start the work.
When xirc2ps_detach is called to remove the driver, the following sequence is possible:
CPU0                  CPU1

                    |xirc2ps_tx_timeout_task
xirc2ps_detach      |
  free_netdev       |
    kfree(dev);     |
                    |
                    |  do_reset
                    |    //use dev
Fix this by no longer responding to timeout tasks and by completing any scheduled work before cleanup in xirc2ps_detach.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zheng Wang zyytlz.wz@163.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/net/ethernet/xircom/xirc2ps_cs.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/xircom/xirc2ps_cs.c b/drivers/net/ethernet/xircom/xirc2ps_cs.c index 3e337142b516..56cef59c1c87 100644 --- a/drivers/net/ethernet/xircom/xirc2ps_cs.c +++ b/drivers/net/ethernet/xircom/xirc2ps_cs.c @@ -503,6 +503,11 @@ static void xirc2ps_detach(struct pcmcia_device *link) { struct net_device *dev = link->priv; + struct local_info *local = netdev_priv(dev); + + netif_carrier_off(dev); + netif_tx_disable(dev); + cancel_work_sync(&local->tx_timeout_task);
dev_dbg(&link->dev, "detach\n");
From: Yang Jihong yangjihong1@huawei.com
mainline inclusion from mainline-v6.3-rc3 commit eb81a2ed4f52be831c9fb879752d89645a312c13 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6ODHQ CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
syzkaller reports a KASAN stack-out-of-bounds issue. The call trace is as follows:
 dump_stack+0x9c/0xd3
 print_address_description.constprop.0+0x19/0x170
 __kasan_report.cold+0x6c/0x84
 kasan_report+0x3a/0x50
 __perf_event_header__init_id+0x34/0x290
 perf_event_header__init_id+0x48/0x60
 perf_output_begin+0x4a4/0x560
 perf_event_bpf_output+0x161/0x1e0
 perf_iterate_sb_cpu+0x29e/0x340
 perf_iterate_sb+0x4c/0xc0
 perf_event_bpf_event+0x194/0x2c0
 __bpf_prog_put.constprop.0+0x55/0xf0
 __cls_bpf_delete_prog+0xea/0x120 [cls_bpf]
 cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf]
 process_one_work+0x3c2/0x730
 worker_thread+0x93/0x650
 kthread+0x1b8/0x210
 ret_from_fork+0x1f/0x30
Commit 267fb27352b6 ("perf: Reduce stack usage of perf_output_begin()") made perf_output_begin() use the on-stack struct perf_sample_data of the calling function.
However, perf_event_bpf_output passes the wrong parameter, so small-sized data (struct perf_bpf_event) is treated as large-sized data (struct perf_sample_data), which causes memory to be overwritten in __perf_event_header__init_id.
Fixes: 267fb27352b6 ("perf: Reduce stack usage of perf_output_begin()") Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.com Signed-off-by: Yang Jihong yangjihong1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 855197077c73..a3e581c0169d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8665,7 +8665,7 @@ static void perf_event_bpf_output(struct perf_event *event, void *data)
perf_event_header__init_id(&bpf_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, data, event, + ret = perf_output_begin(&handle, &sample, event, bpf_event->event_id.header.size); if (ret) return;
From: Zheng Yejian zhengyejian1@huawei.com
mainline inclusion from mainline-v6.3-rc6 commit 2a2d8c51defb446e8d89a83f42f8e5cd529111e9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TQ89 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Syzkaller reports a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().
The root cause is that 'direct->addr' was changed from 'old_addr' to 'new_addr' but not restored when ftrace_modify_direct_caller() failed. After that, 'direct' can no longer be found by 'old_addr'.
To fix it, restore 'direct->addr' to 'old_addr' explicitly in the error path.
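The error-path rule can be shown with a small user-space model. All names here (struct direct_model, find_direct(), modify_direct_model()) are hypothetical, and the update_ret argument merely stands in for the return value of ftrace_modify_direct_caller().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ftrace direct-call entry. */
struct direct_model {
    unsigned long addr;
};

/* Hypothetical lookup by address over a small table. */
static struct direct_model *find_direct(struct direct_model *tbl, int n,
                                        unsigned long addr)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return &tbl[i];
    return NULL;
}

/*
 * Switch an entry from old_addr to new_addr. A non-zero 'update_ret'
 * means the caller update failed and the change must be undone.
 */
static int modify_direct_model(struct direct_model *tbl, int n,
                               unsigned long old_addr, unsigned long new_addr,
                               int update_ret)
{
    struct direct_model *direct = find_direct(tbl, n, old_addr);

    if (!direct)
        return -1;  /* corresponds to the WARN_ON(!direct) case */

    direct->addr = new_addr;
    if (update_ret) {
        /* The fix: restore the address so a lookup by old_addr still works. */
        direct->addr = old_addr;
        return update_ret;
    }
    return 0;
}
```

If the restore is left out, a failed call leaves the entry keyed by new_addr, and the next call with the same old_addr hits the not-found branch.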
Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyej...
Cc: stable@vger.kernel.org Cc: mhiramat@kernel.org Cc: mark.rutland@arm.com Cc: ast@kernel.org Cc: daniel@iogearbox.net Fixes: 77ab77854be8 ("ftrace: Fix modify_ftrace_direct.") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/trace/ftrace.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index b09a9a9b49b4..76bcffd6916a 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5390,12 +5390,15 @@ int modify_ftrace_direct(unsigned long ip, ret = 0; }
- if (unlikely(ret && new_direct)) { - direct->count++; - list_del_rcu(&new_direct->next); - synchronize_rcu_tasks(); - kfree(new_direct); - ftrace_direct_func_count--; + if (ret) { + direct->addr = old_addr; + if (unlikely(new_direct)) { + direct->count++; + list_del_rcu(&new_direct->next); + synchronize_rcu_tasks(); + kfree(new_direct); + ftrace_direct_func_count--; + } }
out_unlock:
From: George Kennedy george.kennedy@oracle.com
stable inclusion from stable-v5.10.173 commit 846bfba34175c23b13cc2023c2d67b96e8c14c43 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PMQI
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 1b42b1a36fc946f0d7088425b90d491b4257ca3e ]
Ensure that the VID header offset + VID header size does not exceed the allocated area to avoid slab OOB.
BUG: KASAN: slab-out-of-bounds in crc32_body lib/crc32.c:111 [inline] BUG: KASAN: slab-out-of-bounds in crc32_le_generic lib/crc32.c:179 [inline] BUG: KASAN: slab-out-of-bounds in crc32_le_base+0x58c/0x626 lib/crc32.c:197 Read of size 4 at addr ffff88802bb36f00 by task syz-executor136/1555
CPU: 2 PID: 1555 Comm: syz-executor136 Tainted: G W 6.0.0-1868 #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x85/0xad lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold.13+0xb6/0x6bb mm/kasan/report.c:433 kasan_report+0xa7/0x11b mm/kasan/report.c:495 crc32_body lib/crc32.c:111 [inline] crc32_le_generic lib/crc32.c:179 [inline] crc32_le_base+0x58c/0x626 lib/crc32.c:197 ubi_io_write_vid_hdr+0x1b7/0x472 drivers/mtd/ubi/io.c:1067 create_vtbl+0x4d5/0x9c4 drivers/mtd/ubi/vtbl.c:317 create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline] ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812 ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601 ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965 ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0x0 RIP: 0033:0x7f96d5cf753d Code: RSP: 002b:00007fffd72206f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96d5cf753d RDX: 0000000020000080 RSI: 0000000040186f40 RDI: 0000000000000003 RBP: 0000000000400cd0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400be0 R13: 00007fffd72207e0 R14: 0000000000000000 R15: 0000000000000000 </TASK>
Allocated by task 1555: kasan_save_stack+0x20/0x3d mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] ____kasan_kmalloc mm/kasan/common.c:516 [inline] __kasan_kmalloc+0x88/0xa3 mm/kasan/common.c:525 kasan_kmalloc include/linux/kasan.h:234 [inline] __kmalloc+0x138/0x257 mm/slub.c:4429 kmalloc include/linux/slab.h:605 [inline] ubi_alloc_vid_buf drivers/mtd/ubi/ubi.h:1093 [inline] create_vtbl+0xcc/0x9c4 drivers/mtd/ubi/vtbl.c:295 create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline] ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812 ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601 ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965 ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0x0
The buggy address belongs to the object at ffff88802bb36e00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 0 bytes to the right of 256-byte region [ffff88802bb36e00, ffff88802bb36f00)
The buggy address belongs to the physical page: page:00000000ea4d1263 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2bb36 head:00000000ea4d1263 order:1 compound_mapcount:0 compound_pincount:0 flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0010200 ffffea000066c300 dead000000000003 ffff888100042b40 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88802bb36e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88802bb36e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88802bb36f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88802bb36f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88802bb37000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 801c135ce73d ("UBI: Unsorted Block Images") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: George Kennedy george.kennedy@oracle.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Wang Hai wanghai38@huawei.com Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/mtd/ubi/build.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index 62cdbf5bf562..e7a4923958c3 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -666,6 +666,12 @@ static int io_init(struct ubi_device *ubi, int max_beb_per1024) ubi->ec_hdr_alsize = ALIGN(UBI_EC_HDR_SIZE, ubi->hdrs_min_io_size); ubi->vid_hdr_alsize = ALIGN(UBI_VID_HDR_SIZE, ubi->hdrs_min_io_size);
+ if (ubi->vid_hdr_offset && ((ubi->vid_hdr_offset + UBI_VID_HDR_SIZE) > + ubi->vid_hdr_alsize)) { + ubi_err(ubi, "VID header offset %d too large.", ubi->vid_hdr_offset); + return -EINVAL; + } + dbg_gen("min_io_size %d", ubi->min_io_size); dbg_gen("max_write_size %d", ubi->max_write_size); dbg_gen("hdrs_min_io_size %d", ubi->hdrs_min_io_size);
From: Zhihao Cheng chengzhihao1@huawei.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6U6XK
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?i...
--------------------------------
The following steps make UBI attach fail since commit 1b42b1a36fc946 ("ubi: ensure that VID header offset ... size"):
 ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
 modprobe nandsim id_bytes=$ID
 flash_eraseall /dev/mtd0
 modprobe ubi mtd="0,2048" # set vid_hdr offset as 2048 (one page)
 (dmesg):
  ubi0 error: ubi_attach_mtd_dev [ubi]: VID header offset 2048 too large.
  UBI error: cannot attach mtd0
  UBI error: cannot initialize UBI, error -22
Rework the original solution. The key point is to make sure that 'vid_hdr_shift + UBI_VID_HDR_SIZE <= ubi->vid_hdr_alsize', so we should check vid_hdr_shift rather than vid_hdr_offset. With this change, UBI still supports (sub)page aligned VID header offsets.
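The difference between the two checks can be worked through with the numbers from the reproducer. The sketch below is a user-space approximation: old_check_ok() and new_check_ok() are hypothetical names, the value 64 for UBI_VID_HDR_SIZE is an assumption, and the way the aligned offset and shift are derived from vid_hdr_offset is assumed to match io_init().

```c
#include <assert.h>
#include <stdbool.h>

#define VID_HDR_SIZE 64  /* assumed value of UBI_VID_HDR_SIZE */
/* Round x up to a multiple of a; 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* First patch: compares the raw offset against the aligned header size. */
static bool old_check_ok(int vid_hdr_offset, int hdrs_min_io_size)
{
    int alsize;

    assert(hdrs_min_io_size > 0 &&
           (hdrs_min_io_size & (hdrs_min_io_size - 1)) == 0);
    alsize = ALIGN_UP(VID_HDR_SIZE, hdrs_min_io_size);

    return !(vid_hdr_offset && vid_hdr_offset + VID_HDR_SIZE > alsize);
}

/* Reworked patch: compares the shift inside the aligned I/O unit instead. */
static bool new_check_ok(int vid_hdr_offset, int hdrs_min_io_size)
{
    int alsize, aloffset, shift;

    assert(hdrs_min_io_size > 0 &&
           (hdrs_min_io_size & (hdrs_min_io_size - 1)) == 0);
    alsize = ALIGN_UP(VID_HDR_SIZE, hdrs_min_io_size);
    aloffset = vid_hdr_offset & ~(hdrs_min_io_size - 1);
    shift = vid_hdr_offset - aloffset;

    return shift + VID_HDR_SIZE <= alsize;
}
```

For an offset of 2048 with a 2048-byte I/O unit the shift is 0, so the reworked check accepts it while the first check rejects it; an offset such as 2000 gives a shift of 2000 and is still rejected.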
Fixes: 1b42b1a36fc946 ("ubi: ensure that VID header offset ... size") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Tested-by: Nicolas Schichan nschichan@freebox.fr Tested-by: Miquel Raynal miquel.raynal@bootlin.com # v5.10, v4.19 Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/mtd/ubi/build.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index e7a4923958c3..8b879d4f7c79 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -666,12 +666,6 @@ static int io_init(struct ubi_device *ubi, int max_beb_per1024) ubi->ec_hdr_alsize = ALIGN(UBI_EC_HDR_SIZE, ubi->hdrs_min_io_size); ubi->vid_hdr_alsize = ALIGN(UBI_VID_HDR_SIZE, ubi->hdrs_min_io_size);
- if (ubi->vid_hdr_offset && ((ubi->vid_hdr_offset + UBI_VID_HDR_SIZE) > - ubi->vid_hdr_alsize)) { - ubi_err(ubi, "VID header offset %d too large.", ubi->vid_hdr_offset); - return -EINVAL; - } - dbg_gen("min_io_size %d", ubi->min_io_size); dbg_gen("max_write_size %d", ubi->max_write_size); dbg_gen("hdrs_min_io_size %d", ubi->hdrs_min_io_size); @@ -689,6 +683,21 @@ static int io_init(struct ubi_device *ubi, int max_beb_per1024) ubi->vid_hdr_aloffset; }
+ /* + * Memory allocation for VID header is ubi->vid_hdr_alsize + * which is described in comments in io.c. + * Make sure VID header shift + UBI_VID_HDR_SIZE not exceeds + * ubi->vid_hdr_alsize, so that all vid header operations + * won't access memory out of bounds. + */ + if ((ubi->vid_hdr_shift + UBI_VID_HDR_SIZE) > ubi->vid_hdr_alsize) { + ubi_err(ubi, "Invalid VID header offset %d, VID header shift(%d)" + " + VID header size(%zu) > VID header aligned size(%d).", + ubi->vid_hdr_offset, ubi->vid_hdr_shift, + UBI_VID_HDR_SIZE, ubi->vid_hdr_alsize); + return -EINVAL; + } + /* Similar for the data offset */ ubi->leb_start = ubi->vid_hdr_offset + UBI_VID_HDR_SIZE; ubi->leb_start = ALIGN(ubi->leb_start, ubi->min_io_size);
From: Zheng Wang zyytlz.wz@163.com
maillist inclusion category: bugfix bugzilla: 188655, https://gitee.com/src-openeuler/kernel/issues/I6T36H CVE: CVE-2023-1859
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
----------------------------------------
xen_9pfs_front_probe calls xen_9pfs_front_alloc_dataring to initialize priv->rings and bind &ring->work to p9_xen_response.
When xen_9pfs_front_event_handler is called to handle IRQ requests, it eventually calls schedule_work to start the work.
When xen_9pfs_front_remove is called to remove the driver, the following sequence is possible:
CPU0                  CPU1

                     |p9_xen_response
xen_9pfs_front_remove|
  xen_9pfs_front_free|
kfree(priv)          |
//free priv          |
                     |p9_tag_lookup
                     |//use priv->client
Fix it by finishing the work before cleanup in xen_9pfs_front_free.
Note that this bug was found by static analysis and might be a false positive.
Fixes: 71ebd71921e4 ("xen/9pfs: connect to the backend") Signed-off-by: Zheng Wang zyytlz.wz@163.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Signed-off-by: Eric Van Hensbergen ericvh@kernel.org Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/9p/trans_xen.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 432ac5a16f2e..7e27f733869b 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -291,6 +291,10 @@ static void xen_9pfs_front_free(struct xen_9pfs_front_priv *priv)
 	write_unlock(&xen_9pfs_lock);
 
 	for (i = 0; i < priv->num_rings; i++) {
+		struct xen_9pfs_dataring *ring = &priv->rings[i];
+
+		cancel_work_sync(&ring->work);
+
 		if (!priv->rings[i].intf)
 			break;
 		if (priv->rings[i].irq > 0)
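The ordering the patch enforces (cancel the work, then free what the handler uses) can be modelled in a few lines of userspace C. This is a single-threaded toy, not the kernel workqueue API: every `sketch_*` name is hypothetical, and the real cancel_work_sync() additionally waits for a handler that is already running, which this model does not attempt to show.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_PENDING 8

struct sketch_work {
	void (*fn)(struct sketch_work *w);
	void *data;	/* object the handler dereferences */
};

/* Toy "workqueue": a list of pending items, run later by sketch_run_pending(). */
static struct sketch_work *sketch_pending[SKETCH_MAX_PENDING];
static size_t sketch_npending;
static int sketch_handler_runs;

static void sketch_schedule_work(struct sketch_work *w)
{
	if (sketch_npending < SKETCH_MAX_PENDING)
		sketch_pending[sketch_npending++] = w;
}

/* Remove w from the pending list so its handler can no longer run. */
static void sketch_cancel_work(struct sketch_work *w)
{
	size_t i, j = 0;

	for (i = 0; i < sketch_npending; i++)
		if (sketch_pending[i] != w)
			sketch_pending[j++] = sketch_pending[i];
	sketch_npending = j;
}

static void sketch_run_pending(void)
{
	size_t i, n = sketch_npending;

	sketch_npending = 0;
	for (i = 0; i < n; i++)
		sketch_pending[i]->fn(sketch_pending[i]);
}

struct sketch_priv {
	int client_tag;
	struct sketch_work work;
};

static void sketch_response(struct sketch_work *w)
{
	struct sketch_priv *priv = w->data;

	priv->client_tag++;	/* would touch freed memory if priv were gone */
	sketch_handler_runs++;
}

/* Returns how many times the handler ran after teardown (0 with the fix). */
int sketch_teardown_demo(void)
{
	struct sketch_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;
	priv->work.fn = sketch_response;
	priv->work.data = priv;
	sketch_handler_runs = 0;
	sketch_npending = 0;

	sketch_schedule_work(&priv->work);	/* the "IRQ" queued the work */
	sketch_cancel_work(&priv->work);	/* the fix: cancel before freeing */
	free(priv);

	sketch_run_pending();			/* nothing left that touches priv */
	return sketch_handler_runs;
}
```

Dropping the sketch_cancel_work() call in the demo would leave a queued item pointing at freed memory, which is the use-after-free the commit message describes.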
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.1-rc1 commit 21985f43376cee092702d6cb963ff97a9d2ede68 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6 socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is changed to udp_prot and ->destroy() never cleans it up, resulting in a memory leak.
This is due to the discrepancy between inet6_destroy_sock() and IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM to remove the difference.
However, this is not enough for now because rxpmtu can be changed without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless sendmsg() support"). We will fix this case in the following patch.
Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy() in the future.
Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support")
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 include/net/ipv6.h       |  1 +
 net/ipv6/af_inet6.c      |  6 ++++++
 net/ipv6/ipv6_sockglue.c | 20 ++++++++------------
 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 292bc81e7515..7f6091f4b8dd 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1109,6 +1109,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port, void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info); void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
+void inet6_cleanup_sock(struct sock *sk); int inet6_release(struct socket *sock); int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); int inet6_getname(struct socket *sock, struct sockaddr *uaddr, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 2524825e5157..3f1131889e29 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -507,6 +507,12 @@ void inet6_destroy_sock(struct sock *sk) } EXPORT_SYMBOL_GPL(inet6_destroy_sock);
+void inet6_cleanup_sock(struct sock *sk) +{ + inet6_destroy_sock(sk); +} +EXPORT_SYMBOL_GPL(inet6_cleanup_sock); + /* * This does both peername and sockname. */ diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index ce4e0da4ab9b..72391b5321af 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -429,9 +429,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, if (optlen < sizeof(int)) goto e_inval; if (val == PF_INET) { - struct ipv6_txoptions *opt; - struct sk_buff *pktopt; - if (sk->sk_type == SOCK_RAW) break;
@@ -462,7 +459,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; }
- fl6_free_socklist(sk); __ipv6_sock_mc_close(sk); __ipv6_sock_ac_close(sk);
@@ -500,14 +496,14 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sk->sk_socket->ops = &inet_dgram_ops; sk->sk_family = PF_INET; } - opt = xchg((__force struct ipv6_txoptions **)&np->opt, - NULL); - if (opt) { - atomic_sub(opt->tot_len, &sk->sk_omem_alloc); - txopt_put(opt); - } - pktopt = xchg(&np->pktoptions, NULL); - kfree_skb(pktopt); + + /* Disable all options not to allocate memory anymore, + * but there is still a race. See the lockless path + * in udpv6_sendmsg() and ipv6_local_rxpmtu(). + */ + np->rxopt.all = 0; + + inet6_cleanup_sock(sk);
/* * ... and add it to the refcnt debug socks count
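The idea of this patch, one cleanup routine shared by the conversion path and the destroy path, can be sketched in userspace as follows. All names are hypothetical, the three pointers merely stand in for the per-socket IPv6 state mentioned in the commit message and diff (txoptions, pktoptions, rxpmtu), and `sketch_xchg` is a plain non-atomic swap, unlike the kernel's xchg().

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the IPv6-specific per-socket state. */
struct sketch_np {
	void *opt;
	void *pktoptions;
	void *rxpmtu;
};

/* Non-atomic swap helper (the kernel's xchg() is atomic; this one is not). */
static void *sketch_xchg(void **slot, void *newval)
{
	void *old = *slot;

	*slot = newval;
	return old;
}

/*
 * Single cleanup routine: detaches and frees every field. Because each
 * slot is left NULL, calling it a second time is harmless.
 */
void sketch_cleanup_sock(struct sketch_np *np)
{
	free(sketch_xchg(&np->pktoptions, NULL));
	free(sketch_xchg(&np->rxpmtu, NULL));
	free(sketch_xchg(&np->opt, NULL));
}

/* Returns the number of fields still allocated after the conversion path. */
int sketch_addrform_demo(void)
{
	struct sketch_np np = { malloc(8), malloc(8), malloc(8) };
	int leaked;

	sketch_cleanup_sock(&np);	/* conversion path (IPV6_ADDRFORM analogue) */
	leaked = (np.opt != NULL) + (np.pktoptions != NULL) + (np.rxpmtu != NULL);

	sketch_cleanup_sock(&np);	/* later destroy path: a no-op now */
	return leaked;
}
```

Keeping one routine means a field added later only has to be freed in one place, which is the discrepancy the commit message points at.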
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.1-rc1 commit d38afeec26ed4739c640bf286c270559aab2ba5f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak:
setsockopt(IPV6_ADDRFORM)                 sendmsg()
+-----------------------+                 +-------+
- do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
  - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
    - lock_sock(sk)                             before WRITE_ONCE()
  - WRITE_ONCE(sk->sk_prot, &tcp_prot)
  - inet6_destroy_sock()                    - if (!corkreq)
  - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
    - release_sock(sk)                          ^._ lockless fast path for
                                                    the non-corking case

                                                - __ip6_append_data(sk, ...)
                                                  - ipv6_local_rxpmtu(sk, ...)
                                                    - xchg(&np->rxpmtu, skb)
                                                      ^._ rxpmtu is never freed.

                                                - goto out_no_dst;

                                            - lock_sock(sk)
For now, rxpmtu is the only such case. However, so that we do not miss future changes or bugs similar to the one fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function as the IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), IPv6 resources are guaranteed to be cleaned up in the end.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next.
Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support")
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 include/net/ipv6.h    |  1 +
 include/net/udp.h     |  2 +-
 include/net/udplite.h |  8 --------
 net/ipv4/udp.c        |  9 ++++++---
 net/ipv4/udplite.c    |  8 ++++++++
 net/ipv6/af_inet6.c   |  8 +++++++-
 net/ipv6/udp.c        | 15 ++++++++++++++-
 net/ipv6/udp_impl.h   |  1 +
 net/ipv6/udplite.c    |  9 ++++++++-
 9 files changed, 46 insertions(+), 15 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 7f6091f4b8dd..9a5b617ab199 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1110,6 +1110,7 @@ void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info); void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
void inet6_cleanup_sock(struct sock *sk); +void inet6_sock_destruct(struct sock *sk); int inet6_release(struct socket *sock); int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); int inet6_getname(struct socket *sock, struct sockaddr *uaddr, diff --git a/include/net/udp.h b/include/net/udp.h index 388e68c7bca0..e2550a4547a7 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -268,7 +268,7 @@ static inline bool udp_sk_bound_dev_eq(struct net *net, int bound_dev_if, }
/* net/ipv4/udp.c */ -void udp_destruct_sock(struct sock *sk); +void udp_destruct_common(struct sock *sk); void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len); int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb); void udp_skb_destructor(struct sock *sk, struct sk_buff *skb); diff --git a/include/net/udplite.h b/include/net/udplite.h index 9185e45b997f..c59ba86668af 100644 --- a/include/net/udplite.h +++ b/include/net/udplite.h @@ -24,14 +24,6 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset, return copy_from_iter_full(to, len, &msg->msg_iter) ? 0 : -EFAULT; }
-/* Designate sk as UDP-Lite socket */ -static inline int udplite_sk_init(struct sock *sk) -{ - udp_init_sock(sk); - udp_sk(sk)->pcflag = UDPLITE_BIT; - return 0; -} - /* * Checksumming routines */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index beccdd959d10..6d2add967358 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1582,7 +1582,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL_GPL(__udp_enqueue_schedule_skb);
-void udp_destruct_sock(struct sock *sk) +void udp_destruct_common(struct sock *sk) { /* reclaim completely the forward allocated memory */ struct udp_sock *up = udp_sk(sk); @@ -1595,10 +1595,14 @@ void udp_destruct_sock(struct sock *sk) kfree_skb(skb); } udp_rmem_release(sk, total, 0, true); +} +EXPORT_SYMBOL_GPL(udp_destruct_common);
+static void udp_destruct_sock(struct sock *sk) +{ + udp_destruct_common(sk); inet_sock_destruct(sk); } -EXPORT_SYMBOL_GPL(udp_destruct_sock);
int udp_init_sock(struct sock *sk) { @@ -1606,7 +1610,6 @@ int udp_init_sock(struct sock *sk) sk->sk_destruct = udp_destruct_sock; return 0; } -EXPORT_SYMBOL_GPL(udp_init_sock);
void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len) { diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c index bd8773b49e72..cfb36655a5fd 100644 --- a/net/ipv4/udplite.c +++ b/net/ipv4/udplite.c @@ -17,6 +17,14 @@ struct udp_table udplite_table __read_mostly; EXPORT_SYMBOL(udplite_table);
+/* Designate sk as UDP-Lite socket */ +static int udplite_sk_init(struct sock *sk) +{ + udp_init_sock(sk); + udp_sk(sk)->pcflag = UDPLITE_BIT; + return 0; +} + static int udplite_rcv(struct sk_buff *skb) { return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 3f1131889e29..4197c311ab7b 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -109,6 +109,12 @@ static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk) return (struct ipv6_pinfo *)(((u8 *)sk) + offset); }
+void inet6_sock_destruct(struct sock *sk) +{ + inet6_cleanup_sock(sk); + inet_sock_destruct(sk); +} + static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern) { @@ -201,7 +207,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol, inet->hdrincl = 1; }
- sk->sk_destruct = inet_sock_destruct; + sk->sk_destruct = inet6_sock_destruct; sk->sk_family = PF_INET6; sk->sk_protocol = protocol;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 08a2ceb6b999..750ba787e18e 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -54,6 +54,19 @@ #include <trace/events/skb.h> #include "udp_impl.h"
+static void udpv6_destruct_sock(struct sock *sk) +{ + udp_destruct_common(sk); + inet6_sock_destruct(sk); +} + +int udpv6_init_sock(struct sock *sk) +{ + skb_queue_head_init(&udp_sk(sk)->reader_queue); + sk->sk_destruct = udpv6_destruct_sock; + return 0; +} + static u32 udp6_ehashfn(const struct net *net, const struct in6_addr *laddr, const u16 lport, @@ -1699,7 +1712,7 @@ struct proto udpv6_prot = { .connect = ip6_datagram_connect, .disconnect = udp_disconnect, .ioctl = udp_ioctl, - .init = udp_init_sock, + .init = udpv6_init_sock, .destroy = udpv6_destroy_sock, .setsockopt = udpv6_setsockopt, .getsockopt = udpv6_getsockopt, diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h index b2fcc46c1630..e49776819441 100644 --- a/net/ipv6/udp_impl.h +++ b/net/ipv6/udp_impl.h @@ -12,6 +12,7 @@ int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int); int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int, __be32, struct udp_table *);
+int udpv6_init_sock(struct sock *sk); int udp_v6_get_port(struct sock *sk, unsigned short snum); void udp_v6_rehash(struct sock *sk);
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c index fbb700d3f437..b6482e04dad0 100644 --- a/net/ipv6/udplite.c +++ b/net/ipv6/udplite.c @@ -12,6 +12,13 @@ #include <linux/proc_fs.h> #include "udp_impl.h"
+static int udplitev6_sk_init(struct sock *sk) +{ + udpv6_init_sock(sk); + udp_sk(sk)->pcflag = UDPLITE_BIT; + return 0; +} + static int udplitev6_rcv(struct sk_buff *skb) { return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE); @@ -38,7 +45,7 @@ struct proto udplitev6_prot = { .connect = ip6_datagram_connect, .disconnect = udp_disconnect, .ioctl = udp_ioctl, - .init = udplite_sk_init, + .init = udplitev6_sk_init, .destroy = udpv6_destroy_sock, .setsockopt = udpv6_setsockopt, .getsockopt = udpv6_getsockopt,
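Why hooking the destructor helps can be shown with a small userspace model. It is a sketch under assumptions: the `sketch_*` types and functions are hypothetical, the race is replayed sequentially instead of with real threads, and a plain malloc() stands in for the skb stored in rxpmtu.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_proto {
	const char *name;
};

struct sketch_sock {
	const struct sketch_proto *prot;		/* swapped by the conversion */
	void (*destruct)(struct sketch_sock *sk);	/* NOT swapped by the conversion */
	void *rxpmtu;
};

static const struct sketch_proto sketch_udpv6 = { "UDPv6" };
static const struct sketch_proto sketch_udp = { "UDP" };

static void sketch_cleanup_v6(struct sketch_sock *sk)
{
	free(sk->rxpmtu);
	sk->rxpmtu = NULL;
}

/* IPv6 destructor: release IPv6 state (generic teardown would follow). */
static void sketch_v6_destruct(struct sketch_sock *sk)
{
	sketch_cleanup_v6(sk);
}

/* Returns 1 if the late allocation was released by the destructor, else 0. */
int sketch_late_alloc_demo(void)
{
	struct sketch_sock sk = { &sketch_udpv6, sketch_v6_destruct, NULL };

	/* Conversion: clean up IPv6 state and switch the protocol ops. */
	sketch_cleanup_v6(&sk);
	sk.prot = &sketch_udp;

	/* A sender that entered through the old ops stores rxpmtu afterwards. */
	sk.rxpmtu = malloc(16);

	/* Final teardown still goes through the IPv6 destructor. */
	sk.destruct(&sk);
	return sk.rxpmtu == NULL;
}
```

Had the cleanup been reachable only through the protocol ops, the allocation made after the switch would have had no owner left to free it.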
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.2-rc1 commit b5fc29233d28be7a3322848ebe73ac327559cdb9 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources.
Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy().
DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net
Signed-off-by: David S. Miller davem@davemloft.net
Conflicts: net/ipv6/ping.c
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 net/ipv6/ping.c      | 6 ------
 net/ipv6/raw.c       | 2 --
 net/ipv6/tcp_ipv6.c  | 8 +-------
 net/ipv6/udp.c       | 2 --
 net/l2tp/l2tp_ip6.c  | 2 --
 net/mptcp/protocol.c | 7 -------
 6 files changed, 1 insertion(+), 26 deletions(-)
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 135e3a060caa..6ac88fe24a8e 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -22,11 +22,6 @@ #include <linux/proc_fs.h> #include <net/ping.h>
-static void ping_v6_destroy(struct sock *sk) -{ - inet6_destroy_sock(sk); -} - /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -171,7 +166,6 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, - .destroy = ping_v6_destroy, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, .setsockopt = ipv6_setsockopt, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index c9cda334fa12..e5c748ef27b9 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -1177,8 +1177,6 @@ static void raw6_destroy(struct sock *sk) lock_sock(sk); ip6_flush_pending_frames(sk); release_sock(sk); - - inet6_destroy_sock(sk); }
static int rawv6_init_sk(struct sock *sk) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 928be701fefb..fa2672e91c70 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1939,12 +1939,6 @@ static int tcp_v6_init_sock(struct sock *sk) return 0; }
-static void tcp_v6_destroy_sock(struct sock *sk) -{ - tcp_v4_destroy_sock(sk); - inet6_destroy_sock(sk); -} - #ifdef CONFIG_PROC_FS /* Proc filesystem TCPv6 sock list dumping. */ static void get_openreq6(struct seq_file *seq, @@ -2137,7 +2131,7 @@ struct proto tcpv6_prot = { .accept = inet_csk_accept, .ioctl = tcp_ioctl, .init = tcp_v6_init_sock, - .destroy = tcp_v6_destroy_sock, + .destroy = tcp_v4_destroy_sock, .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 750ba787e18e..f3cef94fd2a8 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1627,8 +1627,6 @@ void udpv6_destroy_sock(struct sock *sk) udp_encap_disable(); } } - - inet6_destroy_sock(sk); }
/* diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index d54dbd01d86f..382124d6f764 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -255,8 +255,6 @@ static void l2tp_ip6_destroy_sock(struct sock *sk)
if (tunnel) l2tp_tunnel_delete(tunnel); - - inet6_destroy_sock(sk); }
static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e61c85873ea2..72d944e6a641 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2863,12 +2863,6 @@ static const struct proto_ops mptcp_v6_stream_ops = {
static struct proto mptcp_v6_prot;
-static void mptcp_v6_destroy(struct sock *sk) -{ - mptcp_destroy(sk); - inet6_destroy_sock(sk); -} - static struct inet_protosw mptcp_v6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_MPTCP, @@ -2884,7 +2878,6 @@ int __init mptcp_proto_v6_init(void) mptcp_v6_prot = mptcp_prot; strcpy(mptcp_v6_prot.name, "MPTCPv6"); mptcp_v6_prot.slab = NULL; - mptcp_v6_prot.destroy = mptcp_v6_destroy; mptcp_v6_prot.obj_size = sizeof(struct mptcp6_sock);
err = proto_register(&mptcp_v6_prot, 1);
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.10.171 commit 3e4bbd1f38a8d35bd2d3aaffdb5f6ada546b669a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 62ec33b44e0f7168ff2886520fec6fb62d03b5a3 upstream.
Christoph Paasch reported that commit b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2] Also, we can reproduce it by a program in [3].
In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy() to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in inet_csk_destroy_sock().
The same check has been in inet_sock_destruct() since at least v2.6, so we can just remove the WARN_ON_ONCE(). However, among the users of sk_stream_kill_queues(), only CAIF does not call inet_sock_destruct(). Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor().
[0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.c...
[1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341
[2]:
WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0
Modules linked in:
CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9
RSP: 0018:ffff88810570fc68 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005
RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488
R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458
FS:  00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 inet_csk_destroy_sock+0x1a1/0x320
 __tcp_close+0xab6/0xe90
 tcp_close+0x30/0xc0
 inet_release+0xe9/0x1f0
 inet6_release+0x4c/0x70
 __sock_release+0xd2/0x280
 sock_close+0x15/0x20
 __fput+0x252/0xa20
 task_work_run+0x169/0x250
 exit_to_user_mode_prepare+0x113/0x120
 syscall_exit_to_user_mode+0x1d/0x40
 do_syscall_64+0x48/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f7fdf7ae28d
Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f
R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000
R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8
 </TASK>

[3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/
Fixes: b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().")
Reported-by: syzbot syzkaller@googlegroups.com
Reported-by: Christoph Paasch christophpaasch@icloud.com
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 net/caif/caif_socket.c | 1 +
 net/core/stream.c      | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c index 9d26c5e9da05..d35ea927ca8a 100644 --- a/net/caif/caif_socket.c +++ b/net/caif/caif_socket.c @@ -1020,6 +1020,7 @@ static void caif_sock_destructor(struct sock *sk) return; } sk_stream_kill_queues(&cf_sk->sk); + WARN_ON(sk->sk_forward_alloc); caif_free_client(&cf_sk->layer); }
diff --git a/net/core/stream.c b/net/core/stream.c index a166a32b411f..0e4ab373b2a6 100644 --- a/net/core/stream.c +++ b/net/core/stream.c @@ -202,7 +202,6 @@ void sk_stream_kill_queues(struct sock *sk) sk_mem_reclaim(sk);
WARN_ON(sk->sk_wmem_queued); - WARN_ON(sk->sk_forward_alloc);
/* It is _impossible_ for the backlog to contain anything * when we get here. All user references to this socket
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.2-rc1 commit 1651951ebea54970e0bda60c638fc2eee7a6218f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources.
DCCP sets its own sk->sk_destruct() in dccp_init_sock(), and a DCCPv6 socket shares it by calling the same init function via dccp_v6_init_sock().
To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we export it and set dccp_v6_sk_destruct() in the init function.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 net/dccp/dccp.h     |  1 +
 net/dccp/ipv6.c     | 15 ++++++++-------
 net/dccp/proto.c    |  8 +++++++-
 net/ipv6/af_inet6.c |  1 +
 4 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 5183e627468d..0218eb169891 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -283,6 +283,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb, int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct dccp_hdr *dh, const unsigned int len);
+void dccp_destruct_common(struct sock *sk); int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); void dccp_destroy_sock(struct sock *sk);
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 2be5c69824f9..0be808f38070 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -993,6 +993,12 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = { .sockaddr_len = sizeof(struct sockaddr_in6), };
+static void dccp_v6_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); + inet6_sock_destruct(sk); +} + /* NOTE: A lot of things set to zero explicitly by call to * sk_alloc() so need not be done here. */ @@ -1005,17 +1011,12 @@ static int dccp_v6_init_sock(struct sock *sk) if (unlikely(!dccp_v6_ctl_sock_initialized)) dccp_v6_ctl_sock_initialized = 1; inet_csk(sk)->icsk_af_ops = &dccp_ipv6_af_ops; + sk->sk_destruct = dccp_v6_sk_destruct; }
return err; }
-static void dccp_v6_destroy_sock(struct sock *sk) -{ - dccp_destroy_sock(sk); - inet6_destroy_sock(sk); -} - static struct timewait_sock_ops dccp6_timewait_sock_ops = { .twsk_obj_size = sizeof(struct dccp6_timewait_sock), }; @@ -1038,7 +1039,7 @@ static struct proto dccp_v6_prot = { .accept = inet_csk_accept, .get_port = inet_csk_get_port, .shutdown = dccp_shutdown, - .destroy = dccp_v6_destroy_sock, + .destroy = dccp_destroy_sock, .orphan_count = &dccp_orphan_count, .max_header = MAX_DCCP_HEADER, .obj_size = sizeof(struct dccp6_sock), diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 65e81e0199b0..e946211758c0 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -171,12 +171,18 @@ const char *dccp_packet_name(const int type)
EXPORT_SYMBOL_GPL(dccp_packet_name);
-static void dccp_sk_destruct(struct sock *sk) +void dccp_destruct_common(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk);
ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); dp->dccps_hc_tx_ccid = NULL; +} +EXPORT_SYMBOL_GPL(dccp_destruct_common); + +static void dccp_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); inet_sock_destruct(sk); }
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 4197c311ab7b..823f32c8a2a6 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -114,6 +114,7 @@ void inet6_sock_destruct(struct sock *sk) inet6_cleanup_sock(sk); inet_sock_destruct(sk); } +EXPORT_SYMBOL_GPL(inet6_sock_destruct);
static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern)
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.2-rc1 commit 6431b0f6ff1633ae598667e4cdd93830074a03e8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources.
SCTP sets its own sk->sk_destruct() in sctp_init_sock(), and an SCTPv6 socket reuses that same function as its init function.
To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we set sctp_v6_destruct_sock() in a new init function.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Reviewed-by: Liu Jian liujian56@huawei.com
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
---
 net/sctp/socket.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c index e9b4ea3d934f..0f0def3b1082 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4991,13 +4991,17 @@ static void sctp_destroy_sock(struct sock *sk) }
/* Triggered when there are no references on the socket anymore */ -static void sctp_destruct_sock(struct sock *sk) +static void sctp_destruct_common(struct sock *sk) { struct sctp_sock *sp = sctp_sk(sk);
/* Free up the HMAC transform. */ crypto_free_shash(sp->hmac); +}
+static void sctp_destruct_sock(struct sock *sk) +{ + sctp_destruct_common(sk); inet_sock_destruct(sk); }
@@ -9191,7 +9195,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk, sctp_sk(newsk)->reuse = sp->reuse;
newsk->sk_shutdown = sk->sk_shutdown; - newsk->sk_destruct = sctp_destruct_sock; + newsk->sk_destruct = sk->sk_destruct; newsk->sk_family = sk->sk_family; newsk->sk_protocol = IPPROTO_SCTP; newsk->sk_backlog_rcv = sk->sk_prot->backlog_rcv; @@ -9423,11 +9427,20 @@ struct proto sctp_prot = {
#if IS_ENABLED(CONFIG_IPV6)
-#include <net/transp_v6.h> -static void sctp_v6_destroy_sock(struct sock *sk) +static void sctp_v6_destruct_sock(struct sock *sk) +{ + sctp_destruct_common(sk); + inet6_sock_destruct(sk); +} + +static int sctp_v6_init_sock(struct sock *sk) { - sctp_destroy_sock(sk); - inet6_destroy_sock(sk); + int ret = sctp_init_sock(sk); + + if (!ret) + sk->sk_destruct = sctp_v6_destruct_sock; + + return ret; }
struct proto sctpv6_prot = { @@ -9437,8 +9450,8 @@ struct proto sctpv6_prot = { .disconnect = sctp_disconnect, .accept = sctp_accept, .ioctl = sctp_ioctl, - .init = sctp_init_sock, - .destroy = sctp_v6_destroy_sock, + .init = sctp_v6_init_sock, + .destroy = sctp_destroy_sock, .shutdown = sctp_shutdown, .setsockopt = sctp_setsockopt, .getsockopt = sctp_getsockopt,
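The shape shared by the DCCP and SCTP patches (split out a common destructor, then let a v6 init wrapper override the destructor only when the base init succeeded) can be modelled in userspace. This is a sketch only: every `sketch_*` name is hypothetical, the counters replace real resource release, and the `fail` parameter is an invented way to simulate a failing base init.

```c
#include <stddef.h>

struct sketch_sk {
	void (*destruct)(struct sketch_sk *sk);
	int common_done;	/* protocol-private teardown ran */
	int inet_done;		/* generic inet teardown ran */
	int inet6_done;		/* IPv6-specific teardown ran */
};

static void sketch_destruct_common(struct sketch_sk *sk)
{
	sk->common_done++;
}

static void sketch_inet_destruct(struct sketch_sk *sk)
{
	sk->inet_done++;
}

/* The IPv6 destructor cleans IPv6 state, then falls through to the generic one. */
static void sketch_inet6_destruct(struct sketch_sk *sk)
{
	sk->inet6_done++;
	sketch_inet_destruct(sk);
}

static void sketch_destruct(struct sketch_sk *sk)
{
	sketch_destruct_common(sk);
	sketch_inet_destruct(sk);
}

static void sketch_v6_destruct(struct sketch_sk *sk)
{
	sketch_destruct_common(sk);
	sketch_inet6_destruct(sk);
}

/* Base init: installs the IPv4-style destructor unless it fails. */
static int sketch_init(struct sketch_sk *sk, int fail)
{
	if (fail)
		return -1;
	sk->destruct = sketch_destruct;
	return 0;
}

/* v6 wrapper: override the destructor only when the base init succeeded. */
int sketch_v6_init(struct sketch_sk *sk, int fail)
{
	int ret = sketch_init(sk, fail);

	if (!ret)
		sk->destruct = sketch_v6_destruct;
	return ret;
}
```

Calling `sk.destruct(&sk)` on a socket set up through `sketch_v6_init()` runs the common, IPv6 and generic steps exactly once each, while a failed init leaves the destructor untouched.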