This patch set fixes some misc xfs issues:
Long Li (4):
  xfs: fix a UAF when inode item push
  xfs: fix a UAF in xfs_iflush_abort_clean
  xfs: don't verify agf length when log recovery
  xfs: xfs_trans_cancel() path must check for log shutdown
 fs/xfs/libxfs/xfs_alloc.c | 3 ++-
 fs/xfs/xfs_buf.c          | 5 +++++
 fs/xfs/xfs_inode_item.c   | 8 +++++++-
 fs/xfs/xfs_trans.c        | 2 +-
 4 files changed, 15 insertions(+), 3 deletions(-)
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LHTR
CVE: NA
--------------------------------
KASAN reported a UAF bug during fault injection testing:
==================================================================
BUG: KASAN: use-after-free in xfs_inode_item_push+0x2db/0x2f0
Read of size 8 at addr ffff888022f74788 by task xfsaild/sda/479

CPU: 0 PID: 479 Comm: xfsaild/sda Not tainted 6.2.0-rc7-00003-ga8a43e2eb5f6 #89
Call Trace:
 <TASK>
 dump_stack_lvl+0x51/0x6a
 print_report+0x171/0x4a6
 kasan_report+0xb7/0x130
 xfs_inode_item_push+0x2db/0x2f0
 xfsaild+0x729/0x1f70
 kthread+0x290/0x340
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 494:
 kasan_save_stack+0x22/0x40
 kasan_set_track+0x25/0x30
 __kasan_slab_alloc+0x58/0x70
 kmem_cache_alloc+0x197/0x5d0
 xfs_inode_item_init+0x62/0x170
 xfs_trans_ijoin+0x15e/0x240
 xfs_init_new_inode+0x573/0x1820
 xfs_create+0x6a1/0x1020
 xfs_generic_create+0x544/0x5d0
 vfs_mkdir+0x5d0/0x980
 do_mkdirat+0x14e/0x220
 __x64_sys_mkdir+0x6a/0x80
 do_syscall_64+0x39/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 14:
 kasan_save_stack+0x22/0x40
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2e/0x40
 __kasan_slab_free+0x114/0x1b0
 kmem_cache_free+0xee/0x4e0
 xfs_inode_free_callback+0x187/0x2a0
 rcu_do_batch+0x317/0xce0
 rcu_core+0x686/0xa90
 __do_softirq+0x1b6/0x626

The buggy address belongs to the object at ffff888022f74758
 which belongs to the cache xfs_ili of size 200
The buggy address is located 48 bytes inside of
 200-byte region [ffff888022f74758, ffff888022f74820)

The buggy address belongs to the physical page:
page:ffffea00008bdd00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22f74
head:ffffea00008bdd00 order:1 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffff888010ed4040 ffffea00008b2510 ffffea00008bde10
raw: 0000000000000000 00000000001a001a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888022f74680: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
 ffff888022f74700: fc fc fc fc fc fc fc fc fc fc fc fa fb fb fb fb
>ffff888022f74780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff888022f74800: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888022f74880: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Pushing an inode item in xfsaild can race with the inode reclaim task. Consider the following call graph, in which both tasks operate on the same inode. While flushing the cluster under shutdown conditions we enter xfs_iflush_abort(), which clears the inode's XFS_IFLUSHING flag and sets lip->li_buf to NULL. Concurrently, the inode can be reclaimed under shutdown conditions; reclaim does not need to wait on the xfs buf lock because lip->li_buf is already NULL at this point, so the inode may be freed via an RCU callback if the xfsaild task is scheduled out while flushing the cluster. Hence it is unsafe to reference lip after flushing the cluster in xfs_inode_item_push().
xfsaild task                             reclaim task
---------------------------------------------------------------------------
<log item is in AIL>
<filesystem shutdown>
spin_lock(&ailp->ail_lock)
xfs_inode_item_push(lip)
  xfs_buf_trylock(bp)
  spin_unlock(&lip->li_ailp->ail_lock)
  xfs_iflush_cluster(bp)
    if (xfs_is_shutdown())
      xfs_iflush_abort(ip)
        xfs_trans_ail_delete(ip)
          spin_lock(&ailp->ail_lock)
          spin_unlock(&ailp->ail_lock)
        xfs_iflush_abort_clean(ip)
      error = -EIO
      <log item removed from AIL>
      <log item li_buf set to null>
    if (error)
      xfs_force_shutdown()
        xlog_shutdown_wait(mp->m_log)
          might_sleep()
                                         xfs_reclaim_inode(ip)
                                           if (shutdown)
                                             xfs_iflush_shutdown_abort(ip)
                                               if (!bp)
                                                 xfs_iflush_abort(ip)
                                                 return
                                           __xfs_inode_free(ip)
                                             call_rcu(ip, xfs_inode_free_callback)
                                         ......
                                         <rcu grace period expires>
                                         <rcu free callbacks run somewhere>
                                         xfs_inode_free_callback(ip)
                                           kmem_cache_free(ip->i_itemp)
......
<starts running again>
    xfs_buf_ioend_fail(bp);
      xfs_buf_ioend(bp)
        xfs_buf_relse(bp);
    return error
  spin_lock(&lip->li_ailp->ail_lock)
  <UAF on log item>
Fix the UAF by taking XFS_ILOCK_SHARED in xfs_inode_item_push(); this prevents the race between inode item push and inode reclaim.
Fixes: 90c60e164012 ("xfs: xfs_iflush() is no longer necessary")
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_inode_item.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 127b2410eb20..c3897c5417f2 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -711,9 +711,14 @@ xfs_inode_item_push(
 	if (xfs_iflags_test(ip, XFS_IFLUSHING))
 		return XFS_ITEM_FLUSHING;
 
-	if (!xfs_buf_trylock(bp))
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
 		return XFS_ITEM_LOCKED;
 
+	if (!xfs_buf_trylock(bp)) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		return XFS_ITEM_LOCKED;
+	}
+
 	spin_unlock(&lip->li_ailp->ail_lock);
 
 	/*
@@ -739,6 +744,7 @@ xfs_inode_item_push(
 	}
 
 	spin_lock(&lip->li_ailp->ail_lock);
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	return rval;
 }
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LHTR
CVE: NA
--------------------------------
KASAN reported a UAF bug during fault injection testing:
==================================================================
BUG: KASAN: slab-use-after-free in __list_del_entry_valid_or_report+0x63/0x200
Read of size 8 at addr ffff888102aae0e8 by task kworker/2:1/48

CPU: 2 PID: 48 Comm: kworker/2:1 Not tainted 6.6.0-01347-g0684c49b89e7-dirty #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
Workqueue: xfs-reclaim/sda xfs_reclaim_worker
Call Trace:
 <TASK>
 dump_stack_lvl+0x7f/0xb0
 print_report+0x12b/0x930
 kasan_report+0xc2/0x120
 __asan_load8+0x9d/0x140
 __list_del_entry_valid_or_report+0x63/0x200
 xfs_iflush_abort_clean+0x8e/0x100
 xfs_iflush_abort+0xa0/0x170
 xfs_iflush_shutdown_abort+0x17a/0x220
 xfs_icwalk_ag+0xa4b/0xed0
 xfs_icwalk+0x97/0xf0
 xfs_reclaim_worker+0x25/0x40
 process_scheduled_works+0x3a8/0x950
 worker_thread+0x302/0x710
 kthread+0x1f1/0x270
 ret_from_fork+0x52/0x70
 ret_from_fork_asm+0x11/0x20
 </TASK>

Allocated by task 733:
 kasan_save_stack+0x2a/0x60
 kasan_set_track+0x2d/0x40
 kasan_save_alloc_info+0x23/0x40
 __kasan_slab_alloc+0x92/0xb0
 kmem_cache_alloc+0x229/0xaf0
 _xfs_buf_alloc+0x55/0x600
 xfs_buf_get_map+0xc29/0x1b30
 xfs_trans_get_buf_map+0x1bd/0x4b0
 xfs_ialloc_inode_init+0x2e6/0x690
 xfs_ialloc_ag_alloc+0x3ae/0xb00
 xfs_dialloc+0x786/0xda0
 xfs_create+0x4b7/0xaf0
 xfs_generic_create+0x538/0x6f0
 xfs_vn_create+0x1f/0x30
 path_openat+0x13e2/0x2150
 do_filp_open+0x16a/0x240
 do_sys_openat2+0x3be/0x4e0
 do_sys_open+0xa6/0x100
 __x64_sys_open+0x52/0x60
 do_syscall_64+0x39/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 0:
 kasan_save_stack+0x2a/0x60
 kasan_set_track+0x2d/0x40
 kasan_save_free_info+0x33/0x60
 __kasan_slab_free+0x171/0x2b0
 kmem_cache_free+0x313/0x8d0
 xfs_buf_free_callback+0x64/0x80
 rcu_do_batch+0x393/0xb10
 rcu_core+0x599/0x930
 rcu_core_si+0x16/0x20
 __do_softirq+0x127/0x5a0

Last potentially related work creation:
 kasan_save_stack+0x2a/0x60
 __kasan_record_aux_stack+0xba/0x100
 kasan_record_aux_stack_noalloc+0x13/0x20
 __call_rcu_common.constprop.0+0xb2/0xdd0
 call_rcu+0x16/0x20
 xfs_buf_free+0x70/0x1d0
 xfs_buf_rele+0x44d/0xc00
 xfs_buf_ioend+0x2f2/0x1350
 xfs_buf_ioend_fail+0x77/0x190
 __xfs_buf_submit+0x4cd/0x4e0
 xfs_buf_delwri_submit_buffers+0x241/0x8b0
 xfs_buf_delwri_submit_nowait+0x18/0x30
 xfsaild+0x861/0x1720
 kthread+0x1f1/0x270
 ret_from_fork+0x52/0x70
 ret_from_fork_asm+0x11/0x20

Second to last potentially related work creation:
 kasan_save_stack+0x2a/0x60
 __kasan_record_aux_stack+0xba/0x100
 kasan_record_aux_stack_noalloc+0x13/0x20
 insert_work+0x2d/0x160
 __queue_work+0x7b1/0x9d0
 queue_work_on+0x91/0xa0
 xfs_buf_bio_end_io+0x191/0x1a0
 bio_endio+0x403/0x440
 blk_update_request+0x228/0xa90
 scsi_end_request+0x59/0x310
 scsi_io_completion+0xec/0xbb0
 scsi_finish_command+0x18d/0x2b0
 scsi_complete+0xd2/0x1f0
 blk_complete_reqs+0x9e/0xc0
 blk_done_softirq+0x25/0x30
 __do_softirq+0x127/0x5a0

The buggy address belongs to the object at ffff888102aae000
 which belongs to the cache xfs_buf of size 376
The buggy address is located 232 bytes inside of
 freed 376-byte region [ffff888102aae000, ffff888102aae178)

The buggy address belongs to the physical page:
page:000000005e670ef6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x102aae
head:000000005e670ef6 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x2fffff80000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 002fffff80000840 ffff8881009c32c0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080120012 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888102aadf80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff888102aae000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888102aae080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                          ^
 ffff888102aae100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
 ffff888102aae180: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
==================================================================

This is a low-probability problem, and it took me a long time to find the sequence of events that triggers it:
1. When creating a new file, if there are no free inodes, we need to allocate a new chunk. The buf item and inode items associated with the inode are committed to the CIL independently. If all goes well, both the buf item and the inode item are inserted into the AIL, with the buf item in front of the inode item.

2. The first time, xfsaild pushes only the buf item. If an error occurs while writing back the inode buffer, the inode item is marked XFS_LI_FAILED in xfs_buf_inode_io_fail() when the buf I/O ends, and the buf item remains in the AIL.

3. The second time, xfsaild pushes only the buf item again. While the inode buffer is being written back, the log has shut down, so the inode buffer is marked XBF_STALE and the buf item is removed from the AIL when the buf I/O ends. Because the inode has not been flushed, ili_last_fields in xfs_inode is still 0, so the inode item is left in the AIL.

4. Concurrently, a new transaction logs an inode in the same cluster as the previous inode. It finds the same inode buffer in xfs_buf_find(), where the _XBF_INODES flag is cleared because the buffer is stale.

5. The third time, xfsaild pushes the inode item that was marked XFS_LI_FAILED, and the AIL resubmits it in xfsaild_resubmit_item(). Because the inode buffer is missing the _XBF_INODES flag, we go down the wrong code path: every inode item on bp->b_li_list drops its reference to the buffer and has its li_buf set to NULL, yet the inode items remain on bp->b_li_list. Once all references are dropped, the inode buffer is freed.

6. When xfs reclaims the inode, removing the inode item from bp->b_li_list causes a UAF in xfs_iflush_abort_clean().
Fix it by adding an xfs shutdown check in xfs_buf_find_lock(): once the log has shut down, there is no point in getting the buffer. As long as the inode item still holds a reference to the inode buffer, the _XBF_INODES flag cannot go missing.
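The effect of the check can be sketched in user space. This is a simplified model with hypothetical names (buf_model, buf_find_lock), not the kernel code: bailing out with -EIO on shutdown means the stale-buffer path that drops _XBF_INODES is never reached.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified flag bits standing in for the kernel's buffer flags. */
#define XBF_STALE	(1u << 0)
#define _XBF_INODES	(1u << 1)

struct buf_model {
	unsigned int	b_flags;
	bool		b_locked;
};

static int buf_find_lock(struct buf_model *bp, bool log_shutdown)
{
	bp->b_locked = true;			/* acquired the buffer lock */

	/* The added check: useless to hand out a buffer after shutdown. */
	if (log_shutdown) {
		bp->b_locked = false;		/* xfs_buf_unlock() */
		return -EIO;
	}

	/* Stale path: external state, including _XBF_INODES, is dropped.
	 * This is the step that detached the inode items in the bug. */
	if (bp->b_flags & XBF_STALE)
		bp->b_flags &= ~_XBF_INODES;
	return 0;
}
```

With the shutdown bail-out in place, a stale inode buffer keeps its _XBF_INODES marking, so xfsaild_resubmit_item() later takes the inode-buffer path and the items on b_li_list are cleaned up consistently.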
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_buf.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c1ece4a08ff4..34f8fcf72e33 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -554,6 +554,11 @@ xfs_buf_find_lock(
 		XFS_STATS_INC(bp->b_mount, xb_get_locked_waited);
 	}
 
+	if (xlog_is_shutdown(bp->b_mount->m_log)) {
+		xfs_buf_unlock(bp);
+		return -EIO;
+	}
+
 	/*
 	 * if the buffer is stale, clear all the external state associated with
 	 * it. We need to keep flags such as how we allocated the buffer memory
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LHTR
CVE: NA
--------------------------------
The following corruption is reported during log recovery:
XFS (loop0): Metadata corruption detected at xfs_agf_verify+0x386/0xb70, xfs_agf block 0x64001
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
00000000: 58 41 47 46 00 00 00 01 00 00 00 02 00 00 4b 88  XAGF..........K.
00000010: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 01  ................
00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04  ................
00000030: 00 00 00 04 00 00 4b 7e 00 00 4b 7e 00 00 00 00  ......K~..K~....
00000040: 3a 9d 97 6d b5 a0 42 13 a3 b3 7f 28 2a ac 3f e8  :..m..B....(*.?.
00000050: 00 00 00 00 00 00 00 01 00 00 00 05 00 00 00 01  ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0xbc9/0xe60 (fs/xfs/xfs_buf.c:1593).  Shutting down filesystem.
XFS (loop0): Please unmount the filesystem and rectify the problem(s)
XFS (loop0): log mount/recovery failed: error -117
XFS (loop0): log mount failed
This problem occurs during agf write verify: the AG's agf_length is smaller than sb_agblocks, yet it is not the last AG. Consider the following situation: the filesystem has 3 AGs, the last AG is not full size, and the filesystem is grown twice with growfs. The first growfs only adds some blocks to AG2. The second growfs extends AG2 to the full AG size and adds an AG3.
pre growfs:
|----------|----------|-----|
First growfs:
|----------|----------|--------|
Second growfs:
|----------|----------|----------|-------|
    AG0        AG1        AG2       AG3
During each growfs, the agf in AG2 and the sb are modified. If the sb has already been written to the metadata but the agf has not when a shutdown occurs, then on the next mount the sb changes from the second growfs have already taken effect in memory, and recovering the agf from the first growfs reports the above problem.
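The relaxed predicate can be modeled as a small pure function. This is a hypothetical sketch (agf_length_ok is not the kernel helper): during log recovery a non-last AG is allowed to carry a short agf_length, because the in-memory superblock may already reflect a later growfs than the agf being recovered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the validation: a full-size AG always passes; a short AG
 * passes only if it is the last AG, or if we are replaying the log
 * (where the agf may lag behind the already-updated superblock). */
static bool agf_length_ok(uint32_t length, uint32_t sb_agblocks,
			  bool is_last_ag, bool in_log_recovery)
{
	if (length == sb_agblocks)
		return true;		/* matches the superblock AG size */
	if (in_log_recovery)
		return true;		/* sb may be ahead of this agf */
	return is_last_ag;		/* only the last AG may be short */
}
```

In the growfs scenario above, AG2's recovered agf_length from the first growfs is shorter than the second-growfs sb_agblocks and AG2 is no longer the last AG, so only the log-recovery exemption lets the verify pass.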
Fixes: edd8276dd702 ("xfs: AGF length has never been bounds checked")
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3069194527dd..0246dc3e8bab 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2980,7 +2980,8 @@ xfs_validate_ag_length(
 	 * Only the last AG in the filesystem is allowed to be shorter
 	 * than the AG size recorded in the superblock.
 	 */
-	if (length != mp->m_sb.sb_agblocks) {
+	if (length != mp->m_sb.sb_agblocks &&
+	    !(bp->b_flags & _XBF_LOGRECOVERY)) {
 		/*
 		 * During growfs, the new last AG can get here before we
 		 * have updated the superblock. Give it a pass on the seqno
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8LHTR
CVE: NA
--------------------------------
The following error occurred during IO fault injection testing:
XFS: Assertion failed: xlog_is_shutdown(lip->li_log), file: fs/xfs/xfs_inode_item.c, line: 748
commit "3c4cb76bce43 xfs: xfs_trans_commit() path must check for log shutdown" fix a problem that dirty transaction was canceled before log shutdown, because of the log is still running, it result dirty and unlogged inode item that isn't in the AIL in memory that can be flushed to disk via writeback clustering.
xfs_trans_cancel() has the same problem: if a shutdown races with xfs_trans_cancel() and we have shut down the filesystem but not the log, we will still cancel the transaction before the log shuts down. So xfs_trans_cancel() needs to check the log state for shutdown, not the mount state.
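The changed condition reduces to a boolean predicate, sketched here as a hypothetical helper (cancel_should_shutdown, not a kernel function): a dirty cancel now forces a shutdown unless BOTH the mount and the log have already shut down, closing the window where the mount is shut down but the log is not.

```c
#include <stdbool.h>

/* Model of the fixed check in xfs_trans_cancel(): report corruption
 * and force a shutdown for a dirty cancel, unless the filesystem AND
 * the log have both fully shut down already. */
static bool cancel_should_shutdown(bool dirty, bool mount_shutdown,
				   bool log_shutdown)
{
	return dirty && !(mount_shutdown && log_shutdown);
}
```

The old predicate was `dirty && !mount_shutdown`, which treated the race window (mount shut down, log still running) as a clean cancel and let dirty, unlogged items escape.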
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_trans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 8c0bfc9a33b1..29f6ac670135 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1100,7 +1100,7 @@ xfs_trans_cancel(
 	 * progress, so we only need to check against the mount shutdown state
 	 * here.
 	 */
-	if (dirty && !xfs_is_shutdown(mp)) {
+	if (dirty && !(xfs_is_shutdown(mp) && xlog_is_shutdown(log))) {
 		XFS_ERROR_REPORT("xfs_trans_cancel", XFS_ERRLEVEL_LOW, mp);
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	}
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/3363
Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/Y...