[PATCH OLK-6.6] xfs: fix possible bugon in xfs_trans_unreserve_and_mod_sb()
From: Ye Bin <yebin10@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/9228
CVE: NA

--------------------------------

Recently, I encountered a problem where a BUG was triggered in the
write-back process. The detailed problem information is as follows:

xfs_bmap_extents_to_btree: ip=0xffff888148ecad00 wasdel=0
sde: writeback error on inode 68, offset 61440, sector 1400
XFS (sde): Corruption of in-memory data (0x8) detected at xfs_trans_mod_sb+0xaa6/0xc60 (fs/xfs/xfs_trans.c:351). Shutting.
XFS (sde): Please unmount the filesystem and rectify the problem(s)
XFS: Assertion failed: tp->t_blk_res || tp->t_fdblocks_delta >= 0, file: fs/xfs/xfs_trans.c, line: 610
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 5 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 7.0.0-rc6-next-20260402-00028-g56f243e5f8ea-dirty #360 P
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: writeback wb_workfn (flush-8:64)
RIP: 0010:assfail+0x9f/0xb0
Code: fe 84 db 75 20 e8 41 2e 33 fe 0f 0b 5b 5d 41 5c 41 5d e9 94 34 a5 06 48 c7 c7 58 ae 2b 8d e8 f8 72 a0
RSP: 0018:ffffc900000dedd0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff838c91b6
RDX: ffff88810425d880 RSI: ffffffff838c91df RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10e3b14901
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8a956520
R13: 0000000000000262 R14: 0000000000000000 R15: ffffffffffffffff
FS: 0000000000000000(0000) GS:ffff88878bdc5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fccb3c788f0 CR3: 000000000c98a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 xfs_trans_unreserve_and_mod_sb+0xb86/0xd00
 __xfs_trans_commit+0x38b/0xe00
 xfs_trans_commit+0xeb/0x1a0
 xfs_bmapi_convert_one_delalloc+0xba9/0x12d0
 xfs_bmapi_convert_delalloc+0x101/0x350
 xfs_writeback_range+0x76c/0x12d0
 iomap_writeback_folio+0x9ed/0x2100
 iomap_writepages+0x13c/0x2a0
 xfs_vm_writepages+0x278/0x330
 do_writepages+0x247/0x5c0
 __writeback_single_inode+0x123/0x1370
 writeback_sb_inodes+0x71e/0x1b90
 __writeback_inodes_wb+0xc3/0x280
 wb_writeback+0x730/0xb80
 wb_workfn+0x8b0/0xbc0
 process_one_work+0xa08/0x1d00
 worker_thread+0x698/0xeb0
 kthread+0x408/0x540
 ret_from_fork+0xa4d/0xdd0
 ret_from_fork_asm+0x1a/0x30
 </TASK>

After analyzing the above issues, the possible triggering process is as
follows:

xfs_bmapi_convert_delalloc
 xfs_bmapi_convert_one_delalloc
  xfs_bmapi_allocate
   xfs_bmap_add_extent_delay_real
    da_old = startblockval(PREV.br_startblock); // da_old = 5
    case BMAP_LEFT_FILLING:
     ifp->if_nextents++; // 21 + 1 = 22
     if (xfs_bmap_needs_btree(bma->ip, whichfork)) // 22 > 21
      xfs_bmap_extents_to_btree // convert to btree
       cur->bc_ino.allocated++;
     // da_new = 5 - 1 = 4
     da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
                startblockval(PREV.br_startblock) -
                (bma->cur ? bma->cur->bc_ino.allocated : 0))
     //xfs_bmapi_convert_one_delalloc() return
     PREV.br_startblock = nullstartblock(da_new);
xfs_bmap_del_extent_real
 case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
  ifp->if_nextents--; // 22 - 1 = 21
  if (xfs_bmap_needs_btree(ip, whichfork))
   xfs_bmap_extents_to_btree();
  else // convert to extents
   xfs_bmap_btree_to_extents();
...
// Alternate a few times in the middle.
da_old = 4
da_old = 3
da_old = 2
da_old = 1
...
xfs_bmapi_convert_delalloc
 xfs_bmapi_convert_one_delalloc
  // Both blocks and rtextents are 0
  error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0,
                          XFS_TRANS_RESERVE, &tp);
   tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
   error = xfs_trans_reserve(tp, resp, blocks, rtextents);
    if (blocks > 0)
     error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
     // The value of blocks is 0, so the value of tp->t_blk_res is 0
     tp->t_blk_res += blocks;
  xfs_bmapi_allocate
   xfs_bmap_add_extent_delay_real
    da_old = startblockval(PREV.br_startblock); // da_old = 0
    // The current delay extent's reservation is just exhausted.
    case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
     ifp->if_nextents++; // 21 + 1 = 22
     if (xfs_bmap_needs_btree(bma->ip, whichfork)) // 22 > 21
      // Converted to btree. da_old > 0 is false.
      error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur,
                  da_old > 0, &tmp_logflags, whichfork);
       args.wasdel = wasdel; // wasdel is false
       error = xfs_alloc_vextent_start_ag(&args, XFS_INO_TO_FSB(mp, ip->i_ino));
        xfs_alloc_vextent_finish(args, minimum_agno, error, true);
         xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
          case XFS_AG_RESV_NONE:
           field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
                                  XFS_TRANS_SB_FDBLOCKS; // args->wasdel == false
           xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
            case XFS_TRANS_SB_FDBLOCKS:
             if (delta < 0)
              tp->t_blk_res_used += (uint)-delta;
              if (tp->t_blk_res_used > tp->t_blk_res)
               // ***tp->t_blk_res is 0, thus triggering xfs_force_shutdown()***
               xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);

I constructed the trigger sequence above deliberately to make the
problem easy to reproduce. Besides the scenario where the fork is
converted back and forth between btree and extents format, a btree
split is another scenario that can trigger it.
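
The accounting at the bottom of the trace above can be replayed in user
space. This is a minimal sketch, not kernel code: struct toy_trans,
toy_trans_mod_fdblocks() and toy_alloc_bmbt_block() are hypothetical
names, and they model only the two transaction fields and the two
superblock-field cases mentioned in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two struct xfs_trans fields named in the trace. */
struct toy_trans {
	unsigned int blk_res;      /* models tp->t_blk_res */
	unsigned int blk_res_used; /* models tp->t_blk_res_used */
};

/*
 * Models the XFS_TRANS_SB_FDBLOCKS case of xfs_trans_mod_sb(): a negative
 * delta consumes the transaction's block reservation. Returns false at the
 * point where the real code would call xfs_force_shutdown().
 */
bool toy_trans_mod_fdblocks(struct toy_trans *tp, int64_t delta)
{
	assert(tp != NULL);
	if (delta < 0) {
		tp->blk_res_used += (unsigned int)-delta;
		if (tp->blk_res_used > tp->blk_res)
			return false;
	}
	return true;
}

/*
 * Models the XFS_AG_RESV_NONE choice for a single bmbt block: with wasdel
 * set, the block is charged to the delalloc reservation
 * (XFS_TRANS_SB_RES_FDBLOCKS) and t_blk_res is not consulted; otherwise it
 * is charged against the transaction's own block reservation.
 */
bool toy_alloc_bmbt_block(struct toy_trans *tp, bool wasdel)
{
	if (wasdel)
		return true;
	return toy_trans_mod_fdblocks(tp, -1);
}
```

With blk_res = 0, as in the xfs_trans_alloc() call above, the wasdel
path succeeds while the non-wasdel path overruns the reservation on the
first block, which matches the shutdown seen in the log.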

The core reason for the issue is that in xfs_bmapi_convert_delalloc(),
the call to xfs_bmap_worst_indlen() calculates the worst-case number of
reserved blocks, which is the number of additional blocks required
after a complete conversion of the entire delalloc extent. It assumes
that the entire conversion process is atomic. However, the current
process cannot guarantee such atomicity. In the case of a fragmented
filesystem, the most extreme scenario is that every block conversion
triggers a full btree split, in which case the reserved blocks are far
from sufficient. When this issue was triggered, the filesystem
fragmentation in the environment was indeed quite severe.

Further analysis of this abnormal model shows that because the reserved
blocks are consumed continuously, the consumption may eventually exceed
the reserved amount. When the space is nearly exhausted,
xfs_bmap_extents_to_btree() may fail to allocate blocks, triggering a
warning. This failure to allocate additional blocks can lead to issues
with normal block allocation.

Since a single delalloc extent is not guaranteed to be converted in one
pass, the 'indlen' of the delalloc extent should be maintained at the
value calculated by xfs_bmap_worst_indlen(). Commit d69bee6a35d3 ("xfs:
fix xfs_bmap_add_extent_delay_real for partial conversions") addressed
the issue of potentially not reserving enough space in emergency
situations. Based on this modification, we can recalculate the
worst-case 'indlen' required for the remaining delalloc extent after
the conversion in xfs_bmap_add_extent_delay_real(), instead of using
the remaining value.

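
The difference between clamping to the remaining reservation and
recomputing the worst case can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not the kernel
implementation: toy_worst_indlen() only imitates the shape of
xfs_bmap_worst_indlen(), TOY_MAXRECS and TOY_MAXLEVELS are made-up
geometry values, toy_run() is a hypothetical driver in which every
one-block partial conversion consumes one indirect block, and the model
ignores where the extra blocks for the recomputed value come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAXRECS   4 /* hypothetical records per bmbt block */
#define TOY_MAXLEVELS 3 /* hypothetical maximum bmbt height */

/* Simplified worst-case count of bmbt blocks needed to map len blocks. */
uint32_t toy_worst_indlen(uint64_t len)
{
	uint32_t rval = 0;

	assert(len > 0);
	for (int level = 0; level < TOY_MAXLEVELS; level++) {
		len = (len + TOY_MAXRECS - 1) / TOY_MAXRECS;
		rval += (uint32_t)len;
		if (len == 1)
			return rval + TOY_MAXLEVELS - level - 1;
	}
	return rval;
}

/*
 * Convert a delalloc extent of extent_len blocks one block at a time for
 * the given number of steps and return the indlen reservation left on the
 * remaining extent. Each step is assumed to consume one indirect block.
 *   patched == false: da_new = min(worst_indlen(temp), da_old - 1)
 *   patched == true:  da_new = worst_indlen(temp)
 */
uint32_t toy_run(uint64_t extent_len, uint32_t steps, bool patched)
{
	uint32_t da = toy_worst_indlen(extent_len);

	assert(steps < extent_len);
	for (uint32_t i = 0; i < steps; i++) {
		uint64_t temp = extent_len - 1;
		uint32_t worst = toy_worst_indlen(temp);
		/* Guard against underflow; the toy clamps at zero. */
		uint32_t left = da > 0 ? da - 1 : 0;

		if (patched)
			da = worst;
		else
			da = worst < left ? worst : left;
		extent_len = temp;
	}
	return da;
}
```

In this model, toy_run(16, 6, false) ends with no reservation left on an
extent that still has 10 blocks to convert, while toy_run(16, 6, true)
keeps the reservation at toy_worst_indlen(10).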
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9ed22d82c000..3d655c1bef7d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1656,8 +1656,7 @@ xfs_bmap_add_extent_delay_real(
 		 */
 		old = LEFT;
 		temp = PREV.br_blockcount - new->br_blockcount;
-		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
-				startblockval(PREV.br_startblock));
+		da_new = xfs_bmap_worst_indlen(bma->ip, temp);
 
 		LEFT.br_blockcount += new->br_blockcount;
 
@@ -1724,9 +1723,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 
 		temp = PREV.br_blockcount - new->br_blockcount;
-		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
-			startblockval(PREV.br_startblock) -
-			(bma->cur ? bma->cur->bc_ino.allocated : 0));
+		da_new = xfs_bmap_worst_indlen(bma->ip, temp);
 
 		PREV.br_startoff = new_endoff;
 		PREV.br_blockcount = temp;
@@ -1763,8 +1760,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 
 		temp = PREV.br_blockcount - new->br_blockcount;
-		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
-			startblockval(PREV.br_startblock));
+		da_new = xfs_bmap_worst_indlen(bma->ip, temp);
 
 		PREV.br_blockcount = temp;
 		PREV.br_startblock = nullstartblock(da_new);
@@ -1812,9 +1808,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 
 		temp = PREV.br_blockcount - new->br_blockcount;
-		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
-			startblockval(PREV.br_startblock) -
-			(bma->cur ? bma->cur->bc_ino.allocated : 0));
+		da_new = xfs_bmap_worst_indlen(bma->ip, temp);
 
 		PREV.br_startblock = nullstartblock(da_new);
 		PREV.br_blockcount = temp;
-- 
2.52.0

FeedBack: The patch(es) which you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully!
Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/23477
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/62Y...