Offering: HULK hulk inclusion category: bugfix bugzilla: 189076, https://gitee.com/openeuler/kernel/issues/I76JSK
--------------------------------
While performing the io fault injection test, I caught the following data corruption report:
XFS (dm-6): Internal error ltbno + ltlen > bno at line 1976 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x40b/0x930 [xfs] CPU: 7 PID: 184267 Comm: kworker/7:1 Kdump: loaded Tainted: G O 5.10.0-136.12.0.86.h1179.eulerosv2r12.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20211121_093514-szxrtosci10000 04/01/2014 Workqueue: xfs-inodegc/dm-6 xfs_inodegc_worker [xfs] Call Trace: dump_stack+0x57/0x6e xfs_corruption_error+0x81/0x90 [xfs] xfs_free_ag_extent+0x43c/0x930 [xfs] xfs_free_agfl_block+0x3b/0xd0 [xfs] xfs_agfl_free_finish_item+0x14c/0x160 [xfs] xfs_defer_finish_one+0xd5/0x220 [xfs] xfs_defer_finish_noroll+0xb5/0x210 [xfs] xfs_defer_finish+0x11/0x70 [xfs] xfs_itruncate_extents_flags+0xc1/0x240 [xfs] xfs_inactive_truncate+0xab/0xe0 [xfs] xfs_inactive+0x154/0x170 [xfs] xfs_inodegc_inactivate+0x16/0x50 [xfs] xfs_inodegc_worker+0xa0/0x110 [xfs] process_one_work+0x1b5/0x350 worker_thread+0x49/0x310 kthread+0xfe/0x140 ret_from_fork+0x22/0x30 XFS (dm-6): Corruption detected. Unmount and run xfs_repair
After analyzing the disk image, I found that the cause of the problem was that the transactions were not replayed.
The problem arises in that the iclog buffer IO completion updates the l_last_sync_lsn with it's own LSN. Transactions can be large enough to span many iclogs, only commit iclog goes to update l_last_sync_lsn. Since the last_sync_lsn update and the insertion of the item into the ail list releases the l_icloglock, if the ail is is empty in the meantime, the new iclog gets the last_sync_lsn as tail lsn. If the new iclog is written to disk and a shutdown occurs, the current iclog will not be able to replay in the next mount.
xlog_state_done_syncing xlog_state_do_callback spin_lock(&log->l_icloglock); xlog_state_do_iclog_callbacks xlog_state_iodone_process_iclog xlog_state_set_callback last_sync_lsn = iclog->ic_header.h_lsn spin_unlock(&log->l_icloglock); ====>AIL is empty and get tail lsn for new iclog xlog_cil_process_committed xlog_cil_committed xfs_trans_committed_bulk(ctx->start_lsn) xfs_log_item_batch_insert(commit_lsn) xlog_state_clean_iclog(log, iclog) spin_lock(&log->l_icloglock); spin_unlock(&log->l_icloglock);
Fix is simple, updates the l_last_sync_lsn with it's first ctx start lsn when commit iclog buffer IO completion. Even if the above happens, the iclog will be replayed as well.
Signed-off-by: Long Li leo.lilong@huawei.com --- fs/xfs/xfs_log.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 059f0c61f94c..9a363f787ba4 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2684,19 +2684,22 @@ xlog_get_lowest_lsn( static void xlog_state_set_callback( struct xlog *log, - struct xlog_in_core *iclog, - xfs_lsn_t header_lsn) + struct xlog_in_core *iclog) { + struct xfs_cil_ctx *ctx; + trace_xlog_iclog_callback(iclog, _RET_IP_); iclog->ic_state = XLOG_STATE_CALLBACK;
- ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - header_lsn) <= 0); - - if (list_empty_careful(&iclog->ic_callbacks)) + ctx = list_first_entry_or_null(&iclog->ic_callbacks, + struct xfs_cil_ctx, iclog_entry); + if (!ctx) return;
- atomic64_set(&log->l_last_sync_lsn, header_lsn); + ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), + ctx->start_lsn) <= 0); + + atomic64_set(&log->l_last_sync_lsn, ctx->start_lsn); xlog_grant_push_ail(log, 0); }
@@ -2731,7 +2734,7 @@ xlog_state_iodone_process_iclog( lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) return false; - xlog_state_set_callback(log, iclog, header_lsn); + xlog_state_set_callback(log, iclog); return false; default: /*