From: Zhihao Cheng <chengzhihao1@huawei.com>
maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C5HV
CVE: NA
Reference: https://lore.kernel.org/lkml/20230110015327.1181863-1-chengzhihao1@huawei.co...
--------------------------------
The following process can lose data and lead to filesystem corruption:
1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and
   jh->b_transaction = NULL
2. T1 is added into journal->j_checkpoint_transactions.
3. Get bh prepared to write while doing checkpointing:

        PA                              PB
   do_get_write_access           jbd2_log_do_checkpoint
    spin_lock(&jh->b_state_lock)
    if (buffer_dirty(bh))
     clear_buffer_dirty(bh)   // clear buffer dirty
     set_buffer_jbddirty(bh)
                                 transaction =
                                     journal->j_checkpoint_transactions
                                 jh = transaction->t_checkpoint_list
                                 if (!buffer_dirty(bh))
                                  __jbd2_journal_remove_checkpoint(jh)
                                  // bh won't be flushed
                                 jbd2_cleanup_journal_tail
    __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved)

4. Aborting journal/Power-cut before writing latest bh on journal area.
In this way we get a corrupted filesystem with bh's data lost.
Fix it by moving the clearing of the buffer_dirty bit to just before the call to __jbd2_journal_file_buffer(). Both the bit clearing and the jh->b_transaction assignment are then done with journal->j_list_lock held, so jbd2_log_do_checkpoint() will wait until jh's new transaction is finished even if bh is not currently dirty. Additionally, journal_shrink_one_cp_list() won't remove jh from the checkpoint list if the buffer head is reused in do_get_write_access().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Conflicts:
	fs/jbd2/transaction.c
[ 464170647b5648bb8("jbd2: Make state lock a spinlock") is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/jbd2/transaction.c | 50 +++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 3b31bd1f7b77..149190c2ac89 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -935,36 +935,28 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	 * ie. locked but not dirty) or tune2fs (which may actually have
 	 * the buffer dirtied, ugh.)
 	 */
-	if (buffer_dirty(bh)) {
+	if (buffer_dirty(bh) && jh->b_transaction) {
+		warn_dirty_buffer(bh);
 		/*
-		 * First question: is this buffer already part of the current
-		 * transaction or the existing committing transaction?
-		 */
-		if (jh->b_transaction) {
-			J_ASSERT_JH(jh,
-				jh->b_transaction == transaction ||
-				jh->b_transaction ==
-					journal->j_committing_transaction);
-			if (jh->b_next_transaction)
-				J_ASSERT_JH(jh, jh->b_next_transaction ==
-							transaction);
-			warn_dirty_buffer(bh);
-		}
-		/*
-		 * In any case we need to clean the dirty flag and we must
-		 * do it under the buffer lock to be sure we don't race
-		 * with running write-out.
+		 * We need to clean the dirty flag and we must do it under the
+		 * buffer lock to be sure we don't race with running write-out.
		 */
 		JBUFFER_TRACE(jh, "Journalling dirty buffer");
 		clear_buffer_dirty(bh);
+		/*
+		 * The buffer is going to be added to BJ_Reserved list now and
+		 * nothing guarantees jbd2_journal_dirty_metadata() will be
+		 * ever called for it. So we need to set jbddirty bit here to
+		 * make sure the buffer is dirtied and written out when the
+		 * journaling machinery is done with it.
+		 */
 		set_buffer_jbddirty(bh);
 	}
 
-	unlock_buffer(bh);
-
 	error = -EROFS;
 	if (is_handle_aborted(handle)) {
 		jbd_unlock_bh_state(bh);
+		unlock_buffer(bh);
 		goto out;
 	}
 	error = 0;
@@ -974,8 +966,10 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	 * b_next_transaction points to it
 	 */
 	if (jh->b_transaction == transaction ||
-	    jh->b_next_transaction == transaction)
+	    jh->b_next_transaction == transaction) {
+		unlock_buffer(bh);
 		goto done;
+	}
 
 	/*
 	 * this is the first time this transaction is touching this buffer,
@@ -999,10 +993,24 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 		 */
 		smp_wmb();
 		spin_lock(&journal->j_list_lock);
+		if (test_clear_buffer_dirty(bh)) {
+			/*
+			 * Execute buffer dirty clearing and jh->b_transaction
+			 * assignment under journal->j_list_lock locked to
+			 * prevent bh being removed from checkpoint list if
+			 * the buffer is in an intermediate state (not dirty
+			 * and jh->b_transaction is NULL).
+			 */
+			JBUFFER_TRACE(jh, "Journalling dirty buffer");
+			set_buffer_jbddirty(bh);
+		}
 		__jbd2_journal_file_buffer(jh, transaction, BJ_Reserved);
 		spin_unlock(&journal->j_list_lock);
+		unlock_buffer(bh);
 		goto done;
 	}
+	unlock_buffer(bh);
+
 	/*
 	 * If there is already a copy-out version of this buffer, then we don't
 	 * need to make another one
From: Yipeng Zou <zouyipeng@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6BO2R
CVE: NA
--------------------------------
Calling printk() in interrupt context risks deadlock.

Use the printk_safe mechanism to avoid it.
Fixes: bdc370d64dc4 ("genirq: Introduce warn log when irq be reentrant")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 kernel/irq/chip.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 39bed3e4da3c..6d5c1fe792b5 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -513,9 +513,19 @@ static bool irq_may_run(struct irq_desc *desc)
 	if (!irqd_has_set(&desc->irq_data, mask))
 		return true;
 
-	if (irqd_get(&desc->irq_data) & IRQD_IRQ_INPROGRESS)
-		pr_warn_ratelimited("irq %u(%lu) may be reentrant in multiple cpus.\n",
-				    desc->irq_data.irq, desc->irq_data.hwirq);
+	if (irqd_get(&desc->irq_data) & IRQD_IRQ_INPROGRESS) {
+		const char *name = NULL;
+
+		if (desc->name)
+			name = desc->name;
+		else if (desc->action)
+			name = desc->action->name;
+
+		printk_safe_enter();
+		pr_warn("irq %u(%s) may be reentrant in multiple cpus.\n",
+			desc->irq_data.irq, name == NULL ? "NULL" : name);
+		printk_safe_exit();
+	}
 
 	/*
 	 * If the interrupt is an armed wakeup source, mark it pending
From: Li Nan <linan122@huawei.com>
hulk inclusion
category: bugfix
bugzilla: 188227, https://gitee.com/openeuler/kernel/issues/I6AG8P
CVE: NA
--------------------------------
There is no lock protection in md_wakeup_thread(), so sync_thread might be freed during wake-up, as shown below. Use pers_lock to protect it.
T1                              T2
md_start_sync
 md_register_thread
 md_wakeup_thread
                                raid1d
                                 md_check_recovery
                                  md_reap_sync_thread
                                   md_unregister_thread
                                    kfree
  wake_up
  ->sync_thread was freed
Fixes: fac05f256691f ("md: don't start resync thread directly from md thread")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2f99f47d5e61..8b55566bc3a0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8854,11 +8854,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
+	struct md_thread *sync_thread;
 
-	mddev->sync_thread = md_register_thread(md_do_sync,
-						mddev,
-						"resync");
-	if (!mddev->sync_thread) {
+	sync_thread = md_register_thread(md_do_sync, mddev, "resync");
+	if (!sync_thread) {
 		pr_warn("%s: could not start resync thread...\n",
 			mdname(mddev));
 		/* leave the spares where they are, it shouldn't hurt */
@@ -8872,8 +8871,12 @@ static void md_start_sync(struct work_struct *ws)
 			   &mddev->recovery))
 			if (mddev->sysfs_action)
 				sysfs_notify_dirent_safe(mddev->sysfs_action);
-	} else
+	} else {
+		spin_lock(&pers_lock);
+		mddev->sync_thread = sync_thread;
 		md_wakeup_thread(mddev->sync_thread);
+		spin_unlock(&pers_lock);
+	}
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event(mddev);
 }