From: Josef Bacik <josef(a)toxicpanda.com>
stable inclusion
from stable-v5.15.26
commit 6599d5e8bd758d897fd2ef4dc388ae50278b1f7e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IALPSL
CVE: CVE-2022-48901
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
--------------------------------
commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.
We hit a bug with a recovering relocation on mount for one of our file
systems in production. I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time. This
presented as an error while looking up an extent item
WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
RIP: 0010:lookup_inline_extent_backref+0x647/0x680
RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
Call Trace:
<TASK>
insert_inline_extent_backref+0x46/0xd0
__btrfs_inc_extent_ref.isra.0+0x5f/0x200
? btrfs_merge_delayed_refs+0x164/0x190
__btrfs_run_delayed_refs+0x561/0xfa0
? btrfs_search_slot+0x7b4/0xb30
? btrfs_update_root+0x1a9/0x2c0
btrfs_run_delayed_refs+0x73/0x1f0
? btrfs_update_root+0x1a9/0x2c0
btrfs_commit_transaction+0x50/0xa50
? btrfs_update_reloc_root+0x122/0x220
prepare_to_merge+0x29f/0x320
relocate_block_group+0x2b8/0x550
btrfs_relocate_block_group+0x1a6/0x350
btrfs_relocate_chunk+0x27/0xe0
btrfs_balance+0x777/0xe60
balance_kthread+0x35/0x50
? btrfs_balance+0xe60/0xe60
kthread+0x16b/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex. However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.
Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.
Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key. If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info. Then
balance can wait until this flag is cleared to start up again.
If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
fs/btrfs/disk-io.c
fs/btrfs/ctree.h
fs/btrfs/transaction.h
[conflicts due to 8cd2908846d1 ("btrfs: clear oneshot options on mount
and remount") not mergered]
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/btrfs/ctree.h | 10 ++++++++++
fs/btrfs/disk-io.c | 10 ++++++++++
fs/btrfs/extent-tree.c | 10 ++++++++++
fs/btrfs/relocation.c | 13 +++++++++++++
fs/btrfs/root-tree.c | 15 +++++++++++++++
fs/btrfs/transaction.c | 33 ++++++++++++++++++++++++++++++++-
fs/btrfs/transaction.h | 1 +
7 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84ac3642023f..90b46ac13003 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -567,6 +567,9 @@ enum {
/* Indicate that we can't trust the free space tree for caching yet */
BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
+
+ /* Indicate we have half completed snapshot deletions pending. */
+ BTRFS_FS_UNFINISHED_DROPS,
};
/*
@@ -1033,8 +1036,15 @@ enum {
BTRFS_ROOT_HAS_LOG_TREE,
/* Qgroup flushing is in progress */
BTRFS_ROOT_QGROUP_FLUSHING,
+ /* This root has a drop operation that was started previously. */
+ BTRFS_ROOT_UNFINISHED_DROP,
};
+static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+}
+
/*
* Record swapped tree blocks of a subvolume tree for delayed subtree trace
* code. For detail check comment in fs/btrfs/qgroup.c.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b9a3bdacdede..05bc52003e19 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3427,6 +3427,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}
set_bit(BTRFS_FS_OPEN, &fs_info->flags);
+ /* Kick the cleaner thread so it'll start deleting snapshots. */
+ if (test_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags))
+ wake_up_process(fs_info->cleaner_kthread);
+
/*
* backuproot only affect mount behavior, and if open_ctree succeeded,
* no need to keep the flag
@@ -4092,6 +4096,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
kthread_park(fs_info->cleaner_kthread);
+ /*
+ * If we had UNFINISHED_DROPS we could still be processing them, so
+ * clear that bit and wake up relocation so it can stop.
+ */
+ btrfs_wake_unfinished_drop(fs_info);
+
/* wait for the qgroup rescan worker to stop */
btrfs_qgroup_wait_for_completion(fs_info, false);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 284294620e9f..578b217bc4ea 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5353,6 +5353,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
int ret;
int level;
bool root_dropped = false;
+ bool unfinished_drop = false;
btrfs_debug(fs_info, "Drop subvolume %llu", root->root_key.objectid);
@@ -5395,6 +5396,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
* already dropped.
*/
set_bit(BTRFS_ROOT_DELETING, &root->state);
+ unfinished_drop = test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+
if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
level = btrfs_header_level(root->node);
path->nodes[level] = btrfs_lock_root_node(root);
@@ -5571,6 +5574,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
kfree(wc);
btrfs_free_path(path);
out:
+ /*
+ * We were an unfinished drop root, check to see if there are any
+ * pending, and if not clear and wake up any waiters.
+ */
+ if (!err && unfinished_drop)
+ btrfs_maybe_wake_unfinished_drop(fs_info);
+
/*
* So if we need to stop dropping the snapshot for whatever reason we
* need to make sure to add it back to the dead root list so that we
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c21545c5b34b..cf7a8fcea4d6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3595,6 +3595,19 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
int rw = 0;
int err = 0;
+ /*
+ * This only gets set if we had a half-deleted snapshot on mount. We
+ * cannot allow relocation to start while we're still trying to clean up
+ * these pending deletions.
+ */
+ ret = wait_on_bit(&fs_info->flags, BTRFS_FS_UNFINISHED_DROPS, TASK_INTERRUPTIBLE);
+ if (ret)
+ return ret;
+
+ /* We may have been woken up by close_ctree, so bail if we're closing. */
+ if (btrfs_fs_closing(fs_info))
+ return -EINTR;
+
bg = btrfs_lookup_block_group(fs_info, group_start);
if (!bg)
return -ENOENT;
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index cd40e7a92552..7bdaccb58ec0 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -280,6 +280,21 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info)
WARN_ON(!test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state));
if (btrfs_root_refs(&root->root_item) == 0) {
+ struct btrfs_key drop_key;
+
+ btrfs_disk_key_to_cpu(&drop_key, &root->root_item.drop_progress);
+ /*
+ * If we have a non-zero drop_progress then we know we
+ * made it partly through deleting this snapshot, and
+ * thus we need to make sure we block any balance from
+ * happening until this snapshot is completely dropped.
+ */
+ if (drop_key.objectid != 0 || drop_key.type != 0 ||
+ drop_key.offset != 0) {
+ set_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+ set_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+ }
+
set_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
btrfs_add_dead_root(root);
}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a6ddf91d5c40..cf7e8a20bb02 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1294,6 +1294,32 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
return 0;
}
+/*
+ * If we had a pending drop we need to see if there are any others left in our
+ * dead roots list, and if not clear our bit and wake any waiters.
+ */
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ /*
+ * We put the drop in progress roots at the front of the list, so if the
+ * first entry doesn't have UNFINISHED_DROP set we can wake everybody
+ * up.
+ */
+ spin_lock(&fs_info->trans_lock);
+ if (!list_empty(&fs_info->dead_roots)) {
+ struct btrfs_root *root = list_first_entry(&fs_info->dead_roots,
+ struct btrfs_root,
+ root_list);
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state)) {
+ spin_unlock(&fs_info->trans_lock);
+ return;
+ }
+ }
+ spin_unlock(&fs_info->trans_lock);
+
+ btrfs_wake_unfinished_drop(fs_info);
+}
+
/*
* dead roots are old snapshots that need to be deleted. This allocates
* a dirty root struct and adds it into the list of dead roots that need to
@@ -1306,7 +1332,12 @@ void btrfs_add_dead_root(struct btrfs_root *root)
spin_lock(&fs_info->trans_lock);
if (list_empty(&root->root_list)) {
btrfs_grab_root(root);
- list_add_tail(&root->root_list, &fs_info->dead_roots);
+
+ /* We want to process the partially complete drops first. */
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state))
+ list_add(&root->root_list, &fs_info->dead_roots);
+ else
+ list_add_tail(&root->root_list, &fs_info->dead_roots);
}
spin_unlock(&fs_info->trans_lock);
}
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index eb26eb068fe8..0599e3994633 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -216,6 +216,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid);
void btrfs_add_dead_root(struct btrfs_root *root);
int btrfs_defrag_root(struct btrfs_root *root);
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info);
int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
--
2.39.2
From: Josef Bacik <josef(a)toxicpanda.com>
stable inclusion
from stable-v5.15.26
commit 6599d5e8bd758d897fd2ef4dc388ae50278b1f7e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IALPSL
CVE: CVE-2022-48901
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
--------------------------------
commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.
We hit a bug with a recovering relocation on mount for one of our file
systems in production. I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time. This
presented as an error while looking up an extent item
WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
RIP: 0010:lookup_inline_extent_backref+0x647/0x680
RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
Call Trace:
<TASK>
insert_inline_extent_backref+0x46/0xd0
__btrfs_inc_extent_ref.isra.0+0x5f/0x200
? btrfs_merge_delayed_refs+0x164/0x190
__btrfs_run_delayed_refs+0x561/0xfa0
? btrfs_search_slot+0x7b4/0xb30
? btrfs_update_root+0x1a9/0x2c0
btrfs_run_delayed_refs+0x73/0x1f0
? btrfs_update_root+0x1a9/0x2c0
btrfs_commit_transaction+0x50/0xa50
? btrfs_update_reloc_root+0x122/0x220
prepare_to_merge+0x29f/0x320
relocate_block_group+0x2b8/0x550
btrfs_relocate_block_group+0x1a6/0x350
btrfs_relocate_chunk+0x27/0xe0
btrfs_balance+0x777/0xe60
balance_kthread+0x35/0x50
? btrfs_balance+0xe60/0xe60
kthread+0x16b/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex. However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.
Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.
Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key. If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info. Then
balance can wait until this flag is cleared to start up again.
If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
fs/btrfs/disk-io.c
fs/btrfs/ctree.h
fs/btrfs/transaction.h
[conflicts due to 8cd2908846d1 ("btrfs: clear oneshot options on mount
and remount") not mergered]
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/btrfs/ctree.h | 10 ++++++++++
fs/btrfs/disk-io.c | 10 ++++++++++
fs/btrfs/extent-tree.c | 10 ++++++++++
fs/btrfs/relocation.c | 13 +++++++++++++
fs/btrfs/root-tree.c | 15 +++++++++++++++
fs/btrfs/transaction.c | 33 ++++++++++++++++++++++++++++++++-
fs/btrfs/transaction.h | 1 +
7 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 482842ac397b..abc1e72a4464 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -565,6 +565,9 @@ enum {
/* Indicate that we can't trust the free space tree for caching yet */
BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
+
+ /* Indicate we have half completed snapshot deletions pending. */
+ BTRFS_FS_UNFINISHED_DROPS,
};
/*
@@ -1031,8 +1034,15 @@ enum {
BTRFS_ROOT_HAS_LOG_TREE,
/* Qgroup flushing is in progress */
BTRFS_ROOT_QGROUP_FLUSHING,
+ /* This root has a drop operation that was started previously. */
+ BTRFS_ROOT_UNFINISHED_DROP,
};
+static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+}
+
/*
* Record swapped tree blocks of a subvolume tree for delayed subtree trace
* code. For detail check comment in fs/btrfs/qgroup.c.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 707344288b1c..b923011dec85 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3444,6 +3444,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}
set_bit(BTRFS_FS_OPEN, &fs_info->flags);
+ /* Kick the cleaner thread so it'll start deleting snapshots. */
+ if (test_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags))
+ wake_up_process(fs_info->cleaner_kthread);
+
/*
* backuproot only affect mount behavior, and if open_ctree succeeded,
* no need to keep the flag
@@ -4109,6 +4113,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
kthread_park(fs_info->cleaner_kthread);
+ /*
+ * If we had UNFINISHED_DROPS we could still be processing them, so
+ * clear that bit and wake up relocation so it can stop.
+ */
+ btrfs_wake_unfinished_drop(fs_info);
+
/* wait for the qgroup rescan worker to stop */
btrfs_qgroup_wait_for_completion(fs_info, false);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3ba43a40032c..1cf787209798 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5365,6 +5365,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
int ret;
int level;
bool root_dropped = false;
+ bool unfinished_drop = false;
btrfs_debug(fs_info, "Drop subvolume %llu", root->root_key.objectid);
@@ -5407,6 +5408,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
* already dropped.
*/
set_bit(BTRFS_ROOT_DELETING, &root->state);
+ unfinished_drop = test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+
if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
level = btrfs_header_level(root->node);
path->nodes[level] = btrfs_lock_root_node(root);
@@ -5583,6 +5586,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
kfree(wc);
btrfs_free_path(path);
out:
+ /*
+ * We were an unfinished drop root, check to see if there are any
+ * pending, and if not clear and wake up any waiters.
+ */
+ if (!err && unfinished_drop)
+ btrfs_maybe_wake_unfinished_drop(fs_info);
+
/*
* So if we need to stop dropping the snapshot for whatever reason we
* need to make sure to add it back to the dead root list so that we
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 93db4486a943..bdfa4a93d373 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3601,6 +3601,19 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
int rw = 0;
int err = 0;
+ /*
+ * This only gets set if we had a half-deleted snapshot on mount. We
+ * cannot allow relocation to start while we're still trying to clean up
+ * these pending deletions.
+ */
+ ret = wait_on_bit(&fs_info->flags, BTRFS_FS_UNFINISHED_DROPS, TASK_INTERRUPTIBLE);
+ if (ret)
+ return ret;
+
+ /* We may have been woken up by close_ctree, so bail if we're closing. */
+ if (btrfs_fs_closing(fs_info))
+ return -EINTR;
+
bg = btrfs_lookup_block_group(fs_info, group_start);
if (!bg)
return -ENOENT;
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index cd40e7a92552..7bdaccb58ec0 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -280,6 +280,21 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info)
WARN_ON(!test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state));
if (btrfs_root_refs(&root->root_item) == 0) {
+ struct btrfs_key drop_key;
+
+ btrfs_disk_key_to_cpu(&drop_key, &root->root_item.drop_progress);
+ /*
+ * If we have a non-zero drop_progress then we know we
+ * made it partly through deleting this snapshot, and
+ * thus we need to make sure we block any balance from
+ * happening until this snapshot is completely dropped.
+ */
+ if (drop_key.objectid != 0 || drop_key.type != 0 ||
+ drop_key.offset != 0) {
+ set_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+ set_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+ }
+
set_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
btrfs_add_dead_root(root);
}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 89117cc2cf05..7d0ea8aeadb0 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1301,6 +1301,32 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
return 0;
}
+/*
+ * If we had a pending drop we need to see if there are any others left in our
+ * dead roots list, and if not clear our bit and wake any waiters.
+ */
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ /*
+ * We put the drop in progress roots at the front of the list, so if the
+ * first entry doesn't have UNFINISHED_DROP set we can wake everybody
+ * up.
+ */
+ spin_lock(&fs_info->trans_lock);
+ if (!list_empty(&fs_info->dead_roots)) {
+ struct btrfs_root *root = list_first_entry(&fs_info->dead_roots,
+ struct btrfs_root,
+ root_list);
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state)) {
+ spin_unlock(&fs_info->trans_lock);
+ return;
+ }
+ }
+ spin_unlock(&fs_info->trans_lock);
+
+ btrfs_wake_unfinished_drop(fs_info);
+}
+
/*
* dead roots are old snapshots that need to be deleted. This allocates
* a dirty root struct and adds it into the list of dead roots that need to
@@ -1313,7 +1339,12 @@ void btrfs_add_dead_root(struct btrfs_root *root)
spin_lock(&fs_info->trans_lock);
if (list_empty(&root->root_list)) {
btrfs_grab_root(root);
- list_add_tail(&root->root_list, &fs_info->dead_roots);
+
+ /* We want to process the partially complete drops first. */
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state))
+ list_add(&root->root_list, &fs_info->dead_roots);
+ else
+ list_add_tail(&root->root_list, &fs_info->dead_roots);
}
spin_unlock(&fs_info->trans_lock);
}
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index eb26eb068fe8..0599e3994633 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -216,6 +216,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid);
void btrfs_add_dead_root(struct btrfs_root *root);
int btrfs_defrag_root(struct btrfs_root *root);
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info);
int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
--
2.39.2
From: Lance Richardson <rlance(a)google.com>
stable inclusion
from stable-v6.6.44
commit 1fe97f68fce1ba24bf823bfb0eb0956003473130
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAKQ2W
CVE: CVE-2024-43856
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 28e8b7406d3a1f5329a03aa25a43aa28e087cb20 ]
dmam_free_coherent() frees a DMA allocation, which makes the
freed vaddr available for reuse, then calls devres_destroy()
to remove and free the data structure used to track the DMA
allocation. Between the two calls, it is possible for a
concurrent task to make an allocation with the same vaddr
and add it to the devres list.
If this happens, there will be two entries in the devres list
with the same vaddr and devres_destroy() can free the wrong
entry, triggering the WARN_ON() in dmam_match.
Fix by destroying the devres entry before freeing the DMA
allocation.
Tested:
kokonut //net/encryption
http://sponge2/b9145fe6-0f72-4325-ac2f-a84d81075b03
Fixes: 9ac7849e35f7 ("devres: device resource management")
Signed-off-by: Lance Richardson <rlance(a)google.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Kaixiong Yu <yukaixiong(a)huawei.com>
---
kernel/dma/mapping.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..453c0fbe87ff 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -60,8 +60,8 @@ void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
{
struct dma_devres match_data = { size, vaddr, dma_handle };
- dma_free_coherent(dev, size, vaddr, dma_handle);
WARN_ON(devres_destroy(dev, dmam_release, dmam_match, &match_data));
+ dma_free_coherent(dev, size, vaddr, dma_handle);
}
EXPORT_SYMBOL(dmam_free_coherent);
--
2.25.1
From: Maciej Żenczykowski <maze(a)google.com>
stable inclusion
from stable-v4.19.271
commit 63d161f29cd39c050e8873aa36e0c9fc013bb763
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IALIDP
CVE: CVE-2023-52894
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
commit c6ec929595c7443250b2a4faea988c62019d5cd2 upstream.
In Google internal bug 265639009 we've received an (as yet) unreproducible
crash report from an aarch64 GKI 5.10.149-android13 running device.
AFAICT the source code is at:
https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_1…
The call stack is:
ncm_close() -> ncm_notify() -> ncm_do_notify()
with the crash at:
ncm_do_notify+0x98/0x270
Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)
Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):
// halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification)
0B 0D 00 79 strh w11, [x8, #6]
// word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request)
6C 0A 00 B9 str w12, [x19, #8]
// x10 (NULL) was read here from offset 0 of valid pointer x9
// IMHO we're reading 'cdev->gadget' and getting NULL
// gadget is indeed at offset 0 of struct usb_composite_dev
2A 01 40 F9 ldr x10, [x9]
// loading req->buf pointer, which is at offset 0 of struct usb_request
69 02 40 F9 ldr x9, [x19]
// x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed
4B 5D 40 B9 ldr w11, [x10, #0x5c]
which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:
event->wLength = cpu_to_le16(8);
req->length = NCM_STATUS_BYTECOUNT;
/* SPEED_CHANGE data is up/down speeds in bits/sec */
data = req->buf + sizeof *event;
data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
My analysis of registers and NULL ptr deref crash offset
(Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c)
heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing:
data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
which calls:
ncm_bitrate(NULL)
which then calls:
gadget_is_superspeed(NULL)
which reads
((struct usb_gadget *)NULL)->max_speed
and hits a panic.
AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C.
(remember there's a GKI KABI reservation of 16 bytes in struct work_struct)
It's not at all clear to me how this is all supposed to work...
but returning 0 seems much better than panic-ing...
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Cc: Carlos Llamas <cmllamas(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Maciej Żenczykowski <maze(a)google.com>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20230117131839.1138208-1-maze@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
drivers/usb/gadget/function/f_ncm.c
[Context conflict]
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
---
drivers/usb/gadget/function/f_ncm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index 3da7e6511fc6..6031e96cf351 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -85,7 +85,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f)
/* peak (theoretical) bulk transfer rate in bits-per-second */
static inline unsigned ncm_bitrate(struct usb_gadget *g)
{
- if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER)
+ if (!g)
+ return 0;
+ else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER)
return 13 * 1024 * 8 * 1000 * 8;
else if (gadget_is_dualspeed(g) && g->speed == USB_SPEED_HIGH)
return 13 * 512 * 8 * 1000 * 8;
--
2.17.1