From: Qu Wenruo wqu@suse.com
stable inclusion from stable-v5.10.121 commit 509e9710b80215892d84c5b7122a9c19d1eb0df6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit d201238ccd2f30b9bfcfadaeae0972e3a486a176 upstream.
[BUG] There is a report that a btrfs has a bad super block num devices.
This makes btrfs to reject the fs completely.
BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here BTRFS error (device sdd3): failed to read chunk tree: -22 BTRFS error (device sdd3): open_ctree failed
[CAUSE] During btrfs device removal, chunk tree and super block num devs are updated in two different transactions:
btrfs_rm_device() |- btrfs_rm_dev_item(device) | |- trans = btrfs_start_transaction() | | Now we got transaction X | | | |- btrfs_del_item() | | Now device item is removed from chunk tree | | | |- btrfs_commit_transaction() | Transaction X got committed, super num devs untouched, | but device item removed from chunk tree. | (AKA, super num devs is already incorrect) | |- cur_devices->num_devices--; |- cur_devices->total_devices--; |- btrfs_set_super_num_devices() All those operations are not in transaction X, thus it will only be written back to disk in next transaction.
So after the transaction X in btrfs_rm_dev_item() committed, but before transaction X+1 (which can be minutes away), a power loss happen, then we got the super num mismatch.
This has been fixed by commit bbac58698a55 ("btrfs: remove device item and update super block in the same transaction").
[FIX] Make the super_num_devices check less strict, converting it from a hard error to a warning, and reset the value to a correct one for the current or next transaction commit.
As the number of device items is the critical information where the super block num_devices is only a cached value (and also useful for cross checking), it's safe to automatically update it. Other device related problems like missing device are handled after that and may require other means to resolve, like degraded mount. With this fix, potentially affected filesystems won't fail mount and require the manual repair by btrfs check.
Reported-by: Luca Béla Palkovics luca.bela.palkovics@gmail.com Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CK... CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com Acked-by: Xie XiuQi xiexiuqi@huawei.com --- fs/btrfs/volumes.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 366d04763864..2fdf178aa76f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7191,12 +7191,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info) * do another round of validation checks. */ if (total_dev != fs_info->fs_devices->total_devices) { - btrfs_err(fs_info, - "super_num_devices %llu mismatch with num_devices %llu found here", + btrfs_warn(fs_info, +"super block num_devices %llu mismatch with DEV_ITEM count %llu, will be repaired on next transaction commit", btrfs_super_num_devices(fs_info->super_copy), total_dev); - ret = -EINVAL; - goto error; + fs_info->fs_devices->total_devices = total_dev; + btrfs_set_super_num_devices(fs_info->super_copy, total_dev); } if (btrfs_super_total_bytes(fs_info->super_copy) < fs_info->fs_devices->total_rw_bytes) {