From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: 55097 https://gitee.com/openeuler/kernel/issues/I4DDEL
-------------------------------------------------
Now there no protection between partition operations (e.g, partition rescan) and delete_partition() in del_gendisk(), so the following scenario is possible:
CPU 1
blkdev_ioctl del_gendisk blkdev_reread_part lock bd_mutex drop_partitions check_partition lock lookup_sem // for each partition deletion_partion
// for each partition add_partition
The newly added partitions, the device files (e.g, /dev/sdXN) and the symlinks in /sys/class/block will be left behind. If the deleted disk is online again, the scan of partition will fail with the following error:
sysfs: cannot create duplicate filename '/class/block/sdaN' sdX: pN could not be added: 17
Vanilla kernel tries to fix that by commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk"), but it introduces dead-lock for nbd/loop/xen-frontblk drivers. These in-tree drivers can be fixed, but there may be other affected block drivers, especially the out-of-tree ones, so fixing it in another way.
Two methods are considered. The first is waiting for the end of partition operations in del_gendisk(). It is OK but it needs adding new fields in gendisk (bool & wait_queue_head_t). The second is reusing lookup_sem and GENHD_FL_UP to serialize partition operations and del_gendisk(). Now the latter is chose and here are the details.
There are six partition operations:
(1) add_partition() in blkpg_ioctl() (2) deletion_partion() in blkpg_ioctl() (3) resize in blkpg_ioctl() (4) partition rescan and revalidate in bdev_disk_changed() (5) deletion_partion() in del_gendisk()
op (1)~(4) already take bd_mutex, so using down_read() to serialize with down_write() in del_gendisk() is OK. op (3) only updates the values in hd_struct, so no lock is needed, because it already increase the ref of hd_struct.
lookup_sem is used to prevent a newly-created blocking device inode from associating with a deleting gendisk, and the locking order is:
part->bd_mutex -> disk->lookup_sem or whole->bd_mutex -> disk->lookup_sem
Now it is also used to serialize the partition operations and the new locking order will be:
part->bd_mutex -> whole->bd_mutex -> disk->lookup_sem
and it is OK.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Chen Jun chenjun102@huawei.com --- block/partitions/core.c | 4 ++++ fs/block_dev.c | 12 ++++++++---- 2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/block/partitions/core.c b/block/partitions/core.c index 1f031f074ffd..569b0ca9f6e1 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -523,6 +523,7 @@ int bdev_add_partition(struct block_device *bdev, int partno, int ret;
mutex_lock(&bdev->bd_mutex); + down_read(&disk->lookup_sem); if (!(disk->flags & GENHD_FL_UP)) { ret = -ENXIO; goto out; @@ -537,6 +538,7 @@ int bdev_add_partition(struct block_device *bdev, int partno, ADDPART_FLAG_NONE, NULL); ret = PTR_ERR_OR_ZERO(part); out: + up_read(&disk->lookup_sem); mutex_unlock(&bdev->bd_mutex); return ret; } @@ -553,6 +555,7 @@ int bdev_del_partition(struct block_device *bdev, int partno)
mutex_lock(&bdevp->bd_mutex); mutex_lock_nested(&bdev->bd_mutex, 1); + down_read(&bdev->bd_disk->lookup_sem);
ret = -ENXIO; part = disk_get_part(bdev->bd_disk, partno); @@ -569,6 +572,7 @@ int bdev_del_partition(struct block_device *bdev, int partno) delete_partition(part); ret = 0; out_unlock: + up_read(&bdev->bd_disk->lookup_sem); mutex_unlock(&bdev->bd_mutex); mutex_unlock(&bdevp->bd_mutex); bdput(bdevp); diff --git a/fs/block_dev.c b/fs/block_dev.c index d2881fd351a3..46801789f2dc 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1427,14 +1427,16 @@ int bdev_disk_changed(struct block_device *bdev, bool invalidate) int ret;
lockdep_assert_held(&bdev->bd_mutex); - - if (!(disk->flags & GENHD_FL_UP)) - return -ENXIO; + down_read(&disk->lookup_sem); + if (!(disk->flags & GENHD_FL_UP)) { + ret = -ENXIO; + goto out; + }
rescan: ret = blk_drop_partitions(bdev); if (ret) - return ret; + goto out;
clear_bit(GD_NEED_PART_SCAN, &disk->state);
@@ -1469,6 +1471,8 @@ int bdev_disk_changed(struct block_device *bdev, bool invalidate) kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE); }
+out: + up_read(&disk->lookup_sem); return ret; } /*