From: Ma Wupeng <mawupeng1@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RKHX
CVE: NA
--------------------------------
After a fork, the child process erroneously reports a reliable page
count twice that of its parent.

Examination of struct mm_struct shows that reliable_nr_page should be
reset to 0 in mm_init(), just as the RSS counters are. The fork case is
merely one place where the stale counter shows up.

Fix this by clearing reliable_nr_page to 0 in mm_init().
Fixes: 094eaabb3fe8 ("proc: Count reliable memory usage of reliable tasks")
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/mem_reliable.h | 8 ++++++++
 kernel/fork.c                | 1 +
 2 files changed, 9 insertions(+)
diff --git a/include/linux/mem_reliable.h b/include/linux/mem_reliable.h
index 6d57c36fb676..aa3fe77c8a72 100644
--- a/include/linux/mem_reliable.h
+++ b/include/linux/mem_reliable.h
@@ -123,6 +123,13 @@ static inline bool mem_reliable_shmem_limit_check(void)
 		shmem_reliable_nr_page;
 }
 
+static inline void reliable_clear_page_counter(struct mm_struct *mm)
+{
+	if (!mem_reliable_is_enabled())
+		return;
+
+	atomic_long_set(&mm->reliable_nr_page, 0);
+}
 #else
 #define reliable_enabled 0
 #define reliable_allow_fb_enabled() false
@@ -171,6 +178,7 @@ static inline void reliable_lru_add_batch(int zid, enum lru_list lru,
 					  int val) {}
 
 static inline bool mem_reliable_counter_initialized(void) { return false; }
+static inline void reliable_clear_page_counter(struct mm_struct *mm) {}
 #endif
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index b5453a26655e..c256525d4ce5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	atomic_long_set(&mm->locked_vm, 0);
 	mm->pinned_vm = 0;
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
+	reliable_clear_page_counter(mm);
 	spin_lock_init(&mm->page_table_lock);
 	spin_lock_init(&mm->arg_lock);
 	mm_init_cpumask(mm);
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
Prepare to handle 'idle' and 'frozen' differently in order to fix a
deadlock. There are no functional changes except that
MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 61 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 16 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2532780d0dca..76564b1fd4c3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4662,6 +4662,46 @@ action_show(struct mddev *mddev, char *page)
 	return sprintf(page, "%s\n", type);
 }
 
+static void stop_sync_thread(struct mddev *mddev)
+{
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return;
+
+	if (mddev_lock(mddev))
+		return;
+
+	/*
+	 * Check again in case MD_RECOVERY_RUNNING is cleared before lock is
+	 * held.
+	 */
+	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+		mddev_unlock(mddev);
+		return;
+	}
+
+	if (work_pending(&mddev->del_work))
+		flush_workqueue(md_misc_wq);
+
+	if (mddev->sync_thread) {
+		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		md_reap_sync_thread(mddev);
+	}
+
+	mddev_unlock(mddev);
+}
+
+static void idle_sync_thread(struct mddev *mddev)
+{
+	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
+static void frozen_sync_thread(struct mddev *mddev)
+{
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	stop_sync_thread(mddev);
+}
+
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
 {
@@ -4669,22 +4709,11 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 		return -EINVAL;
 
-	if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
-		if (cmd_match(page, "frozen"))
-			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		else
-			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    mddev_lock(mddev) == 0) {
-			if (work_pending(&mddev->del_work))
-				flush_workqueue(md_misc_wq);
-			if (mddev->sync_thread) {
-				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_reap_sync_thread(mddev);
-			}
-			mddev_unlock(mddev);
-		}
-	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+	if (cmd_match(page, "idle"))
+		idle_sync_thread(mddev);
+	else if (cmd_match(page, "frozen"))
+		frozen_sync_thread(mddev);
+	else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 	else if (cmd_match(page, "resync"))
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
Currently, for 'idle' and 'frozen', action_store() holds
'reconfig_mutex' and calls md_reap_sync_thread() to stop the sync
thread. However, this can cause a deadlock (explained in the next
patch). To fix the problem, a following patch will release
'reconfig_mutex' and wait on 'resync_wait', as md_set_readonly() and
do_md_stop() do.

However, action_store() sets/clears 'MD_RECOVERY_FROZEN'
unconditionally, which can cause unexpected problems. For example, if
'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress
while 'idle' clears 'MD_RECOVERY_FROZEN', a new sync thread may be
started and the in-progress 'frozen' request may be starved.

This patch adds a mutex to serialize 'idle' and 'frozen' in
action_store().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 5 +++++
 drivers/md/md.h | 3 +++
 2 files changed, 8 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 76564b1fd4c3..6018f07fd3db 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -566,6 +566,7 @@ void mddev_init(struct mddev *mddev)
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
+	mutex_init(&mddev->sync_mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
 	timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
@@ -4692,14 +4693,18 @@ static void stop_sync_thread(struct mddev *mddev)
 
 static void idle_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static void frozen_sync_thread(struct mddev *mddev)
 {
+	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+	mutex_unlock(&mddev->sync_mutex);
 }
 
 static ssize_t
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 916b4ff4d9e0..c770892a9da6 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -509,6 +509,9 @@ struct mddev {
 	unsigned int			good_device_nr;	/* good device num within cluster raid */
 
 	bool	has_superblocks:1;
+
+	/* Used to synchronize idle and frozen for action_store() */
+	struct mutex			sync_mutex;
 };
 
 enum recovery_flags {
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
Our tests found the following deadlock in raid10:
1) Issue a normal write, and such write failed:

   raid10_end_write_request
    set_bit(R10BIO_WriteError, &r10_bio->state)
    one_write_done
     reschedule_retry

   // later from md thread
   raid10d
    handle_write_completed
     list_add(&r10_bio->retry_list, &conf->bio_end_io_list)

   // later from md thread
   raid10d
    if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
     list_move(conf->bio_end_io_list.prev, &tmp)
     r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
     raid_end_bio_io(r10_bio)

   Dependency chain 1: normal io is waiting for updating superblock
2) Trigger a recovery:

   raid10_sync_request
    raise_barrier

   Dependency chain 2: sync thread is waiting for normal io
3) echo idle/frozen to sync_action:

   action_store
    mddev_lock
     md_unregister_thread
      kthread_stop

   Dependency chain 3: 'reconfig_mutex' is waiting for the sync thread
4) md thread can't update superblock:

   raid10d
    md_check_recovery
     if (mddev_trylock(mddev))
      md_update_sb

   Dependency chain 4: updating the superblock is waiting for 'reconfig_mutex'
Hence a cyclic dependency exists; to fix the problem, one of the links
must be broken. Dependencies 1 and 2 can't be broken because they are
fundamental to the design. Dependency 4 might be breakable if it could
be guaranteed that no io is inflight, but that would require a new
mechanism and seems complex. Dependency 3 is the best choice:
idle/frozen only require the sync thread to finish, which can already
be done asynchronously, so 'reconfig_mutex' is not needed there
anymore.

This patch switches 'idle' and 'frozen' to wait for the sync thread to
be done asynchronously. It also adds a sequence counter that records
how many times the sync thread has finished, so that 'idle' won't keep
waiting on a newly started sync thread.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 24 ++++++++++++++++++++----
 drivers/md/md.h |  2 ++
 2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6018f07fd3db..4d4204eda5be 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -573,6 +573,7 @@ void mddev_init(struct mddev *mddev)
 	atomic_set(&mddev->active, 1);
 	atomic_set(&mddev->openers, 0);
 	atomic_set(&mddev->active_io, 0);
+	atomic_set(&mddev->sync_seq, 0);
 	spin_lock_init(&mddev->lock);
 	atomic_set(&mddev->flush_pending, 0);
 	init_waitqueue_head(&mddev->sb_wait);
@@ -4683,19 +4684,28 @@ static void stop_sync_thread(struct mddev *mddev)
 	if (work_pending(&mddev->del_work))
 		flush_workqueue(md_misc_wq);
 
-	if (mddev->sync_thread) {
-		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev);
-	}
+	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+	/*
+	 * Thread might be blocked waiting for metadata update which will now
+	 * never happen.
+	 */
+	if (mddev->sync_thread)
+		wake_up_process(mddev->sync_thread->tsk);
 
 	mddev_unlock(mddev);
 }
 
 static void idle_sync_thread(struct mddev *mddev)
 {
+	int sync_seq = atomic_read(&mddev->sync_seq);
+
 	mutex_lock(&mddev->sync_mutex);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }
 
@@ -4704,6 +4714,10 @@ static void frozen_sync_thread(struct mddev *mddev)
 	mutex_lock(&mddev->sync_mutex);
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
+
+	wait_event(resync_wait, mddev->sync_thread == NULL &&
+			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+
 	mutex_unlock(&mddev->sync_mutex);
 }
 
@@ -9135,6 +9149,8 @@ void md_reap_sync_thread(struct mddev *mddev)
 
 	/* resync has finished, collect result */
 	md_unregister_thread(&mddev->sync_thread);
+	atomic_inc(&mddev->sync_seq);
+
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index c770892a9da6..fe275ec34647 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -512,6 +512,8 @@ struct mddev {
 
 	/* Used to synchronize idle and frozen for action_store() */
 	struct mutex			sync_mutex;
+	/* The sequence number for sync thread */
+	atomic_t			sync_seq;
 };
 
 enum recovery_flags {
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
The previous patch replaced md_reap_sync_thread() in action_store()
with wait_event(resync_wait, ...). This patch moves the wake_up() to
the end of md_reap_sync_thread(), to make sure action_store() still
waits for everything in md_reap_sync_thread() to be done.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4d4204eda5be..3db5ea4b40d5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9184,13 +9184,13 @@ void md_reap_sync_thread(struct mddev *mddev)
 	clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
 	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-	wake_up(&resync_wait);
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event(mddev);
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
+	wake_up(&resync_wait);
 }
 EXPORT_SYMBOL(md_reap_sync_thread);
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
Before 'idle' and 'frozen' were refactored out of action_store(),
interruptible APIs were used so that no hung-task warning would be
triggered if idling/freezing the sync thread took too long. This patch
does the same.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3db5ea4b40d5..baa874823930 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4699,11 +4699,14 @@ static void idle_sync_thread(struct mddev *mddev)
 {
 	int sync_seq = atomic_read(&mddev->sync_seq);
 
-	mutex_lock(&mddev->sync_mutex);
+	if (mutex_lock_interruptible(&mddev->sync_mutex))
+		return;
+
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
 
-	wait_event(resync_wait, sync_seq != atomic_read(&mddev->sync_seq) ||
+	wait_event_interruptible(resync_wait,
+			sync_seq != atomic_read(&mddev->sync_seq) ||
 			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
 
 	mutex_unlock(&mddev->sync_mutex);
@@ -4711,11 +4714,13 @@ static void idle_sync_thread(struct mddev *mddev)
 
 static void frozen_sync_thread(struct mddev *mddev)
 {
-	mutex_lock(&mddev->sync_mutex);
+	if (mutex_lock_interruptible(&mddev->sync_mutex))
+		return;
+
 	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	stop_sync_thread(mddev);
 
-	wait_event(resync_wait, mddev->sync_thread == NULL &&
+	wait_event_interruptible(resync_wait, mddev->sync_thread == NULL &&
 			!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
 
 	mutex_unlock(&mddev->sync_mutex);
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA
--------------------------------
Struct mddev is only used inside the raid drivers; wrap the new fields
in #ifndef __GENKSYMS__ to preserve the KABI, just in case md_mod is
built from a new kernel while raid1/raid10 or other out-of-tree raid
modules are built from an old one.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/md/md.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index fe275ec34647..ea67637f7082 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -510,10 +510,12 @@ struct mddev {
 
 	bool	has_superblocks:1;
 
+#ifndef __GENKSYMS__
 	/* Used to synchronize idle and frozen for action_store() */
 	struct mutex			sync_mutex;
 	/* The sequence number for sync thread */
 	atomic_t			sync_seq;
+#endif
 };
 
 enum recovery_flags {
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
--------------------------------
This reverts commit 00f206947a6c9d4fbe40f4d328bfc11e04020cdc.
The mainline solution will be backported in the following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/partition-generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index c4ac7a8c77dc..2261566741f4 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -464,7 +464,7 @@ static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
 	struct hd_struct *part;
 	int res;
 
-	if (bdev->bd_part_count || bdev->bd_super)
+	if (bdev->bd_part_count)
 		return -EBUSY;
 	res = invalidate_partition(disk, 0);
 	if (res)
From: Christoph Hellwig <hch@lst.de>

mainline inclusion
from mainline-v5.10-rc1
commit 9301fe734384990ef9a2463cb7aeb3b00bf5dad5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Use blkdev_get_by_dev instead of open coding it using bdget_disk + blkdev_get, and split the code to read the partition table into a separate helper to make it a little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflict:
- this patch just factors out a helper; bdget_disk + blkdev_get is
  still used because 'bdev->bd_invalidated' needs to be set.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/genhd.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index 4a748603c881..daf7211429c1 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -639,31 +639,30 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	}
 }
 
-static void disk_init_partition(struct gendisk *disk)
+static void disk_scan_partitions(struct gendisk *disk)
 {
-	struct device *ddev = disk_to_dev(disk);
 	struct block_device *bdev;
-	struct disk_part_iter piter;
-	struct hd_struct *part;
 
-	/* No minors to use for partitions */
-	if (!disk_part_scan_enabled(disk))
-		goto exit;
-
-	/* No such device (e.g., media were just removed) */
-	if (!get_capacity(disk))
-		goto exit;
+	if (!get_capacity(disk) || !disk_part_scan_enabled(disk))
+		return;
 
 	bdev = bdget_disk(disk, 0);
 	if (!bdev)
-		goto exit;
+		return;
 
 	bdev->bd_invalidated = 1;
-	if (blkdev_get(bdev, FMODE_READ, NULL))
-		goto exit;
-	blkdev_put(bdev, FMODE_READ);
+	if (!blkdev_get(bdev, FMODE_READ, NULL))
+		blkdev_put(bdev, FMODE_READ);
+}
+
+static void disk_init_partition(struct gendisk *disk)
+{
+	struct device *ddev = disk_to_dev(disk);
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	disk_scan_partitions(disk);
 
-exit:
 	/* announce disk after possible partitions are created */
 	dev_set_uevent_suppress(ddev, 0);
 	kobject_uevent(&ddev->kobj, KOBJ_ADD);
From: Christoph Hellwig <hch@lst.de>

mainline inclusion
from mainline-v5.17-rc1
commit e16e506ccd673a3a888a34f8f694698305840044
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Unify the functionality that implements a partition rescan for a gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
- commit f0b870df80bc ("block: remove (__)blkdev_reread_part as an
  exported API") is not backported, and this patch doesn't remove the
  (__)blkdev_reread_part apis either.
- commit b98bcd9ef2d6 ("block: reopen the device in
  blkdev_reread_part") is not backported; this patch switches
  blkdev_reread_part() to disk_scan_partitions() directly, which will
  reopen the device.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/blk.h   |  1 +
 block/genhd.c | 20 +++++++++++++-------
 block/ioctl.c |  8 +++++++-
 3 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/block/blk.h b/block/blk.h
index 9269bb6b14f8..3f1c76b55336 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -214,6 +214,7 @@ unsigned int blk_plug_queued_count(struct request_queue *q);
 void blk_account_io_start(struct request *req, bool new_io);
 void blk_account_io_completion(struct request *req, unsigned int bytes);
 void blk_account_io_done(struct request *req, u64 now);
+int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
 
 /*
  * EH timer and IO completion will both attempt to 'grab' the request, make
diff --git a/block/genhd.c b/block/genhd.c
index daf7211429c1..1f981753d4e7 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -639,20 +639,25 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	}
 }
 
-static void disk_scan_partitions(struct gendisk *disk)
+int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 {
 	struct block_device *bdev;
+	int ret;
 
-	if (!get_capacity(disk) || !disk_part_scan_enabled(disk))
-		return;
+	if (!disk_part_scan_enabled(disk))
+		return -EINVAL;
 
 	bdev = bdget_disk(disk, 0);
 	if (!bdev)
-		return;
+		return -ENOMEM;
 
 	bdev->bd_invalidated = 1;
-	if (!blkdev_get(bdev, FMODE_READ, NULL))
-		blkdev_put(bdev, FMODE_READ);
+
+	ret = blkdev_get(bdev, mode, NULL);
+	if (!ret)
+		blkdev_put(bdev, mode);
+
+	return ret;
 }
 
 static void disk_init_partition(struct gendisk *disk)
@@ -661,7 +666,8 @@ static void disk_init_partition(struct gendisk *disk)
 	struct disk_part_iter piter;
 	struct hd_struct *part;
 
-	disk_scan_partitions(disk);
+	if (get_capacity(disk))
+		disk_scan_partitions(disk, FMODE_READ);
 
 	/* announce disk after possible partitions are created */
 	dev_set_uevent_suppress(ddev, 0);
diff --git a/block/ioctl.c b/block/ioctl.c
index 93939b0cbf03..da8d09146385 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -10,6 +10,8 @@
 #include <linux/pr.h>
 #include <linux/uaccess.h>
 
+#include "blk.h"
+
 static int blkpg_ioctl(struct block_device *bdev,
 		       struct blkpg_ioctl_arg __user *arg)
 {
 	struct block_device *bdevp;
@@ -597,7 +599,11 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKPG:
 		return blkpg_ioctl(bdev, argp);
 	case BLKRRPART:
-		return blkdev_reread_part(bdev);
+		if (!capable(CAP_SYS_ADMIN))
+			return -EACCES;
+		if (bdev != bdev->bd_contains)
+			return -EINVAL;
+		return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL);
 	case BLKGETSIZE:
 		size = i_size_read(bdev->bd_inode);
 		if ((size >> 9) > ~0UL)
From: Yu Kuai <yukuai3@huawei.com>

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
--------------------------------
Including blk.h in ioctl.c breaks the KABI, because it exposes some
data structure definitions. This patch adds a separate header file to
fix the problem.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/blk.h           |  1 -
 block/blk_extra_api.h | 27 +++++++++++++++++++++++++++
 block/ioctl.c         |  2 +-
 3 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 block/blk_extra_api.h
diff --git a/block/blk.h b/block/blk.h
index 3f1c76b55336..9269bb6b14f8 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -214,7 +214,6 @@ unsigned int blk_plug_queued_count(struct request_queue *q);
 void blk_account_io_start(struct request *req, bool new_io);
 void blk_account_io_completion(struct request *req, unsigned int bytes);
 void blk_account_io_done(struct request *req, u64 now);
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
 
 /*
  * EH timer and IO completion will both attempt to 'grab' the request, make
diff --git a/block/blk_extra_api.h b/block/blk_extra_api.h
new file mode 100644
index 000000000000..704d2a61bf12
--- /dev/null
+++ b/block/blk_extra_api.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023. Huawei Technologies Co., Ltd. All rights reserved.
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef BLK_EXTRA_API_H
+#define BLK_EXTRA_API_H
+
+/*
+ * Include blk.h will cause kabi broken in some contexts because it will expose
+ * definitions for some data structure. This file is used for the apis that
+ * can't be placed in blk.h.
+ */
+
+#include <linux/genhd.h>
+
+int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
+
+#endif /* BLK_EXTRA_API_H */
diff --git a/block/ioctl.c b/block/ioctl.c
index da8d09146385..ddc6d340e876 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -10,7 +10,7 @@
 #include <linux/pr.h>
 #include <linux/uaccess.h>
 
-#include "blk.h"
+#include "blk_extra_api.h"
 
 static int blkpg_ioctl(struct block_device *bdev,
 		       struct blkpg_ioctl_arg __user *arg)
 {
From: Yu Kuai <yukuai3@huawei.com>

mainline inclusion
from mainline-v6.3-rc1
commit e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As explained in commit 36369f46e917 ("block: Do not reread partition
table on exclusively open device"), rereading the partition table on a
device that is exclusively opened by someone else is problematic.

This patch makes sure a partition scan only proceeds if the current
thread has opened the device exclusively, or the device is not opened
exclusively at all; in the latter case, other scanners and exclusive
openers are blocked temporarily until the partition scan is done.
Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c
	block/ioctl.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/blk.h    |  4 ++++
 block/genhd.c  | 36 ++++++++++++++++++++++++++++++++++--
 block/ioctl.c  |  2 +-
 fs/block_dev.c |  4 ++--
 4 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/block/blk.h b/block/blk.h
index 9269bb6b14f8..965e9c507654 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -214,6 +214,10 @@ unsigned int blk_plug_queued_count(struct request_queue *q);
 void blk_account_io_start(struct request *req, bool new_io);
 void blk_account_io_completion(struct request *req, unsigned int bytes);
 void blk_account_io_done(struct request *req, u64 now);
+int bd_prepare_to_claim(struct block_device *bdev,
+			struct block_device *whole, void *holder);
+void bd_abort_claiming(struct block_device *bdev, struct block_device *whole,
+		       void *holder);
 
 /*
  * EH timer and IO completion will both attempt to 'grab' the request, make
diff --git a/block/genhd.c b/block/genhd.c
index 1f981753d4e7..bf095fb5c41a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -651,12 +651,33 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	if (!bdev)
 		return -ENOMEM;
 
-	bdev->bd_invalidated = 1;
+	/*
+	 * If the device is opened exclusively by current thread already, it's
+	 * safe to scan partitons, otherwise, use bd_prepare_to_claim() to
+	 * synchronize with other exclusive openers and other partition
+	 * scanners.
+	 */
+	if (!(mode & FMODE_EXCL)) {
+		ret = bd_prepare_to_claim(bdev, bdev, disk_scan_partitions);
+		if (ret) {
+			bdput(bdev);
+			return ret;
+		}
 
-	ret = blkdev_get(bdev, mode, NULL);
+		/* Ping the bdev until bd_abort_claiming() */
+		bdgrab(bdev);
+	}
+
+	bdev->bd_invalidated = 1;
+	ret = blkdev_get(bdev, mode & ~FMODE_EXCL, NULL);
 	if (!ret)
 		blkdev_put(bdev, mode);
 
+	if (!(mode & FMODE_EXCL)) {
+		bd_abort_claiming(bdev, bdev, disk_scan_partitions);
+		bdput(bdev);
+	}
+
 	return ret;
 }
 
@@ -694,6 +715,7 @@ static void disk_init_partition(struct gendisk *disk)
 static void __device_add_disk(struct device *parent, struct gendisk *disk,
 			      bool register_queue)
 {
+	struct block_device *bdev = NULL;
 	dev_t devt;
 	int retval;
 
@@ -746,12 +768,22 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 	disk_add_events(disk);
 	blk_integrity_add(disk);
 
+	/* Make sure the first partition scan will be proceed */
+	if (get_capacity(disk) && disk_part_scan_enabled(disk)) {
+		bdev = bdget_disk(disk, 0);
+		if (bdev)
+			bdev->bd_invalidated = 1;
+	}
+
 	/*
 	 * Set the flag at last, so that block devcie can't be opened
 	 * before it's registration is done.
 	 */
 	disk->flags |= GENHD_FL_UP;
 	disk_init_partition(disk);
+
+	if (bdev)
+		bdput(bdev);
 }
 
 void device_add_disk(struct device *parent, struct gendisk *disk)
diff --git a/block/ioctl.c b/block/ioctl.c
index ddc6d340e876..911887eefc29 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -603,7 +603,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 			return -EACCES;
 		if (bdev != bdev->bd_contains)
 			return -EINVAL;
-		return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL);
+		return disk_scan_partitions(bdev->bd_disk, mode);
 	case BLKGETSIZE:
 		size = i_size_read(bdev->bd_inode);
 		if ((size >> 9) > ~0UL)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index b4bb16d79d78..7fa66b5bf886 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1072,8 +1072,8 @@ static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
  * RETURNS:
  * 0 if @bdev can be claimed, -EBUSY otherwise.
  */
-static int bd_prepare_to_claim(struct block_device *bdev,
-			       struct block_device *whole, void *holder)
+int bd_prepare_to_claim(struct block_device *bdev,
+			struct block_device *whole, void *holder)
 {
 retry:
 	spin_lock(&bdev_lock);
From: Yu Kuai <yukuai3@huawei.com>

mainline inclusion
from mainline-v6.3-rc2
commit 428913bce1e67ccb4dae317fd0332545bf8c9233
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
If disk_scan_partitions() is called with 'FMODE_EXCL',
blkdev_get_by_dev() will be called without 'FMODE_EXCL'; however, the
subsequent blkdev_put() is still called with 'FMODE_EXCL', which causes
the 'bd_holders' counter to leak.
Fix the problem by using the right mode for blkdev_put().
Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@...
Tested-by: Julian Ruess <julianr@linux.ibm.com>
Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 block/genhd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index bf095fb5c41a..ae4c4c4ae5a9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -671,7 +671,7 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	bdev->bd_invalidated = 1;
 	ret = blkdev_get(bdev, mode & ~FMODE_EXCL, NULL);
 	if (!ret)
-		blkdev_put(bdev, mode);
+		blkdev_put(bdev, mode & ~FMODE_EXCL);
 
 	if (!(mode & FMODE_EXCL)) {
 		bd_abort_claiming(bdev, bdev, disk_scan_partitions);