[PATCH openEuler-1.0-LTS 0/7] backport some zram bugfixes
Georgi Djakov (1): mm/page_io: use pr_alert_ratelimited for swap read/write errors Ming Lei (3): zram: fix race between zram_reset_device() and disksize_store() zram: don't fail to remove zram during unloading module zram: avoid race between zram_remove and disksize_store Rokudo Yan (1): zsmalloc: account the number of compacted pages correctly Sergey Senozhatsky (2): zram: do not lookup algorithm in backends table drivers/block/zram/zram_drv.c: do not keep dangling zcomp pointer after zram reset drivers/block/zram/zcomp.c | 11 ++++---- drivers/block/zram/zram_drv.c | 53 ++++++++++++++++++++++++----------- include/linux/zsmalloc.h | 2 +- mm/page_io.c | 12 ++++---- mm/zsmalloc.c | 17 +++++++---- 5 files changed, 59 insertions(+), 36 deletions(-) -- 2.43.0
From: Rokudo Yan <wu-yan@tcl.com> mainline inclusion from mainline-v5.12-rc1 commit 2395928158059b8f9858365fce7713ce7fef62e4 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Conflicts: drivers/block/zram/zram_drv.c [Context conflicts in mm_stat_show().] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zram_drv.c | 2 +- include/linux/zsmalloc.h | 2 +- mm/zsmalloc.c | 17 +++++++++++------ 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index dade3734a8ca..e62a903efb3c 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -873,7 +873,7 @@ static ssize_t mm_stat_show(struct device *dev, zram->limit_pages << PAGE_SHIFT, max_used << PAGE_SHIFT, (u64)atomic64_read(&zram->stats.same_pages), - pool_stats.pages_compacted, + atomic_long_read(&pool_stats.pages_compacted), (u64)atomic64_read(&zram->stats.huge_pages)); up_read(&zram->init_lock); diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 2219cce81ca4..4638dddc040b 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -36,7 +36,7 @@ enum zs_mapmode { struct zs_pool_stats { /* How many pages were migrated (freed) */ - unsigned long pages_compacted; + atomic_long_t pages_compacted; }; struct zs_pool; diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index e0ac8b80a6ac..a16d2ad93387 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -2317,11 +2317,13 @@ static unsigned long zs_can_compact(struct size_class *class) return obj_wasted * class->pages_per_zspage; } -static void __zs_compact(struct zs_pool *pool, struct size_class *class) +static unsigned long __zs_compact(struct zs_pool *pool, + struct size_class *class) { struct zs_compact_control cc; struct zspage *src_zspage; struct zspage *dst_zspage = NULL; + unsigned long pages_freed = 0; spin_lock(&class->lock); while ((src_zspage = isolate_zspage(class, true))) { @@ -2351,7 +2353,7 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class) putback_zspage(class, dst_zspage); if (putback_zspage(class, src_zspage) == ZS_EMPTY) { free_zspage(pool, class, src_zspage); - pool->stats.pages_compacted += class->pages_per_zspage; + pages_freed += class->pages_per_zspage; } spin_unlock(&class->lock); cond_resched(); @@ -2362,12 +2364,15 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class) putback_zspage(class, src_zspage); spin_unlock(&class->lock); + + return pages_freed; } unsigned long zs_compact(struct zs_pool *pool) { int i; struct size_class *class; + unsigned long pages_freed = 0; for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { class = pool->size_class[i]; @@ -2375,10 +2380,11 @@ unsigned long zs_compact(struct zs_pool *pool) continue; if (class->index != i) continue; - __zs_compact(pool, class); + pages_freed += __zs_compact(pool, class); } + atomic_long_add(pages_freed, &pool->stats.pages_compacted); - return pool->stats.pages_compacted; + return pages_freed; } EXPORT_SYMBOL_GPL(zs_compact); @@ -2395,13 +2401,12 @@ static unsigned long zs_shrinker_scan(struct shrinker *shrinker, struct zs_pool *pool = container_of(shrinker, struct zs_pool, shrinker); - pages_freed = pool->stats.pages_compacted; /* * Compact classes and calculate compaction delta. * Can run concurrently with a manually triggered * (by user) compaction. */ - pages_freed = zs_compact(pool) - pages_freed; + pages_freed = zs_compact(pool); return pages_freed ? pages_freed : SHRINK_STOP; } -- 2.43.0
From: Ming Lei <ming.lei@redhat.com> mainline inclusion from mainline-v5.16-rc1 commit 6f1637795f2827d36aec9e0246487f5852e8abf7 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the ->init_lock is released in zram_reset_device(), disksize_store() can come in and try to allocate meta, but zram_reset_device() is freeing free meta, so cause races. Link: https://lore.kernel.org/linux-block/20210927163805.808907-1-mcgrof@kernel.or... Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: drivers/block/zram/zram_drv.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zram_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index e62a903efb3c..3e88c7fd2961 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1486,12 +1486,13 @@ static void zram_reset_device(struct zram *zram) set_capacity(zram->disk, 0); part_stat_set_all(&zram->disk->part0, 0); - up_write(&zram->init_lock); /* I/O operation under all of CPU are done so let's free */ zram_meta_free(zram, disksize); memset(&zram->stats, 0, sizeof(zram->stats)); zcomp_destroy(comp); reset_bdev(zram); + + up_write(&zram->init_lock); } static ssize_t disksize_store(struct device *dev, -- 2.43.0
From: Ming Lei <ming.lei@redhat.com> mainline inclusion from mainline-v5.16-rc1 commit 8c54499a59b026a3dc2afccf6e1b36d5700d2fef category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When the zram module is being unloaded, no one should be using the zram disks. However even while being unloaded the zram module's sysfs attributes might be poked at to re-configure zram devices. This is expected, and kernfs ensures that these operations complete before device_del() completes. But reset_store() may set ->claim which will fail zram_remove(), when this happens, zram_reset_device() is bypassed, and zram->comp can't be destroyed, so the warning of 'Error: Removing state 63 which has instances left.' is triggered during unloading module, together with memory leak and sort of thing. Fixes the issue by not failing zram_remove() if ->claim is set, and we actually need to do nothing in case that zram_reset() is running since del_gendisk() will wait until zram_reset() is done. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: drivers/block/zram/zram_drv.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zram_drv.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 3e88c7fd2961..decb7c5ceacb 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1751,30 +1751,46 @@ static int zram_add(void) static int zram_remove(struct zram *zram) { struct block_device *bdev; + bool claimed; bdev = bdget_disk(zram->disk, 0); if (!bdev) return -ENOMEM; mutex_lock(&bdev->bd_mutex); - if (bdev->bd_openers || zram->claim) { + if (bdev->bd_openers) { mutex_unlock(&bdev->bd_mutex); bdput(bdev); return -EBUSY; } - zram->claim = true; + claimed = zram->claim; + if (!claimed) + zram->claim = true; mutex_unlock(&bdev->bd_mutex); zram_debugfs_unregister(zram); - /* Make sure all the pending I/O are finished */ - fsync_bdev(bdev); - zram_reset_device(zram); + + if (claimed) { + /* + * If we were claimed by reset_store(), del_gendisk() will + * wait until reset_store() is done, so nothing need to do. + */ + ; + } else { + /* Make sure all the pending I/O are finished */ + fsync_bdev(bdev); + zram_reset_device(zram); + } bdput(bdev); pr_info("Removed device: %s\n", zram->disk->disk_name); del_gendisk(zram->disk); + + /* del_gendisk drains pending reset_store */ + WARN_ON_ONCE(claimed && zram->claim); + blk_cleanup_queue(zram->disk->queue); put_disk(zram->disk); kfree(zram); @@ -1852,7 +1868,7 @@ static struct class zram_control_class = { static int zram_remove_cb(int id, void *ptr, void *data) { - zram_remove(ptr); + WARN_ON_ONCE(zram_remove(ptr)); return 0; } -- 2.43.0
From: Ming Lei <ming.lei@redhat.com> mainline inclusion from mainline-v5.16-rc1 commit 5a4b653655d554b5f51a5d2252882708c56a6f7e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- After resetting device in zram_remove(), disksize_store still may come and allocate resources again before deleting gendisk, fix the race by resetting zram after del_gendisk() returns. At that time, disksize_store can't come any more. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211025025426.2815424-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: drivers/block/zram/zram_drv.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zram_drv.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index decb7c5ceacb..742ebc7580c2 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1791,6 +1791,13 @@ static int zram_remove(struct zram *zram) /* del_gendisk drains pending reset_store */ WARN_ON_ONCE(claimed && zram->claim); + /* + * disksize_store() may be called in between zram_reset_device() + * and del_gendisk(), so run the last reset to avoid leaking + * anything allocated with disksize_store() + */ + zram_reset_device(zram); + blk_cleanup_queue(zram->disk->queue); put_disk(zram->disk); kfree(zram); -- 2.43.0
From: Sergey Senozhatsky <senozhatsky@chromium.org> mainline inclusion from mainline-v6.0-rc1 commit dc89997264de565999a1cb55db3f295d3a8e457b category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Always use crypto_has_comp() so that crypto can lookup module, call usermodhelper to load the modules, wait for usermodhelper to finish and so on. Otherwise crypto will do all of these steps under CPU hot-plug lock and this looks like too much stuff to handle under the CPU hot-plug lock. Besides this can end up in a deadlock when usermodhelper triggers a code path that attempts to lock the CPU hot-plug lock, that zram already holds. An example of such deadlock: - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto and waits for modprobe to finish disksize_store zcomp_create __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked zcomp_cpu_up_prepare crypto_alloc_base crypto_alg_mod_lookup call_usermodehelper_exec wait_for_completion_killable do_wait_for_common schedule - path B. async work kthread that brings in scsi device. It wants to register CPUHP states at some point, and it needs the CPU hot-plug lock for that, which is owned by zram. async_run_entry_fn scsi_probe_and_add_lun scsi_mq_alloc_queue blk_mq_init_queue blk_mq_init_allocated_queue blk_mq_realloc_hw_ctxs __cpuhp_state_add_instance __cpuhp_state_add_instance_cpuslocked mutex_lock schedule - path C. modprobe sleeps, waiting for all aync works to finish. load_module do_init_module async_synchronize_full async_synchronize_cookie_domain schedule [senozhatsky@chromium.org: add comment] Link: https://lkml.kernel.org/r/20220624060606.1014474-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20220622023501.517125-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: drivers/block/zram/zcomp.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zcomp.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 4ed0a78fdc09..b7fd2a815ddd 100644 --- a/drivers/block/zram/zcomp.c +++ b/drivers/block/zram/zcomp.c @@ -68,12 +68,6 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp) bool zcomp_available_algorithm(const char *comp) { - int i; - - i = __sysfs_match_string(backends, -1, comp); - if (i >= 0) - return true; - /* * Crypto does not ignore a trailing new line symbol, * so make sure you don't supply a string containing @@ -225,6 +219,11 @@ struct zcomp *zcomp_create(const char *compress) struct zcomp *comp; int error; + /* + * Crypto API will execute /sbin/modprobe if the compression module + * is not loaded yet. We must do it here, otherwise we are about to + * call /sbin/modprobe under CPU hot-plug lock. + */ if (!zcomp_available_algorithm(compress)) return ERR_PTR(-EINVAL); -- 2.43.0
From: Sergey Senozhatsky <senozhatsky@chromium.org> mainline inclusion from mainline-v6.1-rc1 commit 6d2453c3dbc5f70eafc1c866289a90a1fc57ce18 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- We do all reset operations under write lock, so we don't need to save ->disksize and ->comp to stack variables. Another thing is that ->comp is freed during zram reset, but comp pointer is not NULL-ed, so zram keeps the freed pointer value. Link: https://lkml.kernel.org/r/20220824035100.971816-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: drivers/block/zram/zram_drv.c [Context conflicts.] Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> --- drivers/block/zram/zram_drv.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 742ebc7580c2..8d5f1c998b5d 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1467,9 +1467,6 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector, static void zram_reset_device(struct zram *zram) { - struct zcomp *comp; - u64 disksize; - down_write(&zram->init_lock); zram->limit_pages = 0; @@ -1479,17 +1476,15 @@ static void zram_reset_device(struct zram *zram) return; } - comp = zram->comp; - disksize = zram->disksize; - zram->disksize = 0; - set_capacity(zram->disk, 0); part_stat_set_all(&zram->disk->part0, 0); /* I/O operation under all of CPU are done so let's free */ - zram_meta_free(zram, disksize); + zram_meta_free(zram, zram->disksize); + zram->disksize = 0; memset(&zram->stats, 0, sizeof(zram->stats)); - zcomp_destroy(comp); + zcomp_destroy(zram->comp); + zram->comp = NULL; reset_bdev(zram); up_write(&zram->init_lock); -- 2.43.0
From: Georgi Djakov <georgi.djakov@linaro.org> mainline inclusion from mainline-v5.12-rc1 commit 25eaab438dd58092c5f0c62118d933bf8b2fcc76 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9325 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- If there are errors during swap read or write, they can easily fill the log buffer and remove any previous messages that might be useful for debugging, especially on systems that rely for logging only on the kernel ring-buffer. For example, on a systems using zram as swap, we are more likely to see any page allocation errors preceding the swap write errors if the alerts are ratelimited. Link: https://lkml.kernel.org/r/20210201142055.29068-1-georgi.djakov@linaro.org Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- mm/page_io.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 9750f5580a91..81acbcfa754e 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -62,9 +62,9 @@ void end_swap_bio_write(struct bio *bio) * Also clear PG_reclaim to avoid rotate_reclaimable_page() */ set_page_dirty(page); - pr_alert("Write-error on swap-device (%u:%u:%llu)\n", - MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), - (unsigned long long)bio->bi_iter.bi_sector); + pr_alert_ratelimited("Write-error on swap-device (%u:%u:%llu)\n", + MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), + (unsigned long long)bio->bi_iter.bi_sector); ClearPageReclaim(page); } end_page_writeback(page); @@ -79,9 +79,9 @@ static void end_swap_bio_read(struct bio *bio) if (bio->bi_status) { SetPageError(page); ClearPageUptodate(page); - pr_alert("Read-error on swap-device (%u:%u:%llu)\n", - MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), - (unsigned long long)bio->bi_iter.bi_sector); + pr_alert_ratelimited("Read-error on swap-device (%u:%u:%llu)\n", + MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), + (unsigned long long)bio->bi_iter.bi_sector); goto out; } -- 2.43.0
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/23514 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/3IB... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/23514 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/3IB...
participants (2)
-
Jinjiang Tu -
patchwork bot