Christoph Hellwig (2): block: serialize all debugfs operations using q->debugfs_mutex block: remove per-disk debugfs files in blk_unregister_queue
Greg Kroah-Hartman (2): blk-mq: no need to check return value of debugfs_create functions blk-mq: fix up placement of debugfs directory of queue files
Jan Kara (1): blktrace: Avoid sparse warnings when assigning q->blk_trace
Luis Chamberlain (3): blktrace: break out of blktrace setup on concurrent calls blktrace: annotate required lock on do_blk_trace_setup() block: create the request_queue debugfs_dir on registration
Ming Lei (1): blk-mq: don't create hctx debugfs dir until q->debugfs_dir is created
Saravanan D (1): blk-mq: Fix spurious debugfs directory creation during initialization
Yu Kuai (3): block: shutdown blktrace in blk_release_queue() block: protect blk_mq_debugfs_register/unregister_hctx() with 'debugfs_mutex' block: fix kabi broken in struct request_queue
block/blk-core.c | 8 +- block/blk-io-hierarchy/debugfs.c | 6 ++ block/blk-io-hierarchy/stats.c | 8 ++ block/blk-mq-debugfs.c | 158 ++++++++++++------------------- block/blk-mq-debugfs.h | 36 +++---- block/blk-mq-sched.c | 11 +++ block/blk-mq.c | 2 + block/blk-sysfs.c | 24 +++-- block/blk.h | 2 - include/linux/blkdev.h | 11 ++- kernel/trace/blktrace.c | 72 +++++++------- 11 files changed, 168 insertions(+), 170 deletions(-)
From: Luis Chamberlain mcgrof@kernel.org
stable inclusion from stable-v4.19.131 commit c7b333600aac096d700b857eebed8269f08a1b4b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 1b0b283648163dae2a214ca28ed5a99f62a77319 ]
We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda.
We check for concurrent setup only at the very end of the blktrace setup though.
If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this:
The kernel will show these:
``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! ``
And userspace just sees this error message for the second call:
``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ```
The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we *have* been asked to take teardown the blktrace.
This can easily be reproduced with break-blktrace [0] run_0005.sh test.
Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist.
[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain mcgrof@kernel.org Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yu Kuai yukuai3@huawei.com --- kernel/trace/blktrace.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 49a84ee7ec30..703fc66b4510 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -3,6 +3,9 @@ * Copyright (C) 2006 Jens Axboe axboe@kernel.dk * */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/blktrace_api.h> @@ -526,6 +529,16 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, */ strreplace(buts->name, '/', '_');
+ /* + * bdev can be NULL, as with scsi-generic, this is a helpful as + * we can be. + */ + if (q->blk_trace) { + pr_warn("Concurrent blktraces are not allowed on %s\n", + buts->name); + return -EBUSY; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM;
From: Jan Kara jack@suse.cz
mainline inclusion from mainline-v5.8-rc2 commit c3dbe541ef77754729de5e82be2d6e5d267c6c8c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Mostly for historical reasons, q->blk_trace is assigned through xchg() and cmpxchg() atomic operations. Although this is correct, sparse complains about this because it violates rcu annotations since commit c780e86dd48e ("blktrace: Protect q->blk_trace with RCU") which started to use rcu for accessing q->blk_trace. Furthermore there's no real need for atomic operations anymore since all changes to q->blk_trace happen under q->blk_trace_mutex and since it also makes more sense to check if q->blk_trace is set with the mutex held earlier.
So let's just replace xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and rcu_assign_pointer(). This makes the code more efficient and sparse happy.
Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com --- kernel/trace/blktrace.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 703fc66b4510..2591b245fe76 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -380,7 +380,8 @@ static int __blk_trace_remove(struct request_queue *q) { struct blk_trace *bt;
- bt = xchg(&q->blk_trace, NULL); + bt = rcu_replace_pointer(q->blk_trace, NULL, + lockdep_is_held(&q->blk_trace_mutex)); if (!bt) return -EINVAL;
@@ -533,7 +534,8 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, * bdev can be NULL, as with scsi-generic, this is a helpful as * we can be. */ - if (q->blk_trace) { + if (rcu_dereference_protected(q->blk_trace, + lockdep_is_held(&q->blk_trace_mutex))) { pr_warn("Concurrent blktraces are not allowed on %s\n", buts->name); return -EBUSY; @@ -612,10 +614,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, bt->pid = buts->pid; bt->trace_state = Blktrace_setup;
- ret = -EBUSY; - if (cmpxchg(&q->blk_trace, NULL, bt)) - goto err; - + rcu_assign_pointer(q->blk_trace, bt); get_probe_ref();
ret = 0; @@ -1669,7 +1668,8 @@ static int blk_trace_remove_queue(struct request_queue *q) { struct blk_trace *bt;
- bt = xchg(&q->blk_trace, NULL); + bt = rcu_replace_pointer(q->blk_trace, NULL, + lockdep_is_held(&q->blk_trace_mutex)); if (bt == NULL) return -EINVAL;
@@ -1703,10 +1703,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
blk_trace_setup_lba(bt, bdev);
- ret = -EBUSY; - if (cmpxchg(&q->blk_trace, NULL, bt)) - goto free_bt; - + rcu_assign_pointer(q->blk_trace, bt); get_probe_ref(); return 0;
From: Luis Chamberlain mcgrof@kernel.org
mainline inclusion from mainline-v5.9-rc1 commit a67549c8e568627290234e9fbe833cb9dfd36b55 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Ensure it is clear which lock is required on do_blk_trace_setup().
Suggested-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com --- kernel/trace/blktrace.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 2591b245fe76..ab824d742bfd 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -515,6 +515,8 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, struct dentry *dir = NULL; int ret;
+ lockdep_assert_held(&q->blk_trace_mutex); + if (!buts->buf_size || !buts->buf_nr) return -EINVAL;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
mainline inclusion from mainline-v5.2-rc5 commit 6cfc0081b046ebf50dd38c38e688c8de143614f3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this.
When all of these checks are cleaned up, lots of the functions used in the blk-mq-debugfs code can now return void, as no need to check the return value of them either.
Overall, this ends up cleaning up the code and making it smaller, always a nice win.
Cc: Jens Axboe axboe@kernel.dk Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Jens Axboe axboe@kernel.dk Conflicts: block/blk-mq-debugfs.h block/blk-mq-debugfs.c [Context conflicts] Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-mq-debugfs.c | 122 ++++++++++------------------------------- block/blk-mq-debugfs.h | 31 +++++------ 2 files changed, 43 insertions(+), 110 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 5ee91901d9cc..88c9e47417a4 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -823,38 +823,28 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = { {}, };
-bool debugfs_create_files(struct dentry *parent, void *data, +void debugfs_create_files(struct dentry *parent, void *data, const struct blk_mq_debugfs_attr *attr) { if (IS_ERR_OR_NULL(parent)) - return false; + return;
d_inode(parent)->i_private = data;
- for (; attr->name; attr++) { - if (!debugfs_create_file(attr->name, attr->mode, parent, - (void *)attr, &blk_mq_debugfs_fops)) - return false; - } - return true; + for (; attr->name; attr++) + debugfs_create_file(attr->name, attr->mode, parent, + (void *)attr, &blk_mq_debugfs_fops); }
-int blk_mq_debugfs_register(struct request_queue *q) +void blk_mq_debugfs_register(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; int i;
- if (!blk_debugfs_root) - return -ENOENT; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), blk_debugfs_root); - if (!q->debugfs_dir) - return -ENOMEM;
- if (!debugfs_create_files(q->debugfs_dir, q, - blk_mq_debugfs_queue_attrs)) - goto err; + debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
/* * blk_mq_init_sched() attempted to do this already, but q->debugfs_dir @@ -866,19 +856,13 @@ int blk_mq_debugfs_register(struct request_queue *q)
/* Similarly, blk_mq_init_hctx() couldn't do this previously. */ queue_for_each_hw_ctx(q, hctx, i) { - if (!hctx->debugfs_dir && blk_mq_debugfs_register_hctx(q, hctx)) - goto err; - if (q->elevator && !hctx->sched_debugfs_dir && - blk_mq_debugfs_register_sched_hctx(q, hctx)) - goto err; + if (!hctx->debugfs_dir) + blk_mq_debugfs_register_hctx(q, hctx); + if (q->elevator && !hctx->sched_debugfs_dir) + blk_mq_debugfs_register_sched_hctx(q, hctx); }
blk_mq_debugfs_register_hierarchy_stats(q); - return 0; - -err: - blk_mq_debugfs_unregister(q); - return -ENOMEM; }
void blk_mq_debugfs_unregister(struct request_queue *q) @@ -888,52 +872,32 @@ void blk_mq_debugfs_unregister(struct request_queue *q) q->debugfs_dir = NULL; }
-static int blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, - struct blk_mq_ctx *ctx) +static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, + struct blk_mq_ctx *ctx) { struct dentry *ctx_dir; char name[20];
snprintf(name, sizeof(name), "cpu%u", ctx->cpu); ctx_dir = debugfs_create_dir(name, hctx->debugfs_dir); - if (!ctx_dir) - return -ENOMEM; - - if (!debugfs_create_files(ctx_dir, ctx, blk_mq_debugfs_ctx_attrs)) - return -ENOMEM;
- return 0; + debugfs_create_files(ctx_dir, ctx, blk_mq_debugfs_ctx_attrs); }
-int blk_mq_debugfs_register_hctx(struct request_queue *q, - struct blk_mq_hw_ctx *hctx) +void blk_mq_debugfs_register_hctx(struct request_queue *q, + struct blk_mq_hw_ctx *hctx) { struct blk_mq_ctx *ctx; char name[20]; int i;
- if (!q->debugfs_dir) - return -ENOENT; - snprintf(name, sizeof(name), "hctx%u", hctx->queue_num); hctx->debugfs_dir = debugfs_create_dir(name, q->debugfs_dir); - if (!hctx->debugfs_dir) - return -ENOMEM;
- if (!debugfs_create_files(hctx->debugfs_dir, hctx, - blk_mq_debugfs_hctx_attrs)) - goto err; - - hctx_for_each_ctx(hctx, ctx, i) { - if (blk_mq_debugfs_register_ctx(hctx, ctx)) - goto err; - } + debugfs_create_files(hctx->debugfs_dir, hctx, blk_mq_debugfs_hctx_attrs);
- return 0; - -err: - blk_mq_debugfs_unregister_hctx(hctx); - return -ENOMEM; + hctx_for_each_ctx(hctx, ctx, i) + blk_mq_debugfs_register_ctx(hctx, ctx); }
void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx) @@ -943,17 +907,13 @@ void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx) hctx->debugfs_dir = NULL; }
-int blk_mq_debugfs_register_hctxs(struct request_queue *q) +void blk_mq_debugfs_register_hctxs(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; int i;
- queue_for_each_hw_ctx(q, hctx, i) { - if (blk_mq_debugfs_register_hctx(q, hctx)) - return -ENOMEM; - } - - return 0; + queue_for_each_hw_ctx(q, hctx, i) + blk_mq_debugfs_register_hctx(q, hctx); }
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q) @@ -965,29 +925,16 @@ void blk_mq_debugfs_unregister_hctxs(struct request_queue *q) blk_mq_debugfs_unregister_hctx(hctx); }
-int blk_mq_debugfs_register_sched(struct request_queue *q) +void blk_mq_debugfs_register_sched(struct request_queue *q) { struct elevator_type *e = q->elevator->type;
- if (!q->debugfs_dir) - return -ENOENT; - if (!e->queue_debugfs_attrs) - return 0; + return;
q->sched_debugfs_dir = debugfs_create_dir("sched", q->debugfs_dir); - if (!q->sched_debugfs_dir) - return -ENOMEM; - - if (!debugfs_create_files(q->sched_debugfs_dir, q, - e->queue_debugfs_attrs)) - goto err;
- return 0; - -err: - blk_mq_debugfs_unregister_sched(q); - return -ENOMEM; + debugfs_create_files(q->sched_debugfs_dir, q, e->queue_debugfs_attrs); }
void blk_mq_debugfs_unregister_sched(struct request_queue *q) @@ -996,27 +943,18 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q) q->sched_debugfs_dir = NULL; }
-int blk_mq_debugfs_register_sched_hctx(struct request_queue *q, - struct blk_mq_hw_ctx *hctx) +void blk_mq_debugfs_register_sched_hctx(struct request_queue *q, + struct blk_mq_hw_ctx *hctx) { struct elevator_type *e = q->elevator->type;
- if (!hctx->debugfs_dir) - return -ENOENT; - if (!e->hctx_debugfs_attrs) - return 0; + return;
hctx->sched_debugfs_dir = debugfs_create_dir("sched", hctx->debugfs_dir); - if (!hctx->sched_debugfs_dir) - return -ENOMEM; - - if (!debugfs_create_files(hctx->sched_debugfs_dir, hctx, - e->hctx_debugfs_attrs)) - return -ENOMEM; - - return 0; + debugfs_create_files(hctx->sched_debugfs_dir, hctx, + e->hctx_debugfs_attrs); }
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx) diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h index 70549712b0a2..5cbe1573d47b 100644 --- a/block/blk-mq-debugfs.h +++ b/block/blk-mq-debugfs.h @@ -19,21 +19,21 @@ void debugfs_rq_show(struct seq_file *m, struct request *rq); int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq); int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
-int blk_mq_debugfs_register(struct request_queue *q); +void blk_mq_debugfs_register(struct request_queue *q); void blk_mq_debugfs_unregister(struct request_queue *q); -int blk_mq_debugfs_register_hctx(struct request_queue *q, +void blk_mq_debugfs_register_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx); void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx); -int blk_mq_debugfs_register_hctxs(struct request_queue *q); +void blk_mq_debugfs_register_hctxs(struct request_queue *q); void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
-int blk_mq_debugfs_register_sched(struct request_queue *q); +void blk_mq_debugfs_register_sched(struct request_queue *q); void blk_mq_debugfs_unregister_sched(struct request_queue *q); -int blk_mq_debugfs_register_sched_hctx(struct request_queue *q, +void blk_mq_debugfs_register_sched_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx); void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
-bool debugfs_create_files(struct dentry *parent, void *data, +void debugfs_create_files(struct dentry *parent, void *data, const struct blk_mq_debugfs_attr *attr);
static inline bool blk_mq_debugfs_enabled(struct request_queue *q) @@ -42,47 +42,42 @@ static inline bool blk_mq_debugfs_enabled(struct request_queue *q) }
#else -static inline int blk_mq_debugfs_register(struct request_queue *q) +static inline void blk_mq_debugfs_register(struct request_queue *q) { - return 0; }
static inline void blk_mq_debugfs_unregister(struct request_queue *q) { }
-static inline int blk_mq_debugfs_register_hctx(struct request_queue *q, - struct blk_mq_hw_ctx *hctx) +static inline void blk_mq_debugfs_register_hctx(struct request_queue *q, + struct blk_mq_hw_ctx *hctx) { - return 0; }
static inline void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx) { }
-static inline int blk_mq_debugfs_register_hctxs(struct request_queue *q) +static inline void blk_mq_debugfs_register_hctxs(struct request_queue *q) { - return 0; }
static inline void blk_mq_debugfs_unregister_hctxs(struct request_queue *q) { }
-static inline int blk_mq_debugfs_register_sched(struct request_queue *q) +static inline void blk_mq_debugfs_register_sched(struct request_queue *q) { - return 0; }
static inline void blk_mq_debugfs_unregister_sched(struct request_queue *q) { }
-static inline int blk_mq_debugfs_register_sched_hctx(struct request_queue *q, - struct blk_mq_hw_ctx *hctx) +static inline void blk_mq_debugfs_register_sched_hctx(struct request_queue *q, + struct blk_mq_hw_ctx *hctx) { - return 0; }
static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
mainline inclusion from mainline-v5.2 commit 7e41c3c9b6ceb2da52ba9d2b328d1851f269a48e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
When the blk-mq debugfs file creation logic was "cleaned up" it was cleaned up too much, causing the queue file to not be created in the correct location. Turns out the check for the directory being present is needed as if that has not happened yet, the files should not be created, and the function will be called later on in the initialization code so that the files can be created in the correct location.
Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-mq-debugfs.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 88c9e47417a4..c5cf90a18f6d 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -929,6 +929,13 @@ void blk_mq_debugfs_register_sched(struct request_queue *q) { struct elevator_type *e = q->elevator->type;
+ /* + * If the parent directory has not been created yet, return, we will be + * called again later on and the directory/files will be created then. + */ + if (!q->debugfs_dir) + return; + if (!e->queue_debugfs_attrs) return;
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v6.0-rc1 commit f3ec5d11554778c24ac8915e847223ed71d104fc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
blk_mq_debugfs_register_hctx() can be called by blk_mq_update_nr_hw_queues when gendisk isn't added yet, such as nvme tcp.
Fixes the warning of 'debugfs: Directory 'hctx0' with parent '/' already present!' which can be observed reliably when running blktests nvme/005.
Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Yi Zhang yi.zhang@redhat.com Signed-off-by: Ming Lei ming.lei@redhat.com Tested-by: Yi Zhang yi.zhang@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220711090808.259682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-mq-debugfs.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index c5cf90a18f6d..e14f3e4500ff 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -891,6 +891,9 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q, char name[20]; int i;
+ if (!q->debugfs_dir) + return; + snprintf(name, sizeof(name), "hctx%u", hctx->queue_num); hctx->debugfs_dir = debugfs_create_dir(name, q->debugfs_dir);
From: Luis Chamberlain mcgrof@kernel.org
mainline inclusion from mainline-v5.9-rc1 commit 85e0cbbb8a79537dbc465e9deb449a08b2b092a6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
We were only creating the request_queue debugfs_dir only for make_request block drivers (multiqueue), but never for request-based block drivers. We did this as we were only creating non-blktrace additional debugfs files on that directory for make_request drivers. However, since blktrace *always* creates that directory anyway, we special-case the use of that directory on blktrace. Other than this being an eye-sore, this exposes request-based block drivers to the same debugfs fragile race that used to exist with make_request block drivers where if we start adding files onto that directory we can later run a race with a double removal of dentries on the directory if we don't deal with this carefully on blktrace.
Instead, just simplify things by always creating the request_queue debugfs_dir on request_queue registration. Rename the mutex also to reflect the fact that this is used outside of the blktrace context.
Signed-off-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Conflicts: block/blk-core.c block/blk-sysfs.c block/blk.h include/linux/blkdev.h kernel/trace/blktrace.c [Context conflicts] Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-core.c | 8 +------ block/blk-mq-debugfs.c | 5 ----- block/blk-sysfs.c | 9 ++++++++ block/blk.h | 2 -- include/linux/blkdev.h | 4 ++-- kernel/trace/blktrace.c | 49 +++++++++++++++++++---------------------- 6 files changed, 35 insertions(+), 42 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c index 0c74101424dc..835496adf694 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -45,9 +45,7 @@ #include "blk-rq-qos.h" #include "blk-io-hierarchy/stats.h"
-#ifdef CONFIG_DEBUG_FS struct dentry *blk_debugfs_root; -#endif
EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); @@ -1346,9 +1344,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
kobject_init(&q->kobj, &blk_queue_ktype);
-#ifdef CONFIG_BLK_DEV_IO_TRACE - mutex_init(&q->blk_trace_mutex); -#endif + mutex_init(&q->debugfs_mutex); mutex_init(&q->sysfs_lock); mutex_init(&q_wrapper->sysfs_dir_lock); spin_lock_init(&q->__queue_lock); @@ -4307,9 +4303,7 @@ int __init blk_dev_init(void)
init_blk_queue_async_dispatch();
-#ifdef CONFIG_DEBUG_FS blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif
return 0; } diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index e14f3e4500ff..231b58534d6f 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -841,9 +841,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i;
- q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
/* @@ -867,9 +864,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; }
static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 719687a394ea..e1683d1a3b01 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -11,6 +11,7 @@ #include <linux/blktrace_api.h> #include <linux/blk-mq.h> #include <linux/blk-cgroup.h> +#include <linux/debugfs.h> #include <linux/atomic.h>
#include "blk.h" @@ -895,6 +896,9 @@ static void __blk_release_queue(struct work_struct *work) }
blk_trace_shutdown(q); + mutex_lock(&q->debugfs_mutex); + debugfs_remove_recursive(q->debugfs_dir); + mutex_unlock(&q->debugfs_mutex);
if (q->mq_ops) blk_mq_debugfs_unregister(q); @@ -968,6 +972,11 @@ int blk_register_queue(struct gendisk *disk) goto unlock; }
+ mutex_lock(&q->debugfs_mutex); + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); + mutex_unlock(&q->debugfs_mutex); + if (q->mq_ops) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); diff --git a/block/blk.h b/block/blk.h index 99a57be83765..30146e2099e3 100644 --- a/block/blk.h +++ b/block/blk.h @@ -15,9 +15,7 @@ /* Max future timer expiry for timeouts */ #define BLK_MAX_TIMEOUT (5 * HZ)
-#ifdef CONFIG_DEBUG_FS extern struct dentry *blk_debugfs_root; -#endif
struct blk_flush_queue { unsigned int flush_queue_delayed:1; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 241f59eb5b64..eda670342060 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -641,9 +641,9 @@ struct request_queue { unsigned int sg_timeout; unsigned int sg_reserved_size; int node; + struct mutex debugfs_mutex; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; - struct mutex blk_trace_mutex; #endif /* * for flush operations @@ -689,8 +689,8 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split;
-#ifdef CONFIG_BLK_DEBUG_FS struct dentry *debugfs_dir; +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; #endif
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index ab824d742bfd..897890e8f774 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -381,7 +381,7 @@ static int __blk_trace_remove(struct request_queue *q) struct blk_trace *bt;
bt = rcu_replace_pointer(q->blk_trace, NULL, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); if (!bt) return -EINVAL;
@@ -394,9 +394,9 @@ int blk_trace_remove(struct request_queue *q) { int ret;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex); ret = __blk_trace_remove(q); - mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex);
return ret; } @@ -515,14 +515,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, struct dentry *dir = NULL; int ret;
- lockdep_assert_held(&q->blk_trace_mutex); + lockdep_assert_held(&q->debugfs_mutex);
if (!buts->buf_size || !buts->buf_nr) return -EINVAL;
- if (!blk_debugfs_root) - return -ENOENT; - strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
@@ -537,7 +534,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, * we can be. */ if (rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex))) { + lockdep_is_held(&q->debugfs_mutex))) { pr_warn("Concurrent blktraces are not allowed on %s\n", buts->name); return -EBUSY; @@ -653,9 +650,9 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev, { int ret;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex); ret = __blk_trace_setup(q, name, dev, bdev, arg); - mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex);
return ret; } @@ -700,7 +697,7 @@ static int __blk_trace_startstop(struct request_queue *q, int start) struct blk_trace *bt;
bt = rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); if (bt == NULL) return -EINVAL;
@@ -714,9 +711,9 @@ int blk_trace_startstop(struct request_queue *q, int start) { int ret;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex); ret = __blk_trace_startstop(q, start); - mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex);
return ret; } @@ -745,7 +742,7 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) if (!q) return -ENXIO;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex);
switch (cmd) { case BLKTRACESETUP: @@ -771,7 +768,7 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) break; }
- mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex); return ret; }
@@ -782,12 +779,12 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) **/ void blk_trace_shutdown(struct request_queue *q) { - mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex); if (rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex))) + lockdep_is_held(&q->debugfs_mutex))) __blk_trace_remove(q);
- mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex); }
#ifdef CONFIG_BLK_CGROUP @@ -1671,7 +1668,7 @@ static int blk_trace_remove_queue(struct request_queue *q) struct blk_trace *bt;
bt = rcu_replace_pointer(q->blk_trace, NULL, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); if (bt == NULL) return -EINVAL;
@@ -1848,10 +1845,10 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, if (q == NULL) goto out_bdput;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex);
bt = rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); if (attr == &dev_attr_enable) { ret = sprintf(buf, "%u\n", !!bt); goto out_unlock_bdev; @@ -1869,7 +1866,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, ret = sprintf(buf, "%llu\n", bt->end_lba);
out_unlock_bdev: - mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex); out_bdput: bdput(bdev); out: @@ -1912,10 +1909,10 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, if (q == NULL) goto out_bdput;
- mutex_lock(&q->blk_trace_mutex); + mutex_lock(&q->debugfs_mutex);
bt = rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); if (attr == &dev_attr_enable) { if (!!value == !!bt) { ret = 0; @@ -1932,7 +1929,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, if (bt == NULL) { ret = blk_trace_setup_queue(q, bdev); bt = rcu_dereference_protected(q->blk_trace, - lockdep_is_held(&q->blk_trace_mutex)); + lockdep_is_held(&q->debugfs_mutex)); }
if (ret == 0) { @@ -1947,7 +1944,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, }
out_unlock_bdev: - mutex_unlock(&q->blk_trace_mutex); + mutex_unlock(&q->debugfs_mutex); out_bdput: bdput(bdev); out:
From: Saravanan D saravanand@fb.com
mainline inclusion from mainline-v5.13-rc1 commit 1e91e28e374d0b0b912154c192716374609360d9 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
blk_mq_debugfs_register_sched_hctx() called from device_add_disk()->elevator_init_mq()->blk_mq_init_sched() initialization sequence does not have relevant parent directory setup and thus spuriously attempts "sched" directory creation from root mount of debugfs for every hw queue detected on the block device
dmesg ... debugfs: Directory 'sched' with parent '/' already present! debugfs: Directory 'sched' with parent '/' already present! . . debugfs: Directory 'sched' with parent '/' already present! ...
The parent debugfs directory for hw queues get properly setup device_add_disk()->blk_register_queue()->blk_mq_debugfs_register() ->blk_mq_debugfs_register_hctx() later in the block device initialization sequence.
A simple check for debugfs_dir has been added to thwart premature debugfs directory/file creation attempts.
Signed-off-by: Saravanan D saravanand@fb.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-mq-debugfs.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 231b58534d6f..8bb6045d8f0d 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -953,6 +953,14 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q, { struct elevator_type *e = q->elevator->type;
+ /* + * If the parent debugfs directory has not been created yet, return; + * We will be called again later on with appropriate parent debugfs + * directory from blk_register_queue() + */ + if (!hctx->debugfs_dir) + return; + if (!e->hctx_debugfs_attrs) return;
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.19-rc4 commit 5cf9c91ba927119fc6606b938b1895bb2459d3bc category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Various places like I/O schedulers or the QOS infrastructure try to register debugfs files on demans, which can race with creating and removing the main queue debugfs directory. Use the existing debugfs_mutex to serialize all debugfs operations that rely on q->debugfs_dir or the directories hanging off it.
To make the teardown code a little simpler declare all debugfs dentry pointers and not just the main one uncoditionally in blkdev.h.
Move debugfs_mutex next to the dentries that it protects and document what it is used for.
Signed-off-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Conflicts: block/blk-mq-debugfs.h block/blk-mq-debugfs.c block/blk-mq-sched.c block/blk-rq-qos.c block/blk-rq-qos.h block/blk-sysfs.c block/blk-io-hierarchy/debugfs.c block/blk-io-hierarchy/stats.c include/linux/blkdev.h kernel/trace/blktrace.c [Context conflicts] Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-io-hierarchy/debugfs.c | 6 ++++++ block/blk-io-hierarchy/stats.c | 8 ++++++++ block/blk-mq-debugfs.c | 17 ++++++++++++----- block/blk-mq-debugfs.h | 5 ----- block/blk-mq-sched.c | 11 +++++++++++ block/blk-sysfs.c | 19 +++++++++---------- include/linux/blkdev.h | 7 ++++--- kernel/trace/blktrace.c | 3 --- 8 files changed, 50 insertions(+), 26 deletions(-)
diff --git a/block/blk-io-hierarchy/debugfs.c b/block/blk-io-hierarchy/debugfs.c index 29c17e116773..ba2f4af49d1d 100644 --- a/block/blk-io-hierarchy/debugfs.c +++ b/block/blk-io-hierarchy/debugfs.c @@ -198,6 +198,8 @@ void blk_mq_debugfs_register_hierarchy(struct request_queue *q, struct blk_io_hierarchy_stats *stats = queue_to_wrapper(q)->io_hierarchy_stats;
+ lockdep_assert_held(&q->debugfs_mutex); + if (!blk_mq_hierarchy_registered(q, stage) || !blk_mq_debugfs_enabled(q)) return; @@ -211,6 +213,8 @@ void blk_mq_debugfs_unregister_hierarchy(struct request_queue *q, struct blk_io_hierarchy_stats *stats = queue_to_wrapper(q)->io_hierarchy_stats;
+ lockdep_assert_held(&q->debugfs_mutex); + if (!blk_mq_hierarchy_registered(q, stage) || !blk_mq_debugfs_enabled(q)) return; @@ -223,6 +227,8 @@ void blk_mq_debugfs_create_default_hierarchy_attr(struct request_queue *q) struct blk_io_hierarchy_stats *stats = queue_to_wrapper(q)->io_hierarchy_stats;
+ lockdep_assert_held(&q->debugfs_mutex); + if (!blk_mq_debugfs_enabled(q)) return;
diff --git a/block/blk-io-hierarchy/stats.c b/block/blk-io-hierarchy/stats.c index b9e79b435149..9b6b735fd5bf 100644 --- a/block/blk-io-hierarchy/stats.c +++ b/block/blk-io-hierarchy/stats.c @@ -33,6 +33,8 @@ void blk_mq_debugfs_register_hierarchy_stats(struct request_queue *q) struct blk_io_hierarchy_stats *stats; enum stage_group stage;
+ lockdep_assert_held(&q->debugfs_mutex); + stats = queue_to_wrapper(q)->io_hierarchy_stats; if (!stats || !blk_mq_debugfs_enabled(q)) return; @@ -203,8 +205,10 @@ void blk_mq_register_hierarchy(struct request_queue *q, enum stage_group stage)
blk_mq_freeze_queue(q);
+ mutex_lock(&q->debugfs_mutex); WRITE_ONCE(stats->hstage[stage], hstage); blk_mq_debugfs_register_hierarchy(q, stage); + mutex_unlock(&q->debugfs_mutex);
blk_mq_unfreeze_queue(q); } @@ -220,6 +224,8 @@ void blk_mq_unregister_hierarchy(struct request_queue *q, if (!blk_mq_hierarchy_registered(q, stage)) return;
+ mutex_lock(&q->debugfs_mutex); + blk_mq_debugfs_unregister_hierarchy(q, stage); blk_io_hierarchy_iodump_exit(q, stage);
@@ -230,6 +236,8 @@ void blk_mq_unregister_hierarchy(struct request_queue *q, spin_unlock(&stats->hstage_lock);
kfree(hstage); + + mutex_unlock(&q->debugfs_mutex); } EXPORT_SYMBOL_GPL(blk_mq_unregister_hierarchy);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 8bb6045d8f0d..5347620fe7c7 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -862,11 +862,6 @@ void blk_mq_debugfs_register(struct request_queue *q) blk_mq_debugfs_register_hierarchy_stats(q); }
-void blk_mq_debugfs_unregister(struct request_queue *q) -{ - q->sched_debugfs_dir = NULL; -} - static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx) { @@ -900,6 +895,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx) { + if (!hctx->queue->debugfs_dir) + return; debugfs_remove_recursive(hctx->debugfs_dir); hctx->sched_debugfs_dir = NULL; hctx->debugfs_dir = NULL; @@ -927,6 +924,8 @@ void blk_mq_debugfs_register_sched(struct request_queue *q) { struct elevator_type *e = q->elevator->type;
+ lockdep_assert_held(&q->debugfs_mutex); + /* * If the parent directory has not been created yet, return, we will be * called again later on and the directory/files will be created then. @@ -944,6 +943,8 @@ void blk_mq_debugfs_register_sched(struct request_queue *q)
void blk_mq_debugfs_unregister_sched(struct request_queue *q) { + lockdep_assert_held(&q->debugfs_mutex); + debugfs_remove_recursive(q->sched_debugfs_dir); q->sched_debugfs_dir = NULL; } @@ -953,6 +954,8 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q, { struct elevator_type *e = q->elevator->type;
+ lockdep_assert_held(&q->debugfs_mutex); + /* * If the parent debugfs directory has not been created yet, return; * We will be called again later on with appropriate parent debugfs @@ -972,6 +975,10 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx) { + lockdep_assert_held(&hctx->queue->debugfs_mutex); + + if (!hctx->queue->debugfs_dir) + return; debugfs_remove_recursive(hctx->sched_debugfs_dir); hctx->sched_debugfs_dir = NULL; } diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h index 5cbe1573d47b..4ef149cf5cfb 100644 --- a/block/blk-mq-debugfs.h +++ b/block/blk-mq-debugfs.h @@ -20,7 +20,6 @@ int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq); int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
void blk_mq_debugfs_register(struct request_queue *q); -void blk_mq_debugfs_unregister(struct request_queue *q); void blk_mq_debugfs_register_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx); void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx); @@ -46,10 +45,6 @@ static inline void blk_mq_debugfs_register(struct request_queue *q) { }
-static inline void blk_mq_debugfs_unregister(struct request_queue *q) -{ -} - static inline void blk_mq_debugfs_register_hctx(struct request_queue *q, struct blk_mq_hw_ctx *hctx) { diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 1c8befbe7b69..443d92e8982a 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -609,7 +609,9 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) if (ret) goto err;
+ mutex_lock(&q->debugfs_mutex); blk_mq_debugfs_register_sched(q); + mutex_unlock(&q->debugfs_mutex);
queue_for_each_hw_ctx(q, hctx, i) { if (e->ops.mq.init_hctx) { @@ -621,7 +623,9 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) return ret; } } + mutex_lock(&q->debugfs_mutex); blk_mq_debugfs_register_sched_hctx(q, hctx); + mutex_unlock(&q->debugfs_mutex); }
return 0; @@ -638,13 +642,20 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e) unsigned int i;
queue_for_each_hw_ctx(q, hctx, i) { + mutex_lock(&q->debugfs_mutex); blk_mq_debugfs_unregister_sched_hctx(hctx); + mutex_unlock(&q->debugfs_mutex); + if (e->type->ops.mq.exit_hctx && hctx->sched_data) { e->type->ops.mq.exit_hctx(hctx, i); hctx->sched_data = NULL; } } + + mutex_lock(&q->debugfs_mutex); blk_mq_debugfs_unregister_sched(q); + mutex_unlock(&q->debugfs_mutex); + if (e->type->ops.mq.exit_sched) e->type->ops.mq.exit_sched(e); blk_mq_sched_tags_teardown(q); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index e1683d1a3b01..a2a24757fc4e 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -895,14 +895,13 @@ static void __blk_release_queue(struct work_struct *work) blk_mq_release(q); }
- blk_trace_shutdown(q); mutex_lock(&q->debugfs_mutex); + blk_trace_shutdown(q); debugfs_remove_recursive(q->debugfs_dir); + q->debugfs_dir = NULL; + q->sched_debugfs_dir = NULL; mutex_unlock(&q->debugfs_mutex);
- if (q->mq_ops) - blk_mq_debugfs_unregister(q); - bioset_exit(&q->bio_split);
ida_simple_remove(&blk_queue_ida, q->id); @@ -972,17 +971,17 @@ int blk_register_queue(struct gendisk *disk) goto unlock; }
+ if (q->mq_ops) + __blk_mq_register_dev(dev, q); + mutex_lock(&q->sysfs_lock); + mutex_lock(&q->debugfs_mutex); q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), blk_debugfs_root); - mutex_unlock(&q->debugfs_mutex); - - if (q->mq_ops) { - __blk_mq_register_dev(dev, q); + if (q->mq_ops) blk_mq_debugfs_register(q); - } + mutex_unlock(&q->debugfs_mutex);
- mutex_lock(&q->sysfs_lock); if (q->request_fn || (q->mq_ops && q->elevator)) { ret = elv_register_queue(q, false); if (ret) { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index eda670342060..9948b3248e96 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -641,7 +641,6 @@ struct request_queue { unsigned int sg_timeout; unsigned int sg_reserved_size; int node; - struct mutex debugfs_mutex; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; #endif @@ -690,9 +689,11 @@ struct request_queue { struct bio_set bio_split;
struct dentry *debugfs_dir; -#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; -#endif + /* + * Serializes all debugfs metadata operations using the above dentries. + */ + struct mutex debugfs_mutex;
bool mq_sysfs_init_done;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 897890e8f774..8310695efd3d 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -779,12 +779,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg) **/ void blk_trace_shutdown(struct request_queue *q) { - mutex_lock(&q->debugfs_mutex); if (rcu_dereference_protected(q->blk_trace, lockdep_is_held(&q->debugfs_mutex))) __blk_trace_remove(q); - - mutex_unlock(&q->debugfs_mutex); }
#ifdef CONFIG_BLK_CGROUP
From: Christoph Hellwig hch@lst.de
mainline inclusion from mainline-v5.19-rc4 commit 99d055b4fd4bbb309c6cdb51a0d420669f777944 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
The block debugfs files are created in blk_register_queue, which is called by add_disk and use a naming scheme based on the disk_name. After del_gendisk returns that name can be reused and thus we must not leave these debugfs files around, otherwise the kernel is unhappy and spews messages like:
Directory XXXXX with parent 'block' already present!
and the newly created devices will not have working debugfs files.
Move the unregistration to blk_unregister_queue instead (which matches the sysfs unregistration) to make sure the debugfs life time rules match those of the disk name.
As part of the move also make sure the whole debugfs unregistration is inside a single debugfs_mutex critical section.
Note that this breaks blktests block/002, which checks that the debugfs directory has not been removed while blktests is running, but that particular check should simply be removed from the test case.
Signed-off-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220614074827.458955-4-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Conflicts: block/blk-mq-debugfs.h block/blk-mq-debugfs.c block/blk-rq-qos.c block/blk-sysfs.c [Context conflicts] Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-sysfs.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index a2a24757fc4e..225823ddd810 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -895,13 +895,6 @@ static void __blk_release_queue(struct work_struct *work) blk_mq_release(q); }
- mutex_lock(&q->debugfs_mutex); - blk_trace_shutdown(q); - debugfs_remove_recursive(q->debugfs_dir); - q->debugfs_dir = NULL; - q->sched_debugfs_dir = NULL; - mutex_unlock(&q->debugfs_mutex); - bioset_exit(&q->bio_split);
ida_simple_remove(&blk_queue_ida, q->id); @@ -1076,6 +1069,13 @@ void blk_unregister_queue(struct gendisk *disk) mutex_unlock(&q->sysfs_lock); mutex_unlock(&queue_to_wrapper(q)->sysfs_dir_lock);
+ mutex_lock(&q->debugfs_mutex); + blk_trace_shutdown(q); + debugfs_remove_recursive(q->debugfs_dir); + q->debugfs_dir = NULL; + q->sched_debugfs_dir = NULL; + mutex_unlock(&q->debugfs_mutex); + /* Now that we've deleted all child objects, we can delete the queue. */ kobject_uevent(&q->kobj, KOBJ_REMOVE); kobject_del(&q->kobj);
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
----------------------------------------
Commit 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue") move blk_trace_shutdown() from blk_release_queue() to blk_unregister_queue(). However, blktrace can still be enabled through ioctl after blk_unregister_queue(), and blktrace will be leaked in this case.
Fix the problem by calling blk_trace_shutdown() in blk_release_queue().
Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-sysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 225823ddd810..b23d82fbe736 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -895,6 +895,10 @@ static void __blk_release_queue(struct work_struct *work) blk_mq_release(q); }
+ mutex_lock(&q->debugfs_mutex); + blk_trace_shutdown(q); + mutex_unlock(&q->debugfs_mutex); + bioset_exit(&q->bio_split);
ida_simple_remove(&blk_queue_ida, q->id);
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
----------------------------------------
All operations to create and remove files under 'q->debugfs_dir' should be protected by 'q->debugfs_mutex'.
Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-mq-debugfs.c | 8 ++++++++ block/blk-mq.c | 2 ++ 2 files changed, 10 insertions(+)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 5347620fe7c7..8f99b216ca0c 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -881,6 +881,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q, char name[20]; int i;
+ lockdep_assert_held(&q->debugfs_mutex); + if (!q->debugfs_dir) return;
@@ -895,6 +897,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx) { + lockdep_assert_held(&hctx->queue->debugfs_mutex); + if (!hctx->queue->debugfs_dir) return; debugfs_remove_recursive(hctx->debugfs_dir); @@ -907,8 +911,10 @@ void blk_mq_debugfs_register_hctxs(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i;
+ mutex_lock(&q->debugfs_mutex); queue_for_each_hw_ctx(q, hctx, i) blk_mq_debugfs_register_hctx(q, hctx); + mutex_unlock(&q->debugfs_mutex); }
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q) @@ -916,8 +922,10 @@ void blk_mq_debugfs_unregister_hctxs(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i;
+ mutex_lock(&q->debugfs_mutex); queue_for_each_hw_ctx(q, hctx, i) blk_mq_debugfs_unregister_hctx(hctx); + mutex_unlock(&q->debugfs_mutex); }
void blk_mq_debugfs_register_sched(struct request_queue *q) diff --git a/block/blk-mq.c b/block/blk-mq.c index 8502e7495d58..0392b014ac89 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2428,7 +2428,9 @@ static void blk_mq_exit_hw_queues(struct request_queue *q, queue_for_each_hw_ctx(q, hctx, i) { if (i == nr_queue) break; + mutex_lock(&q->debugfs_mutex); blk_mq_debugfs_unregister_hctx(hctx); + mutex_unlock(&q->debugfs_mutex); blk_mq_exit_hctx(q, set, hctx, i); } }
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
----------------------------------------
The field 'blk_trace_mutex' is moved and renamed.
Signed-off-by: Yu Kuai yukuai3@huawei.com --- include/linux/blkdev.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9948b3248e96..e6d199cdfd4c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -643,6 +643,14 @@ struct request_queue { int node; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; +#endif + /* + * Serializes all debugfs metadata operations using the above dentries. + */ +#ifndef __GENKSYMS__ + struct mutex debugfs_mutex; +#else + struct mutex blk_trace_mutex; #endif /* * for flush operations @@ -690,10 +698,6 @@ struct request_queue {
struct dentry *debugfs_dir; struct dentry *sched_debugfs_dir; - /* - * Serializes all debugfs metadata operations using the above dentries. - */ - struct mutex debugfs_mutex;
bool mq_sysfs_init_done;
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/11645 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/5...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/11645 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/5...