Linux mainline-5.13 uses sbitmap to track the queue depth of SCSI devices. This patch set backports the mainline patches to OLK-5.10 and fixes some KABI issues.
Tests on scsi_debug show that this method can improve IO performance by more than 30%.
Bart Van Assche (2): scsi: sbitmap: Silence a debug kernel warning triggered by sbitmap_put() scsi: core: Fix scsi_device_max_queue_depth()
Kashyap Desai (1): scsi: megaraid_sas: Replace sdev_busy with local counter
Ming Lei (12): scsi: sbitmap: Remove sbitmap_clear_bit_unlock scsi: sbitmap: Maintain allocation round_robin in sbitmap scsi: sbitmap: Add helpers for updating allocation hint scsi: sbitmap: Move allocation hint into sbitmap scsi: sbitmap: Export sbitmap_weight scsi: sbitmap: Add sbitmap_calculate_shift() helper scsi: blk-mq: Add callbacks for storing & retrieving budget token scsi: blk-mq: Return budget token from .get_budget callback scsi: core: Add scsi_device_busy() wrapper scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024) scsi: core: Replace sdev->device_busy with sbitmap scsi: blk-mq: Fix build warning when making htmldocs
Pavel Begunkov (1): sbitmap: optimise sbitmap_deferred_clear()
Sumit Saxena (1): scsi: core: Increase max device queue_depth to 4096
Zheng Qixing (3): sbitmap: fix kabi broken by adding struct sbitmap_extend sbitmap: fix kabi broken in struct blk_mq_ops and struct scsi_cmnd sbitmap: fix kabi broken in struct scsi_device
block/blk-mq-sched.c | 17 +- block/blk-mq.c | 38 ++- block/blk-mq.h | 25 +- block/kyber-iosched.c | 3 +- drivers/message/fusion/mptsas.c | 2 +- drivers/scsi/megaraid/megaraid_sas.h | 2 + drivers/scsi/megaraid/megaraid_sas_fusion.c | 47 +++- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +- drivers/scsi/scsi.c | 13 ++ drivers/scsi/scsi_lib.c | 71 ++++-- drivers/scsi/scsi_priv.h | 3 + drivers/scsi/scsi_scan.c | 45 +++- drivers/scsi/scsi_sysfs.c | 4 +- drivers/scsi/sg.c | 2 +- drivers/vhost/scsi.c | 4 +- include/linux/blk-mq.h | 17 +- include/linux/sbitmap.h | 91 ++++++-- include/scsi/scsi_cmnd.h | 2 +- include/scsi/scsi_device.h | 13 +- lib/sbitmap.c | 243 +++++++++++--------- 20 files changed, 448 insertions(+), 196 deletions(-)
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 4ec591790356f0e5a95f8d278b0cfd04aea2ae52 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
No one uses this helper any more, so kill it.
Link: https://lore.kernel.org/r/20210122023317.687987-2-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Hannes Reinecke hare@suse.de Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/sbitmap.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index a0c8bd1c83ad..71f43f0069dc 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -323,12 +323,6 @@ static inline void sbitmap_deferred_clear_bit(struct sbitmap *sb, unsigned int b set_bit(SB_NR_TO_BIT(sb, bitnr), addr); }
-static inline void sbitmap_clear_bit_unlock(struct sbitmap *sb, - unsigned int bitnr) -{ - clear_bit_unlock(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); -} - static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) { return test_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr));
From: Pavel Begunkov asml.silence@gmail.com
mainline inclusion from mainline-v5.11-rc1 commit b78beea038a3087df63bba7adaacb476a8ca95af category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Because of spinlocks and atomics sbitmap_deferred_clear() have to reload &sb->map[index] on each access even though the map address won't change. Pass in sbitmap_word instead of {sb, index}, so it's cached in a variable. It also improves code generation of sbitmap_find_bit_in_index().
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- lib/sbitmap.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 904697c64961..849ee1ccd13c 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -12,32 +12,32 @@ /* * See if we have deferred clears that we can batch move */ -static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index) +static inline bool sbitmap_deferred_clear(struct sbitmap_word *map) { unsigned long mask, val; bool ret = false; unsigned long flags;
- spin_lock_irqsave(&sb->map[index].swap_lock, flags); + spin_lock_irqsave(&map->swap_lock, flags);
- if (!sb->map[index].cleared) + if (!map->cleared) goto out_unlock;
/* * First get a stable cleared mask, setting the old mask to 0. */ - mask = xchg(&sb->map[index].cleared, 0); + mask = xchg(&map->cleared, 0);
/* * Now clear the masked bits in our free word */ do { - val = sb->map[index].word; - } while (cmpxchg(&sb->map[index].word, val, val & ~mask) != val); + val = map->word; + } while (cmpxchg(&map->word, val, val & ~mask) != val);
ret = true; out_unlock: - spin_unlock_irqrestore(&sb->map[index].swap_lock, flags); + spin_unlock_irqrestore(&map->swap_lock, flags); return ret; }
@@ -92,7 +92,7 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth) unsigned int i;
for (i = 0; i < sb->map_nr; i++) - sbitmap_deferred_clear(sb, i); + sbitmap_deferred_clear(&sb->map[i]);
sb->depth = depth; sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word); @@ -139,15 +139,15 @@ static int __sbitmap_get_word(unsigned long *word, unsigned long depth, static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index, unsigned int alloc_hint, bool round_robin) { + struct sbitmap_word *map = &sb->map[index]; int nr;
do { - nr = __sbitmap_get_word(&sb->map[index].word, - sb->map[index].depth, alloc_hint, + nr = __sbitmap_get_word(&map->word, map->depth, alloc_hint, !round_robin); if (nr != -1) break; - if (!sbitmap_deferred_clear(sb, index)) + if (!sbitmap_deferred_clear(map)) break; } while (1);
@@ -207,7 +207,7 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint, break; }
- if (sbitmap_deferred_clear(sb, index)) + if (sbitmap_deferred_clear(&sb->map[index])) goto again;
/* Jump to next index. */
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit efe1f3a1d5833c0ddd61ee50dbef8908f65a0a5e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Currently the allocation round_robin info is maintained by sbitmap_queue.
However, bit allocation really belongs to sbitmap. Move it there.
Link: https://lore.kernel.org/r/20210122023317.687987-3-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Hannes Reinecke hare@suse.de Cc: virtualization@lists.linux-foundation.org Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- block/blk-mq.c | 2 +- block/kyber-iosched.c | 3 ++- drivers/vhost/scsi.c | 4 ++-- include/linux/sbitmap.h | 20 ++++++++++---------- lib/sbitmap.c | 28 ++++++++++++++-------------- 5 files changed, 29 insertions(+), 28 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index 376f1661645c..e20fa54b9de9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2967,7 +2967,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, goto free_cpumask;
if (sbitmap_init_node(&hctx->ctx_map, nr_cpu_ids, ilog2(8), - gfp, node)) + gfp, node, false)) goto free_ctxs; hctx->nr_ctx = 0;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index 3036f087ec7b..7a807225399f 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -476,7 +476,8 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
for (i = 0; i < KYBER_NUM_DOMAINS; i++) { if (sbitmap_init_node(&khd->kcq_map[i], hctx->nr_ctx, - ilog2(8), GFP_KERNEL, hctx->numa_node)) { + ilog2(8), GFP_KERNEL, hctx->numa_node, + false)) { while (--i >= 0) sbitmap_free(&khd->kcq_map[i]); goto err_kcqs; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 4ce9f00ae10e..ab230f6f79e8 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -614,7 +614,7 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, return ERR_PTR(-EIO); }
- tag = sbitmap_get(&svq->scsi_tags, 0, false); + tag = sbitmap_get(&svq->scsi_tags, 0); if (tag < 0) { pr_err("Unable to obtain tag for vhost_scsi_cmd\n"); return ERR_PTR(-ENOMEM); @@ -1512,7 +1512,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds) return 0;
if (sbitmap_init_node(&svq->scsi_tags, max_cmds, -1, GFP_KERNEL, - NUMA_NO_NODE)) + NUMA_NO_NODE, false)) return -ENOMEM; svq->max_cmds = max_cmds;
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 71f43f0069dc..f39bfece1c44 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -62,6 +62,11 @@ struct sbitmap { */ unsigned int map_nr;
+ /** + * @round_robin: Allocate bits in strict round-robin order. + */ + bool round_robin; + /** * @map: Allocated bitmap. */ @@ -132,11 +137,6 @@ struct sbitmap_queue { */ atomic_t ws_active;
- /** - * @round_robin: Allocate bits in strict round-robin order. - */ - bool round_robin; - /** * @min_shallow_depth: The minimum shallow depth which may be passed to * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). @@ -152,11 +152,14 @@ struct sbitmap_queue { * given, a good default is chosen. * @flags: Allocation flags. * @node: Memory node to allocate on. + * @round_robin: If true, be stricter about allocation order; always allocate + * starting from the last allocated bit. This is less efficient + * than the default behavior (false). * * Return: Zero on success or negative errno on failure. */ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, - gfp_t flags, int node); + gfp_t flags, int node, bool round_robin);
/** * sbitmap_free() - Free memory used by a &struct sbitmap. @@ -182,15 +185,12 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth); * sbitmap_get() - Try to allocate a free bit from a &struct sbitmap. * @sb: Bitmap to allocate from. * @alloc_hint: Hint for where to start searching for a free bit. - * @round_robin: If true, be stricter about allocation order; always allocate - * starting from the last allocated bit. This is less efficient - * than the default behavior (false). * * This operation provides acquire barrier semantics if it succeeds. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ -int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin); +int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint);
/** * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap, diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 849ee1ccd13c..2d1e545ed97f 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -42,7 +42,7 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map) }
int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, - gfp_t flags, int node) + gfp_t flags, int node, bool round_robin) { unsigned int bits_per_word; unsigned int i; @@ -67,6 +67,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, sb->shift = shift; sb->depth = depth; sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word); + sb->round_robin = round_robin;
if (depth == 0) { sb->map = NULL; @@ -137,14 +138,14 @@ static int __sbitmap_get_word(unsigned long *word, unsigned long depth, }
static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index, - unsigned int alloc_hint, bool round_robin) + unsigned int alloc_hint) { struct sbitmap_word *map = &sb->map[index]; int nr;
do { nr = __sbitmap_get_word(&map->word, map->depth, alloc_hint, - !round_robin); + !sb->round_robin); if (nr != -1) break; if (!sbitmap_deferred_clear(map)) @@ -154,7 +155,7 @@ static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index, return nr; }
-int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin) +int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint) { unsigned int i, index; int nr = -1; @@ -166,14 +167,13 @@ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin) * alloc_hint to find the right word index. No point in looping * twice in find_next_zero_bit() for that case. */ - if (round_robin) + if (sb->round_robin) alloc_hint = SB_NR_TO_BIT(sb, alloc_hint); else alloc_hint = 0;
for (i = 0; i < sb->map_nr; i++) { - nr = sbitmap_find_bit_in_index(sb, index, alloc_hint, - round_robin); + nr = sbitmap_find_bit_in_index(sb, index, alloc_hint); if (nr != -1) { nr += index << sb->shift; break; @@ -358,7 +358,8 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, int ret; int i;
- ret = sbitmap_init_node(&sbq->sb, depth, shift, flags, node); + ret = sbitmap_init_node(&sbq->sb, depth, shift, flags, node, + round_robin); if (ret) return ret;
@@ -390,7 +391,6 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, atomic_set(&sbq->ws[i].wait_cnt, sbq->wake_batch); }
- sbq->round_robin = round_robin; return 0; } EXPORT_SYMBOL_GPL(sbitmap_queue_init_node); @@ -452,12 +452,12 @@ int __sbitmap_queue_get(struct sbitmap_queue *sbq) hint = depth ? prandom_u32() % depth : 0; this_cpu_write(*sbq->alloc_hint, hint); } - nr = sbitmap_get(&sbq->sb, hint, sbq->round_robin); + nr = sbitmap_get(&sbq->sb, hint);
if (nr == -1) { /* If the map is full, a hint won't do us much good. */ this_cpu_write(*sbq->alloc_hint, 0); - } else if (nr == hint || unlikely(sbq->round_robin)) { + } else if (nr == hint || unlikely(sbq->sb.round_robin)) { /* Only update the hint if we used it. */ hint = nr + 1; if (hint >= depth - 1) @@ -488,7 +488,7 @@ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, if (nr == -1) { /* If the map is full, a hint won't do us much good. */ this_cpu_write(*sbq->alloc_hint, 0); - } else if (nr == hint || unlikely(sbq->round_robin)) { + } else if (nr == hint || unlikely(sbq->sb.round_robin)) { /* Only update the hint if we used it. */ hint = nr + 1; if (hint >= depth - 1) @@ -627,7 +627,7 @@ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, smp_mb__after_atomic(); sbitmap_queue_wake_up(sbq);
- if (likely(!sbq->round_robin && nr < sbq->sb.depth)) + if (likely(!sbq->sb.round_robin && nr < sbq->sb.depth)) *per_cpu_ptr(sbq->alloc_hint, cpu) = nr; } EXPORT_SYMBOL_GPL(sbitmap_queue_clear); @@ -684,7 +684,7 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m) } seq_puts(m, "}\n");
- seq_printf(m, "round_robin=%d\n", sbq->round_robin); + seq_printf(m, "round_robin=%d\n", sbq->sb.round_robin); seq_printf(m, "min_shallow_depth=%u\n", sbq->min_shallow_depth); } EXPORT_SYMBOL_GPL(sbitmap_queue_show);
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit bf2c4282a10a92810ba83e85677a5273d6ca0df5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Add helpers for updating allocation hint so that we can avoid duplicate code.
Prepare for moving allocation hint into sbitmap.
Link: https://lore.kernel.org/r/20210122023317.687987-4-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Hannes Reinecke hare@suse.de Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- lib/sbitmap.c | 93 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 39 deletions(-)
diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 2d1e545ed97f..578986a905b3 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -9,6 +9,55 @@ #include <linux/sbitmap.h> #include <linux/seq_file.h>
+static int init_alloc_hint(struct sbitmap_queue *sbq, gfp_t flags) +{ + unsigned depth = sbq->sb.depth; + + sbq->alloc_hint = alloc_percpu_gfp(unsigned int, flags); + if (!sbq->alloc_hint) + return -ENOMEM; + + if (depth && !sbq->sb.round_robin) { + int i; + + for_each_possible_cpu(i) + *per_cpu_ptr(sbq->alloc_hint, i) = prandom_u32() % depth; + } + + return 0; +} + +static inline unsigned update_alloc_hint_before_get(struct sbitmap_queue *sbq, + unsigned int depth) +{ + unsigned hint; + + hint = this_cpu_read(*sbq->alloc_hint); + if (unlikely(hint >= depth)) { + hint = depth ? prandom_u32() % depth : 0; + this_cpu_write(*sbq->alloc_hint, hint); + } + + return hint; +} + +static inline void update_alloc_hint_after_get(struct sbitmap_queue *sbq, + unsigned int depth, + unsigned int hint, + unsigned int nr) +{ + if (nr == -1) { + /* If the map is full, a hint won't do us much good. */ + this_cpu_write(*sbq->alloc_hint, 0); + } else if (nr == hint || unlikely(sbq->sb.round_robin)) { + /* Only update the hint if we used it. */ + hint = nr + 1; + if (hint >= depth - 1) + hint = 0; + this_cpu_write(*sbq->alloc_hint, hint); + } +} + /* * See if we have deferred clears that we can batch move */ @@ -363,17 +412,11 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, if (ret) return ret;
- sbq->alloc_hint = alloc_percpu_gfp(unsigned int, flags); - if (!sbq->alloc_hint) { + if (init_alloc_hint(sbq, flags) != 0) { sbitmap_free(&sbq->sb); return -ENOMEM; }
- if (depth && !round_robin) { - for_each_possible_cpu(i) - *per_cpu_ptr(sbq->alloc_hint, i) = prandom_u32() % depth; - } - sbq->min_shallow_depth = UINT_MAX; sbq->wake_batch = sbq_calc_wake_batch(sbq, depth); atomic_set(&sbq->wake_index, 0); @@ -446,24 +489,10 @@ int __sbitmap_queue_get(struct sbitmap_queue *sbq) unsigned int hint, depth; int nr;
- hint = this_cpu_read(*sbq->alloc_hint); depth = READ_ONCE(sbq->sb.depth); - if (unlikely(hint >= depth)) { - hint = depth ? prandom_u32() % depth : 0; - this_cpu_write(*sbq->alloc_hint, hint); - } + hint = update_alloc_hint_before_get(sbq, depth); nr = sbitmap_get(&sbq->sb, hint); - - if (nr == -1) { - /* If the map is full, a hint won't do us much good. */ - this_cpu_write(*sbq->alloc_hint, 0); - } else if (nr == hint || unlikely(sbq->sb.round_robin)) { - /* Only update the hint if we used it. */ - hint = nr + 1; - if (hint >= depth - 1) - hint = 0; - this_cpu_write(*sbq->alloc_hint, hint); - } + update_alloc_hint_after_get(sbq, depth, hint, nr);
return nr; } @@ -477,24 +506,10 @@ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);
- hint = this_cpu_read(*sbq->alloc_hint); depth = READ_ONCE(sbq->sb.depth); - if (unlikely(hint >= depth)) { - hint = depth ? prandom_u32() % depth : 0; - this_cpu_write(*sbq->alloc_hint, hint); - } + hint = update_alloc_hint_before_get(sbq, depth); nr = sbitmap_get_shallow(&sbq->sb, hint, shallow_depth); - - if (nr == -1) { - /* If the map is full, a hint won't do us much good. */ - this_cpu_write(*sbq->alloc_hint, 0); - } else if (nr == hint || unlikely(sbq->sb.round_robin)) { - /* Only update the hint if we used it. */ - hint = nr + 1; - if (hint >= depth - 1) - hint = 0; - this_cpu_write(*sbq->alloc_hint, hint); - } + update_alloc_hint_after_get(sbq, depth, hint, nr);
return nr; }
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit c548e62bcf6adc7066ff201e9ecc88e536dd8890 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Allocation hint should have belonged to sbitmap. Also, when sbitmap's depth is high and there is no need to use mulitple wakeup queues, user can benefit from percpu allocation hint too.
Move allocation hint into sbitmap, then SCSI device queue can benefit from allocation hint when converting to plain sbitmap.
Convert vhost/scsi.c to use sbitmap allocation with percpu alloc hint. This is more efficient than the previous approach.
Link: https://lore.kernel.org/r/20210122023317.687987-5-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Mike Christie michael.christie@oracle.com Cc: virtualization@lists.linux-foundation.org Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Conflicts: include/linux/sbitmap.h [The conflict here is due to inconsistencies in the context caused by commit dde4fe5675233 ("kabi: Add kabi reservation for storage module").] Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- block/blk-mq.c | 2 +- block/kyber-iosched.c | 2 +- drivers/vhost/scsi.c | 4 +- include/linux/sbitmap.h | 41 +++++++++------ lib/sbitmap.c | 112 +++++++++++++++++++++++----------------- 5 files changed, 96 insertions(+), 65 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index e20fa54b9de9..412a0023b610 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2967,7 +2967,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, goto free_cpumask;
if (sbitmap_init_node(&hctx->ctx_map, nr_cpu_ids, ilog2(8), - gfp, node, false)) + gfp, node, false, false)) goto free_ctxs; hctx->nr_ctx = 0;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index 7a807225399f..fbf8f7e00241 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -477,7 +477,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx) for (i = 0; i < KYBER_NUM_DOMAINS; i++) { if (sbitmap_init_node(&khd->kcq_map[i], hctx->nr_ctx, ilog2(8), GFP_KERNEL, hctx->numa_node, - false)) { + false, false)) { while (--i >= 0) sbitmap_free(&khd->kcq_map[i]); goto err_kcqs; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index ab230f6f79e8..8531751b28dd 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -614,7 +614,7 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, return ERR_PTR(-EIO); }
- tag = sbitmap_get(&svq->scsi_tags, 0); + tag = sbitmap_get(&svq->scsi_tags); if (tag < 0) { pr_err("Unable to obtain tag for vhost_scsi_cmd\n"); return ERR_PTR(-ENOMEM); @@ -1512,7 +1512,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds) return 0;
if (sbitmap_init_node(&svq->scsi_tags, max_cmds, -1, GFP_KERNEL, - NUMA_NO_NODE, false)) + NUMA_NO_NODE, false, true)) return -ENOMEM; svq->max_cmds = max_cmds;
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index f39bfece1c44..dc69fa86a75e 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -72,6 +72,14 @@ struct sbitmap { */ struct sbitmap_word *map;
+ /* + * @alloc_hint: Cache of last successfully allocated or freed bit. + * + * This is per-cpu, which allows multiple users to stick to different + * cachelines until the map is exhausted. + */ + unsigned int __percpu *alloc_hint; + KABI_RESERVE(1) };
@@ -108,14 +116,6 @@ struct sbitmap_queue { */ struct sbitmap sb;
- /* - * @alloc_hint: Cache of last successfully allocated or freed bit. - * - * This is per-cpu, which allows multiple users to stick to different - * cachelines until the map is exhausted. - */ - unsigned int __percpu *alloc_hint; - /** * @wake_batch: Number of bits which must be freed before we wake up any * waiters. @@ -155,11 +155,13 @@ struct sbitmap_queue { * @round_robin: If true, be stricter about allocation order; always allocate * starting from the last allocated bit. This is less efficient * than the default behavior (false). + * @alloc_hint: If true, apply percpu hint for where to start searching for + * a free bit. * * Return: Zero on success or negative errno on failure. */ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, - gfp_t flags, int node, bool round_robin); + gfp_t flags, int node, bool round_robin, bool alloc_hint);
/** * sbitmap_free() - Free memory used by a &struct sbitmap. @@ -167,6 +169,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, */ static inline void sbitmap_free(struct sbitmap *sb) { + free_percpu(sb->alloc_hint); kfree(sb->map); sb->map = NULL; } @@ -184,19 +187,17 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth); /** * sbitmap_get() - Try to allocate a free bit from a &struct sbitmap. * @sb: Bitmap to allocate from. - * @alloc_hint: Hint for where to start searching for a free bit. * * This operation provides acquire barrier semantics if it succeeds. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ -int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint); +int sbitmap_get(struct sbitmap *sb);
/** * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap, * limiting the depth used from each word. * @sb: Bitmap to allocate from. - * @alloc_hint: Hint for where to start searching for a free bit. * @shallow_depth: The maximum number of bits to allocate from a single word. * * This rather specific operation allows for having multiple users with @@ -208,8 +209,7 @@ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint); * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ -int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint, - unsigned long shallow_depth); +int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth);
/** * sbitmap_any_bit_set() - Check for a set bit in a &struct sbitmap. @@ -323,6 +323,18 @@ static inline void sbitmap_deferred_clear_bit(struct sbitmap *sb, unsigned int b set_bit(SB_NR_TO_BIT(sb, bitnr), addr); }
+/* + * Pair of sbitmap_get, and this one applies both cleared bit and + * allocation hint. + */ +static inline void sbitmap_put(struct sbitmap *sb, unsigned int bitnr) +{ + sbitmap_deferred_clear_bit(sb, bitnr); + + if (likely(sb->alloc_hint && !sb->round_robin && bitnr < sb->depth)) + *this_cpu_ptr(sb->alloc_hint) = bitnr; +} + static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) { return test_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); @@ -371,7 +383,6 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, static inline void sbitmap_queue_free(struct sbitmap_queue *sbq) { kfree(sbq->ws); - free_percpu(sbq->alloc_hint); sbitmap_free(&sbq->sb); }
diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 578986a905b3..56d4921eaf5e 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -9,52 +9,51 @@ #include <linux/sbitmap.h> #include <linux/seq_file.h>
-static int init_alloc_hint(struct sbitmap_queue *sbq, gfp_t flags) +static int init_alloc_hint(struct sbitmap *sb, gfp_t flags) { - unsigned depth = sbq->sb.depth; + unsigned depth = sb->depth;
- sbq->alloc_hint = alloc_percpu_gfp(unsigned int, flags); - if (!sbq->alloc_hint) + sb->alloc_hint = alloc_percpu_gfp(unsigned int, flags); + if (!sb->alloc_hint) return -ENOMEM;
- if (depth && !sbq->sb.round_robin) { + if (depth && !sb->round_robin) { int i;
for_each_possible_cpu(i) - *per_cpu_ptr(sbq->alloc_hint, i) = prandom_u32() % depth; + *per_cpu_ptr(sb->alloc_hint, i) = prandom_u32() % depth; } - return 0; }
-static inline unsigned update_alloc_hint_before_get(struct sbitmap_queue *sbq, +static inline unsigned update_alloc_hint_before_get(struct sbitmap *sb, unsigned int depth) { unsigned hint;
- hint = this_cpu_read(*sbq->alloc_hint); + hint = this_cpu_read(*sb->alloc_hint); if (unlikely(hint >= depth)) { hint = depth ? prandom_u32() % depth : 0; - this_cpu_write(*sbq->alloc_hint, hint); + this_cpu_write(*sb->alloc_hint, hint); }
return hint; }
-static inline void update_alloc_hint_after_get(struct sbitmap_queue *sbq, +static inline void update_alloc_hint_after_get(struct sbitmap *sb, unsigned int depth, unsigned int hint, unsigned int nr) { if (nr == -1) { /* If the map is full, a hint won't do us much good. */ - this_cpu_write(*sbq->alloc_hint, 0); - } else if (nr == hint || unlikely(sbq->sb.round_robin)) { + this_cpu_write(*sb->alloc_hint, 0); + } else if (nr == hint || unlikely(sb->round_robin)) { /* Only update the hint if we used it. */ hint = nr + 1; if (hint >= depth - 1) hint = 0; - this_cpu_write(*sbq->alloc_hint, hint); + this_cpu_write(*sb->alloc_hint, hint); } }
@@ -91,7 +90,8 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map) }
int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, - gfp_t flags, int node, bool round_robin) + gfp_t flags, int node, bool round_robin, + bool alloc_hint) { unsigned int bits_per_word; unsigned int i; @@ -123,9 +123,18 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, return 0; }
+ if (alloc_hint) { + if (init_alloc_hint(sb, flags)) + return -ENOMEM; + } else { + sb->alloc_hint = NULL; + } + sb->map = kcalloc_node(sb->map_nr, sizeof(*sb->map), flags, node); - if (!sb->map) + if (!sb->map) { + free_percpu(sb->alloc_hint); return -ENOMEM; + }
for (i = 0; i < sb->map_nr; i++) { sb->map[i].depth = min(depth, bits_per_word); @@ -204,7 +213,7 @@ static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index, return nr; }
-int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint) +static int __sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint) { unsigned int i, index; int nr = -1; @@ -236,10 +245,27 @@ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint)
return nr; } + +int sbitmap_get(struct sbitmap *sb) +{ + int nr; + unsigned int hint, depth; + + if (WARN_ON_ONCE(unlikely(!sb->alloc_hint))) + return -1; + + depth = READ_ONCE(sb->depth); + hint = update_alloc_hint_before_get(sb, depth); + nr = __sbitmap_get(sb, hint); + update_alloc_hint_after_get(sb, depth, hint, nr); + + return nr; +} EXPORT_SYMBOL_GPL(sbitmap_get);
-int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint, - unsigned long shallow_depth) +static int __sbitmap_get_shallow(struct sbitmap *sb, + unsigned int alloc_hint, + unsigned long shallow_depth) { unsigned int i, index; int nr = -1; @@ -271,6 +297,22 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
return nr; } + +int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth) +{ + int nr; + unsigned int hint, depth; + + if (WARN_ON_ONCE(unlikely(!sb->alloc_hint))) + return -1; + + depth = READ_ONCE(sb->depth); + hint = update_alloc_hint_before_get(sb, depth); + nr = __sbitmap_get_shallow(sb, hint, shallow_depth); + update_alloc_hint_after_get(sb, depth, hint, nr); + + return nr; +} EXPORT_SYMBOL_GPL(sbitmap_get_shallow);
bool sbitmap_any_bit_set(const struct sbitmap *sb) @@ -408,15 +450,10 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, int i;
ret = sbitmap_init_node(&sbq->sb, depth, shift, flags, node, - round_robin); + round_robin, true); if (ret) return ret;
- if (init_alloc_hint(sbq, flags) != 0) { - sbitmap_free(&sbq->sb); - return -ENOMEM; - } - sbq->min_shallow_depth = UINT_MAX; sbq->wake_batch = sbq_calc_wake_batch(sbq, depth); atomic_set(&sbq->wake_index, 0); @@ -424,7 +461,6 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
sbq->ws = kzalloc_node(SBQ_WAIT_QUEUES * sizeof(*sbq->ws), flags, node); if (!sbq->ws) { - free_percpu(sbq->alloc_hint); sbitmap_free(&sbq->sb); return -ENOMEM; } @@ -486,32 +522,16 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_resize);
int __sbitmap_queue_get(struct sbitmap_queue *sbq) { - unsigned int hint, depth; - int nr; - - depth = READ_ONCE(sbq->sb.depth); - hint = update_alloc_hint_before_get(sbq, depth); - nr = sbitmap_get(&sbq->sb, hint); - update_alloc_hint_after_get(sbq, depth, hint, nr); - - return nr; + return sbitmap_get(&sbq->sb); } EXPORT_SYMBOL_GPL(__sbitmap_queue_get);
int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, unsigned int shallow_depth) { - unsigned int hint, depth; - int nr; - WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);
- depth = READ_ONCE(sbq->sb.depth); - hint = update_alloc_hint_before_get(sbq, depth); - nr = sbitmap_get_shallow(&sbq->sb, hint, shallow_depth); - update_alloc_hint_after_get(sbq, depth, hint, nr); - - return nr; + return sbitmap_get_shallow(&sbq->sb, shallow_depth); } EXPORT_SYMBOL_GPL(__sbitmap_queue_get_shallow);
@@ -643,7 +663,7 @@ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, sbitmap_queue_wake_up(sbq);
if (likely(!sbq->sb.round_robin && nr < sbq->sb.depth)) - *per_cpu_ptr(sbq->alloc_hint, cpu) = nr; + *per_cpu_ptr(sbq->sb.alloc_hint, cpu) = nr; } EXPORT_SYMBOL_GPL(sbitmap_queue_clear);
@@ -681,7 +701,7 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m) if (!first) seq_puts(m, ", "); first = false; - seq_printf(m, "%u", *per_cpu_ptr(sbq->alloc_hint, i)); + seq_printf(m, "%u", *per_cpu_ptr(sbq->sb.alloc_hint, i)); } seq_puts(m, "}\n");
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit cbb9950b41dd9dfb7c2be3429ba09f83b8b1ff98 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
SCSI's .device_busy will be converted to sbitmap and sbitmap_weight is needed. Export the helper.
The only existing user of sbitmap_weight() uses it to find out how many bits are set and not cleared. Align sbitmap_weight() meaning with this usage model.
Link: https://lore.kernel.org/r/20210122023317.687987-6-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/sbitmap.h | 10 ++++++++++ lib/sbitmap.c | 11 ++++++----- 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index dc69fa86a75e..1b9447640574 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -349,6 +349,16 @@ static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) */ void sbitmap_show(struct sbitmap *sb, struct seq_file *m);
+ +/** + * sbitmap_weight() - Return how many set and not cleared bits in a &struct + * sbitmap. + * @sb: Bitmap to check. + * + * Return: How many set and not cleared bits set + */ +unsigned int sbitmap_weight(const struct sbitmap *sb); + /** * sbitmap_bitmap_show() - Write a hex dump of a &struct sbitmap to a &struct * seq_file. diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 56d4921eaf5e..c2e8065761fb 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -342,20 +342,21 @@ static unsigned int __sbitmap_weight(const struct sbitmap *sb, bool set) return weight; }
-static unsigned int sbitmap_weight(const struct sbitmap *sb) +static unsigned int sbitmap_cleared(const struct sbitmap *sb) { - return __sbitmap_weight(sb, true); + return __sbitmap_weight(sb, false); }
-static unsigned int sbitmap_cleared(const struct sbitmap *sb) +unsigned int sbitmap_weight(const struct sbitmap *sb) { - return __sbitmap_weight(sb, false); + return __sbitmap_weight(sb, true) - sbitmap_cleared(sb); } +EXPORT_SYMBOL_GPL(sbitmap_weight);
void sbitmap_show(struct sbitmap *sb, struct seq_file *m) { seq_printf(m, "depth=%u\n", sb->depth); - seq_printf(m, "busy=%u\n", sbitmap_weight(sb) - sbitmap_cleared(sb)); + seq_printf(m, "busy=%u\n", sbitmap_weight(sb)); seq_printf(m, "cleared=%u\n", sbitmap_cleared(sb)); seq_printf(m, "bits_per_word=%u\n", 1U << sb->shift); seq_printf(m, "map_nr=%u\n", sb->map_nr);
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 2d13b1ea9f4affdaa7af0e0e4a1358d28f80c54f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Move code for calculating default shift into a public helper which can be used by SCSI.
Link: https://lore.kernel.org/r/20210122023317.687987-7-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/sbitmap.h | 18 ++++++++++++++++++ lib/sbitmap.c | 16 +++------------- 2 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 1b9447640574..9208ccbed720 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -340,6 +340,24 @@ static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) return test_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); }
+static inline int sbitmap_calculate_shift(unsigned int depth) +{ + int shift = ilog2(BITS_PER_LONG); + + /* + * If the bitmap is small, shrink the number of bits per word so + * we spread over a few cachelines, at least. If less than 4 + * bits, just forget about it, it's not going to work optimally + * anyway. + */ + if (depth >= 4) { + while ((4U << shift) > depth) + shift--; + } + + return shift; +} + /** * sbitmap_show() - Dump &struct sbitmap information to a &struct seq_file. * @sb: Bitmap to show. diff --git a/lib/sbitmap.c b/lib/sbitmap.c index c2e8065761fb..62d3b5d5e009 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -96,19 +96,9 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, unsigned int bits_per_word; unsigned int i;
- if (shift < 0) { - shift = ilog2(BITS_PER_LONG); - /* - * If the bitmap is small, shrink the number of bits per word so - * we spread over a few cachelines, at least. If less than 4 - * bits, just forget about it, it's not going to work optimally - * anyway. - */ - if (depth >= 4) { - while ((4U << shift) > depth) - shift--; - } - } + if (shift < 0) + shift = sbitmap_calculate_shift(depth); + bits_per_word = 1U << shift; if (bits_per_word > BITS_PER_LONG) return -EINVAL;
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit d022d18c045fc2ccf92d0f14cf80f98eb0a8e119 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Since SCSI is the only driver which requires dispatch budget move the token from struct request to struct scsi_cmnd.
Link: https://lore.kernel.org/r/20210122023317.687987-8-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Hannes Reinecke hare@suse.de Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi_lib.c | 18 ++++++++++++++++++ include/linux/blk-mq.h | 9 +++++++++ include/scsi/scsi_cmnd.h | 2 ++ 3 files changed, 29 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 2732b0c3f0e2..6bf9b6189515 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1645,6 +1645,20 @@ static bool scsi_mq_get_budget(struct request_queue *q) return false; }
+static void scsi_mq_set_rq_budget_token(struct request *req, int token) +{ + struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); + + cmd->budget_token = token; +} + +static int scsi_mq_get_rq_budget_token(struct request *req) +{ + struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req); + + return cmd->budget_token; +} + static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { @@ -1854,6 +1868,8 @@ static const struct blk_mq_ops scsi_mq_ops_no_commit = { .cleanup_rq = scsi_cleanup_rq, .busy = scsi_mq_lld_busy, .map_queues = scsi_map_queues, + .set_rq_budget_token = scsi_mq_set_rq_budget_token, + .get_rq_budget_token = scsi_mq_get_rq_budget_token, };
@@ -1882,6 +1898,8 @@ static const struct blk_mq_ops scsi_mq_ops = { .cleanup_rq = scsi_cleanup_rq, .busy = scsi_mq_lld_busy, .map_queues = scsi_map_queues, + .set_rq_budget_token = scsi_mq_set_rq_budget_token, + .get_rq_budget_token = scsi_mq_get_rq_budget_token, };
struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 289ec99e376a..6284345f01ca 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -340,6 +340,15 @@ struct blk_mq_ops { */ void (*put_budget)(struct request_queue *);
+ /* + * @set_rq_budget_toekn: store rq's budget token + */ + void (*set_rq_budget_token)(struct request *, int); + /* + * @get_rq_budget_toekn: retrieve rq's budget token + */ + int (*get_rq_budget_token)(struct request *); + /** * @timeout: Called on request timeout. */ diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 89999ebbb291..49e238cd477e 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -76,6 +76,8 @@ struct scsi_cmnd {
int eh_eflags; /* Used by error handlr */
+ int budget_token; + /* * This is set to jiffies as it was when the command was first * allocated. It is used to time how long the command has
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 2a5a24aa83382a88c43d18a901fab66e6ffe1199 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
SCSI uses a global atomic variable to track queue depth for each LUN/request queue.
This doesn't scale well when there are lots of CPU cores and the disk is very fast. It has been observed that IOPS is affected a lot by tracking queue depth via sdev->device_busy in the I/O path.
Return budget token from .get_budget callback. The budget token can be passed to driver so that we can replace the atomic variable with sbitmap_queue and alleviate the scaling problems that way.
Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Conflicts: block/blk-mq.c [The conflict here is due to inconsistencies in the context caused by 65267b13c8b22 ("blk-mq: Fix potential io hung for shared sbitmap per tagset").] block/blk-mq.h [The conflict here is due to inconsistencies in the context caused by badcb143e9e6b ("block: update nsecs[] in part_") and 2ed9d17cc07fc ("block : don't use cmpxchg64() on").] Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- block/blk-mq-sched.c | 17 +++++++++++++---- block/blk-mq.c | 36 +++++++++++++++++++++++++----------- block/blk-mq.h | 25 +++++++++++++++++++++---- drivers/scsi/scsi_lib.c | 16 +++++++++++----- include/linux/blk-mq.h | 4 ++-- 5 files changed, 72 insertions(+), 26 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index c92d25b71a72..0a37d15caae4 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -131,6 +131,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
do { struct request *rq; + int budget_token;
if (e->type->ops.has_work && !e->type->ops.has_work(hctx)) break; @@ -140,12 +141,13 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) break; }
- if (!blk_mq_get_dispatch_budget(q)) + budget_token = blk_mq_get_dispatch_budget(q); + if (budget_token < 0) break;
rq = e->type->ops.dispatch_request(hctx); if (!rq) { - blk_mq_put_dispatch_budget(q); + blk_mq_put_dispatch_budget(q, budget_token); /* * We're releasing without dispatching. Holding the * budget could have blocked any "hctx"s with the @@ -157,6 +159,8 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) break; }
+ blk_mq_set_rq_budget_token(rq, budget_token); + /* * Now this rq owns the budget which has to be released * if this rq won't be queued to driver via .queue_rq() @@ -237,6 +241,8 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) struct request *rq;
do { + int budget_token; + if (!list_empty_careful(&hctx->dispatch)) { ret = -EAGAIN; break; @@ -245,12 +251,13 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) if (!sbitmap_any_bit_set(&hctx->ctx_map)) break;
- if (!blk_mq_get_dispatch_budget(q)) + budget_token = blk_mq_get_dispatch_budget(q); + if (budget_token < 0) break;
rq = blk_mq_dequeue_from_ctx(hctx, ctx); if (!rq) { - blk_mq_put_dispatch_budget(q); + blk_mq_put_dispatch_budget(q, budget_token); /* * We're releasing without dispatching. Holding the * budget could have blocked any "hctx"s with the @@ -262,6 +269,8 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) break; }
+ blk_mq_set_rq_budget_token(rq, budget_token); + /* * Now this rq owns the budget which has to be released * if this rq won't be queued to driver via .queue_rq() diff --git a/block/blk-mq.c b/block/blk-mq.c index 412a0023b610..5a9c02d0199c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1391,10 +1391,15 @@ static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq, bool need_budget) { struct blk_mq_hw_ctx *hctx = rq->mq_hctx; + int budget_token = -1;
- if (need_budget && !blk_mq_get_dispatch_budget(rq->q)) { - blk_mq_put_driver_tag(rq); - return PREP_DISPATCH_NO_BUDGET; + if (need_budget) { + budget_token = blk_mq_get_dispatch_budget(rq->q); + if (budget_token < 0) { + blk_mq_put_driver_tag(rq); + return PREP_DISPATCH_NO_BUDGET; + } + blk_mq_set_rq_budget_token(rq, budget_token); }
if (!blk_mq_get_driver_tag(rq)) { @@ -1411,7 +1416,7 @@ static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq, * together during handling partial dispatch */ if (need_budget) - blk_mq_put_dispatch_budget(rq->q); + blk_mq_put_dispatch_budget(rq->q, budget_token); return PREP_DISPATCH_NO_TAG; } } @@ -1421,12 +1426,16 @@ static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq,
/* release all allocated budgets before calling to blk_mq_dispatch_rq_list */ static void blk_mq_release_budgets(struct request_queue *q, - unsigned int nr_budgets) + struct list_head *list) { - int i; + struct request *rq;
- for (i = 0; i < nr_budgets; i++) - blk_mq_put_dispatch_budget(q); + list_for_each_entry(rq, list, queuelist) { + int budget_token = blk_mq_get_rq_budget_token(rq); + + if (budget_token >= 0) + blk_mq_put_dispatch_budget(q, budget_token); + } }
/* @@ -1529,7 +1538,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list, ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) || blk_mq_is_sbitmap_shared(hctx->flags));
- blk_mq_release_budgets(q, nr_budgets); + if (nr_budgets) + blk_mq_release_budgets(q, list);
spin_lock(&hctx->lock); list_splice_tail_init(list, &hctx->dispatch); @@ -2179,6 +2189,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, { struct request_queue *q = rq->q; bool run_queue = true; + int budget_token;
/* * RCU or SRCU read lock is needed before checking quiesced flag. @@ -2196,11 +2207,14 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, if (q->elevator && !bypass_insert) goto insert;
- if (!blk_mq_get_dispatch_budget(q)) + budget_token = blk_mq_get_dispatch_budget(q); + if (budget_token < 0) goto insert;
+ blk_mq_set_rq_budget_token(rq, budget_token); + if (!blk_mq_get_driver_tag(rq)) { - blk_mq_put_dispatch_budget(q); + blk_mq_put_dispatch_budget(q, budget_token); goto insert; }
diff --git a/block/blk-mq.h b/block/blk-mq.h index 7bb0b82bfbe9..20bceea10081 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -203,17 +203,34 @@ unsigned int blk_mq_in_flight_with_stat(struct request_queue *q, struct hd_struct *part); #endif
-static inline void blk_mq_put_dispatch_budget(struct request_queue *q) +static inline void blk_mq_put_dispatch_budget(struct request_queue *q, + int budget_token) { if (q->mq_ops->put_budget) - q->mq_ops->put_budget(q); + q->mq_ops->put_budget(q, budget_token); }
-static inline bool blk_mq_get_dispatch_budget(struct request_queue *q) +static inline int blk_mq_get_dispatch_budget(struct request_queue *q) { if (q->mq_ops->get_budget) return q->mq_ops->get_budget(q); - return true; + return 0; +} + +static inline void blk_mq_set_rq_budget_token(struct request *rq, int token) +{ + if (token < 0) + return; + + if (rq->q->mq_ops->set_rq_budget_token) + rq->q->mq_ops->set_rq_budget_token(rq, token); +} + +static inline int blk_mq_get_rq_budget_token(struct request *rq) +{ + if (rq->q->mq_ops->get_rq_budget_token) + return rq->q->mq_ops->get_rq_budget_token(rq); + return -1; }
static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6bf9b6189515..ba5780359930 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -334,6 +334,7 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd) atomic_dec(&starget->target_busy);
atomic_dec(&sdev->device_busy); + cmd->budget_token = -1; }
static void scsi_kick_queue(struct request_queue *q) @@ -1145,6 +1146,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd) unsigned long jiffies_at_alloc; int retries, to_clear; bool in_flight; + int budget_token = cmd->budget_token;
if (!blk_rq_is_scsi(rq) && !(flags & SCMD_INITIALIZED)) { flags |= SCMD_INITIALIZED; @@ -1174,6 +1176,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd) cmd->retries = retries; if (in_flight) __set_bit(SCMD_STATE_INFLIGHT, &cmd->state); + cmd->budget_token = budget_token;
}
@@ -1608,19 +1611,19 @@ static void scsi_mq_done(struct scsi_cmnd *cmd) blk_mq_complete_request(cmd->request); }
-static void scsi_mq_put_budget(struct request_queue *q) +static void scsi_mq_put_budget(struct request_queue *q, int budget_token) { struct scsi_device *sdev = q->queuedata;
atomic_dec(&sdev->device_busy); }
-static bool scsi_mq_get_budget(struct request_queue *q) +static int scsi_mq_get_budget(struct request_queue *q) { struct scsi_device *sdev = q->queuedata;
if (scsi_dev_queue_ready(q, sdev)) - return true; + return 0;
atomic_inc(&sdev->restarts);
@@ -1642,7 +1645,7 @@ static bool scsi_mq_get_budget(struct request_queue *q) if (unlikely(atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))) blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY); - return false; + return -1; }
static void scsi_mq_set_rq_budget_token(struct request *req, int token) @@ -1670,6 +1673,8 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, blk_status_t ret; int reason;
+ WARN_ON_ONCE(cmd->budget_token < 0); + /* * If the device is not in running state we will reject some or all * commands. @@ -1721,7 +1726,8 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, if (scsi_target(sdev)->can_queue > 0) atomic_dec(&scsi_target(sdev)->target_busy); out_put_budget: - scsi_mq_put_budget(q); + scsi_mq_put_budget(q, cmd->budget_token); + cmd->budget_token = -1; switch (ret) { case BLK_STS_OK: break; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 6284345f01ca..f25fb564c24f 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -333,12 +333,12 @@ struct blk_mq_ops { * reserved budget. Also we have to handle failure case * of .get_budget for avoiding I/O deadlock. */ - bool (*get_budget)(struct request_queue *); + int (*get_budget)(struct request_queue *);
/** * @put_budget: Release the reserved budget. */ - void (*put_budget)(struct request_queue *); + void (*put_budget)(struct request_queue *, int);
/* * @set_rq_budget_toekn: store rq's budget token
From: Kashyap Desai kashyap.desai@broadcom.com
mainline inclusion from mainline-v5.13-rc1 commit 6cb9b15238a389a8892a6ed08f5c68a0ac45d720 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Use local tracking of per-sdev outstanding command since sdev_busy in SCSI mid layer is improved for performance reason using sbitmap (earlier it was atomic variable).
Link: https://lore.kernel.org/r/20210122023317.687987-11-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Kashyap Desai kashyap.desai@broadcom.com Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/megaraid/megaraid_sas.h | 2 + drivers/scsi/megaraid/megaraid_sas_fusion.c | 47 +++++++++++++++++---- 2 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index f78cb87e46fa..40767eface53 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -2021,10 +2021,12 @@ union megasas_frame { * struct MR_PRIV_DEVICE - sdev private hostdata * @is_tm_capable: firmware managed tm_capable flag * @tm_busy: TM request is in progress + * @sdev_priv_busy: pending command per sdev */ struct MR_PRIV_DEVICE { bool is_tm_capable; bool tm_busy; + atomic_t sdev_priv_busy; atomic_t r1_ldio_hint; u8 interface_type; u8 task_abort_tmo; diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index 85942c145502..badbff720df7 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -220,6 +220,40 @@ megasas_clear_intr_fusion(struct megasas_instance *instance) return 1; }
+static inline void +megasas_sdev_busy_inc(struct megasas_instance *instance, + struct scsi_cmnd *scmd) +{ + if (instance->perf_mode == MR_BALANCED_PERF_MODE) { + struct MR_PRIV_DEVICE *mr_device_priv_data = + scmd->device->hostdata; + atomic_inc(&mr_device_priv_data->sdev_priv_busy); + } +} + +static inline void +megasas_sdev_busy_dec(struct megasas_instance *instance, + struct scsi_cmnd *scmd) +{ + if (instance->perf_mode == MR_BALANCED_PERF_MODE) { + struct MR_PRIV_DEVICE *mr_device_priv_data = + scmd->device->hostdata; + atomic_dec(&mr_device_priv_data->sdev_priv_busy); + } +} + +static inline int +megasas_sdev_busy_read(struct megasas_instance *instance, + struct scsi_cmnd *scmd) +{ + if (instance->perf_mode == MR_BALANCED_PERF_MODE) { + struct MR_PRIV_DEVICE *mr_device_priv_data = + scmd->device->hostdata; + return atomic_read(&mr_device_priv_data->sdev_priv_busy); + } + return 0; +} + /** * megasas_get_cmd_fusion - Get a command from the free pool * @instance: Adapter soft state @@ -357,15 +391,9 @@ megasas_get_msix_index(struct megasas_instance *instance, struct megasas_cmd_fusion *cmd, u8 data_arms) { - int sdev_busy; - - /* TBD - if sml remove device_busy in future, driver - * should track counter in internal structure. - */ - sdev_busy = atomic_read(&scmd->device->device_busy); - if (instance->perf_mode == MR_BALANCED_PERF_MODE && - sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)) { + (megasas_sdev_busy_read(instance, scmd) > + (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))) { cmd->request_desc->SCSIIO.MSIxIndex = mega_mod64((atomic64_add_return(1, &instance->high_iops_outstanding) / MR_HIGH_IOPS_BATCH_COUNT), instance->low_latency_index_start); @@ -3390,6 +3418,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance, * Issue the command to the FW */
+ megasas_sdev_busy_inc(instance, scmd); megasas_fire_cmd_fusion(instance, req_desc);
if (r1_cmd) @@ -3450,6 +3479,7 @@ megasas_complete_r1_command(struct megasas_instance *instance, scmd_local->SCp.ptr = NULL; megasas_return_cmd_fusion(instance, cmd); scsi_dma_unmap(scmd_local); + megasas_sdev_busy_dec(instance, scmd_local); scmd_local->scsi_done(scmd_local); } } @@ -3558,6 +3588,7 @@ complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex, scmd_local->SCp.ptr = NULL; megasas_return_cmd_fusion(instance, cmd_fusion); scsi_dma_unmap(scmd_local); + megasas_sdev_busy_dec(instance, scmd_local); scmd_local->scsi_done(scmd_local); } else /* Optimal VD - R1 FP command completion. */ megasas_complete_r1_command(instance, cmd_fusion);
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 8278807abd338f2246b6ae8057f2ec61a80a5614 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Add scsi_device_busy() helper to prepare drivers for tracking device queue depth via sbitmap_queue.
Link: https://lore.kernel.org/r/20210122023317.687987-12-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/message/fusion/mptsas.c | 2 +- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +- drivers/scsi/scsi_lib.c | 4 ++-- drivers/scsi/scsi_sysfs.c | 2 +- drivers/scsi/sg.c | 2 +- include/scsi/scsi_device.h | 5 +++++ 6 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 18b91ea1a353..25dd9b2e12ab 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -3756,7 +3756,7 @@ mptsas_send_link_status_event(struct fw_event_work *fw_event) printk(MYIOC_s_DEBUG_FMT "SDEV OUTSTANDING CMDS" "%d\n", ioc->name, - atomic_read(&sdev->device_busy))); + scsi_device_busy(sdev))); }
} diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index c4a322c4b83c..e69394c00b39 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3255,7 +3255,7 @@ scsih_dev_reset(struct scsi_cmnd *scmd) MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET, 0, 0, tr_timeout, tr_method); /* Check for busy commands after reset */ - if (r == SUCCESS && atomic_read(&scmd->device->device_busy)) + if (r == SUCCESS && scsi_device_busy(scmd->device)) r = FAILED; out: sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(0x%p)\n", diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ba5780359930..477d7bb344d8 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -390,7 +390,7 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
static inline bool scsi_device_is_busy(struct scsi_device *sdev) { - if (atomic_read(&sdev->device_busy) >= sdev->queue_depth) + if (scsi_device_busy(sdev) >= sdev->queue_depth) return true; if (atomic_read(&sdev->device_blocked) > 0) return true; @@ -1642,7 +1642,7 @@ static int scsi_mq_get_budget(struct request_queue *q) * the .restarts flag, and the request queue will be run for handling * this request, see scsi_end_request(). */ - if (unlikely(atomic_read(&sdev->device_busy) == 0 && + if (unlikely(scsi_device_busy(sdev) == 0 && !scsi_device_blocked(sdev))) blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY); return -1; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index e15f561e756c..b878916b57f1 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -679,7 +679,7 @@ sdev_show_device_busy(struct device *dev, struct device_attribute *attr, char *buf) { struct scsi_device *sdev = to_scsi_device(dev); - return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_busy)); + return snprintf(buf, 20, "%d\n", scsi_device_busy(sdev)); } static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index e1c086ac8a60..ec89119ce5b7 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -2517,7 +2517,7 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v) scsidp->id, scsidp->lun, (int) scsidp->type, 1, (int) scsidp->queue_depth, - (int) atomic_read(&scsidp->device_busy), + (int) scsi_device_busy(scsidp), (int) scsi_device_online(scsidp)); } read_unlock_irqrestore(&sg_index_lock, iflags); diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 8b9d1aac62d1..f9795d0a7a34 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -606,6 +606,11 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev) return 0; }
+static inline int scsi_device_busy(struct scsi_device *sdev) +{ + return atomic_read(&sdev->device_busy); +} + #define MODULE_ALIAS_SCSI_DEVICE(type) \ MODULE_ALIAS("scsi:t-" __stringify(type) "*") #define SCSI_DEVICE_MODALIAS_FMT "scsi:t-0x%02x"
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit ca44532139514f5fb0a5a081cd8576e4abe54e65 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Limit SCSI device's queue depth to max(host->can_queue, 1024) in scsi_change_queue_depth(). 1024 is big enough for saturating current fast SCSI LUN(SSD or RAID volume on multiple SSDs). Also single hardware queue depth is usually enough for saturating single LUN because per-core performance is often considered in storage design.
This patch is needed for replacing sdev->device_busy with sbitmap which has to be pre-allocated with reasonable max depth.
Link: https://lore.kernel.org/r/20210122023317.687987-13-ming.lei@redhat.com Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 57b0dc08afeb..7fdc1a215fb9 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -214,6 +214,15 @@ void scsi_finish_command(struct scsi_cmnd *cmd) scsi_io_completion(cmd, good_bytes); }
+ +/* + * 1024 is big enough for saturating the fast scsi LUN now + */ +static int scsi_device_max_queue_depth(struct scsi_device *sdev) +{ + return max_t(int, sdev->host->can_queue, 1024); +} + /** * scsi_change_queue_depth - change a device's queue depth * @sdev: SCSI Device in question @@ -223,6 +232,8 @@ void scsi_finish_command(struct scsi_cmnd *cmd) */ int scsi_change_queue_depth(struct scsi_device *sdev, int depth) { + depth = min_t(int, depth, scsi_device_max_queue_depth(sdev)); + if (depth > 0) { sdev->queue_depth = depth; wmb();
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 020b0f0a31920e5b7e7e120d4560453b67b70733 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
SCSI currently uses an atomic variable to track queue depth for each attached device. The queue depth depends on many factors such as transport type and device implementation. In addition, the SCSI device queue depth is not a static entity but changes over time as a result of congestion management.
While blk-mq currently tracks queue depth for each hctx, it can't easily be changed to accommodate the SCSI per-device requirement.
The current approach of using an atomic variable doesn't scale well when there are lots of CPU cores and the disk is very fast. IOPS can be substantially impacted by the atomic in the hot path.
Replace the atomic variable sdev->device_busy with an sbitmap for tracking the SCSI device queue depth.
It has been observed that IOPS is improved ~30% by this patchset in the following test:
1) test machine(32 logical CPU cores) Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 2 NUMA node(s): 2 Model name: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
2) setup scsi_debug: modprobe scsi_debug virtual_gb=128 max_luns=1 submit_queues=32 delay=0 max_queue=256
3) fio script: fio --rw=randread --size=128G --direct=1 --ioengine=libaio --iodepth=2048 \ --numjobs=32 --bs=4k --group_reporting=1 --group_reporting=1 --runtime=60 \ --loops=10000 --name=job1 --filename=/dev/sdN
[mkp: fix device_busy reference in mpt3sas]
Link: https://lore.kernel.org/r/20210122023317.687987-14-ming.lei@redhat.com Link: https://lore.kernel.org/linux-block/20200119071432.18558-6-ming.lei@redhat.c... Cc: Omar Sandoval osandov@fb.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Cc: Ewan D. Milne emilne@redhat.com Cc: Hannes Reinecke hare@suse.de Tested-by: Sumanesh Samanta sumanesh.samanta@broadcom.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Conflicts: drivers/scsi/mpt3sas/mpt3sas_base.c [The conflict here is due to the lack of introduction of commit 664f0dce2058 ("scsi: mpt3sas: Add support for shared host tagset for CPU hotplug").] Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi.c | 4 +++- drivers/scsi/scsi_lib.c | 35 ++++++++++++++++++----------------- drivers/scsi/scsi_priv.h | 3 +++ drivers/scsi/scsi_scan.c | 23 +++++++++++++++++++++-- drivers/scsi/scsi_sysfs.c | 2 ++ include/scsi/scsi_device.h | 5 +++-- 6 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 7fdc1a215fb9..ceb06d84d981 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -218,7 +218,7 @@ void scsi_finish_command(struct scsi_cmnd *cmd) /* * 1024 is big enough for saturating the fast scsi LUN now */ -static int scsi_device_max_queue_depth(struct scsi_device *sdev) +int scsi_device_max_queue_depth(struct scsi_device *sdev) { return max_t(int, sdev->host->can_queue, 1024); } @@ -242,6 +242,8 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth) if (sdev->request_queue) blk_set_queue_depth(sdev->request_queue, depth);
+ sbitmap_resize(&sdev->budget_map, sdev->queue_depth); + return sdev->queue_depth; } EXPORT_SYMBOL(scsi_change_queue_depth); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 477d7bb344d8..e368f2724c2e 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -333,7 +333,7 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd) if (starget->can_queue > 0) atomic_dec(&starget->target_busy);
- atomic_dec(&sdev->device_busy); + sbitmap_put(&sdev->budget_map, cmd->budget_token); cmd->budget_token = -1; }
@@ -1258,19 +1258,20 @@ scsi_device_state_check(struct scsi_device *sdev, struct request *req) }
/* - * scsi_dev_queue_ready: if we can send requests to sdev, return 1 else - * return 0. - * - * Called with the queue_lock held. + * scsi_dev_queue_ready: if we can send requests to sdev, assign one token + * and return the token else return -1. */ static inline int scsi_dev_queue_ready(struct request_queue *q, struct scsi_device *sdev) { - unsigned int busy; + int token;
- busy = atomic_inc_return(&sdev->device_busy) - 1; + token = sbitmap_get(&sdev->budget_map); if (atomic_read(&sdev->device_blocked)) { - if (busy) + if (token < 0) + goto out; + + if (scsi_device_busy(sdev) > 1) goto out_dec;
/* @@ -1282,13 +1283,12 @@ static inline int scsi_dev_queue_ready(struct request_queue *q, "unblocking device at zero depth\n")); }
- if (busy >= sdev->queue_depth) - goto out_dec; - - return 1; + return token; out_dec: - atomic_dec(&sdev->device_busy); - return 0; + if (token >= 0) + sbitmap_put(&sdev->budget_map, token); +out: + return -1; }
/* @@ -1615,15 +1615,16 @@ static void scsi_mq_put_budget(struct request_queue *q, int budget_token) { struct scsi_device *sdev = q->queuedata;
- atomic_dec(&sdev->device_busy); + sbitmap_put(&sdev->budget_map, budget_token); }
static int scsi_mq_get_budget(struct request_queue *q) { struct scsi_device *sdev = q->queuedata; + int token = scsi_dev_queue_ready(q, sdev);
- if (scsi_dev_queue_ready(q, sdev)) - return 0; + if (token >= 0) + return token;
atomic_inc(&sdev->restarts);
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index ba39c21196f5..f2004e847822 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -5,6 +5,7 @@ #include <linux/device.h> #include <linux/async.h> #include <scsi/scsi_device.h> +#include <linux/sbitmap.h>
struct request_queue; struct request; @@ -182,6 +183,8 @@ static inline void scsi_dh_add_device(struct scsi_device *sdev) { } static inline void scsi_dh_release_device(struct scsi_device *sdev) { } #endif
+extern int scsi_device_max_queue_depth(struct scsi_device *sdev); + /* * internal scsi timeout functions: for use by mid-layer and transport * classes. diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 1e052c1570c4..89fcb3ab8b9b 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -215,6 +215,7 @@ static void scsi_unlock_floptical(struct scsi_device *sdev, static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, u64 lun, void *hostdata) { + unsigned int depth; struct scsi_device *sdev; int display_failure_msg = 1, ret; struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); @@ -276,8 +277,25 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, WARN_ON_ONCE(!blk_get_queue(sdev->request_queue)); sdev->request_queue->queuedata = sdev;
- scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ? - sdev->host->cmd_per_lun : 1); + depth = sdev->host->cmd_per_lun ?: 1; + + /* + * Use .can_queue as budget map's depth because we have to + * support adjusting queue depth from sysfs. Meantime use + * default device queue depth to figure out sbitmap shift + * since we use this queue depth most of times. + */ + if (sbitmap_init_node(&sdev->budget_map, + scsi_device_max_queue_depth(sdev), + sbitmap_calculate_shift(depth), + GFP_KERNEL, sdev->request_queue->node, + false, true)) { + put_device(&starget->dev); + kfree(sdev); + goto out; + } + + scsi_change_queue_depth(sdev, depth);
scsi_sysfs_device_initialize(sdev);
@@ -980,6 +998,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, scsi_attach_vpd(sdev);
sdev->max_queue_depth = sdev->queue_depth; + WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth); sdev->sdev_bflags = *bflags;
/* diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index b878916b57f1..80ad038c6f7f 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -480,6 +480,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev->request_queue = NULL;
+ sbitmap_free(&sdev->budget_map); + mutex_lock(&sdev->inquiry_mutex); vpd_pg0 = rcu_replace_pointer(sdev->vpd_pg0, vpd_pg0, lockdep_is_held(&sdev->inquiry_mutex)); diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index f9795d0a7a34..929311440781 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -9,6 +9,7 @@ #include <linux/blkdev.h> #include <scsi/scsi.h> #include <linux/atomic.h> +#include <linux/sbitmap.h>
struct device; struct request_queue; @@ -107,7 +108,7 @@ struct scsi_device { struct list_head siblings; /* list of all devices on this host */ struct list_head same_target_siblings; /* just the devices sharing same target id */
- atomic_t device_busy; /* commands actually active on LLDD */ + struct sbitmap budget_map; atomic_t device_blocked; /* Device returned QUEUE_FULL. */
atomic_t restarts; @@ -608,7 +609,7 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev)
static inline int scsi_device_busy(struct scsi_device *sdev) { - return atomic_read(&sdev->device_busy); + return sbitmap_weight(&sdev->budget_map); }
#define MODULE_ALIAS_SCSI_DEVICE(type) \
From: Bart Van Assche bvanassche@acm.org
mainline inclusion from mainline-v5.13-rc1 commit 035e9f471691a16c32b389c8b2f236043a2a50d7 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------
All sbitmap code uses implied preemption protection to update sb->alloc_hint except sbitmap_put(). Using implied preemption protection is safe since the value of sb->alloc_hint only affects performance of sbitmap allocations but not their correctness. Change this_cpu_ptr() in sbitmap_put() into raw_cpu_ptr() to suppress the following kernel warning that appears with preemption debugging enabled (CONFIG_DEBUG_PREEMPT):
BUG: using smp_processor_id() in preemptible [00000000] code: scsi_eh_0/152 caller is debug_smp_processor_id+0x17/0x20 CPU: 1 PID: 152 Comm: scsi_eh_0 Tainted: G W 5.12.0-rc1-dbg+ #6 Call Trace: show_stack+0x52/0x58 dump_stack+0xaf/0xf3 check_preemption_disabled+0xce/0xd0 debug_smp_processor_id+0x17/0x20 scsi_device_unbusy+0x13a/0x1c0 [scsi_mod] scsi_finish_command+0x4d/0x290 [scsi_mod] scsi_eh_flush_done_q+0x1e7/0x280 [scsi_mod] ata_scsi_port_error_handler+0x592/0x750 [libata] ata_scsi_error+0x1a0/0x1f0 [libata] scsi_error_handler+0x19e/0x330 [scsi_mod] kthread+0x222/0x250 ret_from_fork+0x1f/0x30
Link: https://lore.kernel.org/r/20210317032648.9080-1-bvanassche@acm.org Fixes: c548e62bcf6a ("scsi: sbitmap: Move allocation hint into sbitmap") Cc: Hannes Reinecke hare@suse.de Cc: Omar Sandoval osandov@fb.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/sbitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 9208ccbed720..c1db875f77f8 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -332,7 +332,7 @@ static inline void sbitmap_put(struct sbitmap *sb, unsigned int bitnr) sbitmap_deferred_clear_bit(sb, bitnr);
if (likely(sb->alloc_hint && !sb->round_robin && bitnr < sb->depth)) - *this_cpu_ptr(sb->alloc_hint) = bitnr; + *raw_cpu_ptr(sb->alloc_hint) = bitnr; }
static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr)
From: Ming Lei ming.lei@redhat.com
mainline inclusion from mainline-v5.13-rc1 commit 85367040511f8402d7e4054d8c17b053c75e33ff category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
Fixes the following warning when running 'make htmldocs':
include/linux/blk-mq.h:395: warning: Function parameter or member 'set_rq_budget_token' not described in 'blk_mq_ops' include/linux/blk-mq.h:395: warning: Function parameter or member 'get_rq_budget_token' not described in 'blk_mq_ops'
[mkp: added warning messages]
Link: https://lore.kernel.org/r/20210421154526.1954174-1-ming.lei@redhat.com Fixes: d022d18c045f ("scsi: blk-mq: Add callbacks for storing & retrieving budget token") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/blk-mq.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f25fb564c24f..24ea13c4e88d 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -340,12 +340,12 @@ struct blk_mq_ops { */ void (*put_budget)(struct request_queue *, int);
- /* - * @set_rq_budget_toekn: store rq's budget token + /** + * @set_rq_budget_token: store rq's budget token */ void (*set_rq_budget_token)(struct request *, int); - /* - * @get_rq_budget_toekn: retrieve rq's budget token + /** + * @get_rq_budget_token: retrieve rq's budget token */ int (*get_rq_budget_token)(struct request *);
From: Bart Van Assche bvanassche@acm.org
mainline inclusion from mainline-v5.17-rc1 commit 4bc3bffc1a885eb5cb259e4a25146a4c7b1034e3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------
The comment above scsi_device_max_queue_depth() and also the description of commit ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)") contradict the implementation of the function scsi_device_max_queue_depth(). Additionally, the maximum queue depth of a SCSI LUN never exceeds host->can_queue. Fix scsi_device_max_queue_depth() by changing max_t() into min_t().
Link: https://lore.kernel.org/r/20211203231950.193369-2-bvanassche@acm.org Fixes: ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)") Cc: Hannes Reinecke hare@suse.de Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Tested-by: Bean Huo beanhuo@micron.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bean Huo beanhuo@micron.com Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index ceb06d84d981..fd6f31a56a97 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -216,11 +216,11 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
/* - * 1024 is big enough for saturating the fast scsi LUN now + * 1024 is big enough for saturating fast SCSI LUNs. */ int scsi_device_max_queue_depth(struct scsi_device *sdev) { - return max_t(int, sdev->host->can_queue, 1024); + return min_t(int, sdev->host->can_queue, 1024); }
/**
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
--------------------------------
Commit 9a9c2b27c6f2 ("scsi: sbitmap: Maintain allocation round_robin in sbitmap") and commit 72080b3cd02e ("scsi: sbitmap: Move allocation hint into sbitmap") move round_robin and allocation hint into sbitmap. Fix it by adding struct sbitmap_extend.
Fixes: 9a9c2b27c6f2 ("scsi: sbitmap: Maintain allocation round_robin in sbitmap") Fixes: 72080b3cd02e ("scsi: sbitmap: Move allocation hint into sbitmap") Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/sbitmap.h | 42 +++++++++++++++++++-------------- lib/sbitmap.c | 51 ++++++++++++++++++++++++----------------- 2 files changed, 55 insertions(+), 38 deletions(-)
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index c1db875f77f8..00789ea0f77c 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -40,6 +40,21 @@ struct sbitmap_word { spinlock_t swap_lock; } ____cacheline_aligned_in_smp;
+struct sbitmap_extend { + /** + * @alloc_hint: Cache of last successfully allocated or freed bit. + */ + bool round_robin; + + /** + * This is per-cpu, which allows multiple users to stick to different + * cachelines until the map is exhausted. + * + * @round_robin: Allocate bits in strict round-robin order. + */ + unsigned int __percpu *alloc_hint; +}; + /** * struct sbitmap - Scalable bitmap. * @@ -62,25 +77,12 @@ struct sbitmap { */ unsigned int map_nr;
- /** - * @round_robin: Allocate bits in strict round-robin order. - */ - bool round_robin; - /** * @map: Allocated bitmap. */ struct sbitmap_word *map;
- /* - * @alloc_hint: Cache of last successfully allocated or freed bit. - * - * This is per-cpu, which allows multiple users to stick to different - * cachelines until the map is exhausted. - */ - unsigned int __percpu *alloc_hint; - - KABI_RESERVE(1) + KABI_USE(1, struct sbitmap_extend *extend) };
#define SBQ_WAIT_QUEUES 8 @@ -116,6 +118,8 @@ struct sbitmap_queue { */ struct sbitmap sb;
+ KABI_DEPRECATE(unsigned int __percpu *, alloc_hint) + /** * @wake_batch: Number of bits which must be freed before we wake up any * waiters. @@ -137,6 +141,8 @@ struct sbitmap_queue { */ atomic_t ws_active;
+ KABI_DEPRECATE(bool, round_robin) + /** * @min_shallow_depth: The minimum shallow depth which may be passed to * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). @@ -169,7 +175,9 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, */ static inline void sbitmap_free(struct sbitmap *sb) { - free_percpu(sb->alloc_hint); + free_percpu(sb->extend->alloc_hint); + kfree(sb->extend); + sb->extend = NULL; kfree(sb->map); sb->map = NULL; } @@ -331,8 +339,8 @@ static inline void sbitmap_put(struct sbitmap *sb, unsigned int bitnr) { sbitmap_deferred_clear_bit(sb, bitnr);
- if (likely(sb->alloc_hint && !sb->round_robin && bitnr < sb->depth)) - *raw_cpu_ptr(sb->alloc_hint) = bitnr; + if (likely(sb->extend->alloc_hint && !sb->extend->round_robin && bitnr < sb->depth)) + *raw_cpu_ptr(sb->extend->alloc_hint) = bitnr; }
static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) diff --git a/lib/sbitmap.c b/lib/sbitmap.c index 62d3b5d5e009..9fe2aebc13da 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -13,15 +13,15 @@ static int init_alloc_hint(struct sbitmap *sb, gfp_t flags) { unsigned depth = sb->depth;
- sb->alloc_hint = alloc_percpu_gfp(unsigned int, flags); - if (!sb->alloc_hint) + sb->extend->alloc_hint = alloc_percpu_gfp(unsigned int, flags); + if (!sb->extend->alloc_hint) return -ENOMEM;
- if (depth && !sb->round_robin) { + if (depth && !sb->extend->round_robin) { int i;
for_each_possible_cpu(i) - *per_cpu_ptr(sb->alloc_hint, i) = prandom_u32() % depth; + *per_cpu_ptr(sb->extend->alloc_hint, i) = prandom_u32() % depth; } return 0; } @@ -31,10 +31,10 @@ static inline unsigned update_alloc_hint_before_get(struct sbitmap *sb, { unsigned hint;
- hint = this_cpu_read(*sb->alloc_hint); + hint = this_cpu_read(*sb->extend->alloc_hint); if (unlikely(hint >= depth)) { hint = depth ? prandom_u32() % depth : 0; - this_cpu_write(*sb->alloc_hint, hint); + this_cpu_write(*sb->extend->alloc_hint, hint); }
return hint; @@ -47,13 +47,13 @@ static inline void update_alloc_hint_after_get(struct sbitmap *sb, { if (nr == -1) { /* If the map is full, a hint won't do us much good. */ - this_cpu_write(*sb->alloc_hint, 0); - } else if (nr == hint || unlikely(sb->round_robin)) { + this_cpu_write(*sb->extend->alloc_hint, 0); + } else if (nr == hint || unlikely(sb->extend->round_robin)) { /* Only update the hint if we used it. */ hint = nr + 1; if (hint >= depth - 1) hint = 0; - this_cpu_write(*sb->alloc_hint, hint); + this_cpu_write(*sb->extend->alloc_hint, hint); } }
@@ -103,10 +103,14 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, if (bits_per_word > BITS_PER_LONG) return -EINVAL;
+ sb->extend = kzalloc(sizeof(*sb->extend), flags); + if (!sb->extend) + return -ENOMEM; + sb->shift = shift; sb->depth = depth; sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word); - sb->round_robin = round_robin; + sb->extend->round_robin = round_robin;
if (depth == 0) { sb->map = NULL; @@ -114,15 +118,20 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, }
if (alloc_hint) { - if (init_alloc_hint(sb, flags)) + if (init_alloc_hint(sb, flags)) { + kfree(sb->extend); + sb->extend = NULL; return -ENOMEM; + } } else { - sb->alloc_hint = NULL; + sb->extend->alloc_hint = NULL; }
sb->map = kcalloc_node(sb->map_nr, sizeof(*sb->map), flags, node); if (!sb->map) { - free_percpu(sb->alloc_hint); + free_percpu(sb->extend->alloc_hint); + kfree(sb->extend); + sb->extend = NULL; return -ENOMEM; }
@@ -193,7 +202,7 @@ static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index,
do { nr = __sbitmap_get_word(&map->word, map->depth, alloc_hint, - !sb->round_robin); + !sb->extend->round_robin); if (nr != -1) break; if (!sbitmap_deferred_clear(map)) @@ -215,7 +224,7 @@ static int __sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint) * alloc_hint to find the right word index. No point in looping * twice in find_next_zero_bit() for that case. */ - if (sb->round_robin) + if (sb->extend->round_robin) alloc_hint = SB_NR_TO_BIT(sb, alloc_hint); else alloc_hint = 0; @@ -241,7 +250,7 @@ int sbitmap_get(struct sbitmap *sb) int nr; unsigned int hint, depth;
- if (WARN_ON_ONCE(unlikely(!sb->alloc_hint))) + if (WARN_ON_ONCE(unlikely(!sb->extend->alloc_hint))) return -1;
depth = READ_ONCE(sb->depth); @@ -293,7 +302,7 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth) int nr; unsigned int hint, depth;
- if (WARN_ON_ONCE(unlikely(!sb->alloc_hint))) + if (WARN_ON_ONCE(unlikely(!sb->extend->alloc_hint))) return -1;
depth = READ_ONCE(sb->depth); @@ -653,8 +662,8 @@ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, smp_mb__after_atomic(); sbitmap_queue_wake_up(sbq);
- if (likely(!sbq->sb.round_robin && nr < sbq->sb.depth)) - *per_cpu_ptr(sbq->sb.alloc_hint, cpu) = nr; + if (likely(!sbq->sb.extend->round_robin && nr < sbq->sb.depth)) + *per_cpu_ptr(sbq->sb.extend->alloc_hint, cpu) = nr; } EXPORT_SYMBOL_GPL(sbitmap_queue_clear);
@@ -692,7 +701,7 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m) if (!first) seq_puts(m, ", "); first = false; - seq_printf(m, "%u", *per_cpu_ptr(sbq->sb.alloc_hint, i)); + seq_printf(m, "%u", *per_cpu_ptr(sbq->sb.extend->alloc_hint, i)); } seq_puts(m, "}\n");
@@ -710,7 +719,7 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m) } seq_puts(m, "}\n");
- seq_printf(m, "round_robin=%d\n", sbq->sb.round_robin); + seq_printf(m, "round_robin=%d\n", sbq->sb.extend->round_robin); seq_printf(m, "min_shallow_depth=%u\n", sbq->min_shallow_depth); } EXPORT_SYMBOL_GPL(sbitmap_queue_show);
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
--------------------------------
Fix kabi broken in struct blk_mq_ops and struct scsi_cmnd.
Fixes: e18240654b53 ("scsi: blk-mq: Add callbacks for storing & retrieving budget token") Fixes: 12fb43d80453 ("scsi: blk-mq: Return budget token from .get_budget callback") Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- include/linux/blk-mq.h | 26 +++++++++++++------------- include/scsi/scsi_cmnd.h | 4 +--- 2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 24ea13c4e88d..84043f91d9ce 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -333,21 +333,14 @@ struct blk_mq_ops { * reserved budget. Also we have to handle failure case * of .get_budget for avoiding I/O deadlock. */ - int (*get_budget)(struct request_queue *); + KABI_REPLACE(bool (*get_budget)(struct request_queue *), + int (*get_budget)(struct request_queue *))
/** * @put_budget: Release the reserved budget. */ - void (*put_budget)(struct request_queue *, int); - - /** - * @set_rq_budget_token: store rq's budget token - */ - void (*set_rq_budget_token)(struct request *, int); - /** - * @get_rq_budget_token: retrieve rq's budget token - */ - int (*get_rq_budget_token)(struct request *); + KABI_REPLACE(void (*put_budget)(struct request_queue *), + void (*put_budget)(struct request_queue *, int))
/** * @timeout: Called on request timeout. @@ -420,8 +413,15 @@ struct blk_mq_ops { void (*show_rq)(struct seq_file *m, struct request *rq); #endif
- KABI_RESERVE(1) - KABI_RESERVE(2) + /** + * @set_rq_budget_token: store rq's budget token + */ + KABI_USE(1, void (*set_rq_budget_token)(struct request *, int)) + /** + * @get_rq_budget_token: retrieve rq's budget token + */ + KABI_USE(2, int (*get_rq_budget_token)(struct request *)) + KABI_RESERVE(3) KABI_RESERVE(4) KABI_RESERVE(5) diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 49e238cd477e..6e5bbe1e7dfa 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -76,8 +76,6 @@ struct scsi_cmnd {
int eh_eflags; /* Used by error handlr */
- int budget_token; - /* * This is set to jiffies as it was when the command was first * allocated. It is used to time how long the command has @@ -146,7 +144,7 @@ struct scsi_cmnd { unsigned char tag; /* SCSI-II queued command tag */ unsigned int extra_len; /* length of alignment and padding */
- KABI_RESERVE(1) + KABI_USE(1, int budget_token) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4)
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
--------------------------------
Commit cade2a2a1673 ("scsi: core: Replace sdev->device_busy with sbitmap") replace the atomic variable sdev->device_busy with an sbitmap for tracking the SCSI device queue depth. Fix the kabi broken in struct scsi_device.
Fixes: cade2a2a1673 ("scsi: core: Replace sdev->device_busy with sbitmap") Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi.c | 2 +- drivers/scsi/scsi_lib.c | 8 +++---- drivers/scsi/scsi_scan.c | 46 ++++++++++++++++++++++++++++---------- drivers/scsi/scsi_sysfs.c | 2 +- include/scsi/scsi_device.h | 9 +++++--- 5 files changed, 46 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index fd6f31a56a97..77cb26bb4df4 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -242,7 +242,7 @@ int scsi_change_queue_depth(struct scsi_device *sdev, int depth) if (sdev->request_queue) blk_set_queue_depth(sdev->request_queue, depth);
- sbitmap_resize(&sdev->budget_map, sdev->queue_depth); + sbitmap_resize(sdev->budget_map, sdev->queue_depth);
return sdev->queue_depth; } diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index e368f2724c2e..a8c115643af3 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -333,7 +333,7 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd) if (starget->can_queue > 0) atomic_dec(&starget->target_busy);
- sbitmap_put(&sdev->budget_map, cmd->budget_token); + sbitmap_put(sdev->budget_map, cmd->budget_token); cmd->budget_token = -1; }
@@ -1266,7 +1266,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q, { int token;
- token = sbitmap_get(&sdev->budget_map); + token = sbitmap_get(sdev->budget_map); if (atomic_read(&sdev->device_blocked)) { if (token < 0) goto out; @@ -1286,7 +1286,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q, return token; out_dec: if (token >= 0) - sbitmap_put(&sdev->budget_map, token); + sbitmap_put(sdev->budget_map, token); out: return -1; } @@ -1615,7 +1615,7 @@ static void scsi_mq_put_budget(struct request_queue *q, int budget_token) { struct scsi_device *sdev = q->queuedata;
- sbitmap_put(&sdev->budget_map, budget_token); + sbitmap_put(sdev->budget_map, budget_token); }
static int scsi_mq_get_budget(struct request_queue *q) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 89fcb3ab8b9b..1be912544932 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -198,6 +198,38 @@ static void scsi_unlock_floptical(struct scsi_device *sdev, SCSI_TIMEOUT, 3, NULL); }
+static int scsi_alloc_and_init_budget_map(struct scsi_device *sdev, unsigned int depth) +{ + sdev->budget_map = kzalloc(sizeof(*sdev->budget_map), GFP_KERNEL); + if (!sdev->budget_map) + return -ENOMEM; + + /* + * Use .can_queue as budget map's depth because we have to + * support adjusting queue depth from sysfs. Meantime use + * default device queue depth to figure out sbitmap shift + * since we use this queue depth most of times. + */ + if (sbitmap_init_node(sdev->budget_map, + scsi_device_max_queue_depth(sdev), + sbitmap_calculate_shift(depth), + GFP_KERNEL, sdev->request_queue->node, + false, true)) { + kfree(sdev->budget_map); + sdev->budget_map = NULL; + return -ENOMEM; + } + + return 0; +} + +void scsi_free_budget_map(struct scsi_device *sdev) +{ + sbitmap_free(sdev->budget_map); + kfree(sdev->budget_map); + sdev->budget_map = NULL; +} + /** * scsi_alloc_sdev - allocate and setup a scsi_Device * @starget: which target to allocate a &scsi_device for @@ -279,17 +311,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
depth = sdev->host->cmd_per_lun ?: 1;
- /* - * Use .can_queue as budget map's depth because we have to - * support adjusting queue depth from sysfs. Meantime use - * default device queue depth to figure out sbitmap shift - * since we use this queue depth most of times. - */ - if (sbitmap_init_node(&sdev->budget_map, - scsi_device_max_queue_depth(sdev), - sbitmap_calculate_shift(depth), - GFP_KERNEL, sdev->request_queue->node, - false, true)) { + if (scsi_alloc_and_init_budget_map(sdev, depth)) { put_device(&starget->dev); kfree(sdev); goto out; @@ -998,7 +1020,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, scsi_attach_vpd(sdev);
sdev->max_queue_depth = sdev->queue_depth; - WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth); + WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map->depth); sdev->sdev_bflags = *bflags;
/* diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 80ad038c6f7f..842d7ca99ddb 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -480,7 +480,7 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev->request_queue = NULL;
- sbitmap_free(&sdev->budget_map); + scsi_free_budget_map(sdev);
mutex_lock(&sdev->inquiry_mutex); vpd_pg0 = rcu_replace_pointer(sdev->vpd_pg0, vpd_pg0, diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 929311440781..30cdc73175b5 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -108,7 +108,7 @@ struct scsi_device { struct list_head siblings; /* list of all devices on this host */ struct list_head same_target_siblings; /* just the devices sharing same target id */
- struct sbitmap budget_map; + KABI_DEPRECATE(atomic_t, device_busy) atomic_t device_blocked; /* Device returned QUEUE_FULL. */
atomic_t restarts; @@ -241,7 +241,7 @@ struct scsi_device { enum scsi_device_state sdev_state; struct task_struct *quiesced_by;
- KABI_RESERVE(1) + KABI_USE(1, struct sbitmap *budget_map) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) @@ -460,6 +460,9 @@ extern int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, unsigned char *sense, struct scsi_sense_hdr *sshdr, int timeout, int retries, u64 flags, req_flags_t rq_flags, int *resid); + +extern void scsi_free_budget_map(struct scsi_device *sdev); + /* Make sure any sense buffer is the correct size. */ #define scsi_execute(sdev, cmd, data_direction, buffer, bufflen, sense, \ sshdr, timeout, retries, flags, rq_flags, resid) \ @@ -609,7 +612,7 @@ static inline int scsi_device_supports_vpd(struct scsi_device *sdev)
static inline int scsi_device_busy(struct scsi_device *sdev) { - return sbitmap_weight(&sdev->budget_map); + return sbitmap_weight(sdev->budget_map); }
#define MODULE_ALIAS_SCSI_DEVICE(type) \
From: Sumit Saxena sumit.saxena@broadcom.com
mainline inclusion from mainline-v5.19-rc1 commit f9bdac31cf4b8609c1b241749b0ceb8f67c8685b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
-------------------------------------------
The maximum SCSI device queue depth of 1024 is not sufficient for RAID volumes configured behind Broadcom RAID controllers. For a 16-drive RAID volume with a device queue depth limit of 1024, only 64 I/Os (1024/16) can be issued per drive. That is not sufficient to saturate the device.
Link: https://lore.kernel.org/r/20220414103601.140687-1-sumit.saxena@broadcom.com Cc: Ming Lei ming.lei@redhat.com Cc: Hannes Reinecke hare@suse.de Cc: Bart Van Assche bvanassche@acm.org Cc: Sumanesh Samanta sumanesh.samanta@broadcom.com Signed-off-by: Sumit Saxena sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Zheng Qixing zhengqixing@huawei.com --- drivers/scsi/scsi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 77cb26bb4df4..3bf010fe6b00 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -216,11 +216,11 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
/* - * 1024 is big enough for saturating fast SCSI LUNs. + * 4096 is big enough for saturating fast SCSI LUNs. */ int scsi_device_max_queue_depth(struct scsi_device *sdev) { - return min_t(int, sdev->host->can_queue, 1024); + return min_t(int, sdev->host->can_queue, 4096); }
/**
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/13966 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/2...
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/13966 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/2...