From: Jens Axboe axboe@kernel.dk
mainline inclusion from mainline-v6.9-rc1 commit da4c8c3d0975f031ef82d39927102e39fa6ddfac category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAGRKP CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Querying the current time is the most costly thing we do in the block layer per IO, and depending on kernel config settings, we may do it many times per IO.
None of the callers actually need nsec granularity. Take advantage of that by caching the current time in the plug, with the assumption here being that any time checking will be temporally close enough that the slight loss of precision doesn't matter.
If the block plug gets flushed, eg on preempt or schedule out, then we invalidate the cached clock.
On a basic peak IOPS test case with iostats enabled, this changes the performance from:
IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31 IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32 IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32 IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32 IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31 IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32 IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31
to
IOPS=118.79M, BW=58.00GiB/s, IOS/call=31/32 IOPS=118.62M, BW=57.92GiB/s, IOS/call=31/31 IOPS=118.80M, BW=58.01GiB/s, IOS/call=32/31 IOPS=118.78M, BW=58.00GiB/s, IOS/call=32/32 IOPS=118.69M, BW=57.95GiB/s, IOS/call=32/31 IOPS=118.62M, BW=57.92GiB/s, IOS/call=32/31 IOPS=118.63M, BW=57.92GiB/s, IOS/call=31/32
which is more than a 9% improvement in performance. Looking at perf diff, we can see a huge reduction in time overhead:
10.55% -9.88% [kernel.vmlinux] [k] read_tsc 1.31% -1.22% [kernel.vmlinux] [k] ktime_get
Note that since this relies on blk_plug for the caching, it's only applicable to the issue side. But this is where most of the time calls happen anyway. On the completion side, cached time stamping is done with struct io_comp patch, as long as the driver supports it.
It's also worth noting that the above testing doesn't enable any of the higher cost CPU items on the block layer side, like wbt, cgroups, iocost, etc, which all would add additional time querying and hence overhead. IOW, results would likely look even better in comparison with those enabled, as distros would do.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe axboe@kernel.dk
Conflicts: block/blk-core.c block/blk.h include/linux/blkdev.h [the definition of struct blk_plug is different from mainline.] Signed-off-by: Yu Kuai yukuai3@huawei.com --- block/blk-core.c | 1 + block/blk.h | 14 +++++++++++++- include/linux/blkdev.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c index 847fd7585952..d63d82a628ac 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -3919,6 +3919,7 @@ void blk_start_plug(struct blk_plug *plug) INIT_LIST_HEAD(&plug->list); INIT_LIST_HEAD(&plug->mq_list); INIT_LIST_HEAD(&plug->cb_list); + plug->cur_ktime = 0; /* * Store ordering should not be needed here, since a potential * preempt will imply a full memory barrier diff --git a/block/blk.h b/block/blk.h index 7dba4947ef02..0c8de338b391 100644 --- a/block/blk.h +++ b/block/blk.h @@ -481,6 +481,18 @@ static inline void blk_free_queue_dispatch_async(struct request_queue *q)
static inline u64 blk_time_get_ns(void) { - return ktime_get_ns(); + struct blk_plug *plug = current->plug; + + if (!plug) + return ktime_get_ns(); + + /* + * 0 could very well be a valid time, but rather than flag "this is + * a valid timestamp" separately, just accept that we'll do an extra + * ktime_get_ns() if we just happen to get 0 as the current time. + */ + if (!plug->cur_ktime) + plug->cur_ktime = ktime_get_ns(); + return plug->cur_ktime; } #endif /* BLK_INTERNAL_H */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c848f4205729..961bc66a0dd1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1372,6 +1372,7 @@ struct blk_plug { struct list_head list; /* requests */ struct list_head mq_list; /* blk-mq requests */ struct list_head cb_list; /* md requires an unplug callback */ + u64 cur_ktime; }; #define BLK_MAX_REQUEST_COUNT 16 #define BLK_PLUG_FLUSH_SIZE (128 * 1024)