From: Yu Kuai <yukuai3(a)huawei.com>
Changes in v2:
- fix the problem that if 'nr_pending' is decreased to 0 in
wait_barrier_nolock() in patch 1, 'conf->barrier' waiters are not woken up,
and raise_barrier() can hang while waiting for 'nr_pending' to become 0.
- only modify the hot path in patch 2.
- use node 1 as the default in patch 6.
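The v2 fix for patch 1 follows a general rule for lockless counters: whoever drops a counter that someone may be sleeping on to zero must issue the wakeup. Below is a minimal userspace sketch of that rule. It assumes pthreads and C11 atomics stand in for the kernel's wait queues and atomic_t. struct barrier_conf, barrier_conf_init(), dec_pending_and_wake(), wait_barrier_nolock(), allow_io() and raise_barrier() are hypothetical simplifications, not the raid10 code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant r10conf fields. */
struct barrier_conf {
    atomic_int nr_pending;        /* normal IO currently in flight */
    atomic_int barrier;           /* > 0 while resync holds the barrier */
    pthread_mutex_t lock;         /* protects the sleep/wake handshake */
    pthread_cond_t wait_barrier;  /* raise_barrier() sleeps here */
};

static void barrier_conf_init(struct barrier_conf *conf)
{
    atomic_init(&conf->nr_pending, 0);
    atomic_init(&conf->barrier, 0);
    pthread_mutex_init(&conf->lock, NULL);
    pthread_cond_init(&conf->wait_barrier, NULL);
}

/* Drop one pending reference; the thread that reaches 0 wakes waiters. */
static void dec_pending_and_wake(struct barrier_conf *conf)
{
    if (atomic_fetch_sub(&conf->nr_pending, 1) == 1) {
        /* Holding the lock keeps the broadcast from landing between a
         * waiter's check of nr_pending and its sleep. */
        pthread_mutex_lock(&conf->lock);
        pthread_cond_broadcast(&conf->wait_barrier);
        pthread_mutex_unlock(&conf->lock);
    }
}

/* Lockless fast path: returns true if IO may proceed without the lock. */
static bool wait_barrier_nolock(struct barrier_conf *conf)
{
    if (atomic_load(&conf->barrier) != 0)
        return false;             /* resync active: caller takes slow path */
    atomic_fetch_add(&conf->nr_pending, 1);
    if (atomic_load(&conf->barrier) == 0)
        return true;              /* still no barrier: IO is accounted */
    /* A barrier was raised concurrently: undo the increment. Without the
     * wakeup, raise_barrier() could sleep forever once this reaches 0. */
    dec_pending_and_wake(conf);
    return false;
}

/* Called when normal IO completes. */
static void allow_io(struct barrier_conf *conf)
{
    dec_pending_and_wake(conf);
}

/* Resync side: raise the barrier, then wait for in-flight IO to drain. */
static void raise_barrier(struct barrier_conf *conf)
{
    pthread_mutex_lock(&conf->lock);
    atomic_fetch_add(&conf->barrier, 1);
    while (atomic_load(&conf->nr_pending) != 0)
        pthread_cond_wait(&conf->wait_barrier, &conf->lock);
    pthread_mutex_unlock(&conf->lock);
}
```

The sequentially consistent increment-then-recheck in wait_barrier_nolock(), paired with raise_barrier() incrementing 'barrier' before reading 'nr_pending', ensures at least one side observes the other. The wakeup in the undo path covers the case the original v1 missed.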
On some architectures, for example KUNPENG 920, memory access latency
across nodes is much worse than local node access. As a consequence, IO
performance is rather poor when users issue IO from multiple nodes and
lock contention exists in the driver.
This patchset tries to avoid cross-node memory access in the driver.
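Patch 1 ("md/raid10: convert resync_lock to use seqlock") moves 'resync_lock' to a sequence lock. One plausible reason, consistent with the goal above though not stated in this letter, is that seqlock readers only read the sequence counter. A read-mostly fast path therefore does not keep pulling a lock cache line across nodes. Below is a minimal userspace sketch of the seqlock pattern in C11 atomics. struct seq_value, seq_value_init(), seq_write() and seq_read() are hypothetical names, not the kernel's seqlock_t API.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical: a sequence counter plus a mutex that serializes writers,
 * mirroring the shape of a kernel seqlock_t. */
struct seq_value {
    atomic_uint seq;              /* even: stable, odd: write in progress */
    atomic_int value;             /* protected data (atomic: no C11 data race) */
    pthread_mutex_t writer_lock;  /* writers still exclude each other */
};

static void seq_value_init(struct seq_value *s, int v)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->value, v);
    pthread_mutex_init(&s->writer_lock, NULL);
}

/* Writer: make seq odd, update the data, then make seq even again. */
static void seq_write(struct seq_value *s, int v)
{
    pthread_mutex_lock(&s->writer_lock);
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->value, v, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
    pthread_mutex_unlock(&s->writer_lock);
}

/* Reader: never writes shared memory; retries if a write overlapped. */
static int seq_read(struct seq_value *s)
{
    unsigned int start;
    int v;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        v = atomic_load_explicit(&s->value, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1u) ||
             start != atomic_load_explicit(&s->seq, memory_order_relaxed));
    return v;
}
```

Readers pay only a retry when a writer intervenes, which suits a resync lock that normal IO mostly reads.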
Test environment: aarch64 Huawei KUNPENG 920
Raid10 initialize:
mdadm --create /dev/md0 --level 10 --bitmap none --raid-devices 4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
Test cmd:
(taskset -c 0-15) fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022 -rwmixread=70 -refill_buffers -filename=/dev/md0 -numjobs=16 -runtime=60s -bs=4k -iodepth=256 -rw=randread
Test result:
before this patchset:           3.2 GiB/s
bind node before this patchset: 6.9 GiB/s
after this patchset:            7.9 GiB/s
bind node after this patchset:  8.0 GiB/s
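For reference, the ratios implied by these results are below. This is plain arithmetic on the reported throughput, not additional measurements:

```latex
\begin{aligned}
\frac{7.9}{3.2} &\approx 2.47 && \text{(not bound: after vs. before)}\\
\frac{8.0}{6.9} &\approx 1.16 && \text{(bound: after vs. before)}\\
\frac{7.9}{6.9} &\approx 1.14 && \text{(not bound after vs. bound before)}
\end{aligned}
```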
Wang ShaoBo (1):
arm64/topology: Getting preferred sibling's cpumask supported by
platform
Yu Kuai (5):
md/raid10: convert resync_lock to use seqlock
md/raid10: prevent unnecessary calls to wake_up() in fast path
block: add new fields in request_queue
block: support to dispatch bio asynchronously
md: enable dispatching bio asynchronously by default
arch/arm64/Kconfig | 8 ++
arch/arm64/include/asm/smp_plat.h | 14 ++
arch/arm64/kernel/smp.c | 9 ++
arch/arm64/kernel/topology.c | 51 +++++++
block/blk-core.c | 212 +++++++++++++++++++++++++++++-
block/blk-sysfs.c | 40 ++++++
drivers/md/md.c | 5 +
drivers/md/raid10.c | 98 +++++++++-----
drivers/md/raid10.h | 2 +-
include/linux/arch_topology.h | 7 +
include/linux/blkdev.h | 6 +
11 files changed, 420 insertions(+), 32 deletions(-)
--
2.31.1