From: Yu Kuai <yukuai3@huawei.com>
Changes in v2:
 - fix the problem in patch 1 that if 'nr_pending' is decreased to 0 in
   wait_barrier_nolock(), 'conf->barrier' is not woken up, and
   raise_barrier() can hang while waiting for 'nr_pending' to become 0.
 - only modify the hot path in patch 2.
 - use node 1 as default in patch 6.
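The missed wake-up fixed in patch 1 can be illustrated with a minimal userspace sketch. This is not the code from the patch: `struct barrier_demo`, `demo_put_pending()` and the `wakeups` counter are hypothetical names used only for illustration, and C11 atomics stand in for the kernel's atomic_t. The point is only that whoever drops 'nr_pending' to 0 owes a wake-up to the waiter; in kernel code this pattern is commonly written with atomic_dec_and_test() followed by wake_up().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the barrier state; not the raid10 structures. */
struct barrier_demo {
	atomic_int nr_pending;	/* in-flight requests */
	int wakeups;		/* counts wake-ups that would be issued */
};

static void demo_init(struct barrier_demo *b, int pending)
{
	assert(pending >= 0);
	atomic_init(&b->nr_pending, pending);
	b->wakeups = 0;
}

/*
 * Drop one pending reference. atomic_fetch_sub() returns the previous
 * value, so a previous value of 1 means this caller brought the count
 * to 0 and must wake the waiter. Skipping the wake-up here is the kind
 * of bug that leaves a waiter sleeping forever.
 * Returns true if a wake-up was issued.
 */
static bool demo_put_pending(struct barrier_demo *b)
{
	if (atomic_fetch_sub(&b->nr_pending, 1) == 1) {
		b->wakeups++;	/* stands in for waking the waiter */
		return true;
	}
	return false;
}
```

With two pending requests, only the second demo_put_pending() call reports a wake-up; any path that decrements the counter without this check can leave the waiter asleep.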
On some architectures, for example KUNPENG 920, memory access latency across nodes is much worse than on the local node. As a consequence, io performance is rather bad when users issue io from multiple nodes and lock contention exists in the driver.
This patchset tries to avoid memory access across nodes in the driver.
Test environment: aarch64 Huawei KUNPENG 920
Raid10 initialize:
mdadm --create /dev/md0 --level 10 --bitmap none --raid-devices 4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
Test cmd:
(taskset -c 0-15) fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 -randseed=2022 -rwmixread=70 -refill_buffers -filename=/dev/md0 -numjobs=16 -runtime=60s -bs=4k -iodepth=256 -rw=randread
Test result:
before this patchset:            3.2 GiB/s
bind node before this patchset:  6.9 GiB/s
after this patchset:             7.9 GiB/s
bind node after this patchset:   8.0 GiB/s
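For reading the numbers above, a small helper (the name `speedup()` is hypothetical) makes the ratios explicit, assuming all four figures are in the same unit. Computed this way, the unbound case improves by roughly 2.47x (3.2 to 7.9), the bound case by roughly 1.16x (6.9 to 8.0), and the gap between bound and unbound runs shrinks from roughly 2.16x to roughly 1.01x.

```c
/*
 * Ratio of two throughput figures measured in the same unit.
 * Illustrative helper only; not part of the patchset.
 */
static double speedup(double before, double after)
{
	return after / before;
}

/* Figures taken from the test result, assumed to share one unit. */
static const double unbound_before = 3.2;
static const double bound_before   = 6.9;
static const double unbound_after  = 7.9;
static const double bound_after    = 8.0;
```

For example, speedup(unbound_before, unbound_after) gives the gain for io issued from multiple nodes, and speedup(unbound_after, bound_after) gives the remaining benefit of binding to one node after the patchset.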
Wang ShaoBo (1):
  arm64/topology: Getting preferred sibling's cpumask supported by
    platform

Yu Kuai (5):
  md/raid10: convert resync_lock to use seqlock
  md/raid10: prevent unnecessary calls to wake_up() in fast path
  block: add new fields in request_queue
  block: support to dispatch bio asynchronously
  md: enable dispatching bio asynchronously by default
 arch/arm64/Kconfig                |   8 ++
 arch/arm64/include/asm/smp_plat.h |  14 ++
 arch/arm64/kernel/smp.c           |   9 ++
 arch/arm64/kernel/topology.c      |  51 +++++++
 block/blk-core.c                  | 212 +++++++++++++++++++++++++++++-
 block/blk-sysfs.c                 |  40 ++++++
 drivers/md/md.c                   |   5 +
 drivers/md/raid10.c               |  98 +++++++++-----
 drivers/md/raid10.h               |   2 +-
 include/linux/arch_topology.h     |   7 +
 include/linux/blkdev.h            |   6 +
 11 files changed, 420 insertions(+), 32 deletions(-)