From: Yufen Yu <yuyufen@huawei.com>
hulk inclusion
category: feature
bugzilla: 173267
CVE: NA

---------------------------
HiBench applications such as kmeans, wordcount and terasort read the whole blk_xxx and blk_xxx.meta files from disk sequentially, and almost all of the reads issued to disk are triggered by async readahead.
However, a sequential read from a single thread does not mean sequential IO on disk when multiple threads do it concurrently. Sequential reads interleaved across multiple threads can make the IO issued to disk become random, which limits disk IO throughput.
To reduce this randomization, we could enlarge the readahead window, so that the filesystem generates bigger IO on each async readahead. But, limited by the disk's max_hw_sectors_kb, a big IO is split, and the whole bio has to wait for all of the split bios to complete, which can cause longer IO latency.
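For reference, the split factor for a given window can be estimated from user space against the queue limit. This is only a minimal sketch; the sda device path and the 8 MiB window are assumptions chosen for illustration:

  /* Sketch: how many bios one readahead IO becomes under max_hw_sectors_kb. */
  #include <stdio.h>

  int main(void)
  {
          /* Illustrative assumptions: the device and an 8 MiB readahead window. */
          const char *path = "/sys/block/sda/queue/max_hw_sectors_kb";
          unsigned long window_kb = 8 * 1024;
          unsigned long max_hw_kb = 0;
          FILE *f = fopen(path, "r");

          if (!f) {
                  perror(path);
                  return 1;
          }
          if (fscanf(f, "%lu", &max_hw_kb) != 1 || max_hw_kb == 0) {
                  fprintf(stderr, "could not parse %s\n", path);
                  fclose(f);
                  return 1;
          }
          fclose(f);

          /*
           * A bio bigger than max_hw_sectors_kb is split, and the readahead
           * completes only after every split bio has finished.
           */
          printf("%lu KB window -> %lu bios on disk (max_hw_sectors_kb = %lu)\n",
                 window_kb, (window_kb + max_hw_kb - 1) / max_hw_kb, max_hw_kb);
          return 0;
  }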
Our traces show that, with a big readahead window, many long latencies in the reading threads come from waiting for async readahead IO to complete. That is, the threads consume data faster than the async readahead IO completes.
To improve performance, we try to provide a special async readahead method:
* On the one hand, we try to read more sequential data from disk, which reduces disk randomization when multiple threads interleave.
* On the other hand, the size of each IO issued to disk is 2M, which avoids big IO splits and long IO latency. (A user-space sketch of the resulting per-call pattern follows.)
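The per-call arithmetic can be modelled in user space. This is only a sketch of the intended pattern, assuming a 1 GiB file and a zeroed prior readahead window; the constants mirror those added in the diff below:

  /* User-space model of the chunking; not kernel code. */
  #include <stdio.h>

  #define PAGE_SIZE              4096UL
  #define READAHEAD_MIN_SIZE     (2 * 1024 * 1024UL)
  #define READAHEAD_ASYNC_RATIO  2
  #define FILE_READAHEAD_TIMES   4
  #define DIV_ROUND_UP(n, d)     (((n) + (d) - 1) / (d))

  int main(void)
  {
          unsigned long isize = 1UL << 30;                /* assumed 1 GiB file */
          unsigned long nrpages = DIV_ROUND_UP(isize, PAGE_SIZE);
          unsigned long size = DIV_ROUND_UP(nrpages, FILE_READAHEAD_TIMES);
          unsigned long each_ra_size = READAHEAD_MIN_SIZE / PAGE_SIZE;
          unsigned long mark = size / READAHEAD_ASYNC_RATIO;
          unsigned long ra_start = 0, ra_size = 0;        /* assumed prior window */
          unsigned long ios = 0;

          while (size > 0) {
                  if (ra_start + ra_size > nrpages)
                          break;                  /* would run past EOF */
                  ra_start += ra_size;            /* each window follows the previous one */
                  ra_size = each_ra_size;         /* always one 2M chunk */
                  if (size == mark)               /* exactly one chunk carries the async mark */
                          printf("readahead mark on the chunk at page %lu\n", ra_start);
                  ios++;                          /* ra_submit() in the kernel */
                  size -= (size < each_ra_size) ? size : each_ra_size;
          }
          printf("%lu IOs of %lu pages (2M) each per call\n", ios, each_ra_size);
          return 0;
  }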
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/readahead.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
diff --git a/mm/readahead.c b/mm/readahead.c
index 89da1e7f0aee0..7a21199c6227d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -25,6 +25,9 @@
 #include "internal.h"
 
 #define READAHEAD_FIRST_SIZE	(2 * 1024 * 1024)
+#define READAHEAD_MIN_SIZE	(2 * 1024 * 1024)
+#define READAHEAD_ASYNC_RATIO	2
+#define FILE_READAHEAD_TIMES	4
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
  * memset *ra to zero.
@@ -563,6 +566,30 @@ void page_cache_sync_readahead(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
+static void do_special_async_readahead(struct address_space *mapping,
+		struct file_ra_state *ra, struct file *filp)
+{
+	loff_t isize = i_size_read(file_inode(filp));
+	unsigned long nrpages = DIV_ROUND_UP(isize, PAGE_SIZE);
+	unsigned long size = DIV_ROUND_UP(nrpages, FILE_READAHEAD_TIMES);
+	unsigned int each_ra_size = READAHEAD_MIN_SIZE / PAGE_SIZE;
+	unsigned long set_page_readahead = size / READAHEAD_ASYNC_RATIO;
+
+	while (size > 0) {
+		if (ra->start + ra->size > nrpages)
+			break;
+		ra->start += ra->size;
+		ra->size = each_ra_size;
+		/* SetPageReadahead to do next async readahead */
+		if (size == set_page_readahead)
+			ra->async_size = ra->size;
+		else
+			ra->async_size = 0;
+		ra_submit(ra, mapping, filp);
+		size -= min_t(unsigned long, size, each_ra_size);
+	}
+}
+
 /**
  * page_cache_async_readahead - file readahead for marked pages
  * @mapping: address_space which holds the pagecache and I/O vectors
@@ -605,6 +632,11 @@ page_cache_async_readahead(struct address_space *mapping,
 	if (blk_cgroup_congested())
 		return;
 
+	if (filp && (filp->f_mode & FMODE_SPC_READAHEAD)) {
+		do_special_async_readahead(mapping, ra, filp);
+		return;
+	}
+
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
 }