From: Yufen Yu <yuyufen@huawei.com>
hulk inclusion
category: feature
bugzilla: 173267
CVE: NA

---------------------------
HiBench applications such as kmeans, wordcount and terasort read the whole blk_xxx and blk_xxx.meta files from disk sequentially, and almost all of the reads issued to disk are triggered by async readahead.
However, a sequential read from a single thread does not mean sequential IO on disk when multiple threads do it concurrently. Sequential reads interleaved across multiple threads can make the IO issued to disk become random, which limits disk IO throughput.
To reduce this randomization, we could enlarge the readahead window, so that the filesystem generates bigger IO on each async readahead. But, limited by the disk's max_hw_sectors_kb, a big IO is split, and the whole bio has to wait for all of the split bios to complete, which can cause longer IO latency.
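For reference, the split factor for a given window can be estimated from user space against the queue limit. This is only a minimal sketch; the sda device path and the 8 MiB window are assumptions chosen for illustration:

  /* Sketch: how many bios one readahead IO becomes under max_hw_sectors_kb. */
  #include <stdio.h>

  int main(void)
  {
          /* Illustrative assumptions: the device and an 8 MiB readahead window. */
          const char *path = "/sys/block/sda/queue/max_hw_sectors_kb";
          unsigned long window_kb = 8 * 1024;
          unsigned long max_hw_kb = 0;
          FILE *f = fopen(path, "r");

          if (!f) {
                  perror(path);
                  return 1;
          }
          if (fscanf(f, "%lu", &max_hw_kb) != 1 || max_hw_kb == 0) {
                  fprintf(stderr, "could not parse %s\n", path);
                  fclose(f);
                  return 1;
          }
          fclose(f);

          /*
           * A bio bigger than max_hw_sectors_kb is split, and the readahead
           * completes only after every split bio has finished.
           */
          printf("%lu KB window -> %lu bios on disk (max_hw_sectors_kb = %lu)\n",
                 window_kb, (window_kb + max_hw_kb - 1) / max_hw_kb, max_hw_kb);
          return 0;
  }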
Our traces show that, with a big readahead window, many long latencies in the reading threads come from waiting for async readahead IO to complete. That is, the threads consume data faster than the async readahead IO completes.
To improve performance, we try to provide a special async readahead method:
* On the one hand, we try to read more sequential data from disk, which reduces disk randomization when multiple threads interleave.
* On the other hand, the size of each IO issued to disk is 2M, which avoids big IO splits and long IO latency. (A user-space sketch of the resulting per-call pattern follows.)
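The per-call arithmetic can be modelled in user space. This is only a sketch of the intended pattern, assuming a 1 GiB file and a zeroed prior readahead window; the constants mirror those added in the diff below:

  /* User-space model of the chunking; not kernel code. */
  #include <stdio.h>

  #define PAGE_SIZE              4096UL
  #define READAHEAD_MIN_SIZE     (2 * 1024 * 1024UL)
  #define READAHEAD_ASYNC_RATIO  2
  #define FILE_READAHEAD_TIMES   4
  #define DIV_ROUND_UP(n, d)     (((n) + (d) - 1) / (d))

  int main(void)
  {
          unsigned long isize = 1UL << 30;                /* assumed 1 GiB file */
          unsigned long nrpages = DIV_ROUND_UP(isize, PAGE_SIZE);
          unsigned long size = DIV_ROUND_UP(nrpages, FILE_READAHEAD_TIMES);
          unsigned long each_ra_size = READAHEAD_MIN_SIZE / PAGE_SIZE;
          unsigned long mark = size / READAHEAD_ASYNC_RATIO;
          unsigned long ra_start = 0, ra_size = 0;        /* assumed prior window */
          unsigned long ios = 0;

          while (size > 0) {
                  if (ra_start + ra_size > nrpages)
                          break;                  /* would run past EOF */
                  ra_start += ra_size;            /* each window follows the previous one */
                  ra_size = each_ra_size;         /* always one 2M chunk */
                  if (size == mark)               /* exactly one chunk carries the async mark */
                          printf("readahead mark on the chunk at page %lu\n", ra_start);
                  ios++;                          /* ra_submit() in the kernel */
                  size -= (size < each_ra_size) ? size : each_ra_size;
          }
          printf("%lu IOs of %lu pages (2M) each per call\n", ios, each_ra_size);
          return 0;
  }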
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 mm/readahead.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
diff --git a/mm/readahead.c b/mm/readahead.c
index 89da1e7f0aee0..7a21199c6227d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -25,6 +25,9 @@
 #include "internal.h"
 
 #define READAHEAD_FIRST_SIZE	(2 * 1024 * 1024)
+#define READAHEAD_MIN_SIZE	(2 * 1024 * 1024)
+#define READAHEAD_ASYNC_RATIO	2
+#define FILE_READAHEAD_TIMES	4
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
  * memset *ra to zero.
@@ -563,6 +566,30 @@ void page_cache_sync_readahead(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
+static void do_special_async_readahead(struct address_space *mapping,
+		struct file_ra_state *ra, struct file *filp)
+{
+	loff_t isize = i_size_read(file_inode(filp));
+	unsigned long nrpages = DIV_ROUND_UP(isize, PAGE_SIZE);
+	unsigned long size = DIV_ROUND_UP(nrpages, FILE_READAHEAD_TIMES);
+	unsigned int each_ra_size = READAHEAD_MIN_SIZE / PAGE_SIZE;
+	unsigned long set_page_readahead = size / READAHEAD_ASYNC_RATIO;
+
+	while (size > 0) {
+		if (ra->start + ra->size > nrpages)
+			break;
+		ra->start += ra->size;
+		ra->size = each_ra_size;
+		/* SetPageReadahead to do next async readahead */
+		if (size == set_page_readahead)
+			ra->async_size = ra->size;
+		else
+			ra->async_size = 0;
+		ra_submit(ra, mapping, filp);
+		size -= min_t(unsigned long, size, each_ra_size);
+	}
+}
+
 /**
  * page_cache_async_readahead - file readahead for marked pages
  * @mapping: address_space which holds the pagecache and I/O vectors
@@ -605,6 +632,11 @@ page_cache_async_readahead(struct address_space *mapping,
 	if (blk_cgroup_congested())
 		return;
 
+	if (filp && (filp->f_mode & FMODE_SPC_READAHEAD)) {
+		do_special_async_readahead(mapping, ra, filp);
+		return;
+	}
+
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
 }