From: Yufen Yu <yuyufen@huawei.com>
hulk inclusion
category: feature
bugzilla: NA
CVE: NA
-------------------------------------------------
In some scenarios, such as spark-sql, almost all meta files are smaller than 2MB and applications read these small files in random mode. That means multiple random IOs may be issued to the rotating disk, which can cause performance degradation.
To improve random reads of small files, we try to read the first 2MB of the file into the pagecache on the first read. Subsequent reads can then avoid multiple random IOs.
In fact, applications can call the fadvise syscall with POSIX_FADV_WILLNEED to achieve this goal. However, some applications cannot easily do that. So we provide a new file mode flag, FMODE_WILLNEED.
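For reference, a minimal userspace sketch of the existing fadvise-based approach (the file path and error handling below are only illustrative, not part of this patch):

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* illustrative path to a small meta file */
		int fd = open("/path/to/meta.file", O_RDONLY);
		int ret;

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* ask the kernel to prefetch the first 2MB into the pagecache */
		ret = posix_fadvise(fd, 0, 2 * 1024 * 1024, POSIX_FADV_WILLNEED);
		if (ret)
			fprintf(stderr, "posix_fadvise: %s\n", strerror(ret));
		/* subsequent small random reads can then hit the pagecache */
		return 0;
	}

FMODE_WILLNEED lets the kernel trigger the same head-of-file readahead for applications that cannot be modified to issue this hint themselves.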
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/linux/fs.h |  3 +++
 mm/readahead.c     | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 36d828c741d5c..05c85ee240aff 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -169,6 +169,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File does not contribute to nr_files count */
 #define FMODE_NOACCOUNT		((__force fmode_t)0x20000000)
 
+/* File will try to read head of the file into pagecache */
+#define FMODE_WILLNEED		((__force fmode_t)0x40000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
diff --git a/mm/readahead.c b/mm/readahead.c
index 205ac348bb4ae..89da1e7f0aee0 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -24,6 +24,7 @@
 
 #include "internal.h"
 
+#define READAHEAD_FIRST_SIZE	(2 * 1024 * 1024)
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
  * memset *ra to zero.
@@ -491,6 +492,35 @@ ondemand_readahead(struct address_space *mapping,
 	return ra_submit(ra, mapping, filp);
 }
 
+/*
+ * Try to read the first @ra_size bytes from the head of the file.
+ */
+static bool page_cache_readahead_from_head(struct address_space *mapping,
+				struct file *filp, pgoff_t offset,
+				unsigned long req_size,
+				unsigned long ra_size)
+{
+	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
+	struct file_ra_state *ra = &filp->f_ra;
+	unsigned long size = min_t(unsigned long, ra_size,
+				   file_inode(filp)->i_size);
+	unsigned long nrpages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+	unsigned long max_pages;
+	unsigned int offs = 0;
+
+	/* Cannot read data beyond the target size, fall back to the normal path */
+	if (offset + req_size > nrpages)
+		return false;
+
+	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
+	max_pages = min(max_pages, nrpages);
+	while (offs < nrpages) {
+		force_page_cache_readahead(mapping, filp, offs, max_pages);
+		offs += max_pages;
+	}
+	return true;
+}
+
 /**
  * page_cache_sync_readahead - generic file readahead
  * @mapping: address_space which holds the pagecache and I/O vectors
@@ -516,6 +546,12 @@ void page_cache_sync_readahead(struct address_space *mapping,
 	if (blk_cgroup_congested())
 		return;
 
+	/* try to read the first READAHEAD_FIRST_SIZE bytes into pagecache */
+	if (filp && (filp->f_mode & FMODE_WILLNEED) &&
+	    page_cache_readahead_from_head(mapping, filp,
+				offset, req_size, READAHEAD_FIRST_SIZE))
+		return;
+
 	/* be dumb */
 	if (filp && (filp->f_mode & FMODE_RANDOM)) {
 		force_page_cache_readahead(mapping, filp, offset, req_size);