[PATCH OLK-5.10 v4 0/2] Fix I/O high when memory almost met memcg limit

Recently, when install package in a docker which almost reached its memory limit, the installer has no respond severely for more than 15 minutes. During this period, I/O stays high(~1G/s) and influence the whole machine. I've constructed a use case as follows: 1. create a docker: $ cat test.sh #!/bin/bash docker rm centos7 --force docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init docker start centos7 sleep 1 docker cp ./alloc_page centos7:/ docker cp ./reproduce.sh centos7:/ docker exec -it centos7 /bin/bash 2. try reproduce the problem in docker: $ cat reproduce.sh #!/bin/bash while true; do flag=$(ps -ef | grep -v grep | grep alloc_page| wc -l) if [ "$flag" -eq 0 ]; then /alloc_page & fi sleep 30 start_time=$(date +%s) yum install -y expect > /dev/null 2>&1 end_time=$(date +%s) elapsed_time=$((end_time - start_time)) echo "$elapsed_time seconds" yum remove -y expect > /dev/null 2>&1 done $ cat alloc_page.c: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #define SIZE 1*1024*1024 //1M int main() { void *addr = NULL; int i; for (i = 0; i < 1024 * 6 - 50;i++) { addr = (void *)malloc(SIZE); if (!addr) return -1; memset(addr, 0, SIZE); } sleep(99999); return 0; } We found that this problem is caused by a lot ot meaningless read-ahead. Since the docker is almost met memory limit, the page will be reclaimed immediately after read-ahead and will read-ahead again immediately. The program is executed slowly and waste a lot of I/O resource. These patches aim to break the read-ahead in above scenario. Liu Shixin (2): mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM mm/readahead: don't decrease mmap_miss when folio has workingset flags mm/filemap.c | 9 ++++++++- mm/readahead.c | 17 ++++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-) -- 2.25.1

mainlist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8EXN6 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?... -------------------------------- Patch series "Fix I/O high when memory almost met memcg limit", v2. Recently, when install package in a docker which almost reached its memory limit, the installer has no respond severely for more than 15 minutes. During this period, I/O stays high(~1G/s) and influence the whole machine. I've constructed a use case as follows: 1. create a docker: $ cat test.sh #!/bin/bash docker rm centos7 --force docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init docker start centos7 sleep 1 docker cp ./alloc_page centos7:/ docker cp ./reproduce.sh centos7:/ docker exec -it centos7 /bin/bash 2. try reproduce the problem in docker: $ cat reproduce.sh #!/bin/bash while true; do flag=$(ps -ef | grep -v grep | grep alloc_page| wc -l) if [ "$flag" -eq 0 ]; then /alloc_page & fi sleep 30 start_time=$(date +%s) yum install -y expect > /dev/null 2>&1 end_time=$(date +%s) elapsed_time=$((end_time - start_time)) echo "$elapsed_time seconds" yum remove -y expect > /dev/null 2>&1 done $ cat alloc_page.c: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #define SIZE 1*1024*1024 //1M int main() { void *addr = NULL; int i; for (i = 0; i < 1024 * 6 - 50;i++) { addr = (void *)malloc(SIZE); if (!addr) return -1; memset(addr, 0, SIZE); } sleep(99999); return 0; } We found that this problem is caused by a lot ot meaningless read-ahead. Since the docker is almost met memory limit, the page will be reclaimed immediately after read-ahead and will read-ahead again immediately. The program is executed slowly and waste a lot of I/O resource. These two patch aim to break the read-ahead in above scenario. [1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@huawei... [2] https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/ [3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/ This patch (of 2): When filemap_add_folio() return -ENOMEM, break read-ahead loop like what filemap_alloc_folio() does. Link: https://lkml.kernel.org/r/20240322093555.226789-1-liushixin2@huawei.com Link: https://lkml.kernel.org/r/20240322093555.226789-2-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/readahead.c Signed-off-by: Liu Shixin <liushixin2@huawei.com> --- mm/readahead.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/mm/readahead.c b/mm/readahead.c index ed23d5dec123..c0042e33a743 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -220,11 +220,18 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, if (mapping->a_ops->readpages) { page->index = index + i; list_add(&page->lru, &page_pool); - } else if (add_to_page_cache_lru(page, mapping, index + i, - gfp_mask) < 0) { - put_page(page); - read_pages(ractl, &page_pool, true); - continue; + } else { + int ret; + + ret = add_to_page_cache_lru(page, mapping, index + i, + gfp_mask); + if (ret < 0) { + put_page(page); + if (ret == -ENOMEM) + break; + read_pages(ractl, &page_pool, true); + continue; + } } if (i == nr_to_read - lookahead_size) SetPageReadahead(page); -- 2.25.1

maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8EXN6 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?... -------------------------------- If there are too many folios that are recently evicted in a file, then they will probably continue to be evicted. In such situation, there is no positive effect to read-ahead this file since it is only a waste of IO. The mmap_miss is increased in do_sync_mmap_readahead() and decreased in both do_async_mmap_readahead() and filemap_map_pages(). In order to skip read-ahead in above scenario, the mmap_miss have to increased exceed MMAP_LOTSAMISS. This can be done by stop decreased mmap_miss when folio has workingset flags. The async path is not to care because in above scenario, it's hard to run into the async path. Link: https://lkml.kernel.org/r/20240322093555.226789-3-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jinjiang Tu <tujinjiang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Conflicts: mm/filemap.c Signed-off-by: Liu Shixin <liushixin2@huawei.com> --- mm/filemap.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 2eeb9978f39e..b48ec6fc8f4b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3168,7 +3168,14 @@ void filemap_map_pages(struct vm_fault *vmf, if (xas.xa_index >= max_idx) goto unlock; - if (mmap_miss > 0) + /* + * If there are too many pages that are recently evicted + * in a file, they will probably continue to be evicted. + * In such situation, read-ahead is only a waste of IO. + * Don't decrease mmap_miss in this scenario to make sure + * we can stop read-ahead. + */ + if (mmap_miss > 0 && !PageWorkingset(page)) mmap_miss--; vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT; -- 2.25.1

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/5614 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5614 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D...

反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://gitee.com/openeuler/kernel/pulls/5616 邮件列表地址:https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://gitee.com/openeuler/kernel/pulls/5616 Mailing list address: https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/D...
participants (2)
-
Liu Shixin
-
patchwork bot