From: Zizhi Wo <wozizhi@huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT
--------------------------------
In the current erofs on-demand read scenario, the cachefiles read path decides what to do based on the state of back_page. If back_page is Uptodate, the read request is complete. If it is in the Error state, an error is returned. If it is in neither state, the page has probably been truncated, so cachefiles_read_reissue() is called. If the page turns out not to have been truncated, the read_page function of the back-end file system is called again to read it.
When the back-end file system is ext4 and fail_make_request is injected with a 100% failure rate, the final call to __read_end_io() clears the page's Uptodate and Error flags. Since the page is then in neither state, cachefiles_read_copier() calls the read_page function of the back-end file system again, and the cycle repeats endlessly. The outer netfs_page waits for back_page to complete the read request, so the kernel hangs.
Fix this by adding a flag to the monitor, which corresponds to back_page. The flag is set when read_page is called for the first time; when read_page would need to be called again, the flag is tested to decide whether the process should end instead.
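The guard can be pictured with a small user-space model. This is only a sketch, not kernel code: model_monitor, model_failing_read(), model_reissue() and model_count_reads() are hypothetical names, the backing read is assumed to fail every time, and the data_new_version() condition used by the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these are the kernel's types. */
#define MODEL_ENTER_READ (1UL << 0)	/* a read was already issued */

struct model_monitor {
	bool uptodate;		/* stands in for PageUptodate(back_page) */
	bool error;		/* stands in for PageError(back_page) */
	unsigned long flags;
};

/* A backing read that always fails and leaves both page flags clear. */
static void model_failing_read(struct model_monitor *m)
{
	m->flags |= MODEL_ENTER_READ;
	m->uptodate = false;
	m->error = false;
}

/* One reissue attempt: 0 on success, -EIO to stop, -EINPROGRESS to retry. */
static int model_reissue(struct model_monitor *m, bool guard)
{
	if (guard && (m->flags & MODEL_ENTER_READ))
		return -EIO;	/* already read once and not truncated */
	if (m->error)
		return -EIO;
	if (m->uptodate)
		return 0;
	model_failing_read(m);
	return -EINPROGRESS;
}

/*
 * Count backing reads until the copier stops. max_rounds caps the model so
 * that the unguarded case terminates here; the kernel has no such cap.
 */
static int model_count_reads(bool guard, int max_rounds, int *result)
{
	struct model_monitor m = { 0 };
	int reads = 1;

	assert(max_rounds > 0);
	model_failing_read(&m);	/* the initial read */
	for (int round = 0; round < max_rounds; round++) {
		*result = model_reissue(&m, guard);
		if (*result != -EINPROGRESS)
			return reads;
		reads++;
	}
	return reads;
}
```

With the guard enabled the model stops with -EIO after the single initial read; without it, every round issues another read until the cap is reached.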
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/internal.h |  2 ++
 fs/cachefiles/rdwr.c     | 17 ++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 355fb9b79741..349b1e6bb7cd 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -141,6 +141,8 @@ struct cachefiles_one_read {
 	struct page		*netfs_page;	/* netfs page we're going to fill */
 	struct fscache_retrieval *op;		/* retrieval op covering this */
 	struct list_head	op_link;	/* link in op's todo list */
+	unsigned long		flags;
+#define CACHEFILES_MONITOR_ENTER_READ	0	/* restrict calls to read_page */
 };
 
 /*
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 453bf7cc88b3..d95435ad5542 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -108,6 +108,13 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 	 * need a second */
 	put_page(backpage2);
 
+	/*
+	 * end the process if the page was not truncated
+	 * and we have already read it before
+	 */
+	if (test_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags))
+		return -EIO;
+
 	INIT_LIST_HEAD(&monitor->op_link);
 	add_page_wait_queue(backpage, &monitor->monitor);
 
@@ -120,6 +127,8 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 			goto unlock_discard;
 
 		_debug("reissue read");
+		if (data_new_version(object->fscache.cookie))
+			set_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags);
 		ret = bmapping->a_ops->readpage(NULL, backpage);
 		if (ret < 0)
 			goto discard;
@@ -190,7 +199,11 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
 			error = cachefiles_read_reissue(object, monitor);
 			if (error == -EINPROGRESS)
 				goto next;
-			goto recheck;
+			if (!data_new_version(object->fscache.cookie) || !error)
+				goto recheck;
+			pr_warn("%s, read error: %d, at page %lu, flags: %lx\n",
+				__func__, error, monitor->back_page->index,
+				(unsigned long) monitor->back_page->flags);
 		} else {
 			cachefiles_io_error_obj(
 				object,
@@ -284,6 +297,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	newpage = NULL;
 
 read_backing_page:
+	if (data_new_version(object->fscache.cookie))
+		set_bit(CACHEFILES_MONITOR_ENTER_READ, &monitor->flags);
 	ret = bmapping->a_ops->readpage(NULL, backpage);
 	if (ret < 0)
 		goto read_error;