From: Johannes Weiner hannes@cmpxchg.org
mainline inclusion from mainline-v5.2-rc1 commit def0fdae813dbbbbb588bfc5f52856be2e842b35 category: bugfix bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I CVE: NA
-------------------------------------------------
When a cgroup is reclaimed on behalf of a configured limit, reclaim needs to round-robin through all NUMA nodes that hold pages of the memcg in question. However, when assembling the mask of candidate NUMA nodes, the code only consults the *local* cgroup LRU counters, not the recursive counters for the entire subtree. Cgroup limits are frequently configured against intermediate cgroups that do not have memory on their own LRUs. In this case, the node mask will always come up empty and reclaim falls back to scanning only the current node.
If a cgroup subtree has some memory on one node but the processes are bound to another node afterwards, the limit reclaim will never age or reclaim that memory anymore.
To fix this, use the recursive LRU counts for a cgroup subtree to determine which nodes hold memory of that cgroup.
The code has been broken like this forever, so it doesn't seem to be a problem in practice. I just noticed it while reviewing the way the LRU counters are used in general.
Link: http://lkml.kernel.org/r/20190412151507.2769-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner hannes@cmpxchg.org Reviewed-by: Shakeel Butt shakeelb@google.com Reviewed-by: Roman Gushchin guro@fb.com Cc: Michal Hocko mhocko@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Chen Zhou chenzhou10@huawei.com Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com (cherry picked from commit 01bd66c080ee6c7ae41af4c0ba226c0b84f1bfc8) Signed-off-by: Lu Jialin lujialin4@huawei.com Reviewed-by: Jing Xiangfeng jingxiangfeng@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/memcontrol.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d7570cd5c2078..90cf8e2ba15fc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1524,13 +1524,13 @@ static bool test_mem_cgroup_node_reclaimable(struct mem_cgroup *memcg, { struct lruvec *lruvec = mem_cgroup_lruvec(NODE_DATA(nid), memcg);
- if (lruvec_page_state_local(lruvec, NR_INACTIVE_FILE) || - lruvec_page_state_local(lruvec, NR_ACTIVE_FILE)) + if (lruvec_page_state(lruvec, NR_INACTIVE_FILE) || + lruvec_page_state(lruvec, NR_ACTIVE_FILE)) return true; if (noswap || !total_swap_pages) return false; - if (lruvec_page_state_local(lruvec, NR_INACTIVE_ANON) || - lruvec_page_state_local(lruvec, NR_ACTIVE_ANON)) + if (lruvec_page_state(lruvec, NR_INACTIVE_ANON) || + lruvec_page_state(lruvec, NR_ACTIVE_ANON)) return true; return false;