From: Roman Gushchin guro@fb.com
mainline inclusion from mainline-5.7-rc1 commit 48fe267c503ec22014ba4e83d002b07caad034d0 category: bugfix bugzilla: 33351 CVE: NA
------------------------------------------------- If a task is getting moved out of the OOMing cgroup, it might result in unexpected OOM killings if memory.oom.group is used anywhere in the cgroup tree.
Imagine the following example:
A (oom.group = 1) / \ (OOM) B C
Let's say B's memory.max is exceeded and it's OOMing. The OOM killer selects a task in B as a victim, but someone asynchronously moves the task into C. mem_cgroup_get_oom_group() will iterate over all ancestors of C up to the root cgroup. In theory it had to stop at the oom_domain level - the memory cgroup which is OOMing. But because B is not an ancestor of C, it's not happening. Instead it chooses A (because it's oom.group is set), and kills all tasks in A. This behavior is wrong because the OOM happened in B, so there is no reason to kill anything outside.
Fix this by checking it the memory cgroup to which the task belongs is a descendant of the oom_domain. If not, memory.oom.group should be ignored, and the OOM killer should kill only the victim task.
Reported-by: Dan Schatzberg dschatzberg@fb.com Signed-off-by: Roman Gushchin guro@fb.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Acked-by: Johannes Weiner hannes@cmpxchg.org Link: http://lkml.kernel.org/r/20200316223510.3176148-1-guro@fb.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org (cherry picked from commit 48fe267c503ec22014ba4e83d002b07caad034d0) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/memcontrol.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1342b9540476..8611f3301686 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1856,6 +1856,14 @@ struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim, if (memcg == root_mem_cgroup) goto out;
+ /* + * If the victim task has been asynchronously moved to a different + * memory cgroup, we might end up killing tasks outside oom_domain. + * In this case it's better to ignore memory.group.oom. + */ + if (unlikely(!mem_cgroup_is_descendant(memcg, oom_domain))) + goto out; + /* * Traverse the memory cgroup hierarchy from the victim task's * cgroup up to the OOMing cgroup (or root) to find the