From: Charan Teja Kalla quic_charante@quicinc.com
mainline inclusion from mainline-v6.8-rc1 commit d68e39fc45f70e35eb74df2128d315c1d91e4dc4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9OCYO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: page_alloc: fixes for high atomic reserve caluculations", v3.
The state of the system where the issue exposed shown in oom kill logs:
[ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kBlocal_pcp:492kB free_cma:0kB [ 295.998656] lowmem_reserve[]: 0 32 [ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH) 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7752kB
From the above, it is seen that ~16MB of memory reserved for high atomic reserves against the expectation of 1% reserves which is fixed in the 1st patch.
Don't reserve the high atomic page blocks if 1% of zone memory size is below a pageblock size.
This patch (of 2):
reserve_highatomic_pageblock() aims to reserve the 1% of the managed pages of a zone, which is used for the high order atomic allocations.
It uses the below calculation to reserve: static void reserve_highatomic_pageblock(struct page *page, ....) {
....... max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
if (zone->nr_reserved_highatomic >= max_managed) goto out;
zone->nr_reserved_highatomic += pageblock_nr_pages; set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC); move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
out: .... }
Since we are always appending the 1% of zone managed pages count to pageblock_nr_pages, the minimum it is turning into 2 pageblocks as the nr_reserved_highatomic is incremented/decremented in pageblock sizes.
Encountered a system(actually a VM running on the Linux kernel) with the below zone configuration: Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB managed:49224kB
The existing calculations making it to reserve the 8MB(with pageblock size of 4MB) i.e. 16% of the zone managed memory. Reserving such high amount of memory can easily exert memory pressure in the system thus may lead into unnecessary reclaims till unreserving of high atomic reserves.
Since high atomic reserves are managed in pageblock size granules, as MIGRATE_HIGHATOMIC is set for such pageblock, fix the calculations for high atomic reserves as, minimum is pageblock size , maximum is approximately 1% of the zone managed pages.
Link: https://lkml.kernel.org/r/cover.1700821416.git.quic_charante@quicinc.com Link: https://lkml.kernel.org/r/1660034138397b82a0a8b6ae51cbe96bd583d89e.170082141... Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com Acked-by: Mel Gorman mgorman@techsingularity.net Acked-by: David Rientjes rientjes@google.com Cc: David Hildenbrand david@redhat.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@suse.com Cc: Pavankumar Kondeti quic_pkondeti@quicinc.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Liu Shixin liushixin2@huawei.com --- mm/page_alloc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e92509d4a5d6..e911f79473bb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1912,10 +1912,11 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone) unsigned long max_managed, flags;
/* - * Limit the number reserved to 1 pageblock or roughly 1% of a zone. + * The number reserved as: minimum is 1 pageblock, maximum is + * roughly 1% of a zone. * Check is race-prone but harmless. */ - max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages; + max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages); if (zone->nr_reserved_highatomic >= max_managed) return;