[PATCH OLK-5.10 0/3] wakeup kswapd during node_reclaim
Wakeup kswapd during node_reclaim, and add a new bit for zone_reclaim_mode to enable this behavior.

Wupeng Ma (3):
  arm64/numa: Make node reclaim distance adjustment always available
  mm: vmscan: wakeup kswapd during node_reclaim
  mm: node_reclaim: add wakeup kswapd mode

 Documentation/admin-guide/sysctl/vm.rst |  1 +
 arch/arm64/mm/numa.c                    | 40 ++++++++++++-------------
 mm/internal.h                           |  5 ++--
 mm/page_alloc.c                         |  3 +-
 mm/vmscan.c                             |  8 ++++-
 5 files changed, 33 insertions(+), 24 deletions(-)

-- 
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

The 'node_reclaim_distance' parameter influences both scheduling
decisions and memory reclamation behavior. Currently, this value is
fixed to its default setting and cannot be configured. During node
reclaim, memory reclamation is only permitted when the distance between
nodes does not exceed the threshold checked by 'zone_allows_reclaim'.
This restriction limits the flexibility of the node reclaim mechanism.

This patch makes the 'node_reclaim_distance' boot parameter adjustable,
allowing more fine-grained control over node reclaim behavior.

Commit 932fcc5ecb9a ("arm64/numa: Support node_reclaim_distance adjust
for arch") introduced node_reclaim_distance adjustment for arm64.
However, this feature was disabled by default in commit f5aef354d409
("config: Disable CONFIG_ARCH_CUSTOM_NUMA_DISTANCE for arm64"). Since
zone reclaim already depends on zone_allows_reclaim() for fine-grained
control, move this logic outside of CONFIG_ARCH_CUSTOM_NUMA_DISTANCE so
it is available by default on arm64. This lets users tune zone reclaim
behavior precisely without requiring a special configuration.
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 arch/arm64/mm/numa.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 99a746e14f2b..139e5a5c6434 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -616,8 +616,27 @@ static int __init numa_register_nodes(void)
 	return 0;
 }
 
-#ifdef CONFIG_ARCH_CUSTOM_NUMA_DISTANCE
 #define DISTANCE_MAX	(1 << DISTANCE_BITS)
+static int __init node_reclaim_distance_setup(char *str)
+{
+	int val;
+
+	if (kstrtoint(str, 0, &val))
+		return 0;
+
+	if (val < LOCAL_DISTANCE || val >= DISTANCE_MAX)
+		return 0;
+
+	if (val != RECLAIM_DISTANCE) {
+		node_reclaim_distance = val;
+		pr_info("Force set node_reclaim_distance to %d\n", val);
+	}
+
+	return 0;
+}
+early_param("node_reclaim_distance", node_reclaim_distance_setup);
+
+#ifdef CONFIG_ARCH_CUSTOM_NUMA_DISTANCE
 static void get_numa_distance_info(int *numa_levels, int *max_distance)
 {
 	DECLARE_BITMAP(distance_map, DISTANCE_MAX);
@@ -648,25 +667,6 @@ static void get_numa_distance_info(int *numa_levels, int *max_distance)
 	*max_distance = max;
 }
 
-static int __init node_reclaim_distance_setup(char *str)
-{
-	int val;
-
-	if (kstrtoint(str, 0, &val))
-		return 0;
-
-	if (val < LOCAL_DISTANCE || val >= DISTANCE_MAX)
-		return 0;
-
-	if (val != RECLAIM_DISTANCE) {
-		node_reclaim_distance = val;
-		pr_info("Force set node_reclaim_distance to %d\n", val);
-	}
-
-	return 0;
-}
-early_param("node_reclaim_distance", node_reclaim_distance_setup);
-
 static void __init node_reclaim_distance_adjust(void)
 {
 	unsigned int model = read_cpuid_id() & MIDR_CPU_MODEL_MASK;
-- 
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

During testing, we observed that memory allocation with
node_reclaim_mode enabled becomes extremely slow when a large allocation
is attempted on a node whose free memory is mostly occupied by clean
page cache. The slowness arises because node reclaim only triggers
direct-reclaim-like behavior, recycling just 32 pages at a time, without
waking kswapd, even when the watermark levels and alloc_flags already
satisfy the condition to activate kswapd.

This patch wakes kswapd during node reclaim, allowing background reclaim
to bring free memory up to the high watermark and avoid excessive node
reclaim overhead.

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 mm/internal.h   | 5 +++--
 mm/page_alloc.c | 3 ++-
 mm/vmscan.c     | 6 +++++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 0f49e0e7a0aa..9a20d4e74568 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -565,10 +565,11 @@ static inline void mminit_validate_memmodel_limits(unsigned long *start_pfn,
 #define NODE_RECLAIM_SUCCESS	1
 
 #ifdef CONFIG_NUMA
-extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int);
+int node_reclaim(struct pglist_data *pgdat, gfp_t mask, unsigned int order,
+		 int alloc_flags, struct zone *zone);
 #else
 static inline int node_reclaim(struct pglist_data *pgdat, gfp_t mask,
-				unsigned int order)
+			unsigned int order, int alloc_flags, struct zone *zone)
 {
 	return NODE_RECLAIM_NOSCAN;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cf3cd1d028b..d6b23b59ae42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3952,7 +3952,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			    !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
 				continue;
 
-			ret = node_reclaim(zone->zone_pgdat, gfp_mask, order);
+			ret = node_reclaim(zone->zone_pgdat, gfp_mask, order,
+					   alloc_flags, zone);
 			switch (ret) {
 			case NODE_RECLAIM_NOSCAN:
 				/* did not scan */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e82d7995b548..3cb9bfce2031 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4530,7 +4530,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	return sc.nr_reclaimed >= nr_pages;
 }
 
-int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
+int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order,
+		 int alloc_flags, struct zone *zone)
 {
 	int ret;
 
@@ -4567,6 +4568,9 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;
 
+	if (alloc_flags & ALLOC_KSWAPD)
+		wakeup_kswapd(zone, gfp_mask, order, gfp_zone(gfp_mask));
+
 	ret = __node_reclaim(pgdat, gfp_mask, order);
 	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
-- 
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

Add a new RECLAIM_KSWAPD bit to zone_reclaim_mode so that waking kswapd
during node reclaim must be enabled explicitly. To enable it:

  echo 8 > /proc/sys/vm/zone_reclaim_mode

This bit can be combined with other bits (e.g. RECLAIM_UNMAP). For
example, a value of 12 represents RECLAIM_UNMAP | RECLAIM_KSWAPD: node
reclaim will not only attempt to unmap file pages during memory
reclamation but will also wake up kswapd to trigger asynchronous memory
reclamation.

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 1 +
 mm/vmscan.c                             | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 3e043b9aa4b4..f67cfc8106d6 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -1040,6 +1040,7 @@ This is value OR'ed together of
 1	Zone reclaim on
 2	Zone reclaim writes dirty pages out
 4	Zone reclaim swaps pages
+8	Zone reclaim wakes up kswapd
 =	===================================
 
 zone_reclaim_mode is disabled by default.  For file servers or workloads
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3cb9bfce2031..d2cb061d7829 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4404,6 +4404,7 @@ int node_reclaim_mode __read_mostly;
 #define RECLAIM_ZONE	(1<<0)	/* Run shrink_inactive_list on the zone */
 #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
 #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
+#define RECLAIM_KSWAPD	(1<<3)	/* Wake up kswapd during reclaim */
 
 /*
  * Priority for NODE_RECLAIM. This determines the fraction of pages
@@ -4568,7 +4569,8 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order,
 	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;
 
-	if (alloc_flags & ALLOC_KSWAPD)
+	if ((node_reclaim_mode & RECLAIM_KSWAPD) &&
+	    (alloc_flags & ALLOC_KSWAPD))
 		wakeup_kswapd(zone, gfp_mask, order, gfp_zone(gfp_mask));
 
 	ret = __node_reclaim(pgdat, gfp_mask, order);
-- 
2.43.0
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/18878
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZYP...