[PATCH OLK-6.6 0/3] wakeup kswapd during node_reclaim
Wake kswapd during node_reclaim, and add a new zone_reclaim_mode bit to
enable this kind of reclaim.

Wupeng Ma (3):
  arm64: support node_reclaim_distance adjust for arch
  mm: vmscan: wakeup kswapd during node_reclaim
  mm: node_reclaim: add wakeup kswapd mode

 Documentation/admin-guide/sysctl/vm.rst |  1 +
 arch/arm64/mm/init.c                    | 20 ++++++++++++++++++++
 include/linux/swap.h                    |  3 ++-
 include/uapi/linux/mempolicy.h          |  1 +
 mm/internal.h                           |  5 +++--
 mm/page_alloc.c                         |  3 ++-
 mm/vmscan.c                             |  7 ++++++-
 7 files changed, 35 insertions(+), 5 deletions(-)

--
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

The 'node_reclaim_distance' parameter influences both scheduling
decisions and memory reclamation behavior. Currently, this value is
fixed at its default setting and cannot be configured. During node
reclaim, memory reclamation is only permitted when the distance between
nodes is no greater than the threshold checked by zone_allows_reclaim().
This restriction limits the flexibility of the node reclaim mechanism.

This patch introduces the capability to adjust the
'node_reclaim_distance' parameter, allowing more fine-grained control
over node reclaim behavior.

Commit 932fcc5ecb9a ("arm64/numa: Support node_reclaim_distance adjust
for arch") introduced node_reclaim_distance adjustment for arm64.
However, this feature was disabled by default in commit f5aef354d409
("config: Disable CONFIG_ARCH_CUSTOM_NUMA_DISTANCE for arm64"). Since
zone reclaim already depends on zone_allows_reclaim() for fine-grained
control, move this logic outside of CONFIG_ARCH_CUSTOM_NUMA_DISTANCE to
make it available by default on arm64. This gives users the ability to
precisely tune zone reclaim behavior without requiring special
configuration.
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 arch/arm64/mm/init.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index da75dd9d964b..ea8d67661dea 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -686,3 +686,23 @@ void dump_mem_limit(void)
 		pr_emerg("Memory Limit: none\n");
 	}
 }
+
+#define DISTANCE_MAX	(1 << DISTANCE_BITS)
+static int __init node_reclaim_distance_setup(char *str)
+{
+	int val;
+
+	if (kstrtoint(str, 0, &val))
+		return 0;
+
+	if (val < LOCAL_DISTANCE || val >= DISTANCE_MAX)
+		return 0;
+
+	if (val != RECLAIM_DISTANCE) {
+		node_reclaim_distance = val;
+		pr_info("force set node_reclaim_distance to %d\n", val);
+	}
+
+	return 0;
+}
+early_param("node_reclaim_distance", node_reclaim_distance_setup);
--
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

During testing, we observed that memory allocation with
node_reclaim_mode enabled becomes extremely slow when a large
allocation is attempted on a node whose free memory is mostly occupied
by clean page cache.

The slowness arises because node reclaim only triggers direct
reclaim-like behavior, reclaiming just 32 pages at a time, without
waking kswapd, even when the watermark levels and alloc_flags already
satisfy the conditions for waking kswapd.

This patch wakes kswapd during node reclaim, allowing background
reclaim to bring free memory up to the high watermark and avoid
excessive node reclaim overhead.

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 mm/internal.h   | 5 +++--
 mm/page_alloc.c | 3 ++-
 mm/vmscan.c     | 6 +++++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 71e6f523175d..d9963389cbff 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1077,11 +1077,12 @@ static inline void mminit_verify_zonelist(void)
 #define NODE_RECLAIM_SUCCESS	1

 #ifdef CONFIG_NUMA
-extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int);
+int node_reclaim(struct pglist_data *pgdat, gfp_t mask, unsigned int order,
+		 int alloc_flags, struct zone *zone);
 extern int find_next_best_node(int node, nodemask_t *used_node_mask);
 #else
 static inline int node_reclaim(struct pglist_data *pgdat, gfp_t mask,
-			unsigned int order)
+			unsigned int order, int alloc_flags, struct zone *zone)
 {
 	return NODE_RECLAIM_NOSCAN;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce0203f660e8..77aef44655ad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3501,7 +3501,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			    !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
 				continue;

-			ret = node_reclaim(zone->zone_pgdat, gfp_mask, order);
+			ret = node_reclaim(zone->zone_pgdat, gfp_mask, order,
+					   alloc_flags, zone);
 			switch (ret) {
 			case NODE_RECLAIM_NOSCAN:
 				/* did not scan */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cecc9a173aa..5231320c5559 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7512,7 +7512,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	return sc.nr_reclaimed >= nr_pages;
 }

-int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
+int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order,
+		 int alloc_flags, struct zone *zone)
 {
 	int ret;

@@ -7549,6 +7550,9 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order,
 	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;

+	if (alloc_flags & ALLOC_KSWAPD)
+		wakeup_kswapd(zone, gfp_mask, order, gfp_zone(gfp_mask));
+
 	ret = __node_reclaim(pgdat, gfp_mask, order);

 	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
--
2.43.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ID4GC1

--------------------------------

Add a new zone_reclaim_mode bit, RECLAIM_KSWAPD, which makes node
reclaim wake kswapd. To enable this:

  echo 8 > /proc/sys/vm/zone_reclaim_mode

This bit can be combined with other bits (e.g., RECLAIM_UNMAP). For
example, a value of 12 represents RECLAIM_UNMAP | RECLAIM_KSWAPD. When
node_reclaim is enabled with this value, it will not only attempt to
unmap file pages during memory reclamation but will also wake up kswapd
to trigger asynchronous memory reclamation.

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 1 +
 include/linux/swap.h                    | 3 ++-
 include/uapi/linux/mempolicy.h          | 1 +
 mm/vmscan.c                             | 3 ++-
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 45fe1813edfb..2ad3bf320c55 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -1035,6 +1035,7 @@ This is value OR'ed together of
 1	Zone reclaim on
 2	Zone reclaim writes dirty pages out
 4	Zone reclaim swaps pages
+8	Zone reclaim wakeup kswapd
 =	===================================

 zone_reclaim_mode is disabled by default.  For file servers or workloads
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0b7ebe9b3e2c..f11b96cec851 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -450,7 +450,8 @@ extern int sysctl_min_slab_ratio;
 static inline bool node_reclaim_enabled(void)
 {
 	/* Is any node_reclaim_mode bit set? */
-	return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
+	return node_reclaim_mode &
+	       (RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP | RECLAIM_KSWAPD);
 }

 void check_move_unevictable_folios(struct folio_batch *fbatch);
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 1f9bb10d1a47..d7c2d453a691 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -72,5 +72,6 @@ enum {
 #define RECLAIM_ZONE	(1<<0)	/* Run shrink_inactive_list on the zone */
 #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
 #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
+#define RECLAIM_KSWAPD	(1<<3)	/* Wake up kswapd during reclaim */

 #endif /* _UAPI_LINUX_MEMPOLICY_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5231320c5559..9e4aedbfd85e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7550,7 +7550,8 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order,
 	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;

-	if (alloc_flags & ALLOC_KSWAPD)
+	if ((node_reclaim_mode & RECLAIM_KSWAPD) &&
+	    (alloc_flags & ALLOC_KSWAPD))
 		wakeup_kswapd(zone, gfp_mask, order, gfp_zone(gfp_mask));

 	ret = __node_reclaim(pgdat, gfp_mask, order);
--
2.43.0
FeedBack: The patch(es) which you have sent to kernel@openeuler.org
mailing list have been converted to a pull request successfully!
Pull request link:
https://gitee.com/openeuler/kernel/pulls/18941
Mailing list address:
https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/Y26...
participants (2):
- patchwork bot
- Wupeng Ma