
Hi,

Arm processors provide a hardware PMU (Performance Monitoring Unit)
facility called the Statistical Profiling Extension (SPE) that can
gather memory access metrics. This patchset uses SPE as an access
information sampling approach to drive NUMA balancing.

This sampling approach is introduced to replace the method based on
address space scanning and hint faults with the access information
provided by the hardware. With this, NUMA balancing no longer needs to
scan the address space periodically or rely on the task-to-page
association built by NUMA hint faults. Instead, the access samples
obtained from the hardware PMU are fed to NUMA balancing as equivalents
of page faults. Apart from the replaced sampling approach, the rest of
the NUMA balancing policy is retained and migrates pages and tasks
according to the samples.

Profiling based on SPE is a valid alternative sampling approach in NUMA
balancing for optimal page and task placement. It can also be extended
to other architectures as long as there is a hardware PMU that supports
memory access profiling.

An abstraction layer, mem_sampling, is introduced to leave room for
other kernel features and different types of hardware PMU.

To help evaluate the performance of this approach on a system, sysctl
interfaces are added to enable/disable hardware memory sampling. The
NUMA balancing sampling approach can also be switched back to the
hint-fault-based approach dynamically.

TODOs
Currently, SPE-based NUMA balancing does not support PMD-level page
migration; this will be supported in a later version.

Changes since v4:
-- patch 4: introduce a helper function.
-- fix commit issue.

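For readers who want a concrete picture of "samples fed to NUMA
balancing as equivalents of page faults", here is a minimal user-space
sketch. It is not code from this series: every name prefixed with
sketch_ is hypothetical, the node count is fixed at 4 for illustration,
and the "preferred node" choice is a deliberate simplification (most
sampled node wins) rather than the real NUMA balancing policy. It only
shows the shape of the idea: a generic sampling layer hands each access
sample to a consumer callback, and the NUMA consumer accounts it the way
a hint fault would be accounted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 4	/* assumption: small fixed node count */

/* Hypothetical record for one hardware access sample. */
struct sketch_access_sample {
	uint64_t vaddr;		/* sampled virtual address */
	int cpu_node;		/* node of the CPU that issued the access */
	int page_node;		/* node currently holding the accessed page */
};

/* Hypothetical per-task statistics, standing in for hint-fault counters. */
struct sketch_numa_stats {
	unsigned long accesses[SKETCH_MAX_NODES];	/* samples per page node */
	unsigned long remote;				/* cross-node accesses */
};

/* Consumer callback type: lets features other than NUMA balancing plug in. */
typedef void (*sketch_sample_cb)(const struct sketch_access_sample *s,
				 void *ctx);

/* NUMA consumer: account one sample as if it were one hint fault. */
static void sketch_numa_record(const struct sketch_access_sample *s, void *ctx)
{
	struct sketch_numa_stats *st = ctx;

	/* Ignore samples whose page node is outside the modelled range. */
	if (s->page_node < 0 || s->page_node >= SKETCH_MAX_NODES)
		return;

	st->accesses[s->page_node]++;
	if (s->cpu_node != s->page_node)
		st->remote++;
}

/* Generic layer: deliver a batch of samples to one consumer. */
static void sketch_feed_samples(sketch_sample_cb cb, void *ctx,
				const struct sketch_access_sample *samples,
				size_t nr)
{
	assert(cb != NULL);
	for (size_t i = 0; i < nr; i++)
		cb(&samples[i], ctx);
}

/* Simplified placement hint: the node with the most sampled accesses
 * (ties go to the lowest node id). */
static int sketch_preferred_node(const struct sketch_numa_stats *st)
{
	int best = 0;

	for (int n = 1; n < SKETCH_MAX_NODES; n++)
		if (st->accesses[n] > st->accesses[best])
			best = n;
	return best;
}
```

A caller would zero-initialise a struct sketch_numa_stats, pass
sketch_numa_record as the callback to sketch_feed_samples(), and then
query sketch_preferred_node(). Keeping the consumer behind a callback
mirrors the stated goal of the abstraction layer: the same sample stream
could be handed to another consumer without changing the producer side.
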
Ze Zuo (19):
  mm_monitor/mm_spe: Introduce standalone SPE profiling framework
  mm_monitor/mm_spe: Init per-CPU buffers and SPE state
  mm_monitor/mm_spe: Add PMU based memory sampling abstract layer
  mm_monitor/mm_spe: Introduce arm_spe_user to abstract SPE usage
  mm/mem_sampling: Add eBPF interface for memory access tracing
  mm/mem_sampling: Add sched switch hook to control sampling state
  sched: Enable per-process mem_sampling from sched switch path
  mm/mem_sampling:: Add proc and cmdline interface to control sampling
    enable
  mm/numa: Use mem_sampling framework for NUMA balancing
  mm/numa: Enable mem_sampling-based access tracking for NUMA balancing
  mm/mem_sampling: Add sysctl control for NUMA balancing integration
  mm/numa: Add tracepoints for access sampling and NUMA page migration
  mm/damon/vaddr: Support hardware-assisted memory access sampling
  mm/damon/vaddr: Extend mem_sampling sysctl to support DAMON
  mm/damon/vaddr: Add demotion interface for migrating cold pages to
    target nodemask
  arm-spe: Boost SPE add TLB hot page and remote access tracking
  arm-spe: Add kernel cmdline option to enable SPE boost
  arm-spe: Export boost SPE sampling info via tracefs tracepoint
  config: Enable memory sampling based pmu for numa balance and damon by
    default

 arch/arm64/configs/openeuler_defconfig       |   4 +
 drivers/Kconfig                              |   2 +
 drivers/Makefile                             |   2 +
 drivers/arm/Kconfig                          |   2 +
 drivers/arm/mm_monitor/Kconfig               |  20 +
 drivers/arm/mm_monitor/Makefile              |   2 +
 drivers/arm/mm_monitor/mm_spe.c              | 537 +++++++++++++++
 drivers/arm/mm_monitor/mm_spe.h              | 102 +++
 drivers/arm/mm_monitor/spe-decoder/Makefile  |   2 +
 .../mm_monitor/spe-decoder/arm-spe-decoder.c | 224 +++++++
 .../mm_monitor/spe-decoder/arm-spe-decoder.h |  75 +++
 .../spe-decoder/arm-spe-pkt-decoder.c        | 227 +++++++
 .../spe-decoder/arm-spe-pkt-decoder.h        | 158 +++++
 drivers/perf/arm_pmu_acpi.c                  |  31 +
 drivers/perf/arm_spe_pmu.c                   | 112 +++-
 include/linux/damon.h                        |   8 +
 include/linux/mem_sampling.h                 | 133 ++++
 include/linux/mempolicy.h                    |   2 +
 include/linux/migrate_mode.h                 |   1 +
 include/linux/mm_types.h                     |   4 +
 include/trace/events/kmem.h                  | 112 ++++
 include/trace/events/migrate.h               |   3 +-
 kernel/fork.c                                |   3 +
 kernel/sched/core.c                          |   2 +
 kernel/sched/fair.c                          |  13 +
 mm/Kconfig                                   |  28 +
 mm/Makefile                                  |   1 +
 mm/damon/Kconfig                             |  14 +
 mm/damon/core.c                              |  34 +
 mm/damon/sysfs-schemes.c                     |  40 ++
 mm/damon/vaddr.c                             | 127 ++++
 mm/mem_sampling.c                            | 614 ++++++++++++++++++
 mm/mempolicy.c                               | 146 +++++
 33 files changed, 2783 insertions(+), 2 deletions(-)
 create mode 100644 drivers/arm/Kconfig
 create mode 100644 drivers/arm/mm_monitor/Kconfig
 create mode 100644 drivers/arm/mm_monitor/Makefile
 create mode 100644 drivers/arm/mm_monitor/mm_spe.c
 create mode 100644 drivers/arm/mm_monitor/mm_spe.h
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/Makefile
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h
 create mode 100644 include/linux/mem_sampling.h
 create mode 100644 mm/mem_sampling.c

--
2.25.1