[PATCH OLK-6.6 v2] Memory access profiler (SPE) driven NUMA balancing and DAMON

Hi,

In Arm processors, there is a hardware PMU (Performance Monitoring Unit) facility called the Statistical Profiling Extension (SPE) that can gather memory access metrics. In this patchset, SPE is exploited as an access-information sampling approach to drive NUMA balancing. This sampling approach is introduced to replace the method based on address-space scanning and hint faults with the access information provided by the hardware. With this, it is no longer necessary for NUMA balancing to scan over the address space periodically and rely on the task-to-page association built by NUMA hint faults. Instead, the access samples obtained from the hardware PMU are fed to NUMA balancing as equivalents to page faults. Apart from the replaced sampling approach, the rest of the NUMA balancing policy is retained to perform page and task migrations according to the samples.

Profiling based on SPE is a valid alternative sampling approach in NUMA balancing for optimal page and task placement. It can also be extended to other architectures as long as there is a hardware PMU that supports memory access profiling. An abstraction layer, mem_sampling, is introduced to reserve room for other kernel features and for different types of hardware PMU.

To help evaluate the performance of this approach, sysctl interfaces are added to enable/disable hardware mem sampling. The NUMA balancing sampling approach can also be switched back to the hint-fault-based approach dynamically.

TODO:
Currently, SPE for NUMA balancing does not support PMD-level page migration; it will be supported in a later version.

Changes since v4:
-- patch 4: introduce helper function.
-- fix commit issue.

Ze Zuo (19):
  mm_monitor/mm_spe: Introduce standalone SPE profiling framework
  mm_monitor/mm_spe: Init per-CPU buffers and SPE state
  mm_monitor/mm_spe: Add PMU based memory sampling abstract layer
  mm_monitor/mm_spe: Introduce arm_spe_user to abstract SPE usage
  mm/mem_sampling: Add eBPF interface for memory access tracing
  mm/mem_sampling: Add sched switch hook to control sampling state
  sched: Enable per-process mem_sampling from sched switch path
  mm/mem_sampling: Add proc and cmdline interface to control sampling enable
  mm/numa: Use mem_sampling framework for NUMA balancing
  mm/numa: Enable mem_sampling-based access tracking for NUMA balancing
  mm/mem_sampling: Add sysctl control for NUMA balancing integration
  mm/numa: Add tracepoints for access sampling and NUMA page migration
  mm/damon/vaddr: Support hardware-assisted memory access sampling
  mm/damon/vaddr: Extend mem_sampling sysctl to support DAMON
  mm/damon/vaddr: Add demotion interface for migrating cold pages to target nodemask
  arm-spe: Boost SPE add TLB hot page and remote access tracking
  arm-spe: Add kernel cmdline option to enable SPE boost
  arm-spe: Export boost SPE sampling info via tracefs tracepoint
  config: Enable memory sampling based pmu for numa balance and damon by default

 arch/arm64/configs/openeuler_defconfig        |   4 +
 drivers/Kconfig                               |   2 +
 drivers/Makefile                              |   2 +
 drivers/arm/Kconfig                           |   2 +
 drivers/arm/mm_monitor/Kconfig                |  20 +
 drivers/arm/mm_monitor/Makefile               |   2 +
 drivers/arm/mm_monitor/mm_spe.c               | 537 +++++++++++++++
 drivers/arm/mm_monitor/mm_spe.h               | 102 +++
 drivers/arm/mm_monitor/spe-decoder/Makefile   |   2 +
 .../mm_monitor/spe-decoder/arm-spe-decoder.c  | 224 +++++++
 .../mm_monitor/spe-decoder/arm-spe-decoder.h  |  75 +++
 .../spe-decoder/arm-spe-pkt-decoder.c         | 227 +++++++
 .../spe-decoder/arm-spe-pkt-decoder.h         | 158 +++++
 drivers/perf/arm_pmu_acpi.c                   |  31 +
 drivers/perf/arm_spe_pmu.c                    | 112 +++-
 include/linux/damon.h                         |   8 +
 include/linux/mem_sampling.h                  | 133 ++++
 include/linux/mempolicy.h                     |   2 +
 include/linux/migrate_mode.h                  |   1 +
 include/linux/mm_types.h                      |   4 +
 include/trace/events/kmem.h                   | 112 ++++
 include/trace/events/migrate.h                |   3 +-
 kernel/fork.c                                 |   3 +
 kernel/sched/core.c                           |   2 +
 kernel/sched/fair.c                           |  13 +
 mm/Kconfig                                    |  28 +
 mm/Makefile                                   |   1 +
 mm/damon/Kconfig                              |  14 +
 mm/damon/core.c                               |  34 +
 mm/damon/sysfs-schemes.c                      |  40 ++
 mm/damon/vaddr.c                              | 127 ++++
 mm/mem_sampling.c                             | 614 ++++++++++++++++++
 mm/mempolicy.c                                | 146 +++++
 33 files changed, 2783 insertions(+), 2 deletions(-)
 create mode 100644 drivers/arm/Kconfig
 create mode 100644 drivers/arm/mm_monitor/Kconfig
 create mode 100644 drivers/arm/mm_monitor/Makefile
 create mode 100644 drivers/arm/mm_monitor/mm_spe.c
 create mode 100644 drivers/arm/mm_monitor/mm_spe.h
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/Makefile
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c
 create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h
 create mode 100644 include/linux/mem_sampling.h
 create mode 100644 mm/mem_sampling.c
--
2.25.1
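To make the intended equivalence concrete: each hardware access sample is meant to carry the same task/address information that a NUMA hint fault would otherwise have produced, so the unchanged placement policy (ultimately task_numa_fault() in kernel/sched/fair.c) can consume it directly. The sketch below is illustrative only; the structure and helper names are assumptions made for this cover letter, not code from the series.

/* Illustrative sketch only -- the names below are hypothetical. */
#include <linux/types.h>
#include <linux/sched.h>

struct hw_access_sample {		/* hypothetical */
	u64 virt_addr;			/* virtual address the task touched */
	int cpu;			/* CPU (and hence node) issuing the access */
};

/*
 * Feed one PMU sample into NUMA balancing in place of a hint fault: the
 * sample identifies the accessing task and address, and the existing
 * policy (e.g. task_numa_fault()) still decides on page/task migration.
 */
static void numa_feed_access_sample(struct task_struct *p,
				    struct hw_access_sample *s)	/* hypothetical */
{
	/*
	 * Policy reuse: account the access as if @p had hint-faulted on
	 * s->virt_addr from s->cpu; no scanning or fault trapping needed.
	 */
}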

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- ARMv8.2+ processors provide Statistical Profiling Extension (SPE), a hardware feature for low-overhead memory access pattern tracking. Unlike traditional sampling methods, SPE employs statistical approaches to capture memory operations with minimal performance impact. This patch introduces a standalone SPE (Statistical Profiling Extension) data collection framework for ARM processors. The framework sets up per-CPU buffers: - One for capturing raw SPE trace data, - One for decoded profiling records. This enables memory access tracking and prepares for future extensions leveraging SPE for detailed profiling or performance analysis. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- drivers/Kconfig | 2 + drivers/Makefile | 2 + drivers/arm/Kconfig | 2 + drivers/arm/mm_monitor/Kconfig | 20 + drivers/arm/mm_monitor/Makefile | 2 + drivers/arm/mm_monitor/mm_spe.c | 452 ++++++++++++++++++ drivers/arm/mm_monitor/mm_spe.h | 69 +++ drivers/arm/mm_monitor/spe-decoder/Makefile | 2 + .../mm_monitor/spe-decoder/arm-spe-decoder.c | 212 ++++++++ .../mm_monitor/spe-decoder/arm-spe-decoder.h | 72 +++ .../spe-decoder/arm-spe-pkt-decoder.c | 227 +++++++++ .../spe-decoder/arm-spe-pkt-decoder.h | 153 ++++++ drivers/perf/arm_pmu_acpi.c | 31 ++ drivers/perf/arm_spe_pmu.c | 4 + 14 files changed, 1250 insertions(+) create mode 100644 drivers/arm/Kconfig create mode 100644 drivers/arm/mm_monitor/Kconfig create mode 100644 drivers/arm/mm_monitor/Makefile create mode 100644 drivers/arm/mm_monitor/mm_spe.c create mode 100644 drivers/arm/mm_monitor/mm_spe.h create mode 100644 drivers/arm/mm_monitor/spe-decoder/Makefile create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c create mode 100644 drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h diff --git a/drivers/Kconfig b/drivers/Kconfig index 3be1197d872c..2b65435015d7 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -251,4 +251,6 @@ source "drivers/roh/Kconfig" source "drivers/coda/Kconfig" +source "drivers/arm/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index 3955e605df14..79d803250002 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -205,3 +205,5 @@ obj-$(CONFIG_S390) += s390/ obj-$(CONFIG_ROH) += roh/ obj-$(CONFIG_HISI_VIRTCCA_CODA) += coda/ + +obj-$(CONFIG_ARM_SPE_MEM_SAMPLING) += arm/mm_monitor/ diff --git a/drivers/arm/Kconfig b/drivers/arm/Kconfig new file mode 100644 index 000000000000..d3291f0d5d57 --- /dev/null +++ b/drivers/arm/Kconfig @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +source "drivers/arm/mm_monitor/Kconfig" diff --git a/drivers/arm/mm_monitor/Kconfig b/drivers/arm/mm_monitor/Kconfig new file mode 100644 index 000000000000..417b403ecffc --- /dev/null +++ b/drivers/arm/mm_monitor/Kconfig @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# arm spe dirver +# +config ARM_SPE_MEM_SAMPLING + bool "In-kernel SPE for driver for page access profiling" + depends on ARM_SPE_PMU + default n + help + Enable support for the ARMv8.2 Statistical Profiling Extension (SPE), + which provides periodic sampling of memory accesses and operations + in the CPU pipeline. 
This extension allows the driver to monitor + memory access patterns, which can help with performance tuning, + debugging, and analyzing memory-related bottlenecks. + + This feature is only available on ARM64 architecture and will fall + back to the native software sampling mechanism if the ARM SPE PMU + (Performance Monitoring Unit) is in use. When enabled, this + configuration will activate the in-kernel driver to collect profiling + data on page-level memory accesses. diff --git a/drivers/arm/mm_monitor/Makefile b/drivers/arm/mm_monitor/Makefile new file mode 100644 index 000000000000..9b0b1f18a529 --- /dev/null +++ b/drivers/arm/mm_monitor/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_ARM_SPE_MEM_SAMPLING) += mm_spe.o spe-decoder/arm-spe-decoder.o spe-decoder/arm-spe-pkt-decoder.o diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c new file mode 100644 index 000000000000..f2f2b3320357 --- /dev/null +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -0,0 +1,452 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * mm_spe.c: Arm Statistical Profiling Extensions support + * Copyright (c) 2019-2020, Arm Ltd. + * Copyright (c) 2024-2025, Huawei Technologies Ltd. + */ + +#define PMUNAME "mm_spe" +#define DRVNAME PMUNAME "_driver" +#define pr_fmt(fmt) DRVNAME ": " fmt + +#include <linux/of_device.h> +#include <linux/perf/arm_pmu.h> + +#include "spe-decoder/arm-spe-decoder.h" +#include "spe-decoder/arm-spe-pkt-decoder.h" +#include "mm_spe.h" + +static struct mm_spe *spe; + +#define SPE_INIT_FAIL 0 +#define SPE_INIT_READY 1 +#define SPE_INIT_SUCC 2 +static int spe_probe_status = SPE_INIT_FAIL; + +#define SPE_PMU_FEAT_FILT_EVT (1UL << 0) +#define SPE_PMU_FEAT_FILT_TYP (1UL << 1) +#define SPE_PMU_FEAT_FILT_LAT (1UL << 2) +#define SPE_PMU_FEAT_ARCH_INST (1UL << 3) +#define SPE_PMU_FEAT_LDS (1UL << 4) +#define SPE_PMU_FEAT_ERND (1UL << 5) +#define SPE_PMU_FEAT_INV_FILT_EVT (1UL << 6) +#define SPE_PMU_FEAT_DEV_PROBED (1UL << 63) + +DEFINE_PER_CPU(struct mm_spe_buf, per_cpu_spe_buf); + +int mm_spe_percpu_buffer_alloc(int cpu) +{ + struct mm_spe_buf *spe_buf = &per_cpu(per_cpu_spe_buf, cpu); + void *alloc_base; + + if (spe_buf->base && spe_buf->record_base) + return 0; + + /* alloc spe raw data buffer */ + alloc_base = kzalloc_node(SPE_BUFFER_MAX_SIZE, GFP_KERNEL, cpu_to_node(cpu)); + if (unlikely(!alloc_base)) { + pr_err("alloc spe raw data buffer failed.\n"); + return -ENOMEM; + } + + spe_buf->base = alloc_base; + + spe_buf->size = SPE_BUFFER_SIZE; + spe_buf->cur = alloc_base + SPE_BUFFER_MAX_SIZE - SPE_BUFFER_SIZE; + spe_buf->period = SPE_SAMPLE_PERIOD; + + /* alloc record buffer */ + spe_buf->record_size = SPE_RECORD_ENTRY_SIZE * SPE_RECORD_BUFFER_MAX_RECORDS; + spe_buf->record_base = kzalloc_node(spe_buf->record_size, GFP_KERNEL, cpu_to_node(cpu)); + if (unlikely(!spe_buf->record_base)) { + kfree(alloc_base); + pr_err("alloc spe record buffer failed.\n"); + return -ENOMEM; + } + return 0; +} +EXPORT_SYMBOL_GPL(mm_spe_percpu_buffer_alloc); + +int mm_spe_buffer_alloc(void) +{ + int cpu, ret = 0; + cpumask_t *mask = &spe->supported_cpus; + + for_each_possible_cpu(cpu) { + if (!cpumask_test_cpu(cpu, mask)) + continue; + ret = mm_spe_percpu_buffer_alloc(cpu); + if (ret) + return ret; + } + return ret; +} +EXPORT_SYMBOL_GPL(mm_spe_buffer_alloc); + +void mm_spe_percpu_buffer_free(int cpu) +{ + struct mm_spe_buf *spe_buf = &per_cpu(per_cpu_spe_buf, cpu); + + if (!spe_buf->base) + return; + + kfree(spe_buf->base); + spe_buf->cur = NULL; + spe_buf->base = NULL; 
+ spe_buf->size = 0; + + kfree(spe_buf->record_base); + spe_buf->record_base = NULL; + spe_buf->record_size = 0; +} +EXPORT_SYMBOL_GPL(mm_spe_percpu_buffer_free); + +void mm_spe_buffer_free(void) +{ + cpumask_t *mask = &spe->supported_cpus; + int cpu; + + for_each_possible_cpu(cpu) { + if (!cpumask_test_cpu(cpu, mask)) + continue; + mm_spe_percpu_buffer_free(cpu); + } + spe_probe_status -= 1; +} +EXPORT_SYMBOL_GPL(mm_spe_buffer_free); + +static void mm_spe_buffer_init(void) +{ + u64 base, limit; + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + + if (!spe_buf || !spe_buf->cur || !spe_buf->size) { + /* + * We still need to clear the limit pointer, since the + * profiler might only be disabled by virtue of a fault. + */ + limit = 0; + goto out_write_limit; + } + + base = (u64)spe_buf->cur; + limit = ((u64)spe_buf->cur + spe_buf->size) | PMBLIMITR_EL1_E; + write_sysreg_s(base, SYS_PMBPTR_EL1); + +out_write_limit: + write_sysreg_s(limit, SYS_PMBLIMITR_EL1); +} + +void mm_spe_add_probe_status(void) +{ + spe_probe_status += 1; +} +EXPORT_SYMBOL_GPL(mm_spe_add_probe_status); + +static void mm_spe_disable_and_drain_local(void) +{ + /* Disable profiling at EL0 and EL1 */ + write_sysreg_s(0, SYS_PMSCR_EL1); + isb(); + + /* Drain any buffered data */ + psb_csync(); + dsb(nsh); + + /* Disable the profiling buffer */ + write_sysreg_s(0, SYS_PMBLIMITR_EL1); + isb(); +} + +static u64 mm_spe_to_pmsfcr(void) +{ + u64 reg = 0; + + if (spe->load_filter) + reg |= PMSFCR_EL1_LD; + + if (spe->store_filter) + reg |= PMSFCR_EL1_ST; + + if (spe->branch_filter) + reg |= PMSFCR_EL1_B; + + if (reg) + reg |= PMSFCR_EL1_FT; + + if (spe->event_filter) + reg |= PMSFCR_EL1_FE; + + if (spe->inv_event_filter) + reg |= PMSFCR_EL1_FnE; + + if (spe->min_latency) + reg |= PMSFCR_EL1_FL; + + return reg; +} + +static u64 mm_spe_to_pmsevfr(void) +{ + return spe->event_filter; +} + +static u64 mm_spe_to_pmsnevfr(void) +{ + return spe->inv_event_filter; +} + +static u64 mm_spe_to_pmslatfr(void) +{ + return spe->min_latency; +} + +static void mm_spe_sanitise_period(struct mm_spe_buf *spe_buf) +{ + u64 period = spe_buf->period; + u64 max_period = PMSIRR_EL1_INTERVAL_MASK; + + if (period < spe->min_period) + period = spe->min_period; + else if (period > max_period) + period = max_period; + else + period &= max_period; + + spe_buf->period = period; +} + +static u64 mm_spe_to_pmsirr(void) +{ + u64 reg = 0; + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + + mm_spe_sanitise_period(spe_buf); + + if (spe->jitter) + reg |= 0x1; + + reg |= spe_buf->period << 8; + + return reg; +} + +static u64 mm_spe_to_pmscr(void) +{ + u64 reg = 0; + + if (spe->ts_enable) + reg |= PMSCR_EL1_TS; + + if (spe->pa_enable) + reg |= PMSCR_EL1_PA; + + if (spe->pct_enable < 0x4) + reg |= spe->pct_enable << 6; + + if (spe->exclude_user) + reg |= PMSCR_EL1_E0SPE; + + if (spe->exclude_kernel) + reg |= PMSCR_EL1_E1SPE; + + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR)) + reg |= PMSCR_EL1_CX; + + return reg; +} + +int mm_spe_start(void) +{ + u64 reg; + int cpu = smp_processor_id(); + + if (!cpumask_test_cpu(cpu, &spe->supported_cpus)) + return -ENOENT; + + mm_spe_buffer_init(); + + reg = mm_spe_to_pmsfcr(); + write_sysreg_s(reg, SYS_PMSFCR_EL1); + + reg = mm_spe_to_pmsevfr(); + write_sysreg_s(reg, SYS_PMSEVFR_EL1); + + if (spe->features & SPE_PMU_FEAT_INV_FILT_EVT) { + reg = mm_spe_to_pmsnevfr(); + write_sysreg_s(reg, SYS_PMSNEVFR_EL1); + } + + reg = mm_spe_to_pmslatfr(); + + write_sysreg_s(reg, SYS_PMSLATFR_EL1); + + reg = mm_spe_to_pmsirr(); + 
write_sysreg_s(reg, SYS_PMSIRR_EL1); + isb(); + + reg = mm_spe_to_pmscr(); + isb(); + write_sysreg_s(reg, SYS_PMSCR_EL1); + return 0; +} + +void mm_spe_continue(void) +{ + int reg; + + mm_spe_buffer_init(); + + reg = mm_spe_to_pmscr(); + + isb(); + write_sysreg_s(reg, SYS_PMSCR_EL1); +} + +void mm_spe_stop(void) +{ + mm_spe_disable_and_drain_local(); +} + +void mm_spe_decoding(void) +{ + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + + spe_buf->nr_records = 0; + arm_spe_decode_buf(spe_buf->cur, spe_buf->size); +} + +struct mm_spe_buf *mm_spe_getbuf_addr(void) +{ + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + + return spe_buf; +} + +int mm_spe_getnum_record(void) +{ + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + + return spe_buf->nr_records; +} + +struct mm_spe *mm_spe_get_desc(void) +{ + return spe; +} +EXPORT_SYMBOL_GPL(mm_spe_get_desc); + +int mm_spe_enabled(void) +{ + return spe_probe_status == SPE_INIT_SUCC; +} + +static const struct of_device_id mm_spe_sample_para_init_tb[] = { + { .compatible = "arm,statistical-profiling-extension-v1", + .data = (void *)1 }, + { /* Sentinel */ }, +}; +MODULE_DEVICE_TABLE(of, mm_spe_sample_para_init_tb); + +static const struct platform_device_id mm_spe_match[] = { + { ARMV8_SPE_MEM_SAMPLING_PDEV_NAME, 0 }, + {} +}; +MODULE_DEVICE_TABLE(platform, mm_spe_match); + +static void mm_spe_sample_para_init(void) +{ + spe->sample_period = SPE_SAMPLE_PERIOD; + spe->jitter = 1; + spe->load_filter = 1; + spe->store_filter = 1; + spe->branch_filter = 0; + spe->inv_event_filter = 0; + spe->event_filter = 0x2; + + spe->ts_enable = 0; + spe->pa_enable = 1; + spe->pct_enable = 0; + + spe->exclude_user = 1; + spe->exclude_kernel = 0; + + spe->min_latency = 120; +} + +void mm_spe_record_enqueue(struct arm_spe_record *record) +{ + struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); + struct arm_spe_record *record_tail; + + if (spe_buf->nr_records >= SPE_RECORD_BUFFER_MAX_RECORDS) { + pr_err("nr_records exceeded!\n"); + return; + } + + record_tail = spe_buf->record_base + + spe_buf->nr_records * SPE_RECORD_ENTRY_SIZE; + *record_tail = *(struct arm_spe_record *)record; + spe_buf->nr_records++; +} + +static int mm_spe_device_probe(struct platform_device *pdev) +{ + + struct device *dev; + + /* + * If kernelspace is unmapped when running at EL0, then the SPE + * buffer will fault and prematurely terminate the AUX session. + */ + if (arm64_kernel_unmapped_at_el0()) { + dev_warn_once(dev, "buffer inaccessible. 
Try passing \"kpti=off\" on the kernel command line\n"); + return -EPERM; + } + + if (!pdev) { + pr_err("pdev is NULL!\n"); + return -ENODEV; + } + + dev = &pdev->dev; + if (!dev) { + pr_err("dev is NULL!\n"); + return -ENODEV; + } + + spe = devm_kzalloc(dev, sizeof(*spe), GFP_KERNEL); + if (!spe) + return -ENOMEM; + + spe->pdev = pdev; + platform_set_drvdata(pdev, spe); + + mm_spe_sample_para_init(); + + mm_spe_add_probe_status(); + return 0; + +} + +static struct platform_driver mm_spe_driver = { + .id_table = mm_spe_match, + .driver = { + .name = DRVNAME, + .of_match_table = of_match_ptr(mm_spe_sample_para_init_tb), + .suppress_bind_attrs = true, + }, + .probe = mm_spe_device_probe, +}; + +static int __init mm_spe_init(void) +{ + return platform_driver_register(&mm_spe_driver); +} + +static void __exit arm_spe_exit(void) +{ + platform_driver_unregister(&mm_spe_driver); +} + +subsys_initcall(mm_spe_init); diff --git a/drivers/arm/mm_monitor/mm_spe.h b/drivers/arm/mm_monitor/mm_spe.h new file mode 100644 index 000000000000..da134e8794c8 --- /dev/null +++ b/drivers/arm/mm_monitor/mm_spe.h @@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __SPE_H +#define __SPE_H + +#define SPE_BUFFER_MAX_SIZE (PAGE_SIZE) +#define SPE_BUFFER_SIZE (PAGE_SIZE / 32) + +#define SPE_SAMPLE_PERIOD 1024 + +#define SPE_RECORD_BUFFER_MAX_RECORDS (100) +#define SPE_RECORD_ENTRY_SIZE sizeof(struct arm_spe_record) +#define ARMV8_SPE_MEM_SAMPLING_PDEV_NAME "arm,mm_spe,spe-v1" + +struct mm_spe { + struct pmu pmu; + struct platform_device *pdev; + cpumask_t supported_cpus; + struct hlist_node hotplug_node; + int irq; /* PPI */ + u16 pmsver; + u16 min_period; + u16 counter_sz; + u64 features; + u16 max_record_sz; + u16 align; + u64 sample_period; + local64_t period_left; + bool jitter; + bool load_filter; + bool store_filter; + bool branch_filter; + u64 inv_event_filter; + u16 min_latency; + u64 event_filter; + bool ts_enable; + bool pa_enable; + u8 pct_enable; + bool exclude_user; + bool exclude_kernel; +}; + +struct mm_spe_buf { + void *cur; /* for spe raw data buffer */ + int size; + int period; + void *base; + + void *record_base; /* for spe record buffer */ + int record_size; + int nr_records; +}; + +#ifdef CONFIG_ARM_SPE_MEM_SAMPLING +void mm_spe_add_probe_status(void); +int mm_spe_percpu_buffer_alloc(int cpu); +int mm_spe_buffer_alloc(void); +void mm_spe_percpu_buffer_free(int cpu); +void mm_spe_buffer_free(void); +struct mm_spe *mm_spe_get_desc(void); +#else +static inline void mm_spe_add_probe_status(void) { } +static inline int mm_spe_percpu_buffer_alloc(int cpu) { return 0; } +static inline int mm_spe_buffer_alloc(void) { return 0; } +static inline void mm_spe_percpu_buffer_free(int cpu) { } +static inline void mm_spe_buffer_free(void) { } +static inline struct mm_spe *mm_spe_get_desc(void) { return NULL; } +#endif +#endif /* __SPE_H */ diff --git a/drivers/arm/mm_monitor/spe-decoder/Makefile b/drivers/arm/mm_monitor/spe-decoder/Makefile new file mode 100644 index 000000000000..4fdae5d38186 --- /dev/null +++ b/drivers/arm/mm_monitor/spe-decoder/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-y := arm-spe-decoder.o arm-spe-pkt-decoder.o diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c new file mode 100644 index 000000000000..d84d01f8bb07 --- /dev/null +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c @@ -0,0 +1,212 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * arm_spe_decoder.c: ARM SPE support + 
* Copyright (c) 2017-2018, Arm Ltd. + * Copyright (c) 2024-2025, Huawei Technologies Ltd. + */ + +#include <linux/mm.h> +#include <linux/kernel.h> +#include <linux/errno.h> +#include <linux/string.h> +#include <linux/bitops.h> +#include <linux/compiler.h> +#include <linux/slab.h> + +#include "arm-spe-decoder.h" + +static u64 arm_spe_calc_ip(int index, u64 payload) +{ + u64 ns, el, val; + u32 seen_idx; + + /* Instruction virtual address or Branch target address */ + if (index == SPE_ADDR_PKT_HDR_INDEX_INS || + index == SPE_ADDR_PKT_HDR_INDEX_BRANCH) { + ns = SPE_ADDR_PKT_GET_NS(payload); + el = SPE_ADDR_PKT_GET_EL(payload); + + /* Clean highest byte */ + payload = SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(payload); + + /* Fill highest byte for EL1 or EL2 (VHE) mode */ + if (ns && (el == SPE_ADDR_PKT_EL1 || el == SPE_ADDR_PKT_EL2)) + payload |= 0xffULL << SPE_ADDR_PKT_ADDR_BYTE7_SHIFT; + + /* Data access virtual address */ + } else if (index == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT) { + + /* Clean tags */ + payload = SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(payload); + + /* + * Armv8 ARM (ARM DDI 0487F.c), chapter "D10.2.1 Address packet" + * defines the data virtual address payload format, the top byte + * (bits [63:56]) is assigned as top-byte tag; so we only can + * retrieve address value from bits [55:0]. + * + * According to Documentation/arm64/memory.rst, if detects the + * specific pattern in bits [55:52] of payload which falls in + * the kernel space, should fixup the top byte and this allows + * perf tool to parse DSO symbol for data address correctly. + * + * For this reason, if detects the bits [55:52] is 0xf, will + * fill 0xff into the top byte. + */ + val = SPE_ADDR_PKT_ADDR_GET_BYTE_6(payload); + if ((val & 0xf0ULL) == 0xf0ULL) + payload |= 0xffULL << SPE_ADDR_PKT_ADDR_BYTE7_SHIFT; + + /* Data access physical address */ + } else if (index == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS) { + /* Clean highest byte */ + payload = SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(payload); + } else { + seen_idx = 0; + if (!(seen_idx & BIT(index))) { + seen_idx |= BIT(index); + pr_warn("ignoring unsupported address packet index: 0x%x\n", index); + } + } + + return payload; +} + +void arm_spe_decoder_free(struct arm_spe_decoder *decoder) +{ + kfree(decoder); +} + +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder) +{ + int ret; + + do { + if (!decoder->len) + return 0; + + ret = arm_spe_get_packet(decoder->buf, decoder->len, + &decoder->packet); + if (ret <= 0) { + /* Move forward for 1 byte */ + decoder->buf += 1; + decoder->len -= 1; + return -EBADMSG; + } + + decoder->buf += ret; + decoder->len -= ret; + } while (decoder->packet.type == ARM_SPE_PAD); + return 1; +} + +static int arm_spe_read_record(struct arm_spe_decoder *decoder) +{ + int err; + int idx; + u64 payload, ip; + + memset(&decoder->record, 0x0, sizeof(decoder->record)); + decoder->record.context_id = (u64)-1; + while (1) { + err = arm_spe_get_next_packet(decoder); + if (err <= 0) + return err; + + idx = decoder->packet.index; + payload = decoder->packet.payload; + + switch (decoder->packet.type) { + case ARM_SPE_TIMESTAMP: + decoder->record.timestamp = payload; + return 1; + case ARM_SPE_END: + return 1; + case ARM_SPE_ADDRESS: + ip = arm_spe_calc_ip(idx, payload); + if (idx == SPE_ADDR_PKT_HDR_INDEX_INS) + decoder->record.from_ip = ip; + else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH) + decoder->record.to_ip = ip; + else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT) + decoder->record.virt_addr = ip; + else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS) + 
decoder->record.phys_addr = ip; + break; + case ARM_SPE_COUNTER: + if (idx == SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT) + decoder->record.latency = payload; + break; + case ARM_SPE_CONTEXT: + decoder->record.context_id = payload; + break; + case ARM_SPE_OP_TYPE: + if (idx == SPE_OP_PKT_HDR_CLASS_LD_ST_ATOMIC) { + if (payload & 0x1) + decoder->record.op = ARM_SPE_ST; + else + decoder->record.op = ARM_SPE_LD; + } + break; + case ARM_SPE_EVENTS: + if (payload & BIT(EV_L1D_REFILL)) + decoder->record.type |= ARM_SPE_L1D_MISS; + + if (payload & BIT(EV_L1D_ACCESS)) + decoder->record.type |= ARM_SPE_L1D_ACCESS; + + if (payload & BIT(EV_TLB_WALK)) + decoder->record.type |= ARM_SPE_TLB_MISS; + + if (payload & BIT(EV_TLB_ACCESS)) + decoder->record.type |= ARM_SPE_TLB_ACCESS; + + if (payload & BIT(EV_LLC_MISS)) + decoder->record.type |= ARM_SPE_LLC_MISS; + + if (payload & BIT(EV_LLC_ACCESS)) + decoder->record.type |= ARM_SPE_LLC_ACCESS; + + if (payload & BIT(EV_REMOTE_ACCESS)) + decoder->record.type |= ARM_SPE_REMOTE_ACCESS; + + if (payload & BIT(EV_MISPRED)) + decoder->record.type |= ARM_SPE_BRANCH_MISS; + + break; + case ARM_SPE_DATA_SOURCE: + decoder->record.source = payload; + break; + case ARM_SPE_BAD: + break; + case ARM_SPE_PAD: + break; + default: + pr_err("Get packet error!\n"); + return -1; + } + } + return 0; +} + +static bool arm_spe_decode(struct arm_spe_decoder *decoder) +{ + if (decoder->len) { + if (arm_spe_read_record(decoder) == 1) + return true; + } + return false; +} + +void arm_spe_decode_buf(const unsigned char *buf, size_t len) +{ + struct arm_spe_decoder decoder; + + decoder.buf = buf; + decoder.len = len; + + while (arm_spe_decode(&decoder)) + mm_spe_record_enqueue(&(decoder.record)); + +} +EXPORT_SYMBOL(arm_spe_decode_buf); diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h new file mode 100644 index 000000000000..3af4a15107f0 --- /dev/null +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h @@ -0,0 +1,72 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * arm_spe_decoder.h: Arm Statistical Profiling Extensions support + * Copyright (c) 2019-2020, Arm Ltd. 
+ */ + +#ifndef INCLUDE__ARM_SPE_DECODER_H__ +#define INCLUDE__ARM_SPE_DECODER_H__ + +#include <linux/stddef.h> + +#include "arm-spe-pkt-decoder.h" + +enum arm_spe_sample_type { + ARM_SPE_L1D_ACCESS = 1 << 0, + ARM_SPE_L1D_MISS = 1 << 1, + ARM_SPE_LLC_ACCESS = 1 << 2, + ARM_SPE_LLC_MISS = 1 << 3, + ARM_SPE_TLB_ACCESS = 1 << 4, + ARM_SPE_TLB_MISS = 1 << 5, + ARM_SPE_BRANCH_MISS = 1 << 6, + ARM_SPE_REMOTE_ACCESS = 1 << 7, +}; + +enum arm_spe_op_type { + ARM_SPE_LD = 1 << 0, + ARM_SPE_ST = 1 << 1, +}; + +enum arm_spe_neoverse_data_source { + ARM_SPE_NV_L1D = 0x0, + ARM_SPE_NV_L2 = 0x8, + ARM_SPE_NV_PEER_CORE = 0x9, + ARM_SPE_NV_LOCAL_CLUSTER = 0xa, + ARM_SPE_NV_SYS_CACHE = 0xb, + ARM_SPE_NV_PEER_CLUSTER = 0xc, + ARM_SPE_NV_REMOTE = 0xd, + ARM_SPE_NV_DRAM = 0xe, +}; + +struct arm_spe_record { + enum arm_spe_sample_type type; + int err; + u32 op; + u32 latency; + u64 from_ip; + u64 to_ip; + u64 timestamp; + u64 virt_addr; + u64 phys_addr; + u64 context_id; + u16 source; +}; + +struct arm_spe_buffer { + const unsigned char *buf; + size_t len; + u64 offset; + u64 trace_nr; +}; + +struct arm_spe_decoder { + struct arm_spe_record record; + const unsigned char *buf; + size_t len; + struct arm_spe_pkt packet; +}; + +void arm_spe_decoder_free(struct arm_spe_decoder *decoder); +void arm_spe_decode_buf(const unsigned char *buf, size_t len); +void mm_spe_record_enqueue(struct arm_spe_record *record); +#endif diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c new file mode 100644 index 000000000000..aeec43448779 --- /dev/null +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.c @@ -0,0 +1,227 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Arm Statistical Profiling Extensions (SPE) support + * Copyright (c) 2017-2018, Arm Ltd. 
+ */ + +#include <linux/kernel.h> +#include <linux/printk.h> +#include <linux/string.h> +#include <linux/bitops.h> +#include <linux/byteorder/generic.h> + +#include "arm-spe-pkt-decoder.h" + +/* + * Extracts the field "sz" from header bits and converts to bytes: + * 00 : byte (1) + * 01 : halfword (2) + * 10 : word (4) + * 11 : doubleword (8) + */ +static unsigned int arm_spe_payload_len(unsigned char hdr) +{ + return 1U << ((hdr & GENMASK_ULL(5, 4)) >> 4); +} + +static int arm_spe_get_payload(const unsigned char *buf, size_t len, + unsigned char ext_hdr, + struct arm_spe_pkt *packet) +{ + size_t payload_len = arm_spe_payload_len(buf[ext_hdr]); + + if (len < 1 + ext_hdr + payload_len) + return ARM_SPE_NEED_MORE_BYTES; + + buf += 1 + ext_hdr; + + switch (payload_len) { + case 1: + packet->payload = *(uint8_t *)buf; + break; + case 2: + packet->payload = le16_to_cpu(*(uint16_t *)buf); + break; + case 4: + packet->payload = le32_to_cpu(*(uint32_t *)buf); + break; + case 8: + packet->payload = le64_to_cpu(*(uint64_t *)buf); + break; + default: + return ARM_SPE_BAD_PACKET; + } + + return 1 + ext_hdr + payload_len; +} + +static int arm_spe_get_pad(struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_PAD; + return 1; +} + +static int arm_spe_get_alignment(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + unsigned int alignment = 1 << ((buf[0] & 0xf) + 1); + + if (len < alignment) + return ARM_SPE_NEED_MORE_BYTES; + + packet->type = ARM_SPE_PAD; + return alignment - (((uintptr_t)buf) & (alignment - 1)); +} + +static int arm_spe_get_end(struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_END; + return 1; +} + +static int arm_spe_get_timestamp(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_TIMESTAMP; + return arm_spe_get_payload(buf, len, 0, packet); +} + +static int arm_spe_get_events(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_EVENTS; + + /* we use index to identify Events with a less number of + * comparisons in arm_spe_pkt_desc(): E.g., the LLC-ACCESS, + * LLC-REFILL, and REMOTE-ACCESS events are identified if + * index > 1. 
+ */ + packet->index = arm_spe_payload_len(buf[0]); + + return arm_spe_get_payload(buf, len, 0, packet); +} + +static int arm_spe_get_data_source(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_DATA_SOURCE; + return arm_spe_get_payload(buf, len, 0, packet); +} + +static int arm_spe_get_context(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_CONTEXT; + packet->index = SPE_CTX_PKT_HDR_INDEX(buf[0]); + return arm_spe_get_payload(buf, len, 0, packet); +} + +static int arm_spe_get_op_type(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_OP_TYPE; + packet->index = SPE_OP_PKT_HDR_CLASS(buf[0]); + return arm_spe_get_payload(buf, len, 0, packet); +} + +static int arm_spe_get_counter(const unsigned char *buf, size_t len, + const unsigned char ext_hdr, struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_COUNTER; + + if (ext_hdr) + packet->index = SPE_HDR_EXTENDED_INDEX(buf[0], buf[1]); + else + packet->index = SPE_HDR_SHORT_INDEX(buf[0]); + + return arm_spe_get_payload(buf, len, ext_hdr, packet); +} + +static int arm_spe_get_addr(const unsigned char *buf, size_t len, + const unsigned char ext_hdr, struct arm_spe_pkt *packet) +{ + packet->type = ARM_SPE_ADDRESS; + + if (ext_hdr) + packet->index = SPE_HDR_EXTENDED_INDEX(buf[0], buf[1]); + else + packet->index = SPE_HDR_SHORT_INDEX(buf[0]); + + return arm_spe_get_payload(buf, len, ext_hdr, packet); +} + +static int arm_spe_do_get_packet(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + unsigned int hdr; + unsigned char ext_hdr = 0; + + memset(packet, 0, sizeof(struct arm_spe_pkt)); + + if (!len) + return ARM_SPE_NEED_MORE_BYTES; + + hdr = buf[0]; + + if (hdr == SPE_HEADER0_PAD) + return arm_spe_get_pad(packet); + + if (hdr == SPE_HEADER0_END) /* no timestamp at end of record */ + return arm_spe_get_end(packet); + + if (hdr == SPE_HEADER0_TIMESTAMP) + return arm_spe_get_timestamp(buf, len, packet); + + if ((hdr & SPE_HEADER0_MASK1) == SPE_HEADER0_EVENTS) + return arm_spe_get_events(buf, len, packet); + + if ((hdr & SPE_HEADER0_MASK1) == SPE_HEADER0_SOURCE) + return arm_spe_get_data_source(buf, len, packet); + + if ((hdr & SPE_HEADER0_MASK2) == SPE_HEADER0_CONTEXT) + return arm_spe_get_context(buf, len, packet); + + if ((hdr & SPE_HEADER0_MASK2) == SPE_HEADER0_OP_TYPE) + return arm_spe_get_op_type(buf, len, packet); + + if ((hdr & SPE_HEADER0_MASK2) == SPE_HEADER0_EXTENDED) { + /* 16-bit extended format header */ + if (len == 1) + return ARM_SPE_BAD_PACKET; + + ext_hdr = 1; + hdr = buf[1]; + if (hdr == SPE_HEADER1_ALIGNMENT) + return arm_spe_get_alignment(buf, len, packet); + } + + /* + * The short format header's byte 0 or the extended format header's + * byte 1 has been assigned to 'hdr', which uses the same encoding for + * address packet and counter packet, so don't need to distinguish if + * it's short format or extended format and handle in once. + */ + if ((hdr & SPE_HEADER0_MASK3) == SPE_HEADER0_ADDRESS) + return arm_spe_get_addr(buf, len, ext_hdr, packet); + + if ((hdr & SPE_HEADER0_MASK3) == SPE_HEADER0_COUNTER) + return arm_spe_get_counter(buf, len, ext_hdr, packet); + + return ARM_SPE_BAD_PACKET; +} + +int arm_spe_get_packet(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet) +{ + int ret; + + ret = arm_spe_do_get_packet(buf, len, packet); + /* put multiple consecutive PADs on the same line, up to + * the fixed-width output format of 16 bytes per line. 
+ */ + if (ret > 0 && packet->type == ARM_SPE_PAD) { + while (ret < 16 && len > (size_t)ret && !buf[ret]) + ret += 1; + } + return ret; +} diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h new file mode 100644 index 000000000000..1a67b580b47f --- /dev/null +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h @@ -0,0 +1,153 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Arm Statistical Profiling Extensions (SPE) support + * Copyright (c) 2017-2018, Arm Ltd. + */ + +#ifndef INCLUDE__ARM_SPE_PKT_DECODER_H__ +#define INCLUDE__ARM_SPE_PKT_DECODER_H__ + +#include <linux/stddef.h> + +#define ARM_SPE_PKT_DESC_MAX 256 +#define ARM_SPE_NEED_MORE_BYTES -1 +#define ARM_SPE_BAD_PACKET -2 +#define ARM_SPE_PKT_MAX_SZ 16 + +enum arm_spe_pkt_type { + ARM_SPE_BAD, + ARM_SPE_PAD, + ARM_SPE_END, + ARM_SPE_TIMESTAMP, + ARM_SPE_ADDRESS, + ARM_SPE_COUNTER, + ARM_SPE_CONTEXT, + ARM_SPE_OP_TYPE, + ARM_SPE_EVENTS, + ARM_SPE_DATA_SOURCE, +}; + +struct arm_spe_pkt { + enum arm_spe_pkt_type type; + unsigned char index; + uint64_t payload; +}; + +/* Short header (HEADER0) and extended header (HEADER1) */ +#define SPE_HEADER0_PAD 0x0 +#define SPE_HEADER0_END 0x1 +#define SPE_HEADER0_TIMESTAMP 0x71 +/* Mask for event & data source */ +#define SPE_HEADER0_MASK1 (GENMASK_ULL(7, 6) | GENMASK_ULL(3, 0)) +#define SPE_HEADER0_EVENTS 0x42 +#define SPE_HEADER0_SOURCE 0x43 +/* Mask for context & operation */ +#define SPE_HEADER0_MASK2 GENMASK_ULL(7, 2) +#define SPE_HEADER0_CONTEXT 0x64 +#define SPE_HEADER0_OP_TYPE 0x48 +/* Mask for extended format */ +#define SPE_HEADER0_EXTENDED 0x20 +/* Mask for address & counter */ +#define SPE_HEADER0_MASK3 GENMASK_ULL(7, 3) +#define SPE_HEADER0_ADDRESS 0xb0 +#define SPE_HEADER0_COUNTER 0x98 +#define SPE_HEADER1_ALIGNMENT 0x0 + +#define SPE_HDR_SHORT_INDEX(h) ((h) & GENMASK_ULL(2, 0)) +#define SPE_HDR_EXTENDED_INDEX(h0, h1) (((h0) & GENMASK_ULL(1, 0)) << 3 | \ + SPE_HDR_SHORT_INDEX(h1)) + +/* Address packet header */ +#define SPE_ADDR_PKT_HDR_INDEX_INS 0x0 +#define SPE_ADDR_PKT_HDR_INDEX_BRANCH 0x1 +#define SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT 0x2 +#define SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS 0x3 +#define SPE_ADDR_PKT_HDR_INDEX_PREV_BRANCH 0x4 + +/* Address packet payload */ +#define SPE_ADDR_PKT_ADDR_BYTE7_SHIFT 56 +#define SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(v) ((v) & GENMASK_ULL(55, 0)) +#define SPE_ADDR_PKT_ADDR_GET_BYTE_6(v) (((v) & GENMASK_ULL(55, 48)) >> 48) + +#define SPE_ADDR_PKT_GET_NS(v) (((v) & BIT_ULL(63)) >> 63) +#define SPE_ADDR_PKT_GET_EL(v) (((v) & GENMASK_ULL(62, 61)) >> 61) +#define SPE_ADDR_PKT_GET_CH(v) (((v) & BIT_ULL(62)) >> 62) +#define SPE_ADDR_PKT_GET_PAT(v) (((v) & GENMASK_ULL(59, 56)) >> 56) + +#define SPE_ADDR_PKT_EL0 0 +#define SPE_ADDR_PKT_EL1 1 +#define SPE_ADDR_PKT_EL2 2 +#define SPE_ADDR_PKT_EL3 3 + +/* Context packet header */ +#define SPE_CTX_PKT_HDR_INDEX(h) ((h) & GENMASK_ULL(1, 0)) + +/* Counter packet header */ +#define SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT 0x0 +#define SPE_CNT_PKT_HDR_INDEX_ISSUE_LAT 0x1 +#define SPE_CNT_PKT_HDR_INDEX_TRANS_LAT 0x2 + +/* Event packet payload */ +enum arm_spe_events { + EV_EXCEPTION_GEN = 0, + EV_RETIRED = 1, + EV_L1D_ACCESS = 2, + EV_L1D_REFILL = 3, + EV_TLB_ACCESS = 4, + EV_TLB_WALK = 5, + EV_NOT_TAKEN = 6, + EV_MISPRED = 7, + EV_LLC_ACCESS = 8, + EV_LLC_MISS = 9, + EV_REMOTE_ACCESS = 10, + EV_ALIGNMENT = 11, + EV_PARTIAL_PREDICATE = 17, + EV_EMPTY_PREDICATE = 18, +}; + +/* Operation packet header */ +#define SPE_OP_PKT_HDR_CLASS(h) ((h) & 
GENMASK_ULL(1, 0)) +#define SPE_OP_PKT_HDR_CLASS_OTHER 0x0 +#define SPE_OP_PKT_HDR_CLASS_LD_ST_ATOMIC 0x1 +#define SPE_OP_PKT_HDR_CLASS_BR_ERET 0x2 + +#define SPE_OP_PKT_IS_OTHER_SVE_OP(v) (((v) & (BIT(7) | BIT(3) | BIT(0))) == 0x8) + +#define SPE_OP_PKT_COND BIT(0) + +#define SPE_OP_PKT_LDST_SUBCLASS_GET(v) ((v) & GENMASK_ULL(7, 1)) +#define SPE_OP_PKT_LDST_SUBCLASS_GP_REG 0x0 +#define SPE_OP_PKT_LDST_SUBCLASS_SIMD_FP 0x4 +#define SPE_OP_PKT_LDST_SUBCLASS_UNSPEC_REG 0x10 +#define SPE_OP_PKT_LDST_SUBCLASS_NV_SYSREG 0x30 + +#define SPE_OP_PKT_IS_LDST_ATOMIC(v) (((v) & (GENMASK_ULL(7, 5) | BIT(1))) == 0x2) + +#define SPE_OP_PKT_AR BIT(4) +#define SPE_OP_PKT_EXCL BIT(3) +#define SPE_OP_PKT_AT BIT(2) +#define SPE_OP_PKT_ST BIT(0) + +#define SPE_OP_PKT_IS_LDST_SVE(v) (((v) & (BIT(3) | BIT(1))) == 0x8) + +#define SPE_OP_PKT_SVE_SG BIT(7) +/* + * SVE effective vector length (EVL) is stored in byte 0 bits [6:4]; + * the length is rounded up to a power of two and use 32 as one step, + * so EVL calculation is: + * + * 32 * (2 ^ bits [6:4]) = 32 << (bits [6:4]) + */ +#define SPE_OP_PKG_SVE_EVL(v) (32 << (((v) & GENMASK_ULL(6, 4)) >> 4)) +#define SPE_OP_PKT_SVE_PRED BIT(2) +#define SPE_OP_PKT_SVE_FP BIT(1) + +#define SPE_OP_PKT_IS_INDIRECT_BRANCH(v) (((v) & GENMASK_ULL(7, 1)) == 0x2) + +const char *arm_spe_pkt_name(enum arm_spe_pkt_type); + +int arm_spe_get_packet(const unsigned char *buf, size_t len, + struct arm_spe_pkt *packet); + +int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len); +#endif diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c index 05dda19c5359..85e72a392a31 100644 --- a/drivers/perf/arm_pmu_acpi.c +++ b/drivers/perf/arm_pmu_acpi.c @@ -16,6 +16,10 @@ #include <asm/cpu.h> #include <asm/cputype.h> +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) +#include "../drivers/arm/mm_monitor/mm_spe.h" +#endif + static DEFINE_PER_CPU(struct arm_pmu *, probed_pmus); static DEFINE_PER_CPU(int, pmu_irqs); @@ -162,6 +166,32 @@ static inline void arm_spe_acpi_register_device(void) { } #endif /* CONFIG_ARM_SPE_PMU */ +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) +static struct resource spe_mem_sampling_resources[] = { + { + } +}; + +static struct platform_device spe_mem_sampling_dev = { + .name = ARMV8_SPE_MEM_SAMPLING_PDEV_NAME, + .id = -1, + .resource = spe_mem_sampling_resources, + .num_resources = ARRAY_SIZE(spe_mem_sampling_resources) +}; + +static void arm_spe_mem_sampling_acpi_register_device(void) +{ + int ret; + + ret = platform_device_register(&spe_mem_sampling_dev); + if (ret < 0) + pr_warn("ACPI: SPE_MEM_SAMPLING: Unable to register device\n"); +} +#else +static inline void arm_spe_mem_sampling_acpi_register_device(void) +{ +} +#endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ #if IS_ENABLED(CONFIG_CORESIGHT_TRBE) static struct resource trbe_resources[] = { @@ -432,6 +462,7 @@ static int arm_pmu_acpi_init(void) return 0; arm_spe_acpi_register_device(); + arm_spe_mem_sampling_acpi_register_device(); arm_trbe_acpi_register_device(); return 0; diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 71835682046e..3c7e69370a96 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -39,6 +39,10 @@ #include <asm/mmu.h> #include <asm/sysreg.h> +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) +#include "../drivers/arm/mm_monitor/mm_spe.h" +#endif + /* * Cache if the event is allowed to trace Context information. * This allows us to perform the check, i.e, perf_allow_kernel(), -- 2.25.1
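For reference, the expected in-kernel usage of this driver on the local CPU is roughly the cycle below. The wrapper function is hypothetical; the mm_spe_* entry points and structures are the ones added in this patch (their prototypes are exported to consumers by a later patch in this series).

/* Usage sketch; not part of this patch. */
#include "mm_spe.h"
#include "spe-decoder/arm-spe-decoder.h"

static void example_drain_spe_records(void)	/* hypothetical wrapper */
{
	struct mm_spe_buf *spe_buf;
	struct arm_spe_record *rec;
	int i, nr;

	if (!mm_spe_enabled())		/* both probe stages must have passed */
		return;

	mm_spe_decoding();		/* decode raw SPE trace into records */
	spe_buf = mm_spe_getbuf_addr();
	nr = mm_spe_getnum_record();

	for (i = 0; i < nr; i++) {
		rec = (struct arm_spe_record *)spe_buf->record_base + i;
		/* rec->virt_addr, rec->phys_addr, rec->op, rec->latency ... */
	}

	mm_spe_continue();		/* re-arm profiling on this CPU */
}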

FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/16333
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/76F...

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- This patch integrates with the newly introduced standalone SPE framework by implementing the allocation of per-CPU buffers in the SPE perf driver. Each CPU is provisioned with two buffers: - A raw buffer to receive SPE trace data from the hardware, - A decoded buffer to store post-processed records for analysis. Additionally, this patch initializes key SPE-related data structures and state variables during driver probe or event initialization, laying the groundwork for correct SPE operation and buffer handling within the perf subsystem. This step is essential for enabling reliable data capture and event-driven profiling using the ARM Statistical Profiling Extension (SPE). Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- drivers/perf/arm_spe_pmu.c | 48 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 3c7e69370a96..6a7221d272e4 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -50,6 +50,10 @@ */ #define SPE_PMU_HW_FLAGS_CX 0x00001 +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) +static struct arm_spe_pmu *spe_pmu_local; +#endif + static_assert((PERF_EVENT_FLAG_ARCH & SPE_PMU_HW_FLAGS_CX) == SPE_PMU_HW_FLAGS_CX); static void set_spe_event_has_cx(struct perf_event *event) @@ -1133,6 +1137,9 @@ static int arm_spe_pmu_cpu_startup(unsigned int cpu, struct hlist_node *node) if (!cpumask_test_cpu(cpu, &spe_pmu->supported_cpus)) return 0; +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) + mm_spe_percpu_buffer_alloc(cpu); +#endif __arm_spe_pmu_setup_one(spe_pmu); return 0; } @@ -1145,6 +1152,9 @@ static int arm_spe_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node) if (!cpumask_test_cpu(cpu, &spe_pmu->supported_cpus)) return 0; +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) + mm_spe_percpu_buffer_free(cpu); +#endif __arm_spe_pmu_stop_one(spe_pmu); return 0; } @@ -1180,6 +1190,9 @@ static int arm_spe_pmu_dev_init(struct arm_spe_pmu *spe_pmu) static void arm_spe_pmu_dev_teardown(struct arm_spe_pmu *spe_pmu) { +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) + mm_spe_buffer_free(); +#endif cpuhp_state_remove_instance(arm_spe_pmu_online, &spe_pmu->hotplug_node); free_percpu_irq(spe_pmu->irq, spe_pmu->handle); } @@ -1219,6 +1232,26 @@ static const struct platform_device_id arm_spe_match[] = { }; MODULE_DEVICE_TABLE(platform, arm_spe_match); +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) +static bool arm_spe_get_attr(void) +{ + struct mm_spe *p; + + p = mm_spe_get_desc(); + if (!p) { + pr_err("get spe pmu cap from arm spe driver failed!\n"); + return false; + } + + p->supported_cpus = spe_pmu_local->supported_cpus; + p->irq = spe_pmu_local->irq; + p->features = spe_pmu_local->features; + p->min_period = spe_pmu_local->min_period; + + return true; +} +#endif + static int arm_spe_pmu_device_probe(struct platform_device *pdev) { int ret; @@ -1253,6 +1286,21 @@ static int arm_spe_pmu_device_probe(struct platform_device *pdev) if (ret) goto out_free_handle; +#if IS_ENABLED(CONFIG_ARM_SPE_MEM_SAMPLING) + /* + * Ensure that all CPUs that support SPE can apply for the cache + * area, with each CPU defaulting to 4K * 2. Failure to do so will + * result in the inability to collect SPE data in kernel mode. 
+ */ + ret = mm_spe_buffer_alloc(); + if (ret) + goto out_teardown_dev; + + spe_pmu_local = spe_pmu; + if (arm_spe_get_attr()) + mm_spe_add_probe_status(); + +#endif ret = arm_spe_pmu_perf_init(spe_pmu); if (ret) goto out_teardown_dev; -- 2.25.1
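Putting this together with the previous patch: the in-kernel sampling path only becomes usable after both probes have run. The mm_spe platform driver moves the internal state from SPE_INIT_FAIL to SPE_INIT_READY, and this patch's additions to arm_spe_pmu_device_probe() (per-CPU buffer allocation plus arm_spe_get_attr()) move it on to SPE_INIT_SUCC. A minimal sketch of how a consumer is expected to gate on that state follows; the wrapper name is hypothetical.

/* Sketch only; mm_spe_enabled() is the gate provided by the mm_spe driver. */
static bool example_sampling_ready(void)	/* hypothetical */
{
	/*
	 * True only once spe_probe_status has reached SPE_INIT_SUCC, i.e. the
	 * mm_spe platform device probed (READY) and this perf-driver probe
	 * allocated the per-CPU buffers and copied the SPE capabilities
	 * (SUCC).  CPU hotplug keeps the per-CPU buffers in step through
	 * mm_spe_percpu_buffer_alloc()/free() in the startup/teardown hooks.
	 */
	return mm_spe_enabled();
}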

FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://gitee.com/openeuler/kernel/pulls/16334
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/YCD...

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Add mem_sampling abstract layer to provide hardware mem access for kernel features, e.g., NUMA balancing or DAMON. Abstract layer mem_sampling provides an interface to start the sampling of hardware pmu on current cpu and provides callback registrations to subscribe access information (e.g., for NUMA balancing in subsequent patches). Internally, mem_sampling registers a callback in specific pmu driver which forwards the captured records to higher-level through registered callbacks. Sampling actions are also managed by hw_pmu layer. CONFIG_MEM_SAMPLING is added to enable the mem_sampling layer. For now, mem_sampling only supports SPE driver. New hardware pmu support could be added in mem_sampling with no need to adjust higher-level kernel feature code. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 5 +- drivers/arm/mm_monitor/mm_spe.h | 2 +- include/linux/mem_sampling.h | 82 +++++++++++++++++++++ mm/Kconfig | 15 ++++ mm/Makefile | 1 + mm/mem_sampling.c | 126 ++++++++++++++++++++++++++++++++ 6 files changed, 228 insertions(+), 3 deletions(-) create mode 100644 include/linux/mem_sampling.h create mode 100644 mm/mem_sampling.c diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index f2f2b3320357..cbde84c228a0 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -11,6 +11,7 @@ #include <linux/of_device.h> #include <linux/perf/arm_pmu.h> +#include <linux/mem_sampling.h> #include "spe-decoder/arm-spe-decoder.h" #include "spe-decoder/arm-spe-pkt-decoder.h" @@ -377,7 +378,7 @@ static void mm_spe_sample_para_init(void) void mm_spe_record_enqueue(struct arm_spe_record *record) { struct mm_spe_buf *spe_buf = this_cpu_ptr(&per_cpu_spe_buf); - struct arm_spe_record *record_tail; + struct mem_sampling_record *record_tail; if (spe_buf->nr_records >= SPE_RECORD_BUFFER_MAX_RECORDS) { pr_err("nr_records exceeded!\n"); @@ -386,7 +387,7 @@ void mm_spe_record_enqueue(struct arm_spe_record *record) record_tail = spe_buf->record_base + spe_buf->nr_records * SPE_RECORD_ENTRY_SIZE; - *record_tail = *(struct arm_spe_record *)record; + *record_tail = *(struct mem_sampling_record *)record; spe_buf->nr_records++; } diff --git a/drivers/arm/mm_monitor/mm_spe.h b/drivers/arm/mm_monitor/mm_spe.h index da134e8794c8..bd0a1574a1b0 100644 --- a/drivers/arm/mm_monitor/mm_spe.h +++ b/drivers/arm/mm_monitor/mm_spe.h @@ -9,7 +9,7 @@ #define SPE_SAMPLE_PERIOD 1024 #define SPE_RECORD_BUFFER_MAX_RECORDS (100) -#define SPE_RECORD_ENTRY_SIZE sizeof(struct arm_spe_record) +#define SPE_RECORD_ENTRY_SIZE sizeof(struct mem_sampling_record) #define ARMV8_SPE_MEM_SAMPLING_PDEV_NAME "arm,mm_spe,spe-v1" struct mm_spe { diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h new file mode 100644 index 000000000000..3e000a0deced --- /dev/null +++ b/include/linux/mem_sampling.h @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * mem_sampling.h: declare the mem_sampling abstract layer and provide + * unified pmu sampling for NUMA, DAMON, etc. + * + * Sample records are converted to mem_sampling_record, and then + * mem_sampling_record_captured_cb_type invoke the callbacks to + * pass the record. + * + * Copyright (c) 2024-2025, Huawei Technologies Ltd. 
+ */ +#ifndef __MEM_SAMPLING_H +#define __MEM_SAMPLING_H + +enum mem_sampling_sample_type { + MEM_SAMPLING_L1D_ACCESS = 1 << 0, + MEM_SAMPLING_L1D_MISS = 1 << 1, + MEM_SAMPLING_LLC_ACCESS = 1 << 2, + MEM_SAMPLING_LLC_MISS = 1 << 3, + MEM_SAMPLING_TLB_ACCESS = 1 << 4, + MEM_SAMPLING_TLB_MISS = 1 << 5, + MEM_SAMPLING_BRANCH_MISS = 1 << 6, + MEM_SAMPLING_REMOTE_ACCESS = 1 << 7, +}; + +enum mem_sampling_op_type { + MEM_SAMPLING_LD = 1 << 0, + MEM_SAMPLING_ST = 1 << 1, +}; + +struct mem_sampling_record { + enum mem_sampling_sample_type type; + int err; + u32 op; + u32 latency; + u64 from_ip; + u64 to_ip; + u64 timestamp; + u64 virt_addr; + u64 phys_addr; + u64 context_id; + u64 boost_spe_addr[8]; + u64 rem_addr; + u16 source; +}; + +struct mem_sampling_ops_struct { + int (*sampling_start)(void); + void (*sampling_stop)(void); + void (*sampling_continue)(void); + void (*sampling_decoding)(void); + struct mm_spe_buf* (*mm_spe_getbuf_addr)(void); + int (*mm_spe_getnum_record)(void); + +}; +extern struct mem_sampling_ops_struct mem_sampling_ops; + +enum mem_sampling_type_enum { + MEM_SAMPLING_ARM_SPE, + MEM_SAMPLING_UNSUPPORTED +}; + +#ifdef CONFIG_ARM_SPE_MEM_SAMPLING +int mm_spe_start(void); +void mm_spe_stop(void); +void mm_spe_continue(void); +void mm_spe_decoding(void); +int mm_spe_getnum_record(void); +struct mm_spe_buf *mm_spe_getbuf_addr(void); +int mm_spe_enabled(void); +void arm_spe_set_probe_status(int status); +#else +static inline void mm_spe_stop(void) { } +static inline void mm_spe_continue(void) { } +static inline void mm_spe_decoding(void) { } +static inline void arm_spe_set_probe_status(int status) { } +static inline int mm_spe_start(void) { return 0; } +static inline int mm_spe_getnum_record(void) { return 0; } +static inline struct mm_spe_buf *mm_spe_getbuf_addr(void) { return NULL; } +static inline int mm_spe_enabled(void) { return 0; } +#endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ +#endif /* __MEM_SAMPLING_H */ diff --git a/mm/Kconfig b/mm/Kconfig index 56171b9dd873..c2b45a71a992 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1452,6 +1452,21 @@ config BPF_READAHEAD of the kernel is adjusted based on the application read mode to optimize the read performance in the Spark SQL scenario, +config MEM_SAMPLING + bool "Use hardware memory sampling for kernel features(NUMA, DAMON, etc.)" + default n + depends on ARM64 + select ARM_SPE_MEM_SAMPLING if ARM64 + help + This option enables hardware-based memory sampling for kernel features + such as NUMA balancing and DAMON. If disabled, software-based memory + sampling will be used instead. + + Memory sampling is primarily based on specific hardware capabilities, + which enable hardware PMUs to sample memory access for use by kernel + features. It requires at least one hardware PMU (e.g. ARM_SPE_MEM_SAMPLING) + to be enabled. 
+ source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 11df2de8fdbe..674777b7c99f 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -147,3 +147,4 @@ obj-$(CONFIG_PAGE_CACHE_LIMIT) += page_cache_limit.o obj-$(CONFIG_CLEAR_FREELIST_PAGE) += clear_freelist_page.o obj-$(CONFIG_MEMORY_RELIABLE) += mem_reliable.o obj-$(CONFIG_DYNAMIC_POOL) += dynamic_pool.o +obj-$(CONFIG_MEM_SAMPLING) += mem_sampling.o diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c new file mode 100644 index 000000000000..551c18452b2e --- /dev/null +++ b/mm/mem_sampling.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * mem_sampling.c: declare the mem_sampling abstract layer and provide + * unified pmu sampling for NUMA, DAMON, etc. + * + * Sample records are converted to mem_sampling_record, and then + * mem_sampling_record_captured_cb_type invoke the callbacks to + * pass the record. + * + * Copyright (c) 2024-2025, Huawei Technologies Ltd. + */ + +#define pr_fmt(fmt) "mem_sampling: " fmt + +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/mm.h> +#include <linux/mem_sampling.h> + +struct mem_sampling_ops_struct mem_sampling_ops; + +/* + * Callbacks should be registered using mem_sampling_record_cb_register() + * by NUMA, DAMON and etc during their initialisation. + * Callbacks will be invoked on new hardware pmu records caputured. + */ +typedef void (*mem_sampling_record_cb_type)(struct mem_sampling_record *record); + +struct mem_sampling_record_cb_list_entry { + struct list_head list; + mem_sampling_record_cb_type cb; +}; +LIST_HEAD(mem_sampling_record_cb_list); + +void mem_sampling_record_cb_register(mem_sampling_record_cb_type cb) +{ + struct mem_sampling_record_cb_list_entry *cb_entry, *tmp; + + list_for_each_entry_safe(cb_entry, tmp, &mem_sampling_record_cb_list, list) { + if (cb_entry->cb == cb) + return; + } + + cb_entry = kmalloc(sizeof(struct mem_sampling_record_cb_list_entry), GFP_KERNEL); + if (!cb_entry) + return; + + cb_entry->cb = cb; + list_add(&(cb_entry->list), &mem_sampling_record_cb_list); +} + +void mem_sampling_record_cb_unregister(mem_sampling_record_cb_type cb) +{ + struct mem_sampling_record_cb_list_entry *cb_entry, *tmp; + + list_for_each_entry_safe(cb_entry, tmp, &mem_sampling_record_cb_list, list) { + if (cb_entry->cb == cb) { + list_del(&cb_entry->list); + kfree(cb_entry); + return; + } + } +} + +void mem_sampling_process(void) +{ + int i, nr_records; + struct mem_sampling_record *record; + struct mem_sampling_record *record_base; + struct mem_sampling_record_cb_list_entry *cb_entry, *tmp; + + mem_sampling_ops.sampling_decoding(); + + record_base = (struct mem_sampling_record *)mem_sampling_ops.mm_spe_getbuf_addr(); + nr_records = mem_sampling_ops.mm_spe_getnum_record(); + + if (list_empty(&mem_sampling_record_cb_list)) + goto out; + + for (i = 0; i < nr_records; i++) { + record = record_base + i; + list_for_each_entry_safe(cb_entry, tmp, &mem_sampling_record_cb_list, list) { + cb_entry->cb(record); + } + } +out: + mem_sampling_ops.sampling_continue(); + +} +EXPORT_SYMBOL_GPL(mem_sampling_process); + +static inline enum mem_sampling_type_enum mem_sampling_get_type(void) +{ +#ifdef CONFIG_ARM_SPE_MEM_SAMPLING + return MEM_SAMPLING_ARM_SPE; +#else + return MEM_SAMPLING_UNSUPPORTED; +#endif +} + +static int __init mem_sampling_init(void) +{ + enum mem_sampling_type_enum mem_sampling_type = mem_sampling_get_type(); + + switch (mem_sampling_type) { + case MEM_SAMPLING_ARM_SPE: + 
mem_sampling_ops.sampling_start = mm_spe_start; + mem_sampling_ops.sampling_stop = mm_spe_stop; + mem_sampling_ops.sampling_continue = mm_spe_continue; + mem_sampling_ops.sampling_decoding = mm_spe_decoding; + mem_sampling_ops.mm_spe_getbuf_addr = mm_spe_getbuf_addr; + mem_sampling_ops.mm_spe_getnum_record = mm_spe_getnum_record; + + break; + + default: + pr_info("unsupport hardware pmu type(%d), disable access hint!\n", + mem_sampling_type); + return -ENODEV; + } + + return 0; +} +late_initcall(mem_sampling_init); -- 2.25.1
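As a consumer-side reference, a kernel feature such as NUMA balancing or DAMON is expected to subscribe to records roughly as shown below. The callback body and init function are illustrative only; mem_sampling_record_cb_register() and struct mem_sampling_record are the interfaces added in this patch, and the sketch assumes the registration prototype is made visible to consumers by the later patches that wire up NUMA balancing and DAMON.

/* Consumer sketch; not part of this patch. */
#include <linux/init.h>
#include <linux/mem_sampling.h>

static void example_record_cb(struct mem_sampling_record *record)	/* hypothetical */
{
	/* A placement policy only cares about loads and stores. */
	if (!(record->op & (MEM_SAMPLING_LD | MEM_SAMPLING_ST)))
		return;

	/*
	 * record->virt_addr, record->phys_addr and record->context_id can now
	 * feed the feature's own accounting, analogous to the information a
	 * NUMA hint fault would have provided.
	 */
}

static int __init example_consumer_init(void)	/* hypothetical */
{
	mem_sampling_record_cb_register(example_record_cb);
	return 0;
}
late_initcall(example_consumer_init);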

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/WDA... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm_monitor/mm_spe: Add PMU based memory sampling abstract layer Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- This patch introduces the concept of `arm_spe_user` to represent different SPE usage scenarios, allowing the separation of basic SPE operations from the perf driver implementation. This change builds upon previous patches that refactor the SPE framework to support multiple independent SPE drivers. Core SPE operations such as interrupt handling, enabling, and disabling are now isolated from perf, enabling the SPE feature to be consumed by two main types of users: 1. Kernel subsystems that require memory access sampling, such as NUMA balancing, DAMON, etc. These users interact with SPE through an abstraction layer (e.g., `mem_sampling`), which starts and stops SPE tracing independently of perf. 2. User space via the perf subsystem, which continues to operate the SPE driver as before, but now through a cleaner interface. This abstraction allows flexible sharing of SPE infrastructure without tight coupling between perf and other kernel modules. It also avoids intrusive modifications to perf internals by enabling perf to control the SPE driver through a decoupled interface. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- drivers/perf/arm_spe_pmu.c | 56 +++++++++++++++++++++++++++++++++++- include/linux/mem_sampling.h | 19 ++++++++++++ mm/mem_sampling.c | 8 ++++++ 3 files changed, 82 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 6a7221d272e4..0fc9600f14a4 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -33,6 +33,9 @@ #include <linux/slab.h> #include <linux/smp.h> #include <linux/vmalloc.h> +#if IS_ENABLED(CONFIG_MEM_SAMPLING) +#include <linux/mem_sampling.h> +#endif #include <asm/barrier.h> #include <asm/cpufeature.h> @@ -591,13 +594,21 @@ arm_spe_pmu_buf_get_fault_act(struct perf_output_handle *handle) * If we've lost data, disable profiling and also set the PARTIAL * flag to indicate that the last record is corrupted. 
*/ +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_spe() && FIELD_GET(PMBSR_EL1_DL, pmbsr)) +#else if (FIELD_GET(PMBSR_EL1_DL, pmbsr)) +#endif perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED | PERF_AUX_FLAG_PARTIAL); /* Report collisions to userspace so that it can up the period */ +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_spe() && FIELD_GET(PMBSR_EL1_DL, pmbsr)) +#else if (FIELD_GET(PMBSR_EL1_COLL, pmbsr)) perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION); +#endif /* We only expect buffer management events */ switch (FIELD_GET(PMBSR_EL1_EC, pmbsr)) { @@ -630,7 +641,12 @@ arm_spe_pmu_buf_get_fault_act(struct perf_output_handle *handle) ret = SPE_PMU_BUF_FAULT_ACT_FATAL; out_stop: +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_spe()) + arm_spe_perf_aux_output_end(handle); +#else arm_spe_perf_aux_output_end(handle); +#endif return ret; } @@ -640,7 +656,11 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev) struct perf_event *event = handle->event; enum arm_spe_pmu_buf_fault_action act; +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_spe() && !perf_get_aux(handle)) +#else if (!perf_get_aux(handle)) +#endif return IRQ_NONE; act = arm_spe_pmu_buf_get_fault_act(handle); @@ -651,7 +671,12 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev) * Ensure perf callbacks have completed, which may disable the * profiling buffer in response to a TRUNCATION flag. */ +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_spe()) + irq_work_run(); +#else irq_work_run(); +#endif switch (act) { case SPE_PMU_BUF_FAULT_ACT_FATAL: @@ -671,6 +696,12 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev) * PMBPTR might be misaligned, but we'll burn that bridge * when we get to it. */ +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + if (spe_user_is_mem_saampling()) { + mem_sampling_process(); + break; + } +#endif if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) { arm_spe_perf_aux_output_begin(handle, event); isb(); @@ -766,6 +797,10 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags) struct hw_perf_event *hwc = &event->hw; struct perf_output_handle *handle = this_cpu_ptr(spe_pmu->handle); +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + arm_spe_set_user(SPE_USER_PERF); +#endif + hwc->state = 0; arm_spe_perf_aux_output_begin(handle, event); if (hwc->state) @@ -805,8 +840,16 @@ static void arm_spe_pmu_stop(struct perf_event *event, int flags) struct perf_output_handle *handle = this_cpu_ptr(spe_pmu->handle); /* If we're already stopped, then nothing to do */ - if (hwc->state & PERF_HES_STOPPED) + if (hwc->state & PERF_HES_STOPPED) { +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + /* + * PERF_HES_STOPPED maybe set in arm_spe_perf_aux_output_begin, + * we switch user here. 
+ */ + arm_spe_set_user(SPE_USER_MEM_SAMPLING); +#endif return; + } /* Stop all trace generation */ arm_spe_pmu_disable_and_drain_local(); @@ -837,6 +880,9 @@ static void arm_spe_pmu_stop(struct perf_event *event, int flags) } hwc->state |= PERF_HES_STOPPED; +#if IS_ENABLED(CONFIG_MEM_SAMPLING) + arm_spe_set_user(SPE_USER_MEM_SAMPLING); +#endif } static int arm_spe_pmu_add(struct perf_event *event, int flags) @@ -1314,6 +1360,14 @@ static int arm_spe_pmu_device_probe(struct platform_device *pdev) return ret; } +#if IS_ENABLED(CONFIG_MEM_SAMPLING) +void arm_spe_set_user(enum arm_spe_user_e user) +{ + __this_cpu_write(arm_spe_user, user); + __arm_spe_pmu_reset_local(); +} +#endif + static int arm_spe_pmu_device_remove(struct platform_device *pdev) { struct arm_spe_pmu *spe_pmu = platform_get_drvdata(pdev); diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h index 3e000a0deced..42dad6438531 100644 --- a/include/linux/mem_sampling.h +++ b/include/linux/mem_sampling.h @@ -28,6 +28,12 @@ enum mem_sampling_op_type { MEM_SAMPLING_ST = 1 << 1, }; +enum arm_spe_user_e { + SPE_USER_PERF, + SPE_USER_MEM_SAMPLING, +}; +DECLARE_PER_CPU(enum arm_spe_user_e, arm_spe_user); + struct mem_sampling_record { enum mem_sampling_sample_type type; int err; @@ -79,4 +85,17 @@ static inline int mm_spe_getnum_record(void) { return 0; } static inline struct mm_spe_buf *mm_spe_getbuf_addr(void) { return NULL; } static inline int mm_spe_enabled(void) { return 0; } #endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ + +#if IS_ENABLED(CONFIG_MEM_SAMPLING) +void mem_sampling_process(void); +void arm_spe_set_user(enum arm_spe_user_e user); +static inline bool spe_user_is_spe(void) +{ + return __this_cpu_read(arm_spe_user) == SPE_USER_PERF; +} +static inline bool spe_user_is_mem_saampling(void) +{ + return __this_cpu_read(arm_spe_user) == SPE_USER_MEM_SAMPLING; +} +#endif /* CONFIG_MEM_SAMPLING */ #endif /* __MEM_SAMPLING_H */ diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index 551c18452b2e..e9b2e14d28f1 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -21,6 +21,10 @@ struct mem_sampling_ops_struct mem_sampling_ops; +/* keep track of who use the SPE */ +DEFINE_PER_CPU(enum arm_spe_user_e, arm_spe_user); +EXPORT_PER_CPU_SYMBOL_GPL(arm_spe_user); + /* * Callbacks should be registered using mem_sampling_record_cb_register() * by NUMA, DAMON and etc during their initialisation. @@ -103,6 +107,7 @@ static inline enum mem_sampling_type_enum mem_sampling_get_type(void) static int __init mem_sampling_init(void) { enum mem_sampling_type_enum mem_sampling_type = mem_sampling_get_type(); + int cpu; switch (mem_sampling_type) { case MEM_SAMPLING_ARM_SPE: @@ -121,6 +126,9 @@ static int __init mem_sampling_init(void) return -ENODEV; } + for_each_possible_cpu(cpu) + per_cpu(arm_spe_user, cpu) = SPE_USER_MEM_SAMPLING; + return 0; } late_initcall(mem_sampling_init); -- 2.25.1

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/2EI... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm_monitor/mm_spe: Introduce arm_spe_user to abstract SPE usage Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Introduce a tracing interface that allows eBPF programs to access hardware memory access data collected via the SPE framework. This interface enables user-space memory profiling and custom analysis without relying on the perf subsystem, and supports integration with BPF-based observability tools. For example, it can be used to track physical memory accesses and identify threads with cross-NUMA access patterns online, supporting integration with BPF-based observability tools. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 2 ++ include/trace/events/kmem.h | 24 ++++++++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index cbde84c228a0..38e236fed36c 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -12,6 +12,7 @@ #include <linux/of_device.h> #include <linux/perf/arm_pmu.h> #include <linux/mem_sampling.h> +#include <trace/events/kmem.h> #include "spe-decoder/arm-spe-decoder.h" #include "spe-decoder/arm-spe-pkt-decoder.h" @@ -385,6 +386,7 @@ void mm_spe_record_enqueue(struct arm_spe_record *record) return; } + trace_mm_spe_record((struct mem_sampling_record *)record); record_tail = spe_buf->record_base + spe_buf->nr_records * SPE_RECORD_ENTRY_SIZE; *record_tail = *(struct mem_sampling_record *)record; diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index a4e40ae6a8c8..19b8ca352dde 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -8,6 +8,7 @@ #include <linux/types.h> #include <linux/tracepoint.h> #include <trace/events/mmflags.h> +#include <linux/mem_sampling.h> TRACE_EVENT(kmem_cache_alloc, @@ -409,6 +410,29 @@ TRACE_EVENT(rss_stat, __print_symbolic(__entry->member, TRACE_MM_PAGES), __entry->size) ); +#ifdef CONFIG_ARM_SPE_MEM_SAMPLING +TRACE_EVENT(mm_spe_record, + TP_PROTO(struct mem_sampling_record *record), + + TP_ARGS(record), + + TP_STRUCT__entry( + __field(u64, vaddr) + __field(u64, paddr) + __field(int, pid) + ), + + TP_fast_assign( + __entry->vaddr = record->virt_addr; + __entry->paddr = record->phys_addr; + __entry->pid = record->context_id; + + ), + + TP_printk("vaddr=%llu paddr=%llu pid=%d", + __entry->vaddr, __entry->paddr, __entry->pid) +); +#endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ -- 2.25.1
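As a usage sketch (not part of the patch): because the tracepoint is defined in include/trace/events/kmem.h, it should appear under the kmem event group in tracefs and can be consumed either directly or from an eBPF program. The paths and the bpftrace one-liner below assume a standard tracefs layout and are illustrative only.

# read the raw SPE record stream via tracefs
echo 1 > /sys/kernel/tracing/events/kmem/mm_spe_record/enable
cat /sys/kernel/tracing/trace_pipe

# or aggregate per PID with an eBPF tool such as bpftrace
bpftrace -e 'tracepoint:kmem:mm_spe_record { @samples[args->pid] = count(); }'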

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/Y2G... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/mem_sampling: Add eBPF interface for memory access tracing Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Introduce mem_sampling_sched_in() to control memory sampling based on task scheduling events. This function checks whether the incoming task has an associated mm (struct mm_struct). If so, sampling is started; otherwise, it is stopped. This ensures that memory access sampling is only active for user-space tasks. This hook enables integration with task scheduling to allow fine-grained, per-task memory sampling control. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/mem_sampling.h | 3 +++ mm/mem_sampling.c | 12 ++++++++++++ 2 files changed, 15 insertions(+) diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h index 42dad6438531..7d088d3f67a2 100644 --- a/include/linux/mem_sampling.h +++ b/include/linux/mem_sampling.h @@ -97,5 +97,8 @@ static inline bool spe_user_is_mem_saampling(void) { return __this_cpu_read(arm_spe_user) == SPE_USER_MEM_SAMPLING; } +void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr); +#else +static inline void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) { } #endif /* CONFIG_MEM_SAMPLING */ #endif /* __MEM_SAMPLING_H */ diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index e9b2e14d28f1..9ebc14dee570 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -68,6 +68,18 @@ void mem_sampling_record_cb_unregister(mem_sampling_record_cb_type cb) } } +void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) +{ + + if (!mem_sampling_ops.sampling_start) + return; + + if (curr->mm) + mem_sampling_ops.sampling_start(); + else + mem_sampling_ops.sampling_stop(); +} + void mem_sampling_process(void) { int i, nr_records; -- 2.25.1

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GLI... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/mem_sampling: Add sched switch hook to control sampling state Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Program mem_sampling for access profiling for threads from the task sched switch path. mem_sampling is programmed with a period that corresponds to the incoming thread. Kernel threads are excluded from this. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- kernel/sched/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7dc4ceebd5ec..7c40690ad56f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -64,6 +64,7 @@ #include <linux/vtime.h> #include <linux/wait_api.h> #include <linux/workqueue_api.h> +#include <linux/mem_sampling.h> #ifdef CONFIG_PREEMPT_DYNAMIC # ifdef CONFIG_GENERIC_ENTRY @@ -5307,6 +5308,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) prev_state = READ_ONCE(prev->__state); vtime_task_switch(prev); perf_event_task_sched_in(prev, current); + mem_sampling_sched_in(prev, current); finish_task(prev); tick_nohz_task_switch(); finish_lock_switch(rq); -- 2.25.1

FeedBack: The patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully. Pull request link: https://gitee.com/openeuler/kernel/pulls/16335 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/NAX...

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Introduce two control interfaces for enabling or disabling mem_sampling: - A /proc interface: /proc/sys/kernel/mem_sampling_enable - A kernel command-line parameter: mem_sampling_enable=enable/disable The proc interface allows runtime control of memory access sampling, enabling dynamic enable/disable during system operation. The command-line parameter provides early boot-time control, useful for environments where sampling should be enabled automatically at startup. These interfaces improve usability and flexibility for users and developers working with memory sampling features. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 1 + drivers/perf/arm_spe_pmu.c | 4 + include/linux/mem_sampling.h | 8 ++ mm/mem_sampling.c | 167 +++++++++++++++++++++++++++++++- 4 files changed, 178 insertions(+), 2 deletions(-) diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index 38e236fed36c..f5c3668bb656 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -114,6 +114,7 @@ void mm_spe_buffer_free(void) mm_spe_percpu_buffer_free(cpu); } spe_probe_status -= 1; + set_mem_sampling_state(false); } EXPORT_SYMBOL_GPL(mm_spe_buffer_free); diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 0fc9600f14a4..f59d5a4feac5 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -1364,6 +1364,10 @@ static int arm_spe_pmu_device_probe(struct platform_device *pdev) void arm_spe_set_user(enum arm_spe_user_e user) { __this_cpu_write(arm_spe_user, user); + if (user == SPE_USER_PERF) + mem_sampling_user_switch_process(USER_SWITCH_AWAY_FROM_MEM_SAMPLING); + else + mem_sampling_user_switch_process(USER_SWITCH_BACK_TO_MEM_SAMPLING); __arm_spe_pmu_reset_local(); } #endif diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h index 7d088d3f67a2..b5c637df2876 100644 --- a/include/linux/mem_sampling.h +++ b/include/linux/mem_sampling.h @@ -66,6 +66,11 @@ enum mem_sampling_type_enum { MEM_SAMPLING_UNSUPPORTED }; +enum user_switch_type { + USER_SWITCH_AWAY_FROM_MEM_SAMPLING, + USER_SWITCH_BACK_TO_MEM_SAMPLING, +}; + #ifdef CONFIG_ARM_SPE_MEM_SAMPLING int mm_spe_start(void); void mm_spe_stop(void); @@ -89,6 +94,8 @@ static inline int mm_spe_enabled(void) { return 0; } #if IS_ENABLED(CONFIG_MEM_SAMPLING) void mem_sampling_process(void); void arm_spe_set_user(enum arm_spe_user_e user); +void set_mem_sampling_state(bool enabled); +void mem_sampling_user_switch_process(enum user_switch_type type); static inline bool spe_user_is_spe(void) { return __this_cpu_read(arm_spe_user) == SPE_USER_PERF; @@ -99,6 +106,7 @@ static inline bool spe_user_is_mem_saampling(void) } void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr); #else +static inline void set_mem_sampling_state(bool enabled) { } static inline void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) { } #endif /* CONFIG_MEM_SAMPLING */ #endif /* __MEM_SAMPLING_H */ diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index 9ebc14dee570..eeab2a5a87a5 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -19,12 +19,24 @@ #include <linux/mm.h> #include <linux/mem_sampling.h> +#define MEM_SAMPLING_DISABLED 0x0 +#define MEM_SAMPLING_NORMAL 0x1 + struct 
mem_sampling_ops_struct mem_sampling_ops; +static int mem_sampling_override __initdata; +static int sysctl_mem_sampling_mode; /* keep track of who use the SPE */ DEFINE_PER_CPU(enum arm_spe_user_e, arm_spe_user); EXPORT_PER_CPU_SYMBOL_GPL(arm_spe_user); +enum mem_sampling_saved_state_e { + MEM_SAMPLING_STATE_ENABLE, + MEM_SAMPLING_STATE_DISABLE, + MEM_SAMPLING_STATE_EMPTY, +}; +enum mem_sampling_saved_state_e mem_sampling_saved_state = MEM_SAMPLING_STATE_EMPTY; + /* * Callbacks should be registered using mem_sampling_record_cb_register() * by NUMA, DAMON and etc during their initialisation. @@ -68,8 +80,11 @@ void mem_sampling_record_cb_unregister(mem_sampling_record_cb_type cb) } } +DEFINE_STATIC_KEY_FALSE(mem_sampling_access_hints); void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) { + if (!static_branch_unlikely(&mem_sampling_access_hints)) + return; if (!mem_sampling_ops.sampling_start) return; @@ -102,8 +117,11 @@ void mem_sampling_process(void) } } out: - mem_sampling_ops.sampling_continue(); - + /* if mem_sampling_access_hints is set to false, stop sampling */ + if (static_branch_unlikely(&mem_sampling_access_hints)) + mem_sampling_ops.sampling_continue(); + else + mem_sampling_ops.sampling_stop(); } EXPORT_SYMBOL_GPL(mem_sampling_process); @@ -116,6 +134,148 @@ static inline enum mem_sampling_type_enum mem_sampling_get_type(void) #endif } +static void __set_mem_sampling_state(bool enabled) +{ + if (enabled) + static_branch_enable(&mem_sampling_access_hints); + else + static_branch_disable(&mem_sampling_access_hints); +} + +void set_mem_sampling_state(bool enabled) +{ + if (mem_sampling_saved_state != MEM_SAMPLING_STATE_EMPTY) { + mem_sampling_saved_state = enabled ? MEM_SAMPLING_STATE_ENABLE : + MEM_SAMPLING_STATE_DISABLE; + return; + } + + if (!mem_sampling_ops.sampling_start || !mm_spe_enabled()) + return; + if (enabled) + sysctl_mem_sampling_mode = MEM_SAMPLING_NORMAL; + else + sysctl_mem_sampling_mode = MEM_SAMPLING_DISABLED; + __set_mem_sampling_state(enabled); +} + +void mem_sampling_user_switch_process(enum user_switch_type type) +{ + bool state; + int mm_spe_perf_user_count = 0; + int cpu; + + if (type > USER_SWITCH_BACK_TO_MEM_SAMPLING) { + pr_err("user switch type error.\n"); + return; + } + + for_each_possible_cpu(cpu) { + if (per_cpu(arm_spe_user, cpu) == SPE_USER_PERF) + mm_spe_perf_user_count++; + } + + if (type == USER_SWITCH_AWAY_FROM_MEM_SAMPLING) { + /* save state only the status when leave mem_sampling for the first time */ + if (mem_sampling_saved_state != MEM_SAMPLING_STATE_EMPTY) + return; + + if (static_branch_unlikely(&mem_sampling_access_hints)) + mem_sampling_saved_state = MEM_SAMPLING_STATE_ENABLE; + else + mem_sampling_saved_state = MEM_SAMPLING_STATE_DISABLE; + + pr_debug("user switch away from mem_sampling, %s is saved, set to disable.\n", + mem_sampling_saved_state ? "disabled" : "enabled"); + + set_mem_sampling_state(false); + } else { + /* If the state is not backed up, do not restore it */ + if (mem_sampling_saved_state == MEM_SAMPLING_STATE_EMPTY || mm_spe_perf_user_count) + return; + + state = (mem_sampling_saved_state == MEM_SAMPLING_STATE_ENABLE) ? true : false; + set_mem_sampling_state(state); + mem_sampling_saved_state = MEM_SAMPLING_STATE_EMPTY; + + pr_debug("user switch back to mem_sampling, set to saved %s.\n", + state ? 
"enalbe" : "disable"); + } +} +EXPORT_SYMBOL_GPL(mem_sampling_user_switch_process); + +#ifdef CONFIG_PROC_SYSCTL +static int proc_mem_sampling_enable(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table t; + int err; + int state = sysctl_mem_sampling_mode; + + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + t = *table; + t.data = &state; + err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); + if (err < 0) + return err; + if (write) + set_mem_sampling_state(state); + return err; +} + +static struct ctl_table mem_sampling_sysctls[] = { + { + .procname = "mem_sampling_enable", + .data = NULL, /* filled in by handler */ + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_mem_sampling_enable, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static void __init mem_sampling_sysctl_init(void) +{ + register_sysctl_init("kernel", mem_sampling_sysctls); +} +#else +#define mem_sampling_sysctl_init() do { } while (0) +#endif + +static void __init check_mem_sampling_enable(void) +{ + bool mem_sampling_default = false; + + /* Parsed by setup_mem_sampling. override == 1 enables, -1 disables */ + if (mem_sampling_override) + set_mem_sampling_state(mem_sampling_override == 1); + else + set_mem_sampling_state(mem_sampling_default); +} + +static int __init setup_mem_sampling_enable(char *str) +{ + int ret = 0; + + if (!str) + goto out; + + if (!strcmp(str, "enable")) { + mem_sampling_override = 1; + ret = 1; + } +out: + if (!ret) + pr_warn("Unable to parse mem_sampling=\n"); + + return ret; +} +__setup("mem_sampling=", setup_mem_sampling_enable); + static int __init mem_sampling_init(void) { enum mem_sampling_type_enum mem_sampling_type = mem_sampling_get_type(); @@ -135,8 +295,11 @@ static int __init mem_sampling_init(void) default: pr_info("unsupport hardware pmu type(%d), disable access hint!\n", mem_sampling_type); + set_mem_sampling_state(false); return -ENODEV; } + check_mem_sampling_enable(); + mem_sampling_sysctl_init(); for_each_possible_cpu(cpu) per_cpu(arm_spe_user, cpu) = SPE_USER_MEM_SAMPLING; -- 2.25.1

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/5E3... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/mem_sampling:: Add proc and cmdline interface to control sampling enable Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Integrate the mem_sampling framework into automatic NUMA balancing to track memory access patterns. Instead of relying on traditional mechanisms (e.g. page faults or PTE access bits), Numa balancing can now optionally use hardware-assisted sampling data provided by mem_sampling. This allows for more accurate and lower-overhead detection of memory locality and task placement decisions. The integration ensures that NUMA balancing starts and stops sampling using the mem_sampling abstraction, avoiding direct dependencies on perf or architecture-specific drivers such as SPE. This change lays the foundation for leveraging hardware sampling features (e.g. ARM SPE) to improve NUMA-aware scheduling. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- mm/Kconfig | 13 ++++ mm/mem_sampling.c | 186 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 198 insertions(+), 1 deletion(-) diff --git a/mm/Kconfig b/mm/Kconfig index c2b45a71a992..88addd002bb5 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1467,6 +1467,19 @@ config MEM_SAMPLING features. It requires at least one hardware PMU (e.g. ARM_SPE_MEM_SAMPLING) to be enabled. +config NUMABALANCING_MEM_SAMPLING + bool "Use hardware memory samples for numa balancing" + depends on MEM_SAMPLING && NUMA_BALANCING + default n + help + This feature relies on hardware sampling, and will use memory access + information obtained from hardware sampling in the NUMA balancing + policy instead of the native software PROT_NONE scheme. Turning on + this feature may have a performance impact on some workloads, for + example, lightweight memory access programs. + + if unsure, say N to disable the NUMABALANCING_MEM_SAMPLING. + source "mm/damon/Kconfig" endmenu diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index eeab2a5a87a5..bc0ca95bb5c9 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -18,6 +18,10 @@ #include <linux/list.h> #include <linux/mm.h> #include <linux/mem_sampling.h> +#include <linux/mempolicy.h> +#include <linux/task_work.h> +#include <linux/migrate.h> +#include <linux/sched/numa_balancing.h> #define MEM_SAMPLING_DISABLED 0x0 #define MEM_SAMPLING_NORMAL 0x1 @@ -50,6 +54,12 @@ struct mem_sampling_record_cb_list_entry { }; LIST_HEAD(mem_sampling_record_cb_list); +struct mem_sampling_numa_access_work { + struct callback_head work; + u64 vaddr, paddr; + int cpu; +}; + void mem_sampling_record_cb_register(mem_sampling_record_cb_type cb) { struct mem_sampling_record_cb_list_entry *cb_entry, *tmp; @@ -95,6 +105,178 @@ void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) mem_sampling_ops.sampling_stop(); } +#ifdef CONFIG_NUMABALANCING_MEM_SAMPLING +static int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma, + unsigned long addr, int page_nid, int *flags) +{ + folio_get(folio); + + /* Record the current PID acceesing VMA */ + vma_set_access_pid_bit(vma); + + count_vm_numa_event(NUMA_HINT_FAULTS); + if (page_nid == numa_node_id()) { + count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL); + *flags |= TNF_FAULT_LOCAL; + } + + return mpol_misplaced(folio, vma, addr); +} + +/* + * Called from task_work context to act upon the page access. + * + * Physical address (provided by SPE) is used directly instead + * of walking the page tables to get to the PTE/page. 
Hence we + * don't check if PTE is writable for the TNF_NO_GROUP + * optimization, which means RO pages are considered for grouping. + */ +static void do_numa_access(struct task_struct *p, u64 laddr, u64 paddr) +{ + struct mm_struct *mm = p->mm; + struct vm_area_struct *vma; + struct page *page = NULL; + struct folio *folio; + int page_nid = NUMA_NO_NODE; + int last_cpupid; + int target_nid; + int flags = 0; + + if (!mm) + return; + + if (!mmap_read_trylock(mm)) + return; + + vma = find_vma(mm, laddr); + if (!vma) + goto out_unlock; + + if (!vma_migratable(vma) || !vma_policy_mof(vma) || + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) + goto out_unlock; + + if (!vma->vm_mm || + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) + goto out_unlock; + + if (!vma_is_accessible(vma)) + goto out_unlock; + + page = pfn_to_online_page(PHYS_PFN(paddr)); + folio = page_folio(page); + + if (!folio || folio_is_zone_device(folio)) + goto out_unlock; + + if (unlikely(!PageLRU(page))) + goto out_unlock; + + /* TODO: handle PTE-mapped THP or PMD-mapped THP*/ + if (folio_test_large(folio)) + goto out_unlock; + + /* + * Flag if the page is shared between multiple address spaces. This + * is later used when determining whether to group tasks together + */ + if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED)) + flags |= TNF_SHARED; + + page_nid = folio_nid(folio); + + /* + * For memory tiering mode, cpupid of slow memory page is used + * to record page access time. So use default value. + */ + if (folio_use_access_time(folio)) + last_cpupid = (-1 & LAST_CPUPID_MASK); + else + last_cpupid = folio_last_cpupid(folio); + target_nid = numa_migrate_prep(folio, vma, laddr, page_nid, &flags); + if (target_nid == NUMA_NO_NODE) { + folio_put(folio); + goto out; + } + + /* Migrate to the requested node */ + if (migrate_misplaced_folio(folio, vma, target_nid)) { + page_nid = target_nid; + flags |= TNF_MIGRATED; + } else { + flags |= TNF_MIGRATE_FAIL; + } + +out: + if (page_nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, page_nid, 1, flags); + +out_unlock: + mmap_read_unlock(mm); +} + +static void task_mem_sampling_access_work(struct callback_head *work) +{ + struct mem_sampling_numa_access_work *iwork = + container_of(work, struct mem_sampling_numa_access_work, work); + + if (iwork->cpu == smp_processor_id()) + do_numa_access(current, iwork->vaddr, iwork->paddr); + kfree(iwork); +} + +static void numa_create_taskwork(u64 vaddr, u64 paddr, int cpu) +{ + struct mem_sampling_numa_access_work *iwork = NULL; + + iwork = kzalloc(sizeof(*iwork), GFP_ATOMIC); + if (!iwork) + return; + + iwork->vaddr = vaddr; + iwork->paddr = paddr; + iwork->cpu = cpu; + + init_task_work(&iwork->work, task_mem_sampling_access_work); + task_work_add(current, &iwork->work, TWA_RESUME); +} + +static void numa_balancing_mem_sampling_cb(struct mem_sampling_record *record) +{ + struct task_struct *p = current; + u64 vaddr = record->virt_addr; + u64 paddr = record->phys_addr; + + /* Discard kernel address accesses */ + if (vaddr & (1UL << 63)) + return; + + if (p->pid != record->context_id) + return; + + numa_create_taskwork(vaddr, paddr, smp_processor_id()); +} + +static void numa_balancing_mem_sampling_cb_register(void) +{ + mem_sampling_record_cb_register(numa_balancing_mem_sampling_cb); +} + +static void numa_balancing_mem_sampling_cb_unregister(void) +{ + mem_sampling_record_cb_unregister(numa_balancing_mem_sampling_cb); +} +static void set_numabalancing_mem_sampling_state(bool enabled) +{ + if 
(enabled) + numa_balancing_mem_sampling_cb_register(); + else + numa_balancing_mem_sampling_cb_unregister(); +} +#else +static inline void set_numabalancing_mem_sampling_state(bool enabled) { } +#endif /* CONFIG_NUMABALANCING_MEM_SAMPLING */ + void mem_sampling_process(void) { int i, nr_records; @@ -138,8 +320,10 @@ static void __set_mem_sampling_state(bool enabled) { if (enabled) static_branch_enable(&mem_sampling_access_hints); - else + else { static_branch_disable(&mem_sampling_access_hints); + set_numabalancing_mem_sampling_state(enabled); + } } void set_mem_sampling_state(bool enabled) -- 2.25.1

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/MOK... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/numa: Use mem_sampling framework for NUMA balancing Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Enable NUMA balancing to use memory access information provided by the mem_sampling framework (e.g. from ARM SPE or other hardware samplers). When mem_sampling and its access hint mechanism are enabled, the regular address space scan in task_numa_work() is skipped. This reduces the overhead of NUMA balancing while still providing reliable memory access patterns for task placement decisions. A static key `sched_numabalancing_mem_sampling` is introduced to toggle this behavior, and is enabled when mem_sampling is active. This patch lays the groundwork for integrating hardware-assisted sampling into NUMA-aware scheduling decisions and avoids unnecessary software-based access detection. Note: PMD-level page migration is not supported in this mode. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- include/linux/mem_sampling.h | 3 +++ kernel/sched/fair.c | 13 +++++++++++++ mm/mem_sampling.c | 1 + 3 files changed, 17 insertions(+) diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h index b5c637df2876..46208d975098 100644 --- a/include/linux/mem_sampling.h +++ b/include/linux/mem_sampling.h @@ -71,6 +71,9 @@ enum user_switch_type { USER_SWITCH_BACK_TO_MEM_SAMPLING, }; +DECLARE_STATIC_KEY_FALSE(sched_numabalancing_mem_sampling); +extern struct static_key_false mem_sampling_access_hints; + #ifdef CONFIG_ARM_SPE_MEM_SAMPLING int mm_spe_start(void); void mm_spe_stop(void); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c530d501bb48..468a4d747933 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -48,6 +48,7 @@ #include <linux/ratelimit.h> #include <linux/task_work.h> #include <linux/rbtree_augmented.h> +#include <linux/mem_sampling.h> #include <asm/switch_to.h> @@ -3368,6 +3369,18 @@ static void task_numa_work(struct callback_head *work) long pages, virtpages; struct vma_iterator vmi; +#ifdef CONFIG_NUMABALANCING_MEM_SAMPLING + /* + * If we are using access hints from hardware (like using + * SPE), don't scan the address space. + * Note that currently PMD-level page migration is not + * supported. + */ + if (static_branch_unlikely(&mem_sampling_access_hints) && + static_branch_unlikely(&sched_numabalancing_mem_sampling)) + return; +#endif + SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work)); work->next = work; diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index bc0ca95bb5c9..a9832924509c 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -105,6 +105,7 @@ void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) mem_sampling_ops.sampling_stop(); } +DEFINE_STATIC_KEY_FALSE(sched_numabalancing_mem_sampling); #ifdef CONFIG_NUMABALANCING_MEM_SAMPLING static int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, int page_nid, int *flags) -- 2.25.1

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/EXF... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/numa: Enable mem_sampling-based access tracking for NUMA balancing Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Extend the /proc/sys/kernel/mem_sampling_enable sysctl interface to support fine-grained control over mem_sampling and its integration with NUMA balancing. Supported values: 0 - Disable mem_sampling entirely 1 - Enable mem_sampling (used by perf or other subsystems) 2 - Enable mem_sampling and allow NUMA balancing to consume access hints Although mem_sampling and perf both rely on the same hardware sampling resources (e.g. ARM SPE), perf is allowed to preempt mem_sampling: if perf starts sampling, mem_sampling will be automatically stopped to avoid resource conflicts. This ensures that tools using perf have priority, while still allowing NUMA and other kernel subsystems to use sampling data when perf is idle. The sysctl interface allows dynamic switching at runtime without reboot. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- mm/mem_sampling.c | 42 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index a9832924509c..f6c84b2986f9 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -25,11 +25,16 @@ #define MEM_SAMPLING_DISABLED 0x0 #define MEM_SAMPLING_NORMAL 0x1 +#define MEM_SAMPLING_MIN_VALUE 0 +#define MEM_SAMPLING_MAX_VALUE 3 struct mem_sampling_ops_struct mem_sampling_ops; static int mem_sampling_override __initdata; static int sysctl_mem_sampling_mode; +static const int mem_sampling_min_value = MEM_SAMPLING_MIN_VALUE; +static const int mem_sampling_max_value = MEM_SAMPLING_MAX_VALUE; + /* keep track of who use the SPE */ DEFINE_PER_CPU(enum arm_spe_user_e, arm_spe_user); EXPORT_PER_CPU_SYMBOL_GPL(arm_spe_user); @@ -269,10 +274,13 @@ static void numa_balancing_mem_sampling_cb_unregister(void) } static void set_numabalancing_mem_sampling_state(bool enabled) { - if (enabled) + if (enabled) { numa_balancing_mem_sampling_cb_register(); - else + static_branch_enable(&sched_numabalancing_mem_sampling); + } else { numa_balancing_mem_sampling_cb_unregister(); + static_branch_disable(&sched_numabalancing_mem_sampling); + } } #else static inline void set_numabalancing_mem_sampling_state(bool enabled) { } @@ -395,18 +403,40 @@ static int proc_mem_sampling_enable(struct ctl_table *table, int write, { struct ctl_table t; int err; - int state = sysctl_mem_sampling_mode; + int state = 0; + + if (static_branch_likely(&mem_sampling_access_hints)) + state = 1; + if (static_branch_likely(&sched_numabalancing_mem_sampling)) + state = 2; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; t = *table; t.data = &state; + t.extra1 = (int *)&mem_sampling_min_value; + t.extra2 = (int *)&mem_sampling_max_value; err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); if (err < 0) return err; - if (write) - set_mem_sampling_state(state); + if (write) { + switch (state) { + case 0: + set_mem_sampling_state(false); + break; + case 1: + set_mem_sampling_state(false); + set_mem_sampling_state(true); + break; + case 2: + set_mem_sampling_state(true); + set_numabalancing_mem_sampling_state(true); + break; + default: + return -EINVAL; + } + } return err; } @@ -418,7 +448,7 @@ static struct ctl_table mem_sampling_sysctls[] = { .mode = 0644, .proc_handler = proc_mem_sampling_enable, .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, + .extra2 = (int *)&mem_sampling_max_value, }, {} }; -- 2.25.1
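For example, after this patch the same sysctl accepts the extended values described above (semantics as listed in the commit message; sketch only):

echo 0 > /proc/sys/kernel/mem_sampling_enable   # disable mem_sampling entirely
echo 1 > /proc/sys/kernel/mem_sampling_enable   # enable sampling only
echo 2 > /proc/sys/kernel/mem_sampling_enable   # sampling + NUMA balancing access hints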

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/G6J... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/mem_sampling: Add sysctl control for NUMA balancing integration Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Introduce two tracepoints to improve observability of memory access sampling and NUMA-aware page migration decisions: - trace_mm_mem_sampling_access_record(): Logs the virtual and physical addresses of sampled memory accesses, along with the CPU and PID. Useful for tracking which pages are actively accessed and considered for NUMA balancing. - trace_mm_numa_migrating(): Logs page migration activity triggered by NUMA balancing, including the logical address, source node, destination node, and migration outcome (via TNF_MIGRATED flag). These tracepoints enable precise tracing of how sampled access patterns influence memory placement and migration, aiding debugging, performance analysis, and validation of NUMA policies that depend on hardware access hints. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- include/trace/events/kmem.h | 54 +++++++++++++++++++++++++++++++++++++ mm/mem_sampling.c | 4 +++ 2 files changed, 58 insertions(+) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 19b8ca352dde..423e3ba1b3d1 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -433,6 +433,60 @@ TRACE_EVENT(mm_spe_record, __entry->vaddr, __entry->paddr, __entry->pid) ); #endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ + + +#ifdef CONFIG_NUMABALANCING_MEM_SAMPLING +TRACE_EVENT(mm_numa_migrating, + + TP_PROTO(u64 vaddr, int page_nid, int target_nid, + int migrate_success), + + TP_ARGS(vaddr, page_nid, target_nid, migrate_success), + + TP_STRUCT__entry( + __field(u64, vaddr) + __field(int, page_nid) + __field(int, target_nid) + __field(int, migrate_success) + ), + + TP_fast_assign( + __entry->vaddr = vaddr; + __entry->page_nid = page_nid; + __entry->target_nid = target_nid; + __entry->migrate_success = !!(migrate_success); + ), + + TP_printk("vaddr=%llu page_nid=%d target_nid=%d migrate_success=%d", + __entry->vaddr, __entry->page_nid, + __entry->target_nid, __entry->migrate_success) +); + +TRACE_EVENT(mm_mem_sampling_access_record, + + TP_PROTO(u64 vaddr, u64 paddr, int cpuid, int pid), + + TP_ARGS(vaddr, paddr, cpuid, pid), + + TP_STRUCT__entry( + __field(u64, vaddr) + __field(u64, paddr) + __field(int, cpuid) + __field(int, pid) + ), + + TP_fast_assign( + __entry->vaddr = vaddr; + __entry->paddr = paddr; + __entry->cpuid = cpuid; + __entry->pid = pid; + ), + + TP_printk("vaddr=%llu paddr=%llu cpuid=%d pid=%d", + __entry->vaddr, __entry->paddr, + __entry->cpuid, __entry->pid) +); +#endif /* CONFIG_NUMABALANCING_MEM_SAMPLING */ #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index f6c84b2986f9..3550c71b3f3d 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -21,6 +21,7 @@ #include <linux/mempolicy.h> #include <linux/task_work.h> #include <linux/migrate.h> +#include <trace/events/kmem.h> #include <linux/sched/numa_balancing.h> #define MEM_SAMPLING_DISABLED 0x0 @@ -214,6 +215,7 @@ static void do_numa_access(struct task_struct *p, u64 laddr, u64 paddr) } out: + trace_mm_numa_migrating(laddr, page_nid, target_nid, flags&TNF_MIGRATED); if (page_nid != NUMA_NO_NODE) task_numa_fault(last_cpupid, page_nid, 1, flags); @@ -260,6 +262,8 @@ static void numa_balancing_mem_sampling_cb(struct mem_sampling_record *record) if (p->pid != record->context_id) return; + 
trace_mm_mem_sampling_access_record(vaddr, paddr, smp_processor_id(), + current->pid); numa_create_taskwork(vaddr, paddr, smp_processor_id()); } -- 2.25.1
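Like the earlier mm_spe_record event, both new tracepoints live in the kmem group and can be enabled through tracefs (paths assume the standard layout; sketch only):

echo 1 > /sys/kernel/tracing/events/kmem/mm_mem_sampling_access_record/enable
echo 1 > /sys/kernel/tracing/events/kmem/mm_numa_migrating/enable
cat /sys/kernel/tracing/trace_pipe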

FeedBack: The patch(es) you sent to kernel@openeuler.org could not be converted to a pull request. Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/JRF... Failed Reason: applying the patch(es) failed, Patch failed at 0001 mm/numa: Add tracepoints for access sampling and NUMA page migration Suggested Solution: please check whether the failed patch(es) apply on the latest code of the expected branch

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Improve DAMON's region access tracking by integrating hardware-sampled memory access information when available via the mem_sampling framework. Currently, DAMON estimates region hotness based on software-based page access checks across monitoring intervals. This approach is limited by: - Random region selection - Long monitoring convergence time - Sensitivity to parameter tuning With hardware sampling (e.g. via ARM SPE), DAMON can: - Observe real memory access patterns with higher precision - Detect hot/cold regions more quickly - Reduce overhead and improve responsiveness This patch adds optional support for consuming mem_sampling data within DAMON. When hardware sampling is available and enabled, DAMON will use access records provided by the sampling infrastructure instead of relying on traditional page access bits. The integration preserves backward compatibility with existing DAMON logic, falling back to software-based tracking when hardware sampling is not available or disabled. This hybrid approach enhances DAMON's adaptability across platforms while improving accuracy and reducing latency in memory access monitoring. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- include/linux/damon.h | 4 ++ include/linux/mem_sampling.h | 18 ++++++++ include/linux/mm_types.h | 4 ++ kernel/fork.c | 3 ++ mm/damon/Kconfig | 14 ++++++ mm/damon/core.c | 34 ++++++++++++++ mm/damon/vaddr.c | 87 ++++++++++++++++++++++++++++++++++++ mm/mem_sampling.c | 70 +++++++++++++++++++++++++++++ 8 files changed, 234 insertions(+) diff --git a/include/linux/damon.h b/include/linux/damon.h index 343132a146cf..424431d5a100 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -13,6 +13,7 @@ #include <linux/time64.h> #include <linux/types.h> #include <linux/random.h> +#include <linux/mem_sampling.h> /* Minimal region size. Every damon_region is aligned by this. 
*/ #define DAMON_MIN_REGION PAGE_SIZE @@ -73,6 +74,9 @@ struct damon_region { */ struct damon_target { struct pid *pid; +#ifdef CONFIG_DAMON_MEM_SAMPLING + struct damon_mem_sampling_fifo damon_fifo; +#endif unsigned int nr_regions; struct list_head regions_list; struct list_head list; diff --git a/include/linux/mem_sampling.h b/include/linux/mem_sampling.h index 46208d975098..f2931651b061 100644 --- a/include/linux/mem_sampling.h +++ b/include/linux/mem_sampling.h @@ -12,6 +12,8 @@ #ifndef __MEM_SAMPLING_H #define __MEM_SAMPLING_H +#include <linux/kfifo.h> + enum mem_sampling_sample_type { MEM_SAMPLING_L1D_ACCESS = 1 << 0, MEM_SAMPLING_L1D_MISS = 1 << 1, @@ -112,4 +114,20 @@ void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr); static inline void set_mem_sampling_state(bool enabled) { } static inline void mem_sampling_sched_in(struct task_struct *prev, struct task_struct *curr) { } #endif /* CONFIG_MEM_SAMPLING */ + +#ifdef CONFIG_DAMON_MEM_SAMPLING +#define DAMOS_FIFO_MAX_RECORD (1024) +struct damon_mem_sampling_record { + u64 vaddr; +}; + +struct damon_mem_sampling_fifo { + struct kfifo rx_kfifo; + spinlock_t rx_kfifo_lock; /* protect SPE Rx data kfifo */ +}; + +bool damon_use_mem_sampling(void); +#else +static inline bool damon_use_mem_sampling(void) { return false; } +#endif /* CONFIG_DAMON_MEM_SAMPLING */ #endif /* __MEM_SAMPLING_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b4442fbbf17b..64c38b09e18d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1011,7 +1011,11 @@ struct mm_struct { #endif } __randomize_layout; +#ifdef CONFIG_DAMON_MEM_SAMPLING + KABI_USE(1, struct damon_mem_sampling_fifo *damon_fifo) +#else KABI_RESERVE(1) +#endif KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) diff --git a/kernel/fork.c b/kernel/fork.c index 698d7829f2e4..4b37cb915f7b 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1362,6 +1362,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, init_tlb_flush_pending(mm); #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS mm->pmd_huge_pte = NULL; +#endif +#if defined(CONFIG_DAMON_MEM_SAMPLING) + mm->damon_fifo = NULL; #endif mm_init_uprobes_state(mm); hugetlb_count_init(mm); diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 436c6b4cb5ec..d6ed1ef6ad4a 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -32,6 +32,20 @@ config DAMON_VADDR This builds the default data access monitoring operations for DAMON that work for virtual address spaces. +config DAMON_MEM_SAMPLING + bool "Set DAMON to use records from hardware sample" + depends on MEM_SAMPLING && DAMON_VADDR + help + This enables DAMON to utilize hardware sampling-based memory access + monitoring data (e.g., ARM SPE, Intel PEBS, AMD IBS) instead of + software-based sampling. When enabled, DAMON will: + + - Use CPU performance monitoring unit (PMU) samples as data source + - Correlate hardware samples with process virtual address spaces + - Provide lower overhead monitoring compared to pure software approaches + + If unsure, say N. 
+ config DAMON_PADDR bool "Data access monitoring operations for the physical address space" depends on DAMON && MMU diff --git a/mm/damon/core.c b/mm/damon/core.c index 1daa8793c44b..c8a4427d1d63 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -112,6 +112,32 @@ int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id) return err; } +#if IS_ENABLED(CONFIG_DAMON_MEM_SAMPLING) +int damon_target_init_kfifo(struct damon_target *t) +{ + struct damon_mem_sampling_fifo *damon_fifo; + int ret = 0; + unsigned int fifo_size = sizeof(struct damon_mem_sampling_record) * DAMOS_FIFO_MAX_RECORD; + + damon_fifo = &t->damon_fifo; + + ret = kfifo_alloc(&damon_fifo->rx_kfifo, fifo_size, GFP_KERNEL); + if (ret) + return -ENOMEM; + + spin_lock_init(&damon_fifo->rx_kfifo_lock); + return 0; +} + +void damon_target_deinit_kfifo(struct damon_target *t) +{ + kfifo_free(&t->damon_fifo.rx_kfifo); +} +#else +static inline int damon_target_init_kfifo(struct damon_target *t) {return 0; } +static inline void damon_target_deinit_kfifo(struct damon_target *t) { } +#endif /* CONFIG_DAMON_MEM_SAMPLING */ + /* * Construct a damon_region struct * @@ -388,11 +414,18 @@ void damon_destroy_scheme(struct damos *s) struct damon_target *damon_new_target(void) { struct damon_target *t; + int ret; t = kmalloc(sizeof(*t), GFP_KERNEL); if (!t) return NULL; + ret = damon_target_init_kfifo(t); + if (ret) { + kfree(t); + return NULL; + } + t->pid = NULL; t->nr_regions = 0; INIT_LIST_HEAD(&t->regions_list); @@ -422,6 +455,7 @@ void damon_free_target(struct damon_target *t) damon_for_each_region_safe(r, next, t) damon_free_region(r); + damon_target_deinit_kfifo(t); kfree(t); } diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 5764b9885e7d..167ca23d77ae 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -402,6 +402,83 @@ static void damon_va_mkold(struct mm_struct *mm, unsigned long addr) mmap_read_unlock(mm); } +#if IS_ENABLED(CONFIG_DAMON_MEM_SAMPLING) +/* + * Functions for the access checking of the regions with mem sampling + */ +static void __hw_damon_va_prepare_access_check(struct damon_region *r) +{ + r->sampling_addr = 0; +} + +static void hw_damon_va_prepare_access_checks(struct damon_ctx *ctx) +{ + struct damon_target *t; + struct mm_struct *mm; + struct damon_region *r; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + mm->damon_fifo = &t->damon_fifo; + damon_for_each_region(r, t) + __hw_damon_va_prepare_access_check(r); + mmput(mm); + } +} + +static void find_damon_region(struct damon_mem_sampling_record *damon_record, + struct damon_target *t, unsigned int *max_nr_accesses) +{ + struct damon_region *r; + unsigned long addr = damon_record->vaddr; + + damon_for_each_region(r, t) { + if (r->sampling_addr != 0) + return; + if (addr > r->ar.start && addr < r->ar.end) { + r->nr_accesses++; + r->sampling_addr = addr; + *max_nr_accesses = max(r->nr_accesses, *max_nr_accesses); + return; + } + } +} + +static unsigned int hw_damon_va_check_accesses(struct damon_ctx *ctx) +{ + unsigned int outs; + struct damon_target *t; + struct mm_struct *mm; + unsigned int max_nr_accesses = 0; + struct damon_mem_sampling_record damon_record; + + damon_for_each_target(t, ctx) { + mm = damon_get_mm(t); + if (!mm) + continue; + mm->damon_fifo = NULL; + mmput(mm); + while (!kfifo_is_empty(&t->damon_fifo.rx_kfifo)) { + outs = kfifo_out(&t->damon_fifo.rx_kfifo, &damon_record, + sizeof(struct damon_mem_sampling_record)); + if (outs != sizeof(struct damon_mem_sampling_record)) { + pr_debug("damon hw spe 
record corrupted header. Flush.\n"); + continue; + } + find_damon_region(&damon_record, t, &max_nr_accesses); + } + kfifo_reset_out(&t->damon_fifo.rx_kfifo); + } + + return max_nr_accesses; +} +#else +static inline void hw_damon_va_prepare_access_checks(struct damon_ctx *ctx) { } +static inline unsigned int hw_damon_va_check_accesses(struct damon_ctx *ctx) {return 0; } +#endif + /* * Functions for the access checking of the regions */ @@ -420,6 +497,11 @@ static void damon_va_prepare_access_checks(struct damon_ctx *ctx) struct mm_struct *mm; struct damon_region *r; + if (damon_use_mem_sampling()) { + hw_damon_va_prepare_access_checks(ctx); + return; + } + damon_for_each_target(t, ctx) { mm = damon_get_mm(t); if (!mm) @@ -589,6 +671,11 @@ static unsigned int damon_va_check_accesses(struct damon_ctx *ctx) unsigned int max_nr_accesses = 0; bool same_target; + if (damon_use_mem_sampling()) { + max_nr_accesses = hw_damon_va_check_accesses(ctx); + return max_nr_accesses; + } + damon_for_each_target(t, ctx) { mm = damon_get_mm(t); if (!mm) diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index 3550c71b3f3d..65f047ae8b84 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -290,6 +290,75 @@ static void set_numabalancing_mem_sampling_state(bool enabled) static inline void set_numabalancing_mem_sampling_state(bool enabled) { } #endif /* CONFIG_NUMABALANCING_MEM_SAMPLING */ +DEFINE_STATIC_KEY_FALSE(mm_damon_mem_sampling); +#ifdef CONFIG_DAMON_MEM_SAMPLING +static void damon_mem_sampling_record_cb(struct mem_sampling_record *record) +{ + struct damon_mem_sampling_fifo *damon_fifo; + struct damon_mem_sampling_record domon_record; + struct task_struct *task = NULL; + struct mm_struct *mm; + + /* Discard kernel address accesses */ + if (record->virt_addr & (1UL << 63)) + return; + + task = find_get_task_by_vpid((pid_t)record->context_id); + if (!task) + return; + + mm = get_task_mm(task); + put_task_struct(task); + if (!mm) + return; + + damon_fifo = mm->damon_fifo; + mmput(mm); + + domon_record.vaddr = record->virt_addr; + + /* only the proc under monitor now has damon_fifo */ + if (damon_fifo) { + if (kfifo_is_full(&damon_fifo->rx_kfifo)) + return; + + kfifo_in_locked(&damon_fifo->rx_kfifo, &domon_record, + sizeof(struct damon_mem_sampling_record), + &damon_fifo->rx_kfifo_lock); + return; + } +} + +static void damon_mem_sampling_record_cb_register(void) +{ + mem_sampling_record_cb_register(damon_mem_sampling_record_cb); +} + +static void damon_mem_sampling_record_cb_unregister(void) +{ + mem_sampling_record_cb_unregister(damon_mem_sampling_record_cb); +} + +static void set_damon_mem_sampling_state(bool enabled) +{ + if (enabled) { + damon_mem_sampling_record_cb_register(); + static_branch_enable(&mm_damon_mem_sampling); + } else { + damon_mem_sampling_record_cb_unregister(); + static_branch_disable(&mm_damon_mem_sampling); + } +} + +bool damon_use_mem_sampling(void) +{ + return static_branch_unlikely(&mem_sampling_access_hints) && + static_branch_unlikely(&mm_damon_mem_sampling); +} +#else +static inline void set_damon_mem_sampling_state(bool enabled) { } +#endif + void mem_sampling_process(void) { int i, nr_records; @@ -336,6 +405,7 @@ static void __set_mem_sampling_state(bool enabled) else { static_branch_disable(&mem_sampling_access_hints); set_numabalancing_mem_sampling_state(enabled); + set_damon_mem_sampling_state(enabled); } } -- 2.25.1

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/6ZO...
Failed reason: applying the patch(es) failed; Patch failed at 0001 mm/damon/vaddr: Support hardware-assisted memory access sampling
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Extend the /proc/sys/kernel/mem_sampling_enable sysctl interface to provide finer-grained control over mem_sampling usage across kernel subsystems including DAMON and NUMA balancing. Updated control values: 0 - Disable mem_sampling entirely 1 - Enable mem_sampling (for perf or general sampling) 2 - Enable mem_sampling and allow NUMA balancing to use access hints 3 - Enable mem_sampling and allow DAMON to use access records 4 - Fully enable mem_sampling for both NUMA and DAMON This multi-level control allows users to selectively enable hardware- assisted memory sampling for specific subsystems based on performance and observability needs. It also helps avoid conflicts when multiple features compete for access to underlying sampling resources (e.g. ARM SPE). The implementation ensures that mutual exclusion with perf is respected. If perf starts using SPE, mem_sampling is automatically disabled, and DAMON/NUMA consumers will stop using sampled access data accordingly. This extension improves runtime configurability and makes mem_sampling integration more practical for dynamic and hybrid workloads. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Shuang Yan <yanshuang7@huawei.com> --- mm/mem_sampling.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/mm/mem_sampling.c b/mm/mem_sampling.c index 65f047ae8b84..126cf71a9fb2 100644 --- a/mm/mem_sampling.c +++ b/mm/mem_sampling.c @@ -27,7 +27,7 @@ #define MEM_SAMPLING_DISABLED 0x0 #define MEM_SAMPLING_NORMAL 0x1 #define MEM_SAMPLING_MIN_VALUE 0 -#define MEM_SAMPLING_MAX_VALUE 3 +#define MEM_SAMPLING_MAX_VALUE 5 struct mem_sampling_ops_struct mem_sampling_ops; static int mem_sampling_override __initdata; @@ -483,6 +483,11 @@ static int proc_mem_sampling_enable(struct ctl_table *table, int write, state = 1; if (static_branch_likely(&sched_numabalancing_mem_sampling)) state = 2; + if (static_branch_likely(&mm_damon_mem_sampling)) + state = 3; + if (static_branch_likely(&mm_damon_mem_sampling) && + static_branch_likely(&sched_numabalancing_mem_sampling)) + state = 4; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; @@ -504,8 +509,19 @@ static int proc_mem_sampling_enable(struct ctl_table *table, int write, set_mem_sampling_state(true); break; case 2: + set_mem_sampling_state(false); + set_mem_sampling_state(true); + set_numabalancing_mem_sampling_state(true); + break; + case 3: + set_mem_sampling_state(false); + set_mem_sampling_state(true); + set_damon_mem_sampling_state(true); + break; + case 4: set_mem_sampling_state(true); set_numabalancing_mem_sampling_state(true); + set_damon_mem_sampling_state(true); break; default: return -EINVAL; -- 2.25.1
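
As a usage note for the control values documented above, a privileged process (CAP_SYS_ADMIN) selects the consumer by writing 0-4 to /proc/sys/kernel/mem_sampling_enable. The short userspace sketch below assumes only that proc path introduced by this series; it is an illustration, not part of the patch.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* 0: off, 1: sampling only, 2: +NUMA balancing, 3: +DAMON, 4: both */
static int set_mem_sampling(int level)
{
	char buf[4];
	int fd = open("/proc/sys/kernel/mem_sampling_enable", O_WRONLY);

	if (fd < 0) {
		perror("open mem_sampling_enable");
		return -1;
	}
	snprintf(buf, sizeof(buf), "%d", level);
	if (write(fd, buf, strlen(buf)) < 0) {
		perror("write mem_sampling_enable");
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	/* Enable hardware sampling for both NUMA balancing and DAMON. */
	return set_mem_sampling(4) ? EXIT_FAILURE : EXIT_SUCCESS;
}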

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/HSQ...
Failed reason: applying the patch(es) failed; Patch failed at 0001 mm/damon/vaddr: Extend mem_sampling sysctl to support DAMON
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- This patch adds a demotion mechanism to DAMON VA that enables the migration of cold (inactive) memory pages to a specified nodemask. In heterogeneous memory systems with multi-tier memory (e.g. DRAM, CXL, SSDs), the nodemask can target lower-performance or capacity-tier nodes such as CXL memory expanders or software-defined NUMA nodes. By allowing cold memory to be demoted to these designated nodes, this feature improves memory hierarchy utilization, enhances overall system performance, and helps reduce memory pressure on hot tiers. The interface offers finer-grained control over memory placement and is especially useful in memory tiering and aging scenarios. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- include/linux/damon.h | 4 + include/linux/mempolicy.h | 2 + include/linux/migrate_mode.h | 1 + include/trace/events/migrate.h | 3 +- mm/damon/sysfs-schemes.c | 40 +++++++++ mm/damon/vaddr.c | 40 +++++++++ mm/mempolicy.c | 146 +++++++++++++++++++++++++++++++++ 7 files changed, 235 insertions(+), 1 deletion(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 424431d5a100..e544de649dc3 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -93,6 +93,7 @@ struct damon_target { * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE. * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists. * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists. + * @DAMOS_DEMOTION: Migrate cold page areas to specific nodes. * @DAMOS_STAT: Do nothing but count the stat. * @NR_DAMOS_ACTIONS: Total number of DAMOS actions * @@ -110,6 +111,7 @@ enum damos_action { DAMOS_NOHUGEPAGE, DAMOS_LRU_PRIO, DAMOS_LRU_DEPRIO, + DAMOS_DEMOTION, DAMOS_STAT, /* Do nothing but only record the stat */ NR_DAMOS_ACTIONS, }; @@ -302,6 +304,7 @@ struct damos_access_pattern { * struct damos - Represents a Data Access Monitoring-based Operation Scheme. * @pattern: Access pattern of target regions. * @action: &damo_action to be applied to the target regions. + * @remote_node: The NUMA node ID from which the cold page will be moved. * @apply_interval_us: The time between applying the @action. * @quota: Control the aggressiveness of this scheme. * @wmarks: Watermarks for automated (in)activation of this scheme. 
@@ -334,6 +337,7 @@ struct damos_access_pattern { struct damos { struct damos_access_pattern pattern; enum damos_action action; + nodemask_t remote_node; unsigned long apply_interval_us; /* private: internal use only */ /* diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 2e81ac87e6f6..29cc0d842a8f 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -167,6 +167,8 @@ static inline void check_highest_zone(enum zone_type k) int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, const nodemask_t *to, int flags); +int do_migrate_area_pages(struct mm_struct *mm, const nodemask_t *from, + const nodemask_t *to, unsigned long start, unsigned long end, int flags); #ifdef CONFIG_TMPFS extern int mpol_parse_str(char *str, struct mempolicy **mpol); diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h index f37cc03f9369..302c659dc626 100644 --- a/include/linux/migrate_mode.h +++ b/include/linux/migrate_mode.h @@ -29,6 +29,7 @@ enum migrate_reason { MR_CONTIG_RANGE, MR_LONGTERM_PIN, MR_DEMOTION, + MR_DAMON_DEMOTION, MR_TYPES }; diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h index 0190ef725b43..bafe4208de73 100644 --- a/include/trace/events/migrate.h +++ b/include/trace/events/migrate.h @@ -22,7 +22,8 @@ EM( MR_NUMA_MISPLACED, "numa_misplaced") \ EM( MR_CONTIG_RANGE, "contig_range") \ EM( MR_LONGTERM_PIN, "longterm_pin") \ - EMe(MR_DEMOTION, "demotion") + EM(MR_DEMOTION, "demotion") \ + EMe(MR_DAMON_DEMOTION, "damon_demotion") /* * First define the enums in the above macros to be exported to userspace diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c index 26c948f87489..dc570e90abca 100644 --- a/mm/damon/sysfs-schemes.c +++ b/mm/damon/sysfs-schemes.c @@ -1123,6 +1123,7 @@ static const struct kobj_type damon_sysfs_access_pattern_ktype = { struct damon_sysfs_scheme { struct kobject kobj; enum damos_action action; + nodemask_t remote_node; struct damon_sysfs_access_pattern *access_pattern; struct damon_sysfs_quotas *quotas; struct damon_sysfs_watermarks *watermarks; @@ -1140,6 +1141,7 @@ static const char * const damon_sysfs_damos_action_strs[] = { "nohugepage", "lru_prio", "lru_deprio", + "demotion", "stat", }; @@ -1153,6 +1155,7 @@ static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc( return NULL; scheme->kobj = (struct kobject){}; scheme->action = action; + scheme->remote_node = NODE_MASK_ALL; return scheme; } @@ -1356,6 +1359,36 @@ static ssize_t action_store(struct kobject *kobj, struct kobj_attribute *attr, return -EINVAL; } +static ssize_t remote_node_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_scheme *scheme = container_of(kobj, + struct damon_sysfs_scheme, kobj); + + return sysfs_emit(buf, "%*pbl\n", + nodemask_pr_args(&scheme->remote_node)); +} + +static ssize_t remote_node_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_scheme *scheme = container_of(kobj, + struct damon_sysfs_scheme, kobj); + int ret; + nodemask_t new_mask; + + ret = nodelist_parse(buf, new_mask); + if (ret < 0) + return -EINVAL; + + if (!nodes_subset(new_mask, node_states[N_MEMORY])) + return -EINVAL; + + nodes_and(scheme->remote_node, new_mask, node_states[N_MEMORY]); + return count; +} + + static void damon_sysfs_scheme_release(struct kobject *kobj) { kfree(container_of(kobj, struct damon_sysfs_scheme, kobj)); @@ -1364,8 +1397,12 @@ static void damon_sysfs_scheme_release(struct kobject 
*kobj) static struct kobj_attribute damon_sysfs_scheme_action_attr = __ATTR_RW_MODE(action, 0600); +static struct kobj_attribute damon_sysfs_scheme_remote_node_attr = + __ATTR_RW_MODE(remote_node, 0600); + static struct attribute *damon_sysfs_scheme_attrs[] = { &damon_sysfs_scheme_action_attr.attr, + &damon_sysfs_scheme_remote_node_attr.attr, NULL, }; ATTRIBUTE_GROUPS(damon_sysfs_scheme); @@ -1644,6 +1681,7 @@ static void damon_sysfs_update_scheme(struct damos *scheme, scheme->pattern.max_age_region = access_pattern->age->max; scheme->action = sysfs_scheme->action; + scheme->remote_node = sysfs_scheme->remote_node; scheme->quota.ms = sysfs_quotas->ms; scheme->quota.sz = sysfs_quotas->sz; @@ -1687,6 +1725,8 @@ int damon_sysfs_set_schemes(struct damon_ctx *ctx, damon_destroy_scheme(scheme); return -ENOMEM; } + + scheme->remote_node = sysfs_schemes->schemes_arr[i]->remote_node; damon_add_scheme(ctx, scheme); } return 0; diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 167ca23d77ae..3a21410e631e 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -14,6 +14,7 @@ #include <linux/page_idle.h> #include <linux/pagewalk.h> #include <linux/sched/mm.h> +#include <linux/cpuset.h> #include "ops-common.h" @@ -479,6 +480,41 @@ static inline void hw_damon_va_prepare_access_checks(struct damon_ctx *ctx) { } static inline unsigned int hw_damon_va_check_accesses(struct damon_ctx *ctx) {return 0; } #endif +#ifdef CONFIG_MIGRATION +static unsigned long damon_migrate_pages(struct damon_target *t, + struct damon_region *r, nodemask_t task_remote_nodes) +{ + struct mm_struct *mm = NULL; + unsigned long applied; + struct task_struct *task; + nodemask_t task_nodes; + + task = damon_get_task_struct(t); + if (!task) + return 0; + task_nodes = cpuset_mems_allowed(task); + put_task_struct(task); + + mm = damon_get_mm(t); + if (!mm) + return 0; + + applied = do_migrate_area_pages(mm, &task_nodes, &task_remote_nodes, + r->ar.start, r->ar.end, MPOL_MF_MOVE_ALL); + + mmput(mm); + + return applied; +} + +#else +static inline unsigned long damon_migrate_pages(struct damon_target *t, + struct damon_region *r, nodemask_t task_remote_nodes) +{ + return 0; +} +#endif /* CONFIG_MIGRATION */ + /* * Functions for the access checking of the regions */ @@ -757,6 +793,8 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx, case DAMOS_NOHUGEPAGE: madv_action = MADV_NOHUGEPAGE; break; + case DAMOS_DEMOTION: + return damon_migrate_pages(t, r, scheme->remote_node); case DAMOS_STAT: return 0; default: @@ -777,6 +815,8 @@ static int damon_va_scheme_score(struct damon_ctx *context, switch (scheme->action) { case DAMOS_PAGEOUT: return damon_cold_score(context, r, scheme); + case DAMOS_DEMOTION: + return damon_cold_score(context, r, scheme); default: break; } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 219c098b3ffa..88f0bb008efd 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1104,6 +1104,46 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, return err; } +/* + * Migrate area pages from one node to a target node. + * Returns error or the number of pages not migrated. 
+ */ +static int migrate_area_to_node(struct mm_struct *mm, int source, int dest, + unsigned long start, unsigned long end, int flags) +{ + nodemask_t nmask; + struct vm_area_struct *vma; + LIST_HEAD(pagelist); + int err = 0; + struct migration_target_control mtc = { + .nid = dest, + .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, + }; + + nodes_clear(nmask); + node_set(source, nmask); + + /* + * This does not "check" the range but isolates all pages that + * need migration. Between passing in the full user address + * space range and MPOL_MF_DISCONTIG_OK, this call can not fail. + */ + vma = find_vma(mm, 0); + VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))); + queue_pages_range(mm, start, end, &nmask, + flags | MPOL_MF_DISCONTIG_OK, &pagelist, false); + + if (!list_empty(&pagelist)) { + err = migrate_pages(&pagelist, alloc_migration_target, NULL, + (unsigned long)&mtc, MIGRATE_SYNC, MR_DAMON_DEMOTION, NULL); + if (err) + putback_movable_pages(&pagelist); + } + + return err; +} + + /* * Move pages between the two nodesets so as to preserve the physical * layout as much as possible. @@ -1209,6 +1249,112 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, } +/* + * Move mm area size pages between the two nodesets so as to preserve the physical + * layout as much as possible. + * + * Returns the number of page that could not be moved. + */ +int do_migrate_area_pages(struct mm_struct *mm, const nodemask_t *from, + const nodemask_t *to, unsigned long start, + unsigned long end, int flags) +{ + int busy = 0; + int err = 0; + nodemask_t tmp; + + lru_cache_disable(); + + mmap_read_lock(mm); + + /* + * Find a 'source' bit set in 'tmp' whose corresponding 'dest' + * bit in 'to' is not also set in 'tmp'. Clear the found 'source' + * bit in 'tmp', and return that <source, dest> pair for migration. + * The pair of nodemasks 'to' and 'from' define the map. + * + * If no pair of bits is found that way, fallback to picking some + * pair of 'source' and 'dest' bits that are not the same. If the + * 'source' and 'dest' bits are the same, this represents a node + * that will be migrating to itself, so no pages need move. + * + * If no bits are left in 'tmp', or if all remaining bits left + * in 'tmp' correspond to the same bit in 'to', return false + * (nothing left to migrate). + * + * This lets us pick a pair of nodes to migrate between, such that + * if possible the dest node is not already occupied by some other + * source node, minimizing the risk of overloading the memory on a + * node that would happen if we migrated incoming memory to a node + * before migrating outgoing memory source that same node. + * + * A single scan of tmp is sufficient. As we go, we remember the + * most recent <s, d> pair that moved (s != d). If we find a pair + * that not only moved, but what's better, moved to an empty slot + * (d is not set in tmp), then we break out then, with that pair. + * Otherwise when we finish scanning from_tmp, we at least have the + * most recent <s, d> pair that moved. If we get all the way through + * the scan of tmp without finding any node that moved, much less + * moved to an empty node, then there is nothing left worth migrating. + */ + + tmp = *from; + while (!nodes_empty(tmp)) { + int s, d; + int source = NUMA_NO_NODE; + int dest = 0; + + for_each_node_mask(s, tmp) { + + /* + * do_migrate_pages() tries to maintain the relative + * node relationship of the pages established between + * threads and memory areas. 
+ * + * However if the number of source nodes is not equal to + * the number of destination nodes we can not preserve + * this node relative relationship. In that case, skip + * copying memory from a node that is in the destination + * mask. + * + * Example: [2,3,4] -> [3,4,5] moves everything. + * [0-7] - > [3,4,5] moves only 0,1,2,6,7. + */ + + if ((nodes_weight(*from) != nodes_weight(*to)) && + (node_isset(s, *to))) + continue; + + d = node_remap(s, *from, *to); + if (s == d) + continue; + + source = s; /* Node moved. Memorize */ + dest = d; + + /* dest not in remaining from nodes? */ + if (!node_isset(dest, tmp)) + break; + } + if (source == NUMA_NO_NODE) + break; + + node_clear(source, tmp); + err = migrate_area_to_node(mm, source, dest, start, end, flags); + if (err > 0) + busy += err; + if (err < 0) + break; + } + mmap_read_unlock(mm); + + lru_cache_enable(); + if (err < 0) + return err; + return busy; + +} + /* * Allocate a new page for page migration based on vma policy. * Start by assuming the page is mapped by the same vma as contains @start. -- 2.25.1
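
To illustrate how the new scheme knobs could be driven from userspace, the sketch below sets the "demotion" action and a destination nodemask through the DAMON sysfs interface. The kdamond/context/scheme indices and the .../schemes/0/remote_node path are assumptions based on the usual DAMON sysfs layout plus the attribute added in this patch, and the node range 2-3 is a hypothetical capacity (e.g. CXL) tier; adjust all of these to the actual system.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a sysfs attribute, reporting failures by path. */
static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	if (write(fd, val, strlen(val)) < 0) {
		perror(path);
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	const char *base = "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0";
	char path[256];

	/* Apply the DAMOS_DEMOTION action to scheme 0. */
	snprintf(path, sizeof(path), "%s/action", base);
	if (sysfs_write(path, "demotion"))
		return 1;

	/* Demote cold regions to NUMA nodes 2-3 (hypothetical capacity tier). */
	snprintf(path, sizeof(path), "%s/remote_node", base);
	return sysfs_write(path, "2-3") ? 1 : 0;
}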

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GN2...
Failed reason: applying the patch(es) failed; Patch failed at 0001 mm/damon/vaddr: Add demotion interface for migrating cold pages to target nodemask
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Currently, the boost SPE feature detects devices based on implementor and part_num, but the collected data is not yet utilized by other kernel modules. This patch enhances the boost SPE functionality. This feature enable richer memory access profiling and improve the potential for cross-module usage of SPE-collected data. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 71 +++++++++++++++++++ drivers/arm/mm_monitor/mm_spe.h | 35 ++++++++- .../mm_monitor/spe-decoder/arm-spe-decoder.c | 12 ++++ .../mm_monitor/spe-decoder/arm-spe-decoder.h | 3 + .../spe-decoder/arm-spe-pkt-decoder.h | 5 ++ 5 files changed, 125 insertions(+), 1 deletion(-) diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index f5c3668bb656..daf92539690f 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -159,6 +159,12 @@ static void mm_spe_disable_and_drain_local(void) /* Disable the profiling buffer */ write_sysreg_s(0, SYS_PMBLIMITR_EL1); isb(); + + /* Disable boost_spe profiling */ + if (spe->support_boost_spe) { + write_sysreg_s(0, SYS_OMHTPG_EL1); + isb(); + } } static u64 mm_spe_to_pmsfcr(void) @@ -189,6 +195,39 @@ static u64 mm_spe_to_pmsfcr(void) return reg; } +static u64 arm_spe_to_htpg(void) +{ + u64 reg = 0; + struct boost_spe_contol *boost_spe = &spe->boost_spe; + + if (boost_spe->rmt_acc_en) + reg |= SYS_OMHTPG_EL1_RMEN; + + if (boost_spe->boost_spe_en_cfg < 0x4) + reg |= boost_spe->boost_spe_en_cfg; + + if (boost_spe->record_sel) + reg |= SYS_OMHTPG_EL1_REC_SEL; + + if (boost_spe->pop_uop_sel) + reg |= SYS_OMHTPG_EL1_POP_UOP_SEL; + + if (boost_spe->sft_cfg < 0x4) + reg |= boost_spe->sft_cfg << SYS_OMHTPG_EL1_SFT_CFG_SHIFT; + + if (boost_spe->boost_spe_pa_flt_en || boost_spe->rmt_acc_pa_flt_en) { + reg |= 1 < SYS_OMHTPG_EL1_PAEN_SHIFT; + reg |= 1 < SYS_OMHTPG_EL1_RMPAFLEN_SHIFT; + + if (boost_spe->pa_flt_pt < 0x8000000 && boost_spe->pa_flt_mask < 0x8000000) { + reg |= boost_spe->pa_flt_pt << SYS_OMHTPG_EL1_PAFL_SHIFT; + reg |= boost_spe->pa_flt_mask << SYS_OMHTPG_EL1_PAFLMK_SHIFT; + } + } + + return reg; +} + static u64 mm_spe_to_pmsevfr(void) { return spe->event_filter; @@ -291,6 +330,13 @@ int mm_spe_start(void) reg = mm_spe_to_pmscr(); isb(); write_sysreg_s(reg, SYS_PMSCR_EL1); + + if (spe->support_boost_spe) { + reg = arm_spe_to_htpg(); + isb(); + write_sysreg_s(reg, SYS_OMHTPG_EL1); + } + return 0; } @@ -357,8 +403,30 @@ static const struct platform_device_id mm_spe_match[] = { }; MODULE_DEVICE_TABLE(platform, mm_spe_match); +static void arm_spe_boost_spe_para_init(void) +{ + struct boost_spe_contol *boost_spe = &spe->boost_spe; + + boost_spe->record_sel = 1; + boost_spe->pop_uop_sel = 0; + boost_spe->rmt_acc_pa_flt_en = 0; + boost_spe->rmt_acc_en = 1; + boost_spe->boost_spe_pa_flt_en = 0; + boost_spe->pa_flt_pt = 0; + boost_spe->pa_flt_mask = 0; + boost_spe->sft_cfg = 0; + boost_spe->boost_spe_en_cfg = 0x3; +} + static void mm_spe_sample_para_init(void) { + u64 implementor = read_cpuid_implementor(); + u64 part_num = read_cpuid_part_number(); + + /* Is support boost_spe sampling? 
*/ + if (implementor == ARM_CPU_IMP_HISI && part_num == 0xd06) + spe->support_boost_spe = true; + spe->sample_period = SPE_SAMPLE_PERIOD; spe->jitter = 1; spe->load_filter = 1; @@ -375,6 +443,9 @@ static void mm_spe_sample_para_init(void) spe->exclude_kernel = 0; spe->min_latency = 120; + + if (spe->support_boost_spe) + arm_spe_boost_spe_para_init(); } void mm_spe_record_enqueue(struct arm_spe_record *record) diff --git a/drivers/arm/mm_monitor/mm_spe.h b/drivers/arm/mm_monitor/mm_spe.h index bd0a1574a1b0..5ffc11cb951a 100644 --- a/drivers/arm/mm_monitor/mm_spe.h +++ b/drivers/arm/mm_monitor/mm_spe.h @@ -4,7 +4,7 @@ #define __SPE_H #define SPE_BUFFER_MAX_SIZE (PAGE_SIZE) -#define SPE_BUFFER_SIZE (PAGE_SIZE / 32) +#define SPE_BUFFER_SIZE (PAGE_SIZE / 16) #define SPE_SAMPLE_PERIOD 1024 @@ -12,11 +12,43 @@ #define SPE_RECORD_ENTRY_SIZE sizeof(struct mem_sampling_record) #define ARMV8_SPE_MEM_SAMPLING_PDEV_NAME "arm,mm_spe,spe-v1" +/* boost_spe sampling controls */ +#define SYS_OMHTPG_EL1 sys_reg(3, 0, 15, 8, 2) +#define SYS_OMHTPG_EL1_RMCF_SHIFT 0 +#define SYS_OMHTPG_EL1_RMCF_MASK 0x3UL +#define SYS_OMHTPG_EL1_RMEN GENMASK(2, 2) +#define SYS_OMHTPG_EL1_RMEN_SHIFT 2 +#define SYS_OMHTPG_EL1_PAFL GENMASK(3, 3) +#define SYS_OMHTPG_EL1_PAFL_SHIFT 3 +#define SYS_OMHTPG_EL1_PAFL_MASK 0x7FFFFFFUL +#define SYS_OMHTPG_EL1_PAFLMK_SHIFT 30 +#define SYS_OMHTPG_EL1_PAFLMK_MASK 0x7FFFFFFUL +#define SYS_OMHTPG_EL1_PAEN_SHIFT 57 + +#define SYS_OMHTPG_EL1_RMPAFLEN_SHIFT 58 +#define SYS_OMHTPG_EL1_POP_UOP_SEL GENMASK(59, 59) +#define SYS_OMHTPG_EL1_SFT_CFG_SHIFT 60 +#define SYS_OMHTPG_EL1_SFT_CFG_MASK 0x3UL +#define SYS_OMHTPG_EL1_REC_SEL GENMASK(62, 62) + +struct boost_spe_contol { + u32 boost_spe_en_cfg; + u32 pa_flt_pt; + u32 pa_flt_mask; + u64 sft_cfg; + bool boost_spe_pa_flt_en; + bool rmt_acc_en; + bool rmt_acc_pa_flt_en; + bool pop_uop_sel; + bool record_sel; +}; + struct mm_spe { struct pmu pmu; struct platform_device *pdev; cpumask_t supported_cpus; struct hlist_node hotplug_node; + struct boost_spe_contol boost_spe; int irq; /* PPI */ u16 pmsver; u16 min_period; @@ -38,6 +70,7 @@ struct mm_spe { u8 pct_enable; bool exclude_user; bool exclude_kernel; + bool support_boost_spe; }; struct mm_spe_buf { diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c index d84d01f8bb07..1394d377c061 100644 --- a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.c @@ -61,6 +61,13 @@ static u64 arm_spe_calc_ip(int index, u64 payload) } else if (index == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS) { /* Clean highest byte */ payload = SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(payload); + /* Boost_spe hot data access physical address */ + } else if (index == SPE_ADDR_PKT_HDR_INDEX_BOOST_SPE_DATA_PHYS) { + payload = SPE_ADDR_PKT_ADDR_GET_BYTES_BOOST_SPE(payload); + /* Remote Data access physical address */ + } else if (index == SPE_ADDR_PKT_HDR_INDEX_REMOTE_DATA_PHYS) { + /* Clean highest byte */ + payload = SPE_ADDR_PKT_ADDR_GET_BYTES_0_6(payload); } else { seen_idx = 0; if (!(seen_idx & BIT(index))) { @@ -132,6 +139,11 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder) decoder->record.virt_addr = ip; else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS) decoder->record.phys_addr = ip; + else if (idx == SPE_ADDR_PKT_HDR_INDEX_BOOST_SPE_DATA_PHYS) + decoder->record.boost_spe_addr[decoder->record.boost_spe_idx++] + = ip; + else if (idx == SPE_ADDR_PKT_HDR_INDEX_REMOTE_DATA_PHYS) + decoder->record.remote_addr 
= ip; break; case ARM_SPE_COUNTER: if (idx == SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT) diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h index 3af4a15107f0..3ccc32de8afc 100644 --- a/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-decoder.h @@ -49,6 +49,9 @@ struct arm_spe_record { u64 virt_addr; u64 phys_addr; u64 context_id; + u64 boost_spe_addr[8]; + u64 remote_addr; + u16 boost_spe_idx; u16 source; }; diff --git a/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h index 1a67b580b47f..873c3590e4a8 100644 --- a/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h +++ b/drivers/arm/mm_monitor/spe-decoder/arm-spe-pkt-decoder.h @@ -63,6 +63,8 @@ struct arm_spe_pkt { #define SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT 0x2 #define SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS 0x3 #define SPE_ADDR_PKT_HDR_INDEX_PREV_BRANCH 0x4 +#define SPE_ADDR_PKT_HDR_INDEX_BOOST_SPE_DATA_PHYS 0x6 +#define SPE_ADDR_PKT_HDR_INDEX_REMOTE_DATA_PHYS 0x7 /* Address packet payload */ #define SPE_ADDR_PKT_ADDR_BYTE7_SHIFT 56 @@ -79,6 +81,9 @@ struct arm_spe_pkt { #define SPE_ADDR_PKT_EL2 2 #define SPE_ADDR_PKT_EL3 3 +/* Boost_spe address packet payload */ +#define SPE_ADDR_PKT_ADDR_GET_BYTES_BOOST_SPE(v) ((v) & GENMASK_ULL(52, 12)) + /* Context packet header */ #define SPE_CTX_PKT_HDR_INDEX(h) ((h) & GENMASK_ULL(1, 0)) -- 2.25.1
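
The decoder change above routes two new address-packet indices (0x6 for boost-SPE hot-data physical addresses, 0x7 for remote-access physical addresses) into the sample record. Below is a small standalone sketch of that dispatch, with the payload masks copied from the headers and a hypothetical record layout; it is an illustration of the decode path, not the driver code itself (the sketch also bounds the boost address array, which the driver does implicitly via its fixed record size).

#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l)  (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Address packet header indices (subset, from arm-spe-pkt-decoder.h) */
#define IDX_DATA_PHYS            0x3
#define IDX_BOOST_SPE_DATA_PHYS  0x6
#define IDX_REMOTE_DATA_PHYS     0x7

/* Payload masks used by the decoder */
#define ADDR_BYTES_0_6(v)        ((v) & GENMASK_ULL(55, 0))
#define ADDR_BYTES_BOOST_SPE(v)  ((v) & GENMASK_ULL(52, 12))

struct demo_record {
	uint64_t phys_addr;
	uint64_t remote_addr;
	uint64_t boost_spe_addr[8];
	uint16_t boost_spe_idx;
};

/* Route one address packet payload into the record, as the decoder does. */
static void demo_decode_addr(struct demo_record *r, int idx, uint64_t payload)
{
	switch (idx) {
	case IDX_DATA_PHYS:
		r->phys_addr = ADDR_BYTES_0_6(payload);
		break;
	case IDX_BOOST_SPE_DATA_PHYS:
		if (r->boost_spe_idx < 8)
			r->boost_spe_addr[r->boost_spe_idx++] = ADDR_BYTES_BOOST_SPE(payload);
		break;
	case IDX_REMOTE_DATA_PHYS:
		r->remote_addr = ADDR_BYTES_0_6(payload);
		break;
	default:
		break;
	}
}

int main(void)
{
	struct demo_record r = { 0 };

	demo_decode_addr(&r, IDX_BOOST_SPE_DATA_PHYS, 0x0000001234567000ULL);
	printf("boost_spe_addr[0]=0x%llx\n", (unsigned long long)r.boost_spe_addr[0]);
	return 0;
}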

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/UJW...
Failed reason: applying the patch(es) failed; Patch failed at 0001 arm-spe: Boost SPE add TLB hot page and remote access tracking
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- This patch introduces a new kernel command-line parameter `enable_spe_boost` to allow users to enable the SPE boost feature at boot time. By passing `enable_spe_boost` on the kernel command line, the spe_boost_enable flag is set early during initialization, enabling SPE boost without requiring runtime configuration. This provides a simple and convenient way to control SPE boost behavior. Signed-off-by: Ze Zuo <zuoze1@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index daf92539690f..a7e6407ed843 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -17,6 +17,7 @@ #include "spe-decoder/arm-spe-decoder.h" #include "spe-decoder/arm-spe-pkt-decoder.h" #include "mm_spe.h" +static bool spe_boost_enable; static struct mm_spe *spe; @@ -161,7 +162,7 @@ static void mm_spe_disable_and_drain_local(void) isb(); /* Disable boost_spe profiling */ - if (spe->support_boost_spe) { + if (spe->support_boost_spe && spe_boost_enable) { write_sysreg_s(0, SYS_OMHTPG_EL1); isb(); } @@ -514,6 +515,13 @@ static struct platform_driver mm_spe_driver = { .probe = mm_spe_device_probe, }; +static __init int enable_spe_boost(char *str) +{ + spe_boost_enable = true; + return 0; +} +early_param("enable_spe_boost", enable_spe_boost); + static int __init mm_spe_init(void) { return platform_driver_register(&mm_spe_driver); -- 2.25.1

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/SWZ...
Failed reason: applying the patch(es) failed; Patch failed at 0001 arm-spe: Add kernel cmdline option to enable SPE boost
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- This patch exposes boost SPE sampling data through a new tracepoint (trace_spe_boost_spe_record), allowing userspace to monitor memory access patterns collected by the boost SPE feature. When a mem_sampling record has a valid boost_spe_idx, the tracepoint is triggered, providing visibility into boosted SPE activity. Currently, the mem_sampling module does not utilize this hotness data for sampling decisions; this patch focuses on improving observability without integrating boost SPE data into core sampling logic. Signed-off-by: Ze Zuo <zuoze1@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> --- drivers/arm/mm_monitor/mm_spe.c | 2 ++ include/trace/events/kmem.h | 34 +++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/drivers/arm/mm_monitor/mm_spe.c b/drivers/arm/mm_monitor/mm_spe.c index a7e6407ed843..0eaa7e7397e1 100644 --- a/drivers/arm/mm_monitor/mm_spe.c +++ b/drivers/arm/mm_monitor/mm_spe.c @@ -459,6 +459,8 @@ void mm_spe_record_enqueue(struct arm_spe_record *record) return; } + if (record->boost_spe_idx) + trace_spe_boost_spe_record((struct mem_sampling_record *)record); trace_mm_spe_record((struct mem_sampling_record *)record); record_tail = spe_buf->record_base + spe_buf->nr_records * SPE_RECORD_ENTRY_SIZE; diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 423e3ba1b3d1..4bcbf613c9a3 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -432,6 +432,40 @@ TRACE_EVENT(mm_spe_record, TP_printk("vaddr=%llu paddr=%llu pid=%d", __entry->vaddr, __entry->paddr, __entry->pid) ); + +TRACE_EVENT(spe_boost_spe_record, + TP_PROTO(struct mem_sampling_record *record), + + TP_ARGS(record), + + TP_STRUCT__entry( + __field(u64, boost_spe_pa1) + __field(u64, boost_spe_pa2) + __field(u64, boost_spe_pa3) + __field(u64, boost_spe_pa4) + __field(u64, boost_spe_pa5) + __field(u64, boost_spe_pa6) + __field(u64, boost_spe_pa7) + __field(u64, boost_spe_pa8) + ), + + TP_fast_assign( + __entry->boost_spe_pa1 = record->boost_spe_addr[0]; + __entry->boost_spe_pa2 = record->boost_spe_addr[1]; + __entry->boost_spe_pa3 = record->boost_spe_addr[2]; + __entry->boost_spe_pa4 = record->boost_spe_addr[3]; + __entry->boost_spe_pa5 = record->boost_spe_addr[4]; + __entry->boost_spe_pa6 = record->boost_spe_addr[5]; + __entry->boost_spe_pa7 = record->boost_spe_addr[6]; + __entry->boost_spe_pa8 = record->boost_spe_addr[7]; + ), + + TP_printk("boost_spe_addr[0]=0x%llx boost_spe_addr[1]=0x%llx tlb_addr[2]=0x%llx tlb_addr[3]=0x%llx tlb_addr[4]=0x%llx tlb_addr[5]=0x%llx tlb_addr[6]=0x%llx tlb_addr[7]=0x%llx", + __entry->boost_spe_pa1, __entry->boost_spe_pa2, + __entry->boost_spe_pa3, __entry->boost_spe_pa4, + __entry->boost_spe_pa5, __entry->boost_spe_pa6, + __entry->boost_spe_pa7, __entry->boost_spe_pa8) +); #endif /* CONFIG_ARM_SPE_MEM_SAMPLING */ -- 2.25.1
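
To consume the new event from userspace, the tracefs event can be enabled and the trace pipe read directly. The sketch below assumes tracefs is mounted at /sys/kernel/tracing and that the event appears under events/kmem/spe_boost_spe_record (it is defined in the kmem trace group above); on systems using /sys/kernel/debug/tracing the paths differ accordingly.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char line[512];
	FILE *pipe;
	int fd;

	/* Enable the boost-SPE record event (requires root). */
	fd = open("/sys/kernel/tracing/events/kmem/spe_boost_spe_record/enable", O_WRONLY);
	if (fd < 0) {
		perror("enable spe_boost_spe_record");
		return 1;
	}
	if (write(fd, "1", 1) != 1)
		perror("write enable");
	close(fd);

	/* Stream formatted records as they arrive. */
	pipe = fopen("/sys/kernel/tracing/trace_pipe", "r");
	if (!pipe) {
		perror("trace_pipe");
		return 1;
	}
	while (fgets(line, sizeof(line), pipe))
		fputs(line, stdout);
	fclose(pipe);
	return 0;
}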

Feedback: the patch(es) you sent to kernel@openeuler.org could not be converted to a pull request.
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/HLV...
Failed reason: applying the patch(es) failed; Patch failed at 0001 arm-spe: Export boost SPE sampling info via tracefs tracepoint
Suggested solution: check whether the failed patch(es) apply on the latest code of the expected branch.

hulk inclusion category:feature bugzilla:https://gitee.com/openeuler/kernel/issues/IC8KS8 CVE: NA -------------------------------- Enable CONFIG_ARM_SPE_MEM_SAMPLING and CONFIG_MEM_SAMPLING and CONFIG_NUMABALANCING_MEM_SAMPLING and CONFIG_DAMON_MEM_SAMPLING for arm64 by default. Signed-off-by: Ze Zuo <zuoze1@huawei.com> --- arch/arm64/configs/openeuler_defconfig | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 7481b12939e6..3f9be66edece 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -1228,12 +1228,15 @@ CONFIG_ETMEM_SCAN=m CONFIG_ETMEM_SWAP=m CONFIG_ETMEM=y # CONFIG_BPF_READAHEAD is not set +CONFIG_MEM_SAMPLING=y +CONFIG_NUMABALANCING_MEM_SAMPLING=y # # Data Access Monitoring # CONFIG_DAMON=y CONFIG_DAMON_VADDR=y +CONFIG_DAMON_MEM_SAMPLING=y CONFIG_DAMON_PADDR=y CONFIG_DAMON_SYSFS=y # CONFIG_DAMON_DBGFS is not set @@ -6969,6 +6972,7 @@ CONFIG_CPU_INSPECTOR_ATF=m CONFIG_ROH=m CONFIG_ROH_HNS=m +CONFIG_ARM_SPE_MEM_SAMPLING=y # end of Device Drivers # -- 2.25.1

Feedback: the patch(es) you sent to the kernel@openeuler.org mailing list have been converted to a pull request successfully.
Pull request link: https://gitee.com/openeuler/kernel/pulls/16336
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/LSX...
Participants (2): patchwork bot, Ze Zuo