[PATCH openEuler-25.03 0/1] NUMA ro-data replication for userspace applications
This patchset implements replication of userspace translation tables and private read-only data on AArch64, improving latency and memory bandwidth by reducing cross-NUMA memory accesses. openEuler 25.03 is used as the baseline.

The current implementation supports the following functionality:

1. Per-NUMA-node replication of userspace translation tables and private read-only data. Only __private read-only__ data is replicated, which avoids having to maintain coherence and consistency between replicas. Translation tables, in turn, can be replicated for any kind of underlying data.
2. Userspace replication can be enabled for a single process via procfs, or for a group of processes via memory cgroup.
3. 4K and 64K page sizes are supported.
4. Replicated data pages cannot be KSM, migration or swap/reclaim candidates by design. For all other pages these mechanisms keep working together with replicated translation tables.

Once user replication has been enabled for a process via either procfs or memory cgroup, all of its existing private read-only data is immediately replicated, together with the translation tables that map it. Afterwards, while the process runs, __any__ page fault triggers replication of the translation tables covering the faulted address. In addition, a mechanism built on top of the NUMA balancer replicates private read-only pages on NUMA hint faults as the process runs (the NUMA balancer must be enabled for this mechanism to work). A minimal usage sketch of the procfs path is shown below.

Known problems:

1. The current implementation does not support huge pages, so the kernel has to be built with huge pages disabled for user replication to work. Huge page support will be added in the near future.
2. The mremap syscall does not work with replicated memory yet.
3. page_idle, uprobes and userfaultfd support replicated translation tables, but not replicated data. Use these features with care when userspace replication is enabled.
4. When translation tables are replicated during page faults, there must be enough free memory on __each__ NUMA node for the table allocations; otherwise the OOM killer is invoked.

These problems mostly do not affect the workloads expected to benefit from user replication, and such workloads work properly with the feature enabled.
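For reference, a minimal userspace sketch of enabling replication through procfs, assuming a kernel built with CONFIG_USER_REPLICATION and with kernel text replication active. The per-process knob name /proc/<pid>/numa_replication is taken from the patch below; the error handling here is only illustrative:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          /* Enable replication for the calling process; only "1" is accepted. */
          int fd = open("/proc/self/numa_replication", O_WRONLY);

          if (fd < 0) {
                  perror("open /proc/self/numa_replication");
                  return 1;
          }
          if (write(fd, "1", 1) != 1) {
                  perror("write numa_replication");
                  close(fd);
                  return 1;
          }
          close(fd);

          /*
           * From here on, existing private read-only mappings are replicated
           * and subsequent page faults replicate the related translation tables.
           */
          return 0;
  }

Note that in this revision the knob cannot be used to disable replication again: writing 0 to an already-enabled process returns -EINVAL, so the feature should only be enabled for processes expected to benefit from it.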
Nikita Panov (1): mm: Support NUMA-aware replication of read-only data and translation tables of user space applications arch/arm64/include/asm/numa_replication.h | 3 + arch/arm64/mm/init.c | 2 +- arch/arm64/mm/pgd.c | 13 +- fs/exec.c | 18 + fs/proc/base.c | 76 + fs/proc/task_mmu.c | 111 +- include/asm-generic/pgalloc.h | 19 +- include/asm-generic/tlb.h | 22 + include/linux/cgroup.h | 1 + include/linux/gfp_types.h | 12 +- include/linux/memcontrol.h | 4 + include/linux/mm.h | 75 +- include/linux/mm_inline.h | 2 +- include/linux/mm_types.h | 49 +- include/linux/numa_kernel_replication.h | 185 ++- include/linux/numa_user_replication.h | 738 ++++++++++ include/linux/page-flags.h | 18 +- include/trace/events/mmflags.h | 10 +- include/uapi/asm-generic/mman-common.h | 2 + kernel/cgroup/cgroup.c | 2 +- kernel/events/uprobes.c | 5 +- kernel/fork.c | 41 + kernel/sched/fair.c | 8 +- mm/Kconfig | 13 + mm/Makefile | 1 + mm/gup.c | 3 +- mm/ksm.c | 15 +- mm/madvise.c | 18 +- mm/memcontrol.c | 137 +- mm/memory.c | 544 +++++-- mm/mempolicy.c | 5 + mm/migrate.c | 11 +- mm/migrate_device.c | 17 +- mm/mlock.c | 31 + mm/mmap.c | 32 + mm/mmu_gather.c | 55 +- mm/mprotect.c | 409 +++--- mm/mremap.c | 97 +- mm/numa_kernel_replication.c | 4 +- mm/numa_user_replication.c | 1577 +++++++++++++++++++++ mm/page_alloc.c | 8 +- mm/page_idle.c | 3 +- mm/page_vma_mapped.c | 3 +- mm/rmap.c | 41 +- mm/swap.c | 7 +- mm/swapfile.c | 3 +- mm/userfaultfd.c | 7 +- mm/userswap.c | 11 +- 48 files changed, 4049 insertions(+), 419 deletions(-) create mode 100644 include/linux/numa_user_replication.h create mode 100644 mm/numa_user_replication.c -- 2.34.1
kunpeng inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IDG4AX ------------------------------------------------- This patch contains preliminary support for user space NUMA replication. This is the first iteration, in the next revisions this patch will be split into multiple pieces. Acked-by: Ilya Hanov <ilya.hanov@huawei-partners.com> Acked-by: Denis Darvish <darvish.denis@huawei.com> Acked-by: Artem Kuzin <artem.kuzin@huawei.com> Co-developed-by: Gadeev Dmitry <gadeev.dmitry@h-partners.com> Signed-off-by: Gadeev Dmitry <gadeev.dmitry@h-partners.com> Co-developed-by: Nikita Panov <panov.nikita@huawei.com> Signed-off-by: Nikita Panov <panov.nikita@huawei.com> --- arch/arm64/include/asm/numa_replication.h | 3 + arch/arm64/mm/init.c | 2 +- arch/arm64/mm/pgd.c | 13 +- fs/exec.c | 18 + fs/proc/base.c | 76 + fs/proc/task_mmu.c | 111 +- include/asm-generic/pgalloc.h | 19 +- include/asm-generic/tlb.h | 22 + include/linux/cgroup.h | 1 + include/linux/gfp_types.h | 12 +- include/linux/memcontrol.h | 4 + include/linux/mm.h | 75 +- include/linux/mm_inline.h | 2 +- include/linux/mm_types.h | 49 +- include/linux/numa_kernel_replication.h | 185 ++- include/linux/numa_user_replication.h | 738 ++++++++++ include/linux/page-flags.h | 18 +- include/trace/events/mmflags.h | 10 +- include/uapi/asm-generic/mman-common.h | 2 + kernel/cgroup/cgroup.c | 2 +- kernel/events/uprobes.c | 5 +- kernel/fork.c | 41 + kernel/sched/fair.c | 8 +- mm/Kconfig | 13 + mm/Makefile | 1 + mm/gup.c | 3 +- mm/ksm.c | 15 +- mm/madvise.c | 18 +- mm/memcontrol.c | 137 +- mm/memory.c | 544 +++++-- mm/mempolicy.c | 5 + mm/migrate.c | 11 +- mm/migrate_device.c | 17 +- mm/mlock.c | 31 + mm/mmap.c | 32 + mm/mmu_gather.c | 55 +- mm/mprotect.c | 409 +++--- mm/mremap.c | 97 +- mm/numa_kernel_replication.c | 4 +- mm/numa_user_replication.c | 1577 +++++++++++++++++++++ mm/page_alloc.c | 8 +- mm/page_idle.c | 3 +- mm/page_vma_mapped.c | 3 +- mm/rmap.c | 41 +- mm/swap.c | 7 +- mm/swapfile.c | 3 +- mm/userfaultfd.c | 7 +- mm/userswap.c | 11 +- 48 files changed, 4049 insertions(+), 419 deletions(-) create mode 100644 include/linux/numa_user_replication.h create mode 100644 mm/numa_user_replication.c diff --git a/arch/arm64/include/asm/numa_replication.h b/arch/arm64/include/asm/numa_replication.h index 7b515c7d41981..d8647a89c5cc2 100644 --- a/arch/arm64/include/asm/numa_replication.h +++ b/arch/arm64/include/asm/numa_replication.h @@ -22,6 +22,9 @@ static inline pgd_t *numa_replicate_pgt_pgd(int nid) pgd_page = alloc_pages_node(nid, GFP_PGTABLE_KERNEL, 2); BUG_ON(pgd_page == NULL); + SetPageReplicated(pgd_page); + SetPageReplicated(pgd_page + 2); + new_pgd = (pgd_t *)page_address(pgd_page); new_pgd += (PAGE_SIZE * 2 / sizeof(pgd_t)); //Extra pages for KPTI copy_page(new_pgd, swapper_pg_dir); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 8d955787e030f..ef05ba6d6007d 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -606,7 +606,7 @@ void __init preallocate_vmalloc_pages(void) { unsigned long addr; - for (addr = MODULES_VADDR; addr <= VMALLOC_END; + for (addr = MODULES_VADDR; addr <= VMALLOC_END && addr != 0UL; addr = ALIGN(addr + 1, PGDIR_SIZE)) { pgd_t *pgd = pgd_offset_k(addr); p4d_t *p4d; diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c index 8326bd693b296..863b427e54ed8 100644 --- a/arch/arm64/mm/pgd.c +++ b/arch/arm64/mm/pgd.c @@ -41,20 +41,22 @@ pgd_t *page_pgd_alloc(struct mm_struct *mm) { int nid; gfp_t gfp = GFP_PGTABLE_USER | __GFP_THISNODE; + /* * Kernel replication is 
not supproted in case of non-page size pgd, * in general we can support it, but maybe later, due to we need to * update page tables allocation significantly, so, let's panic here. */ + for_each_memory_node(nid) { struct page *page; page = alloc_pages_node(nid, gfp, 0); if (!page) goto fail; - WARN_ON_ONCE(page_to_nid(page) != nid); + SetPageReplicated(page); *per_node_pgd_ptr(mm, nid) = (pgd_t *)page_address(page); } @@ -62,6 +64,7 @@ pgd_t *page_pgd_alloc(struct mm_struct *mm) *per_node_pgd_ptr(mm, nid) = per_node_pgd(mm, numa_get_memory_node(nid)); mm->pgd = per_node_pgd(mm, numa_get_memory_node(0));/*!!!*/ + build_pgd_chain(mm->pgd_numa); return mm->pgd; @@ -86,15 +89,23 @@ static pgd_t *pgd_alloc_replica(struct mm_struct *mm) static void page_pgd_free(struct mm_struct *mm, pgd_t *pgd) { int nid; + /* * Kernel replication is not supproted in case of non-page size pgd, * in general we can support it, but maybe later, due to we need to * update page tables allocation significantly, so, let's panic here. */ + + if (per_node_pgd(mm, first_memory_node) == NULL) + return; + + clear_pgtable_list(virt_to_ptdesc(per_node_pgd(mm, first_memory_node))); for_each_memory_node(nid) { if (per_node_pgd(mm, nid) == NULL) break; WARN_ON_ONCE(page_to_nid(virt_to_page(per_node_pgd(mm, nid))) != nid); + + ClearPageReplicated(virt_to_page(per_node_pgd(mm, nid))); free_page((unsigned long)per_node_pgd(mm, nid)); } diff --git a/fs/exec.c b/fs/exec.c index eaec57f79aa19..7eaa0f6322736 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -66,6 +66,7 @@ #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> +#include <linux/numa_user_replication.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -276,6 +277,14 @@ static int __bprm_mm_init(struct linux_binprm *bprm) vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP); vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); +#ifdef CONFIG_USER_REPLICATION + if (memcg_replication_enabled(mm)) { + __vm_flags_mod(vma, VM_REPLICA_INIT, VM_NONE); + if (vma_might_be_replicated(vma)) + __vm_flags_mod(vma, VM_REPLICA_COMMIT, VM_NONE); + } +#endif + err = insert_vm_struct(mm, vma); if (err) goto err; @@ -814,6 +823,15 @@ int setup_arg_pages(struct linux_binprm *bprm, vm_flags &= ~VM_EXEC; vm_flags |= mm->def_flags; vm_flags |= VM_STACK_INCOMPLETE_SETUP; +#ifdef CONFIG_USER_REPLICATION + if (memcg_replication_enabled(mm)) { + vm_flags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(vm_flags)) + vm_flags |= VM_REPLICA_COMMIT; + else + vm_flags &= ~VM_REPLICA_COMMIT; + } +#endif vma_iter_init(&vmi, mm, vma->vm_start); diff --git a/fs/proc/base.c b/fs/proc/base.c index 276588a25225d..828373ba91042 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -81,6 +81,7 @@ #include <linux/audit.h> #include <linux/poll.h> #include <linux/nsproxy.h> +#include <linux/numa_user_replication.h> #include <linux/oom.h> #include <linux/elf.h> #include <linux/pid_namespace.h> @@ -3503,6 +3504,78 @@ static const struct file_operations proc_pid_xcall_operations = { }; #endif +#ifdef CONFIG_USER_REPLICATION + +static ssize_t numa_replication_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task = get_proc_task(file_inode(file)); + struct mm_struct *mm; + char buffer[PROC_NUMBUF]; + size_t len; + int ret; + + if (!task) + return -ESRCH; + + ret = 0; + mm = get_task_mm(task); + if (mm) { + len = snprintf(buffer, sizeof(buffer), "%d\n", memcg_replication_enabled(mm)); + mmput(mm); + ret = 
simple_read_from_buffer(buf, count, ppos, buffer, len); + } + + put_task_struct(task); + + return ret; +} + +static ssize_t numa_replication_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + struct mm_struct *mm; + int val, ret; + + ret = kstrtoint_from_user(buf, count, 0, &val); + if (ret < 0) + return ret; + + if (!is_text_replicated()) + return -EINVAL; + + ret = -ESRCH; + task = get_proc_task(file_inode(file)); + if (!task) + goto out_no_task; + + mm = get_task_mm(task); + if (!mm) + goto out_no_mm; + ret = 0; + + if ((val != 0 && val != 1) || (val == 0 && memcg_replication_enabled(mm))) + ret = -EINVAL; + else if (val == 1) + numa_mm_handle_replication(mm, true, FORK_DISCARD_REPLICA); + + mmput(mm); +out_no_mm: + put_task_struct(task); +out_no_task: + if (ret < 0) + return ret; + return count; +} + +static const struct file_operations proc_numa_replication_operations = { + .read = numa_replication_read, + .write = numa_replication_write, +}; + +#endif + /* * Thread groups */ @@ -3548,6 +3621,9 @@ static const struct pid_entry tgid_base_stuff[] = { REG("maps", S_IRUGO, proc_pid_maps_operations), #ifdef CONFIG_NUMA REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), +#endif +#ifdef CONFIG_USER_REPLICATION + REG("numa_replication", S_IRUSR|S_IWUSR, proc_numa_replication_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), LNK("cwd", proc_cwd_link), diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 6d9b173a15450..0097f00281e2c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -20,6 +20,7 @@ #include <linux/shmem_fs.h> #include <linux/uaccess.h> #include <linux/pkeys.h> +#include <linux/numa_user_replication.h> #include <asm/elf.h> #include <asm/tlb.h> @@ -406,6 +407,10 @@ struct mem_size_stats { u64 pss_dirty; u64 pss_locked; u64 swap_pss; +#ifdef CONFIG_USER_REPLICATION + KABI_EXTEND(unsigned long replicated); + KABI_EXTEND(u64 pss_repl); +#endif }; static void smaps_page_accumulate(struct mem_size_stats *mss, @@ -438,6 +443,20 @@ static void smaps_page_accumulate(struct mem_size_stats *mss, } } +#ifdef CONFIG_USER_REPLICATION +static void smaps_page_accumulate_replicated(struct mem_size_stats *mss, + struct page *page, unsigned long size, unsigned long pss, bool private) +{ + mss->pss += pss; + mss->pss_repl += pss; + + if (private) + mss->private_clean += size; + else + mss->shared_clean += size; +} +#endif + static void smaps_account(struct mem_size_stats *mss, struct page *page, bool compound, bool young, bool dirty, bool locked, bool migration) @@ -493,6 +512,42 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page, } } +#ifdef CONFIG_USER_REPLICATION +static void smaps_account_replicated(struct mem_size_stats *mss, + struct page **replica_pages, bool *numa_young) +{ + int nid; + int page_ref_count = 0; + unsigned long size = PAGE_SIZE; + unsigned long pss = PAGE_SIZE << PSS_SHIFT; + + for_each_memory_node(nid) { + mss->replicated += size; + mss->resident += size; + + if (numa_young[nid] || page_is_young(replica_pages[nid]) || + PageReferenced(replica_pages[nid])) + mss->referenced += size; + + if (!page_ref_count) { + page_ref_count = page_count(replica_pages[nid]); + pss /= page_ref_count; + } else + BUG_ON(page_ref_count != page_count(replica_pages[nid])); + + smaps_page_accumulate_replicated(mss, replica_pages[nid], size, pss, + (page_ref_count == 1)); + } + /* + * We account original page-instance as anonymous page, not replicated + */ + mss->replicated 
-= size; + mss->anonymous += size; + mss->pss_repl -= pss; + mss->pss_anon += pss; +} +#endif + #ifdef CONFIG_SHMEM static int smaps_pte_hole(unsigned long addr, unsigned long end, __always_unused int depth, struct mm_walk *walk) @@ -529,11 +584,48 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, struct page *page = NULL; bool migration = false, young = false, dirty = false; pte_t ptent = ptep_get(pte); +#ifdef CONFIG_USER_REPLICATION + bool numa_young[MAX_NUMNODES]; + struct page *replica_pages[MAX_NUMNODES]; +#endif if (pte_present(ptent)) { page = vm_normal_page(vma, addr, ptent); - young = pte_young(ptent); - dirty = pte_dirty(ptent); +#ifdef CONFIG_USER_REPLICATION + if (page && PageReplicated(page)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_pte; + bool start; + int nid; + + for_each_pgtable(curr, curr_pte, pte, nid, offset, start) { + pte_t curr_ptent = ptep_get(curr_pte); + BUG_ON(pte_dirty(curr_ptent)); + numa_young[nid] = pte_young(curr_ptent); + replica_pages[nid] = vm_normal_page(vma, addr, curr_ptent); + } + } else if (numa_pgtable_replicated(pte)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_pte; + + young = pte_young(ptent); + dirty = pte_dirty(ptent); + for_each_pgtable_replica(curr, curr_pte, pte, offset) { + pte_t curr_ptent = ptep_get(curr_pte); + + young |= pte_young(curr_ptent); + dirty |= pte_dirty(curr_ptent); + if (young && dirty) + break; + } + } else +#endif + { + young = pte_young(ptent); + dirty = pte_dirty(ptent); + } } else if (is_swap_pte(ptent)) { swp_entry_t swpent = pte_to_swp_entry(ptent); @@ -554,6 +646,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (is_migration_entry(swpent)) migration = true; page = pfn_swap_entry_to_page(swpent); + BUG_ON(page && PageReplicated(page)); } } else { smaps_pte_hole_lookup(addr, walk); @@ -563,7 +656,12 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (!page) return; - smaps_account(mss, page, false, young, dirty, locked, migration); +#ifdef CONFIG_USER_REPLICATION + if (PageReplicated(page)) + smaps_account_replicated(mss, replica_pages, numa_young); + else +#endif + smaps_account(mss, page, false, young, dirty, locked, migration); } #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -827,6 +925,10 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss, mss->pss_file >> PSS_SHIFT); SEQ_PUT_DEC(" kB\nPss_Shmem: ", mss->pss_shmem >> PSS_SHIFT); +#ifdef CONFIG_USER_REPLICATION + SEQ_PUT_DEC(" kb\nPss_Repl: ", + mss->pss_repl >> PSS_SHIFT); +#endif } SEQ_PUT_DEC(" kB\nShared_Clean: ", mss->shared_clean); SEQ_PUT_DEC(" kB\nShared_Dirty: ", mss->shared_dirty); @@ -834,6 +936,9 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss, SEQ_PUT_DEC(" kB\nPrivate_Dirty: ", mss->private_dirty); SEQ_PUT_DEC(" kB\nReferenced: ", mss->referenced); SEQ_PUT_DEC(" kB\nAnonymous: ", mss->anonymous); +#ifdef CONFIG_USER_REPLICATION + SEQ_PUT_DEC(" kB\nReplicated: ", mss->replicated); +#endif SEQ_PUT_DEC(" kB\nKSM: ", mss->ksm); SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree); SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp); diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h index a8b7b343a4ed3..e6013e315fc65 100644 --- a/include/asm-generic/pgalloc.h +++ b/include/asm-generic/pgalloc.h @@ -80,17 +80,17 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp) static inline pgtable_t __pte_alloc_one_node(unsigned int nid, struct mm_struct *mm, gfp_t gfp) { - struct page 
*pte; + struct ptdesc *ptdesc; - pte = alloc_pages_node(nid, gfp, 0); - if (!pte) + ptdesc = pagetable_alloc_node(nid, gfp, 0); + if (!ptdesc) return NULL; - if (!pagetable_pte_ctor(page_ptdesc(pte))) { - __free_page(pte); + if (!pagetable_pte_ctor(ptdesc)) { + pagetable_free(ptdesc); return NULL; } - return pte; + return ptdesc_page(ptdesc); } #endif @@ -112,7 +112,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm) static inline pgtable_t pte_alloc_one_node(unsigned int nid, struct mm_struct *mm) { - return __pte_alloc_one_node(nid, mm, GFP_PGTABLE_USER | __GFP_THISNODE); + return __pte_alloc_one_node(nid, mm, GFP_PGTABLE_USER | __GFP_THISNODE | __GFP_MAYDIE); } #endif @@ -132,6 +132,7 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte_page) { struct ptdesc *ptdesc = page_ptdesc(pte_page); + ClearPageReplicated(pte_page); pagetable_pte_dtor(ptdesc); pagetable_free(ptdesc); } @@ -179,7 +180,7 @@ static inline pmd_t *pmd_alloc_one_node(unsigned int nid, if (mm == &init_mm) gfp = GFP_PGTABLE_KERNEL; - gfp |= __GFP_THISNODE; + gfp |= __GFP_THISNODE | __GFP_MAYDIE; ptdesc = pagetable_alloc_node(nid, gfp, 0); if (!ptdesc) @@ -234,7 +235,7 @@ static inline pud_t *__pud_alloc_one_node(unsigned int nid, if (mm == &init_mm) gfp = GFP_PGTABLE_KERNEL; - gfp |= __GFP_THISNODE; + gfp |= __GFP_THISNODE | __GFP_MAYDIE; ptdesc = pagetable_alloc_node(nid, gfp, 0); if (!ptdesc) return NULL; diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 22384baee10e5..7063dbcbf02c9 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -247,7 +247,15 @@ static inline void tlb_remove_table_sync_one(void) { } * If we can't allocate a page to make a big batch of page pointers * to work on, then just handle a few from the on-stack structure. */ +#ifndef CONFIG_USER_REPLICATION #define MMU_GATHER_BUNDLE 8 +#else +#if MAX_NUMNODES < 8 +#define MMU_GATHER_BUNDLE 8 +#else +#define MMU_GATHER_BUNDLE MAX_NUMNODES +#endif +#endif struct mmu_gather_batch { struct mmu_gather_batch *next; @@ -269,6 +277,12 @@ struct mmu_gather_batch { extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, bool delay_rmap, int page_size); + +#ifdef CONFIG_USER_REPLICATION +extern bool __tlb_remove_replica_pages_size(struct mmu_gather *tlb, struct page **pages, + int page_size); +#endif + bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page, unsigned int nr_pages, bool delay_rmap); @@ -482,6 +496,14 @@ static __always_inline bool __tlb_remove_page(struct mmu_gather *tlb, return __tlb_remove_page_size(tlb, page, delay_rmap, PAGE_SIZE); } +#ifdef CONFIG_USER_REPLICATION +static __always_inline bool __tlb_remove_replica_pages(struct mmu_gather *tlb, + struct page **pages) +{ + return __tlb_remove_replica_pages_size(tlb, pages, PAGE_SIZE); +} +#endif + /* tlb_remove_page * Similar to __tlb_remove_page but will call tlb_flush_mmu() itself when * required. diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 62cea15eb6df9..4a3d3023bfdb5 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -137,6 +137,7 @@ int cgroup_init_early(void); int cgroup_init(void); int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v); +bool cgroup_has_tasks(struct cgroup *cgrp); /* * Iteration helpers and macros. 
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h index 435a5a8a301e0..7ca6d2f2fe7cf 100644 --- a/include/linux/gfp_types.h +++ b/include/linux/gfp_types.h @@ -59,8 +59,13 @@ typedef unsigned int __bitwise gfp_t; #define ___GFP_SKIP_ZERO 0 #define ___GFP_SKIP_KASAN 0 #endif +#ifdef CONFIG_USER_REPLICATION +#define ___GFP_MAYDIE 0x4000000u +#else +#define ___GFP_MAYDIE 0 +#endif #ifdef CONFIG_LOCKDEP -#define ___GFP_NOLOCKDEP 0x4000000u +#define ___GFP_NOLOCKDEP 0x8000000u #else #define ___GFP_NOLOCKDEP 0 #endif @@ -257,8 +262,11 @@ typedef unsigned int __bitwise gfp_t; /* Alloc memory from mirrored region */ #define __GFP_RELIABLE ((__force gfp_t)___GFP_RELIABLE) +/* May fallback to oom-killer when used with __GFP_THISNODE */ +#define __GFP_MAYDIE ((__force gfp_t)___GFP_MAYDIE) + /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index abe236201e68f..9a0e03e58f059 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -421,7 +421,11 @@ struct mem_cgroup { struct dynamic_pool *dpool; #endif +#ifdef CONFIG_USER_REPLICATION + KABI_USE(1, bool user_replication_active) +#else KABI_RESERVE(1) +#endif KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) diff --git a/include/linux/mm.h b/include/linux/mm.h index f706eed1a8b53..7580817f038b6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -336,12 +336,16 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_6 39 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_7 40 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) #define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) +#define VM_HIGH_ARCH_6 BIT(VM_HIGH_ARCH_BIT_6) +#define VM_HIGH_ARCH_7 BIT(VM_HIGH_ARCH_BIT_7) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -372,6 +376,21 @@ extern unsigned int kobjsize(const void *objp); # define VM_SHADOW_STACK VM_NONE #endif +#ifdef CONFIG_USER_REPLICATION +/* + * Page tables for this vma will be replicated during page faults + */ +# define VM_REPLICA_INIT VM_HIGH_ARCH_6 +/* + * Phys memory of this vma migth has replicas + * due to mprotect call or numa_balancer replication. + * Also, this flag is used by numa balancer as a hint, that this memory should + * be replicated. Obviously, that if this flag is set, + * VM_REPLICA_INIT also must be set. 
+ */ +# define VM_REPLICA_COMMIT VM_HIGH_ARCH_7 +#endif /* CONFIG_USER_REPLICATION */ + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) @@ -517,6 +536,20 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags) { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \ { FAULT_FLAG_VMA_LOCK, "VMA_LOCK" } +typedef enum { + /* Switch to default handling */ + REPLICA_NONE, + /* + * User replication stops here, + * already replicated levels need propagation + */ + REPLICA_PROPAGATE, + /* Keep replicating page tables */ + REPLICA_KEEP, + /* Failed to replicate page table level */ + REPLICA_FAIL, +} replica_action_t; + /* * vm_fault is filled by the pagefault handler and passed to the vma's * ->fault function. The vma's ->fault is responsible for returning a bitmask @@ -571,9 +604,16 @@ struct vm_fault { * page table to avoid allocation from * atomic context. */ - KABI_RESERVE(1) - KABI_RESERVE(2) - KABI_RESERVE(3) + + KABI_EXTEND(unsigned long left_replicant) /* Closest vmas that require replicated tables */ + KABI_EXTEND(unsigned long right_replicant) + KABI_EXTEND(p4d_t *p4d) + KABI_EXTEND(pgd_t *pgd) + KABI_EXTEND(bool pte_replicated :1) + KABI_EXTEND(bool pmd_replicated :1) + KABI_EXTEND(bool pud_replicated :1) + KABI_EXTEND(bool p4d_replicated :1) + KABI_EXTEND(replica_action_t replica_action) /* last action performed with page table */ }; /* @@ -3023,6 +3063,7 @@ static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order) return page_ptdesc(page); } +#ifdef CONFIG_KERNEL_REPLICATION static inline struct ptdesc *pagetable_alloc_node(int nid, gfp_t gfp, unsigned int order) { @@ -3030,6 +3071,7 @@ static inline struct ptdesc *pagetable_alloc_node(int nid, gfp_t gfp, return page_ptdesc(page); } +#endif /** * pagetable_free - Free pagetables @@ -3051,15 +3093,40 @@ void __init ptlock_cache_init(void); bool ptlock_alloc(struct ptdesc *ptdesc); void ptlock_free(struct ptdesc *ptdesc); +#ifdef CONFIG_KERNEL_REPLICATION +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) +{ + return ptdesc->master_table->ptl; +} +#else static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) { return ptdesc->ptl; } +#endif + #else /* ALLOC_SPLIT_PTLOCKS */ static inline void ptlock_cache_init(void) { } +#ifdef CONFIG_KERNEL_REPLICATION +static inline bool ptlock_alloc(struct ptdesc *ptdesc) +{ + ptdesc->master_table = ptdesc; + return true; +} + +static inline void ptlock_free(struct ptdesc *ptdesc) +{ + ptdesc->master_table = NULL; +} + +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) +{ + return &ptdesc->master_table->ptl; +} +#else static inline bool ptlock_alloc(struct ptdesc *ptdesc) { return true; @@ -3073,6 +3140,8 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) { return &ptdesc->ptl; } +#endif + #endif /* ALLOC_SPLIT_PTLOCKS */ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 386e1eaac1fa4..82bab18e45c14 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -598,7 +598,7 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr, arm_uffd_pte = true; if (unlikely(arm_uffd_pte)) - set_pte_at(vma->vm_mm, addr, pte, + set_pte_at_replicated(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); #endif } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6a0015a55211c..cb8737f18e6e5 100644 --- a/include/linux/mm_types.h 
+++ b/include/linux/mm_types.h @@ -461,12 +461,41 @@ struct ptdesc { pgtable_t pmd_huge_pte; }; }; +#ifdef CONFIG_KERNEL_REPLICATION + KABI_REPLACE( + unsigned long __page_mapping, + union { + unsigned long __page_mapping; + struct llist_head replica_list_head; /* required for connecting */ + struct llist_node replica_list_node; /* replicated tables into lists */ + } + ) + /* + * master_table is used only for pte and pmd levels, + * If we have, for example, 4 replicated pmd tables, + * we need to use single lock to correctly serialize modifications of this level. + * So, this field points to lock from original table. + * If tables are not replicated, or table is master, master_lock + * equals to ptl (or &ptl). + */ + KABI_REPLACE( + union { + struct mm_struct *pt_mm; + atomic_t pt_frag_refcount; + }, + union { + struct mm_struct *pt_mm; + atomic_t pt_frag_refcount; + struct ptdesc *master_table; + } + ) +#else unsigned long __page_mapping; - union { struct mm_struct *pt_mm; atomic_t pt_frag_refcount; }; +#endif union { unsigned long _pt_pad_2; @@ -716,6 +745,18 @@ struct mm_cid { int cid; }; +typedef enum { + /* Used only in not yet initialized mm-instances. + * It is matter for it to be zero. */ + FORK_NO_REPLICA = 0, + /* Used when user-replication shoud not be inherited during forks + * due to disabled user-replication in related mem cgroup. */ + FORK_DISCARD_REPLICA, + /* Used when user-replication should be inherited during forks + * due to enabled user-replication in related mem cgroup. */ + FORK_KEEP_REPLICA +} fork_policy_t; + struct kioctx_table; struct iommu_mm_data; struct mm_struct { @@ -986,7 +1027,13 @@ struct mm_struct { #else KABI_RESERVE(1) #endif + +#ifdef CONFIG_USER_REPLICATION + KABI_USE2(2, bool cg_user_replication_active, fork_policy_t fork_policy) +#else KABI_RESERVE(2) +#endif + KABI_RESERVE(3) KABI_RESERVE(4) KABI_RESERVE(5) diff --git a/include/linux/numa_kernel_replication.h b/include/linux/numa_kernel_replication.h index 37e7b56b5aa94..9607a6162dfc2 100644 --- a/include/linux/numa_kernel_replication.h +++ b/include/linux/numa_kernel_replication.h @@ -2,8 +2,6 @@ #ifndef _LINUX_NUMA_REPLICATION_H #define _LINUX_NUMA_REPLICATION_H -#ifdef CONFIG_KERNEL_REPLICATION - #include <linux/kabi.h> /* @@ -20,6 +18,7 @@ #include KABI_HIDE_INCLUDE(<linux/nodemask.h>) #include KABI_HIDE_INCLUDE(<linux/module.h>) #include KABI_HIDE_INCLUDE(<linux/mm.h>) +#include KABI_HIDE_INCLUDE(<linux/llist.h>) #include KABI_HIDE_INCLUDE(<asm/numa_replication.h>) #if defined(tmp_linux_value) @@ -42,6 +41,8 @@ extern nodemask_t replica_nodes; nid != MAX_NUMNODES; \ nid = next_node(nid, replica_nodes)) +#ifdef CONFIG_KERNEL_REPLICATION + bool is_text_replicated(void); static inline pgd_t *this_node_pgd(struct mm_struct *mm) @@ -74,6 +75,48 @@ static inline bool numa_addr_has_replica(const void *addr) ((unsigned long)addr <= PAGE_TABLE_REPLICATION_RIGHT); } +static inline void clear_pgtable_list(struct ptdesc *head) +{ + struct llist_node *node; + + /* Replica list already have been destroyed */ + if (head->replica_list_node.next == NULL) + return; + + for (node = llist_del_first(&head->replica_list_head); + node != &head->replica_list_node; + node = llist_del_first(&head->replica_list_head)) + node->next = NULL; + head->replica_list_node.next = NULL; +} + +static inline void build_pgd_chain(pgd_t **tables) +{ + int nid; + int prev_node = -1; + for_each_memory_node(nid) { + virt_to_ptdesc(tables[nid])->replica_list_head.first = NULL; + if (prev_node != -1) { + llist_add( + 
&virt_to_ptdesc(tables[nid])->replica_list_node, &virt_to_ptdesc(tables[prev_node])->replica_list_head); + } else { + /* + * This list is not supposed to be circular, + * but in order to simplify macro implementation, + * we do it anyway. + * God help us + */ + virt_to_ptdesc(tables[nid])->replica_list_node.next = &virt_to_ptdesc(tables[nid])->replica_list_node; + } + prev_node = nid; + } +} + +static inline bool numa_pgtable_replicated(void *table) +{ + return PageReplicated(virt_to_page(table)); +} + void __init numa_replication_init(void); void __init numa_replicate_kernel_text(void); void numa_replicate_kernel_rodata(void); @@ -86,8 +129,63 @@ int numa_get_memory_node(int nid); void dump_mm_pgtables(struct mm_struct *mm, unsigned long start, unsigned long end); +static inline unsigned long offset_in_table(void *ptr) +{ + return (unsigned long)ptr & (~PAGE_MASK); +} + +static inline unsigned long get_table_ptr(struct ptdesc *table, unsigned long offset) +{ + return ((unsigned long)ptdesc_to_virt(table) + offset); +} + +/** + * @pos: struct ptdesc* of current replica + * @table: current table entry to write (virtaul address) + * @head_table: table entry from 0th node, will be a part of this loop + * @nid: node id of current pgtable + * @offset: offset of current table entry in table page in bytes [0 .. 4088] + * @start: boolean value for tmp storage + */ +#define for_each_pgtable(pos, table, head_table, nid, offset, start) \ + for (pos = llist_entry(&virt_to_ptdesc(head_table)->replica_list_node, typeof(*pos), replica_list_node), \ + start = true, nid = page_to_nid(ptdesc_page(pos)), \ + offset = offset_in_table(head_table), table = (typeof(table))get_table_ptr(pos, offset); \ + pos != virt_to_ptdesc(head_table) || start; \ + pos = llist_entry((pos)->replica_list_node.next, typeof(*pos), replica_list_node), \ + table = (typeof(table))get_table_ptr(pos, offset), \ + nid = page_to_nid(ptdesc_page(pos)), start = false) + +/** + * @pos: struct ptdesc* of current replica + * @table: current table entry to write (virtaul address) + * @head_table: table entry from 0th node, will not be a part of this loop + * @offset: offset of current table entry in table page in bytes [0 .. 4088] + */ +#define for_each_pgtable_replica(pos, table, head_table, offset) \ + for (pos = llist_entry(virt_to_ptdesc(head_table)->replica_list_node.next, typeof(*pos), replica_list_node), \ + offset = offset_in_table(head_table), table = (typeof(table))get_table_ptr(pos, offset); \ + pos != virt_to_ptdesc(head_table); \ + pos = llist_entry((pos)->replica_list_node.next, typeof(*pos), replica_list_node), \ + table = (typeof(table))get_table_ptr(pos, offset)) + +/** Safe against removal pos + * @pos: struct ptdesc* of current replica + * @n: tmp storage + * @table: current table entry to write (virtaul address) + * @head_table: table entry from 0th node, will not be a part of this loop + * @offset: offset of current table entry in table page in bytes [0 .. 
4088] + */ +#define for_each_pgtable_replica_safe(pos, n, table, head_table, offset) \ + for (pos = llist_entry(virt_to_ptdesc(head_table)->replica_list_node.next, typeof(*pos), replica_list_node), \ + n = llist_entry((pos)->replica_list_node.next, typeof(*pos), replica_list_node), \ + offset = offset_in_table(head_table), table = (typeof(table))get_table_ptr(pos, offset); \ + pos != virt_to_ptdesc(head_table); \ + pos = n, n = llist_entry((pos)->replica_list_node.next, typeof(*pos), replica_list_node), \ + table = (typeof(table))get_table_ptr(pos, offset)) + /* Macro to walk over mm->pgd_numa and cast it to appropriate level type */ -#define for_each_pgtable_replica(table, mm, replica, nid, offset) \ +#define for_each_pgtable_kernel_replica(table, mm, replica, nid, offset) \ for (nid = first_node(replica_nodes), offset = ((unsigned long)table) & (~PAGE_MASK), \ replica = (typeof(table))(((unsigned long)mm->pgd_numa[nid]) + offset); \ nid != MAX_NUMNODES; \ @@ -96,81 +194,82 @@ void dump_mm_pgtables(struct mm_struct *mm, static inline void pgd_populate_replicated(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp) { - int nid; - pgd_t *curr_pgd; - unsigned long offset; + pgd_populate(mm, pgdp, p4dp); + + if (!is_text_replicated()) + return; - if (get_propagation_level() == PGD_PROPAGATION) { - for_each_pgtable_replica(pgdp, mm, curr_pgd, nid, offset) { + if (numa_pgtable_replicated(pgdp)) { + unsigned long offset; + struct ptdesc *curr; + pgd_t *curr_pgd; + for_each_pgtable_replica(curr, curr_pgd, pgdp, offset) { pgd_populate(mm, curr_pgd, p4dp); } - } else { - pgd_populate(mm, pgdp, p4dp); } } static inline void p4d_populate_replicated(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp) { - int nid; - p4d_t *curr_p4d; - unsigned long offset; + p4d_populate(mm, p4dp, pudp); + + if (!is_text_replicated()) + return; - if (get_propagation_level() == P4D_PROPAGATION) { - for_each_pgtable_replica(p4dp, mm, curr_p4d, nid, offset) { + if (numa_pgtable_replicated(p4dp)) { + unsigned long offset; + struct ptdesc *curr; + p4d_t *curr_p4d; + for_each_pgtable_replica(curr, curr_p4d, p4dp, offset) { p4d_populate(mm, curr_p4d, pudp); } - } else { - p4d_populate(mm, p4dp, pudp); } } static inline void pud_populate_replicated(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp) { - int nid; - pud_t *curr_pud; - unsigned long offset; + pud_populate(mm, pudp, pmdp); - if (get_propagation_level() == PUD_PROPAGATION) { - for_each_pgtable_replica(pudp, mm, curr_pud, nid, offset) { + if (!is_text_replicated()) + return; + + if (numa_pgtable_replicated(pudp)) { + unsigned long offset; + struct ptdesc *curr; + pud_t *curr_pud; + for_each_pgtable_replica(curr, curr_pud, pudp, offset) { pud_populate(mm, curr_pud, pmdp); } - } else { - pud_populate(mm, pudp, pmdp); } } static inline void pmd_populate_replicated(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep) { - int nid; - pmd_t *curr_pmd; - unsigned long offset; + pmd_populate(mm, pmdp, ptep); + + if (!is_text_replicated()) + return; - if (get_propagation_level() == PMD_PROPAGATION) { - for_each_pgtable_replica(pmdp, mm, curr_pmd, nid, offset) { + if (numa_pgtable_replicated(pmdp)) { + unsigned long offset; + struct ptdesc *curr; + pmd_t *curr_pmd; + for_each_pgtable_replica(curr, curr_pmd, pmdp, offset) { pmd_populate(mm, curr_pmd, ptep); } - } else { - pmd_populate(mm, pmdp, ptep); } } #else -#if defined(linux) -#define tmp_linux_value linux -#undef linux -#endif - -#include KABI_HIDE_INCLUDE(<linux/mm.h>) - -#if defined(tmp_linux_value) -#define linux tmp_linux_value 
-#undef tmp_linux_value -#endif - #define this_node_pgd(mm) ((mm)->pgd) #define per_node_pgd(mm, nid) ((mm)->pgd) +static inline bool numa_pgtable_replicated(void *table) +{ + return false; +} + static inline void numa_setup_pgd(void) { } diff --git a/include/linux/numa_user_replication.h b/include/linux/numa_user_replication.h new file mode 100644 index 0000000000000..74653b7f7da40 --- /dev/null +++ b/include/linux/numa_user_replication.h @@ -0,0 +1,738 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _LINUX_NUMA_USER_REPLICATION_H +#define _LINUX_NUMA_USER_REPLICATION_H + +#include <linux/kabi.h> +#include <linux/mm_inline.h> +#include <linux/numa_kernel_replication.h> + +/* Same as in numa_kernel_replication.h */ +#if defined(linux) +#define tmp_linux_value linux +#undef linux +#endif + +#include KABI_HIDE_INCLUDE(<linux/mempolicy.h>) + +#if defined(tmp_linux_value) +#define linux tmp_linux_value +#undef tmp_linux_value +#endif + +#ifdef CONFIG_USER_REPLICATION + +struct pgtable_private { + pte_t *pte_numa[MAX_NUMNODES]; + struct page *replica_pages[MAX_NUMNODES]; + bool pte_replicated; +}; + +static inline void pgtable_pte_step(struct pgtable_private *zp, int nr) +{ + int nid; + if (zp->pte_replicated) + for_each_memory_node(nid) + zp->pte_numa[nid] += nr; +} + +static inline void pgtable_update_pte(struct pgtable_private *zp, pte_t *pte) +{ + zp->pte_numa[page_to_nid(virt_to_page(pte))] = pte; + zp->pte_replicated = false; + + if (numa_pgtable_replicated(pte)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_pte; + int nid; + zp->pte_replicated = true; + + for_each_pgtable_replica(curr, curr_pte, pte, offset) { + nid = page_to_nid(ptdesc_page(curr)); + zp->pte_numa[nid] = curr_pte; + } + } +} + +static inline void set_master_page_for_puds(int allocated_node, pud_t **new) +{ + int nid; + struct ptdesc *master_table; + struct ptdesc *curr_table; + + if (allocated_node == NUMA_NO_NODE) + allocated_node = first_memory_node; + + master_table = virt_to_ptdesc(new[allocated_node]); + + for_each_memory_node(nid) { + curr_table = virt_to_ptdesc(new[nid]); + curr_table->master_table = master_table; + } +} + +static inline void set_master_page_for_pmds(int allocated_node, pmd_t **new) +{ + int nid; + struct ptdesc *master_table; + struct ptdesc *curr_table; + + if (allocated_node == NUMA_NO_NODE) + allocated_node = first_memory_node; + + master_table = virt_to_ptdesc(new[allocated_node]); + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + curr_table = virt_to_ptdesc(new[nid]); + curr_table->master_table = master_table; + } +} + +static inline void set_master_page_for_ptes(int allocated_node, struct ptdesc **new) +{ + int nid; + struct ptdesc *master_table; + struct ptdesc *curr_table; + + if (allocated_node == NUMA_NO_NODE) + allocated_node = first_memory_node; + + master_table = new[allocated_node]; + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + curr_table = new[nid]; + curr_table->master_table = master_table; + } +} + +void numa_mm_handle_replication(struct mm_struct *mm, bool enable, fork_policy_t fork_policy); +int phys_duplicate(struct vm_area_struct *vma, unsigned long start, size_t len); +int phys_deduplicate(struct vm_area_struct *vma, unsigned long start, size_t len, bool alloc_new_page); +unsigned long phys_duplicate_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, unsigned long end); + +static inline int numa_is_vma_replicant(struct vm_area_struct *vma) +{ + if 
(vma->vm_flags & VM_REPLICA_INIT) + return 1; + return 0; +} + +static inline bool vma_has_replicas(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_REPLICA_COMMIT; +} + +static inline bool vmflags_might_be_replicated(vm_flags_t vm_flags) +{ + return (vm_flags & VM_REPLICA_INIT) && (vm_flags & VM_ACCESS_FLAGS) && + !(vm_flags & (VM_WRITE | VM_SHARED | VM_LOCKED_MASK)); +} + +static inline bool vma_might_be_replicated(struct vm_area_struct *vma) +{ + return (vma->vm_file || vma_is_anonymous(vma)) && + vmflags_might_be_replicated(vma->vm_flags); +} + +static inline bool memcg_replication_enabled(struct mm_struct *mm) +{ + return mm->cg_user_replication_active; +} + +/* + * Arch specific implementation + */ +#if defined(CONFIG_ARM64) && !defined(CONFIG_ARM64_CONTPTE) + +static inline void set_ptes_replicated(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + pte_t *ptep_numa[MAX_NUMNODES]; + unsigned long offset; + bool start; + struct ptdesc *curr; + pte_t *curr_ptep; + int nid; + + if (!numa_pgtable_replicated(ptep)) + return set_ptes(mm, addr, ptep, pte, nr); + + for_each_pgtable(curr, curr_ptep, ptep, nid, offset, start) { + ptep_numa[nid] = curr_ptep; + } + + for_each_memory_node(nid) + page_table_check_ptes_set(mm, ptep_numa[nid], pte, nr); + __sync_cache_and_tags(pte, nr); + + for (;;) { + for_each_memory_node(nid) + __check_safe_pte_update(mm, ptep_numa[nid], pte); + + for_each_memory_node(nid) + WRITE_ONCE(*ptep_numa[nid], pte); + if (pte_valid_not_user(pte)) { + dsb(ishst); + isb(); + } + + if (--nr == 0) + break; + for_each_memory_node(nid) + ptep_numa[nid]++; + pte = pte_advance_pfn(pte, 1); + } +} +#define set_pte_at_replicated(mm, addr, ptep, pte) set_ptes_replicated(mm, addr, ptep, pte, 1) + +#ifdef CONFIG_MMU_NOTIFIER +#define set_pte_at_notify_replicated(__mm, __address, __ptep, __pte) \ +({ \ + struct mm_struct *___mm = __mm; \ + unsigned long ___address = __address; \ + pte_t ___pte = __pte; \ + \ + mmu_notifier_change_pte(___mm, ___address, ___pte); \ + set_pte_at_replicated(___mm, ___address, __ptep, ___pte); \ +}) +#else +#define set_pte_at_notify_replicated set_pte_at_replicated +#endif + +static inline void wrprotect_ptes_replicated(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, unsigned int nr) +{ + wrprotect_ptes(mm, addr, ptep, nr); + + if (numa_pgtable_replicated(ptep)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) { + wrprotect_ptes(mm, addr, curr_ptep, nr); + } + } +} + +static inline void pte_clear_replicated(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + pte_t pte = __pte(0); + + WRITE_ONCE(*ptep, pte); + + if (numa_pgtable_replicated(ptep)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) + WRITE_ONCE(*curr_ptep, pte); + } + + /* + * Only if the new pte is valid and kernel, otherwise TLB maintenance + * or update_mmu_cache() have the necessary barriers. 
+ */ + if (pte_valid_not_user(pte)) { + dsb(ishst); + isb(); + } +} + +static inline void pmd_clear_replicated(pmd_t *pmdp) +{ + pmd_t pmd = __pmd(0); + +#ifdef __PAGETABLE_PMD_FOLDED + if (in_swapper_pgdir(pmdp)) { + BUG_ON(numa_pgtable_replicated(pmdp)); + set_swapper_pgd((pgd_t *)pmdp, __pgd(pmd_val(pmd))); + return; + } +#endif /* __PAGETABLE_PMD_FOLDED */ + + WRITE_ONCE(*pmdp, pmd); + + if (numa_pgtable_replicated(pmdp)) { + unsigned long offset; + struct ptdesc *curr; + pmd_t *curr_pmdp; + for_each_pgtable_replica(curr, curr_pmdp, pmdp, offset) + WRITE_ONCE(*curr_pmdp, pmd); + } + + if (pmd_valid(pmd)) { + dsb(ishst); + isb(); + } +} + +static inline void pud_clear_replicated(pud_t *pudp) +{ + pud_t pud = __pud(0); + +#ifdef __PAGETABLE_PUD_FOLDED + if (in_swapper_pgdir(pudp)) { + BUG_ON(numa_pgtable_replicated(pudp)); + set_swapper_pgd((pgd_t *)pudp, __pgd(pud_val(pud))); + return; + } +#endif /* __PAGETABLE_PUD_FOLDED */ + + WRITE_ONCE(*pudp, pud); + + if (numa_pgtable_replicated(pudp)) { + unsigned long offset; + struct ptdesc *curr; + pud_t *curr_pudp; + for_each_pgtable_replica(curr, curr_pudp, pudp, offset) + WRITE_ONCE(*curr_pudp, pud); + } + + if (pud_valid(pud)) { + dsb(ishst); + isb(); + } +} + +static inline void pte_clear_not_present_full_replicated(struct mm_struct *mm, + unsigned long address, pte_t *ptep, int full) +{ + pte_clear_replicated(mm, address, ptep); +} + +static inline void clear_not_present_full_ptes_replicated(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, unsigned int nr, int full) +{ + for (;;) { + pte_clear_not_present_full_replicated(mm, addr, ptep, full); + if (--nr == 0) + break; + ptep++; + addr += PAGE_SIZE; + } +} + +static inline pte_t ptep_get_and_clear_replicated(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + pte_t pte = ptep_get_and_clear(mm, addr, ptep); + + if (numa_pgtable_replicated(ptep)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) { + pte_t curr_pte = ptep_get_and_clear(mm, addr, curr_ptep); + + if (pte_dirty(curr_pte)) + pte = pte_mkdirty(pte); + if (pte_young(curr_pte)) + pte = pte_mkyoung(pte); + } + } + + return pte; +} + +static inline pte_t ptep_clear_flush_replicated(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep) +{ + struct mm_struct *mm = (vma)->vm_mm; + pte_t pte; + pte = ptep_get_and_clear_replicated(mm, address, ptep); + if (pte_accessible(mm, pte)) + flush_tlb_page(vma, address); + return pte; +} + +static inline pte_t ptep_modify_prot_start_replicated(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + if (IS_ENABLED(CONFIG_ARM64_ERRATUM_2645198) && + cpus_have_const_cap(ARM64_WORKAROUND_2645198)) { + /* + * Break-before-make (BBM) is required for all user space mappings + * when the permission changes from executable to non-executable + * in cases where cpu is affected with errata #2645198. 
+ */ + if (pte_user_exec(ptep_get(ptep))) + return ptep_clear_flush_replicated(vma, addr, ptep); + } + return ptep_get_and_clear_replicated(vma->vm_mm, addr, ptep); +} + +static inline void ptep_modify_prot_commit_replicated(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t pte) +{ + set_pte_at_replicated(vma->vm_mm, addr, ptep, pte); +} + +static inline int ptep_test_and_clear_young_replicated(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep) +{ + int ret = ptep_test_and_clear_young(vma, address, ptep); + + if (numa_pgtable_replicated(ptep)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) { + ret |= ptep_test_and_clear_young(vma, address, curr_ptep); + } + } + + return ret; +} + +static inline int ptep_clear_flush_young_replicated(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep) +{ + int young = ptep_test_and_clear_young_replicated(vma, address, ptep); + + if (young) { + /* + * We can elide the trailing DSB here since the worst that can + * happen is that a CPU continues to use the young entry in its + * TLB and we mistakenly reclaim the associated page. The + * window for such an event is bounded by the next + * context-switch, which provides a DSB to complete the TLB + * invalidation. + */ + flush_tlb_page_nosync(vma, address); + } + + return young; +} + +#ifdef CONFIG_MMU_NOTIFIER +#define ptep_clear_young_notify_replicated(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_test_and_clear_young_replicated(___vma, ___address, __ptep);\ + __young |= mmu_notifier_clear_young(___vma->vm_mm, ___address, \ + ___address + PAGE_SIZE); \ + __young; \ +}) + +#define ptep_clear_flush_young_notify_replicated(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young_replicated(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address, \ + ___address + \ + PAGE_SIZE); \ + __young; \ +}) +#else +#define ptep_clear_young_notify_replicated ptep_test_and_clear_young_replicated +#define ptep_clear_flush_young_notify_replicated ptep_clear_flush_young_replicated +#endif + +static inline int __ptep_set_access_flags_no_flush(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep, pte_t entry) +{ + pteval_t old_pteval, pteval; + pte_t pte = __ptep_get(ptep); + + if (pte_same(pte, entry)) + return 0; + + /* only preserve the access flags and write permission */ + pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY; + + /* + * Setting the flags must be done atomically to avoid racing with the + * hardware update of the access/dirty state. The PTE_RDONLY bit must + * be set to the most permissive (lowest value) of *ptep and entry + * (calculated as: a & b == ~(~a | ~b)). 
+ */ + pte_val(entry) ^= PTE_RDONLY; + pteval = pte_val(pte); + do { + old_pteval = pteval; + pteval ^= PTE_RDONLY; + pteval |= pte_val(entry); + pteval ^= PTE_RDONLY; + pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval); + } while (pteval != old_pteval); + + return 1; +} + +static inline int ptep_set_access_flags_replicated(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, pte_t entry, int dirty) +{ + int ret = __ptep_set_access_flags_no_flush(vma, addr, ptep, entry); + + if (numa_pgtable_replicated(ptep)) { + pgprot_t prot = pte_pgprot(entry); + + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) { + WARN_ON(!pte_present(__ptep_get(curr_ptep))); + + entry = pfn_pte(pte_pfn(__ptep_get(curr_ptep)), prot); + ret |= __ptep_set_access_flags_no_flush(vma, addr, curr_ptep, entry); + } + } + + /* Invalidate a stale read-only entry */ + if (dirty) + flush_tlb_page(vma, addr); + + return ret; +} + +static inline void clear_young_dirty_ptes_replicated(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, unsigned int nr, cydp_t flags) +{ + clear_young_dirty_ptes(vma, addr, ptep, nr, flags); + + if (numa_pgtable_replicated(ptep)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_ptep; + for_each_pgtable_replica(curr, curr_ptep, ptep, offset) + clear_young_dirty_ptes(vma, addr, curr_ptep, nr, flags); + } +} + +#endif + +static inline void build_pte_chain(struct ptdesc **tables) +{ + int nid; + int prev_node = -1; + for_each_memory_node(nid) { + tables[nid]->replica_list_head.first = NULL; + if (prev_node != -1) { + llist_add(&tables[nid]->replica_list_node, &tables[prev_node]->replica_list_head); + } else { + tables[nid]->replica_list_node.next = &tables[nid]->replica_list_node; + } + prev_node = nid; + } +} + +static inline void build_pmd_chain(pmd_t **tables) +{ + int nid; + int prev_node = -1; + for_each_memory_node(nid) { + struct ptdesc *curr_ptdesc = virt_to_ptdesc(tables[nid]); + + curr_ptdesc->replica_list_head.first = NULL; + if (prev_node != -1) { + struct ptdesc *prev_ptdesc = virt_to_ptdesc(tables[prev_node]); + llist_add(&curr_ptdesc->replica_list_node, &prev_ptdesc->replica_list_head); + } else { + curr_ptdesc->replica_list_node.next = &curr_ptdesc->replica_list_node; + } + prev_node = nid; + } +} + +static inline void build_pud_chain(pud_t **tables) +{ + int nid; + int prev_node = -1; + for_each_memory_node(nid) { + struct ptdesc *curr_ptdesc = virt_to_ptdesc(tables[nid]); + + curr_ptdesc->replica_list_head.first = NULL; + if (prev_node != -1) { + struct ptdesc *prev_ptdesc = virt_to_ptdesc(tables[prev_node]); + llist_add(&curr_ptdesc->replica_list_node, &prev_ptdesc->replica_list_head); + } else { + curr_ptdesc->replica_list_node.next = &curr_ptdesc->replica_list_node; + } + prev_node = nid; + } +} + +static inline void build_p4d_chain(p4d_t **tables) +{ + int nid; + int prev_node = -1; + for_each_memory_node(nid) { + struct ptdesc *curr_ptdesc = virt_to_ptdesc(tables[nid]); + + curr_ptdesc->replica_list_head.first = NULL; + if (prev_node != -1) { + struct ptdesc *prev_ptdesc = virt_to_ptdesc(tables[prev_node]); + llist_add(&curr_ptdesc->replica_list_node, &prev_ptdesc->replica_list_head); + } else { + curr_ptdesc->replica_list_node.next = &curr_ptdesc->replica_list_node; + } + prev_node = nid; + } +} + +pgd_t *fault_pgd_offset(struct vm_fault *vmf, unsigned long address); +p4d_t *fault_p4d_alloc(struct vm_fault *vmf, struct mm_struct *mm, pgd_t *pgd, unsigned long 
address); +pud_t *fault_pud_alloc(struct vm_fault *vmf, struct mm_struct *mm, p4d_t *p4d, unsigned long address); +pmd_t *fault_pmd_alloc(struct vm_fault *vmf, struct mm_struct *mm, pud_t *pud, unsigned long address); +int fault_pte_alloc(struct vm_fault *vmf); + +pte_t *cpr_alloc_pte_map_lock(struct mm_struct *dst_mm, unsigned long addr, + pmd_t *src_pmd, pmd_t *dst_pmd, spinlock_t **ptl); +pmd_t *cpr_alloc_pmd(struct mm_struct *dst_mm, unsigned long addr, + pud_t *src_pud, pud_t *dst_pud); +pud_t *cpr_alloc_pud(struct mm_struct *dst_mm, unsigned long addr, + p4d_t *src_p4d, p4d_t *dst_p4d); +p4d_t *cpr_alloc_p4d(struct mm_struct *dst_mm, unsigned long addr, + pgd_t *src_pgd, pgd_t *dst_pgd); + +static inline void cleanup_pte_list(pgtable_t table) +{ + page_ptdesc(table)->replica_list_node.next = NULL; +} + +static inline void cleanup_pmd_list(struct ptdesc *table) +{ +#ifndef __PAGETABLE_PMD_FOLDED + table->replica_list_node.next = NULL; +#endif +} + +static inline void cleanup_pud_list(struct ptdesc *table) +{ +#ifndef __PAGETABLE_PUD_FOLDED + table->replica_list_node.next = NULL; +#endif +} + +static inline void cleanup_p4d_list(struct ptdesc *table) +{ +#ifndef __PAGETABLE_P4D_FOLDED + table->replica_list_node.next = NULL; +#endif +} + +#else /* !CONFIG_USER_REPLICATION */ + +struct pgtable_private { + pte_t **pte_numa; + struct page **replica_pages; + bool pte_replicated; +}; + +static inline void pgtable_pte_step(struct pgtable_private *zp, int nr) { } +static inline void pgtable_update_pte(struct pgtable_private *zp, pte_t *pte) +{ + zp->pte_replicated = false; +} + +static inline int numa_is_vma_replicant(struct vm_area_struct *vma) +{ + return 0; +} + +static inline bool vma_has_replicas(struct vm_area_struct *vma) +{ + return 0; +} + +static inline bool vmflags_might_be_replicated(vm_flags_t vm_flags) +{ + return 0; +} + +static inline bool vma_might_be_replicated(struct vm_area_struct *vma) +{ + return 0; +} + +static inline bool memcg_replication_enabled(struct mm_struct *mm) +{ + return 0; +} + +static inline pgd_t *fault_pgd_offset(struct vm_fault *vmf, unsigned long address) +{ + return pgd_offset(vmf->vma->vm_mm, address); +} + +static inline p4d_t *fault_p4d_alloc(struct vm_fault *vmf, struct mm_struct *mm, pgd_t *pgd, unsigned long address) +{ + return p4d_alloc(mm, pgd, address); +} + +static inline pud_t *fault_pud_alloc(struct vm_fault *vmf, struct mm_struct *mm, p4d_t *p4d, unsigned long address) +{ + return pud_alloc(mm, p4d, address); +} + +static inline pmd_t *fault_pmd_alloc(struct vm_fault *vmf, struct mm_struct *mm, pud_t *pud, unsigned long address) +{ + return pmd_alloc(mm, pud, address); +} + +static inline int fault_pte_alloc(struct vm_fault *vmf) +{ + return 0; +} + +static inline pte_t *cpr_alloc_pte_map_lock(struct mm_struct *dst_mm, unsigned long addr, + pmd_t *src_pmd, pmd_t *dst_pmd, spinlock_t **ptl) +{ + return pte_alloc_map_lock(dst_mm, dst_pmd, addr, ptl); +} + +static inline pmd_t *cpr_alloc_pmd(struct mm_struct *dst_mm, unsigned long addr, + pud_t *src_pud, pud_t *dst_pud) +{ + return pmd_alloc(dst_mm, dst_pud, addr); +} + +static inline pud_t *cpr_alloc_pud(struct mm_struct *dst_mm, unsigned long addr, + p4d_t *src_p4d, p4d_t *dst_p4d) +{ + return pud_alloc(dst_mm, dst_p4d, addr); +} + +static inline p4d_t *cpr_alloc_p4d(struct mm_struct *dst_mm, unsigned long addr, + pgd_t *src_pgd, pgd_t *dst_pgd) +{ + return p4d_alloc(dst_mm, dst_pgd, addr); +} + +#define set_ptes_replicated set_ptes +#define set_pte_at_replicated set_pte_at +#define 
set_pte_at_notify_replicated set_pte_at_notify + +#define wrprotect_ptes_replicated wrprotect_ptes + +#define pte_clear_replicated pte_clear +#define pmd_clear_replicated pmd_clear +#define pud_clear_replicated pud_clear + +#define pte_clear_not_present_full_replicated pte_clear_not_present_full +#define clear_not_present_full_ptes_replicated clear_not_present_full_ptes + +#define ptep_get_and_clear_replicated ptep_get_and_clear +#define ptep_clear_flush_replicated ptep_clear_flush +#define ptep_modify_prot_start_replicated ptep_modify_prot_start +#define ptep_modify_prot_commit_replicated ptep_modify_prot_commit +#define ptep_clear_young_notify_replicated ptep_clear_young_notify +#define ptep_clear_flush_young_notify_replicated ptep_clear_flush_young_notify + +#define ptep_set_access_flags_replicated ptep_set_access_flags +#define clear_young_dirty_ptes_replicated clear_young_dirty_ptes + +#endif /* CONFIG_USER_REPLICATION */ + +#endif /* _LINUX_NUMA_USER_REPLICATION_H */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 7a67d997eecea..e784fca29060b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -139,6 +139,10 @@ enum pageflags { #ifdef CONFIG_DYNAMIC_POOL PG_pool, /* Page is allocated from dynamic pool */ #endif + +#ifdef CONFIG_KERNEL_REPLICATION + PG_replicated, +#endif __NR_PAGEFLAGS, PG_readahead = PG_reclaim, @@ -635,6 +639,12 @@ PAGEFLAG(Pool, pool, PF_NO_TAIL) PAGEFLAG_FALSE(Pool, pool) #endif +#ifdef CONFIG_KERNEL_REPLICATION +PAGEFLAG(Replicated, replicated, PF_ANY) +#else +PAGEFLAG_FALSE(Replicated, replicated) +#endif + /* * On an anonymous page mapped into a user virtual memory area, * page->mapping points to its anon_vma, not to a struct address_space; @@ -1092,6 +1102,12 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) #define __PG_MLOCKED 0 #endif +#ifdef CONFIG_KERNEL_REPLICATION +#define __PG_REPLICATED (1UL << PG_replicated) +#else +#define __PG_REPLICATED 0 +#endif + /* * Flags checked when a page is freed. Pages being freed should not have * these flags set. If they are, there is a problem. @@ -1101,7 +1117,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) 1UL << PG_private | 1UL << PG_private_2 | \ 1UL << PG_writeback | 1UL << PG_reserved | \ 1UL << PG_slab | 1UL << PG_active | \ - 1UL << PG_unevictable | __PG_MLOCKED | LRU_GEN_MASK) + 1UL << PG_unevictable | __PG_REPLICATED | __PG_MLOCKED | LRU_GEN_MASK) /* * Flags checked when a page is prepped for return by the page allocator. 
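The page-flags.h hunk above also adds __PG_REPLICATED to PAGE_FLAGS_CHECK_AT_FREE, i.e. a page that still carries PG_replicated when it reaches the free path is reported as a bad page; a replica has to be dismantled and the flag cleared before the page can go back to the allocator. A minimal userspace model of that invariant (bit positions and names are illustrative only, not the kernel's):

#include <assert.h>
#include <stdio.h>

/* Illustrative bit positions, not the kernel's enum pageflags. */
#define PG_LOCKED      (1UL << 0)
#define PG_WRITEBACK   (1UL << 1)
#define PG_REPLICATED  (1UL << 2)

/* Model of PAGE_FLAGS_CHECK_AT_FREE: none of these may be set at free time. */
#define FLAGS_CHECK_AT_FREE (PG_LOCKED | PG_WRITEBACK | PG_REPLICATED)

static int free_page_ok(unsigned long flags)
{
    return (flags & FLAGS_CHECK_AT_FREE) == 0;
}

int main(void)
{
    unsigned long flags = PG_REPLICATED;

    assert(!free_page_ok(flags));   /* would hit the bad-page path */
    flags &= ~PG_REPLICATED;        /* de-replicate first ... */
    assert(free_page_ok(flags));    /* ... then the free is legal */
    printf("replicated pages must drop PG_replicated before being freed\n");
    return 0;
}
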
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 6104fa2b6e477..9658b8f17c393 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -51,7 +51,8 @@ gfpflag_string(__GFP_DIRECT_RECLAIM), \ gfpflag_string(__GFP_KSWAPD_RECLAIM), \ gfpflag_string(__GFP_ZEROTAGS), \ - gfpflag_string(__GFP_RELIABLE) + gfpflag_string(__GFP_RELIABLE), \ + gfpflag_string(__GFP_MAYDIE) #ifdef CONFIG_KASAN_HW_TAGS #define __def_gfpflag_names_kasan , \ @@ -96,6 +97,12 @@ #define IF_HAVE_PG_POOL(_name) #endif +#ifdef CONFIG_KERNEL_REPLICATION +#define IF_HAVE_PG_REPLICATED(_name) ,{1UL << PG_##_name, __stringify(_name)} +#else +#define IF_HAVE_PG_REPLICATED(_name) +#endif + #ifdef CONFIG_ARCH_USES_PG_ARCH_X #define IF_HAVE_PG_ARCH_X(_name) ,{1UL << PG_##_name, __stringify(_name)} #else @@ -132,6 +139,7 @@ IF_HAVE_PG_HWPOISON(hwpoison) \ IF_HAVE_PG_IDLE(idle) \ IF_HAVE_PG_IDLE(young) \ IF_HAVE_PG_POOL(pool) \ +IF_HAVE_PG_REPLICATED(replicated) \ IF_HAVE_PG_ARCH_X(arch_2) \ IF_HAVE_PG_ARCH_X(arch_3) diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 14e5498efd7ac..26e2856929032 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -16,6 +16,7 @@ #define PROT_NONE 0x0 /* page can not be accessed */ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */ +#define PROT_REPLICA 0x200000 /* VM_REPLICA_COMMIT make replicated pte entries to point to copied numa-local physical pages */ /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ @@ -29,6 +30,7 @@ #define MAP_HUGETLB 0x040000 /* create a huge page mapping */ #define MAP_SYNC 0x080000 /* perform synchronous page faults for the mapping */ #define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */ +#define MAP_REPLICA 0x200000 /* VM_REPLICA_INIT */ #define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be * uninitialized */ diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index c26a9b3a35768..d0362c46b1480 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -352,7 +352,7 @@ static void cgroup_idr_remove(struct idr *idr, int id) spin_unlock_bh(&cgroup_idr_lock); } -static bool cgroup_has_tasks(struct cgroup *cgrp) +bool cgroup_has_tasks(struct cgroup *cgrp) { return cgrp->nr_populated_csets; } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 1060cf1524370..5face8ad9d3a2 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -26,6 +26,7 @@ #include <linux/task_work.h> #include <linux/shmem_fs.h> #include <linux/khugepaged.h> +#include <linux/numa_user_replication.h> #include <linux/uprobes.h> @@ -194,9 +195,9 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, } flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte))); - ptep_clear_flush(vma, addr, pvmw.pte); + ptep_clear_flush_replicated(vma, addr, pvmw.pte); if (new_page) - set_pte_at_notify(mm, addr, pvmw.pte, + set_pte_at_notify_replicated(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot)); add_reliable_folio_counter(old_folio, mm, -1); diff --git a/kernel/fork.c b/kernel/fork.c index f30b24c68442b..5b2a9806b4f88 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -96,6 +96,7 @@ #include <linux/scs.h> #include <linux/io_uring.h> #include <linux/bpf.h> 
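For reference, the new uapi flag above can be exercised from userspace roughly as sketched below. This is only an illustration under the assumptions that MAP_REPLICA (0x200000 in the mman-common.h hunk) merely tags the mapping as a replication candidate and that actual replication still requires the per-process or per-cgroup switch; the constant is defined locally because distribution headers do not carry it:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_REPLICA
#define MAP_REPLICA 0x200000    /* value taken from the uapi hunk above */
#endif

int main(void)
{
    size_t len = 4096;

    /* Private read-only data is the replication target of this series. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_REPLICA, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memset(p, 0x5a, len);                   /* populate the data ... */
    if (mprotect(p, len, PROT_READ)) {      /* ... then make it read-only */
        perror("mprotect");
        return 1;
    }

    /* From here on, reads may be served from a NUMA-local replica. */
    return 0;
}

PROT_REPLICA from the same hunk appears intended for forcing the copy through mprotect(); its exact semantics are defined by the mm/mprotect.c changes in this patch.
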
+#include <linux/numa_user_replication.h> #include <linux/stackprotector.h> #include <linux/user_events.h> #include <linux/iommu.h> @@ -749,6 +750,21 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, */ if (is_vm_hugetlb_page(tmp)) hugetlb_dup_vma_private(tmp); +#ifdef CONFIG_USER_REPLICATION + /* + * clear VM_REPLICA_... flags in case of discarding replication + */ + if (memcg_replication_enabled(oldmm) && !memcg_replication_enabled(mm)) { + vm_flags_clear(tmp, VM_REPLICA_INIT | VM_REPLICA_COMMIT); + } + /* + * VM_LOCKED_MASK was cleared above, so tmp may possibly become + * might_be_replicated. + */ + else if (memcg_replication_enabled(mm) && vma_might_be_replicated(tmp)) { + vm_flags_mod(tmp, VM_REPLICA_COMMIT, VM_NONE); + } +#endif /* * Link the vma into the MT. After using __mt_dup(), memory @@ -1320,6 +1336,30 @@ static void mm_init_uprobes_state(struct mm_struct *mm) #endif } +static void mm_init_numa_replication_state(struct mm_struct *mm) +{ +#ifdef CONFIG_USER_REPLICATION + switch (mm->fork_policy) { + case FORK_NO_REPLICA: + mm->cg_user_replication_active = + get_mem_cgroup_from_mm(mm)->user_replication_active; + if (mm->cg_user_replication_active) + mm->fork_policy = FORK_KEEP_REPLICA; + else + mm->fork_policy = FORK_DISCARD_REPLICA; + break; + case FORK_DISCARD_REPLICA: + mm->cg_user_replication_active = 0; + break; + case FORK_KEEP_REPLICA: + BUG_ON(!mm->cg_user_replication_active); + break; + default: + BUG(); + } +#endif +} + static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, struct user_namespace *user_ns) { @@ -1352,6 +1392,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->pmd_huge_pte = NULL; #endif mm_init_uprobes_state(mm); + mm_init_numa_replication_state(mm); hugetlb_count_init(mm); if (current->mm) { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index be1d355491449..cfee7847b1087 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -48,6 +48,7 @@ #include <linux/ratelimit.h> #include <linux/task_work.h> #include <linux/rbtree_augmented.h> +#include <linux/numa_user_replication.h> #include <asm/switch_to.h> @@ -3428,7 +3429,7 @@ static void task_numa_work(struct callback_head *work) } for (; vma; vma = vma_next(&vmi)) { - if (!vma_migratable(vma) || !vma_policy_mof(vma) || + if (!vma_migratable(vma) || (!vma_policy_mof(vma) && !vma_has_replicas(vma)) || is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) { continue; } @@ -3439,8 +3440,9 @@ static void task_numa_work(struct callback_head *work) * hinting faults in read-only file-backed mappings or the vdso * as migrating the pages will be of marginal benefit. */ - if (!vma->vm_mm || - (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) + if ((!vma->vm_mm || + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) && + !vma_has_replicas(vma)) continue; /* diff --git a/mm/Kconfig b/mm/Kconfig index 845ff9619d3ef..57a23b7379852 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1308,6 +1308,19 @@ config KERNEL_REPLICATION Page tables are replicated partially, according to replicated kernel memory range. If unsure, say "n". +config USER_REPLICATION + bool "infrastructure for userspace replication between NUMA nodes" + default n + depends on KERNEL_REPLICATION && !TRANSPARENT_HUGEPAGE + select ARCH_USES_HIGH_VMA_FLAGS + + help + Provide interfaces for per NUMA node replication of some userspace mappings. + First of all it made for text, ro-data and ro after init data. 
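The dup_mmap() and mm_init_numa_replication_state() hooks above take the replication switch from the owning memory cgroup. A minimal sketch of flipping that switch from userspace, assuming a cgroup-v1 memory hierarchy mounted at /sys/fs/cgroup/memory and that the knob added by this patch is exposed as memory.numa_replication; the write is only accepted while the cgroup still has no tasks, so the task is attached afterwards:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed cgroup-v1 memory hierarchy; adjust the mount point for your system. */
#define CG "/sys/fs/cgroup/memory/repl"

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fputs(val, f) != EOF);
    if (fclose(f) || !ok)
        return -1;
    return 0;
}

int main(void)
{
    char pid[32];

    if (mkdir(CG, 0755) && errno != EEXIST)
        return 1;

    /* Must be done while the cgroup is still empty (cgroup_has_tasks()). */
    if (write_str(CG "/memory.numa_replication", "enable"))
        return 1;

    /* Attaching the task afterwards replicates its private ro-data. */
    snprintf(pid, sizeof(pid), "%d", getpid());
    if (write_str(CG "/cgroup.procs", pid))
        return 1;

    return 0;
}
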
This feature + doesn't support THP now. + + If unsure, say "n". + config IOMMU_MM_DATA bool diff --git a/mm/Makefile b/mm/Makefile index 45058cdf65d89..1dd967c39e937 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -141,6 +141,7 @@ obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o obj-$(CONFIG_KERNEL_REPLICATION) += numa_kernel_replication.o +obj-$(CONFIG_USER_REPLICATION) += numa_user_replication.o obj-$(CONFIG_SHARE_POOL) += share_pool.o obj-$(CONFIG_MEMCG_MEMFS_INFO) += memcg_memfs_info.o obj-$(CONFIG_ETMEM) += etmem.o diff --git a/mm/gup.c b/mm/gup.c index 33e8e66b3a773..0e516de0f8b30 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -19,6 +19,7 @@ #include <linux/mm_inline.h> #include <linux/sched/mm.h> #include <linux/shmem_fs.h> +#include <linux/numa_user_replication.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -441,7 +442,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, entry = pte_mkyoung(entry); if (!pte_same(orig_entry, entry)) { - set_pte_at(vma->vm_mm, address, pte, entry); + set_pte_at_replicated(vma->vm_mm, address, pte, entry); update_mmu_cache(vma, address, pte); } } diff --git a/mm/ksm.c b/mm/ksm.c index dac2b6d5c8298..dd234a8b7f773 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -40,6 +40,7 @@ #include <linux/oom.h> #include <linux/numa.h> #include <linux/pagewalk.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> #include "internal.h" @@ -1138,20 +1139,20 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page, * * See Documentation/mm/mmu_notifier.rst */ - entry = ptep_clear_flush(vma, pvmw.address, pvmw.pte); + entry = ptep_clear_flush_replicated(vma, pvmw.address, pvmw.pte); /* * Check that no O_DIRECT or similar I/O is in progress on the * page */ if (page_mapcount(page) + 1 + swapped != page_count(page)) { - set_pte_at(mm, pvmw.address, pvmw.pte, entry); + set_pte_at_replicated(mm, pvmw.address, pvmw.pte, entry); goto out_unlock; } /* See folio_try_share_anon_rmap_pte(): clear PTE first. 
*/ if (anon_exclusive && folio_try_share_anon_rmap_pte(page_folio(page), page)) { - set_pte_at(mm, pvmw.address, pvmw.pte, entry); + set_pte_at_replicated(mm, pvmw.address, pvmw.pte, entry); goto out_unlock; } @@ -1162,7 +1163,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page, if (pte_write(entry)) entry = pte_wrprotect(entry); - set_pte_at_notify(mm, pvmw.address, pvmw.pte, entry); + set_pte_at_notify_replicated(mm, pvmw.address, pvmw.pte, entry); } *orig_pte = entry; err = 0; @@ -1263,8 +1264,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, * * See Documentation/mm/mmu_notifier.rst */ - ptep_clear_flush(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, newpte); + ptep_clear_flush_replicated(vma, addr, ptep); + set_pte_at_notify_replicated(mm, addr, ptep, newpte); add_reliable_page_counter(page, mm, -1); folio = page_folio(page); @@ -2392,7 +2393,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page) continue; if (ksm_scan.address < vma->vm_start) ksm_scan.address = vma->vm_start; - if (!vma->anon_vma) + if (!vma->anon_vma || vma_has_replicas(vma)) ksm_scan.address = vma->vm_end; while (ksm_scan.address < vma->vm_end) { diff --git a/mm/madvise.c b/mm/madvise.c index e51c1cf8dfca1..c4938a02804fc 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/numa_user_replication.h> #include <asm/tlb.h> @@ -503,6 +504,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, folio = vm_normal_folio(vma, addr, ptent); if (!folio || folio_is_zone_device(folio)) continue; + /* + * We do not care about replicated pages here, + * they are unevictable and invisible for reclaim anyway + */ + if (folio_test_replicated(folio)) + continue; /* * If we encounter a large folio, only split it if it is not @@ -560,7 +567,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, continue; if (!pageout && pte_young(ptent)) { - clear_young_dirty_ptes(vma, addr, pte, nr, + clear_young_dirty_ptes_replicated(vma, addr, pte, nr, CYDP_CLEAR_YOUNG); tlb_remove_tlb_entries(tlb, pte, nr, addr); } @@ -724,10 +731,10 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, nr = swap_pte_batch(pte, max_nr, ptent); nr_swap -= nr; free_swap_and_cache_nr(entry, nr); - clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm); + clear_not_present_full_ptes_replicated(mm, addr, pte, nr, tlb->fullmm); } else if (is_hwpoison_entry(entry) || is_poisoned_swp_entry(entry)) { - pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + pte_clear_not_present_full_replicated(mm, addr, pte, tlb->fullmm); } continue; } @@ -735,7 +742,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, folio = vm_normal_folio(vma, addr, ptent); if (!folio || folio_is_zone_device(folio)) continue; - + if (folio_test_replicated(folio)) + continue; /* * If we encounter a large folio, only split it if it is not * fully mapped within the range we are operating on. 
Otherwise @@ -803,7 +811,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, } if (pte_young(ptent) || pte_dirty(ptent)) { - clear_young_dirty_ptes(vma, addr, pte, nr, cydp_flags); + clear_young_dirty_ptes_replicated(vma, addr, pte, nr, cydp_flags); tlb_remove_tlb_entries(tlb, pte, nr, addr); } folio_mark_lazyfree(folio); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 606a481afe2e2..786e7c9e83dd8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -68,6 +68,7 @@ #include <linux/sched/isolation.h> #include <linux/parser.h> #include <linux/dynamic_pool.h> +#include <linux/numa_user_replication.h> #ifdef CONFIG_MEMCG_SWAP_QOS #include <linux/blkdev.h> @@ -207,6 +208,9 @@ static struct move_charge_struct { struct mm_struct *mm; struct mem_cgroup *from; struct mem_cgroup *to; +#ifdef CONFIG_USER_REPLICATION + struct mem_cgroup *to_repl; +#endif unsigned long flags; unsigned long precharge; unsigned long moved_charge; @@ -4209,6 +4213,11 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, if (val & ~MOVE_MASK) return -EINVAL; +#ifdef CONFIG_USER_REPLICATION + if (memcg->user_replication_active) + return -EINVAL; +#endif + /* * No kind of locking is needed in here, because ->can_attach() will * check this value once in the beginning of the process, and then carry @@ -6125,6 +6134,46 @@ static ssize_t wb_blkio_write(struct kernfs_open_file *of, char *buf, } #endif +#ifdef CONFIG_USER_REPLICATION +static int memory_numa_replication_show(struct seq_file *m, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + int replication_mode = READ_ONCE(memcg->user_replication_active); + + if (replication_mode) { + seq_printf(m, "enabled\n"); + } else { + seq_printf(m, "disabled\n"); + } + + return 0; +} + +static ssize_t memory_numa_replication_write(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + struct cgroup *cgrp = of_css(of)->cgroup; + + buf = strstrip(buf); + if (!buf) + return -EINVAL; + + if (cgroup_has_tasks(cgrp) || !is_text_replicated()) + return -EINVAL; + + if (!strcmp(buf, "enable")) { + WRITE_ONCE(memcg->move_charge_at_immigrate, MOVE_MASK); + WRITE_ONCE(memcg->user_replication_active, true); + } else if (!strcmp(buf, "disable")) { + WRITE_ONCE(memcg->user_replication_active, false); + } else { + return -EINVAL; + } + + return nbytes; +} +#endif static struct cftype mem_cgroup_legacy_files[] = { { @@ -6363,6 +6412,14 @@ static struct cftype mem_cgroup_legacy_files[] = { .write = mem_cgroup_dpool_2M_write, .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE | CFTYPE_NOT_ON_ROOT, }, +#endif +#ifdef CONFIG_USER_REPLICATION + { + .name = "numa_replication", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = memory_numa_replication_show, + .write = memory_numa_replication_write, + }, #endif { }, /* terminate */ }; @@ -6616,6 +6673,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) #endif #ifdef CONFIG_MEMCG_V1_RECLAIM memcg->high_async_ratio = HIGH_ASYNC_RATIO_BASE; +#endif +#ifdef CONFIG_USER_REPLICATION + memcg->user_replication_active = false; #endif page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX); if (parent) { @@ -7386,12 +7446,29 @@ static void mem_cgroup_clear_mc(void) spin_lock(&mc.lock); mc.from = NULL; mc.to = NULL; +#ifdef CONFIG_USER_REPLICATION + mc.to_repl = NULL; +#endif mc.mm = NULL; spin_unlock(&mc.lock); mmput(mm); } +#ifdef CONFIG_USER_REPLICATION +static void mem_cgroup_clear_mc_replicated(void) +{ + struct mm_struct 
*mm = mc.mm; + + spin_lock(&mc.lock); + mc.to_repl = NULL; + mc.mm = NULL; + spin_unlock(&mc.lock); + + mmput(mm); +} +#endif + static int mem_cgroup_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; @@ -7401,10 +7478,14 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset) struct mm_struct *mm; unsigned long move_flags; int ret = 0; - +#ifdef CONFIG_USER_REPLICATION + /* charge immigration isn't supported on the default hierarchy */ + bool bother_charge = !cgroup_subsys_on_dfl(memory_cgrp_subsys); +#else /* charge immigration isn't supported on the default hierarchy */ if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return 0; +#endif /* * Multi-process migrations only happen on the default hierarchy @@ -7431,8 +7512,13 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset) * So we need to save it, and keep it going. */ move_flags = READ_ONCE(memcg->move_charge_at_immigrate); +#ifdef CONFIG_USER_REPLICATION + if (bother_charge) + bother_charge = !!(move_flags); +#else if (!move_flags) return 0; +#endif from = mem_cgroup_from_task(p); @@ -7441,6 +7527,35 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset) mm = get_task_mm(p); if (!mm) return 0; +#ifdef CONFIG_USER_REPLICATION + if (mm->owner == p) { + VM_BUG_ON(mc.to_repl); + if (bother_charge) { + VM_BUG_ON(mc.from); + VM_BUG_ON(mc.to); + VM_BUG_ON(mc.precharge); + VM_BUG_ON(mc.moved_charge); + VM_BUG_ON(mc.moved_swap); + } + + spin_lock(&mc.lock); + mc.mm = mm; + mc.to_repl = memcg; + if (bother_charge) { + mc.from = from; + mc.to = memcg; + mc.flags = move_flags; + /* We set mc.moving_task later */ + } + spin_unlock(&mc.lock); + + if (bother_charge) { + ret = mem_cgroup_precharge_mc(mm); + if (ret) + mem_cgroup_clear_mc(); + } + } +#else /* We move charges only when we move a owner of the mm */ if (mm->owner == p) { VM_BUG_ON(mc.from); @@ -7460,7 +7575,9 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset) ret = mem_cgroup_precharge_mc(mm); if (ret) mem_cgroup_clear_mc(); - } else { + } +#endif + else { mmput(mm); } return ret; @@ -7470,6 +7587,10 @@ static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) { if (mc.to) mem_cgroup_clear_mc(); +#ifdef CONFIG_USER_REPLICATION + else if (mc.to_repl) + mem_cgroup_clear_mc_replicated(); +#endif } static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, @@ -7628,9 +7749,21 @@ static void mem_cgroup_move_charge(void) static void mem_cgroup_move_task(void) { if (mc.to) { +#ifdef CONFIG_USER_REPLICATION + BUG_ON(mc.to_repl != mc.to); + numa_mm_handle_replication(mc.mm, + mc.to_repl->user_replication_active, FORK_KEEP_REPLICA); +#endif mem_cgroup_move_charge(); mem_cgroup_clear_mc(); } +#ifdef CONFIG_USER_REPLICATION + else if (mc.to_repl) { + numa_mm_handle_replication(mc.mm, + mc.to_repl->user_replication_active, FORK_KEEP_REPLICA); + mem_cgroup_clear_mc_replicated(); + } +#endif } #else /* !CONFIG_MMU */ diff --git a/mm/memory.c b/mm/memory.c index f05772babfe08..d317873a0dc46 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -79,7 +79,7 @@ #include <linux/sched/sysctl.h> #include <linux/userswap.h> #include <linux/dynamic_pool.h> -#include <linux/numa_kernel_replication.h> +#include <linux/numa_user_replication.h> #include <trace/events/kmem.h> @@ -187,142 +187,232 @@ void mm_trace_rss_stat(struct mm_struct *mm, int member) } #ifdef CONFIG_KERNEL_REPLICATION +#ifdef CONFIG_USER_REPLICATION -static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, - unsigned long addr) +static void free_pte_range(struct mmu_gather 
*tlb, pmd_t *pmd, unsigned long addr) { unsigned long offset; - int nid; + struct ptdesc *curr, *tmp; + pmd_t *curr_pmd; + pte_t *curr_pte; + pgtable_t token = pmd_pgtable(*pmd); + bool pmd_replicated = numa_pgtable_replicated(pmd); + bool pte_replicated = numa_pgtable_replicated(page_to_virt(token)); + + pmd_clear(pmd); + if (pmd_replicated) + for_each_pgtable_replica(curr, curr_pmd, pmd, offset) { + pmd_clear(curr_pmd); + } + + if (pte_replicated) { + for_each_pgtable_replica_safe(curr, tmp, curr_pte, page_to_virt(token), offset) { + cleanup_pte_list(ptdesc_page(curr)); + pte_free_tlb(tlb, ptdesc_page(curr), addr); + mm_dec_nr_ptes(tlb->mm); + } + } + cleanup_pte_list(token); + pte_free_tlb(tlb, token, addr); + mm_dec_nr_ptes(tlb->mm); +} + +static void __free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr) +{ + unsigned long offset; + struct ptdesc *curr, *tmp; + pud_t *curr_pud; + pmd_t *curr_pmd; + pmd_t *pmd = pmd_offset(pud, addr); + bool pud_replicated = numa_pgtable_replicated(pud); + bool pmd_replicated = numa_pgtable_replicated(pmd); + + pud_clear(pud); + if (pud_replicated) + for_each_pgtable_replica(curr, curr_pud, pud, offset) { + pud_clear(curr_pud); + } + + if (pmd_replicated) { + for_each_pgtable_replica_safe(curr, tmp, curr_pmd, pmd, offset) { + cleanup_pmd_list(curr); + pmd_free_tlb(tlb, curr_pmd, addr); + mm_dec_nr_pmds(tlb->mm); + } + } + cleanup_pmd_list(virt_to_ptdesc(pmd)); + pmd_free_tlb(tlb, pmd, addr); + mm_dec_nr_pmds(tlb->mm); +} + +static inline void __free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr) +{ + unsigned long offset; + struct ptdesc *curr, *tmp; + p4d_t *curr_p4d; + pud_t *curr_pud; + pud_t *pud = pud_offset(p4d, addr); + bool p4d_replicated = numa_pgtable_replicated(p4d); + bool pud_replicated = numa_pgtable_replicated(pud); + + p4d_clear(p4d); + if (p4d_replicated) + for_each_pgtable_replica(curr, curr_p4d, p4d, offset) { + p4d_clear(curr_p4d); + } + + if (pud_replicated) { + for_each_pgtable_replica_safe(curr, tmp, curr_pud, pud, offset) { + cleanup_pud_list(curr); + pud_free_tlb(tlb, curr_pud, addr); + mm_dec_nr_puds(tlb->mm); + } + } + cleanup_pud_list(virt_to_ptdesc(pud)); + pud_free_tlb(tlb, pud, addr); + mm_dec_nr_puds(tlb->mm); +} + +static inline void __free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr) +{ + unsigned long offset; + struct ptdesc *curr, *tmp; + pgd_t *curr_pgd; + p4d_t *curr_p4d; + p4d_t *p4d = p4d_offset(pgd, addr); + bool pgd_replicated = numa_pgtable_replicated(pgd); + bool p4d_replicated = numa_pgtable_replicated(p4d); + + pgd_clear(pgd); + if (pgd_replicated) + for_each_pgtable_replica(curr, curr_pgd, pgd, offset) { + pgd_clear(curr_pgd); + } + + if (p4d_replicated) { + for_each_pgtable_replica_safe(curr, tmp, curr_p4d, p4d, offset) { + cleanup_p4d_list(curr); + p4d_free_tlb(tlb, curr_p4d, addr); + } + } + cleanup_p4d_list(virt_to_ptdesc(p4d)); + p4d_free_tlb(tlb, p4d, addr); +} + +#else + +static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr) +{ + unsigned long offset; + struct ptdesc *curr; pmd_t *curr_pmd; pgtable_t token = pmd_pgtable(*pmd); + pmd_clear(pmd); if (get_propagation_level() == PMD_PROPAGATION) { - for_each_pgtable_replica(pmd, tlb->mm, curr_pmd, nid, offset) { + for_each_pgtable_replica(curr, curr_pmd, pmd, offset) { pmd_clear(curr_pmd); } - } else { - pmd_clear(pmd); } pte_free_tlb(tlb, token, addr); mm_dec_nr_ptes(tlb->mm); - (void)token; } -static inline void __free_pmd_range(struct mmu_gather *tlb, pud_t *pud, - 
unsigned long addr) +static inline void __free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr) { unsigned long offset; - int nid; + struct ptdesc *curr; pud_t *curr_pud; pmd_t *pmd = pmd_offset(pud, addr); + pud_clear(pud); if (get_propagation_level() == PUD_PROPAGATION) { - for_each_pgtable_replica(pud, tlb->mm, curr_pud, nid, offset) { + for_each_pgtable_replica(curr, curr_pud, pud, offset) { pud_clear(curr_pud); } - } else { - pud_clear(pud); } pmd_free_tlb(tlb, pmd, addr); mm_dec_nr_pmds(tlb->mm); - (void)pmd; } -static inline void __free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, - unsigned long addr) +static inline void __free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr) { unsigned long offset; - int nid; + struct ptdesc *curr; p4d_t *curr_p4d; pud_t *pud = pud_offset(p4d, addr); + p4d_clear(p4d); if (get_propagation_level() == P4D_PROPAGATION) { - for_each_pgtable_replica(p4d, tlb->mm, curr_p4d, nid, offset) { + for_each_pgtable_replica(curr, curr_p4d, p4d, offset) { p4d_clear(curr_p4d); } - } else { - p4d_clear(p4d); } pud_free_tlb(tlb, pud, addr); mm_dec_nr_puds(tlb->mm); - (void)pud; } -static inline void __free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, - unsigned long addr) +static inline void __free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr) { unsigned long offset; - int nid; + struct ptdesc *curr; pgd_t *curr_pgd; - p4d_t *p4d = p4d_offset(pgd, addr); + p4d_t *p4d = p4d_offset(pgd, addr); + pgd_clear(pgd); if (get_propagation_level() == PGD_PROPAGATION) { - for_each_pgtable_replica(pgd, tlb->mm, curr_pgd, nid, offset) { + for_each_pgtable_replica(curr, curr_pgd, pgd, offset) { pgd_clear(curr_pgd); } - } else { - pgd_clear(pgd); } + p4d_free_tlb(tlb, p4d, addr); - /* - * Why? If 4-level paging is enabled via kconfig, - * all functions execept p4d_offset are empty, - * and we get unused variable error - */ - (void)p4d; + (void) p4d; } -#else + +#endif /* CONFIG_USER_REPLICATION */ + +#else /* !CONFIG_KERNEL_REPLICATION */ /* * Note: this doesn't free the actual pages themselves. That * has been handled earlier when unmapping all the memory regions. 
*/ -static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, - unsigned long addr) +static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, unsigned long addr) { pgtable_t token = pmd_pgtable(*pmd); pmd_clear(pmd); pte_free_tlb(tlb, token, addr); mm_dec_nr_ptes(tlb->mm); - (void)token; } -static void __free_pmd_range(struct mmu_gather *tlb, pud_t *pud, - unsigned long addr) +static void __free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr) { pmd_t *pmd = pmd_offset(pud, addr); - pud_clear(pud); pmd_free_tlb(tlb, pmd, addr); mm_dec_nr_pmds(tlb->mm); - (void)pmd; } -static inline void __free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, - unsigned long addr) +static inline void __free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, unsigned long addr) { pud_t *pud = pud_offset(p4d, addr); - p4d_clear(p4d); pud_free_tlb(tlb, pud, addr); mm_dec_nr_puds(tlb->mm); - (void)pud; } -static inline void __free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, - unsigned long addr) +static inline void __free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr) { p4d_t *p4d = p4d_offset(pgd, addr); - pgd_clear(pgd); p4d_free_tlb(tlb, p4d, addr); - (void)p4d; } -#endif +#endif /* CONFIG_KERNEL_REPLICATION */ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud, unsigned long addr, unsigned long end, @@ -518,9 +608,12 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas, /* * Optimization: gather nearby vmas into one call down + * We are able to optimize into one call only if all of them are replicated, + * or all of them are not */ while (next && next->vm_start <= vma->vm_end + PMD_SIZE - && !is_vm_hugetlb_page(next)) { + && !is_vm_hugetlb_page(next) + && (numa_is_vma_replicant(vma) == numa_is_vma_replicant(next))) { vma = next; next = mas_find(mas, ceiling - 1); if (unlikely(xa_is_zero(next))) @@ -871,7 +964,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, */ WARN_ON_ONCE(1); - set_pte_at(vma->vm_mm, address, ptep, pte); + set_pte_at_replicated(vma->vm_mm, address, ptep, pte); /* * No need to invalidate - it was non-present before. However @@ -933,11 +1026,12 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, /* Mark the swap entry as shared. 
*/ if (pte_swp_exclusive(orig_pte)) { pte = pte_swp_clear_exclusive(orig_pte); - set_pte_at(src_mm, addr, src_pte, pte); + set_pte_at_replicated(src_mm, addr, src_pte, pte); } rss[MM_SWAPENTS]++; } else if (is_migration_entry(entry)) { folio = pfn_swap_entry_folio(entry); + BUG_ON(folio_test_replicated(folio)); rss[mm_counter(folio)]++; @@ -955,11 +1049,12 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte = pte_swp_mksoft_dirty(pte); if (pte_swp_uffd_wp(orig_pte)) pte = pte_swp_mkuffd_wp(pte); - set_pte_at(src_mm, addr, src_pte, pte); + set_pte_at_replicated(src_mm, addr, src_pte, pte); } } else if (is_device_private_entry(entry)) { page = pfn_swap_entry_to_page(entry); folio = page_folio(page); + BUG_ON(folio_test_replicated(folio)); /* * Update rss count even for unaddressable pages, as @@ -989,7 +1084,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte = swp_entry_to_pte(entry); if (pte_swp_uffd_wp(orig_pte)) pte = pte_swp_mkuffd_wp(pte); - set_pte_at(src_mm, addr, src_pte, pte); + set_pte_at_replicated(src_mm, addr, src_pte, pte); } } else if (is_device_exclusive_entry(entry)) { /* @@ -1006,13 +1101,13 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte_marker marker = copy_pte_marker(entry, dst_vma); if (marker) - set_pte_at(dst_mm, addr, dst_pte, + set_pte_at_replicated(dst_mm, addr, dst_pte, make_pte_marker(marker)); return 0; } if (!userfaultfd_wp(dst_vma)) pte = pte_swp_clear_uffd_wp(pte); - set_pte_at(dst_mm, addr, dst_pte, pte); + set_pte_at_replicated(dst_mm, addr, dst_pte, pte); return 0; } @@ -1060,7 +1155,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte))) /* Uffd-wp needs to be delivered to dest pte as well */ pte = pte_mkuffd_wp(pte); - set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); + set_pte_at_replicated(dst_vma->vm_mm, addr, dst_pte, pte); return 0; } @@ -1072,7 +1167,7 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma, /* If it's a COW mapping, write protect it both processes. */ if (is_cow_mapping(src_vma->vm_flags) && pte_write(pte)) { - wrprotect_ptes(src_mm, addr, src_pte, nr); + wrprotect_ptes_replicated(src_mm, addr, src_pte, nr); pte = pte_wrprotect(pte); } @@ -1084,9 +1179,31 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma, if (!userfaultfd_wp(dst_vma)) pte = pte_clear_uffd_wp(pte); - set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr); + set_ptes_replicated(dst_vma->vm_mm, addr, dst_pte, pte, nr); } +#ifdef CONFIG_USER_REPLICATION +static __always_inline void __copy_present_ptes_replicated(struct vm_area_struct *dst_vma, + struct vm_area_struct *src_vma, pte_t *dst_pte, pte_t *src_pte, + pte_t pte, pte_t *pte_numa, unsigned long addr) +{ + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_pte; + bool start; + int nid; + + BUG_ON(is_cow_mapping(src_vma->vm_flags) && pte_write(pte)); + BUG_ON(src_vma->vm_flags & VM_SHARED); + + for_each_memory_node(nid) + pte_numa[nid] = pte_mkold(pte_numa[nid]); + + for_each_pgtable(curr, curr_pte, dst_pte, nid, offset, start) + set_pte_at(dst_vma->vm_mm, addr, curr_pte, pte_numa[nid]); +} +#endif + /* * Copy one present PTE, trying to batch-process subsequent PTEs that map * consecutive pages of the same folio by copying them as well. 
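The replicated copy path below applies every update to the same slot of each per-node copy of the page table. A compact userspace model of that idea; the layout is simplified to a plain per-node array instead of the circular llist chain built by build_pte_chain():

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_NODES    4       /* pretend NUMA nodes */
#define PTRS_PER_PT 512     /* entries per table, as on arm64 with 4K pages */

typedef uint64_t pte_model_t;

/* One logical page table: node 0 is the "master", the rest are replicas. */
struct pt_replicas {
    pte_model_t table[NR_NODES][PTRS_PER_PT];
};

/* Model of set_pte_at_replicated(): propagate one slot to every replica. */
static void set_entry(struct pt_replicas *pt, unsigned int idx, pte_model_t val)
{
    for (int nid = 0; nid < NR_NODES; nid++)
        pt->table[nid][idx] = val;
}

int main(void)
{
    struct pt_replicas pt;

    memset(&pt, 0, sizeof(pt));
    set_entry(&pt, 7, 0x1234000ULL | 0x3);  /* pfn plus illustrative flags */

    /* Every node now resolves index 7 identically, from local memory. */
    for (int nid = 0; nid < NR_NODES; nid++)
        printf("node %d: %#llx\n", nid,
               (unsigned long long)pt.table[nid][7]);
    return 0;
}
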
@@ -1110,6 +1227,36 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma goto copy_pte; folio = page_folio(page); + BUG_ON(folio_test_replicated(folio) && folio_test_large(folio)); + +#ifdef CONFIG_USER_REPLICATION + if (folio_test_replicated(folio)) { + pte_t pte_numa[MAX_NUMNODES]; + unsigned long offset; + bool start; + struct ptdesc *curr; + pte_t *curr_pte; + int nid; + + if (!memcg_replication_enabled(dst_vma->vm_mm)) { + err = copy_present_page(dst_vma, src_vma, dst_pte, src_pte, + addr, rss, prealloc, page); + return err ? err : 1; + } + + for_each_pgtable(curr, curr_pte, src_pte, nid, offset, start) { + pte_numa[nid] = ptep_get(curr_pte); + struct page *curr_page = vm_normal_page(src_vma, addr, pte_numa[nid]); + get_page(curr_page); + rss[MM_ANONPAGES]++; + add_reliable_page_counter(curr_page, dst_vma->vm_mm, 1); + } + + __copy_present_ptes_replicated(dst_vma, src_vma, dst_pte, src_pte, + pte, pte_numa, addr); + return 1; + } +#endif /* * If we likely have to copy, just don't bother with batching. Make @@ -1224,7 +1371,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, * (whereas vma_needs_copy() skips areas without anon_vma). A rework * can remove such assumptions later, but this is good enough for now. */ - dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl); + dst_pte = cpr_alloc_pte_map_lock(dst_mm, addr, src_pmd, dst_pmd, &dst_ptl); if (!dst_pte) { ret = -ENOMEM; goto out; @@ -1351,7 +1498,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, pmd_t *src_pmd, *dst_pmd; unsigned long next; - dst_pmd = pmd_alloc(dst_mm, dst_pud, addr); + dst_pmd = cpr_alloc_pmd(dst_mm, addr, src_pud, dst_pud); if (!dst_pmd) return -ENOMEM; src_pmd = pmd_offset(src_pud, addr); @@ -1388,7 +1535,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, pud_t *src_pud, *dst_pud; unsigned long next; - dst_pud = pud_alloc(dst_mm, dst_p4d, addr); + dst_pud = cpr_alloc_pud(dst_mm, addr, src_p4d, dst_p4d); if (!dst_pud) return -ENOMEM; src_pud = pud_offset(src_p4d, addr); @@ -1424,7 +1571,7 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, p4d_t *src_p4d, *dst_p4d; unsigned long next; - dst_p4d = p4d_alloc(dst_mm, dst_pgd, addr); + dst_p4d = cpr_alloc_p4d(dst_mm, addr, src_pgd, dst_pgd); if (!dst_p4d) return -ENOMEM; src_p4d = p4d_offset(src_pgd, addr); @@ -1603,29 +1750,52 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma, } static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb, - struct vm_area_struct *vma, struct folio *folio, - struct page *page, pte_t *pte, pte_t ptent, unsigned int nr, + struct vm_area_struct *vma, struct folio *folio, struct page *page, + struct pgtable_private *ptes, pte_t *pte, pte_t ptent, unsigned int nr, unsigned long addr, struct zap_details *details, int *rss, bool *force_flush, bool *force_break) { struct mm_struct *mm = tlb->mm; bool delay_rmap = false; + int nid; if (!folio_test_anon(folio)) { - ptent = get_and_clear_full_ptes(mm, addr, pte, nr, tlb->fullmm); - if (pte_dirty(ptent)) { - folio_mark_dirty(folio); - if (tlb_delay_rmap(tlb)) { - delay_rmap = true; - *force_flush = true; + if (ptes->pte_replicated) { + for_each_memory_node(nid) { + pte = ptes->pte_numa[nid]; + ptent = get_and_clear_full_ptes(mm, addr, pte, nr, tlb->fullmm); + if (pte_dirty(ptent)) { + folio_mark_dirty(folio); + if (tlb_delay_rmap(tlb)) { + delay_rmap = true; + *force_flush = true; + } + } + if 
(pte_young(ptent) && likely(vma_has_recency(vma))) { + folio_mark_accessed(folio); + } + } + } else { + ptent = get_and_clear_full_ptes(mm, addr, pte, nr, tlb->fullmm); + if (pte_dirty(ptent)) { + folio_mark_dirty(folio); + if (tlb_delay_rmap(tlb)) { + delay_rmap = true; + *force_flush = true; + } } + if (pte_young(ptent) && likely(vma_has_recency(vma))) + folio_mark_accessed(folio); } - if (pte_young(ptent) && likely(vma_has_recency(vma))) - folio_mark_accessed(folio); rss[mm_counter(folio)] -= nr; } else { /* We don't need up-to-date accessed/dirty bits. */ - clear_full_ptes(mm, addr, pte, nr, tlb->fullmm); + if (ptes->pte_replicated) { + for_each_memory_node(nid) + clear_full_ptes(mm, addr, ptes->pte_numa[nid], nr, tlb->fullmm); + } else { + clear_full_ptes(mm, addr, pte, nr, tlb->fullmm); + } rss[MM_ANONPAGES] -= nr; } add_reliable_folio_counter(folio, mm, -nr); @@ -1650,13 +1820,47 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb, } } +#ifdef CONFIG_USER_REPLICATION +static __always_inline void zap_present_folio_ptes_replicated(struct mmu_gather *tlb, + struct vm_area_struct *vma, struct pgtable_private *ptes, + pte_t *pte, pte_t ptent, unsigned long addr, + struct zap_details *details, int *rss, + bool *force_flush, bool *force_break) +{ + struct mm_struct *mm = tlb->mm; + int nid; + + for_each_memory_node(nid) { + pte = ptes->pte_numa[nid]; + ptent = ptep_get(pte); + ptes->replica_pages[nid] = vm_normal_page(vma, addr, ptent); + + /* We don't need up-to-date accesses/dirty bits. */ + clear_full_ptes(mm, addr, pte, 1, tlb->fullmm); + rss[MM_ANONPAGES]--; + add_reliable_page_counter(ptes->replica_pages[nid], mm, -1); + } + + /* Checking a single PTE in a batch is sufficient. */ + arch_check_zapped_pte(vma, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + if (unlikely(userfaultfd_pte_wp(vma, ptent))) + zap_install_uffd_wp_if_needed(vma, addr, pte, 1, details, ptent); + + if (unlikely(__tlb_remove_replica_pages(tlb, ptes->replica_pages))) { + *force_flush = true; + *force_break = true; + } +} +#endif + /* * Zap or skip at least one present PTE, trying to batch-process subsequent * PTEs that map consecutive pages of the same folio. * * Returns the number of processed (skipped or zapped) PTEs (at least 1). */ -static inline int zap_present_ptes(struct mmu_gather *tlb, +static inline int zap_present_ptes(struct mmu_gather *tlb, struct pgtable_private *ptes, struct vm_area_struct *vma, pte_t *pte, pte_t ptent, unsigned int max_nr, unsigned long addr, struct zap_details *details, int *rss, bool *force_flush, @@ -1670,8 +1874,14 @@ static inline int zap_present_ptes(struct mmu_gather *tlb, page = vm_normal_page(vma, addr, ptent); if (!page) { + int nid; /* We don't need up-to-date accessed/dirty bits. 
*/ - ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + if (ptes->pte_replicated) { + for_each_memory_node(nid) + ptep_get_and_clear_full(mm, addr, ptes->pte_numa[nid], tlb->fullmm); + } else { + ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + } arch_check_zapped_pte(vma, ptent); tlb_remove_tlb_entry(tlb, pte, addr); if (userfaultfd_pte_wp(vma, ptent)) @@ -1685,6 +1895,16 @@ static inline int zap_present_ptes(struct mmu_gather *tlb, if (unlikely(!should_zap_folio(details, folio))) return 1; +#ifdef CONFIG_USER_REPLICATION + if (folio_test_replicated(folio)) { + BUG_ON(!ptes->pte_replicated); + + zap_present_folio_ptes_replicated(tlb, vma, ptes, pte, ptent, addr, + details, rss, force_flush, force_break); + return 1; + } +#endif + /* * Make sure that the common "small folio" case is as fast as possible * by keeping the batching logic separate. @@ -1693,12 +1913,12 @@ static inline int zap_present_ptes(struct mmu_gather *tlb, nr = folio_pte_batch(folio, addr, pte, ptent, max_nr, fpb_flags, NULL, NULL, NULL); - zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, nr, + zap_present_folio_ptes(tlb, vma, folio, page, ptes, pte, ptent, nr, addr, details, rss, force_flush, force_break); return nr; } - zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, 1, addr, + zap_present_folio_ptes(tlb, vma, folio, page, ptes, pte, ptent, 1, addr, details, rss, force_flush, force_break); return 1; } @@ -1710,6 +1930,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, { bool force_flush = false, force_break = false; struct mm_struct *mm = tlb->mm; + struct pgtable_private ptes; int rss[NR_MM_COUNTERS]; spinlock_t *ptl; pte_t *start_pte; @@ -1722,6 +1943,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl); if (!pte) return addr; + pgtable_update_pte(&ptes, pte); flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); @@ -1730,6 +1952,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, struct folio *folio; struct page *page; int max_nr; + int nid; nr = 1; if (pte_none(ptent)) @@ -1740,7 +1963,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (pte_present(ptent)) { max_nr = (end - addr) / PAGE_SIZE; - nr = zap_present_ptes(tlb, vma, pte, ptent, max_nr, + nr = zap_present_ptes(tlb, &ptes, vma, pte, ptent, max_nr, addr, details, rss, &force_flush, &force_break); if (unlikely(force_break)) { @@ -1755,6 +1978,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, is_device_exclusive_entry(entry)) { page = pfn_swap_entry_to_page(entry); folio = page_folio(page); + BUG_ON(folio_test_replicated(folio)); if (unlikely(!should_zap_folio(details, folio))) continue; /* @@ -1779,6 +2003,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, free_swap_and_cache_nr(entry, nr); } else if (is_migration_entry(entry)) { folio = pfn_swap_entry_folio(entry); + BUG_ON(folio_test_replicated(folio)); if (!should_zap_folio(details, folio)) continue; rss[mm_counter(folio)]--; @@ -1802,9 +2027,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, /* We should have covered all the swap entry types */ WARN_ON_ONCE(1); } - clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm); + + if (ptes.pte_replicated) { + for_each_memory_node(nid) + clear_not_present_full_ptes(mm, addr, ptes.pte_numa[nid], nr, tlb->fullmm); + } else { + clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm); + } zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent); - } while (pte += nr, 
addr += PAGE_SIZE * nr, addr != end); + } while (pgtable_pte_step(&ptes, nr), pte += nr, addr += PAGE_SIZE * nr, addr != end); add_mm_rss_vec(mm, rss); arch_leave_lazy_mmu_mode(); @@ -2140,7 +2371,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte, folio_get(folio); inc_mm_counter(vma->vm_mm, mm_counter_file(folio)); folio_add_file_rmap_pte(folio, page, vma); - set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot)); + set_pte_at_replicated(vma->vm_mm, addr, pte, mk_pte(page, prot)); return 0; } @@ -2432,7 +2663,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, } entry = pte_mkyoung(entry); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, addr, pte, entry, 1)) + if (ptep_set_access_flags_replicated(vma, addr, pte, entry, 1)) update_mmu_cache(vma, addr, pte); } goto out_unlock; @@ -2449,7 +2680,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, entry = maybe_mkwrite(pte_mkdirty(entry), vma); } - set_pte_at(mm, addr, pte, entry); + set_pte_at_replicated(mm, addr, pte, entry); update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */ out_unlock: @@ -2647,7 +2878,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, err = -EACCES; break; } - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); + set_pte_at_replicated(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); @@ -3185,7 +3416,7 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, } entry = pte_mkyoung(vmf->orig_pte); - if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0)) + if (ptep_set_access_flags_replicated(vma, addr, vmf->pte, entry, 0)) update_mmu_cache_range(vmf, vma, addr, vmf->pte, 1); } @@ -3363,7 +3594,7 @@ static inline void wp_page_reuse(struct vm_fault *vmf, struct folio *folio) flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = pte_mkyoung(vmf->orig_pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) + if (ptep_set_access_flags_replicated(vma, vmf->address, vmf->pte, entry, 1)) update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1); pte_unmap_unlock(vmf->pte, vmf->ptl); count_vm_event(PGREUSE); @@ -3527,7 +3758,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * that left a window where the new PTE could be loaded into * some TLBs while the old PTE remains in others. */ - ptep_clear_flush(vma, vmf->address, vmf->pte); + ptep_clear_flush_replicated(vma, vmf->address, vmf->pte); folio_add_new_anon_rmap(new_folio, vma, vmf->address, RMAP_EXCLUSIVE); folio_add_lru_vma(new_folio, vma); /* @@ -3536,7 +3767,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * new page to be mapped directly into the secondary page table. 
*/ BUG_ON(unshare && pte_write(entry)); - set_pte_at_notify(mm, vmf->address, vmf->pte, entry); + set_pte_at_notify_replicated(mm, vmf->address, vmf->pte, entry); update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1); if (old_folio) { /* @@ -3792,8 +4023,10 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); - if (vmf->page) + if (vmf->page) { folio = page_folio(vmf->page); + BUG_ON(folio_test_replicated(folio)); + } /* * Shared mapping: we are guaranteed to have VM_WRITE and @@ -4051,7 +4284,7 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf) * So is_pte_marker() check is not enough to safely drop the pte. */ if (pte_same(vmf->orig_pte, ptep_get(vmf->pte))) - pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte); + pte_clear_replicated(vmf->vma->vm_mm, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); return 0; } @@ -4378,6 +4611,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) BUG_ON(!folio_test_anon(folio) && folio_test_mappedtodisk(folio)); BUG_ON(folio_test_anon(folio) && PageAnonExclusive(page)); + BUG_ON(folio_test_replicated(folio)); + /* * Check under PT lock (to protect against concurrent fork() sharing * the swap entry concurrently) for certainly exclusive pages. @@ -4481,7 +4716,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) VM_BUG_ON(!folio_test_anon(folio) || (pte_write(pte) && !PageAnonExclusive(page))); - set_ptes(vma->vm_mm, address, ptep, pte, nr_pages); + set_ptes_replicated(vma->vm_mm, address, ptep, pte, nr_pages); arch_do_swap_page_nr(vma->vm_mm, vma, address, pte, pte, nr_pages); @@ -4660,7 +4895,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* Use the zero-page for reads */ if (!(vmf->flags & FAULT_FLAG_WRITE) && - !mm_forbids_zeropage(vma->vm_mm)) { + !mm_forbids_zeropage(vma->vm_mm) && !vma_has_replicas(vma)) { entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address), vma->vm_page_prot)); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, @@ -4739,7 +4974,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) setpte: if (vmf_orig_pte_uffd_wp(vmf)) entry = pte_mkuffd_wp(entry); - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages); + set_ptes_replicated(vma->vm_mm, addr, vmf->pte, entry, nr_pages); /* No need to invalidate - it was non-present before */ update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages); @@ -4948,7 +5183,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio, } else { folio_add_file_rmap_ptes(folio, page, nr, vma); } - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr); + set_ptes_replicated(vma->vm_mm, addr, vmf->pte, entry, nr); /* no need to invalidate: a not-present page won't be cached */ update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr); @@ -5004,6 +5239,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return ret; } + BUG_ON(PageReplicated(page)); + if (pmd_none(*vmf->pmd)) { if (PageTransCompound(page)) { ret = do_set_pmd(vmf, page); @@ -5374,12 +5611,12 @@ static void numa_rebuild_single_mapping(struct vm_fault *vmf, struct vm_area_str { pte_t pte, old_pte; - old_pte = ptep_modify_prot_start(vma, fault_addr, fault_pte); + old_pte = ptep_modify_prot_start_replicated(vma, fault_addr, fault_pte); pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (writable) pte = pte_mkwrite(pte, vma); - ptep_modify_prot_commit(vma, fault_addr, fault_pte, old_pte, pte); + ptep_modify_prot_commit_replicated(vma, fault_addr, fault_pte, old_pte, pte); update_mmu_cache_range(vmf, vma, fault_addr, 
fault_pte, 1); } @@ -5422,6 +5659,37 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru } } +#ifdef CONFIG_USER_REPLICATION +static int numa_replicate_page(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct mm_struct *mm = vmf->vma->vm_mm; + struct mmu_gather tlb; + unsigned long start = vmf->address & PAGE_MASK; + unsigned long end = start + PAGE_SIZE; + int ret = 1; + + /* + * This should not be possible, + * because we have just handled page fault up to pmd level, + * so pmd tables must exist and be replicated. + * In fact, even pte level tables must be replicated at this point. + */ + BUG_ON(pmd_none(*vmf->pmd) || !numa_pgtable_replicated(vmf->pmd)); + + tlb_gather_mmu(&tlb, mm); + tlb_start_vma(&tlb, vma); + + if (phys_duplicate_pte_range(&tlb, vma, vmf->pmd, start, end) != end) + ret = 0; + + tlb_end_vma(&tlb, vma); + tlb_finish_mmu(&tlb); + + return ret; +} +#endif + static vm_fault_t do_numa_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; @@ -5462,6 +5730,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) if (!folio || folio_is_zone_device(folio)) goto out_map; + BUG_ON(folio_test_replicated(folio)); + /* * Avoid grouping on RO pages in general. RO pages shouldn't hurt as * much anyway since they can be in shared cache state. This misses @@ -5491,14 +5761,48 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) else last_cpupid = folio_last_cpupid(folio); target_nid = numa_migrate_prep(folio, vma, vmf->address, nid, &flags); - if (target_nid == NUMA_NO_NODE) { + if (target_nid == NUMA_NO_NODE && !vma_has_replicas(vma)) { folio_put(folio); goto out_map; } + + if (vma_has_replicas(vma)) + BUG_ON(!numa_pgtable_replicated(vmf->pte)); + pte_unmap_unlock(vmf->pte, vmf->ptl); writable = false; ignore_writable = true; +#ifdef CONFIG_USER_REPLICATION + if (vma_has_replicas(vma)) { + /* Drop the reference count that was elevated in numa_migrate_prep() */ + folio_put(folio); + + if (numa_replicate_page(vmf)) { + vmf->replica_action = REPLICA_NONE; + if (target_nid != NUMA_NO_NODE) + nid = target_nid; + flags |= TNF_FAULT_LOCAL; + task_numa_fault(last_cpupid, nid, 1, flags); + return 0; + } + + /* + * Checking for spurious numa-fault. + * See kernel/sched/fair.c:task_numa_work() for the same if-statement. 
+ */ + if (vma->vm_file && ((vma->vm_flags & (VM_READ|VM_WRITE)) == VM_READ)) { + /* do not task_numa_fault() in the end of function */ + nid = NUMA_NO_NODE; + goto out_unmap; + } + + if (target_nid == NUMA_NO_NODE) { + goto out_unmap; + } + } +#endif + /* Migrate to the requested node */ if (migrate_misplaced_folio(folio, vma, target_nid)) { nid = target_nid; @@ -5506,8 +5810,11 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) task_numa_fault(last_cpupid, nid, 1, flags); return 0; } - flags |= TNF_MIGRATE_FAIL; + +#ifdef CONFIG_USER_REPLICATION +out_unmap: +#endif vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); if (unlikely(!vmf->pte)) @@ -5680,7 +5987,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) entry = pte_mkdirty(entry); } entry = pte_mkyoung(entry); - if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry, + if (ptep_set_access_flags_replicated(vmf->vma, vmf->address, vmf->pte, entry, vmf->flags & FAULT_FLAG_WRITE)) { update_mmu_cache_range(vmf, vmf->vma, vmf->address, vmf->pte, 1); @@ -5726,12 +6033,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, p4d_t *p4d; vm_fault_t ret; - pgd = pgd_offset(mm, address); - p4d = p4d_alloc(mm, pgd, address); + pgd = fault_pgd_offset(&vmf, address); + p4d = fault_p4d_alloc(&vmf, mm, pgd, address); if (!p4d) return VM_FAULT_OOM; - vmf.pud = pud_alloc(mm, p4d, address); + vmf.pud = fault_pud_alloc(&vmf, mm, p4d, address); if (!vmf.pud) return VM_FAULT_OOM; retry_pud: @@ -5763,7 +6070,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, } } - vmf.pmd = pmd_alloc(mm, vmf.pud, address); + vmf.pmd = fault_pmd_alloc(&vmf, mm, vmf.pud, address); if (!vmf.pmd) return VM_FAULT_OOM; @@ -5804,6 +6111,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, } } + if (fault_pte_alloc(&vmf)) + return VM_FAULT_OOM; return handle_pte_fault(&vmf); } @@ -6847,6 +7156,7 @@ void __init ptlock_cache_init(void) SLAB_PANIC, NULL); } +#ifdef CONFIG_KERNEL_REPLICATION bool ptlock_alloc(struct ptdesc *ptdesc) { spinlock_t *ptl; @@ -6855,13 +7165,33 @@ bool ptlock_alloc(struct ptdesc *ptdesc) if (!ptl) return false; ptdesc->ptl = ptl; + ptdesc->master_table = ptdesc; return true; } void ptlock_free(struct ptdesc *ptdesc) { kmem_cache_free(page_ptl_cachep, ptdesc->ptl); + ptdesc->master_table = NULL; } +#else +bool ptlock_alloc(struct ptdesc *ptdesc) +{ + spinlock_t *ptl; + + ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL); + if (!ptl) + return false; + ptdesc->ptl = ptl; + return true; +} + +void ptlock_free(struct ptdesc *ptdesc) +{ + kmem_cache_free(page_ptl_cachep, ptdesc->ptl); +} +#endif + #endif /** diff --git a/mm/mempolicy.c b/mm/mempolicy.c index f4dfeb5f052f2..0b7f26c54b2df 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -540,6 +540,11 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, continue; if (!queue_folio_required(folio, qp)) continue; + /* + * If vma contains replicated memory, we are not going to move these pages. + */ + if (folio_test_replicated(folio)) + continue; if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { /* * MPOL_MF_STRICT must be specified if we get here. 
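The do_numa_page() and task_numa_work() changes above replicate already-mapped read-only pages from NUMA hint faults, so they only take effect while automatic NUMA balancing is generating those faults. A small helper that checks and enables the standard sysctl (nothing here is specific to this series):

#include <stdio.h>

#define NUMA_BALANCING "/proc/sys/kernel/numa_balancing"

int main(void)
{
    FILE *f = fopen(NUMA_BALANCING, "r+");
    int val = -1;

    if (!f) {
        perror(NUMA_BALANCING);
        return 1;
    }
    if (fscanf(f, "%d", &val) == 1 && val == 0) {
        rewind(f);
        fputs("1\n", f);    /* turn NUMA hint faulting on (requires root) */
    }
    fclose(f);
    printf("numa_balancing was %d\n", val);
    return 0;
}
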
diff --git a/mm/migrate.c b/mm/migrate.c index 05538e2edd1b7..e1d88bdbab3cf 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -51,6 +51,7 @@ #include <linux/sched/sysctl.h> #include <linux/memory-tiers.h> #include <linux/dynamic_pool.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> @@ -264,7 +265,7 @@ static bool remove_migration_pte(struct folio *folio, pvmw.address, rmap_flags); else folio_add_file_rmap_pte(folio, new, vma); - set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); + set_pte_at_replicated(vma->vm_mm, pvmw.address, pvmw.pte, pte); } if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); @@ -2157,7 +2158,8 @@ static int add_page_for_migration(struct mm_struct *mm, const void __user *p, err = -EFAULT; vma = vma_lookup(mm, addr); - if (!vma || !vma_migratable(vma)) + /* if page belongs to fully replicated vma, we don't want to move it here */ + if (!vma || !vma_migratable(vma) || vma_has_replicas(vma)) goto out; /* FOLL_DUMP to ignore special (like zero) pages */ @@ -2358,7 +2360,8 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, int err = -EFAULT; vma = vma_lookup(mm, addr); - if (!vma) + /* Skip fully replicated vmas */ + if (!vma || vma_has_replicas(vma)) goto set_status; /* FOLL_DUMP to ignore special (like zero) pages */ @@ -2632,6 +2635,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma, unsigned int nr_succeeded; LIST_HEAD(migratepages); + BUG_ON(folio_test_replicated(folio)); + /* * Don't migrate file folios that are mapped in multiple processes * with execute permissions as they are probably shared libraries. diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 58636163731af..d090ed9f73b4a 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -14,6 +14,7 @@ #include <linux/pagewalk.h> #include <linux/rmap.h> #include <linux/swapops.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> #include "internal.h" @@ -200,17 +201,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); if (anon_exclusive) { - pte = ptep_clear_flush(vma, addr, ptep); + pte = ptep_clear_flush_replicated(vma, addr, ptep); if (folio_try_share_anon_rmap_pte(folio, page)) { - set_pte_at(mm, addr, ptep, pte); + set_pte_at_replicated(mm, addr, ptep, pte); folio_unlock(folio); folio_put(folio); mpfn = 0; goto next; } } else { - pte = ptep_get_and_clear(mm, addr, ptep); + pte = ptep_get_and_clear_replicated(mm, addr, ptep); } migrate->cpages++; @@ -247,7 +248,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (pte_swp_uffd_wp(pte)) swp_pte = pte_swp_mkuffd_wp(swp_pte); } - set_pte_at(mm, addr, ptep, swp_pte); + set_pte_at_replicated(mm, addr, ptep, swp_pte); /* * This is like regular unmap: we remove the rmap and @@ -534,6 +535,8 @@ int migrate_vma_setup(struct migrate_vma *args) return -EINVAL; if (args->fault_page && !is_device_private_page(args->fault_page)) return -EINVAL; + if (vma_has_replicas(args->vma)) + return -EINVAL; memset(args->src, 0, sizeof(*args->src) * nr_pages); args->cpages = 0; @@ -664,12 +667,12 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, if (flush) { flush_cache_page(vma, addr, pte_pfn(orig_pte)); - ptep_clear_flush(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, entry); + ptep_clear_flush_replicated(vma, addr, ptep); + set_pte_at_notify_replicated(mm, addr, ptep, entry); update_mmu_cache(vma, addr, ptep); } else { /* No need to invalidate - it was non-present before */ - 
set_pte_at(mm, addr, ptep, entry); + set_pte_at_replicated(mm, addr, ptep, entry); update_mmu_cache(vma, addr, ptep); } diff --git a/mm/mlock.c b/mm/mlock.c index cd0997d89c7c5..efbfbb38c718f 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -25,6 +25,7 @@ #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> +#include <linux/numa_user_replication.h> #include "internal.h" @@ -490,6 +491,13 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */ goto out; +#ifdef CONFIG_USER_REPLICATION + if (!(newflags & VM_REPLICA_COMMIT) && vma_has_replicas(vma)) { + if ((ret = phys_deduplicate(vma, start, end - start, true))) + goto out; + } +#endif + pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *prev = vma_merge(vmi, mm, *prev, start, end, newflags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), @@ -570,6 +578,18 @@ static int apply_vma_lock_flags(unsigned long start, size_t len, newflags = vma->vm_flags & ~VM_LOCKED_MASK; newflags |= flags; + +#ifdef CONFIG_USER_REPLICATION + if (vma->vm_mm && memcg_replication_enabled(vma->vm_mm)) { + WARN_ON_ONCE(!numa_is_vma_replicant(vma)); + newflags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(newflags)) + newflags |= VM_REPLICA_COMMIT; + else + newflags &= ~VM_REPLICA_COMMIT; + } +#endif + /* Here we know that vma->vm_start <= nstart < vma->vm_end. */ tmp = vma->vm_end; if (tmp > end) @@ -767,6 +787,17 @@ static int apply_mlockall_flags(int flags) newflags = vma->vm_flags & ~VM_LOCKED_MASK; newflags |= to_add; +#ifdef CONFIG_USER_REPLICATION + if (vma->vm_mm && memcg_replication_enabled(vma->vm_mm)) { + WARN_ON_ONCE(!numa_is_vma_replicant(vma)); + newflags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(newflags)) + newflags |= VM_REPLICA_COMMIT; + else + newflags &= ~VM_REPLICA_COMMIT; + } +#endif + /* Ignore errors */ mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end, newflags); diff --git a/mm/mmap.c b/mm/mmap.c index dfa3d2bfe2891..19d03b6a16591 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,7 @@ #include <linux/sched/mm.h> #include <linux/ksm.h> #include <linux/share_pool.h> +#include <linux/numa_user_replication.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> @@ -1432,6 +1433,14 @@ unsigned long __do_mmap_mm(struct mm_struct *mm, struct file *file, unsigned lon vm_flags |= VM_NORESERVE; } +#ifdef CONFIG_USER_REPLICATION + if (memcg_replication_enabled(mm) || (flags & MAP_REPLICA)) { + vm_flags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(vm_flags)) + vm_flags |= VM_REPLICA_COMMIT; + } +#endif + addr = __mmap_region(mm, file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || @@ -1840,7 +1849,14 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, return addr; } +#ifdef CONFIG_USER_REPLICATION + if (flags & MAP_REPLICA) + info.flags = 0; + else + info.flags = VM_UNMAPPED_AREA_TOPDOWN; +#else info.flags = VM_UNMAPPED_AREA_TOPDOWN; +#endif info.length = len; info.low_limit = PAGE_SIZE; info.high_limit = arch_get_mmap_base(addr, mm->mmap_base); @@ -3198,6 +3214,14 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, * Note: This happens *after* clearing old mappings in some code paths. 
*/ flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags; +#ifdef CONFIG_USER_REPLICATION + if (memcg_replication_enabled(mm)) { + flags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(flags)) + flags |= VM_REPLICA_COMMIT; + } +#endif + if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT)) return -ENOMEM; @@ -3666,6 +3690,14 @@ static struct vm_area_struct *__install_special_mapping( vma->vm_ops = ops; vma->vm_private_data = priv; +#ifdef CONFIG_USER_REPLICATION + if (memcg_replication_enabled(mm)) { + __vm_flags_mod(vma, VM_REPLICA_INIT, VM_NONE); + if (vma_might_be_replicated(vma)) + __vm_flags_mod(vma, VM_REPLICA_COMMIT, VM_NONE); + } +#endif + ret = insert_vm_struct(mm, vma); if (ret) goto out; diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 99b3e9408aa0f..24cbbd8bb0a93 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -9,6 +9,7 @@ #include <linux/smp.h> #include <linux/swap.h> #include <linux/rmap.h> +#include <linux/numa_user_replication.h> #include <asm/pgalloc.h> #include <asm/tlb.h> @@ -167,6 +168,19 @@ static bool __tlb_remove_folio_pages_size(struct mmu_gather *tlb, { int flags = delay_rmap ? ENCODED_PAGE_BIT_DELAY_RMAP : 0; struct mmu_gather_batch *batch; +#ifndef CONFIG_USER_REPLICATION + /* + * Make sure that we can always add another "page" + "nr_pages", + * requiring two entries instead of only a single one. + */ + const int batch_reserve = 1; +#else + /* + * And also make sure that we can always add fully replicated pages, + * requiring 'mem_nodes' entries. + */ + const int batch_reserve = max(1, num_node_state(N_MEMORY) - 1); +#endif VM_BUG_ON(!tlb->end); @@ -188,19 +202,56 @@ static bool __tlb_remove_folio_pages_size(struct mmu_gather *tlb, batch->encoded_pages[batch->nr++] = encode_page(page, flags); batch->encoded_pages[batch->nr++] = encode_nr_pages(nr_pages); } + + if (batch->nr >= batch->max - batch_reserve) { + if (!tlb_next_batch(tlb)) + return true; + batch = tlb->active; + } + VM_BUG_ON_PAGE(batch->nr > batch->max - batch_reserve, page); + + return false; +} + +#ifdef CONFIG_USER_REPLICATION +bool __tlb_remove_replica_pages_size(struct mmu_gather *tlb, struct page **pages, + int page_size) +{ + struct mmu_gather_batch *batch; /* * Make sure that we can always add another "page" + "nr_pages", * requiring two entries instead of only a single one. + * + * And also make sure that we can always add fully replicated pages, + * requiring 'mem_nodes' entries. */ - if (batch->nr >= batch->max - 1) { + const int batch_reserve = max(1, num_node_state(N_MEMORY) - 1); + int nid; + + VM_BUG_ON(!tlb->end); + +#ifdef CONFIG_MMU_GATHER_PAGE_SIZE + VM_WARN_ON(tlb->page_size != page_size); +#endif + + batch = tlb->active; + /* + * Add the page and check if we are full. If so + * force a flush. 
+ */ + for_each_memory_node(nid) + batch->encoded_pages[batch->nr++] = encode_page(pages[nid], 0); + + if (batch->nr >= batch->max - batch_reserve) { if (!tlb_next_batch(tlb)) return true; batch = tlb->active; } - VM_BUG_ON_PAGE(batch->nr > batch->max - 1, page); + VM_BUG_ON_PAGE(batch->nr > batch->max - batch_reserve, pages[0]); return false; } +#endif bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page, unsigned int nr_pages, bool delay_rmap) diff --git a/mm/mprotect.c b/mm/mprotect.c index ed0e21a053398..947be092af22b 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include <linux/sched/sysctl.h> #include <linux/userfaultfd_k.h> #include <linux/memory-tiers.h> +#include <linux/numa_user_replication.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> @@ -80,209 +81,234 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr, return pte_dirty(pte); } -static long change_pte_range(struct mmu_gather *tlb, - struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, - unsigned long end, pgprot_t newprot, unsigned long cp_flags) +static long change_pte_entry(struct mmu_gather *tlb, + struct vm_area_struct *vma, pte_t *pte, unsigned long addr, + pgprot_t newprot, unsigned long cp_flags) { - pte_t *pte, oldpte; - spinlock_t *ptl; + pte_t oldpte = ptep_get(pte); long pages = 0; int target_node = NUMA_NO_NODE; bool prot_numa = cp_flags & MM_CP_PROT_NUMA; bool uffd_wp = cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; - tlb_change_page_size(tlb, PAGE_SIZE); - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return -EAGAIN; - /* Get target node for single threaded private VMAs */ if (prot_numa && !(vma->vm_flags & VM_SHARED) && atomic_read(&vma->vm_mm->mm_users) == 1) target_node = numa_node_id(); - flush_tlb_batched_pending(vma->vm_mm); - arch_enter_lazy_mmu_mode(); - do { - oldpte = ptep_get(pte); - if (pte_present(oldpte)) { - pte_t ptent; + if (pte_present(oldpte)) { + pte_t ptent; + + /* + * Avoid trapping faults against the zero or KSM + * pages. See similar comment in change_huge_pmd. + */ + if (prot_numa) { + struct folio *folio; + int nid; + bool toptier; + + /* Avoid TLB flush if possible */ + if (pte_protnone(oldpte)) + return pages; + + folio = vm_normal_folio(vma, addr, oldpte); + if (!folio || folio_is_zone_device(folio) || + folio_test_ksm(folio)) + return pages; + + /* Skip fully replicated memory */ + if (folio_test_replicated(folio)) + return pages; + + /* Also skip shared copy-on-write pages */ + if (is_cow_mapping(vma->vm_flags) && + (folio_maybe_dma_pinned(folio) || + folio_likely_mapped_shared(folio))) + return pages; /* - * Avoid trapping faults against the zero or KSM - * pages. See similar comment in change_huge_pmd. + * While migration can move some dirty pages, + * it cannot move them all from MIGRATE_ASYNC + * context. 
*/ - if (prot_numa) { - struct folio *folio; - int nid; - bool toptier; - - /* Avoid TLB flush if possible */ - if (pte_protnone(oldpte)) - continue; - - folio = vm_normal_folio(vma, addr, oldpte); - if (!folio || folio_is_zone_device(folio) || - folio_test_ksm(folio)) - continue; - - /* Also skip shared copy-on-write pages */ - if (is_cow_mapping(vma->vm_flags) && - (folio_maybe_dma_pinned(folio) || - folio_likely_mapped_shared(folio))) - continue; + if (folio_is_file_lru(folio) && + folio_test_dirty(folio)) + return pages; - /* - * While migration can move some dirty pages, - * it cannot move them all from MIGRATE_ASYNC - * context. - */ - if (folio_is_file_lru(folio) && - folio_test_dirty(folio)) - continue; + /* + * Don't mess with PTEs if page is already on the node + * a single-threaded process is running on. + */ + nid = folio_nid(folio); + if (target_node == nid) + return pages; + toptier = node_is_toptier(nid); - /* - * Don't mess with PTEs if page is already on the node - * a single-threaded process is running on. - */ - nid = folio_nid(folio); - if (target_node == nid) - continue; - toptier = node_is_toptier(nid); + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + toptier) + return pages; + if (folio_use_access_time(folio)) + folio_xchg_access_time(folio, + jiffies_to_msecs(jiffies)); + } - /* - * Skip scanning top tier node if normal numa - * balancing is disabled - */ - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && - toptier) - continue; - if (folio_use_access_time(folio)) - folio_xchg_access_time(folio, - jiffies_to_msecs(jiffies)); - } + oldpte = ptep_modify_prot_start(vma, addr, pte); + ptent = pte_modify(oldpte, newprot); - oldpte = ptep_modify_prot_start(vma, addr, pte); - ptent = pte_modify(oldpte, newprot); + if (uffd_wp) + ptent = pte_mkuffd_wp(ptent); + else if (uffd_wp_resolve) + ptent = pte_clear_uffd_wp(ptent); - if (uffd_wp) - ptent = pte_mkuffd_wp(ptent); - else if (uffd_wp_resolve) - ptent = pte_clear_uffd_wp(ptent); + /* + * In some writable, shared mappings, we might want + * to catch actual write access -- see + * vma_wants_writenotify(). + * + * In all writable, private mappings, we have to + * properly handle COW. + * + * In both cases, we can sometimes still change PTEs + * writable and avoid the write-fault handler, for + * example, if a PTE is already dirty and no other + * COW or special handling is required. + */ + if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && + !pte_write(ptent) && + can_change_pte_writable(vma, addr, ptent)) + ptent = pte_mkwrite(ptent, vma); + + ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); + if (pte_needs_flush(oldpte, ptent)) + tlb_flush_pte_range(tlb, addr, PAGE_SIZE); + pages++; + } else if (is_swap_pte(oldpte)) { + swp_entry_t entry = pte_to_swp_entry(oldpte); + pte_t newpte; + + if (is_writable_migration_entry(entry)) { + struct folio *folio = pfn_swap_entry_folio(entry); /* - * In some writable, shared mappings, we might want - * to catch actual write access -- see - * vma_wants_writenotify(). - * - * In all writable, private mappings, we have to - * properly handle COW. - * - * In both cases, we can sometimes still change PTEs - * writable and avoid the write-fault handler, for - * example, if a PTE is already dirty and no other - * COW or special handling is required. 
+ * A protection check is difficult so + * just be safe and disable write */ - if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && - !pte_write(ptent) && - can_change_pte_writable(vma, addr, ptent)) - ptent = pte_mkwrite(ptent, vma); - - ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); - if (pte_needs_flush(oldpte, ptent)) - tlb_flush_pte_range(tlb, addr, PAGE_SIZE); - pages++; - } else if (is_swap_pte(oldpte)) { - swp_entry_t entry = pte_to_swp_entry(oldpte); - pte_t newpte; - - if (is_writable_migration_entry(entry)) { - struct folio *folio = pfn_swap_entry_folio(entry); - - /* - * A protection check is difficult so - * just be safe and disable write - */ - if (folio_test_anon(folio)) - entry = make_readable_exclusive_migration_entry( - swp_offset(entry)); - else - entry = make_readable_migration_entry(swp_offset(entry)); - newpte = swp_entry_to_pte(entry); - if (pte_swp_soft_dirty(oldpte)) - newpte = pte_swp_mksoft_dirty(newpte); - } else if (is_writable_device_private_entry(entry)) { - /* - * We do not preserve soft-dirtiness. See - * copy_nonpresent_pte() for explanation. - */ - entry = make_readable_device_private_entry( - swp_offset(entry)); - newpte = swp_entry_to_pte(entry); - if (pte_swp_uffd_wp(oldpte)) - newpte = pte_swp_mkuffd_wp(newpte); - } else if (is_writable_device_exclusive_entry(entry)) { - entry = make_readable_device_exclusive_entry( - swp_offset(entry)); - newpte = swp_entry_to_pte(entry); - if (pte_swp_soft_dirty(oldpte)) - newpte = pte_swp_mksoft_dirty(newpte); - if (pte_swp_uffd_wp(oldpte)) - newpte = pte_swp_mkuffd_wp(newpte); - } else if (is_pte_marker_entry(entry)) { - /* - * Ignore error swap entries unconditionally, - * because any access should sigbus anyway. - */ - if (is_poisoned_swp_entry(entry)) - continue; - /* - * If this is uffd-wp pte marker and we'd like - * to unprotect it, drop it; the next page - * fault will trigger without uffd trapping. - */ - if (uffd_wp_resolve) { - pte_clear(vma->vm_mm, addr, pte); - pages++; - } - continue; - } else { - newpte = oldpte; - } - - if (uffd_wp) + if (folio_test_anon(folio)) + entry = make_readable_exclusive_migration_entry( + swp_offset(entry)); + else + entry = make_readable_migration_entry(swp_offset(entry)); + newpte = swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(oldpte)) + newpte = pte_swp_mksoft_dirty(newpte); + } else if (is_writable_device_private_entry(entry)) { + /* + * We do not preserve soft-dirtiness. See + * copy_nonpresent_pte() for explanation. + */ + entry = make_readable_device_private_entry( + swp_offset(entry)); + newpte = swp_entry_to_pte(entry); + if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); - else if (uffd_wp_resolve) - newpte = pte_swp_clear_uffd_wp(newpte); - - if (!pte_same(oldpte, newpte)) { - set_pte_at(vma->vm_mm, addr, pte, newpte); + } else if (is_writable_device_exclusive_entry(entry)) { + entry = make_readable_device_exclusive_entry( + swp_offset(entry)); + newpte = swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(oldpte)) + newpte = pte_swp_mksoft_dirty(newpte); + if (pte_swp_uffd_wp(oldpte)) + newpte = pte_swp_mkuffd_wp(newpte); + } else if (is_pte_marker_entry(entry)) { + /* + * Ignore error swap entries unconditionally, + * because any access should sigbus anyway. + */ + if (is_poisoned_swp_entry(entry)) + return pages; + /* + * If this is uffd-wp pte marker and we'd like + * to unprotect it, drop it; the next page + * fault will trigger without uffd trapping. 
+ */ + if (uffd_wp_resolve) { + pte_clear(vma->vm_mm, addr, pte); pages++; } + return pages; } else { - /* It must be an none page, or what else?.. */ - WARN_ON_ONCE(!pte_none(oldpte)); + newpte = oldpte; + } + + if (uffd_wp) + newpte = pte_swp_mkuffd_wp(newpte); + else if (uffd_wp_resolve) + newpte = pte_swp_clear_uffd_wp(newpte); + + if (!pte_same(oldpte, newpte)) { + set_pte_at(vma->vm_mm, addr, pte, newpte); + pages++; + } + } else { + /* It must be an none page, or what else?.. */ + WARN_ON_ONCE(!pte_none(oldpte)); + /* + * Nobody plays with any none ptes besides + * userfaultfd when applying the protections. + */ + if (likely(!uffd_wp)) + return pages; + + if (userfaultfd_wp_use_markers(vma)) { /* - * Nobody plays with any none ptes besides - * userfaultfd when applying the protections. + * For file-backed mem, we need to be able to + * wr-protect a none pte, because even if the + * pte is none, the page/swap cache could + * exist. Doing that by install a marker. */ - if (likely(!uffd_wp)) - continue; + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); + pages++; + } + } - if (userfaultfd_wp_use_markers(vma)) { - /* - * For file-backed mem, we need to be able to - * wr-protect a none pte, because even if the - * pte is none, the page/swap cache could - * exist. Doing that by install a marker. - */ - set_pte_at(vma->vm_mm, addr, pte, - make_pte_marker(PTE_MARKER_UFFD_WP)); - pages++; + return pages; +} + +static long change_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, + unsigned long end, pgprot_t newprot, unsigned long cp_flags) +{ + pte_t *pte; + spinlock_t *ptl; + long pages = 0; + + tlb_change_page_size(tlb, PAGE_SIZE); + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return -EAGAIN; + + flush_tlb_batched_pending(vma->vm_mm); + arch_enter_lazy_mmu_mode(); + do { + pages += change_pte_entry(tlb, vma, pte, addr, newprot, cp_flags); +#ifdef CONFIG_USER_REPLICATION + if (numa_pgtable_replicated(pte)) { + unsigned long offset; + struct ptdesc *curr; + pte_t *curr_pte; + for_each_pgtable_replica(curr, curr_pte, pte, offset) { + change_pte_entry(tlb, vma, curr_pte, addr, newprot, cp_flags); } } +#endif } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); @@ -492,7 +518,7 @@ static long change_protection_range(struct mmu_gather *tlb, long pages = 0, ret; BUG_ON(addr >= end); - pgd = pgd_offset(mm, addr); + pgd = pgd_offset_pgd(this_node_pgd(mm), addr); tlb_start_vma(tlb, vma); do { next = pgd_addr_end(addr, end); @@ -627,6 +653,13 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, } } +#ifdef CONFIG_USER_REPLICATION + if (!(newflags & VM_REPLICA_COMMIT) && vma_has_replicas(vma)) { + if ((error = phys_deduplicate(vma, start, end - start, true))) + goto fail; + } +#endif + /* * First try to merge with previous and/or next vma. 
*/ @@ -698,6 +731,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP); const bool rier = (current->personality & READ_IMPLIES_EXEC) && (prot & PROT_READ); + bool arch_invalid_prot; struct mmu_gather tlb; struct vma_iterator vmi; @@ -715,7 +749,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len, end = start + len; if (end <= start) return -ENOMEM; - if (!arch_validate_prot(prot, start)) + arch_invalid_prot = !arch_validate_prot(prot, start); +#ifdef CONFIG_USER_REPLICATION + arch_invalid_prot = arch_invalid_prot && (prot != PROT_REPLICA); +#endif + if (arch_invalid_prot) return -EINVAL; reqprot = prot; @@ -737,6 +775,22 @@ static int do_mprotect_pkey(unsigned long start, size_t len, if (!vma) goto out; +#ifdef CONFIG_USER_REPLICATION + if (prot == PROT_REPLICA) { + error = -EINVAL; + if (!vma_might_be_replicated(vma)) + goto out; + + if (!vma_has_replicas(vma)) + vm_flags_set(vma, VM_REPLICA_COMMIT); + error = phys_duplicate(vma, start, len); + if (error) + pr_info("Failed to replicate memory -- start:%zx; len:%zx PID: %d NAME: %s\n", + start, len, current->pid, current->comm); + goto out; + } +#endif + if (unlikely(grows & PROT_GROWSDOWN)) { if (vma->vm_start >= end) goto out; @@ -787,6 +841,17 @@ static int do_mprotect_pkey(unsigned long start, size_t len, newflags = calc_vm_prot_bits(prot, new_vma_pkey); newflags |= (vma->vm_flags & ~mask_off_old_flags); +#ifdef CONFIG_USER_REPLICATION + if (vma->vm_mm && memcg_replication_enabled(vma->vm_mm)) { + WARN_ON_ONCE(!numa_is_vma_replicant(vma)); + newflags |= VM_REPLICA_INIT; + if (vmflags_might_be_replicated(newflags)) + newflags |= VM_REPLICA_COMMIT; + else + newflags &= ~VM_REPLICA_COMMIT; + } +#endif + /* newflags >> 4 shift VM_MAY% in place of VM_% */ if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) { error = -EACCES; diff --git a/mm/mremap.c b/mm/mremap.c index e990bb8c89181..f6f716106f560 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -27,6 +27,7 @@ #include <linux/mempolicy.h> #include <linux/share_pool.h> #include <linux/userswap.h> +#include <linux/numa_user_replication.h> #include <asm/cacheflush.h> #include <asm/tlb.h> @@ -193,7 +194,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, if (pte_none(ptep_get(old_pte))) continue; - pte = ptep_get_and_clear(mm, old_addr, old_pte); + pte = ptep_get_and_clear_replicated(mm, old_addr, old_pte); /* * If we are remapping a valid PTE, make sure * to flush TLB before we drop the PTL for the @@ -209,7 +210,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, force_flush = true; pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr); pte = move_soft_dirty_pte(pte); - set_pte_at(mm, new_addr, new_pte, pte); + set_pte_at_replicated(mm, new_addr, new_pte, pte); } arch_leave_lazy_mmu_mode(); @@ -242,6 +243,11 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, struct mm_struct *mm = vma->vm_mm; bool res = false; pmd_t pmd; +#ifdef CONFIG_USER_REPLICATION + pmd_t pmd_numa[MAX_NUMNODES]; + bool old_pte_replicated = numa_pgtable_replicated(page_to_virt(pmd_pgtable(*old_pmd))); + bool new_pmd_replicated = numa_pgtable_replicated(new_pmd); +#endif if (!arch_supports_page_table_move()) return false; @@ -271,6 +277,15 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, if (WARN_ON_ONCE(!pmd_none(*new_pmd))) return false; +#ifdef CONFIG_USER_REPLICATION + /* + * In that case, we need to somehow get rid of page tables 
replicas of pte level + * I am not sure how to do it properly right now, so fallback to slowpath + */ + if (old_pte_replicated && !new_pmd_replicated) + return false; +#endif + /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. @@ -280,18 +295,43 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); +#ifdef CONFIG_USER_REPLICATION + if (old_pte_replicated) { + int nid; + unsigned long offset; + struct ptdesc *curr; + pmd_t *curr_pmd; + bool start; + + for_each_pgtable(curr, curr_pmd, old_pmd, nid, offset, start) + pmd_numa[nid] = *(curr_pmd); + } +#endif pmd = *old_pmd; /* Racing with collapse? */ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd))) goto out_unlock; /* Clear the pmd */ - pmd_clear(old_pmd); + pmd_clear_replicated(old_pmd); res = true; VM_BUG_ON(!pmd_none(*new_pmd)); - pmd_populate(mm, new_pmd, pmd_pgtable(pmd)); +#ifdef CONFIG_USER_REPLICATION + if (new_pmd_replicated && old_pte_replicated) { + int nid; + unsigned long offset; + struct ptdesc *curr; + pmd_t *curr_pmd; + bool start; + + for_each_pgtable(curr, curr_pmd, new_pmd, nid, offset, start) + pmd_populate(mm, curr_pmd, pmd_pgtable(pmd_numa[nid])); + } else +#endif + pmd_populate_replicated(mm, new_pmd, pmd_pgtable(pmd)); + flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE); out_unlock: if (new_ptl != old_ptl) @@ -316,6 +356,11 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr, spinlock_t *old_ptl, *new_ptl; struct mm_struct *mm = vma->vm_mm; pud_t pud; +#ifdef CONFIG_USER_REPLICATION + pud_t pud_numa[MAX_NUMNODES]; + bool old_pmd_replicated = numa_pgtable_replicated(pud_pgtable(*old_pud)); + bool new_pud_replicated = numa_pgtable_replicated(new_pud); +#endif if (!arch_supports_page_table_move()) return false; @@ -326,6 +371,15 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr, if (WARN_ON_ONCE(!pud_none(*new_pud))) return false; +#ifdef CONFIG_USER_REPLICATION + /* + * In that case, we need to somehow get rid of page tables replicas of pmd level + * I am not sure how to do it properly right now, so fallback to slowpath + */ + if (old_pmd_replicated && !new_pud_replicated) + return false; +#endif + /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. 
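/*
 * Note on the replication handling in move_normal_pmd()/move_normal_pud()
 * (annotation, not part of the hunks): a table page is moved wholesale only
 * when the destination is replicated as well, or the source is not
 * replicated at all. When both sides are replicated, each node's new entry
 * receives that node's old table; when only the destination is,
 * pmd_populate_replicated()/pud_populate_replicated() install the single
 * table into every per-node copy. A replicated source with a non-replicated
 * destination returns false, and move_page_tables() falls back to the
 * lower-level paths, ultimately move_ptes(), which goes through the
 * *_replicated helpers entry by entry.
 */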
@@ -336,12 +390,37 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr, spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); /* Clear the pud */ +#ifdef CONFIG_USER_REPLICATION + if (old_pmd_replicated) { + int nid; + unsigned long offset; + struct ptdesc *curr; + pud_t *curr_pud; + bool start; + + for_each_pgtable(curr, curr_pud, old_pud, nid, offset, start) + pud_numa[nid] = *(curr_pud); + } +#endif pud = *old_pud; - pud_clear(old_pud); + pud_clear_replicated(old_pud); VM_BUG_ON(!pud_none(*new_pud)); - pud_populate(mm, new_pud, pud_pgtable(pud)); +#ifdef CONFIG_USER_REPLICATION + if (new_pud_replicated && old_pmd_replicated) { + int nid; + unsigned long offset; + struct ptdesc *curr; + pud_t *curr_pud; + bool start; + + for_each_pgtable(curr, curr_pud, new_pud, nid, offset, start) + pud_populate(mm, curr_pud, pud_pgtable(pud_numa[nid])); + } else +#endif + pud_populate_replicated(mm, new_pud, pud_pgtable(pud)); + flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE); if (new_ptl != old_ptl) spin_unlock(new_ptl); @@ -776,6 +855,12 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, return ERR_PTR(-EINVAL); } + /* + * For simplicity, remap is not supported for has_replicas vmas right now + */ + if(vma_has_replicas(vma)) + return ERR_PTR(-EINVAL); + if ((flags & MREMAP_DONTUNMAP) && (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))) return ERR_PTR(-EINVAL); diff --git a/mm/numa_kernel_replication.c b/mm/numa_kernel_replication.c index 17664a1300b87..555be093bd794 100644 --- a/mm/numa_kernel_replication.c +++ b/mm/numa_kernel_replication.c @@ -625,6 +625,8 @@ static void replicate_pgtables(void) init_mm.pgd_numa[nid] = node_desc[memory_nid].pgd; } + + init_mm.pgd = init_mm.pgd_numa[first_memory_node]; } static void __init numa_replicate_kernel_text_disabled(void) @@ -742,7 +744,7 @@ static int __init setup_kernel_replication(char *str) __setup("kernel_replication=", setup_kernel_replication); -nodemask_t __ro_after_init replica_nodes = { { [0] = 1UL } }; +extern nodemask_t replica_nodes; /* * Let us pretend, that we have only single node fore replicas. 
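For reference, the hunks above wire the feature into the user-visible entry points: mmap() takes a MAP_REPLICA hint that marks the vma with VM_REPLICA_INIT, and mprotect() accepts a bare PROT_REPLICA request that sets VM_REPLICA_COMMIT on an eligible vma and calls phys_duplicate() on the range. A minimal usage sketch follows; it assumes a userspace build against the patched uapi headers that define MAP_REPLICA/PROT_REPLICA and a kernel with CONFIG_USER_REPLICATION enabled, and it only illustrates the intended call sequence (it is not part of the patch):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 64 * (size_t)sysconf(_SC_PAGESIZE);

	/* Private anonymous data, populated up front;
	 * MAP_REPLICA marks it as a replication candidate. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_REPLICA,
			 -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0x5a, len);

	/* Replication targets private read-only data, so drop write access. */
	if (mprotect(buf, len, PROT_READ))
		return 1;

	/* Ask for per-NUMA-node replicas of the range. */
	if (mprotect(buf, len, PROT_REPLICA))
		perror("mprotect(PROT_REPLICA)");

	/* Reads are now served from the local node's copy. */
	printf("%x\n", buf[0] & 0xff);

	munmap(buf, len);
	return 0;
}

Alternatively, once replication is enabled mm-wide (memcg_replication_enabled()), the mmap()/mprotect()/mlock()/brk paths above set VM_REPLICA_INIT and VM_REPLICA_COMMIT automatically, so no MAP_REPLICA or PROT_REPLICA calls are needed.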
diff --git a/mm/numa_user_replication.c b/mm/numa_user_replication.c new file mode 100644 index 0000000000000..d7540c1ae3e50 --- /dev/null +++ b/mm/numa_user_replication.c @@ -0,0 +1,1577 @@ +#include <linux/numa_user_replication.h> +#include <asm/tlb.h> + +#include "internal.h" + +static int pick_remaining_node(struct page *page, struct vm_area_struct *vma, + unsigned long addr) { + return mpol_misplaced(page_folio(page), vma, addr); +} + +static int phys_deduplicate_pte_entry(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, + bool alloc_new_page, struct page **new_page) +{ + struct page *orig_page; + pte_t *pte, entry, orig_entry; + spinlock_t *ptl; + struct pgtable_private ptes; + int nid, orig_nid; + + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + entry = ptep_get(pte); + + if (!pte_present(entry)) { + pte_unmap_unlock(pte, ptl); + return 0; + } else { + struct page *page = vm_normal_page(vma, addr, entry); + BUG_ON(page && folio_test_large(page_folio(page))); + + if (!page || !PageReplicated(page)) { + pte_unmap_unlock(pte, ptl); + return 0; + } + } + + pgtable_update_pte(&ptes, pte); + for_each_memory_node(nid) { + ptes.replica_pages[nid] = vm_normal_page(vma, addr, ptep_get(ptes.pte_numa[nid])); + } + + if (alloc_new_page) { + orig_nid = NUMA_NO_NODE; + orig_page = *new_page; + *new_page = NULL; + } else { + orig_nid = pick_remaining_node(ptes.replica_pages[first_memory_node], vma, addr); + if (orig_nid == NUMA_NO_NODE) + orig_nid = first_memory_node; + + orig_page = ptes.replica_pages[orig_nid]; + } + + orig_entry = mk_pte(orig_page, vma->vm_page_prot); + + if (alloc_new_page) { + void* src_vaddr = page_to_virt(ptes.replica_pages[first_memory_node]); + void* new_vaddr = page_to_virt(orig_page); + copy_page(new_vaddr, src_vaddr); + + __SetPageUptodate(orig_page); + + inc_mm_counter(vma->vm_mm, MM_ANONPAGES); + add_reliable_page_counter(orig_page, vma->vm_mm, 1); + } else { + ClearPageReplicated(orig_page); + } + + folio_add_new_anon_rmap(page_folio(orig_page), vma, addr, RMAP_EXCLUSIVE); + folio_add_lru_vma(page_folio(orig_page), vma); + + for_each_memory_node(nid) { + if (nid == orig_nid) + continue; + + set_pte_at(vma->vm_mm, addr, ptes.pte_numa[nid], orig_entry); + tlb_remove_tlb_entry(tlb, ptes.pte_numa[nid], addr); + } + + pte_unmap_unlock(pte, ptl); + + for_each_memory_node(nid) { + if (nid == orig_nid) + continue; + + dec_mm_counter(vma->vm_mm, MM_ANONPAGES); + add_reliable_page_counter(ptes.replica_pages[nid], vma->vm_mm, -1); + + tlb_remove_page(tlb, ptes.replica_pages[nid]); + } + + return 0; +} + +static int prealloc_page_for_deduplication(struct vm_area_struct *vma, + unsigned long addr, bool alloc_new_page, struct page **page) +{ + struct folio *folio; + + if (!alloc_new_page || *page) + return 0; + + folio = vma_alloc_zeroed_movable_folio(vma, addr); + if (!folio) + return -ENOMEM; + if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL)) { + folio_put(folio); + return -ENOMEM; + } + folio_throttle_swaprate(folio, GFP_KERNEL); + + *page = folio_page(folio, 0); + return 0; +} + +static int phys_deduplicate_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, + unsigned long end, bool alloc_new_page) +{ + struct page *prealloc_page = NULL; + int error = 0; + + tlb_change_page_size(tlb, PAGE_SIZE); + + flush_tlb_batched_pending(vma->vm_mm); + arch_enter_lazy_mmu_mode(); + do { + error = prealloc_page_for_deduplication(vma, addr, alloc_new_page, &prealloc_page); + if (error) + break; 
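		/*
		 * The preallocated page is consumed by phys_deduplicate_pte_entry()
		 * only when the PTE actually maps a replicated page; otherwise it is
		 * kept and reused for the next address, and any leftover is released
		 * after the loop.
		 */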
+ error = phys_deduplicate_pte_entry(tlb, vma, pmd, addr, alloc_new_page, &prealloc_page); + if (error) + break; + } while (addr += PAGE_SIZE, addr != end); + arch_leave_lazy_mmu_mode(); + + if (prealloc_page) + put_page(prealloc_page); + + return error; +} + +static int phys_deduplicate_pmd_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pud_t *pud, unsigned long addr, + unsigned long end, bool alloc_new_page) +{ + pmd_t *pmd; + unsigned long next; + int error = 0; + + pmd = pmd_offset(pud, addr); + do { + pmd_t _pmd; + + next = pmd_addr_end(addr, end); + + if (pmd_none(*pmd)) + goto next; + + _pmd = pmdp_get_lockless(pmd); + if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) + BUG(); // not supported right now, probably only trans_huge will be + + error = phys_deduplicate_pte_range(tlb, vma, pmd, addr, next, alloc_new_page); + if (error) + break; +next: + cond_resched(); + } while (pmd++, addr = next, addr != end); + + return error; +} + +static int phys_deduplicate_pud_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr, + unsigned long end, bool alloc_new_page) +{ + pud_t *pud; + unsigned long next; + int error = 0; + + pud = pud_offset(p4d, addr); + do { + next = pud_addr_end(addr, end); + if (pud_none_or_clear_bad(pud)) + continue; + error = phys_deduplicate_pmd_range(tlb, vma, pud, addr, next, alloc_new_page); + if (error) + break; + } while (pud++, addr = next, addr != end); + + return error; +} + +static int phys_deduplicate_p4d_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pgd_t *pgd, unsigned long addr, + unsigned long end, bool alloc_new_page) +{ + p4d_t *p4d; + unsigned long next; + int error = 0; + + p4d = p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + if (p4d_none_or_clear_bad(p4d)) + continue; + error = phys_deduplicate_pud_range(tlb, vma, p4d, addr, next, alloc_new_page); + if (error) + break; + } while (p4d++, addr = next, addr != end); + + return error; +} + +/* + * Pages inside [addr; end) are 100% populated, + * so we can't skip some checks and simplify code. 
+ */ +static int phys_deduplicate_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long addr, + unsigned long end, bool alloc_new_page) +{ + struct mm_struct *mm = vma->vm_mm; + pgd_t *pgd; + unsigned long next; + int error = 0; + + if (addr == end) + return 0; + + BUG_ON(addr >= end); + pgd = pgd_offset_pgd(this_node_pgd(mm), addr); + tlb_start_vma(tlb, vma); + do { + next = pgd_addr_end(addr, end); + if (pgd_none_or_clear_bad(pgd)) + continue; + error = phys_deduplicate_p4d_range(tlb, vma, pgd, addr, next, alloc_new_page); + if (error) + break; + } while (pgd++, addr = next, addr != end); + + tlb_end_vma(tlb, vma); + + return error; +} + +static int numa_remove_replicas(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long start, + unsigned long end, bool alloc_new_page) +{ + int error; + + start = start & PAGE_MASK; + end = end & PAGE_MASK; + + error = phys_deduplicate_range(tlb, vma, start, end, alloc_new_page); +// if (!error && printk_ratelimit()) { +// pr_info("Deduplicated range: 0x%016lx --- 0x%016lx, mm: 0x%016lx, PID: %d name: %s\n", +// start, end, (unsigned long)(vma->vm_mm), vma->vm_mm->owner->pid, vma->vm_mm->owner->comm); +// } + BUG_ON(error && !alloc_new_page); + + return error; +} + +/* + * We must hold at least mmap_read_lock or vma_read_lock + */ +int phys_deduplicate(struct vm_area_struct *vma, unsigned long start, size_t len, + bool alloc_new_page) +{ + int error = 0; + struct mmu_gather tlb; + + if (!vma) { + pr_warn("%s -- %s:%d\n", __func__, __FILE__, __LINE__); + return -EINVAL; + } + + BUG_ON(vma_has_replicas(vma) && !vma_might_be_replicated(vma)); + if (!vma_has_replicas(vma)) { + pr_warn("%s -- %s:%d\n", __func__, __FILE__, __LINE__); + return -EINVAL; + } + + if ((start < vma->vm_start) || (start + len > vma->vm_end)) { + pr_warn("Deduplication is possible only inside vma\n"); + pr_warn("vma->vm_start %zx; len %zx\n", vma->vm_start, + vma->vm_end - vma->vm_start); + return -EINVAL; + } + + tlb_gather_mmu(&tlb, vma->vm_mm); + error = numa_remove_replicas(&tlb, vma, vma->vm_start, vma->vm_end, alloc_new_page); + tlb_finish_mmu(&tlb); + + return error; +} + +static int __fixup_fault(struct vm_area_struct *vma, unsigned long addr) { + return (handle_mm_fault(vma, addr, FAULT_FLAG_INTERRUPTIBLE | FAULT_FLAG_KILLABLE | + FAULT_FLAG_RETRY_NOWAIT | FAULT_FLAG_ALLOW_RETRY, NULL) & VM_FAULT_ERROR); +} + +static int fixup_fault(struct vm_area_struct *vma, unsigned long addr) { + vm_fault_t fault = __fixup_fault(vma, addr); + if (fault & VM_FAULT_SIGBUS) + return 0; + return !!(fault & VM_FAULT_ERROR); +} + +static int phys_duplicate_pte_entry(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, + struct page **replica_pages) +{ + struct page *orig_page; + pte_t *pte, *orig_pte, orig_entry; + spinlock_t *ptl; + struct pgtable_private ptes; + int nid; + int reason = 0; +retry: + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + + if (!numa_pgtable_replicated(pte)) { + vm_fault_t fault; + pte_unmap_unlock(pte, ptl); + mmap_assert_locked(vma->vm_mm); + fault = __fixup_fault(vma, addr); + + if (fault & (VM_FAULT_SIGBUS | VM_FAULT_RETRY)) + return -EBUSY; + if(fault) + return -ENOMEM; + goto retry; + } + + pgtable_update_pte(&ptes, pte); + /* It could happen on a not yet faulted vaddr. Now we require from user to + * put MAP_POPULATE manually, but add it with MAP_REPLICA silently. 
+ */ + orig_pte = ptes.pte_numa[first_memory_node]; + orig_entry = ptep_get(orig_pte); + + for_each_memory_node(nid) { + /* + * For some unknown reasons, there are cases, when + * pte_level populated only on single node. This is not good, + * to avoid this check all ptes now, but this should not happening at all + */ + if(!pte_present(ptep_get(ptes.pte_numa[nid]))) { + pte_unmap_unlock(pte, ptl); + return -EBUSY; + } + } + if (pte_write(orig_entry)) { + reason = 2; + goto bug; + } + + /* We can handle this case only for 0th node table (I hope so), + * because we are under pte_lock, which serializes migration pte modifications + */ + orig_page = vm_normal_page(vma, addr, orig_entry); + BUG_ON(orig_page && folio_test_large(page_folio(orig_page))); + + if (orig_page && PageReplicated(orig_page)) { + pte_unmap_unlock(pte, ptl); + return -EBUSY; + } + + for_each_memory_node(nid) { + struct page *src_page, *new_page; + void *src_vaddr, *new_vaddr; + pte_t *curr_pte, curr_entry, new_entry; + + curr_pte = ptes.pte_numa[nid]; + curr_entry = ptep_get(curr_pte); + src_page = pte_page(curr_entry); + + new_page = replica_pages[nid]; + replica_pages[nid] = NULL; + + new_vaddr = page_to_virt(new_page); + src_vaddr = page_to_virt(src_page); + copy_page(new_vaddr, src_vaddr); + + __SetPageUptodate(new_page); + + new_entry = mk_pte(new_page, vma->vm_page_prot); + + inc_mm_counter(vma->vm_mm, MM_ANONPAGES); + add_reliable_page_counter(new_page, vma->vm_mm, 1); + + set_pte_at(vma->vm_mm, addr, curr_pte, new_entry); + if (pte_needs_flush(curr_entry, new_entry)) + tlb_flush_pte_range(tlb, addr, PAGE_SIZE); + } + + if (orig_page) { + dec_mm_counter(vma->vm_mm, mm_counter(page_folio(orig_page))); + add_reliable_page_counter(orig_page, vma->vm_mm, -1); + folio_remove_rmap_pte(page_folio(orig_page), orig_page, vma); + } + + pte_unmap_unlock(pte, ptl); + + if (orig_page) { + free_pages_and_swap_cache((struct encoded_page **) &orig_page, 1); + } + + return 0; + +bug: + dump_mm_pgtables(vma->vm_mm, addr, addr + PAGE_SIZE * 4 - 1); + pr_info("Died because BUG_ON #%d\n", reason); + BUG(); +} + +static void release_prealloc_pages(struct page **pages) +{ + int nid; + for_each_memory_node(nid) { + if (pages[nid] != NULL) { + put_page(pages[nid]); + pages[nid] = NULL; + } + } +} + +static int prealloc_pages_for_replicas(struct mm_struct *mm, struct page **pages) +{ + int nid; + for_each_memory_node(nid) { + /* + * Do not reclaim in case of memory shortage, just fail + * We already don't have enough memory. 
+ * Also, make replica pages unmovable + */ + pages[nid] = alloc_pages_node(nid, + (GFP_HIGHUSER | __GFP_THISNODE) & (~__GFP_DIRECT_RECLAIM), 0); + if (pages[nid] == NULL) + goto fail; + SetPageReplicated(pages[nid]); + if (mem_cgroup_charge(page_folio(pages[nid]), mm, GFP_KERNEL)) + goto fail; + } + + for_each_memory_node(nid) { + folio_throttle_swaprate(page_folio(pages[nid]), GFP_KERNEL); + } + + return 0; + +fail: + release_prealloc_pages(pages); + return - ENOMEM; +} + +/* + * We must hold at least mmap_read_lock or vma_read_lock + */ +unsigned long phys_duplicate_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, + unsigned long end) +{ + struct page *prealloc_pages[MAX_NUMNODES] = {}; + + tlb_change_page_size(tlb, PAGE_SIZE); + + flush_tlb_batched_pending(vma->vm_mm); + arch_enter_lazy_mmu_mode(); + do { + int ret = 0; + if (prealloc_pages_for_replicas(vma->vm_mm, prealloc_pages)) + break; + ret = phys_duplicate_pte_entry(tlb, vma, pmd, addr, prealloc_pages); + if (ret) + release_prealloc_pages(prealloc_pages); + if (ret == -ENOMEM) + break; + } while (addr += PAGE_SIZE, addr != end); + arch_leave_lazy_mmu_mode(); + + return addr; +} + +static unsigned long phys_duplicate_pmd_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pud_t *pud, unsigned long addr, + unsigned long end) +{ + pmd_t *pmd; + unsigned long next; +retry: + pmd = pmd_offset(pud, addr); + do { + pmd_t _pmd; + + next = pmd_addr_end(addr, end); + + if (pmd_none(*pmd) || !numa_pgtable_replicated(pmd)) { + if (fixup_fault(vma, addr)) + break; + goto retry; + } + + _pmd = pmdp_get_lockless(pmd); + if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) { + BUG(); // not supported right now, probably only trans_huge will be + } + + addr = phys_duplicate_pte_range(tlb, vma, pmd, addr, next); + if (addr != next) + break; + cond_resched(); + } while (pmd++, addr != end); + + return addr; +} + +static unsigned long phys_duplicate_pud_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr, + unsigned long end) +{ + pud_t *pud; + unsigned long next; +retry: + pud = pud_offset(p4d, addr); + do { + next = pud_addr_end(addr, end); + if(pud_none_or_clear_bad(pud) || !numa_pgtable_replicated(pud)) { + if (fixup_fault(vma, addr)) + break; + goto retry; + } + addr = phys_duplicate_pmd_range(tlb, vma, pud, addr, next); + if (addr != next) + break; + } while (pud++, addr != end); + + return addr; +} + +static unsigned long phys_duplicate_p4d_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pgd_t *pgd, unsigned long addr, + unsigned long end) +{ + p4d_t *p4d; + unsigned long next; +retry: + p4d = p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + if (p4d_none_or_clear_bad(p4d) || !numa_pgtable_replicated(p4d)) { + if (fixup_fault(vma, addr)) + break; + goto retry; + } + addr = phys_duplicate_pud_range(tlb, vma, p4d, addr, next); + if (addr != next) + break; + } while (p4d++, addr != end); + + return addr; +} + +static unsigned long phys_duplicate_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long addr, + unsigned long end) +{ + struct mm_struct *mm = vma->vm_mm; + pgd_t *pgd; + unsigned long next; + + mmap_assert_locked(mm); + if (unlikely(anon_vma_prepare(vma))) + return addr; + + BUG_ON(addr >= end); + pgd = pgd_offset_pgd(this_node_pgd(mm), addr); + tlb_start_vma(tlb, vma); +retry: + do { + next = pgd_addr_end(addr, end); + if (pgd_none_or_clear_bad(pgd)) { + if (fixup_fault(vma, addr)) 
+ break; + goto retry; + } + addr = phys_duplicate_p4d_range(tlb, vma, pgd, addr, next); + if (addr != next) + break; + } while (pgd++, addr != end); + + tlb_end_vma(tlb, vma); + + return addr; + +} + +static int numa_clone_pte(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + unsigned long last = 0; + + start = start & PAGE_MASK; + end = end & PAGE_MASK; + + last = phys_duplicate_range(tlb, vma, start, end); + if (last != end) { + phys_deduplicate_range(tlb, vma, start, last, false); + return -ENOMEM; + } + + return 0; +} + +/* + * We must hold at least mmap_read_lock but not vma_read_lock + */ +int phys_duplicate(struct vm_area_struct *vma, unsigned long start, size_t len) +{ + int error = 0; + struct mmu_gather tlb; + + if (!vma) { + pr_warn("%s -- %s:%d\n", __func__, __FILE__, __LINE__); + return -EINVAL; + } + + BUG_ON(vma_has_replicas(vma) && !vma_might_be_replicated(vma)); + if (!vma_has_replicas(vma)) { + pr_warn("%s -- %s:%d\n", __func__, __FILE__, __LINE__); + return -EINVAL; + } + + if ((start < vma->vm_start) || (start + len > vma->vm_end)) { + pr_warn("Replication is possible only inside vma\n"); + pr_warn("vma->vm_start %zx; len %zx\n", vma->vm_start, + vma->vm_end - vma->vm_start); + return -EINVAL; + } + + tlb_gather_mmu(&tlb, vma->vm_mm); + error = numa_clone_pte(&tlb, vma, start, start + len); + tlb_finish_mmu(&tlb); + + return error; +} + +void numa_mm_handle_replication(struct mm_struct *mm, bool enable, fork_policy_t fork_policy) +{ + struct vm_area_struct *vma; + MA_STATE(mas, &mm->mm_mt, 0, 0); + + mmap_write_lock(mm); + + switch (fork_policy) { + case FORK_DISCARD_REPLICA: + if (enable) + mm->cg_user_replication_active = true; + else + BUG_ON(mm->cg_user_replication_active); + break; + case FORK_KEEP_REPLICA: + if (enable) { + mm->cg_user_replication_active = true; + mm->fork_policy = FORK_KEEP_REPLICA; + } else { + mm->fork_policy = FORK_DISCARD_REPLICA; + } + break; + case FORK_NO_REPLICA: + default: + BUG(); + } + + if (!enable) + goto out; + + mas_for_each(&mas, vma, ULONG_MAX) { + vm_flags_set(vma, VM_REPLICA_INIT); + if (vma_might_be_replicated(vma)) { + vm_flags_set(vma, VM_REPLICA_COMMIT); + phys_duplicate(vma, vma->vm_start, vma->vm_end - vma->vm_start); + } + } + +out: + mmap_write_unlock(mm); +} + +static inline bool replicated_p4d_level(struct vm_fault *vmf) +{ + //TODO Do something better + return mm_p4d_folded(vmf->vma->vm_mm) || vmf->p4d_replicated || pgd_none(*(vmf->pgd)); +} + +static inline bool replicated_pud_level(struct vm_fault *vmf) +{ + //TODO Do something better + /* We don't have entries on this level, or they are not the same*/ + return mm_pud_folded(vmf->vma->vm_mm) || vmf->pud_replicated || p4d_none(*(vmf->p4d)); +} + +static inline bool replicated_pmd_level(struct vm_fault *vmf) +{ + //TODO Do something better + /* We don't have entries on this level, or they are not the same*/ + return mm_pmd_folded(vmf->vma->vm_mm) || vmf->pmd_replicated || pud_none(*(vmf->pud)); +} + +static inline bool replicated_pte_level(struct vm_fault *vmf) +{ + //TODO Do something better + /* We don't have entries on this level, or they are not the same*/ + return vmf->pte_replicated || pmd_none(*(vmf->pmd)); +} + +static inline bool overlap_pmd_entry(unsigned long address, unsigned long left, unsigned long right) +{ + return ((address & PMD_MASK) == (left & PMD_MASK)) || + ((address & PMD_MASK) == (right & PMD_MASK)); +} + +static inline bool overlap_pud_entry(unsigned long address, unsigned long left, 
unsigned long right) +{ + return ((address & PUD_MASK) == (left & PUD_MASK)) || + ((address & PUD_MASK) == (right & PUD_MASK)); +} + +static inline bool overlap_p4d_entry(unsigned long address, unsigned long left, unsigned long right) +{ + return ((address & P4D_MASK) == (left & P4D_MASK)) || + ((address & P4D_MASK) == (right & P4D_MASK)); +} + +static inline bool overlap_pgd_entry(unsigned long address, unsigned long left, unsigned long right) +{ + return ((address & PGDIR_MASK) == (left & PGDIR_MASK)) || + ((address & PGDIR_MASK) == (right & PGDIR_MASK)); +} + +static inline void get_replicant_neighbours(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long address, unsigned long *left, unsigned long *right) +{ + *left = ULONG_MAX; + *right = ULONG_MAX; + + if (numa_is_vma_replicant(vma)) + *left = *right = address; +} + +static inline void __replication_path_action(struct vm_fault *vmf, bool replicated) +{ + if (vmf->replica_action != REPLICA_NONE) { + /* + * If we meet propagation action again, that means upper + * level has already been propagated and we don't have + * replicas anylower -- we need to completely switch + * to default handling. + */ + if (vmf->replica_action == REPLICA_PROPAGATE) + vmf->replica_action = REPLICA_NONE; + else + vmf->replica_action = replicated ? REPLICA_KEEP : REPLICA_PROPAGATE; + } +} + +static bool replication_path_pgd(struct vm_fault *vmf) +{ + bool p4d_folded = mm_p4d_folded(vmf->vma->vm_mm), replicated; + struct mm_struct *mm = vmf->vma->vm_mm; + unsigned long address = vmf->real_address; + /* There are replicated tables in our pgd entry or there is vma requiring it. Need to replicate next level. + * 5-level paging and folded p4d give us a lot of grief. + * If 5-level paging disabled, handle_mm_fault_pgd function doing nothing, except filling vmf->p4d_numa + * with same values as in vmf->pgd_numa and propagation will not work correctly. + * So we need to go in __handle_mm_fault_p4d_replicant, because we might still want to propagate it. + */ + get_replicant_neighbours(mm, vmf->vma, address, &(vmf->left_replicant), &(vmf->right_replicant)); + if (!p4d_folded) + vmf->p4d_replicated = !pgd_none(*(vmf->pgd)) && + PageReplicated(virt_to_page(pgd_page_vaddr(*vmf->pgd))); + + replicated = p4d_folded || overlap_pgd_entry(address, vmf->left_replicant, vmf->right_replicant) + || vmf->p4d_replicated; + /* + * Here replica_action may be REPLICA_NONE, so we ignore that, + * because we always replicate top level table. + */ + + vmf->replica_action = replicated ? REPLICA_KEEP : REPLICA_PROPAGATE; + return replicated; +} + +static bool replication_path_p4d(struct vm_fault *vmf) +{ + bool pud_folded = mm_pud_folded(vmf->vma->vm_mm), replicated; + unsigned long address = vmf->real_address; + + if (vmf->replica_action == REPLICA_PROPAGATE) { + /* + * We have already propagated upper level, + * so we'll never use XXX_replicated values again + * during this fault. 
+ */ + vmf->replica_action = REPLICA_NONE; + return false; + } + + if (!pud_folded) + vmf->pud_replicated = !p4d_none(*(vmf->p4d)) && + PageReplicated(virt_to_page(p4d_pgtable(*vmf->p4d))); + + replicated = pud_folded || overlap_p4d_entry(address, vmf->left_replicant, vmf->right_replicant) + || vmf->pud_replicated; + + __replication_path_action(vmf, replicated); + return replicated; +} + +static bool replication_path_pud(struct vm_fault *vmf) +{ + bool pmd_folded = mm_pmd_folded(vmf->vma->vm_mm), replicated; + unsigned long address = vmf->real_address; + + if (vmf->replica_action == REPLICA_PROPAGATE) { + /* + * We have already propagated upper level, + * so we'll never use XXX_replicated values again + * during this fault. + */ + vmf->replica_action = REPLICA_NONE; + return false; + } + + if (!pmd_folded) + vmf->pmd_replicated = !pud_none(*(vmf->pud)) && + PageReplicated(virt_to_page(pud_pgtable(*vmf->pud))); + replicated = pmd_folded || overlap_pud_entry(address, vmf->left_replicant, vmf->right_replicant) + || vmf->pmd_replicated; + __replication_path_action(vmf, replicated); + return replicated; +} + +static bool replication_path_pmd(struct vm_fault *vmf) +{ + bool replicated; + unsigned long address = vmf->real_address; + + if (vmf->replica_action == REPLICA_PROPAGATE) { + /* + * We have already propagated upper level, + * so we'll never use XXX_replicated values again + * during this fault. + */ + vmf->replica_action = REPLICA_NONE; + return false; + } + + vmf->pte_replicated = !pmd_none(*(vmf->pmd)) && + PageReplicated(pmd_pgtable(*vmf->pmd)); + replicated = overlap_pmd_entry(address, vmf->left_replicant, vmf->right_replicant) + || vmf->pte_replicated; + __replication_path_action(vmf, replicated); + return replicated; +} + +static void +release_replicated_p4d_tables(int allocated_node, p4d_t **new, struct mm_struct *mm) +{ + int nid; + for_each_memory_node(nid) { + if (nid == allocated_node || new[nid] == NULL) + continue; + p4d_free(mm, new[nid]); + } +} + +static void +release_replicated_pud_tables(int allocated_node, pud_t **new, struct mm_struct *mm) +{ + int nid; + for_each_memory_node(nid) { + if (nid == allocated_node || new[nid] == NULL) + continue; + pud_free(mm, new[nid]); + } +} + +static void +release_replicated_pmd_tables(int allocated_node, pmd_t **new, struct mm_struct *mm) +{ + int nid; + for_each_memory_node(nid) { + if (nid == allocated_node || new[nid] == NULL) + continue; + pmd_free(mm, new[nid]); + } +} + +static void +release_replicated_pte_tables(int allocated_node, struct ptdesc **new, struct mm_struct *mm) +{ + int nid; + for_each_memory_node(nid) { + if (nid == allocated_node || new[nid] == NULL) + continue; + + if (allocated_node == NUMA_NO_NODE) { + ClearPageReplicated(ptdesc_page(new[nid])); + new[nid]->replica_list_node.next = NULL; + } + pte_free(mm, ptdesc_page(new[nid])); + } +} + +static void +sync_replicated_p4d_tables(int allocated_node, p4d_t **new, pgd_t *start_pgd, struct mm_struct *mm) +{ + int nid; + unsigned long offset; + struct ptdesc *curr; + pgd_t *curr_pgd; + bool start; + + for_each_pgtable(curr, curr_pgd, start_pgd, nid, offset, start) { + SetPageReplicated(virt_to_page(new[nid])); + if (nid == allocated_node) + continue; + + if (allocated_node != NUMA_NO_NODE) + copy_page(new[nid], new[allocated_node]); + + smp_wmb(); + pgd_populate(mm, curr_pgd, new[nid]); + } +} + +static void +sync_replicated_pud_tables(int allocated_node, pud_t **new, p4d_t *start_p4d, struct mm_struct *mm) +{ + int nid; + unsigned long offset; + struct ptdesc 
*curr; + p4d_t *curr_p4d; + bool start; + + /* + * Do not need locking from sync_replicated_pte_tables, + * because pud_lockptr == page_table_lock + */ + + build_pud_chain(new); + set_master_page_for_puds(allocated_node, new); + + for_each_pgtable(curr, curr_p4d, start_p4d, nid, offset, start) { + SetPageReplicated(virt_to_page(new[nid])); + if (nid == allocated_node) + continue; + + if (allocated_node != NUMA_NO_NODE) + copy_page(new[nid], new[allocated_node]); + + mm_inc_nr_puds(mm); + smp_wmb(); + p4d_populate(mm, curr_p4d, new[nid]); + } +} + +static void +sync_replicated_pmd_tables(int allocated_node, pmd_t **new, pud_t *start_pud, struct mm_struct *mm) +{ + int nid; + unsigned long offset; + struct ptdesc *curr; + pud_t *curr_pud; + bool start; + + /* + * Locking here the same as in the sync_replicated_pte_tables + */ + spinlock_t *ptl = NULL; + if (allocated_node != NUMA_NO_NODE) { + ptl = pmd_lockptr(mm, new[allocated_node]); + spin_lock_nested(ptl, 1); + } + + BUILD_BUG_ON(!USE_SPLIT_PMD_PTLOCKS); + + build_pmd_chain(new); + set_master_page_for_pmds(allocated_node, new); + + for_each_pgtable(curr, curr_pud, start_pud, nid, offset, start) { + SetPageReplicated(virt_to_page(new[nid])); + if (nid == allocated_node) + continue; + if (allocated_node != NUMA_NO_NODE) + copy_page(new[nid], new[allocated_node]); + + mm_inc_nr_pmds(mm); + smp_wmb(); + pud_populate(mm, curr_pud, new[nid]); + } + + if (ptl) + spin_unlock(ptl); +} + +#ifdef CONFIG_ARM64 +static void +sync_replicated_pte_tables(int allocated_node, struct ptdesc **new, pmd_t *start_pmd, struct mm_struct *mm) +{ + int nid; + unsigned long offset; + struct ptdesc *curr; + pmd_t *curr_pmd; + bool start; + spinlock_t *ptl = NULL; + + /* Why we need (sometimes) ptl from allocated_node here? + * If replicate existed table, concurrent page fault might + * observe replicated table which content was not copied + * from original table yet. At this point master_locks are + * already set (which is lock from original table), so we + * need to hold it here. + * + * Obviously, if there was no any table before, + * we do not need to hold any pte lock at all, everything will be propagated + * correctly via replica_list + */ + BUILD_BUG_ON(!USE_SPLIT_PTE_PTLOCKS); + + if (allocated_node != NUMA_NO_NODE) { + ptl = ptlock_ptr(new[allocated_node]); + spin_lock_nested(ptl, 1); + + build_pte_chain(new); + set_master_page_for_ptes(allocated_node, new); + + for_each_memory_node(nid) { + SetPageReplicated(ptdesc_page(new[nid])); + if (nid == allocated_node) + continue; + copy_page(ptdesc_to_virt(new[nid]), ptdesc_to_virt(new[allocated_node])); + } + } + + smp_wmb(); + + for_each_pgtable(curr, curr_pmd, start_pmd, nid, offset, start) { + /* + * We are safe to set this flag here even for original table, + * because replica list have already been created. 
+ * So, in the case if some propagation will be required, + * we are able to do it, even if not all upper tables are populated yet + */ + if (nid == allocated_node) + continue; + + mm_inc_nr_ptes(mm); + + WRITE_ONCE(*curr_pmd, __pmd(__phys_to_pmd_val(page_to_phys(ptdesc_page(new[nid]))) | + PMD_TYPE_TABLE | PMD_TABLE_AF | PMD_TABLE_PXN)); + } + + dsb(ishst); + isb(); + + if (ptl) + spin_unlock(ptl); +} +#endif + +static int +prepare_replicated_p4d_tables(int allocated_node, p4d_t **new, struct mm_struct *mm, unsigned long address) +{ + int nid; + p4d_t *new_p4d; + bool fail = false; + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + new_p4d = p4d_alloc_one_node(nid, mm, address); + + if (unlikely(!new_p4d)) + fail = true; + + new[nid] = new_p4d; + } + + if (unlikely(fail)) { + release_replicated_p4d_tables(allocated_node, new, mm); + return -ENOMEM; + } + + return 0; +} + +static int +prepare_replicated_pud_tables(int allocated_node, pud_t **new, struct mm_struct *mm, unsigned long address) +{ + int nid; + pud_t *new_pud; + bool fail = false; + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + new_pud = pud_alloc_one_node(nid, mm, address); + + if (unlikely(!new_pud)) + fail = true; + + new[nid] = new_pud; + } + + if (unlikely(fail)) { + release_replicated_pud_tables(allocated_node, new, mm); + return -ENOMEM; + } + + return 0; +} + +static int +prepare_replicated_pmd_tables(int allocated_node, pmd_t **new, struct mm_struct *mm, unsigned long address) +{ + int nid; + pmd_t *new_pmd; + bool fail = false; + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + new_pmd = pmd_alloc_one_node(nid, mm, address); + + if (unlikely(!new_pmd)) + fail = true; + + new[nid] = new_pmd; + } + + if (unlikely(fail)) { + release_replicated_pmd_tables(allocated_node, new, mm); + return -ENOMEM; + } + + return 0; +} + +static int +prepare_replicated_pte_tables(int allocated_node, struct ptdesc **new, struct mm_struct *mm) +{ + int nid; + struct page *new_pte; + bool fail = false; + + for_each_memory_node(nid) { + if (nid == allocated_node) + continue; + new_pte = pte_alloc_one_node(nid, mm); + + if (unlikely(!new_pte)) + fail = true; + + new[nid] = page_ptdesc(new_pte); + } + + if (unlikely(fail)) { + release_replicated_pte_tables(allocated_node, new, mm); + return -ENOMEM; + } + + if (allocated_node == NUMA_NO_NODE) { + build_pte_chain(new); + set_master_page_for_ptes(allocated_node, new); + + for_each_memory_node(nid) { + SetPageReplicated(ptdesc_page(new[nid])); + } + } + + return 0; +} + +static vm_fault_t replication_handle_pgd_fault(struct vm_fault *vmf) +{ + unsigned long address = vmf->real_address; + struct mm_struct *mm = vmf->vma->vm_mm; + + vmf->pgd = pgd_offset_pgd(mm->pgd, address); + + return 0; +} + +/* TODO Need to clarify, how this going to work with and without 5-level paging*/ +static vm_fault_t replication_handle_p4d_fault(struct vm_fault *vmf) +{ + int ret; + p4d_t *p4d_tables[MAX_NUMNODES]; + unsigned long address = vmf->real_address; + struct mm_struct *mm = vmf->vma->vm_mm; + + /* See replication_handle_pgd_fault in mm/numa_replication.c */ + if (replicated_p4d_level(vmf)) { + if (!pgd_none(*vmf->pgd)) { + vmf->p4d = p4d_offset(vmf->pgd, address); + return 0; + } + ret = prepare_replicated_p4d_tables(NUMA_NO_NODE, p4d_tables, mm, address); + if (ret) + goto fault_oom; + + spin_lock(&mm->page_table_lock); + if (pgd_present(*vmf->pgd)) { + /* Someone else has replicated this level */ + 
BUG_ON(!PageReplicated(virt_to_page(pgd_page_vaddr(*(vmf->pgd))))); + release_replicated_p4d_tables(NUMA_NO_NODE, p4d_tables, mm); + } else + sync_replicated_p4d_tables(NUMA_NO_NODE, p4d_tables, vmf->pgd, mm); + spin_unlock(&mm->page_table_lock); + + } else { + p4d_t *table_page = (p4d_t *)pgd_page_vaddr(*(vmf->pgd)); + int p4d_node = page_to_nid(virt_to_page(table_page)); + + p4d_tables[p4d_node] = table_page; + ret = prepare_replicated_p4d_tables(p4d_node, p4d_tables, mm, address); + if (ret) + goto fault_oom; + + spin_lock(&mm->page_table_lock); + if (PageReplicated(virt_to_page(table_page))) + /* Someone else has replicated this level */ + release_replicated_p4d_tables(p4d_node, p4d_tables, mm); + else + sync_replicated_p4d_tables(p4d_node, p4d_tables, vmf->pgd, mm); + spin_unlock(&mm->page_table_lock); + } + + vmf->p4d = p4d_offset(vmf->pgd, address); + + return 0; + +fault_oom: + vmf->replica_action = REPLICA_FAIL; + return VM_FAULT_OOM; +} + +static vm_fault_t replication_handle_pud_fault(struct vm_fault *vmf) +{ + int ret; + pud_t *pud_tables[MAX_NUMNODES]; + unsigned long address = vmf->real_address; + struct mm_struct *mm = vmf->vma->vm_mm; + + /* See replication_handle_pgd_fault in mm/numa_replication.c */ + if (replicated_pud_level(vmf)) { + if (!p4d_none(*vmf->p4d)) { + vmf->pud = pud_offset(vmf->p4d, address); + return 0; + } + ret = prepare_replicated_pud_tables(NUMA_NO_NODE, pud_tables, mm, address); + if (ret) + goto fault_oom; + + spin_lock(&mm->page_table_lock); + if (p4d_present(*vmf->p4d)) { + /* Someone else has replicated this level */ + BUG_ON(!PageReplicated(virt_to_page(p4d_pgtable(*(vmf->p4d))))); + release_replicated_pud_tables(NUMA_NO_NODE, pud_tables, mm); + } else + sync_replicated_pud_tables(NUMA_NO_NODE, pud_tables, vmf->p4d, mm); + spin_unlock(&mm->page_table_lock); + } else { + pud_t* table_page = p4d_pgtable(*(vmf->p4d)); + int pud_node = page_to_nid(virt_to_page(table_page)); + + pud_tables[pud_node] = table_page; + ret = prepare_replicated_pud_tables(pud_node, pud_tables, mm, address); + if (ret) + goto fault_oom; + + spin_lock(&mm->page_table_lock); + if (PageReplicated(virt_to_page(table_page))) + /* Someone else has replicated this level */ + release_replicated_pud_tables(pud_node, pud_tables, mm); + else + sync_replicated_pud_tables(pud_node, pud_tables, vmf->p4d, mm); + spin_unlock(&mm->page_table_lock); + } + + vmf->pud = pud_offset(vmf->p4d, address); + + return 0; + +fault_oom: + vmf->replica_action = REPLICA_FAIL; + return VM_FAULT_OOM; +} + +static vm_fault_t replication_handle_pmd_fault(struct vm_fault *vmf) +{ + int ret; + pmd_t* pmd_tables[MAX_NUMNODES]; + unsigned long address = vmf->real_address; + struct mm_struct *mm = vmf->vma->vm_mm; + spinlock_t *ptl; + + /* See replication_handle_pgd_fault in mm/numa_replication.c */ + if (replicated_pmd_level(vmf)) { + if (!pud_none(*vmf->pud)) { + vmf->pmd = pmd_offset(vmf->pud, address); + return 0; + } + ret = prepare_replicated_pmd_tables(NUMA_NO_NODE, pmd_tables, mm, address); + if (ret) + goto fault_oom; + + ptl = pud_lock(mm, vmf->pud); + if (pud_present(*vmf->pud)) { + /* Someone else has replicated this level */ + BUG_ON(!PageReplicated(virt_to_page(pud_pgtable(*(vmf->pud))))); + release_replicated_pmd_tables(NUMA_NO_NODE, pmd_tables, mm); + } else + sync_replicated_pmd_tables(NUMA_NO_NODE, pmd_tables, vmf->pud, mm); + spin_unlock(ptl); + } else { + pmd_t* table_page = pud_pgtable(*(vmf->pud)); + int pmd_node = page_to_nid(virt_to_page(table_page)); + + pmd_tables[pmd_node] = table_page; 
+ ret = prepare_replicated_pmd_tables(pmd_node, pmd_tables, mm, address); + if (ret) + goto fault_oom; + + ptl = pud_lock(mm, vmf->pud); + if (PageReplicated(virt_to_page(table_page))) + /* Someone else has replicated this level */ + release_replicated_pmd_tables(pmd_node, pmd_tables, mm); + else + sync_replicated_pmd_tables(pmd_node, pmd_tables, vmf->pud, mm); + spin_unlock(ptl); + } + + vmf->pmd = pmd_offset(vmf->pud, address); + + return 0; + +fault_oom: + vmf->replica_action = REPLICA_FAIL; + return VM_FAULT_OOM; +} + +static vm_fault_t replication_handle_pte_fault(struct vm_fault *vmf) +{ + int ret; + struct mm_struct *mm = vmf->vma->vm_mm; + struct ptdesc* pte_tables[MAX_NUMNODES]; + spinlock_t *ptl; + + if (replicated_pte_level(vmf)) { + /* + * If the pmd from the 0th node is populated and the PageReplicated flag is set, + * we don't care whether other nodes are populated or not, + * because the pgtable lists are already built and we can use them + */ + if (!pmd_none(*vmf->pmd)) + return 0; + ret = prepare_replicated_pte_tables(NUMA_NO_NODE, pte_tables, mm); + if (ret) + goto fault_oom; + ptl = pmd_lock(mm, vmf->pmd); + if (unlikely(pmd_present(*vmf->pmd))) { + /* Someone else has replicated this level */ + BUG_ON(!PageReplicated(pmd_pgtable(*(vmf->pmd)))); + spin_unlock(ptl); + release_replicated_pte_tables(NUMA_NO_NODE, pte_tables, mm); + } else { + sync_replicated_pte_tables(NUMA_NO_NODE, pte_tables, vmf->pmd, mm); + spin_unlock(ptl); + } + } else { + struct page* table_page = pmd_pgtable(*(vmf->pmd)); + int pte_node = page_to_nid(table_page); + + pte_tables[pte_node] = page_ptdesc(table_page); + ret = prepare_replicated_pte_tables(pte_node, pte_tables, mm); + if (ret) + goto fault_oom; + + ptl = pmd_lock(mm, vmf->pmd); + if (unlikely(PageReplicated(table_page))) { + spin_unlock(ptl); + /* Someone else has replicated this level */ + release_replicated_pte_tables(pte_node, pte_tables, mm); + } else { + sync_replicated_pte_tables(pte_node, pte_tables, vmf->pmd, mm); + spin_unlock(ptl); + } + } + + return 0; + +fault_oom: + vmf->replica_action = REPLICA_FAIL; + return VM_FAULT_OOM; +} + +pgd_t *fault_pgd_offset(struct vm_fault *vmf, unsigned long address) +{ + vmf->pgd = pgd_offset_pgd(this_node_pgd(vmf->vma->vm_mm), address); + return vmf->pgd; +} + +p4d_t *fault_p4d_alloc(struct vm_fault *vmf, struct mm_struct *mm, pgd_t *pgd, unsigned long address) +{ + if (replication_path_pgd(vmf)) { + if (replication_handle_p4d_fault(vmf)) + return NULL; + } else { + vmf->p4d = p4d_alloc(mm, pgd, address); + } + + return vmf->p4d; +} + +pud_t *fault_pud_alloc(struct vm_fault *vmf, struct mm_struct *mm, p4d_t *p4d, unsigned long address) +{ + if (vmf->replica_action != REPLICA_NONE && replication_path_p4d(vmf)) { + if (replication_handle_pud_fault(vmf)) + return NULL; + } else { + vmf->pud = pud_alloc(mm, p4d, address); + } + return vmf->pud; +} + +pmd_t *fault_pmd_alloc(struct vm_fault *vmf, struct mm_struct *mm, pud_t *pud, unsigned long address) +{ + if (vmf->replica_action != REPLICA_NONE && replication_path_pud(vmf)) { + if (replication_handle_pmd_fault(vmf)) + return NULL; + } else { + vmf->pmd = pmd_alloc(mm, pud, address); + } + return vmf->pmd; +} + +int fault_pte_alloc(struct vm_fault *vmf) +{ + if (vmf->replica_action != REPLICA_NONE && replication_path_pmd(vmf)) + return replication_handle_pte_fault(vmf); + return 0; +} + +pte_t *cpr_alloc_pte_map_lock(struct mm_struct *dst_mm, unsigned long addr, + pmd_t *src_pmd, pmd_t *dst_pmd, spinlock_t **ptl) +{ + struct ptdesc *pte_tables[MAX_NUMNODES]; + 
struct page *src_pte = pmd_pgtable(*src_pmd); + spinlock_t *pmd_ptl; + + bool pte_replicated_src = numa_pgtable_replicated(page_to_virt(src_pte)); + bool pmd_replicated_dst = numa_pgtable_replicated(dst_pmd); + + if (memcg_replication_enabled(dst_mm) && pte_replicated_src && pmd_replicated_dst) { + if (!pmd_none(*dst_pmd)) { + return pte_offset_map_lock(dst_mm, dst_pmd, addr, ptl); + } + + if (prepare_replicated_pte_tables(NUMA_NO_NODE, pte_tables, dst_mm)) + return NULL; + + pmd_ptl = pmd_lock(dst_mm, dst_pmd); + sync_replicated_pte_tables(NUMA_NO_NODE, pte_tables, dst_pmd, dst_mm); + spin_unlock(pmd_ptl); + + return pte_offset_map_lock(dst_mm, dst_pmd, addr, ptl); + } + + return pte_alloc_map_lock(dst_mm, dst_pmd, addr, ptl); +} + +pmd_t *cpr_alloc_pmd(struct mm_struct *dst_mm, unsigned long addr, + pud_t *src_pud, pud_t *dst_pud) +{ + pmd_t *pmd_tables[MAX_NUMNODES]; + pmd_t *src_pmd = pud_pgtable(*src_pud); + spinlock_t *ptl; + + bool pmd_replicated_src = numa_pgtable_replicated(src_pmd); + bool pud_replicated_dst = numa_pgtable_replicated(dst_pud); + + if (memcg_replication_enabled(dst_mm) && pmd_replicated_src && pud_replicated_dst) { + if (!pud_none(*dst_pud)) { + return pmd_offset(dst_pud, addr); + } + + if (prepare_replicated_pmd_tables(NUMA_NO_NODE, pmd_tables, dst_mm, addr)) + return NULL; + + ptl = pud_lock(dst_mm, dst_pud); + sync_replicated_pmd_tables(NUMA_NO_NODE, pmd_tables, dst_pud, dst_mm); + spin_unlock(ptl); + + return pmd_offset(dst_pud, addr); + } + + return pmd_alloc(dst_mm, dst_pud, addr); +} + +pud_t *cpr_alloc_pud(struct mm_struct *dst_mm, unsigned long addr, + p4d_t *src_p4d, p4d_t *dst_p4d) +{ +#if CONFIG_PGTABLE_LEVELS >= 4 + pud_t *pud_tables[MAX_NUMNODES]; + pud_t *src_pud = p4d_pgtable(*src_p4d); + + bool pud_replicated_src = numa_pgtable_replicated(src_pud); + bool p4d_replicated_dst = numa_pgtable_replicated(dst_p4d); + + if (memcg_replication_enabled(dst_mm) && pud_replicated_src && p4d_replicated_dst) { + if (!p4d_none(*dst_p4d)) { + return pud_offset(dst_p4d, addr); + } + + if (prepare_replicated_pud_tables(NUMA_NO_NODE, pud_tables, dst_mm, addr)) + return NULL; + + spin_lock(&dst_mm->page_table_lock); + sync_replicated_pud_tables(NUMA_NO_NODE, pud_tables, dst_p4d, dst_mm); + spin_unlock(&dst_mm->page_table_lock); + + return pud_offset(dst_p4d, addr); + } + + return pud_alloc(dst_mm, dst_p4d, addr); +#else + return pud_offset(dst_p4d, addr); +#endif +} + +p4d_t *cpr_alloc_p4d(struct mm_struct *dst_mm, unsigned long addr, + pgd_t *src_pgd, pgd_t *dst_pgd) +{ +#if CONFIG_PGTABLE_LEVELS == 5 + p4d_t *p4d_tables[MAX_NUMNODES]; + p4d_t *src_p4d = pgd_pgtable(*src_pgd); + + bool p4d_replicated_src = numa_pgtable_replicated(src_p4d); + + if (memcg_replication_enabled(dst_mm) && p4d_replicated_src) { + if (!pgd_none(*dst_pgd)) { + return p4d_offset(dst_pgd, addr); + } + if (prepare_replicated_p4d_tables(NUMA_NO_NODE, p4d_tables, dst_mm, addr)) + return NULL; + + spin_lock(&dst_mm->page_table_lock); + sync_replicated_p4d_tables(NUMA_NO_NODE, p4d_tables, dst_pgd, dst_mm); + spin_unlock(&dst_mm->page_table_lock); + + return p4d_offset(dst_pgd, addr); + } + + return p4d_alloc(dst_mm, dst_pgd, addr); +#else + return p4d_offset(dst_pgd, addr); +#endif +} diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d2b1191efa284..146d9276edf4d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -205,6 +205,8 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { }; EXPORT_SYMBOL(node_states); +nodemask_t __read_mostly replica_nodes = { { [0] = 1UL } }; + gfp_t 
gfp_allowed_mask __read_mostly = GFP_BOOT_MASK; /* @@ -3505,7 +3507,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, * * The OOM killer may not free memory on a specific node. */ - if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE)) + if ((gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE)) && + !(gfp_mask & __GFP_MAYDIE)) goto out; /* The OOM killer does not needlessly kill tasks for lowmem */ if (ac->highest_zoneidx < ZONE_NORMAL) @@ -4731,7 +4734,8 @@ static inline bool check_after_alloc(gfp_t *gfp, unsigned int order, return true; } - if (*gfp & (__GFP_NORETRY | __GFP_RETRY_MAYFAIL | __GFP_THISNODE)) + if ((*gfp & (__GFP_NORETRY | __GFP_RETRY_MAYFAIL | __GFP_THISNODE)) && + !(*gfp & __GFP_MAYDIE)) goto out; /* Coredumps can quickly deplete all memory reserves */ diff --git a/mm/page_idle.c b/mm/page_idle.c index 41ea77f22011e..cebc1897b12e3 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -12,6 +12,7 @@ #include <linux/mmu_notifier.h> #include <linux/page_ext.h> #include <linux/page_idle.h> +#include <linux/numa_user_replication.h> #include "internal.h" @@ -63,7 +64,7 @@ static bool page_idle_clear_pte_refs_one(struct folio *folio, * For PTE-mapped THP, one sub page is referenced, * the whole THP is referenced. */ - if (ptep_clear_young_notify(vma, addr, pvmw.pte)) + if (ptep_clear_young_notify_replicated(vma, addr, pvmw.pte)) referenced = true; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { if (pmdp_clear_young_notify(vma, addr, pvmw.pmd)) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 74d2de15fb5e0..995db39afa955 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -4,6 +4,7 @@ #include <linux/hugetlb.h> #include <linux/swap.h> #include <linux/swapops.h> +#include <linux/numa_user_replication.h> #include "internal.h" @@ -211,7 +212,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) goto next_pte; restart: do { - pgd = pgd_offset(mm, pvmw->address); + pgd = pgd_offset_pgd(this_node_pgd(mm), pvmw->address); if (!pgd_present(*pgd)) { step_forward(pvmw, PGDIR_SIZE); continue; diff --git a/mm/rmap.c b/mm/rmap.c index 86353d274d437..f072b411219ef 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -75,6 +75,7 @@ #include <linux/memremap.h> #include <linux/userfaultfd_k.h> #include <linux/mm_inline.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> @@ -843,7 +844,7 @@ static bool folio_referenced_one(struct folio *folio, referenced++; } - if (ptep_clear_flush_young_notify(vma, address, + if (ptep_clear_flush_young_notify_replicated(vma, address, pvmw.pte)) referenced++; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { @@ -992,10 +993,10 @@ static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw) continue; flush_cache_page(vma, address, pte_pfn(entry)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_replicated(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); - set_pte_at(vma->vm_mm, address, pte, entry); + set_pte_at_replicated(vma->vm_mm, address, pte, entry); ret = 1; } else { #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -1744,11 +1745,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * transition on a cached TLB entry is written through * and traps if the PTE is unmapped. 
*/ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + pteval = ptep_get_and_clear_replicated(mm, address, pvmw.pte); set_tlb_ubc_flush_pending(mm, pteval, address); } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + pteval = ptep_clear_flush_replicated(vma, address, pvmw.pte); } } @@ -1775,7 +1776,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, } else { dec_mm_counter(mm, mm_counter(folio)); add_reliable_folio_counter(folio, mm, -1); - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); } } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { @@ -1839,18 +1840,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * If the folio was redirtied, it cannot be * discarded. Remap the page to page table. */ - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); folio_set_swapbacked(folio); goto walk_abort; } if (swap_duplicate(entry) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); goto walk_abort; } if (arch_unmap_one(mm, vma, address, pteval) < 0) { swap_free(entry); - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); goto walk_abort; } @@ -1858,7 +1859,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) { swap_free(entry); - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); goto walk_abort; } if (list_empty(&mm->mmlist)) { @@ -1877,7 +1878,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); + set_pte_at_replicated(mm, address, pvmw.pte, swp_pte); } else { /* * This is a locked file-backed folio, @@ -2115,11 +2116,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * transition on a cached TLB entry is written through * and traps if the PTE is unmapped. 
*/ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + pteval = ptep_get_and_clear_replicated(mm, address, pvmw.pte); set_tlb_ubc_flush_pending(mm, pteval, address); } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + pteval = ptep_clear_flush_replicated(vma, address, pvmw.pte); } } @@ -2161,7 +2162,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_swp_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); + set_pte_at_replicated(mm, pvmw.address, pvmw.pte, swp_pte); trace_set_migration_pte(pvmw.address, pte_val(swp_pte), compound_order(&folio->page)); /* @@ -2177,7 +2178,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } else { dec_mm_counter(mm, mm_counter(folio)); add_reliable_folio_counter(folio, mm, -1); - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); } } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { @@ -2202,7 +2203,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz); else - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2222,7 +2223,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } } else if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + set_pte_at_replicated(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2255,7 +2256,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, set_huge_pte_at(mm, address, pvmw.pte, swp_pte, hsz); else - set_pte_at(mm, address, pvmw.pte, swp_pte); + set_pte_at_replicated(mm, address, pvmw.pte, swp_pte); trace_set_migration_pte(address, pte_val(swp_pte), compound_order(&folio->page)); /* @@ -2369,7 +2370,7 @@ static bool page_make_device_exclusive_one(struct folio *folio, /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(ptent)); - pteval = ptep_clear_flush(vma, address, pvmw.pte); + pteval = ptep_clear_flush_replicated(vma, address, pvmw.pte); /* Set the dirty flag on the folio now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -2400,7 +2401,7 @@ static bool page_make_device_exclusive_one(struct folio *folio, if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); + set_pte_at_replicated(mm, address, pvmw.pte, swp_pte); /* * There is a reference on the page for the swap entry which has diff --git a/mm/swap.c b/mm/swap.c index 9bc530395949b..7dcdfbf6913ed 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -114,6 +114,8 @@ static void page_cache_release(struct folio *folio) void __folio_put(struct folio *folio) { + ClearPageReplicated(folio_page(folio, 0)); + if (unlikely(folio_is_zone_device(folio))) { free_zone_device_folio(folio); return; @@ -1000,13 +1002,16 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) } if (put_devmap_managed_page_refs(&folio->page, nr_refs)) continue; - if (folio_ref_sub_and_test(folio, nr_refs)) + if (folio_ref_sub_and_test(folio, nr_refs)) { + ClearPageReplicated(folio_page(folio, 0)); free_zone_device_folio(folio); + } continue; } if (!folio_ref_sub_and_test(folio, nr_refs)) continue; + ClearPageReplicated(folio_page(folio, 0)); /* hugetlb has its own memcg */ if (folio_test_hugetlb(folio)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 3af5b6ebb2412..4d87dcd43719f 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -42,6 +42,7 @@ #include <linux/completion.h> #include <linux/suspend.h> #include <linux/zswap.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> #include <linux/swapops.h> @@ -2015,7 +2016,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, if (pte_swp_uffd_wp(old_pte)) new_pte = pte_mkuffd_wp(new_pte); setpte: - set_pte_at(vma->vm_mm, addr, pte, new_pte); + set_pte_at_replicated(vma->vm_mm, addr, pte, new_pte); swap_free(entry); out: if (pte) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 2c71a269a52c6..e25964e4d4700 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -16,6 +16,7 @@ #include <linux/hugetlb.h> #include <linux/shmem_fs.h> #include <linux/userswap.h> +#include <linux/numa_user_replication.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include "internal.h" @@ -127,7 +128,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd, */ inc_mm_counter(dst_mm, mm_counter(folio)); - set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + set_pte_at_replicated(dst_mm, dst_addr, dst_pte, _dst_pte); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); @@ -270,7 +271,7 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, ret = -EEXIST; if (!pte_none(ptep_get(dst_pte))) goto out_unlock; - set_pte_at(dst_vma->vm_mm, dst_addr, dst_pte, _dst_pte); + set_pte_at_replicated(dst_vma->vm_mm, dst_addr, dst_pte, _dst_pte); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); ret = 0; @@ -351,7 +352,7 @@ static int mfill_atomic_pte_poison(pmd_t *dst_pmd, if (!pte_none(ptep_get(dst_pte))) goto out_unlock; - set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + set_pte_at_replicated(dst_mm, dst_addr, dst_pte, _dst_pte); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); diff --git a/mm/userswap.c b/mm/userswap.c index 22e3f147ce5f9..749e6bd940853 100644 --- a/mm/userswap.c +++ b/mm/userswap.c @@ -11,6 +11,7 @@ #include <linux/mmu_notifier.h> #include <linux/hugetlb.h> #include <linux/userswap.h> +#include <linux/numa_user_replication.h> #include "internal.h" @@ -154,11 +155,11 @@ static int 
uswap_unmap_anon_page(struct mm_struct *mm, goto out_release_unlock; } flush_cache_page(vma, addr, pte_pfn(*pte)); - _old_pte = ptep_clear_flush(vma, addr, pte); + _old_pte = ptep_clear_flush_replicated(vma, addr, pte); if (old_pte) *old_pte = _old_pte; if (set_to_swp) - set_pte_at(mm, addr, pte, swp_entry_to_pte(swp_entry( + set_pte_at_replicated(mm, addr, pte, swp_entry_to_pte(swp_entry( SWP_USERSWAP_ENTRY, page_to_pfn(page)))); dec_mm_counter(mm, MM_ANONPAGES); @@ -198,7 +199,7 @@ static unsigned long vm_insert_anon_page(struct vm_area_struct *vma, dst_pte = mk_pte(page, vma->vm_page_prot); if (vma->vm_flags & VM_WRITE) dst_pte = pte_mkwrite_novma(pte_mkdirty(dst_pte)); - set_pte_at(mm, addr, pte, dst_pte); + set_pte_at_replicated(mm, addr, pte, dst_pte); out_unlock: pte_unmap_unlock(pte, ptl); @@ -217,7 +218,7 @@ static void uswap_map_anon_page(struct mm_struct *mm, pte = pte_offset_map_lock(mm, pmd, addr, &ptl); flush_cache_page(vma, addr, pte_pfn(*pte)); - set_pte_at(mm, addr, pte, old_pte); + set_pte_at_replicated(mm, addr, pte, old_pte); inc_mm_counter(mm, MM_ANONPAGES); add_reliable_page_counter(page, mm, 1); folio_add_new_anon_rmap(page_folio(page), vma, addr, RMAP_EXCLUSIVE); @@ -536,7 +537,7 @@ int mfill_atomic_pte_nocopy(struct mm_struct *mm, pmd_t *dst_pmd, inc_mm_counter(mm, MM_ANONPAGES); add_reliable_page_counter(page, mm, 1); folio_add_new_anon_rmap(page_folio(page), dst_vma, dst_addr, RMAP_EXCLUSIVE); - set_pte_at(mm, dst_addr, pte, dst_pte); + set_pte_at_replicated(mm, dst_addr, pte, dst_pte); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, pte); -- 2.34.1
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/19842 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/NZW...