From: Ryan Roberts ryan.roberts@arm.com
mainline inclusion from mainline-v6.9-rc1 commit c6ec76a2ebc5829e5826b218d2e1475ec11b333e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9CUEQ CVE: NA
-------------------------------------------------
Some architectures (e.g. arm64) can tell from looking at a pte, if some follow-on ptes also map contiguous physical memory with the same pgprot. (for arm64, these are contpte mappings).
Take advantage of this knowledge to optimize folio_pte_batch() so that it can skip these ptes when scanning to create a batch. By default, if an arch does not opt-in, folio_pte_batch() returns a compile-time 1, so the changes are optimized out and the behaviour is as before.
arm64 will opt-in to providing this hint in the next patch, which will greatly reduce the cost of ptep_get() when scanning a range of contptes.
Link: https://lkml.kernel.org/r/20240215103205.2607016-16-ryan.roberts@arm.com Signed-off-by: Ryan Roberts ryan.roberts@arm.com Acked-by: David Hildenbrand david@redhat.com Tested-by: John Hubbard jhubbard@nvidia.com Cc: Alistair Popple apopple@nvidia.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Ard Biesheuvel ardb@kernel.org Cc: Barry Song 21cnbao@gmail.com Cc: Borislav Petkov (AMD) bp@alien8.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: James Morse james.morse@arm.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Marc Zyngier maz@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit c6ec76a2ebc5829e5826b218d2e1475ec11b333e) Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com --- include/linux/pgtable.h | 21 +++++++++++++++++++++ mm/memory.c | 19 ++++++++++++------- 2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 636ed1edf952..87cd26a48038 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -205,6 +205,27 @@ static inline int pmd_young(pmd_t pmd) #define arch_flush_lazy_mmu_mode() do {} while (0) #endif
+#ifndef pte_batch_hint +/** + * pte_batch_hint - Number of pages that can be added to batch without scanning. + * @ptep: Page table pointer for the entry. + * @pte: Page table entry. + * + * Some architectures know that a set of contiguous ptes all map the same + * contiguous memory with the same permissions. In this case, it can provide a + * hint to aid pte batching without the core code needing to scan every pte. + * + * An architecture implementation may ignore the PTE accessed state. Further, + * the dirty state must apply atomically to all the PTEs described by the hint. + * + * May be overridden by the architecture, else pte_batch_hint is always 1. + */ +static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte) +{ + return 1; +} +#endif + #ifndef pte_advance_pfn static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr) { diff --git a/mm/memory.c b/mm/memory.c index 283fff8ba901..0f463c803d4e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -993,16 +993,20 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, { unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio); const pte_t *end_ptep = start_ptep + max_nr; - pte_t expected_pte = __pte_batch_clear_ignored(pte_next_pfn(pte), flags); - pte_t *ptep = start_ptep + 1; + pte_t expected_pte, *ptep; bool writable; + int nr;
if (any_writable) *any_writable = false;
VM_WARN_ON_FOLIO(!pte_present(pte), folio);
- while (ptep != end_ptep) { + nr = pte_batch_hint(start_ptep, pte); + expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags); + ptep = start_ptep + nr; + + while (ptep < end_ptep) { pte = ptep_get(ptep); if (any_writable) writable = !!pte_write(pte); @@ -1016,17 +1020,18 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, * corner cases the next PFN might fall into a different * folio. */ - if (pte_pfn(pte) == folio_end_pfn) + if (pte_pfn(pte) >= folio_end_pfn) break;
if (any_writable) *any_writable |= writable;
- expected_pte = pte_next_pfn(expected_pte); - ptep++; + nr = pte_batch_hint(ptep, pte); + expected_pte = pte_advance_pfn(expected_pte, nr); + ptep += nr; }
- return ptep - start_ptep; + return min(ptep - start_ptep, max_nr); }
/*