From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.12-rc4
commit 963756aac1f011d904ddd9548ae82286d3a91f96
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBE0E0
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "mm: don't install PMD mappings when THPs are disabled by the hw/process/vma".
During testing, it was found that we can get PMD mappings in processes where THPs (and more precisely, PMD mappings) are supposed to be disabled. While it works as expected for anon+shmem, the pagecache is the problematic bit.
For s390 KVM this currently means that a VM backed by a file located on a filesystem with large folio support can crash when KVM tries accessing the problematic page: the readahead logic might decide to use a PMD-sized THP, and faulting it into the page tables will install a PMD mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings, but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can install a PMD mapping. khugepaged should already be taking care of not collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
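
As a rough illustration of how the two helpers combine at the call sites, here is a minimal user-space model. It is only a sketch: every model_* name and every flag value below is a made-up stand-in, not the kernel definition, and the per-mm flag is modelled as a plain bit mask, whereas the kernel tests the MMF_DISABLE_THP bit number with test_bit().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the kernel's values. */
#define MODEL_VM_NOHUGEPAGE	(1UL << 0)	/* models VM_NOHUGEPAGE in vm_flags */
#define MODEL_MMF_DISABLE_THP	(1UL << 1)	/* models MMF_DISABLE_THP in mm->flags */
#define MODEL_THP_UNSUPPORTED	(1UL << 2)	/* models TRANSPARENT_HUGEPAGE_UNSUPPORTED */

struct model_mm {
	unsigned long flags;
};

struct model_vma {
	struct model_mm *vm_mm;
};

/* Stands in for the global transparent_hugepage_flags. */
static unsigned long model_thp_flags;

/* Per-VMA or per-process opt-out (madvise, prctl, or the architecture). */
static bool model_vma_thp_disabled(const struct model_vma *vma,
				   unsigned long vm_flags)
{
	return (vm_flags & MODEL_VM_NOHUGEPAGE) ||
	       (vma->vm_mm->flags & MODEL_MMF_DISABLE_THP);
}

/* System-wide opt-out reported by hardware/firmware. */
static bool model_thp_disabled_by_hw(void)
{
	return model_thp_flags & MODEL_THP_UNSUPPORTED;
}

/*
 * Combined check in the style of the shmem caller, where vma may be NULL.
 * The hardware check does not need a vma, so it is evaluated first.
 */
static bool model_thp_blocked(const struct model_vma *vma,
			      unsigned long vm_flags)
{
	return model_thp_disabled_by_hw() ||
	       (vma && model_vma_thp_disabled(vma, vm_flags));
}
```

In this model, a NULL vma can only be blocked by the hardware flag, which mirrors the "vma &&" guard that shmem_allowable_huge_orders() keeps around vma_thp_disabled().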
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	include/linux/huge_mm.h
[ Context conflicts. ]
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/huge_memory.c        | 13 +------------
 mm/shmem.c              |  7 +------
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 27bd7ec3a546..02910e742bb1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -330,6 +330,24 @@ struct thpsize {
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_FILE_MTHP_FLAG))
 
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	/*
+	 * Explicitly disabled through madvise or prctl, or some
+	 * architectures may disable THP for some mappings, for
+	 * example, s390 kvm.
+	 */
+	return (vm_flags & VM_NOHUGEPAGE) ||
+	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+	/* If the hardware/firmware marked hugepage support disabled. */
+	return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3c58e95a33eb..3df13ed40214 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -105,18 +105,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;
 
-	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
-	 * */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return 0;
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/shmem.c b/mm/shmem.c
index 65553342a16f..f9e48c353a9d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1686,12 +1686,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	loff_t i_size;
 	int order;
 
-	if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
-		return 0;
-
-	/* If the hardware/firmware marked hugepage support disabled. */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
 		return 0;
 
 	global_huge = shmem_huge_global_enabled(inode, index, write_end,