From: Alistair Popple <apopple@nvidia.com>
mainline inclusion
from mainline-v6.1-rc1
commit 16ce101db85db694a91380aa4c89b25530871d33
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5VZ0L
CVE: CVE-2022-3523
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "Fix several device private page reference counting issues", v2
This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed.
During normal usage it is unlikely these will cause any problems. However, without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory.
This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those, so any help there would be appreciated. The changes mimic what is done for both Nouveau and hmm-tests though, so I doubt they will cause problems.
This patch (of 8):
When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages().
Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not.
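The reference accounting described above can be sketched in userspace C. This is only an illustrative model, not kernel code: struct model_page and model_check_page() are hypothetical names, and the model ignores details such as the additional baseline reference the kernel accounts for on ZONE_DEVICE pages. It mirrors the shape of the check in migrate_vma_check_page(), where the fault page is allowed one more reference than any other page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the counters the model needs. */
struct model_page {
	int refcount;	/* total references currently held on the page */
	int mapcount;	/* references that come from page table mappings */
};

/*
 * Model of the migrate_vma_check_page() test. One extra reference belongs
 * to the migration code itself. A second one is expected only when the page
 * is the one the fault handler pinned before calling migrate_to_ram().
 * Any further unexplained reference means the page must not be migrated.
 */
static bool model_check_page(const struct model_page *page,
			     const struct model_page *fault_page)
{
	int extra;

	assert(page != NULL);
	extra = 1 + (page == fault_page);

	return page->refcount - extra <= page->mapcount;
}
```

In this model a page with one mapping, one migration reference and one fault-handler reference (refcount 3, mapcount 1) is rejected when fault_page is NULL, which corresponds to the migration failure described above, and accepted when fault_page points at that same page.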
[mpe@ellerman.id.au: fix build]
  Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.166...
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.166436629...
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
	arch/powerpc/kvm/book3s_hv_uvmem.c
	include/linux/migrate.h
	lib/test_hmm.c
	mm/migrate.c
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 19 ++++++++------
 include/linux/migrate.h            |  9 +++++++
 lib/test_hmm.c                     |  5 ++--
 mm/memory.c                        | 16 +++++++++++-
 mm/migrate.c                       | 42 ++++++++++++++++++++----------
 5 files changed, 66 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3dd58b4ee33e..db17e5f5d431 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -506,10 +506,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 		unsigned long start,
 		unsigned long end, unsigned long page_shift,
-		struct kvm *kvm, unsigned long gpa)
+		struct kvm *kvm, unsigned long gpa, struct page *fault_page)
 {
 	unsigned long src_pfn, dst_pfn = 0;
-	struct migrate_vma mig;
+	struct migrate_vma mig = { 0 };
 	struct page *dpage, *spage;
 	struct kvmppc_uvmem_page_pvt *pvt;
 	unsigned long pfn;
@@ -523,6 +523,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 	mig.dst = &dst_pfn;
 	mig.pgmap_owner = &kvmppc_uvmem_pgmap;
 	mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+	mig.fault_page = fault_page;

 	/* The requested page is already paged-out, nothing to do */
 	if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
@@ -578,12 +579,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
 				      unsigned long start, unsigned long end,
 				      unsigned long page_shift,
-				      struct kvm *kvm, unsigned long gpa)
+				      struct kvm *kvm, unsigned long gpa,
+				      struct page *fault_page)
 {
 	int ret;

 	mutex_lock(&kvm->arch.uvmem_lock);
-	ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+	ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
+				    fault_page);
 	mutex_unlock(&kvm->arch.uvmem_lock);

 	return ret;
@@ -632,7 +635,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
 			pvt->remove_gfn = true;

 			if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE,
-						  PAGE_SHIFT, kvm, pvt->gpa))
+						  PAGE_SHIFT, kvm, pvt->gpa, NULL))
 				pr_err("Can't page out gpa:0x%lx addr:0x%lx\n",
 				       pvt->gpa, addr);
 		} else {
@@ -735,7 +738,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
 		bool pagein)
 {
 	unsigned long src_pfn, dst_pfn = 0;
-	struct migrate_vma mig;
+	struct migrate_vma mig = { 0 };
 	struct page *spage;
 	unsigned long pfn;
 	struct page *dpage;
@@ -993,7 +996,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)

 	if (kvmppc_svm_page_out(vmf->vma, vmf->address,
 				vmf->address + PAGE_SIZE, PAGE_SHIFT,
-				pvt->kvm, pvt->gpa))
+				pvt->kvm, pvt->gpa, vmf->page))
 		return VM_FAULT_SIGBUS;
 	else
 		return 0;
@@ -1064,7 +1067,7 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
 	if (!vma || vma->vm_start > start || vma->vm_end < end)
 		goto out;

-	if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa))
+	if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa, NULL))
 		ret = H_SUCCESS;
 out:
 	mmap_read_unlock(kvm->mm);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 0f8d1583fa8e..a9de6d3ae07d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -36,6 +36,9 @@ extern const char *migrate_reason_names[MR_TYPES];
 #ifdef CONFIG_MIGRATION

 extern void putback_movable_pages(struct list_head *l);
+extern int migrate_page_extra(struct address_space *mapping,
+			struct page *newpage, struct page *page,
+			enum migrate_mode mode, int extra_count);
 extern int migrate_page(struct address_space *mapping,
 			struct page *newpage, struct page *page,
 			enum migrate_mode mode);
@@ -190,6 +193,12 @@ struct migrate_vma {
 	 */
 	void			*pgmap_owner;
 	unsigned long		flags;
+
+	/*
+	 * Set to vmf->page if this is being called to migrate a page as part of
+	 * a migrate_to_ram() callback.
+	 */
+	struct page		*fault_page;
 };

 int migrate_vma_setup(struct migrate_vma *args);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index a85613068d60..58d1e8c41889 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -671,7 +671,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 	unsigned long src_pfns[64];
 	unsigned long dst_pfns[64];
 	struct dmirror_bounce bounce;
-	struct migrate_vma args;
+	struct migrate_vma args = { 0 };
 	unsigned long next;
 	int ret;

@@ -1048,7 +1048,7 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,

 static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 {
-	struct migrate_vma args;
+	struct migrate_vma args = { 0 };
 	unsigned long src_pfns;
 	unsigned long dst_pfns;
 	struct page *rpage;
@@ -1071,6 +1071,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	args.dst = &dst_pfns;
 	args.pgmap_owner = dmirror->mdevice;
 	args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+	args.fault_page = vmf->page;

 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
diff --git a/mm/memory.c b/mm/memory.c
index 3667ec456ace..14778b665982 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3383,7 +3383,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 					     vmf->address);
 		} else if (is_device_private_entry(entry)) {
 			vmf->page = device_private_entry_to_page(entry);
-			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
+			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+					vmf->address, &vmf->ptl);
+			if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
+				spin_unlock(vmf->ptl);
+				goto out;
+			}
+
+			/*
+			 * Get a page reference while we know the page can't be
+			 * freed.
+			 */
+			get_page(vmf->page);
+			pte_unmap_unlock(vmf->pte, vmf->ptl);
+			vmf->page->pgmap->ops->migrate_to_ram(vmf);
+			put_page(vmf->page);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 6cd51f3817b6..ebbc34d7c509 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -689,21 +689,15 @@ EXPORT_SYMBOL(migrate_page_copy);
  *                    Migration functions
  ***********************************************************/

-/*
- * Common logic to directly migrate a single LRU page suitable for
- * pages that do not use PagePrivate/PagePrivate2.
- *
- * Pages are locked upon entry and exit.
- */
-int migrate_page(struct address_space *mapping,
+int migrate_page_extra(struct address_space *mapping,
 		struct page *newpage, struct page *page,
-		enum migrate_mode mode)
+		enum migrate_mode mode, int extra_count)
 {
 	int rc;

 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */

-	rc = migrate_page_move_mapping(mapping, newpage, page, 0);
+	rc = migrate_page_move_mapping(mapping, newpage, page, extra_count);

 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
@@ -714,6 +708,19 @@ int migrate_page(struct address_space *mapping,
 	migrate_page_states(newpage, page);
 	return MIGRATEPAGE_SUCCESS;
 }
+
+/*
+ * Common logic to directly migrate a single LRU page suitable for
+ * pages that do not use PagePrivate/PagePrivate2.
+ *
+ * Pages are locked upon entry and exit.
+ */
+int migrate_page(struct address_space *mapping,
+		struct page *newpage, struct page *page,
+		enum migrate_mode mode)
+{
+	return migrate_page_extra(mapping, newpage, page, mode, 0);
+}
 EXPORT_SYMBOL(migrate_page);

 #ifdef CONFIG_BLOCK
@@ -2524,14 +2531,14 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
  * migrate_page_move_mapping(), except that here we allow migration of a
  * ZONE_DEVICE page.
  */
-static bool migrate_vma_check_page(struct page *page)
+static bool migrate_vma_check_page(struct page *page, struct page *fault_page)
 {
 	/*
 	 * One extra ref because caller holds an extra reference, either from
 	 * isolate_lru_page() for a regular page, or migrate_vma_collect() for
 	 * a device page.
 	 */
-	int extra = 1;
+	int extra = 1 + (page == fault_page);

 	/*
 	 * FIXME support THP (transparent huge page), it is bit more complex to
@@ -2639,7 +2646,7 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
 				put_page(page);
 		}

-		if (!migrate_vma_check_page(page)) {
+		if (!migrate_vma_check_page(page, migrate->fault_page)) {
 			if (remap) {
 				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
 				migrate->cpages--;
@@ -2707,7 +2714,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
 				goto restore;
 		}

-		if (migrate_vma_check_page(page))
+		if (migrate_vma_check_page(page, migrate->fault_page))
 			continue;

 restore:
@@ -2817,6 +2824,8 @@ int migrate_vma_setup(struct migrate_vma *args)
 		return -EINVAL;
 	if (!args->src || !args->dst)
 		return -EINVAL;
+	if (args->fault_page && !is_device_private_page(args->fault_page))
+		return -EINVAL;

 	memset(args->src, 0, sizeof(*args->src) * nr_pages);
 	args->cpages = 0;
@@ -3047,7 +3056,12 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 			}
 		}

-		r = migrate_page(mapping, newpage, page, MIGRATE_SYNC_NO_COPY);
+		if (migrate->fault_page == page)
+			r = migrate_page_extra(mapping, newpage, page,
+						MIGRATE_SYNC_NO_COPY, 1);
+		else
+			r = migrate_page(mapping, newpage, page,
+						MIGRATE_SYNC_NO_COPY);
 		if (r != MIGRATEPAGE_SUCCESS)
 			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
 	}