From: Fabiano Rosas farosas@linux.ibm.com
stable inclusion from stable-5.10.67 commit bda5602c1c35fb22847951607776d49352322e47 bugzilla: 182619 https://gitee.com/openeuler/kernel/issues/I4EWO7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 5d7d6dac8fe99ed59eee2300e4a03370f94d5222 ]
The __kvmhv_copy_tofrom_guest_radix function was introduced along with nested HV guest support. It uses the platform's Radix MMU quadrants to provide a nested hypervisor with fast access to its nested guests memory (H_COPY_TOFROM_GUEST hypercall). It has also since been added as a fast path for the kvmppc_ld/st routines which are used during instruction emulation.
The commit def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()") changed the low level copy function from raw_copy_from_user to probe_user_read, which adds a check to access_ok. In powerpc that is:
static inline bool __access_ok(unsigned long addr, unsigned long size) { return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr; }
and TASK_SIZE_MAX is 0x0010000000000000UL for 64-bit, which means that setting the two MSBs of the effective address (which correspond to the quadrant) now cause access_ok to reject the access.
This was not caught earlier because the most common code path via kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to succeed for L1 guests. For nested guests there is no fallback.
Another issue is that probe_user_read (now __copy_from_user_nofault) does not return the number of bytes not copied in case of failure, so the destination memory is not being cleared anymore in kvmhv_copy_from_guest_radix:
ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n); if (ret > 0) <-- always false! memset(to + (n - ret), 0, ret);
This patch fixes both issues by skipping access_ok and open-coding the low level __copy_to/from_user_inatomic.
Fixes: def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()") Signed-off-by: Fabiano Rosas farosas@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210805212616.2641017-2-farosas@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Chen Jun chenjun102@huawei.com Acked-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index bb35490400e9..04028f905e50 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -64,10 +64,12 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, } isync();
+ pagefault_disable(); if (is_load) - ret = copy_from_user_nofault(to, (const void __user *)from, n); + ret = __copy_from_user_inatomic(to, (const void __user *)from, n); else - ret = copy_to_user_nofault((void __user *)to, from, n); + ret = __copy_to_user_inatomic((void __user *)to, from, n); + pagefault_enable();
/* switch the pid first to avoid running host with unallocated pid */ if (quadrant == 1 && pid != old_pid)