From: Sean Christopherson seanjc@google.com
stable inclusion from stable-5.10.71 commit 249e5e5a501ee69b26d55b68f863671396e62f7f bugzilla: 182981 https://gitee.com/openeuler/kernel/issues/I4I3KD
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 8646e53633f314e4d746a988240d3b951a92f94a upstream.
Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to transferring to a KVM guest, which is roughly equivalent to an exit to userspace and processes many of the same pending actions. While the task cannot be in an rseq critical section as the KVM path is reachable only by via ioctl(KVM_RUN), the side effects that apply to rseq outside of a critical section still apply, e.g. the current CPU needs to be updated if the task is migrated.
Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults and other badness in userspace VMMs that use rseq in combination with KVM, e.g. due to the CPU ID being stale after task migration.
Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function") Reported-by: Peter Foley pefoley@google.com Bisected-by: Doug Evans dje@google.com Acked-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Shakeel Butt shakeelb@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210901203030.1292304-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [sean: Resolve benign conflict due to unrelated access_ok() check in 5.10] Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chen Jun chenjun102@huawei.com Acked-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: Chen Jun chenjun102@huawei.com --- kernel/entry/kvm.c | 4 +++- kernel/rseq.c | 13 ++++++++++--- 2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c index b6678a5e3cf6..2a3139dab109 100644 --- a/kernel/entry/kvm.c +++ b/kernel/entry/kvm.c @@ -16,8 +16,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work) if (ti_work & _TIF_NEED_RESCHED) schedule();
- if (ti_work & _TIF_NOTIFY_RESUME) + if (ti_work & _TIF_NOTIFY_RESUME) { tracehook_notify_resume(NULL); + rseq_handle_notify_resume(NULL, NULL); + }
ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work); if (ret) diff --git a/kernel/rseq.c b/kernel/rseq.c index a4f86a9d6937..0077713bf240 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -268,9 +268,16 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) return; if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) goto error; - ret = rseq_ip_fixup(regs); - if (unlikely(ret < 0)) - goto error; + /* + * regs is NULL if and only if the caller is in a syscall path. Skip + * fixup and leave rseq_cs as is so that rseq_sycall() will detect and + * kill a misbehaving userspace on debug kernels. + */ + if (regs) { + ret = rseq_ip_fixup(regs); + if (unlikely(ret < 0)) + goto error; + } if (unlikely(rseq_update_cpu_id(t))) goto error; return;