From: Claudio Imbrenda imbrenda@linux.ibm.com
stable inclusion from stable-v6.6.11 commit 42db0099eca3bee8459b8c0121975dc8d7cc1c75 bugzilla: https://gitee.com/openeuler/kernel/issues/I99TJK
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 80aea01c48971a1fffc0252d036995572d84950d ]
When the host invalidates a guest page, it will also check if the page was used to map the prefix of any guest CPUs, in which case they are stopped and marked as needing a prefix refresh. Upon starting the affected CPUs again, their prefix pages are explicitly faulted in and revalidated if they had been invalidated. A bit in the PGSTEs indicates whether or not a page might contain a prefix. The bit is allowed to overindicate. Pages above 2G are skipped, because they cannot be prefixes, since KVM runs all guests with MSO = 0.
The same applies for nested guests (VSIE). When the host invalidates a guest page that maps the prefix of the nested guest, it has to stop the affected nested guest CPUs and mark them as needing a prefix refresh. The same PGSTE bit used for the guest prefix is also used for the nested guest. Pages above 2G are skipped like for normal guests, which is the source of the bug.
The nested guest runs is the guest primary address space. The guest could be running the nested guest using MSO != 0. If the MSO + prefix for the nested guest is above 2G, the check for nested prefix will skip it. This will cause the invalidation notifier to not stop the CPUs of the nested guest and not mark them as needing refresh. When the nested guest is run again, its prefix will not be refreshed, since it has not been marked for refresh. This will cause a fatal validity intercept with VIR code 37.
Fix this by removing the check for 2G for nested guests. Now all invalidations of pages with the notify bit set will always scan the existing VSIE shadow state descriptors.
This allows to catch invalidations of nested guest prefix mappings even when the prefix is above 2G in the guest virtual address space.
Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Tested-by: Nico Boehr nrb@linux.ibm.com Reviewed-by: Nico Boehr nrb@linux.ibm.com Reviewed-by: David Hildenbrand david@redhat.com Message-ID: 20231102153549.53984-1-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: ZhangPeng zhangpeng362@huawei.com --- arch/s390/kvm/vsie.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 61499293c2ac..e55f489e1fb7 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -587,10 +587,6 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
if (!gmap_is_shadow(gmap)) return; - if (start >= 1UL << 31) - /* We are only interested in prefix pages */ - return; - /* * Only new shadow blocks are added to the list during runtime, * therefore we can safely reference them all the time.