[PATCH kernel-4.19 14/25] KVM: arm/arm64: Report bringup timeout by a WARN_ON()

22 Dec 2020

From: Zenghui Yu <yuzenghui@huawei.com>

euleros inclusion
category: bugfix
bugzilla: 46842
CVE: NA

-------------------------------------------------

A full D-cache clean requires walking the stage2 page table and
flushing the entries one by one. If a large amount of memory is
mapped, we have to flush hundreds of MB of guest memory, which
takes a very long time (long enough for Linux to time out during
CPU bringup).

If kvm_toggle_cache() takes longer than one second, trigger a WARN_ON() like:

[  614.952893] WARNING: CPU: 61 PID: 23370 at arch/arm64/kvm/../../../virt/kvm/arm/mmu.c:2467 kvm_toggle_cache+0x188/0x218
[  614.952902] Modules linked in: ...
[  614.952992] CPU: 61 PID: 23370 Comm: CPU 1/KVM Not tainted 5.1.0-rc4+ #83
[  614.952997] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
[  614.953003] pstate: 60000005 (nZCv daif -PAN -UAO)
[  614.953009] pc : kvm_toggle_cache+0x188/0x218
[  614.953014] lr : kvm_toggle_cache+0x9c/0x218
[  614.953018] sp : ffff00002f1f3910
[  614.953023] x29: ffff00002f1f3910 x28: 0000000000000002
[  614.953029] x27: ffff000011ab7bd8 x26: ffff000011ab5000
[  614.953036] x25: ffff000011ab7bd8 x24: 0000000000000001
[  614.953042] x23: 0000000000000000 x22: ffff80a77a3c0000
[  614.953048] x21: 0000000000000005 x20: ffff000011aab000
[  614.953054] x19: fffffffffffffe97 x18: ffffffffffffffff
[  614.953061] x17: 0000000000000000 x16: 0000000000000000
[  614.953067] x15: ffff000011ab5b48 x14: ffff000012969e88
[  614.953073] x13: ffff000012969adf x12: 0000000005f5e100
[  614.953079] x11: 0000000005f5e0ff x10: 000000000000002a
[  614.953086] x9 : abcc77118461cefd x8 : 6d69742033303036
[  614.953092] x7 : ffff000011ae45b0 x6 : 000000023c75873c
[  614.953099] x5 : 0000000000000000 x4 : 0000000000000001
[  614.953105] x3 : fffffffffffffe80 x2 : 5631d047c2fe0900
[  614.953111] x1 : 0000000000000000 x0 : 0000000100013373
[  614.953118] Call trace:
[  614.953125]  kvm_toggle_cache+0x188/0x218
[  614.953131]  access_vm_reg+0x88/0x110
[  614.953136]  perform_access+0x7c/0x1f0
[  614.953142]  kvm_handle_sys_reg+0x130/0x358
[  614.953147]  handle_exit+0x14c/0x1c8
[  614.953153]  kvm_arch_vcpu_ioctl_run+0x324/0xa40
[  614.953159]  kvm_vcpu_ioctl+0x3c8/0xa30
[  614.953169]  do_vfs_ioctl+0xc4/0x7f0
[  614.953175]  ksys_ioctl+0x8c/0xa0
[  614.953180]  __arm64_sys_ioctl+0x28/0x38
[  614.953187]  el0_svc_handler+0xd8/0x1a0
[  614.953194]  el0_svc+0x8/0xc
[  614.953232] ---[ end trace f036f6168107fdfd ]---

which will help with operations and maintenance (OM).

This commit introduces no functional changes.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Ying Fang <fangying1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
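For reference, the check added below relies on time_after() comparing
jiffies values in a wraparound-safe way. A minimal user-space sketch of
that comparison follows. It is not kernel code: SKETCH_HZ, its value of
250 and both sketch_* helpers are hypothetical names chosen for
illustration, and the signed cast assumes the usual two's-complement
conversion that the kernel itself relies on.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's HZ (ticks per second). */
#define SKETCH_HZ 250UL

/*
 * Same idea as the kernel's time_after(a, b): true if a is later than b.
 * The unsigned subtraction wraps, and the signed view of the result is
 * correct as long as a and b are less than half the counter range apart.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * True if an operation that started at tick 'start' and finished at
 * tick 'now' took more than one second.
 */
static bool sketch_overran_one_second(unsigned long start, unsigned long now)
{
	unsigned long timeout = start + SKETCH_HZ; /* may wrap, which is fine */

	return sketch_time_after(now, timeout);
}

/* Small self-check covering the plain and the wrapped cases. */
static void sketch_self_check(void)
{
	assert(!sketch_overran_one_second(1000UL, 1250UL));
	assert(sketch_overran_one_second(1000UL, 1251UL));
	assert(sketch_overran_one_second(ULONG_MAX - 10UL, 300UL));
}
```

Finishing exactly on the deadline does not count as a timeout, which
matches the strict "after" comparison in WARN_ON(time_after(jiffies, timeout)).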
 virt/kvm/arm/mmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index aef836a93086..3a74cf59f5c8 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2511,6 +2511,7 @@ static bool kvm_need_flush_vm(struct kvm_vcpu *vcpu)
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 {
 	bool now_enabled = vcpu_has_cache_enabled(vcpu);
+	unsigned long timeout = jiffies + HZ;
 
 	/*
 	 * If switching the MMU+caches on, need to invalidate the caches.
@@ -2524,5 +2525,12 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 	if (now_enabled)
 		*vcpu_hcr(vcpu) &= ~HCR_TVM;
 
+	/*
+	 * The guest's APs fail to come online after waiting for 1 second.
+	 * Tell luser about this issue if we have already timed out here
+	 * (mostly due to poor cache maintenance performance).
+	 */
+	WARN_ON(time_after(jiffies, timeout));
+
 	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
 }
-- 
2.25.1