From: Sean Christopherson sean.j.christopherson@intel.com
mainline inclusion from mainline-v5.1-rc1 commit 361209e054a2c9f34da090ee1ee4c1e8bfe76a64 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4 CVE: NA
--------------------------------
KVM uses bit 0 of the memslots generation as an "update in-progress" flag, which is used by x86 to prevent caching MMIO access while the memslots are changing. Although the intended behavior is flag-like, e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid caching data from in-flux memslots, the implementation oftentimes treats the bit as part of the generation number itself, e.g. incrementing the generation increments twice, once to set the flag and once to clear it.
Prior to commit 4bd518f1598d ("KVM: use separate generations for each address space"), incorporating the "update in-progress" bit into the generation number largely made sense, e.g. "real" generations are even, "bogus" generations are odd, most code doesn't need to be aware of the bit, etc...
Now that unique memslots generation numbers are assigned to each address space, stealthing the in-progress status into the generation number results in a wide variety of subtle code, e.g. kvm_create_vm() jumps over bit 0 when initializing the memslots generation without any hint as to why.
Explicitly define the flag and convert as much code as possible (which isn't much) to actually treat it like a flag. This paves the way for eventually using a different bit for "update in-progress" so that it can be a flag in truth instead of a awkward extension to the generation number.
Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Jackie Liu liuyun01@kylinos.cn Signed-off-by: Laibin Qiu qiulaibin@huawei.com --- arch/x86/kvm/x86.h | 2 +- include/linux/kvm_host.h | 21 +++++++++++++++++++++ virt/kvm/kvm_main.c | 26 +++++++++++++------------- 3 files changed, 35 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 608e5f8c5d0a..40476fa647cf 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -188,7 +188,7 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu, { u64 gen = kvm_memslots(vcpu->kvm)->generation;
- if (unlikely(gen & 1)) + if (unlikely(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS)) return;
/* diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 27e31d972dfb..4ae67f641951 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -49,6 +49,27 @@ */ #define KVM_MEMSLOT_INVALID (1UL << 16)
+/* + * Bit 0 of the memslot generation number is an "update in-progress flag", + * e.g. is temporarily set for the duration of install_new_memslots(). + * This flag effectively creates a unique generation number that is used to + * mark cached memslot data, e.g. MMIO accesses, as potentially being stale, + * i.e. may (or may not) have come from the previous memslots generation. + * + * This is necessary because the actual memslots update is not atomic with + * respect to the generation number update. Updating the generation number + * first would allow a vCPU to cache a spte from the old memslots using the + * new generation number, and updating the generation number after switching + * to the new memslots would allow cache hits using the old generation number + * to reference the defunct memslots. + * + * This mechanism is used to prevent getting hits in KVM's caches while a + * memslot update is in-progress, and to prevent cache hits *after* updating + * the actual generation number against accesses that were inserted into the + * cache *before* the memslots were updated. + */ +#define KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS BIT_ULL(0) + /* Two fragments for cross MMIO pages. */ #define KVM_MAX_MMIO_FRAGMENTS 2
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 78bcd465c4f4..8aecda31801f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -927,30 +927,30 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, int as_id, struct kvm_memslots *slots) { struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id); - u64 gen; + u64 gen = old_memslots->generation;
- /* - * Set the low bit in the generation, which disables SPTE caching - * until the end of synchronize_srcu_expedited. - */ - WARN_ON(old_memslots->generation & 1); - slots->generation = old_memslots->generation + 1; + WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS); + slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
rcu_assign_pointer(kvm->memslots[as_id], slots); synchronize_srcu_expedited(&kvm->srcu);
/* - * Increment the new memslot generation a second time. This prevents - * vm exits that race with memslot updates from caching a memslot - * generation that will (potentially) be valid forever. - * + * Increment the new memslot generation a second time, dropping the + * update in-progress flag and incrementing then generation based on + * the number of address spaces. This provides a unique and easily + * identifiable generation number while the memslots are in flux. + */ + gen = slots->generation & ~KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS; + + /* * Generations must be unique even across address spaces. We do not need * a global counter for that, instead the generation space is evenly split * across address spaces. For example, with two address spaces, address - * space 0 will use generations 0, 4, 8, ... while * address space 1 will + * space 0 will use generations 0, 4, 8, ... while address space 1 will * use generations 2, 6, 10, 14, ... */ - gen = slots->generation + KVM_ADDRESS_SPACE_NUM * 2 - 1; + gen += KVM_ADDRESS_SPACE_NUM * 2;
kvm_arch_memslots_updated(kvm, gen);