KVM/arm updates for openEuler.
Andrew Murray (10):
      arm64: arm_pmu: Remove unnecessary isb instruction
      arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data
      arm64: KVM: Add accessors to track guest/host only counters
      arm64: arm_pmu: Add !VHE support for exclude_host/exclude_guest attributes
      arm64: KVM: Enable !VHE support for :G/:H perf event modifiers
      arm64: KVM: Enable VHE support for :G/:H perf event modifiers
      arm64: KVM: Avoid isb's by using direct pmxevtyper sysreg
      arm64: docs: Document perf event attributes
      arm64: KVM: Fix perf cycle counter support for VHE
      KVM: arm/arm64: Re-create event when setting counter value

Christoffer Dall (8):
      KVM: arm64: Safety check PSTATE when entering guest and handle IL
      KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELS
      KVM: arm/arm64: vgic: Consider priority and active state for pending irq
      KVM: arm/arm64: Fixup the kvm_exit tracepoint
      KVM: arm/arm64: Remove arch timer workqueue
      KVM: arm64: Make vcpu const in vcpu_read_sys_reg
      KVM: arm/arm64: Fix unintended stage 2 PMD mappings
      KVM: arm/arm64: Simplify bg_timer programming

Eric Auger (1):
      KVM: arm/arm64: vgic: Use a single IO device per redistributor

Feng Tiantian (1):
      fbcon: fix ypos over boundary issue

Heyi Guo (2):
      irqchip/gic-v3-its: Set VPENDING table as inner-shareable
      kvm/vgic-its: flush pending LPIs when nuking DT

James Morse (1):
      KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation

Julien Thierry (2):
      KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock
      KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlock

Kristina Martsenko (1):
      vgic: Add support for 52bit guest physical address

Marc Zyngier (21):
      arm64: KVM: Add trapped system register access tracepoint
      arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macro
      arm: KVM: Add S2_PMD_{MASK, SIZE} constants
      arm: KVM: Add missing kvm_stage2_has_pmd() helper
      arm/arm64: KVM: Statically configure the host's view of MPIDR
      arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is enabled
      KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIs
      KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
      KVM: arm/arm64: vgic: Add LPI translation cache definition
      KVM: arm/arm64: vgic: Add __vgic_put_lpi_locked primitive
      KVM: arm/arm64: vgic-its: Add MSI-LPI translation cache invalidation
      KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on specific commands
      KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on disabling LPIs
      KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on ITS disable
      KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on vgic teardown
      KVM: arm/arm64: vgic-its: Cache successful MSI->LPI translation
      KVM: arm/arm64: vgic-its: Check the LPI translation cache on MSI injection
      KVM: arm/arm64: vgic-irqfd: Implement kvm_arch_set_irq_inatomic
      KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence
      irqchip/gic-v3-its: Make vlpi_lock a spinlock
      KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE

Mark Rutland (1):
      KVM: arm/arm64: Log PSTATE for unhandled sysregs

Punit Agrawal (8):
      KVM: arm/arm64: Share common code in user_mem_abort()
      KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
      KVM: arm/arm64: Introduce helpers to manipulate page table entries
      KVM: arm64: Support dirty page tracking for PUD hugepages
      KVM: arm64: Support PUD hugepage in stage2_is_exec()
      KVM: arm64: Support handling access faults for PUD hugepages
      KVM: arm64: Update age handlers to support PUD hugepages
      KVM: arm64: Add support for creating PUD hugepages at stage 2

Suzuki K Poulose (20):
      kvm: arm/arm64: Remove spurious WARN_ON
      kvm: arm64: Add helper for loading the stage2 setting for a VM
      arm64: Add a helper for PARange to physical shift conversion
      kvm: arm64: Clean up VTCR_EL2 initialisation
      kvm: arm/arm64: Allow arch specific configurations for VM
      kvm: arm64: Configure VTCR_EL2 per VM
      kvm: arm/arm64: Prepare for VM specific stage2 translations
      kvm: arm64: Prepare for dynamic stage2 page table layout
      kvm: arm64: Make stage2 page table layout dynamic
      kvm: arm64: Dynamic configuration of VTTBR mask
      kvm: arm64: Configure VTCR_EL2.SL0 per VM
      kvm: arm64: Switch to per VM IPA limit
      kvm: arm64: Add 52bit support for PAR to HPFAR conversoin
      kvm: arm64: Set a limit on the IPA size
      kvm: arm64: Limit the minimum number of page table levels
      kvm: arm64: Allow tuning the physical address size for VM
      KVM: arm64: Relax the restriction on using stage2 PUD huge mapping
      KVM: arm/arm64: Enforce PTE mappings at stage2 when needed
      KVM: arm/arm64: Fix handling of stage2 huge mappings
      kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP

Xiangyou Xie (2):
      KVM: arm/arm64: vgic-its: Introduce multiple LPI translation caches
      KVM: arm/arm64: vgic-its: Do not execute invalidate MSI-LPI translation cache on movi command

Yiwen Jiang (1):
      kvm: arm/arm64: add irqsave for lpi_cache_lock

Zenghui Yu (7):
      KVM: arm/arm64: Add support for probing Hisi ncsnp capability
      KVM: arm/arm64: Adjust entry/exit and trap related tracepoints
      perf tools arm64: Add support for get_cpuid() function
      perf, kvm/arm64: Add stat support on arm64
      perf, kvm/arm64: perf-kvm-stat to report VM TRAP
      KVM: arm/arm64: Avoid the unnecessary stage 2 translation faults
      KVM: arm/arm64: Only probe CPU type and ncsnp info in hypervisor

honghao (1):
      KVM: arm/arm64: Probe Hisi CPU TYPE from ACPI/DTB
 Documentation/arm64/perf.txt                     |  85 ++++
 Documentation/virtual/kvm/api.txt                |  43 +-
 arch/arm/include/asm/hisi_cpu_model.h            |  18 +
 arch/arm/include/asm/kvm_arm.h                   |   4 +-
 arch/arm/include/asm/kvm_asm.h                   |   4 +
 arch/arm/include/asm/kvm_host.h                  |  25 +-
 arch/arm/include/asm/kvm_mmu.h                   |  76 ++-
 arch/arm/include/asm/stage2_pgtable.h            |  69 ++-
 arch/arm/include/uapi/asm/kvm.h                  |   4 +-
 arch/arm/kvm/coproc.c                            |   4 +-
 arch/arm/kvm/hyp/cp15-sr.c                       |   1 -
 arch/arm64/include/asm/cpufeature.h              |  21 +
 arch/arm64/include/asm/hisi_cpu_model.h          |  21 +
 arch/arm64/include/asm/kvm_arm.h                 | 160 ++++--
 arch/arm64/include/asm/kvm_asm.h                 |  13 +-
 arch/arm64/include/asm/kvm_host.h                |  61 ++-
 arch/arm64/include/asm/kvm_hyp.h                 |  10 +
 arch/arm64/include/asm/kvm_mmu.h                 |  84 +++-
 arch/arm64/include/asm/pgtable-hwdef.h           |   4 +
 arch/arm64/include/asm/pgtable.h                 |   9 +
 arch/arm64/include/asm/ptrace.h                  |   3 +
 arch/arm64/include/asm/stage2_pgtable-nopmd.h    |  42 --
 arch/arm64/include/asm/stage2_pgtable-nopud.h    |  39 --
 arch/arm64/include/asm/stage2_pgtable.h          | 242 ++++++---
 arch/arm64/include/uapi/asm/kvm.h                |   4 +-
 arch/arm64/kernel/asm-offsets.c                  |   1 +
 arch/arm64/kernel/perf_event.c                   |  50 +-
 arch/arm64/kvm/Makefile                          |   3 +-
 arch/arm64/kvm/handle_exit.c                     |  10 +
 arch/arm64/kvm/hyp/Makefile                      |   1 -
 arch/arm64/kvm/hyp/hyp-entry.S                   |  16 +-
 arch/arm64/kvm/hyp/s2-setup.c                    |  90 ----
 arch/arm64/kvm/hyp/switch.c                      |  49 +-
 arch/arm64/kvm/hyp/sysreg-sr.c                   |  20 +-
 arch/arm64/kvm/hyp/tlb.c                         |   4 +-
 arch/arm64/kvm/pmu.c                             | 201 ++++++++
 arch/arm64/kvm/reset.c                           | 103 ++++
 arch/arm64/kvm/sys_regs.c                        |  15 +-
 arch/arm64/kvm/sys_regs.h                        |   4 +
 arch/arm64/kvm/trace.h                           |  70 +++
 drivers/irqchip/irq-gic-v3-its.c                 |  20 +-
 drivers/video/fbdev/core/fbcon.h                 |   5 +-
 include/kvm/arm_arch_timer.h                     |   7 -
 include/kvm/arm_vgic.h                           |  17 +-
 include/linux/irqchip/arm-gic-v3.h               |   8 +
 include/uapi/linux/kvm.h                         |  11 +
 tools/perf/arch/arm64/Makefile                   |   2 +
 tools/perf/arch/arm64/util/Build                 |   1 +
 tools/perf/arch/arm64/util/aarch64_guest_exits.h |  95 ++++
 tools/perf/arch/arm64/util/header.c              |  74 +--
 tools/perf/arch/arm64/util/kvm-stat.c            | 112 +++++
 virt/kvm/arm/arch_timer.c                        |  67 +--
 virt/kvm/arm/arm.c                               |  41 +-
 virt/kvm/arm/hisi_cpu_model.c                    | 116 +++++
 virt/kvm/arm/hyp/vgic-v3-sr.c                    |   4 +-
 virt/kvm/arm/mmu.c                               | 592 ++++++++++++++++-------
 virt/kvm/arm/pmu.c                               |  42 +-
 virt/kvm/arm/trace.h                             |  27 +-
 virt/kvm/arm/vgic/vgic-debug.c                   |   4 +-
 virt/kvm/arm/vgic/vgic-init.c                    |  18 +-
 virt/kvm/arm/vgic/vgic-irqfd.c                   |  36 +-
 virt/kvm/arm/vgic/vgic-its.c                     | 356 ++++++++++++--
 virt/kvm/arm/vgic/vgic-kvm-device.c              |   2 +-
 virt/kvm/arm/vgic/vgic-mmio-v2.c                 |  14 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c                 | 100 ++--
 virt/kvm/arm/vgic/vgic-mmio.c                    |  34 +-
 virt/kvm/arm/vgic/vgic-v2.c                      |   4 +-
 virt/kvm/arm/vgic/vgic-v3.c                      |   8 +-
 virt/kvm/arm/vgic/vgic.c                         | 170 ++++---
 virt/kvm/arm/vgic/vgic.h                         |   6 +
 virt/kvm/kvm_main.c                              |   6 +-
 71 files changed, 2807 insertions(+), 875 deletions(-)
 create mode 100644 Documentation/arm64/perf.txt
 create mode 100644 arch/arm/include/asm/hisi_cpu_model.h
 create mode 100644 arch/arm64/include/asm/hisi_cpu_model.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
 delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
 create mode 100644 arch/arm64/kvm/pmu.c
 create mode 100644 tools/perf/arch/arm64/util/aarch64_guest_exits.h
 create mode 100644 tools/perf/arch/arm64/util/kvm-stat.c
 create mode 100644 virt/kvm/arm/hisi_cpu_model.c
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit 7788a28062aca211979ecd2c334e06e2ef5281cc
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
On a 4-level page table, a pgd entry can be empty, unlike on a 3-level page table. Remove the spurious WARN_ON() in stage2_get_pud().
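To see why the WARN_ON() is spurious: when the PUD level is folded (the 3-level case), stage2_pgd_none() is hard-wired to 0 and the check can never fire, while on a real 4-level table the entry can legitimately be empty and is simply populated on demand. A minimal sketch of the corrected path, mirroring the hunk below:

	/* Sketch of the fixed stage2_get_pud() path (error handling trimmed). */
	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
	if (stage2_pgd_none(*pgd)) {		/* legitimately true on 4-level tables */
		if (!cache)
			return NULL;
		pud = mmu_memory_cache_alloc(cache);
		stage2_pgd_populate(pgd, pud);
		get_page(virt_to_page(pgd));
	}
	return stage2_pud_offset(pgd, addr);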
Acked-by: Christoffer Dall <cdall@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 virt/kvm/arm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index bf330b4..3ec643d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1005,7 +1005,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	pud_t *pud;
 
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
-	if (WARN_ON(stage2_pgd_none(*pgd))) {
+	if (stage2_pgd_none(*pgd)) {
 		if (!cache)
 			return NULL;
 		pud = mmu_memory_cache_alloc(cache);
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit 9f98ddd6686cc9469fb73b11ddd403271d65cbdf
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
We load the stage2 context of a guest for different operations, including running the guest and TLB maintenance on its behalf. As of now, only the VTTBR is private to the guest, but this is about to change with per-VM IPA support. Add a helper to load the stage2 configuration for a VM, which can do the right thing as those future changes land.
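A sketch of the helper as introduced here; the single call site is the point, since once more stage2 state becomes per-VM this remains the one place that loads it (the commented vtcr line is the anticipated follow-up later in this series, not part of this patch):

	static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
	{
		/* write_sysreg(kvm->arch.vtcr, vtcr_el2);  -- added later in the series */
		write_sysreg(kvm->arch.vttbr, vttbr_el2);
	}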
Cc: Christoffer Dall <cdall@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[yu: resolve conflicts with commit c987876a80e7 upstream]
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm64/include/asm/kvm_hyp.h | 9 +++++++++
 arch/arm64/kvm/hyp/switch.c      | 2 +-
 arch/arm64/kvm/hyp/tlb.c         | 4 ++--
 3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index f150c0f..d47526d 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -155,5 +155,14 @@
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
+/*
+ * Must be called from hyp code running at EL2 with an updated VTTBR
+ * and interrupts disabled.
+ */
+static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
+{
+	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+}
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6aff2c3..d82af97 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -202,7 +202,7 @@ void deactivate_traps_vhe_put(void)
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
 {
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	__load_guest_stage2(kvm);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index c041eab..7fcc9c1 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -35,7 +35,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
 	 * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so
 	 * let's flip TGE before executing the TLB operation.
 	 */
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	__load_guest_stage2(kvm);
 	val = read_sysreg(hcr_el2);
 	val &= ~HCR_TGE;
 	write_sysreg(val, hcr_el2);
@@ -45,7 +45,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
 static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
 						  unsigned long *flags)
 {
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	__load_guest_stage2(kvm);
 	isb();
 }
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit ce00e3cb4fb496683708db6bfce470e5c7710ddc
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum physical address range supported by the CPU. Add a helper to decode this field to the actual physical shift. If we hit an unallocated value, return the maximum range supported by the kernel. This will be used by KVM to set VTCR_EL2.T0SZ, as that code is about to be moved; having the helper keeps the code movement cleaner.
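As a usage sketch, the same pattern the series later adopts in __cpu_init_stage2():

	/* Decode ID_AA64MMFR0_EL1.PARange (bits [3:0]) into a PA shift.
	 * e.g. parange == 2 maps to a 40-bit physical range; unallocated
	 * values fall back to CONFIG_ARM64_PA_BITS. */
	int parange = read_sysreg(id_aa64mmfr0_el1) & 0x7;
	u32 phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);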
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm64/include/asm/cpufeature.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9f1946a..103ce1d 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -543,6 +543,27 @@ static inline int arm64_get_ssbd_state(void)
 
 void arm64_set_ssbd_mitigation(bool state);
 
+static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
+{
+	switch (parange) {
+	case 0: return 32;
+	case 1: return 36;
+	case 2: return 40;
+	case 3: return 42;
+	case 4: return 44;
+	case 5: return 48;
+	case 6: return 52;
+	/*
+	 * A future PE could use a value unknown to the kernel.
+	 * However, by the "D10.1.4 Principles of the ID scheme
+	 * for fields in ID registers", ARM DDI 0487C.a, any new
+	 * value is guaranteed to be higher than what we know already.
+	 * As a safe limit, we return the limit supported by the kernel.
+	 */
+	default: return CONFIG_ARM64_PA_BITS;
+	}
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit b2df44ffba363bd90274340e0adfd8137f5cd878
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Use the new helper for converting the parange to the physical shift. Also, add the missing definitions for the VTCR_EL2 register fields and use them instead of hard coding numbers.
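With the new definitions in place, the VTCR_EL2 setup reads as named fields rather than magic numbers; a minimal before/after sketch of the encoding:

	u64 val = VTCR_EL2_FLAGS;

	val |= parange << VTCR_EL2_PS_SHIFT;	/* was: parange << 16 */
	val |= VTCR_EL2_T0SZ(phys_shift);	/* was: 64 - phys_shift, open coded */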
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[yu: resolve conflicts with commit df655b75c43f upstream]
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm64/include/asm/kvm_arm.h |  3 +++
 arch/arm64/kvm/hyp/s2-setup.c    | 34 ++++++++--------------------------
 2 files changed, 11 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8b284cb..cc43ff9 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -110,6 +110,7 @@
 #define VTCR_EL2_RES1		(1U << 31)
 #define VTCR_EL2_HD		(1 << 22)
 #define VTCR_EL2_HA		(1 << 21)
+#define VTCR_EL2_PS_SHIFT	TCR_EL2_PS_SHIFT
 #define VTCR_EL2_PS_MASK	TCR_EL2_PS_MASK
 #define VTCR_EL2_TG0_MASK	TCR_TG0_MASK
 #define VTCR_EL2_TG0_4K		TCR_TG0_4K
@@ -130,6 +131,8 @@
 #define VTCR_EL2_VS_8BIT	(0 << VTCR_EL2_VS_SHIFT)
 #define VTCR_EL2_VS_16BIT	(1 << VTCR_EL2_VS_SHIFT)
 
+#define VTCR_EL2_T0SZ(x)	TCR_T0SZ(x)
+
 /*
  * We configure the Stage-2 page tables to always restrict the IPA space to be
  * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index 603e1ee..e1ca672 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -19,45 +19,27 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/cpufeature.h>
 
 u32 __hyp_text __init_stage2_translation(void)
 {
 	u64 val = VTCR_EL2_FLAGS;
 	u64 parange;
+	u32 phys_shift;
 	u64 tmp;
 
 	/*
 	 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS
-	 * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, while
-	 * PS is only 3. Fortunately, bit 19 is RES0 in VTCR_EL2...
+	 * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, but the
+	 * allocated values are limited to 3bits.
 	 */
 	parange = read_sysreg(id_aa64mmfr0_el1) & 7;
 	if (parange > ID_AA64MMFR0_PARANGE_MAX)
 		parange = ID_AA64MMFR0_PARANGE_MAX;
-	val |= parange << 16;
+	val |= parange << VTCR_EL2_PS_SHIFT;
 
 	/* Compute the actual PARange... */
-	switch (parange) {
-	case 0:
-		parange = 32;
-		break;
-	case 1:
-		parange = 36;
-		break;
-	case 2:
-		parange = 40;
-		break;
-	case 3:
-		parange = 42;
-		break;
-	case 4:
-		parange = 44;
-		break;
-	case 5:
-	default:
-		parange = 48;
-		break;
-	}
+	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
 
 	/*
 	 * ... and clamp it to 40 bits, unless we have some braindead
@@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
 	 * return that value for the rest of the kernel to decide what
 	 * to do.
 	 */
-	val |= 64 - (parange > 40 ? 40 : parange);
+	val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift);
 
 	/*
 	 * Check the availability of Hardware Access Flag / Dirty Bit
@@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
 
 	write_sysreg(val, vtcr_el2);
 
-	return parange;
+	return phys_shift;
 }
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit 5b6c6742b5350a6fb5c631fb99a6bc046a62739c
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Allow the arch backends to perform VM specific initialisation. This will be later used to handle IPA size configuration and per-VM VTCR configuration on arm64.
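A sketch of the resulting init flow (matching the diff below): the generic path delegates to the arch hook, which for now only rejects a non-zero machine type:

	int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
	{
		int ret, cpu;

		ret = kvm_arm_config_vm(kvm, type);	/* arch specific VM setup */
		if (ret)
			return ret;

		/* ... rest of the common VM initialisation, unchanged ... */
		return 0;
	}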
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[yu: resolve conflicts with commit e761a927bc9a upstream]
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm/include/asm/kvm_host.h   | 7 +++++++
 arch/arm64/include/asm/kvm_host.h | 2 ++
 arch/arm64/kvm/reset.c            | 7 +++++++
 virt/kvm/arm/arm.c                | 5 +++--
 4 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d0d0227..6b5a519 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -364,4 +364,11 @@ static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
 
+static inline int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+	return 0;
+}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b3a5c49..d471ee8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -551,4 +551,6 @@ static inline int kvm_arm_have_ssbd(void)
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
 
+int kvm_arm_config_vm(struct kvm *kvm, unsigned long type);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 0688816..848d56b 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -179,3 +179,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	preempt_enable();
 	return ret;
 }
+
+int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+	return 0;
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d982650..d55659d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -120,8 +120,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret, cpu;
 
-	if (type)
-		return -EINVAL;
+	ret = kvm_arm_config_vm(kvm, type);
+	if (ret)
+		return ret;
 
 	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
 	if (!kvm->arch.last_vcpu_ran)
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit 7665f3a8491b0ed3c6f65c0bc3a5424ea8f87731
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Add support for setting the VTCR_EL2 per VM, rather than hard coding a value at boot time per CPU. This will allow us to tune the stage2 page table parameters per VM in later changes.
We compute the VTCR fields based on the system wide sanitised feature registers, except for the hardware management of Access Flags (VTCR_EL2.HA). It is fine to run a system with a mix of CPUs that may or may not update the page table Access Flags. Since the bit is RES0 on CPUs that don't support it, the bit should be ignored on them.
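As a worked example of the encoding for the default 40-bit IPA: T0SZ = 64 - 40 = 24, PS is taken from the sanitised ID_AA64MMFR0_EL1.PARange, VS mirrors the system-wide VMID width, and HA is set unconditionally (a condensed sketch of the code below):

	vtcr  = VTCR_EL2_FLAGS;
	vtcr |= parange << VTCR_EL2_PS_SHIFT;	/* from read_sanitised_ftr_reg() */
	vtcr |= VTCR_EL2_T0SZ(40);		/* T0SZ = 64 - 40 = 24 */
	vtcr |= VTCR_EL2_HA;			/* RES0 where unsupported */
	vtcr |= (kvm_get_vmid_bits() == 16) ? VTCR_EL2_VS_16BIT : VTCR_EL2_VS_8BIT;
	kvm->arch.vtcr = vtcr;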
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[yu: resolve conflicts with commit e761a927bc9a upstream]
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm64/include/asm/kvm_arm.h  |  3 +-
 arch/arm64/include/asm/kvm_asm.h  |  2 --
 arch/arm64/include/asm/kvm_host.h | 12 ++++---
 arch/arm64/include/asm/kvm_hyp.h  |  1 +
 arch/arm64/kvm/hyp/Makefile       |  1 -
 arch/arm64/kvm/hyp/s2-setup.c     | 72 ---------------------------------------
 arch/arm64/kvm/reset.c            | 35 +++++++++++++++++++
 7 files changed, 45 insertions(+), 81 deletions(-)
 delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index cc43ff9..de37143 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -138,8 +138,7 @@
  * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
- * (see hyp-init.S).
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_config_vm().
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 102b5a5..0b53c72 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -72,8 +72,6 @@
 
 extern u32 __kvm_get_mdcr_el2(void);
 
-extern u32 __init_stage2_translation(void);
-
 /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
 #define __hyp_this_cpu_ptr(sym)						\
 	({								\
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d471ee8..73f264d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -63,11 +63,13 @@ struct kvm_arch {
 	u64    vmid_gen;
 	u32    vmid;
 
-	/* 1-level 2nd stage table, protected by kvm->mmu_lock */
+	/* stage2 entry level table */
 	pgd_t *pgd;
 
 	/* VTTBR value associated with above pgd and vmid */
 	u64    vttbr;
+	/* VTCR_EL2 value for this VM */
+	u64    vtcr;
 
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
@@ -465,10 +467,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 static inline void __cpu_init_stage2(void)
 {
-	u32 parange = kvm_call_hyp(__init_stage2_translation);
+	u32 ps;
 
-	WARN_ONCE(parange < 40,
-		  "PARange is %d bits, unsupported configuration!", parange);
+	/* Sanity check for minimum IPA size support */
+	ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7);
+	WARN_ONCE(ps < 40,
+		  "PARange is %d bits, unsupported configuration!", ps);
 }
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index d47526d..c7113bf 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -161,6 +161,7 @@
  */
 static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
 {
+	write_sysreg(kvm->arch.vtcr, vtcr_el2);
 	write_sysreg(kvm->arch.vttbr, vttbr_el2);
 }
 
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index feef06f..ea710f6 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -20,7 +20,6 @@ obj-$(CONFIG_KVM_ARM_HOST) += switch.o
 obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
 obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
-obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
 
 # KVM code is run at a different exception code with a different map, so
 # compiler instrumentation that inserts callbacks or checks into the code may
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
deleted file mode 100644
index e1ca672..00000000
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Copyright (C) 2016 - ARM Ltd
- * Author: Marc Zyngier <marc.zyngier@arm.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/types.h>
-#include <asm/kvm_arm.h>
-#include <asm/kvm_asm.h>
-#include <asm/kvm_hyp.h>
-#include <asm/cpufeature.h>
-
-u32 __hyp_text __init_stage2_translation(void)
-{
-	u64 val = VTCR_EL2_FLAGS;
-	u64 parange;
-	u32 phys_shift;
-	u64 tmp;
-
-	/*
-	 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS
-	 * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, but the
-	 * allocated values are limited to 3bits.
-	 */
-	parange = read_sysreg(id_aa64mmfr0_el1) & 7;
-	if (parange > ID_AA64MMFR0_PARANGE_MAX)
-		parange = ID_AA64MMFR0_PARANGE_MAX;
-	val |= parange << VTCR_EL2_PS_SHIFT;
-
-	/* Compute the actual PARange... */
-	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
-
-	/*
-	 * ... and clamp it to 40 bits, unless we have some braindead
-	 * HW that implements less than that. In all cases, we'll
-	 * return that value for the rest of the kernel to decide what
-	 * to do.
-	 */
-	val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift);
-
-	/*
-	 * Check the availability of Hardware Access Flag / Dirty Bit
-	 * Management in ID_AA64MMFR1_EL1 and enable the feature in VTCR_EL2.
-	 */
-	tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_HADBS_SHIFT) & 0xf;
-	if (tmp)
-		val |= VTCR_EL2_HA;
-
-	/*
-	 * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS
-	 * bit in VTCR_EL2.
-	 */
-	tmp = (read_sysreg(id_aa64mmfr1_el1) >> ID_AA64MMFR1_VMIDBITS_SHIFT) & 0xf;
-	val |= (tmp == ID_AA64MMFR1_VMIDBITS_16) ?
-			VTCR_EL2_VS_16BIT :
-			VTCR_EL2_VS_8BIT;
-
-	write_sysreg(val, vtcr_el2);
-
-	return phys_shift;
-}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 848d56b..e764a48 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -26,6 +26,7 @@
 
 #include <kvm/arm_arch_timer.h>
 
+#include <asm/cpufeature.h>
 #include <asm/cputype.h>
 #include <asm/ptrace.h>
 #include <asm/kvm_arm.h>
@@ -180,9 +181,43 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+/*
+ * Configure the VTCR_EL2 for this VM. The VTCR value is common
+ * across all the physical CPUs on the system. We use system wide
+ * sanitised values to fill in different fields, except for Hardware
+ * Management of Access Flags. HA Flag is set unconditionally on
+ * all CPUs, as it is safe to run with or without the feature and
+ * the bit is RES0 on CPUs that don't support it.
+ */
 int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
 {
+	u64 vtcr = VTCR_EL2_FLAGS;
+	u32 parange, phys_shift;
+
 	if (type)
 		return -EINVAL;
+
+	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
+	if (parange > ID_AA64MMFR0_PARANGE_MAX)
+		parange = ID_AA64MMFR0_PARANGE_MAX;
+	vtcr |= parange << VTCR_EL2_PS_SHIFT;
+
+	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
+	if (phys_shift > KVM_PHYS_SHIFT)
+		phys_shift = KVM_PHYS_SHIFT;
+	vtcr |= VTCR_EL2_T0SZ(phys_shift);
+
+	/*
+	 * Enable the Hardware Access Flag management, unconditionally
+	 * on all CPUs. The features is RES0 on CPUs without the support
+	 * and must be ignored by the CPUs.
+	 */
+	vtcr |= VTCR_EL2_HA;
+
+	/* Set the vmid bits */
+	vtcr |= (kvm_get_vmid_bits() == 16) ?
+		VTCR_EL2_VS_16BIT :
+		VTCR_EL2_VS_8BIT;
+	kvm->arch.vtcr = vtcr;
 	return 0;
 }
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit e55cac5bf2a9cc86b57a9533d6b9e5005bc19b5c
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Right now the stage2 page table for a VM is hard coded, assuming an IPA of 40 bits. As we are about to add support for a per-VM IPA, prepare the stage2 page table helpers to accept the kvm instance so they can make the right decision for the VM. No functional changes. Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE, moves some of the arm32 definitions to align with arm64, and drops the _AC() specifier from constants wherever possible.
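The shape of the conversion, in a before/after sketch (behaviour is unchanged, since every implementation still ignores the kvm argument at this point):

	/* before */
	if (!stage2_pgd_none(*pgd))
		unmap_stage2_puds(kvm, pgd, addr, next);

	/* after: the VM is threaded through, so the layout can later vary per VM */
	if (!stage2_pgd_none(kvm, *pgd))
		unmap_stage2_puds(kvm, pgd, addr, next);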
Cc: Christoffer Dall <cdall@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 arch/arm/include/asm/kvm_arm.h                |   3 +-
 arch/arm/include/asm/kvm_mmu.h                |  13 +--
 arch/arm/include/asm/stage2_pgtable.h         |  54 +++++++-----
 arch/arm64/include/asm/kvm_mmu.h              |   7 +-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  18 ++--
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  16 ++--
 arch/arm64/include/asm/stage2_pgtable.h       |  58 +++++++------
 virt/kvm/arm/arm.c                            |   2 +-
 virt/kvm/arm/mmu.c                            | 119 +++++++++++++-------------
 virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
 10 files changed, 158 insertions(+), 134 deletions(-)
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ab8b37..c3f1f9b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -133,8 +133,7 @@
  * space.
  */
 #define KVM_PHYS_SHIFT	(40)
-#define KVM_PHYS_SIZE	(_AC(1, ULL) << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - _AC(1, ULL))
+
 #define PTRS_PER_S2_PGD	(_AC(1, ULL) << (KVM_PHYS_SHIFT - 30))
 
 /* Virtualization Translation Control Register (VTCR) bits */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 523c499..04ff168 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -35,16 +35,12 @@
 	addr;								\
 })
 
-/*
- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
- */
-#define KVM_MMU_CACHE_MIN_PAGES	2
-
 #ifndef __ASSEMBLY__
 
 #include <linux/highmem.h>
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
+#include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
 #include <asm/pgalloc.h>
 #include <asm/stage2_pgtable.h>
@@ -52,6 +48,13 @@
 /* Ensure compatibility with arm64 */
 #define VA_BITS			32
 
+#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm)		(1ULL << kvm_phys_shift(kvm))
+#define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - 1ULL)
+#define kvm_vttbr_baddr_mask(kvm)	VTTBR_BADDR_MASK
+
+#define stage2_pgd_size(kvm)		(PTRS_PER_S2_PGD * sizeof(pgd_t))
+
 int create_hyp_mappings(void *from, void *to, pgprot_t prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **kaddr,
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..f6a7ea8 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -19,43 +19,53 @@
 #ifndef __ARM_S2_PGTABLE_H_
 #define __ARM_S2_PGTABLE_H_
 
-#define stage2_pgd_none(pgd)			pgd_none(pgd)
-#define stage2_pgd_clear(pgd)			pgd_clear(pgd)
-#define stage2_pgd_present(pgd)			pgd_present(pgd)
-#define stage2_pgd_populate(pgd, pud)		pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(pgd, address)		pud_offset(pgd, address)
-#define stage2_pud_free(pud)			pud_free(NULL, pud)
-
-#define stage2_pud_none(pud)			pud_none(pud)
-#define stage2_pud_clear(pud)			pud_clear(pud)
-#define stage2_pud_present(pud)			pud_present(pud)
-#define stage2_pud_populate(pud, pmd)		pud_populate(NULL, pud, pmd)
-#define stage2_pmd_offset(pud, address)		pmd_offset(pud, address)
-#define stage2_pmd_free(pmd)			pmd_free(NULL, pmd)
-
-#define stage2_pud_huge(pud)			pud_huge(pud)
+/*
+ * kvm_mmu_cache_min_pages() is the number of pages required
+ * to install a stage-2 translation. We pre-allocate the entry
+ * level table at VM creation. Since we have a 3 level page-table,
+ * we need only two pages to add a new mapping.
+ */
+#define kvm_mmu_cache_min_pages(kvm)	2
+
+#define stage2_pgd_none(kvm, pgd)		pgd_none(pgd)
+#define stage2_pgd_clear(kvm, pgd)		pgd_clear(pgd)
+#define stage2_pgd_present(kvm, pgd)		pgd_present(pgd)
+#define stage2_pgd_populate(kvm, pgd, pud)	pgd_populate(NULL, pgd, pud)
+#define stage2_pud_offset(kvm, pgd, address)	pud_offset(pgd, address)
+#define stage2_pud_free(kvm, pud)		pud_free(NULL, pud)
+
+#define stage2_pud_none(kvm, pud)		pud_none(pud)
+#define stage2_pud_clear(kvm, pud)		pud_clear(pud)
+#define stage2_pud_present(kvm, pud)		pud_present(pud)
+#define stage2_pud_populate(kvm, pud, pmd)	pud_populate(NULL, pud, pmd)
+#define stage2_pmd_offset(kvm, pud, address)	pmd_offset(pud, address)
+#define stage2_pmd_free(kvm, pmd)		pmd_free(NULL, pmd)
+
+#define stage2_pud_huge(kvm, pud)		pud_huge(pud)
 
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
-static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
 
 	return (boundary - 1 < end - 1) ? boundary : end;
 }
 
-#define stage2_pud_addr_end(addr, end)		(end)
+#define stage2_pud_addr_end(kvm, addr, end)	(end)
 
-static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t boundary = (addr + PMD_SIZE) & PMD_MASK;
 
 	return (boundary - 1 < end - 1) ? boundary : end;
 }
 
-#define stage2_pgd_index(addr)			pgd_index(addr)
+#define stage2_pgd_index(kvm, addr)		pgd_index(addr)
 
-#define stage2_pte_table_empty(ptep)		kvm_page_empty(ptep)
-#define stage2_pmd_table_empty(pmdp)		kvm_page_empty(pmdp)
-#define stage2_pud_table_empty(pudp)		false
+#define stage2_pte_table_empty(kvm, ptep)	kvm_page_empty(ptep)
+#define stage2_pmd_table_empty(kvm, pmdp)	kvm_page_empty(pmdp)
+#define stage2_pud_table_empty(kvm, pudp)	false
 
 #endif	/* __ARM_S2_PGTABLE_H_ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b255844..b44d753 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -141,8 +141,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
  * We currently only support a 40bit IPA.
  */
 #define KVM_PHYS_SHIFT	(40)
-#define KVM_PHYS_SIZE	(1UL << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1UL)
+
+#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
+#define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_vttbr_baddr_mask(kvm)	VTTBR_BADDR_MASK
 
 #include <asm/stage2_pgtable.h>
 
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
index 2656a0f..0280ded 100644
--- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h
+++ b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
@@ -26,17 +26,17 @@
 #define S2_PMD_SIZE		(1UL << S2_PMD_SHIFT)
 #define S2_PMD_MASK		(~(S2_PMD_SIZE-1))
 
-#define stage2_pud_none(pud)			(0)
-#define stage2_pud_present(pud)			(1)
-#define stage2_pud_clear(pud)			do { } while (0)
-#define stage2_pud_populate(pud, pmd)		do { } while (0)
-#define stage2_pmd_offset(pud, address)		((pmd_t *)(pud))
+#define stage2_pud_none(kvm, pud)		(0)
+#define stage2_pud_present(kvm, pud)		(1)
+#define stage2_pud_clear(kvm, pud)		do { } while (0)
+#define stage2_pud_populate(kvm, pud, pmd)	do { } while (0)
+#define stage2_pmd_offset(kvm, pud, address)	((pmd_t *)(pud))
 
-#define stage2_pmd_free(pmd)			do { } while (0)
+#define stage2_pmd_free(kvm, pmd)		do { } while (0)
 
-#define stage2_pmd_addr_end(addr, end)		(end)
+#define stage2_pmd_addr_end(kvm, addr, end)	(end)
 
-#define stage2_pud_huge(pud)			(0)
-#define stage2_pmd_table_empty(pmdp)		(0)
+#define stage2_pud_huge(kvm, pud)		(0)
+#define stage2_pmd_table_empty(kvm, pmdp)	(0)
 
 #endif
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h
index 5ee87b5..cd6304e 100644
--- a/arch/arm64/include/asm/stage2_pgtable-nopud.h
+++ b/arch/arm64/include/asm/stage2_pgtable-nopud.h
@@ -24,16 +24,16 @@
 #define S2_PUD_SIZE		(_AC(1, UL) << S2_PUD_SHIFT)
 #define S2_PUD_MASK		(~(S2_PUD_SIZE-1))
 
-#define stage2_pgd_none(pgd)			(0)
-#define stage2_pgd_present(pgd)			(1)
-#define stage2_pgd_clear(pgd)			do { } while (0)
-#define stage2_pgd_populate(pgd, pud)		do { } while (0)
+#define stage2_pgd_none(kvm, pgd)		(0)
+#define stage2_pgd_present(kvm, pgd)		(1)
+#define stage2_pgd_clear(kvm, pgd)		do { } while (0)
+#define stage2_pgd_populate(kvm, pgd, pud)	do { } while (0)
 
-#define stage2_pud_offset(pgd, address)		((pud_t *)(pgd))
+#define stage2_pud_offset(kvm, pgd, address)	((pud_t *)(pgd))
 
-#define stage2_pud_free(x)			do { } while (0)
+#define stage2_pud_free(kvm, x)			do { } while (0)
 
-#define stage2_pud_addr_end(addr, end)		(end)
-#define stage2_pud_table_empty(pmdp)		(0)
+#define stage2_pud_addr_end(kvm, addr, end)	(end)
+#define stage2_pud_table_empty(kvm, pmdp)	(0)
 
 #endif
#endif diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 8b68099..1189161 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -55,7 +55,7 @@
/* S2_PGDIR_SHIFT is the size mapped by top-level stage2 entry */ #define S2_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - STAGE2_PGTABLE_LEVELS) -#define S2_PGDIR_SIZE (_AC(1, UL) << S2_PGDIR_SHIFT) +#define S2_PGDIR_SIZE (1UL << S2_PGDIR_SHIFT) #define S2_PGDIR_MASK (~(S2_PGDIR_SIZE - 1))
/* @@ -65,28 +65,30 @@ #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT))
/* - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation - * levels in addition to the PGD. + * kvm_mmmu_cache_min_pages() is the number of pages required to install + * a stage-2 translation. We pre-allocate the entry level page table at + * the VM creation. */ -#define KVM_MMU_CACHE_MIN_PAGES (STAGE2_PGTABLE_LEVELS - 1) +#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1)
#if STAGE2_PGTABLE_LEVELS > 3
#define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1) -#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT) +#define S2_PUD_SIZE (1UL << S2_PUD_SHIFT) #define S2_PUD_MASK (~(S2_PUD_SIZE - 1))
-#define stage2_pgd_none(pgd) pgd_none(pgd) -#define stage2_pgd_clear(pgd) pgd_clear(pgd) -#define stage2_pgd_present(pgd) pgd_present(pgd) -#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud) -#define stage2_pud_offset(pgd, address) pud_offset(pgd, address) -#define stage2_pud_free(pud) pud_free(NULL, pud) +#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) +#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) +#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) +#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) +#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) +#define stage2_pud_free(kvm, pud) pud_free(NULL, pud)
-#define stage2_pud_table_empty(pudp) kvm_page_empty(pudp) +#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp)
-static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end) +static inline phys_addr_t +stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
@@ -99,20 +101,21 @@ static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end) #if STAGE2_PGTABLE_LEVELS > 2
#define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) -#define S2_PMD_SIZE (_AC(1, UL) << S2_PMD_SHIFT) +#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) #define S2_PMD_MASK (~(S2_PMD_SIZE - 1))
-#define stage2_pud_none(pud) pud_none(pud) -#define stage2_pud_clear(pud) pud_clear(pud) -#define stage2_pud_present(pud) pud_present(pud) -#define stage2_pud_populate(pud, pmd) pud_populate(NULL, pud, pmd) -#define stage2_pmd_offset(pud, address) pmd_offset(pud, address) -#define stage2_pmd_free(pmd) pmd_free(NULL, pmd) +#define stage2_pud_none(kvm, pud) pud_none(pud) +#define stage2_pud_clear(kvm, pud) pud_clear(pud) +#define stage2_pud_present(kvm, pud) pud_present(pud) +#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) +#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) +#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd)
-#define stage2_pud_huge(pud) pud_huge(pud) -#define stage2_pmd_table_empty(pmdp) kvm_page_empty(pmdp) +#define stage2_pud_huge(kvm, pud) pud_huge(pud) +#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
-static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) +static inline phys_addr_t +stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;
@@ -121,7 +124,7 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
#endif /* STAGE2_PGTABLE_LEVELS > 2 */
-#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep) +#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)
#if STAGE2_PGTABLE_LEVELS == 2 #include <asm/stage2_pgtable-nopmd.h> @@ -129,10 +132,13 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end) #include <asm/stage2_pgtable-nopud.h> #endif
+#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t))
-#define stage2_pgd_index(addr) (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) +#define stage2_pgd_index(kvm, addr) \ + (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
-static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end) +static inline phys_addr_t +stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d55659d..826e0ed 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -555,7 +555,7 @@ static void update_vttbr(struct kvm *kvm)
 
 	/* update vttbr to be used with the new vmid */
 	pgd_phys = virt_to_phys(kvm->arch.pgd);
-	BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
+	BUG_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm));
 	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
 	kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3ec643d..aad4db4 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -45,7 +45,6 @@
 
 static unsigned long io_map_base;
 
-#define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
 
 #define KVM_S2PTE_FLAG_IS_IOMAP	(1UL << 0)
@@ -150,20 +149,20 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 
 static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
 {
-	pud_t *pud_table __maybe_unused = stage2_pud_offset(pgd, 0UL);
-	stage2_pgd_clear(pgd);
+	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
+	stage2_pgd_clear(kvm, pgd);
 	kvm_tlb_flush_vmid_ipa(kvm, addr);
-	stage2_pud_free(pud_table);
+	stage2_pud_free(kvm, pud_table);
 	put_page(virt_to_page(pgd));
 }
 
 static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
 {
-	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(pud, 0);
-	VM_BUG_ON(stage2_pud_huge(*pud));
-	stage2_pud_clear(pud);
+	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
+	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
+	stage2_pud_clear(kvm, pud);
 	kvm_tlb_flush_vmid_ipa(kvm, addr);
-	stage2_pmd_free(pmd_table);
+	stage2_pmd_free(kvm, pmd_table);
 	put_page(virt_to_page(pud));
 }
 
@@ -252,7 +251,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
-	if (stage2_pte_table_empty(start_pte))
+	if (stage2_pte_table_empty(kvm, start_pte))
 		clear_stage2_pmd_entry(kvm, pmd, start_addr);
 }
 
@@ -262,9 +261,9 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 	phys_addr_t next, start_addr = addr;
 	pmd_t *pmd, *start_pmd;
 
-	start_pmd = pmd = stage2_pmd_offset(pud, addr);
+	start_pmd = pmd = stage2_pmd_offset(kvm, pud, addr);
 	do {
-		next = stage2_pmd_addr_end(addr, end);
+		next = stage2_pmd_addr_end(kvm, addr, end);
 		if (!pmd_none(*pmd)) {
 			if (pmd_thp_or_huge(*pmd)) {
 				pmd_t old_pmd = *pmd;
@@ -281,7 +280,7 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 		}
 	} while (pmd++, addr = next, addr != end);
 
-	if (stage2_pmd_table_empty(start_pmd))
+	if (stage2_pmd_table_empty(kvm, start_pmd))
 		clear_stage2_pud_entry(kvm, pud, start_addr);
 }
 
@@ -291,14 +290,14 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
 	phys_addr_t next, start_addr = addr;
 	pud_t *pud, *start_pud;
 
-	start_pud = pud = stage2_pud_offset(pgd, addr);
+	start_pud = pud = stage2_pud_offset(kvm, pgd, addr);
 	do {
-		next = stage2_pud_addr_end(addr, end);
-		if (!stage2_pud_none(*pud)) {
-			if (stage2_pud_huge(*pud)) {
+		next = stage2_pud_addr_end(kvm, addr, end);
+		if (!stage2_pud_none(kvm, *pud)) {
+			if (stage2_pud_huge(kvm, *pud)) {
 				pud_t old_pud = *pud;
 
-				stage2_pud_clear(pud);
+				stage2_pud_clear(kvm, pud);
 				kvm_tlb_flush_vmid_ipa(kvm, addr);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
@@ -308,7 +307,7 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
 		}
 	} while (pud++, addr = next, addr != end);
 
-	if (stage2_pud_table_empty(start_pud))
+	if (stage2_pud_table_empty(kvm, start_pud))
 		clear_stage2_pgd_entry(kvm, pgd, start_addr);
 }
 
@@ -332,7 +331,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	assert_spin_locked(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Make sure the page table is still active, as another thread
@@ -341,8 +340,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 		 */
 		if (!READ_ONCE(kvm->arch.pgd))
 			break;
-		next = stage2_pgd_addr_end(addr, end);
-		if (!stage2_pgd_none(*pgd))
+		next = stage2_pgd_addr_end(kvm, addr, end);
+		if (!stage2_pgd_none(kvm, *pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);
 		/*
 		 * If the range is too large, release the kvm->mmu_lock
@@ -371,9 +370,9 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
 	pmd_t *pmd;
 	phys_addr_t next;
 
-	pmd = stage2_pmd_offset(pud, addr);
+	pmd = stage2_pmd_offset(kvm, pud, addr);
 	do {
-		next = stage2_pmd_addr_end(addr, end);
+		next = stage2_pmd_addr_end(kvm, addr, end);
 		if (!pmd_none(*pmd)) {
 			if (pmd_thp_or_huge(*pmd))
 				kvm_flush_dcache_pmd(*pmd);
@@ -389,11 +388,11 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
 	pud_t *pud;
 	phys_addr_t next;
 
-	pud = stage2_pud_offset(pgd, addr);
+	pud = stage2_pud_offset(kvm, pgd, addr);
 	do {
-		next = stage2_pud_addr_end(addr, end);
-		if (!stage2_pud_none(*pud)) {
-			if (stage2_pud_huge(*pud))
+		next = stage2_pud_addr_end(kvm, addr, end);
+		if (!stage2_pud_none(kvm, *pud)) {
+			if (stage2_pud_huge(kvm, *pud))
 				kvm_flush_dcache_pud(*pud);
 			else
 				stage2_flush_pmds(kvm, pud, addr, next);
@@ -409,10 +408,10 @@ static void stage2_flush_memslot(struct kvm *kvm,
 	phys_addr_t next;
 	pgd_t *pgd;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
 	do {
-		next = stage2_pgd_addr_end(addr, end);
-		if (!stage2_pgd_none(*pgd))
+		next = stage2_pgd_addr_end(kvm, addr, end);
+		if (!stage2_pgd_none(kvm, *pgd))
 			stage2_flush_puds(kvm, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
@@ -898,7 +897,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 	}
 
 	/* Allocate the HW PGD, making sure that each page gets its own refcount */
-	pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
 	if (!pgd)
 		return -ENOMEM;
 
@@ -987,7 +986,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 
 	spin_lock(&kvm->mmu_lock);
 	if (kvm->arch.pgd) {
-		unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
 		pgd = READ_ONCE(kvm->arch.pgd);
 		kvm->arch.pgd = NULL;
 	}
@@ -995,7 +994,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 
 	/* Free the HW pgd, one page at a time */
 	if (pgd)
-		free_pages_exact(pgd, S2_PGD_SIZE);
+		free_pages_exact(pgd, stage2_pgd_size(kvm));
 }
 
 static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@@ -1004,16 +1003,16 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	pgd_t *pgd;
 	pud_t *pud;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
-	if (stage2_pgd_none(*pgd)) {
+	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	if (stage2_pgd_none(kvm, *pgd)) {
 		if (!cache)
 			return NULL;
 		pud = mmu_memory_cache_alloc(cache);
-		stage2_pgd_populate(pgd, pud);
+		stage2_pgd_populate(kvm, pgd, pud);
 		get_page(virt_to_page(pgd));
 	}
 
-	return stage2_pud_offset(pgd, addr);
+	return stage2_pud_offset(kvm, pgd, addr);
 }
 
 static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@@ -1026,15 +1025,15 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	if (!pud)
 		return NULL;
 
-	if (stage2_pud_none(*pud)) {
+	if (stage2_pud_none(kvm, *pud)) {
 		if (!cache)
 			return NULL;
 		pmd = mmu_memory_cache_alloc(cache);
-		stage2_pud_populate(pud, pmd);
+		stage2_pud_populate(kvm, pud, pmd);
 		get_page(virt_to_page(pud));
 	}
 
-	return stage2_pmd_offset(pud, addr);
+	return stage2_pmd_offset(kvm, pud, addr);
 }
 
 static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
@@ -1208,8 +1207,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	if (writable)
 		pte = kvm_s2pte_mkwrite(pte);
 
-	ret = mmu_topup_memory_cache(&cache, KVM_MMU_CACHE_MIN_PAGES,
-				     KVM_NR_MEM_OBJS);
+	ret = mmu_topup_memory_cache(&cache,
+				     kvm_mmu_cache_min_pages(kvm),
+				     KVM_NR_MEM_OBJS);
 	if (ret)
 		goto out;
 	spin_lock(&kvm->mmu_lock);
@@ -1303,19 +1303,21 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
 
 /**
  * stage2_wp_pmds - write protect PUD range
+ * kvm:		kvm instance for the VM
  * @pud:	pointer to pud entry
  * @addr:	range start address
  * @end:	range end address
  */
-static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
+			   phys_addr_t addr, phys_addr_t end)
 {
 	pmd_t *pmd;
 	phys_addr_t next;
 
-	pmd = stage2_pmd_offset(pud, addr);
+	pmd = stage2_pmd_offset(kvm, pud, addr);
 
 	do {
-		next = stage2_pmd_addr_end(addr, end);
+		next = stage2_pmd_addr_end(kvm, addr, end);
 		if (!pmd_none(*pmd)) {
 			if (pmd_thp_or_huge(*pmd)) {
 				if (!kvm_s2pmd_readonly(pmd))
@@ -1335,18 +1337,19 @@ static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
 *
 * Process PUD entries, for a huge PUD we cause a panic.
 */
-static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
+			   phys_addr_t addr, phys_addr_t end)
 {
 	pud_t *pud;
 	phys_addr_t next;
 
-	pud = stage2_pud_offset(pgd, addr);
+	pud = stage2_pud_offset(kvm, pgd, addr);
 	do {
-		next = stage2_pud_addr_end(addr, end);
-		if (!stage2_pud_none(*pud)) {
+		next = stage2_pud_addr_end(kvm, addr, end);
+		if (!stage2_pud_none(kvm, *pud)) {
 			/* TODO:PUD not supported, revisit later if supported */
-			BUG_ON(stage2_pud_huge(*pud));
-			stage2_wp_pmds(pud, addr, next);
+			BUG_ON(stage2_pud_huge(kvm, *pud));
+			stage2_wp_pmds(kvm, pud, addr, next);
 		}
 	} while (pud++, addr = next, addr != end);
 }
@@ -1362,7 +1365,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 	pgd_t *pgd;
 	phys_addr_t next;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Release kvm_mmu_lock periodically if the memory region is
@@ -1376,9 +1379,9 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 		cond_resched_lock(&kvm->mmu_lock);
 		if (!READ_ONCE(kvm->arch.pgd))
 			break;
-		next = stage2_pgd_addr_end(addr, end);
-		if (stage2_pgd_present(*pgd))
-			stage2_wp_puds(pgd, addr, next);
+		next = stage2_pgd_addr_end(kvm, addr, end);
+		if (stage2_pgd_present(kvm, *pgd))
+			stage2_wp_puds(kvm, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -1527,7 +1530,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	up_read(&current->mm->mmap_sem);
 
 	/* We need minimum second+third level pages */
-	ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
+	ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
 				     KVM_NR_MEM_OBJS);
 	if (ret)
 		return ret;
@@ -1770,7 +1773,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
+	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -2069,7 +2072,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	 * space addressable by the KVM guest IPA space.
 	 */
 	if (memslot->base_gfn + memslot->npages >=
-	    (KVM_PHYS_SIZE >> PAGE_SHIFT))
+	    (kvm_phys_size(kvm) >> PAGE_SHIFT))
 		return -EFAULT;
 
 	down_read(&current->mm->mmap_sem);
diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
index 6ada243..114dce9 100644
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c
+++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
@@ -25,7 +25,7 @@
 int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
 		      phys_addr_t addr, phys_addr_t alignment)
 {
-	if (addr & ~KVM_PHYS_MASK)
+	if (addr & ~kvm_phys_mask(kvm))
 		return -E2BIG;
 
 	if (!IS_ALIGNED(addr, alignment))
From: Suzuki K Poulose <suzuki.poulose@arm.com>

mainline inclusion
from mainline-v4.20-rc1
commit 865b30cdd9b286820aef44393bafc4c0ccee53d0
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Our stage2 page table helpers are statically defined based on a fixed 40-bit IPA and the host page size. As we are about to add support for a configurable IPA size per VM, we need to make the page table checks per VM. This patch prepares the stage2 helpers to ease the transition to a VM-dependent table layout: instead of statically defining the helpers based on the page table levels, we now check the levels inside the helpers and do the right thing. In effect, it simply converts the macros to static inline functions.
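A hedged sketch of the conversion pattern; the kvm_stage2_has_pud() style check is an assumption about how the series resolves the folded-level case at runtime (the truncated diff below does not reach that point):

	/* before: folding decided at compile time via stage2_pgtable-nopud.h */
	/*   #define stage2_pgd_none(kvm, pgd)	(0)                           */

	/* after: decided inside the helper, so it can become per VM (sketch) */
	static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd)
	{
		return kvm_stage2_has_pud(kvm) ? pgd_none(pgd) : false;
	}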
Cc: Eric Auger eric.auger@redhat.com
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/kvm_mmu.h              |  12 +-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 -------
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 ------
 arch/arm64/include/asm/stage2_pgtable.h       | 167 ++++++++++++++++++++------
 4 files changed, 136 insertions(+), 124 deletions(-)
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b44d753..f63f710 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -147,6 +147,12 @@ static inline unsigned long __kern_hyp_va(unsigned long v) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) #define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK
+static inline bool kvm_page_empty(void *ptr) +{ + struct page *ptr_page = virt_to_page(ptr); + return page_count(ptr_page) == 1; +} + #include <asm/stage2_pgtable.h>
int create_hyp_mappings(void *from, void *to, pgprot_t prot); @@ -241,12 +247,6 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); }
-static inline bool kvm_page_empty(void *ptr) -{ - struct page *ptr_page = virt_to_page(ptr); - return page_count(ptr_page) == 1; -} - #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h deleted file mode 100644 index 0280ded..00000000 --- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Copyright (C) 2016 - ARM Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. - */ - -#ifndef __ARM64_S2_PGTABLE_NOPMD_H_ -#define __ARM64_S2_PGTABLE_NOPMD_H_ - -#include <asm/stage2_pgtable-nopud.h> - -#define __S2_PGTABLE_PMD_FOLDED - -#define S2_PMD_SHIFT S2_PUD_SHIFT -#define S2_PTRS_PER_PMD 1 -#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) -#define S2_PMD_MASK (~(S2_PMD_SIZE-1)) - -#define stage2_pud_none(kvm, pud) (0) -#define stage2_pud_present(kvm, pud) (1) -#define stage2_pud_clear(kvm, pud) do { } while (0) -#define stage2_pud_populate(kvm, pud, pmd) do { } while (0) -#define stage2_pmd_offset(kvm, pud, address) ((pmd_t *)(pud)) - -#define stage2_pmd_free(kvm, pmd) do { } while (0) - -#define stage2_pmd_addr_end(kvm, addr, end) (end) - -#define stage2_pud_huge(kvm, pud) (0) -#define stage2_pmd_table_empty(kvm, pmdp) (0) - -#endif diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h deleted file mode 100644 index cd6304e..00000000 --- a/arch/arm64/include/asm/stage2_pgtable-nopud.h +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Copyright (C) 2016 - ARM Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. - */ - -#ifndef __ARM64_S2_PGTABLE_NOPUD_H_ -#define __ARM64_S2_PGTABLE_NOPUD_H_ - -#define __S2_PGTABLE_PUD_FOLDED - -#define S2_PUD_SHIFT S2_PGDIR_SHIFT -#define S2_PTRS_PER_PUD 1 -#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT) -#define S2_PUD_MASK (~(S2_PUD_SIZE-1)) - -#define stage2_pgd_none(kvm, pgd) (0) -#define stage2_pgd_present(kvm, pgd) (1) -#define stage2_pgd_clear(kvm, pgd) do { } while (0) -#define stage2_pgd_populate(kvm, pgd, pud) do { } while (0) - -#define stage2_pud_offset(kvm, pgd, address) ((pud_t *)(pgd)) - -#define stage2_pud_free(kvm, x) do { } while (0) - -#define stage2_pud_addr_end(kvm, addr, end) (end) -#define stage2_pud_table_empty(kvm, pmdp) (0) - -#endif diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 1189161..384f9e9 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -19,6 +19,7 @@ #ifndef __ARM64_S2_PGTABLE_H_ #define __ARM64_S2_PGTABLE_H_
+#include <linux/hugetlb.h> #include <asm/pgtable.h>
/* @@ -71,71 +72,163 @@ */ #define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1)
- -#if STAGE2_PGTABLE_LEVELS > 3 - +/* Stage2 PUD definitions when the level is present */ +#define STAGE2_PGTABLE_HAS_PUD (STAGE2_PGTABLE_LEVELS > 3) #define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1) #define S2_PUD_SIZE (1UL << S2_PUD_SHIFT) #define S2_PUD_MASK (~(S2_PUD_SIZE - 1))
-#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) -#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) -#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) -#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) -#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) -#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) +static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd) +{ + if (STAGE2_PGTABLE_HAS_PUD) + return pgd_none(pgd); + else + return 0; +}
-#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp) +static inline void stage2_pgd_clear(struct kvm *kvm, pgd_t *pgdp) +{ + if (STAGE2_PGTABLE_HAS_PUD) + pgd_clear(pgdp); +}
-static inline phys_addr_t -stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) +static inline bool stage2_pgd_present(struct kvm *kvm, pgd_t pgd) { - phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK; + if (STAGE2_PGTABLE_HAS_PUD) + return pgd_present(pgd); + else + return 1; +}
- return (boundary - 1 < end - 1) ? boundary : end; +static inline void stage2_pgd_populate(struct kvm *kvm, pgd_t *pgd, pud_t *pud) +{ + if (STAGE2_PGTABLE_HAS_PUD) + pgd_populate(NULL, pgd, pud); }
-#endif /* STAGE2_PGTABLE_LEVELS > 3 */ +static inline pud_t *stage2_pud_offset(struct kvm *kvm, + pgd_t *pgd, unsigned long address) +{ + if (STAGE2_PGTABLE_HAS_PUD) + return pud_offset(pgd, address); + else + return (pud_t *)pgd; +}
+static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud) +{ + if (STAGE2_PGTABLE_HAS_PUD) + pud_free(NULL, pud); +} + +static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) +{ + if (STAGE2_PGTABLE_HAS_PUD) + return kvm_page_empty(pudp); + else + return false; +}
-#if STAGE2_PGTABLE_LEVELS > 2 +static inline phys_addr_t +stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) +{ + if (STAGE2_PGTABLE_HAS_PUD) { + phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
+ return (boundary - 1 < end - 1) ? boundary : end; + } else { + return end; + } +} + +/* Stage2 PMD definitions when the level is present */ +#define STAGE2_PGTABLE_HAS_PMD (STAGE2_PGTABLE_LEVELS > 2) #define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) #define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) #define S2_PMD_MASK (~(S2_PMD_SIZE - 1))
-#define stage2_pud_none(kvm, pud) pud_none(pud) -#define stage2_pud_clear(kvm, pud) pud_clear(pud) -#define stage2_pud_present(kvm, pud) pud_present(pud) -#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) -#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) -#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd) +static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud) +{ + if (STAGE2_PGTABLE_HAS_PMD) + return pud_none(pud); + else + return 0; +}
-#define stage2_pud_huge(kvm, pud) pud_huge(pud) -#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) +static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pud) +{ + if (STAGE2_PGTABLE_HAS_PMD) + pud_clear(pud); +}
-static inline phys_addr_t -stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) +static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud) { - phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK; + if (STAGE2_PGTABLE_HAS_PMD) + return pud_present(pud); + else + return 1; +}
- return (boundary - 1 < end - 1) ? boundary : end; +static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd) +{ + if (STAGE2_PGTABLE_HAS_PMD) + pud_populate(NULL, pud, pmd); }
-#endif /* STAGE2_PGTABLE_LEVELS > 2 */ +static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, + pud_t *pud, unsigned long address) +{ + if (STAGE2_PGTABLE_HAS_PMD) + return pmd_offset(pud, address); + else + return (pmd_t *)pud; +}
-#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep) +static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd) +{ + if (STAGE2_PGTABLE_HAS_PMD) + pmd_free(NULL, pmd); +}
-#if STAGE2_PGTABLE_LEVELS == 2 -#include <asm/stage2_pgtable-nopmd.h> -#elif STAGE2_PGTABLE_LEVELS == 3 -#include <asm/stage2_pgtable-nopud.h> -#endif +static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud) +{ + if (STAGE2_PGTABLE_HAS_PMD) + return pud_huge(pud); + else + return 0; +} + +static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp) +{ + if (STAGE2_PGTABLE_HAS_PMD) + return kvm_page_empty(pmdp); + else + return 0; +} + +static inline phys_addr_t +stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) +{ + if (STAGE2_PGTABLE_HAS_PMD) { + phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK; + + return (boundary - 1 < end - 1) ? boundary : end; + } else { + return end; + } +} + +static inline bool stage2_pte_table_empty(struct kvm *kvm, pte_t *ptep) +{ + return kvm_page_empty(ptep); +}
#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t))
-#define stage2_pgd_index(kvm, addr) \ - (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) +static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr) +{ + return (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)); +}
static inline phys_addr_t stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 61fa5a867b6521b4c55865b88bf70c93078d0cf8
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Switch to a dynamic stage2 page table layout based on the given VM. So far we had a common stage2 table layout determined at compile time; now the decision is made per VM instance, depending on its IPA limit. Add helpers to compute the stage2 parameters from the guest's IPA and use them to make the decisions.
The IPA limit is still fixed to 40bits, and the build time check ensuring that stage2 doesn't exceed the host kernel's page table levels is retained. Also make sure that we use the host's pud/pmd level helpers only when those levels are not folded.
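As a worked example (assuming a 4K page host, so PAGE_SHIFT = 12, and the default 40bit IPA), the new helpers reproduce the layout the old compile-time constants described:

    stage2_pgtable_levels(40) = ARM64_HW_PGTABLE_LEVELS(40 - 4)
                              = (36 - 4) / (PAGE_SHIFT - 3) = 3 levels
    pt_levels_pgdir_shift(3)  = ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - 3)
                              = (PAGE_SHIFT - 3) * 3 + 3 = 30
    __s2_pgd_ptrs(40, 3)      = 1 << (40 - 30) = 1024 entries

i.e. two concatenated 512-entry tables at the entry level.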
Cc: Christoffer Dall cdall@kernel.org
Cc: Marc Zyngier marc.zyngier@arm.com
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/stage2_pgtable.h | 84 ++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 32 deletions(-)
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 384f9e9..36a0a11 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -23,6 +23,13 @@ #include <asm/pgtable.h>
/* + * PGDIR_SHIFT determines the size a top-level page table entry can map + * and depends on the number of levels in the page table. Compute the + * PGDIR_SHIFT for a given number of levels. + */ +#define pt_levels_pgdir_shift(lvls) ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (lvls)) + +/* * The hardware supports concatenation of up to 16 tables at stage2 entry level * and we use the feature whenever possible. * @@ -30,11 +37,13 @@ * On arm64, the smallest PAGE_SIZE supported is 4k, which means * (PAGE_SHIFT - 3) > 4 holds for all page sizes. * This implies, the total number of page table levels at stage2 expected - * by the hardware is actually the number of levels required for (KVM_PHYS_SHIFT - 4) + * by the hardware is actually the number of levels required for (IPA_SHIFT - 4) * in normal translations(e.g, stage1), since we cannot have another level in - * the range (KVM_PHYS_SHIFT, KVM_PHYS_SHIFT - 4). + * the range (IPA_SHIFT, IPA_SHIFT - 4). */ -#define STAGE2_PGTABLE_LEVELS ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4) +#define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4) +#define STAGE2_PGTABLE_LEVELS stage2_pgtable_levels(KVM_PHYS_SHIFT) +#define kvm_stage2_levels(kvm) stage2_pgtable_levels(kvm_phys_shift(kvm))
/* * With all the supported VA_BITs and 40bit guest IPA, the following condition @@ -54,33 +63,42 @@ #error "Unsupported combination of guest IPA and host VA_BITS." #endif
-/* S2_PGDIR_SHIFT is the size mapped by top-level stage2 entry */ -#define S2_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - STAGE2_PGTABLE_LEVELS) -#define S2_PGDIR_SIZE (1UL << S2_PGDIR_SHIFT) -#define S2_PGDIR_MASK (~(S2_PGDIR_SIZE - 1)) + +/* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */ +#define stage2_pgdir_shift(kvm) pt_levels_pgdir_shift(kvm_stage2_levels(kvm)) +#define stage2_pgdir_size(kvm) (1ULL << stage2_pgdir_shift(kvm)) +#define stage2_pgdir_mask(kvm) ~(stage2_pgdir_size(kvm) - 1)
/* * The number of PTRS across all concatenated stage2 tables given by the * number of bits resolved at the initial level. */ -#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT)) +#define __s2_pgd_ptrs(ipa, lvls) (1 << ((ipa) - pt_levels_pgdir_shift((lvls)))) +#define __s2_pgd_size(ipa, lvls) (__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t)) + +#define stage2_pgd_ptrs(kvm) __s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)) +#define stage2_pgd_size(kvm) __s2_pgd_size(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
/* * kvm_mmmu_cache_min_pages() is the number of pages required to install * a stage-2 translation. We pre-allocate the entry level page table at * the VM creation. */ -#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1) +#define kvm_mmu_cache_min_pages(kvm) (kvm_stage2_levels(kvm) - 1)
/* Stage2 PUD definitions when the level is present */ -#define STAGE2_PGTABLE_HAS_PUD (STAGE2_PGTABLE_LEVELS > 3) +static inline bool kvm_stage2_has_pud(struct kvm *kvm) +{ + return (CONFIG_PGTABLE_LEVELS > 3) && (kvm_stage2_levels(kvm) > 3); +} + #define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1) #define S2_PUD_SIZE (1UL << S2_PUD_SHIFT) #define S2_PUD_MASK (~(S2_PUD_SIZE - 1))
static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) return pgd_none(pgd); else return 0; @@ -88,13 +106,13 @@ static inline bool stage2_pgd_none(struct kvm *kvm, pgd_t pgd)
static inline void stage2_pgd_clear(struct kvm *kvm, pgd_t *pgdp) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) pgd_clear(pgdp); }
static inline bool stage2_pgd_present(struct kvm *kvm, pgd_t pgd) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) return pgd_present(pgd); else return 1; @@ -102,14 +120,14 @@ static inline bool stage2_pgd_present(struct kvm *kvm, pgd_t pgd)
static inline void stage2_pgd_populate(struct kvm *kvm, pgd_t *pgd, pud_t *pud) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) pgd_populate(NULL, pgd, pud); }
static inline pud_t *stage2_pud_offset(struct kvm *kvm, pgd_t *pgd, unsigned long address) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) return pud_offset(pgd, address); else return (pud_t *)pgd; @@ -117,13 +135,13 @@ static inline pud_t *stage2_pud_offset(struct kvm *kvm,
static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) pud_free(NULL, pud); }
static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) { - if (STAGE2_PGTABLE_HAS_PUD) + if (kvm_stage2_has_pud(kvm)) return kvm_page_empty(pudp); else return false; @@ -132,7 +150,7 @@ static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) static inline phys_addr_t stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { - if (STAGE2_PGTABLE_HAS_PUD) { + if (kvm_stage2_has_pud(kvm)) { phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
return (boundary - 1 < end - 1) ? boundary : end; @@ -142,14 +160,18 @@ static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) }
/* Stage2 PMD definitions when the level is present */ -#define STAGE2_PGTABLE_HAS_PMD (STAGE2_PGTABLE_LEVELS > 2) +static inline bool kvm_stage2_has_pmd(struct kvm *kvm) +{ + return (CONFIG_PGTABLE_LEVELS > 2) && (kvm_stage2_levels(kvm) > 2); +} + #define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) #define S2_PMD_SIZE (1UL << S2_PMD_SHIFT) #define S2_PMD_MASK (~(S2_PMD_SIZE - 1))
static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) return pud_none(pud); else return 0; @@ -157,13 +179,13 @@ static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud)
static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pud) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) pud_clear(pud); }
static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) return pud_present(pud); else return 1; @@ -171,14 +193,14 @@ static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud)
static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) pud_populate(NULL, pud, pmd); }
static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, pud_t *pud, unsigned long address) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) return pmd_offset(pud, address); else return (pmd_t *)pud; @@ -186,13 +208,13 @@ static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) pmd_free(NULL, pmd); }
static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) return pud_huge(pud); else return 0; @@ -200,7 +222,7 @@ static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp) { - if (STAGE2_PGTABLE_HAS_PMD) + if (kvm_stage2_has_pmd(kvm)) return kvm_page_empty(pmdp); else return 0; @@ -209,7 +231,7 @@ static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp) static inline phys_addr_t stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { - if (STAGE2_PGTABLE_HAS_PMD) { + if (kvm_stage2_has_pmd(kvm)) { phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;
return (boundary - 1 < end - 1) ? boundary : end; @@ -223,17 +245,15 @@ static inline bool stage2_pte_table_empty(struct kvm *kvm, pte_t *ptep) return kvm_page_empty(ptep); }
-#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t)) - static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr) { - return (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)); + return (((addr) >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1)); }
static inline phys_addr_t stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { - phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK; + phys_addr_t boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm);
return (boundary - 1 < end - 1) ? boundary : end; }
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 595583306434caafac4a71102dca5a8d32d1a769
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
On arm64, VTTBR_EL2:BADDR holds the base address of the stage2 translation table. The Arm ARM mandates that bits BADDR[x-1:0] be 0, where 'x' is defined, via a set of magic constants, for a given IPA size and number of levels of a translation granule size. This patch reverse engineers that definition so that 'x' can be calculated at runtime for a given IPA and number of page table levels. See the patch for more details.
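As a quick sanity check (using the default 4K/40bit IPA configuration with 3 stage2 levels), the runtime formula agrees with the magic constant it replaces:

    ARM64_VTTBR_X(40, 3) = 40 - 3 * (PAGE_SHIFT - 3) = 40 - 27 = 13

    /* old path: x = VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA */
    x = 37 - 24 = 13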
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/kvm_arm.h | 73 +++++++++++++++++++++++++++++++++++-----
 arch/arm64/include/asm/kvm_mmu.h | 25 +++++++++++++-
 2 files changed, 88 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index de37143..4454546 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -126,7 +126,6 @@ #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) #define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) #define VTCR_EL2_T0SZ_MASK 0x3f -#define VTCR_EL2_T0SZ_40B 24 #define VTCR_EL2_VS_SHIFT 19 #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) @@ -143,11 +142,8 @@ * Note that when using 4K pages, we concatenate two first level page tables * together. With 16K pages, we concatenate 16 first level page tables. * - * The magic numbers used for VTTBR_X in this patch can be found in Tables - * D4-23 and D4-25 in ARM DDI 0487A.b. */
-#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
@@ -158,7 +154,6 @@ * 2 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC 38 #elif defined(CONFIG_ARM64_16K_PAGES) /* * Stage2 translation configuration: @@ -166,7 +161,6 @@ * 2 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC 42 #else /* 4K */ /* * Stage2 translation configuration: @@ -174,13 +168,74 @@ * 3 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC 37 #endif
#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) -#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA) +/* + * ARM VMSAv8-64 defines an algorithm for finding the translation table + * descriptors in section D4.2.8 in ARM DDI 0487C.a. + * + * The algorithm defines the expectations on the translation table + * addresses for each level, based on PAGE_SIZE, entry level + * and the translation table size (T0SZ). The variable "x" in the + * algorithm determines the alignment of a table base address at a given + * level and thus determines the alignment of VTTBR:BADDR for stage2 + * page table entry level. + * Since the number of bits resolved at the entry level could vary + * depending on the T0SZ, the value of "x" is defined based on a + * Magic constant for a given PAGE_SIZE and Entry Level. The + * intermediate levels must be always aligned to the PAGE_SIZE (i.e, + * x = PAGE_SHIFT). + * + * The value of "x" for entry level is calculated as : + * x = Magic_N - T0SZ + * + * where Magic_N is an integer depending on the page size and the entry + * level of the page table as below: + * + * -------------------------------------------- + * | Entry level | 4K 16K 64K | + * -------------------------------------------- + * | Level: 0 (4 levels) | 28 | - | - | + * -------------------------------------------- + * | Level: 1 (3 levels) | 37 | 31 | 25 | + * -------------------------------------------- + * | Level: 2 (2 levels) | 46 | 42 | 38 | + * -------------------------------------------- + * | Level: 3 (1 level) | - | 53 | 51 | + * -------------------------------------------- + * + * We have a magic formula for the Magic_N below: + * + * Magic_N(PAGE_SIZE, Level) = 64 - ((PAGE_SHIFT - 3) * Number_of_levels) + * + * where Number_of_levels = (4 - Level). We are only interested in the + * value for Entry_Level for the stage2 page table. + * + * So, given that T0SZ = (64 - IPA_SHIFT), we can compute 'x' as follows: + * + * x = (64 - ((PAGE_SHIFT - 3) * Number_of_levels)) - (64 - IPA_SHIFT) + * = IPA_SHIFT - ((PAGE_SHIFT - 3) * Number of levels) + * + * Here is one way to explain the Magic Formula: + * + * x = log2(Size_of_Entry_Level_Table) + * + * Since, we can resolve (PAGE_SHIFT - 3) bits at each level, and another + * PAGE_SHIFT bits in the PTE, we have : + * + * Bits_Entry_level = IPA_SHIFT - ((PAGE_SHIFT - 3) * (n - 1) + PAGE_SHIFT) + * = IPA_SHIFT - (PAGE_SHIFT - 3) * n - 3 + * where n = number of levels, and since each pointer is 8bytes, we have: + * + * x = Bits_Entry_Level + 3 + * = IPA_SHIFT - (PAGE_SHIFT - 3) * n + * + * The only constraint here is that, we have to find the number of page table + * levels for a given IPA size (which we do, see stage2_pt_levels()) + */ +#define ARM64_VTTBR_X(ipa, levels) ((ipa) - ((levels) * (PAGE_SHIFT - 3)))
-#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X) #define VTTBR_VMID_SHIFT (UL(48)) #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index f63f710..f1e3cbb 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -145,7 +145,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v) #define kvm_phys_shift(kvm) KVM_PHYS_SHIFT #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) -#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK
static inline bool kvm_page_empty(void *ptr) { @@ -531,5 +530,29 @@ static inline int hyp_map_aux_data(void)
#define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)
+/* + * Get the magic number 'x' for VTTBR:BADDR of this KVM instance. + * With v8.2 LVA extensions, 'x' should be a minimum of 6 with + * 52bit IPS. + */ +static inline int arm64_vttbr_x(u32 ipa_shift, u32 levels) +{ + int x = ARM64_VTTBR_X(ipa_shift, levels); + + return (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && x < 6) ? 6 : x; +} + +static inline u64 vttbr_baddr_mask(u32 ipa_shift, u32 levels) +{ + unsigned int x = arm64_vttbr_x(ipa_shift, levels); + + return GENMASK_ULL(PHYS_MASK_SHIFT - 1, x); +} + +static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm) +{ + return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)); +} + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 7e8130456e067f49693fdcc25b5ba242a9c5568b
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
VTCR_EL2 holds the following key stage2 translation table parameters:

SL0  - Entry level in the page table lookup.
T0SZ - Denotes the size of the memory addressed by the table.
We have been using fixed values for the SL0 depending on the page size as we have a fixed IPA size. But since we are about to make it dynamic, we need to calculate the SL0 at runtime per VM. This patch adds a helper to compute the value of SL0 for a VM based on the IPA size.
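For example, with 4K pages and 3 stage2 levels (entry level 1), the new helper reproduces the VTCR_EL2_SL0_LVL1 value it replaces:

    VTCR_EL2_LVLS_TO_SL0(3) = (2 - (4 - 3)) << VTCR_EL2_SL0_SHIFT
                            = 1 << VTCR_EL2_SL0_SHIFT   /* SL0 = 1 */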
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/kvm_arm.h | 67 ++++++++++++++++++++++++++++------------
 arch/arm64/kvm/reset.c           |  1 +
 2 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 4454546..9de8137 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -124,7 +124,6 @@ #define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA #define VTCR_EL2_SL0_SHIFT 6 #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) -#define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) #define VTCR_EL2_T0SZ_MASK 0x3f #define VTCR_EL2_VS_SHIFT 19 #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) @@ -147,30 +146,60 @@ #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
-#ifdef CONFIG_ARM64_64K_PAGES /* - * Stage2 translation configuration: - * 64kB pages (TG0 = 1) - * 2 level page tables (SL = 1) + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. + * Interestingly, it depends on the page size. + * See D.10.2.121, VTCR_EL2, in ARM DDI 0487C.a + * + * ----------------------------------------- + * | Entry level | 4K | 16K/64K | + * ------------------------------------------ + * | Level: 0 | 2 | - | + * ------------------------------------------ + * | Level: 1 | 1 | 2 | + * ------------------------------------------ + * | Level: 2 | 0 | 1 | + * ------------------------------------------ + * | Level: 3 | - | 0 | + * ------------------------------------------ + * + * The table roughly translates to : + * + * SL0(PAGE_SIZE, Entry_level) = TGRAN_SL0_BASE - Entry_Level + * + * Where TGRAN_SL0_BASE is a magic number depending on the page size: + * TGRAN_SL0_BASE(4K) = 2 + * TGRAN_SL0_BASE(16K) = 3 + * TGRAN_SL0_BASE(64K) = 3 + * provided we take care of ruling out the unsupported cases and + * Entry_Level = 4 - Number_of_levels. + * */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) +#ifdef CONFIG_ARM64_64K_PAGES + +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K +#define VTCR_EL2_TGRAN_SL0_BASE 3UL + #elif defined(CONFIG_ARM64_16K_PAGES) -/* - * Stage2 translation configuration: - * 16kB pages (TG0 = 2) - * 2 level page tables (SL = 1) - */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) + +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K +#define VTCR_EL2_TGRAN_SL0_BASE 3UL + #else /* 4K */ -/* - * Stage2 translation configuration: - * 4kB pages (TG0 = 0) - * 3 level page tables (SL = 1) - */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) + +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K +#define VTCR_EL2_TGRAN_SL0_BASE 2UL + #endif
-#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) +#define VTCR_EL2_LVLS_TO_SL0(levels) \ + ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT) +#define VTCR_EL2_SL0_TO_LVLS(sl0) \ + ((sl0) + 4 - VTCR_EL2_TGRAN_SL0_BASE) +#define VTCR_EL2_LVLS(vtcr) \ + VTCR_EL2_SL0_TO_LVLS(((vtcr) & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT) + +#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) /* * ARM VMSAv8-64 defines an algorithm for finding the translation table * descriptors in section D4.2.8 in ARM DDI 0487C.a. diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index e764a48..587ed62 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -206,6 +206,7 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) if (phys_shift > KVM_PHYS_SHIFT) phys_shift = KVM_PHYS_SHIFT; vtcr |= VTCR_EL2_T0SZ(phys_shift); + vtcr |= VTCR_EL2_LVLS_TO_SL0(kvm_stage2_levels(kvm));
/* * Enable the Hardware Access Flag management, unconditionally
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 13ac4bbcc457d3925b4031cc70e3031fd8b9c3b7
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Now that we can manage the stage2 page table per VM, switch the configuration details to the VM instance. The VTCR is updated with values specific to the VM based on its configuration. Since the guest's IPA size and number of stage2 page table levels are already stored in the VTCR, decode them back from the vtcr field wherever they are needed.
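For example, for a default 40bit VM on a 4K host (T0SZ = 24 and SL0 = 1 programmed into the VTCR at VM creation), the decode helpers recover:

    kvm_phys_shift(kvm)    = VTCR_EL2_IPA(vtcr)      = 64 - 24 = 40
    kvm_stage2_levels(kvm) = VTCR_EL2_SL0_TO_LVLS(1) = 1 + 4 - 2 = 3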
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/kvm_arm.h        | 2 ++
 arch/arm64/include/asm/kvm_mmu.h        | 2 +-
 arch/arm64/include/asm/stage2_pgtable.h | 2 +-
 arch/arm64/kvm/reset.c                  | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 9de8137..d172f7b 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -200,6 +200,8 @@ VTCR_EL2_SL0_TO_LVLS(((vtcr) & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT)
#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) +#define VTCR_EL2_IPA(vtcr) (64 - ((vtcr) & VTCR_EL2_T0SZ_MASK)) + /* * ARM VMSAv8-64 defines an algorithm for finding the translation table * descriptors in section D4.2.8 in ARM DDI 0487C.a. diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index f1e3cbb..48c71c8 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -142,7 +142,7 @@ static inline unsigned long __kern_hyp_va(unsigned long v) */ #define KVM_PHYS_SHIFT (40)
-#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT +#define kvm_phys_shift(kvm) VTCR_EL2_IPA(kvm->arch.vtcr) #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 36a0a11..c62fe11 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -43,7 +43,7 @@ */ #define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4) #define STAGE2_PGTABLE_LEVELS stage2_pgtable_levels(KVM_PHYS_SHIFT) -#define kvm_stage2_levels(kvm) stage2_pgtable_levels(kvm_phys_shift(kvm)) +#define kvm_stage2_levels(kvm) VTCR_EL2_LVLS(kvm->arch.vtcr)
/* * With all the supported VA_BITs and 40bit guest IPA, the following condition diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 587ed62..28edee5 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -206,7 +206,7 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) if (phys_shift > KVM_PHYS_SHIFT) phys_shift = KVM_PHYS_SHIFT; vtcr |= VTCR_EL2_T0SZ(phys_shift); - vtcr |= VTCR_EL2_LVLS_TO_SL0(kvm_stage2_levels(kvm)); + vtcr |= VTCR_EL2_LVLS_TO_SL0(stage2_pgtable_levels(phys_shift));
/* * Enable the Hardware Access Flag management, unconditionally
From: Kristina Martsenko kristina.martsenko@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 8ad50c8985d805923f52a80698010a0a5123c07d
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Add support for handling 52bit guest physical addresses in the VGIC layer. So far we have limited the guest physical address to 48bits by explicitly masking the upper bits. This patch removes that restriction. We do not have to check whether the host supports 52bit, as the gpa is always validated during an access (e.g., kvm_{read/write}_guest(), kvm_is_visible_gfn()). The ITS table save/restore is also unaffected by this enhancement: the DTE entries already store bits[51:8] of the ITT_addr (with a 256byte alignment).
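To illustrate the 52/48bit packing that the new GITS_BASER_ADDR_48_to_52 macro undoes, take a hypothetical table base with PA bits[51:48] = 0xA and bits[47:16] = 0x1234:

    phys  = (0xAULL << 48) | 0x12340000;
    baser = GITS_BASER_PHYS_52_to_48(phys)   /* bits[51:48] fold into [15:12] */
          = 0x12340000 | (0xA << 12)
    GITS_BASER_ADDR_48_to_52(baser)          /* ... and unfold again */
          = (0xAULL << 48) | 0x12340000      /* == phys */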
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Kristina Martsenko kristina.martsenko@arm.com
[ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 include/linux/irqchip/arm-gic-v3.h |  5 +++++
 virt/kvm/arm/vgic/vgic-its.c       | 36 ++++++++++--------------------------
 virt/kvm/arm/vgic/vgic-mmio-v3.c   |  2 --
 3 files changed, 15 insertions(+), 28 deletions(-)
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index f17334d..16389db 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -357,6 +357,8 @@ #define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt) #define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
+#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(51, 12)) + #define GITS_BASER_NR_REGS 8
#define GITS_BASER_VALID (1ULL << 63) @@ -388,6 +390,9 @@ #define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48) #define GITS_BASER_PHYS_52_to_48(phys) \ (((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12) +#define GITS_BASER_ADDR_48_to_52(baser) \ + (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48) + #define GITS_BASER_SHAREABILITY_SHIFT (10) #define GITS_BASER_InnerShareable \ GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 0dbe332..7a9f47e 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -241,13 +241,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id, list_for_each_entry(dev, &(its)->device_list, dev_list) \ list_for_each_entry(ite, &(dev)->itt_head, ite_list)
-/* - * We only implement 48 bits of PA at the moment, although the ITS - * supports more. Let's be restrictive here. - */ -#define BASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 16)) -#define CBASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 12)) - #define GIC_LPI_OFFSET 8192
#define VITS_TYPER_IDBITS 16 @@ -759,6 +752,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, { int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K; u64 indirect_ptr, type = GITS_BASER_TYPE(baser); + phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser); int esz = GITS_BASER_ENTRY_SIZE(baser); int index, idx; gfn_t gfn; @@ -784,7 +778,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (id >= (l1_tbl_size / esz)) return false;
- addr = BASER_ADDRESS(baser) + id * esz; + addr = base + id * esz; gfn = addr >> PAGE_SHIFT;
if (eaddr) @@ -800,7 +794,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
/* Each 1st level entry is represented by a 64-bit value. */ if (kvm_read_guest_lock(its->dev->kvm, - BASER_ADDRESS(baser) + index * sizeof(indirect_ptr), + base + index * sizeof(indirect_ptr), &indirect_ptr, sizeof(indirect_ptr))) return false;
@@ -810,11 +804,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (!(indirect_ptr & BIT_ULL(63))) return false;
- /* - * Mask the guest physical address and calculate the frame number. - * Any address beyond our supported 48 bits of PA will be caught - * by the actual check in the final step. - */ + /* Mask the guest physical address and calculate the frame number. */ indirect_ptr &= GENMASK_ULL(51, 16);
/* Find the address of the actual entry */ @@ -1311,9 +1301,6 @@ static u64 vgic_sanitise_its_baser(u64 reg) GITS_BASER_OUTER_CACHEABILITY_SHIFT, vgic_sanitise_outer_cacheability);
- /* Bits 15:12 contain bits 51:48 of the PA, which we don't support. */ - reg &= ~GENMASK_ULL(15, 12); - /* We support only one (ITS) page size: 64K */ reg = (reg & ~GITS_BASER_PAGE_SIZE_MASK) | GITS_BASER_PAGE_SIZE_64K;
@@ -1332,11 +1319,8 @@ static u64 vgic_sanitise_its_cbaser(u64 reg) GITS_CBASER_OUTER_CACHEABILITY_SHIFT, vgic_sanitise_outer_cacheability);
- /* - * Sanitise the physical address to be 64k aligned. - * Also limit the physical addresses to 48 bits. - */ - reg &= ~(GENMASK_ULL(51, 48) | GENMASK_ULL(15, 12)); + /* Sanitise the physical address to be 64k aligned. */ + reg &= ~GENMASK_ULL(15, 12);
return reg; } @@ -1382,7 +1366,7 @@ static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its) if (!its->enabled) return;
- cbaser = CBASER_ADDRESS(its->cbaser); + cbaser = GITS_CBASER_ADDRESS(its->cbaser);
while (its->cwriter != its->creadr) { int ret = kvm_read_guest_lock(kvm, cbaser + its->creadr, @@ -2241,7 +2225,7 @@ static int vgic_its_restore_device_tables(struct vgic_its *its) if (!(baser & GITS_BASER_VALID)) return 0;
- l1_gpa = BASER_ADDRESS(baser); + l1_gpa = GITS_BASER_ADDR_48_to_52(baser);
if (baser & GITS_BASER_INDIRECT) { l1_esz = GITS_LVL1_ENTRY_SIZE; @@ -2313,7 +2297,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its) { const struct vgic_its_abi *abi = vgic_its_get_abi(its); u64 baser = its->baser_coll_table; - gpa_t gpa = BASER_ADDRESS(baser); + gpa_t gpa = GITS_BASER_ADDR_48_to_52(baser); struct its_collection *collection; u64 val; size_t max_size, filled = 0; @@ -2362,7 +2346,7 @@ static int vgic_its_restore_collection_table(struct vgic_its *its) if (!(baser & GITS_BASER_VALID)) return 0;
- gpa = BASER_ADDRESS(baser); + gpa = GITS_BASER_ADDR_48_to_52(baser);
max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index a2a175b..b3d1f09 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -364,7 +364,6 @@ static u64 vgic_sanitise_pendbaser(u64 reg) vgic_sanitise_outer_cacheability);
reg &= ~PENDBASER_RES0_MASK; - reg &= ~GENMASK_ULL(51, 48);
return reg; } @@ -382,7 +381,6 @@ static u64 vgic_sanitise_propbaser(u64 reg) vgic_sanitise_outer_cacheability);
reg &= ~PROPBASER_RES0_MASK; - reg &= ~GENMASK_ULL(51, 48); return reg; }
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit bc1d7de8c550db2f8fd59458a07fefa863358b8d
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Add support for handling 52bit addresses in PAR to HPFAR conversion. Instead of hardcoding the address limits, we now use PHYS_MASK_SHIFT.
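A worked example (the address is hypothetical): PAR bits [51:12] carry the page number of the faulting IPA, and shifting right by 8 places them at HPFAR bits [43:4]:

    par               = 0x8123456000    /* PA of the faulting page */
    PAR_TO_HPFAR(par) = (par & GENMASK_ULL(51, 12)) >> 8
                      = 0x81234560      /* FIPA[51:12] now in HPFAR[43:4] */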
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/kvm_arm.h | 7 +++++++
 arch/arm64/kvm/hyp/switch.c      | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index d172f7b..f94e15d 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -314,6 +314,13 @@
/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */ #define HPFAR_MASK (~UL(0xf)) +/* + * We have + * PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12] + * HPFAR [PA_Shift - 9 : 4] = FIPA [PA_Shift - 1 : 12] + */ +#define PAR_TO_HPFAR(par) \ + (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
#define kvm_arm_exception_type \ {0, "IRQ" }, \ diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index d82af97..ce355b7 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -267,7 +267,7 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar) return false; /* Translation failed, back to guest */
/* Convert PAR to HPFAR format */ - *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4; + *hpfar = PAR_TO_HPFAR(tmp); return true; }
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 0f62f0e95be29200ab2ab98ca870e22c9b148dfa
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
So far we have restricted the IPA size of the VM to the default value (40bits). Now that we can manage the IPA size per VM and support dynamic stage2 page tables, we can allow VMs to have a larger IPA. This patch introduces the maximum IPA size supported on the host, which is decided by the following factors:
1) Maximum PARange supported by the CPUs - This can be inferred from the system wide safe value.
2) Maximum PA size supported by the host kernel (48 vs 52).
3) Number of levels in the host page table (as we base our stage2 tables on the host table helpers).
Since the stage2 page table code is dependent on the stage1 page table, we always ensure that :
Number of Levels at Stage1 >= Number of Levels at Stage2
So we limit the IPA to make sure that the above condition is satisfied. This will affect the following combinations of VA_BITS and IPA for different page sizes.
Host configuration | Unsupported IPA ranges
39bit VA, 4K       | [44, 48]
36bit VA, 16K      | [41, 48]
42bit VA, 64K      | [47, 52]
Supporting the above combinations needs independent stage2 page table manipulation code, which would require substantial changes. We could pursue that solution independently and switch the page table code once it is ready.
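As a worked example (39bit VA, 4K pages, so PGDIR_SHIFT = 30), the clamp introduced by this patch gives the first row of the table above:

    va_max  = PGDIR_SHIFT + PAGE_SHIFT - 3   /* = 39 */
    va_max += 4                              /* 16-table concatenation: 43 */

so ipa_max is clamped to 43 and the range [44, 48] is unsupported.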
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
[yu: resolve conflicts with commit 358b28f09f0a and e761a927bc9a upstream]
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm/include/asm/kvm_mmu.h    |  2 ++
 arch/arm64/include/asm/kvm_host.h | 12 +++-------
 arch/arm64/kvm/reset.c            | 43 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/arm.c                |  2 ++
 4 files changed, 50 insertions(+), 9 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 04ff168..9b24f37 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -369,6 +369,8 @@ static inline int hyp_map_aux_data(void)
#define kvm_phys_to_vttbr(addr) (addr)
+static inline void kvm_set_ipa_limit(void) {} + #endif /* !__ASSEMBLY__ */
#endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 73f264d..0a82418 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -465,15 +465,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu, int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
-static inline void __cpu_init_stage2(void) -{ - u32 ps; - - /* Sanity check for minimum IPA size support */ - ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7); - WARN_ONCE(ps < 40, - "PARange is %d bits, unsupported configuration!", ps); -} +static inline void __cpu_init_stage2(void) {}
/* Guest/host FPSIMD coordination helpers */ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu); @@ -551,6 +543,8 @@ static inline int kvm_arm_have_ssbd(void) void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu); void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
+void kvm_set_ipa_limit(void); + #define __KVM_HAVE_ARCH_VM_ALLOC struct kvm *kvm_arch_alloc_vm(void); void kvm_arch_free_vm(struct kvm *kvm); diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 28edee5..1b87a88 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -35,6 +35,9 @@ #include <asm/kvm_emulate.h> #include <asm/kvm_mmu.h>
+/* Maximum phys_shift supported for any VM on this host */ +static u32 kvm_ipa_limit; + /* * ARMv8 Reset Values */ @@ -181,6 +184,46 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) return ret; }
+void kvm_set_ipa_limit(void) +{ + unsigned int ipa_max, pa_max, va_max, parange; + + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7; + pa_max = id_aa64mmfr0_parange_to_phys_shift(parange); + + /* Clamp the IPA limit to the PA size supported by the kernel */ + ipa_max = (pa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : pa_max; + /* + * Since our stage2 table is dependent on the stage1 page table code, + * we must always honor the following condition: + * + * Number of levels in Stage1 >= Number of levels in Stage2. + * + * So clamp the ipa limit further down to limit the number of levels. + * Since we can concatenate upto 16 tables at entry level, we could + * go upto 4bits above the maximum VA addressible with the current + * number of levels. + */ + va_max = PGDIR_SHIFT + PAGE_SHIFT - 3; + va_max += 4; + + if (va_max < ipa_max) + ipa_max = va_max; + + /* + * If the final limit is lower than the real physical address + * limit of the CPUs, report the reason. + */ + if (ipa_max < pa_max) + pr_info("kvm: Limiting the IPA size due to kernel %s Address limit\n", + (va_max < pa_max) ? "Virtual" : "Physical"); + + WARN(ipa_max < KVM_PHYS_SHIFT, + "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max); + kvm_ipa_limit = ipa_max; + kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit); +} + /* * Configure the VTCR_EL2 for this VM. The VTCR value is common * across all the physical CPUs on the system. We use system wide diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 826e0ed..8171125 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1441,6 +1441,8 @@ static int init_common_resources(void) kvm_vmid_bits = kvm_get_vmid_bits(); kvm_info("%d-bit VMID\n", kvm_vmid_bits);
+ kvm_set_ipa_limit(); + return 0; }
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 58b3efc820acd3219e89ff014e93346a734229b8
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Since we are about to remove the lower limit on the IPA size, make sure that we do not go down to a 1-level page table (e.g., with a 32bit IPA on a 64K host with concatenation), to avoid splitting the host PMD huge pages at stage2.
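A worked example of the case the commit guards against (32bit IPA on a 64K host, PAGE_SHIFT = 16):

    stage2_pgtable_levels(32) = ARM64_HW_PGTABLE_LEVELS(32 - 4)
                              = (28 - 4) / (16 - 3) = 1 level

With a single level there is no stage2 PMD, so host PMD huge pages would have to be split; forcing lvls = 2 keeps the PMD level.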
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 arch/arm64/include/asm/stage2_pgtable.h |  7 ++++++-
 arch/arm64/kvm/reset.c                  | 10 +++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index c62fe11..2cce769 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -72,8 +72,13 @@ /* * The number of PTRS across all concatenated stage2 tables given by the * number of bits resolved at the initial level. + * If we force more levels than necessary, we may have (stage2_pgdir_shift > IPA), + * in which case, stage2_pgd_ptrs will have one entry. */ -#define __s2_pgd_ptrs(ipa, lvls) (1 << ((ipa) - pt_levels_pgdir_shift((lvls)))) +#define pgd_ptrs_shift(ipa, pgdir_shift) \ + ((ipa) > (pgdir_shift) ? ((ipa) - (pgdir_shift)) : 0) +#define __s2_pgd_ptrs(ipa, lvls) \ + (1 << (pgd_ptrs_shift((ipa), pt_levels_pgdir_shift(lvls)))) #define __s2_pgd_size(ipa, lvls) (__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t))
#define stage2_pgd_ptrs(kvm) __s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)) diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 1b87a88..6e41f74 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -236,6 +236,7 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) { u64 vtcr = VTCR_EL2_FLAGS; u32 parange, phys_shift; + u8 lvls;
if (type) return -EINVAL; @@ -249,7 +250,14 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) if (phys_shift > KVM_PHYS_SHIFT) phys_shift = KVM_PHYS_SHIFT; vtcr |= VTCR_EL2_T0SZ(phys_shift); - vtcr |= VTCR_EL2_LVLS_TO_SL0(stage2_pgtable_levels(phys_shift)); + /* + * Use a minimum 2 level page table to prevent splitting + * host PMD huge pages at stage2. + */ + lvls = stage2_pgtable_levels(phys_shift); + if (lvls < 2) + lvls = 2; + vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
/* * Enable the Hardware Access Flag management, unconditionally
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v4.20-rc1
commit 233a7cb235318223df8133235383f4c595c654c1
category: feature
feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
Allow specifying the physical address size limit for a new VM via the kvm_type argument of the KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in bits[7:0] of the type field. For backward compatibility the value 0 is reserved and implies 40bits. Also, lift the IPA limit to the host limit and allow lower IPA sizes (e.g., 32).
Userspace can check the extension KVM_CAP_ARM_VM_IPA_SIZE for the availability of this feature. The cap check returns the maximum physical address shift supported by the host.
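A minimal userspace sketch of how a VMM might use the new capability (error handling elided; variable names are illustrative, and the usual <fcntl.h>, <sys/ioctl.h> and <linux/kvm.h> includes are assumed):

    int kvm = open("/dev/kvm", O_RDWR);
    int ipa = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE);

    /* ipa == 0 means the extension is absent: only the 40bit default */
    int vm  = ioctl(kvm, KVM_CREATE_VM,
                    ipa ? KVM_VM_TYPE_ARM_IPA_SIZE(ipa) : 0);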
Cc: Marc Zyngier marc.zyngier@arm.com
Cc: Christoffer Dall cdall@kernel.org
Cc: Peter Maydell peter.maydell@linaro.org
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: Radim Krčmář rkrcmar@redhat.com
Reviewed-by: Eric Auger eric.auger@redhat.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Zenghui Yu yuzenghui@huawei.com
Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com
Signed-off-by: Yang Yingliang yangyingliang@huawei.com
---
 Documentation/virtual/kvm/api.txt       | 31 +++++++++++++++++++++++++++++++
 arch/arm64/include/asm/stage2_pgtable.h | 20 --------------------
 arch/arm64/kvm/reset.c                  | 17 +++++++++++++----
 include/uapi/linux/kvm.h                | 10 ++++++++++
 4 files changed, 54 insertions(+), 24 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 8e16017f..4a8e699 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -129,6 +129,37 @@ memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the flag KVM_VM_MIPS_VZ.
+On arm64, the physical address size for a VM (IPA Size limit) is limited +to 40bits by default. The limit can be configured if the host supports the +extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use +KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type +identifier, where IPA_Bits is the maximum width of any physical +address used by the VM. The IPA_Bits is encoded in bits[7-0] of the +machine type identifier. + +e.g, to configure a guest to use 48bit physical address size : + + vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48)); + +The requested size (IPA_Bits) must be : + 0 - Implies default size, 40bits (for backward compatibility) + + or + + N - Implies N bits, where N is a positive integer such that, + 32 <= N <= Host_IPA_Limit + +Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and +is dependent on the CPU capability and the kernel configuration. The limit can +be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION +ioctl() at run-time. + +Please note that configuring the IPA size does not affect the capability +exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects +size of the address translated by the stage2 level (guest physical to +host physical address translations). + + 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST
Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 2cce769..d352f6d 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -42,28 +42,8 @@ * the range (IPA_SHIFT, IPA_SHIFT - 4). */ #define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4) -#define STAGE2_PGTABLE_LEVELS stage2_pgtable_levels(KVM_PHYS_SHIFT) #define kvm_stage2_levels(kvm) VTCR_EL2_LVLS(kvm->arch.vtcr)
-/* - * With all the supported VA_BITs and 40bit guest IPA, the following condition - * is always true: - * - * STAGE2_PGTABLE_LEVELS <= CONFIG_PGTABLE_LEVELS - * - * We base our stage-2 page table walker helpers on this assumption and - * fall back to using the host version of the helper wherever possible. - * i.e, if a particular level is not folded (e.g, PUD) at stage2, we fall back - * to using the host version, since it is guaranteed it is not folded at host. - * - * If the condition breaks in the future, we can rearrange the host level - * definitions and reuse them for stage2. Till then... - */ -#if STAGE2_PGTABLE_LEVELS > CONFIG_PGTABLE_LEVELS -#error "Unsupported combination of guest IPA and host VA_BITS." -#endif - - /* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */ #define stage2_pgdir_shift(kvm) pt_levels_pgdir_shift(kvm_stage2_levels(kvm)) #define stage2_pgdir_size(kvm) (1ULL << stage2_pgdir_shift(kvm)) diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 6e41f74..dfc4b06 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -90,6 +90,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VCPU_EVENTS: r = 1; break; + case KVM_CAP_ARM_VM_IPA_SIZE: + r = kvm_ipa_limit; + break; default: r = 0; } @@ -238,17 +241,23 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type) u32 parange, phys_shift; u8 lvls;
- if (type) + if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK) return -EINVAL;
+ phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type); + if (phys_shift) { + if (phys_shift > kvm_ipa_limit || + phys_shift < 32) + return -EINVAL; + } else { + phys_shift = KVM_PHYS_SHIFT; + } + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7; if (parange > ID_AA64MMFR0_PARANGE_MAX) parange = ID_AA64MMFR0_PARANGE_MAX; vtcr |= parange << VTCR_EL2_PS_SHIFT;
- phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); - if (phys_shift > KVM_PHYS_SHIFT) - phys_shift = KVM_PHYS_SHIFT; vtcr |= VTCR_EL2_T0SZ(phys_shift); /* * Use a minimum 2 level page table to prevent splitting diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 251be35..062e5fd 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,15 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1
/* + * On arm64, machine type can be used to request the physical + * address size for the VM. Bits[7-0] are reserved for the guest + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, 40bits. + */ +#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL +#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \ + ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK) +/* * ioctls for /dev/kvm fds: */ #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) @@ -953,6 +962,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_NESTED_STATE 157 #define KVM_CAP_ARM_INJECT_SERROR_ESR 158 #define KVM_CAP_MSR_PLATFORM_INFO 159 +#define KVM_CAP_ARM_VM_IPA_SIZE 165 /* returns maximum IPA bits for a VM */
#ifdef KVM_CAP_IRQ_ROUTING
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion from mainline-v4.20-rc1 commit e4e11cc0f81ee7be17d6f6fb96128a6d51c0e838 category: bugfix
-------------------------------------------------
This commit adds a paranoid check when entering the guest to make sure we don't attempt running guest code in an equally or more privileged mode than the hypervisor. We also catch other accidental programming of SPSR_EL2 that results in an illegal exception return and report this safely back to the user.
Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts with commit c97962f2e66c hulk] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_asm.h | 1 + arch/arm64/include/asm/ptrace.h | 3 +++ arch/arm64/kvm/handle_exit.c | 7 +++++++ arch/arm64/kvm/hyp/hyp-entry.S | 16 +++++++++++++++- arch/arm64/kvm/hyp/sysreg-sr.c | 19 ++++++++++++++++++- 5 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 0b53c72..aea01a0 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -30,6 +30,7 @@ #define ARM_EXCEPTION_IRQ 0 #define ARM_EXCEPTION_EL1_SERROR 1 #define ARM_EXCEPTION_TRAP 2 +#define ARM_EXCEPTION_IL 3 /* The hyp-stub will return this for any kvm_call_hyp() call */ #define ARM_EXCEPTION_HYP_GONE HVC_STUB_ERR
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index fade1e5..98ea224 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -45,6 +45,9 @@ #define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80) #define GIC_PRIO_PSR_I_SET (1 << 4)
+/* Additional SPSR bits not exposed in the UABI */ +#define PSR_IL_BIT (1 << 20) + /* AArch32-specific ptrace requests */ #define COMPAT_PTRACE_GETREGS 12 #define COMPAT_PTRACE_SETREGS 13 diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index e5e741b..35a81beb 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -284,6 +284,13 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, */ run->exit_reason = KVM_EXIT_FAIL_ENTRY; return 0; + case ARM_EXCEPTION_IL: + /* + * We attempted an illegal exception return. Guest state must + * have been corrupted somehow. Give up. + */ + run->exit_reason = KVM_EXIT_FAIL_ENTRY; + return -EINVAL; default: kvm_pr_unimpl("Unsupported exception type: %d", exception_index); diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S index 24b4fba..b1f14f7 100644 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -162,6 +162,20 @@ el1_error: mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit
+el2_sync: + /* Check for illegal exception return, otherwise panic */ + mrs x0, spsr_el2 + + /* if this was something else, then panic! */ + tst x0, #PSR_IL_BIT + b.eq __hyp_panic + + /* Let's attempt a recovery from the illegal exception return */ + get_vcpu_ptr x1, x0 + mov x0, #ARM_EXCEPTION_IL + b __guest_exit + + el2_error: ldp x0, x1, [sp], #16
@@ -240,7 +254,7 @@ ENTRY(__kvm_hyp_vector) invalid_vect el2t_fiq_invalid // FIQ EL2t invalid_vect el2t_error_invalid // Error EL2t
- invalid_vect el2h_sync_invalid // Synchronous EL2h + valid_vect el2_sync // Synchronous EL2h invalid_vect el2h_irq_invalid // IRQ EL2h invalid_vect el2h_fiq_invalid // FIQ EL2h valid_vect el2_error // Error EL2h diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c index 7414b76..b426e2c 100644 --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -155,8 +155,25 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt) static void __hyp_text __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt) { + u64 pstate = ctxt->gp_regs.regs.pstate; + u64 mode = pstate & PSR_AA32_MODE_MASK; + + /* + * Safety check to ensure we're setting the CPU up to enter the guest + * in a less privileged mode. + * + * If we are attempting a return to EL2 or higher in AArch64 state, + * program SPSR_EL2 with M=EL2h and the IL bit set which ensures that + * we'll take an illegal exception state exception immediately after + * the ERET to the guest. Attempts to return to AArch32 Hyp will + * result in an illegal exception return because EL2's execution state + * is determined by SCR_EL3.RW. + */ + if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t) + pstate = PSR_MODE_EL2h | PSR_IL_BIT; + write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); - write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); + write_sysreg_el2(pstate, spsr);
if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
From: Mark Rutland mark.rutland@arm.com
mainline inclusion from mainline-v5.0-rc1 commit d1878af3a5a6ac00bc8a3edfecf80539ee9c546e category: feature feature: Log PSTATE for unhandled sysregs
-------------------------------------------------
When KVM traps an unhandled sysreg/coproc access from a guest, it logs the guest PC. To aid debugging, it would be helpful to know which exception level the trap came from, along with other PSTATE/CPSR bits, so let's log the PSTATE/CPSR too.
Acked-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/kvm/coproc.c | 4 ++-- arch/arm64/kvm/sys_regs.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c index 871fa50..42ab947 100644 --- a/arch/arm/kvm/coproc.c +++ b/arch/arm/kvm/coproc.c @@ -602,8 +602,8 @@ static int emulate_cp15(struct kvm_vcpu *vcpu, } } else { /* If access function fails, it should complain. */ - kvm_err("Unsupported guest CP15 access at: %08lx\n", - *vcpu_pc(vcpu)); + kvm_err("Unsupported guest CP15 access at: %08lx [%08lx]\n", + *vcpu_pc(vcpu), *vcpu_cpsr(vcpu)); print_cp_instr(params); kvm_inject_undefined(vcpu); } diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0c073f3..1c09683 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1912,8 +1912,8 @@ static void unhandled_cp_access(struct kvm_vcpu *vcpu, WARN_ON(1); }
- kvm_err("Unsupported guest CP%d access at: %08lx\n", - cp, *vcpu_pc(vcpu)); + kvm_err("Unsupported guest CP%d access at: %08lx [%08lx]\n", + cp, *vcpu_pc(vcpu), *vcpu_cpsr(vcpu)); print_sys_reg_instr(params); kvm_inject_undefined(vcpu); } @@ -2063,8 +2063,8 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu, if (likely(r)) { perform_access(vcpu, params, r); } else { - kvm_err("Unsupported guest sys_reg access at: %lx\n", - *vcpu_pc(vcpu)); + kvm_err("Unsupported guest sys_reg access at: %lx [%08lx]\n", + *vcpu_pc(vcpu), *vcpu_cpsr(vcpu)); print_sys_reg_instr(params); kvm_inject_undefined(vcpu); }
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 3f58bf63455588a260b6abc8761c48bc8977b919 category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
The code for operations such as marking the pfn as dirty and performing dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages.
Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 19 deletions(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index aad4db4..58dc9b4 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1481,7 +1481,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, exec_fault, writable, force_pte = false; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1490,7 +1490,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0;
write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1510,11 +1510,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; }
- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { - hugetlb = true; + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { /* + * Fallback to PTE if it's not one of the Stage 2 + * supported hugepage sizes + */ + vma_pagesize = PAGE_SIZE; + + /* * Pages belonging to memslots that don't have the same * alignment for userspace and IPA cannot be mapped using * block descriptors even if the pages belong to a THP for @@ -1579,23 +1585,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock;
- if (!hugetlb && !force_pte) - hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + if (vma_pagesize == PAGE_SIZE && !force_pte) { + /* + * Only PMD_SIZE transparent hugepages(THP) are + * currently supported. This code will need to be + * updated to support other THP sizes. + */ + if (transparent_hugepage_adjust(&pfn, &fault_ipa)) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn);
- if (hugetlb) { + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE);
if (exec_fault) { new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) @@ -1608,16 +1624,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); }
- if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - if (exec_fault) { new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_guest_page(pfn, PAGE_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa))
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 6396b852e46e562f4742ed0a9042b537eb26b8aa category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
The stage 2 fault handler marks a page as executable if it is handling an execution fault, or if it was a permission fault, in which case the executable bit needs to be preserved.

The logic to decide whether the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating yet another copy when support for PUD hugepages is introduced, refactor the code to share the checks needed to mark a page table entry as executable.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 58dc9b4..7fa7982 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1481,7 +1481,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, force_pte = false; + bool write_fault, writable, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1604,19 +1605,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault) invalidate_icache_guest_page(pfn, vma_pagesize);
+ /* + * If we took an execution fault we have made the + * icache/dcache coherent above and should now let the s2 + * mapping be executable. + * + * Write faults (!exec_fault && FSC_PERM) are orthogonal to + * execute permissions, and we preserve whatever we have. + */ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd);
- if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - }
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { @@ -1627,13 +1634,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mark_page_dirty(kvm, gfn); }
- if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pte = kvm_s2pte_mkexec(new_pte); - }
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags); }
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit f8df73388ee25b5e5f1d26249202e7126ca8139d category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry.
The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Acked-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Reviewed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 5 +++++ arch/arm64/include/asm/kvm_mmu.h | 5 +++++ virt/kvm/arm/mmu.c | 14 ++++++++------ 3 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 9b24f37..351d1a5 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -82,6 +82,11 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, #define kvm_mk_pud(pmdp) __pud(__pa(pmdp) | PMD_TYPE_TABLE) #define kvm_mk_pgd(pudp) ({ BUILD_BUG(); 0; })
+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 48c71c8..2e3a4b9 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,6 +184,11 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)
+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 7fa7982..3bcebb1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -607,7 +607,7 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start, addr = start; do { pte = pte_offset_kernel(pmd, addr); - kvm_set_pte(pte, pfn_pte(pfn, prot)); + kvm_set_pte(pte, kvm_pfn_pte(pfn, prot)); get_page(virt_to_page(pte)); pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1202,7 +1202,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, pfn = __phys_to_pfn(pa);
for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) { - pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE); + pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE);
if (writable) pte = kvm_s2pte_mkwrite(pte); @@ -1617,8 +1617,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
if (vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd);
@@ -1627,7 +1629,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); @@ -1884,7 +1886,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) * just like a translation fault and clean the cache to the PoC. */ clean_dcache_guest_page(pfn, PAGE_SIZE); - stage2_pte = pfn_pte(pfn, PAGE_S2); + stage2_pte = kvm_pfn_pte(pfn, PAGE_S2); handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte); }
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 4ea5af53114091e23a8fc279f25637e6c4e892c6 category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs.
Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 15 +++++++++++++++ arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++ virt/kvm/arm/mmu.c | 11 +++++++---- 3 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 351d1a5..0859139 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -87,6 +87,21 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice. + */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + WARN_ON(1); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + WARN_ON(1); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 2e3a4b9..eb50ba4 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); }
+static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#ifdef __PAGETABLE_PMD_FOLDED diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 3bcebb1..21b8bf2 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, do { next = stage2_pud_addr_end(kvm, addr, end); if (!stage2_pud_none(kvm, *pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(kvm, *pud)); - stage2_wp_pmds(kvm, pud, addr, next); + if (stage2_pud_huge(kvm, *pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(kvm, pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1392,7 +1395,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 86d1c55ea605025f78d026e7fc3a2bb4c3fc2d6a category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries are used to perform i-cache invalidation on first execute.
Provide trivial implementations of arm32 helpers to allow sharing of code.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 6 ++++ arch/arm64/include/asm/kvm_mmu.h | 5 ++++ arch/arm64/include/asm/pgtable-hwdef.h | 2 ++ virt/kvm/arm/mmu.c | 53 ++++++++++++++++++++++++++++++---- 4 files changed, 61 insertions(+), 5 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 0859139..0281257 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; }
+static inline bool kvm_s2pud_exec(pud_t *pud) +{ + WARN_ON(1); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index eb50ba4..e07f5a1 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) return kvm_s2pte_readonly((pte_t *)pudp); }
+static inline bool kvm_s2pud_exec(pud_t *pudp) +{ + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index fd208ea..10ae592 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
+#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ + /* * Memory Attribute override for Stage-2 (MemAttr[3:0]) */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 21b8bf2..a8245db 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache return 0; }
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +/* + * stage2_get_leaf_entry - walk the stage2 VM page tables and return + * true if a valid and present leaf-entry is found. A pointer to the + * leaf-entry is returned in the appropriate level variable - pudpp, + * pmdpp, ptepp. + */ +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp) { + pud_t *pudp; pmd_t *pmdp; pte_t *ptep;
- pmdp = stage2_get_pmd(kvm, NULL, addr); + *pudpp = NULL; + *pmdpp = NULL; + *ptepp = NULL; + + pudp = stage2_get_pud(kvm, NULL, addr); + if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp)) + return false; + + if (stage2_pud_huge(kvm, *pudp)) { + *pudpp = pudp; + return true; + } + + pmdp = stage2_pmd_offset(kvm, pudp, addr); if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) return false;
- if (pmd_thp_or_huge(*pmdp)) - return kvm_s2pmd_exec(pmdp); + if (pmd_thp_or_huge(*pmdp)) { + *pmdpp = pmdp; + return true; + }
ptep = pte_offset_kernel(pmdp, addr); if (!ptep || pte_none(*ptep) || !pte_present(*ptep)) return false;
- return kvm_s2pte_exec(ptep); + *ptepp = ptep; + return true; +} + +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +{ + pud_t *pudp; + pmd_t *pmdp; + pte_t *ptep; + bool found; + + found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep); + if (!found) + return false; + + if (pudp) + return kvm_s2pud_exec(pudp); + else if (pmdp) + return kvm_s2pmd_exec(pmdp); + else + return kvm_s2pte_exec(ptep); }
static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit eb3f0624ea082def887acc79e97934e27d0188b7 category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered.
Provide trivial helpers for arm32 to allow sharing of code.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 9 +++++++++ arch/arm64/include/asm/kvm_mmu.h | 7 +++++++ arch/arm64/include/asm/pgtable.h | 6 ++++++ virt/kvm/arm/mmu.c | 22 +++++++++++----------- 4 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 0281257..05a34b8b 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -85,6 +85,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pud_pfn(pud) ({ WARN_ON(1); 0; }) + + #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
/* @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; }
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index e07f5a1..b848ee4 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -187,6 +187,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); }
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index b17199c..fb735bd 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -296,6 +296,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); }
+static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -363,6 +368,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
+#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud))
#define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a8245db..6eb68ff 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1704,6 +1704,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { + pud_t *pud; pmd_t *pmd; pte_t *pte; kvm_pfn_t pfn; @@ -1713,24 +1714,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
spin_lock(&vcpu->kvm->mmu_lock);
- pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte)) goto out;
- if (pmd_thp_or_huge(*pmd)) { /* THP, HugeTLB */ + if (pud) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + } else if (pmd) { /* THP, HugeTLB */ *pmd = pmd_mkyoung(*pmd); pfn = pmd_pfn(*pmd); pfn_valid = true; - goto out; + } else { + *pte = pte_mkyoung(*pte); /* Just a page... */ + pfn = pte_pfn(*pte); + pfn_valid = true; }
- pte = pte_offset_kernel(pmd, fault_ipa); - if (pte_none(*pte)) /* Nothing there either */ - goto out; - - *pte = pte_mkyoung(*pte); /* Just a page... */ - pfn = pte_pfn(*pte); - pfn_valid = true; out: spin_unlock(&vcpu->kvm->mmu_lock); if (pfn_valid)
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 35a63966194dd994f44150f07398c62f8dca011e category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered.
Provide trivial helpers for arm32 to allow sharing code.
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 6 ++++++ arch/arm64/include/asm/kvm_mmu.h | 5 +++++ arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 39 ++++++++++++++++++++------------------- 4 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 05a34b8b..6207cd3 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; }
+static inline bool kvm_s2pud_young(pud_t pud) +{ + WARN_ON(1); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b848ee4..0743d80 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); }
+static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index fb735bd..fd7bf39 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -368,6 +368,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
+#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 6eb68ff..59f4d21 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); }
+static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1938,42 +1943,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0;
- if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return stage2_pudp_test_and_clear_young(pud); + else if (pmd) return stage2_pmdp_test_and_clear_young(pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (pte_none(*pte)) - return 0; - - return stage2_ptep_test_and_clear_young(pte); + else + return stage2_ptep_test_and_clear_young(pte); }
static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0;
- if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return kvm_s2pud_young(*pud); + else if (pmd) return pmd_young(*pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (!pte_none(*pte)) /* Just a page... */ + else return pte_young(*pte); - - return 0; }
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
From: Punit Agrawal punit.agrawal@arm.com
mainline inclusion from mainline-v5.0-rc1 commit b8e0ba7c8bea994011aff3b4c35256b180fab874 category: feature feature: Support PUD hugepage at stage 2
-------------------------------------------------
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages.
Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries.
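For orientation, the block sizes involved with a 4K granule (the standard arm64 values, quoted here for illustration; other granules shift these accordingly):

	PAGE_SHIFT = 12  ->  a PTE maps         4KB
	PMD_SHIFT  = 21  ->  a PMD block maps   2MB
	PUD_SHIFT  = 30  ->  a PUD block maps   1GB

so a single PUD block entry stands in for 512 PMD blocks (or 262,144 PTEs) and costs one TLB entry on cores that can cache 1GB blocks.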
Signed-off-by: Punit Agrawal punit.agrawal@arm.com Reviewed-by: Christoffer Dall christoffer.dall@arm.com Cc: Russell King linux@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_mmu.h | 20 +++++++ arch/arm/include/asm/stage2_pgtable.h | 5 ++ arch/arm64/include/asm/kvm_mmu.h | 16 +++++ arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 104 +++++++++++++++++++++++++++++++-- 6 files changed, 143 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 6207cd3..b713db7 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -84,11 +84,14 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0))
#define kvm_pud_pfn(pud) ({ WARN_ON(1); 0; })
#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud) ( {WARN_ON(1); pud; })
/* * The following kvm_*pud*() functions are provided strictly to allow @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; }
+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + WARN_ON(1); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + WARN_ON(1); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + WARN_ON(1); + return pud; +} + static inline bool kvm_s2pud_exec(pud_t *pud) { WARN_ON(1); diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index f6a7ea8..f901716 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -68,4 +68,9 @@ #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) #define stage2_pud_table_empty(kvm, pudp) false
+static inline bool kvm_stage2_has_pud(struct kvm *kvm) +{ + return false; +} + #endif /* __ARM_S2_PGTABLE_H_ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 0743d80..399228f 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,12 +184,16 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)
+#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) + #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)
#define kvm_pud_pfn(pud) pud_pfn(pud)
#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)
static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; }
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; }
+static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592..e327665 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
+#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
/* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index fd7bf39..6341ca2 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -372,6 +372,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud))
+#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT)) + #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) #define __phys_to_pud_val(phys) __phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 59f4d21..160b566 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -115,6 +115,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) put_page(virt_to_page(pmd)); }
+/** + * stage2_dissolve_pud() - clear and flush huge PUD entry + * @kvm: pointer to kvm structure. + * @addr: IPA + * @pud: pud pointer for IPA + * + * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks all + * pages in the range dirty. + */ +static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp) +{ + if (!stage2_pud_huge(kvm, *pudp)) + return; + + stage2_pud_clear(kvm, pudp); + kvm_tlb_flush_vmid_ipa(kvm, addr); + put_page(virt_to_page(pudp)); +} + static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min, int max) { @@ -1022,7 +1041,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache pmd_t *pmd;
pud = stage2_get_pud(kvm, cache, addr); - if (!pud) + if (!pud || stage2_pud_huge(kvm, *pud)) return NULL;
if (stage2_pud_none(kvm, *pud)) { @@ -1083,6 +1102,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache return 0; }
+static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, + phys_addr_t addr, const pud_t *new_pudp) +{ + pud_t *pudp, old_pud; + + pudp = stage2_get_pud(kvm, cache, addr); + VM_BUG_ON(!pudp); + + old_pud = *pudp; + + /* + * A large number of vcpus faulting on the same stage 2 entry, + * can lead to a refault due to the + * stage2_pud_clear()/tlb_flush(). Skip updating the page + * tables if there is no change. + */ + if (pud_val(old_pud) == pud_val(*new_pudp)) + return 0; + + if (stage2_pud_present(kvm, old_pud)) { + stage2_pud_clear(kvm, pudp); + kvm_tlb_flush_vmid_ipa(kvm, addr); + } else { + get_page(virt_to_page(pudp)); + } + + kvm_set_pud(pudp, *new_pudp); + return 0; +} + /* * stage2_get_leaf_entry - walk the stage2 VM page tables and return * true if a valid and present leaf-entry is found. A pointer to the @@ -1149,6 +1198,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, phys_addr_t addr, const pte_t *new_pte, unsigned long flags) { + pud_t *pud; pmd_t *pmd; pte_t *pte, old_pte; bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP; @@ -1157,7 +1207,31 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, VM_BUG_ON(logging_active && !cache);
/* Create stage-2 page table mapping - Levels 0 and 1 */ - pmd = stage2_get_pmd(kvm, cache, addr); + pud = stage2_get_pud(kvm, cache, addr); + if (!pud) { + /* + * Ignore calls from kvm_set_spte_hva for unallocated + * address ranges. + */ + return 0; + } + + /* + * While dirty page logging - dissolve huge PUD, then continue + * on to allocate page. + */ + if (logging_active) + stage2_dissolve_pud(kvm, addr, pud); + + if (stage2_pud_none(kvm, *pud)) { + if (!cache) + return 0; /* ignore calls from kvm_set_spte_hva */ + pmd = mmu_memory_cache_alloc(cache); + stage2_pud_populate(kvm, pud, pmd); + get_page(virt_to_page(pud)); + } + + pmd = stage2_pmd_offset(kvm, pud, addr); if (!pmd) { /* * Ignore calls from kvm_set_spte_hva for unallocated @@ -1563,12 +1637,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, }
vma_pagesize = vma_kernel_pagesize(vma); - if (vma_pagesize == PMD_SIZE && !logging_active) { - gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; + /* + * PUD level may not exist for a VM but PMD is guaranteed to + * exist. + */ + if ((vma_pagesize == PMD_SIZE || + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) && + !logging_active) { + gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; } else { /* * Fallback to PTE if it's not one of the Stage 2 - * supported hugepage sizes + * supported hugepage sizes or the corresponding level + * doesn't exist */ vma_pagesize = PAGE_SIZE;
@@ -1667,7 +1748,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, needs_exec = exec_fault || (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
- if (vma_pagesize == PMD_SIZE) { + if (vma_pagesize == PUD_SIZE) { + pud_t new_pud = kvm_pfn_pud(pfn, mem_type); + + new_pud = kvm_pud_mkhuge(new_pud); + if (writable) + new_pud = kvm_s2pud_mkwrite(new_pud); + + if (needs_exec) + new_pud = kvm_s2pud_mkexec(new_pud); + + ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud); + } else if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
new_pmd = kvm_pmd_mkhuge(new_pmd);
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 6992195cc6c6c4d673a3266ae59cbeeb746d61af category: feature feature: Dynamic IPA and 52bit IPA support
-------------------------------------------------
In attempting to re-construct the logic for our stage 2 page table layout, I found the reasoning in the comment explaining how we calculate the number of levels used for stage 2 page tables a bit backwards.
This commit attempts to clarify the comment, to make it slightly easier to read without having the Arm ARM open on the right page.
While we're at it, fixup a typo in a comment that was recently changed.
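As a worked example of the clarified rule, take the default 40bit IPA with a 4K granule, where each level resolves PAGE_SHIFT - 3 = 9 bits (quoting the arm64 definition as I read it, ARM64_HW_PGTABLE_LEVELS(va_bits) == ((va_bits) - 4) / (PAGE_SHIFT - 3)):

	stage2_pgtable_levels(40) = ARM64_HW_PGTABLE_LEVELS(40 - 4)
	                          = (36 - 4) / (PAGE_SHIFT - 3)	/* = 32 / 9 */
	                          = 3

Three levels resolve 9 + 9 + 9 + 12 = 39 bits; the one remaining bit is supplied by the entry-level concatenation (two concatenated tables in this case).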
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/stage2_pgtable.h | 16 +++++++--------- virt/kvm/arm/mmu.c | 2 +- 2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index d352f6d..5412fa4 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -30,16 +30,14 @@ #define pt_levels_pgdir_shift(lvls) ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (lvls))
/* - * The hardware supports concatenation of up to 16 tables at stage2 entry level - * and we use the feature whenever possible. + * The hardware supports concatenation of up to 16 tables at stage2 entry + * level and we use the feature whenever possible, which means we resolve 4 + * additional bits of address at the entry level. * - * Now, the minimum number of bits resolved at any level is (PAGE_SHIFT - 3). - * On arm64, the smallest PAGE_SIZE supported is 4k, which means - * (PAGE_SHIFT - 3) > 4 holds for all page sizes. - * This implies, the total number of page table levels at stage2 expected - * by the hardware is actually the number of levels required for (IPA_SHIFT - 4) - * in normal translations(e.g, stage1), since we cannot have another level in - * the range (IPA_SHIFT, IPA_SHIFT - 4). + * This implies, the total number of page table levels required for + * IPA_SHIFT at stage2 expected by the hardware can be calculated using + * the same logic used for the (non-collapsable) stage1 page tables but for + * (IPA_SHIFT - 4). */ #define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4) #define kvm_stage2_levels(kvm) VTCR_EL2_LVLS(kvm->arch.vtcr) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 160b566..8ff66e6 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1356,7 +1356,7 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) struct page *page = pfn_to_page(pfn);
/* - * PageTransCompoungMap() returns true for THP and + * PageTransCompoundMap() returns true for THP and * hugetlbfs. Make sure the adjustment is done only for THP * pages. */
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 9009782a4937ad0f4a4be6945080d7fa77fa4092 category: feature feature: vgic check pending enhancement
-------------------------------------------------
When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well.
Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ.
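For readers less familiar with GIC priority semantics: numerically lower priority values are more urgent, and the PMR masks anything at or below its level. A paraphrase of the updated predicate from the hunk below:

	/* An IRQ only counts as deliverable when all of these hold: */
	pending = irq_is_pending(irq) &&	/* pending per its level/edge state */
		  irq->enabled        &&	/* not disabled                     */
		  !irq->active        &&	/* not already being serviced       */
		  irq->priority < vmcr.pmr;	/* not masked by the priority mask  */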
Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index 4040a33..a998bf7 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -926,6 +926,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) struct vgic_irq *irq; bool pending = false; unsigned long flags; + struct vgic_vmcr vmcr;
if (!vcpu->kvm->arch.vgic.enabled) return false; @@ -933,11 +934,15 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last) return true;
+ vgic_get_vmcr(vcpu, &vmcr); + spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { spin_lock(&irq->irq_lock); - pending = irq_is_pending(irq) && irq->enabled; + pending = irq_is_pending(irq) && irq->enabled && + !irq->active && + irq->priority < vmcr.pmr; spin_unlock(&irq->irq_lock);
if (pending)
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion from mainline-v5.0-rc1 commit 71a7e47f39a2fb971ef9934e98ba089581eff641 category: feature feature: kvm tracepoint enhancement
-------------------------------------------------
The kvm_exit tracepoint strangely always reported exits as being IRQs. This seems to be because either the __print_symbolic or the tracepoint macros use a variable named idx.
Take this chance to update the tracepoint fields to reflect the arm64 architectural concepts that we pass to the tracepoint, and move the exception type table to the same location and header file as the exit handling code.
We also clear out the exception code to 0 for IRQ exits (which translates to UNKNOWN in text) to make it slightly less confusing to parse the trace output.
Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_arm.h | 4 ---- arch/arm64/include/asm/kvm_asm.h | 6 ++++++ virt/kvm/arm/trace.h | 18 +++++++++--------- 3 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index f94e15d..cac7eb0 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -322,10 +322,6 @@ #define PAR_TO_HPFAR(par) \ (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
-#define kvm_arm_exception_type \ - {0, "IRQ" }, \ - {1, "TRAP" } - #define ECN(x) { ESR_ELx_EC_##x, #x }
#define kvm_arm_exception_class \ diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index aea01a0..b2e12c9 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -34,6 +34,12 @@ /* The hyp-stub will return this for any kvm_call_hyp() call */ #define ARM_EXCEPTION_HYP_GONE HVC_STUB_ERR
+#define kvm_arm_exception_type \ + {ARM_EXCEPTION_IRQ, "IRQ" }, \ + {ARM_EXCEPTION_EL1_SERROR, "SERROR" }, \ + {ARM_EXCEPTION_TRAP, "TRAP" }, \ + {ARM_EXCEPTION_HYP_GONE, "HYP_GONE" } + #ifndef __ASSEMBLY__
#include <linux/mm.h> diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h index 57b3ede..f21f04f 100644 --- a/virt/kvm/arm/trace.h +++ b/virt/kvm/arm/trace.h @@ -26,25 +26,25 @@ );
TRACE_EVENT(kvm_exit, - TP_PROTO(int idx, unsigned int exit_reason, unsigned long vcpu_pc), - TP_ARGS(idx, exit_reason, vcpu_pc), + TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc), + TP_ARGS(ret, esr_ec, vcpu_pc),
TP_STRUCT__entry( - __field( int, idx ) - __field( unsigned int, exit_reason ) + __field( int, ret ) + __field( unsigned int, esr_ec ) __field( unsigned long, vcpu_pc ) ),
TP_fast_assign( - __entry->idx = idx; - __entry->exit_reason = exit_reason; + __entry->ret = ARM_EXCEPTION_CODE(ret); + __entry->esr_ec = (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_TRAP) ? esr_ec : 0; __entry->vcpu_pc = vcpu_pc; ),
TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx", - __print_symbolic(__entry->idx, kvm_arm_exception_type), - __entry->exit_reason, - __print_symbolic(__entry->exit_reason, kvm_arm_exception_class), + __print_symbolic(__entry->ret, kvm_arm_exception_type), + __entry->esr_ec, + __print_symbolic(__entry->esr_ec, kvm_arm_exception_class), __entry->vcpu_pc) );
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit 8a411b060f820140e9894239fde5819bb213bef0
category: feature
feature: bg_timer enhancement
-------------------------------------------------
The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired.
Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code.
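In outline, the expiry path collapses to a plain hrtimer callback. A simplified sketch follows (the real kvm_bg_timer_expire in the diff below additionally re-arms itself when another timer expiry is still due); kvm_vcpu_wake_up() is safe from hard-IRQ context as it only wakes an swait queue:

    static enum hrtimer_restart bg_timer_expire_sketch(struct hrtimer *hrt)
    {
        struct arch_timer_cpu *timer;
        struct kvm_vcpu *vcpu;

        timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
        vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);

        /* Wake a blocked VCPU so it observes the expired timer on entry. */
        kvm_vcpu_wake_up(vcpu);
        return HRTIMER_NORESTART;
    }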
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_arch_timer.h | 4 ---- virt/kvm/arm/arch_timer.c | 34 +++++++--------------------------- 2 files changed, 7 insertions(+), 31 deletions(-)
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h index 6502feb..3377135 100644 --- a/include/kvm/arm_arch_timer.h +++ b/include/kvm/arm_arch_timer.h @@ -21,7 +21,6 @@
#include <linux/clocksource.h> #include <linux/hrtimer.h> -#include <linux/workqueue.h>
struct arch_timer_context { /* Registers: control register, timer value */ @@ -52,9 +51,6 @@ struct arch_timer_cpu { /* Background timer used when the guest is not running */ struct hrtimer bg_timer;
- /* Work queued with the above timer expires */ - struct work_struct expired; - /* Physical timer emulation */ struct hrtimer phys_timer;
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 17cecc9..da261f5 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -70,11 +70,9 @@ static void soft_timer_start(struct hrtimer *hrt, u64 ns) HRTIMER_MODE_ABS); }
-static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work) +static void soft_timer_cancel(struct hrtimer *hrt) { hrtimer_cancel(hrt); - if (work) - cancel_work_sync(work); }
static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) @@ -102,23 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) return IRQ_HANDLED; }
-/* - * Work function for handling the backup timer that we schedule when a vcpu is - * no longer running, but had a timer programmed to fire in the future. - */ -static void kvm_timer_inject_irq_work(struct work_struct *work) -{ - struct kvm_vcpu *vcpu; - - vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired); - - /* - * If the vcpu is blocked we want to wake it up so that it will see - * the timer has expired when entering the guest. - */ - kvm_vcpu_wake_up(vcpu); -} - static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx) { u64 cval, now; @@ -188,7 +169,7 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt) return HRTIMER_RESTART; }
- schedule_work(&timer->expired); + kvm_vcpu_wake_up(vcpu); return HRTIMER_NORESTART; }
@@ -300,7 +281,7 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu) * then we also don't need a soft timer. */ if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) { - soft_timer_cancel(&timer->phys_timer, NULL); + soft_timer_cancel(&timer->phys_timer); return; }
@@ -426,7 +407,7 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
vtimer_restore_state(vcpu);
- soft_timer_cancel(&timer->bg_timer, &timer->expired); + soft_timer_cancel(&timer->bg_timer); }
static void set_cntvoff(u64 cntvoff) @@ -544,7 +525,7 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu) * In any case, we re-schedule the hrtimer for the physical timer when * coming back to the VCPU thread in kvm_timer_vcpu_load(). */ - soft_timer_cancel(&timer->phys_timer, NULL); + soft_timer_cancel(&timer->phys_timer);
/* * The kernel may decide to run userspace after calling vcpu_put, so @@ -637,7 +618,6 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu) update_vtimer_cntvoff(vcpu, kvm_phys_timer_read()); vcpu_ptimer(vcpu)->cntvoff = 0;
- INIT_WORK(&timer->expired, kvm_timer_inject_irq_work); hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); timer->bg_timer.function = kvm_bg_timer_expire;
@@ -794,8 +774,8 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu) struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
- soft_timer_cancel(&timer->bg_timer, &timer->expired); - soft_timer_cancel(&timer->phys_timer, NULL); + soft_timer_cancel(&timer->bg_timer); + soft_timer_cancel(&timer->phys_timer); kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq); }
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit da6f16662a6eba3de0d1a39f3b8913d38754bb2b
category: feature
feature: 'vcpu' parameter enhancement
-------------------------------------------------
vcpu_read_sys_reg should not be modifying the VCPU structure. Eventually, to handle EL2 sysregs for nested virtualization, we will call vcpu_read_sys_reg from places that have a const vcpu pointer, and the compiler will complain about the lack of the const modifier on the read path.
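A hypothetical caller illustrating the constraint (get_guest_sctlr() is made up for this sketch; SCTLR_EL1 is one of the existing sysreg indices):

    /* Compiles only once vcpu_read_sys_reg() accepts a const pointer. */
    static u64 get_guest_sctlr(const struct kvm_vcpu *vcpu)
    {
        return vcpu_read_sys_reg(vcpu, SCTLR_EL1);
    }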
Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/kvm/sys_regs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 0a82418..effd524 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -331,7 +331,7 @@ struct kvm_vcpu_arch { */ #define __vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
-u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg); +u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg); void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
/* diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 1c09683..2cc90a1 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -76,7 +76,7 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu, return false; }
-u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg) +u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg) { if (!vcpu->arch.sysregs_loaded_on_cpu) goto immediate_read;
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit 599d79dcd18fa0f88ae15b7042c9d6b011a678b2
category: feature
feature: Add trapped system register access
-------------------------------------------------
We're pretty blind when it comes to system register tracing, and rely on the ESR value displayed by kvm_handle_sys, which isn't much.
Instead, let's add an actual name to the sysreg entries, so that we can finally print it as we're about to perform the access itself.
The new tracepoint is conveniently called kvm_sys_access.
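Since SYS_DESC() stringifies its argument (.name = #reg, see below), an entry declared with SYS_DESC(SYS_SCTLR_EL1) traces under the literal name "SYS_SCTLR_EL1". A hypothetical trace line, following the TP_printk format in the diff:

    kvm_sys_access: PC: ffff000008123456 SYS_SCTLR_EL1 (3,0,1,0,0) write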
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kvm/sys_regs.c | 2 ++ arch/arm64/kvm/sys_regs.h | 4 ++++ arch/arm64/kvm/trace.h | 35 +++++++++++++++++++++++++++++++++++ 3 files changed, 41 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 2cc90a1..b76ecdc 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1850,6 +1850,8 @@ static void perform_access(struct kvm_vcpu *vcpu, struct sys_reg_params *params, const struct sys_reg_desc *r) { + trace_kvm_sys_access(*vcpu_pc(vcpu), params, r); + /* * Not having an accessor means that we have configured a trap * that we don't know how to handle. This certainly qualifies diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h index cd710f8..3b1bc7f 100644 --- a/arch/arm64/kvm/sys_regs.h +++ b/arch/arm64/kvm/sys_regs.h @@ -35,6 +35,9 @@ struct sys_reg_params { };
struct sys_reg_desc { + /* Sysreg string for debug */ + const char *name; + /* MRS/MSR instruction which accesses it. */ u8 Op0; u8 Op1; @@ -130,6 +133,7 @@ const struct sys_reg_desc *find_reg_by_id(u64 id, #define Op2(_x) .Op2 = _x
#define SYS_DESC(reg) \ + .name = #reg, \ Op0(sys_reg_Op0(reg)), Op1(sys_reg_Op1(reg)), \ CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)), \ Op2(sys_reg_Op2(reg)) diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h index 3b82fb1..eab91ad 100644 --- a/arch/arm64/kvm/trace.h +++ b/arch/arm64/kvm/trace.h @@ -3,6 +3,7 @@ #define _TRACE_ARM64_KVM_H
#include <linux/tracepoint.h> +#include "sys_regs.h"
#undef TRACE_SYSTEM #define TRACE_SYSTEM kvm @@ -152,6 +153,40 @@ TP_printk("HSR 0x%08lx", __entry->hsr) );
+TRACE_EVENT(kvm_sys_access, + TP_PROTO(unsigned long vcpu_pc, struct sys_reg_params *params, const struct sys_reg_desc *reg), + TP_ARGS(vcpu_pc, params, reg), + + TP_STRUCT__entry( + __field(unsigned long, vcpu_pc) + __field(bool, is_write) + __field(const char *, name) + __field(u8, Op0) + __field(u8, Op1) + __field(u8, CRn) + __field(u8, CRm) + __field(u8, Op2) + ), + + TP_fast_assign( + __entry->vcpu_pc = vcpu_pc; + __entry->is_write = params->is_write; + __entry->name = reg->name; + __entry->Op0 = reg->Op0; + __entry->Op0 = reg->Op0; + __entry->Op1 = reg->Op1; + __entry->CRn = reg->CRn; + __entry->CRm = reg->CRm; + __entry->Op2 = reg->Op2; + ), + + TP_printk("PC: %lx %s (%d,%d,%d,%d,%d) %s", + __entry->vcpu_pc, __entry->name ?: "UNKN", + __entry->Op0, __entry->Op1, __entry->CRn, + __entry->CRm, __entry->Op2, + __entry->is_write ? "write" : "read") +); + TRACE_EVENT(kvm_set_guest_debug, TP_PROTO(struct kvm_vcpu *vcpu, __u32 guest_debug), TP_ARGS(vcpu, guest_debug),
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit 6794ad5443a2118243e13879f94228524a1a2b92
category: bugfix
-------------------------------------------------
There are two things we need to take care of when we create block mappings in the stage 2 page tables:
(1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset.
(2) The head and tail of a memory slot may not cover a full block size, and we have to take care to not map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose.
So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing.
This commit attempts to factor out both checks into a common function, and if we don't pass the check, we won't attempt any PMD mappings for either hugetlbfs or THP.
Note that we used to only check the alignment for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios.
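A self-contained sketch of check (1), using made-up addresses and assuming 4K pages with 2M PMD blocks:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define S2_PMD_SIZE (2UL << 20)             /* 2M block */
    #define S2_PMD_MASK (~(S2_PMD_SIZE - 1))

    int main(void)
    {
        uint64_t uaddr_start = 0x40200000;  /* host VA of slot start, 2M aligned */
        uint64_t gpa_start   = 0x80100000;  /* guest IPA of slot start, 1M into a block */

        /* Offsets within a 2M block differ (0x0 vs 0x100000): a stage-2 PMD
         * block would map pages at the wrong offset, so the fault handler
         * must fall back to PTE mappings for this slot. */
        bool ok = (gpa_start & ~S2_PMD_MASK) == (uaddr_start & ~S2_PMD_MASK);
        printf("PMD block mappings allowed: %s\n", ok ? "yes" : "no");
        return 0;
    }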
Cc: Ralph Palutke ralph.palutke@fau.de Cc: Lukas Braun koomi@moshbit.net Reported-by: Lukas Braun koomi@moshbit.net Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts with commit 795a83714526 upstream] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 86 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 64 insertions(+), 22 deletions(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 8ff66e6..5b4ff3a 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1601,6 +1601,63 @@ static void kvm_send_hwpoison_signal(unsigned long address, send_sig_info(SIGBUS, &info, current); }
+static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, + unsigned long hva) +{ + gpa_t gpa_start, gpa_end; + hva_t uaddr_start, uaddr_end; + size_t size; + + size = memslot->npages * PAGE_SIZE; + + gpa_start = memslot->base_gfn << PAGE_SHIFT; + gpa_end = gpa_start + size; + + uaddr_start = memslot->userspace_addr; + uaddr_end = uaddr_start + size; + + /* + * Pages belonging to memslots that don't have the same alignment + * within a PMD for userspace and IPA cannot be mapped with stage-2 + * PMD entries, because we'll end up mapping the wrong pages. + * + * Consider a layout like the following: + * + * memslot->userspace_addr: + * +-----+--------------------+--------------------+---+ + * |abcde|fgh Stage-1 PMD | Stage-1 PMD tv|xyz| + * +-----+--------------------+--------------------+---+ + * + * memslot->base_gfn << PAGE_SIZE: + * +---+--------------------+--------------------+-----+ + * |abc|def Stage-2 PMD | Stage-2 PMD |tvxyz| + * +---+--------------------+--------------------+-----+ + * + * If we create those stage-2 PMDs, we'll end up with this incorrect + * mapping: + * d -> f + * e -> g + * f -> h + */ + if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK)) + return false; + + /* + * Next, let's make sure we're not trying to map anything not covered + * by the memslot. This means we have to prohibit PMD size mappings + * for the beginning and end of a non-PMD aligned and non-PMD sized + * memory slot (illustrated by the head and tail parts of the + * userspace view above containing pages 'abcde' and 'xyz', + * respectively). + * + * Note that it doesn't matter if we do the check using the + * userspace_addr or the base_gfn, as both are equally aligned (per + * the check above) and equally sized. + */ + return (hva & S2_PMD_MASK) >= uaddr_start && + (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end; +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) @@ -1627,6 +1684,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; }
+ if (!fault_supports_stage2_pmd_mappings(memslot, hva)) + force_pte = true; + + if (logging_active) + force_pte = true; + /* Let's check if we will get back a huge page backed by hugetlbfs */ down_read(¤t->mm->mmap_sem); vma = find_vma_intersection(current->mm, hva, hva + 1); @@ -1643,28 +1706,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ if ((vma_pagesize == PMD_SIZE || (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) && - !logging_active) { + !force_pte) { gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - } else { - /* - * Fallback to PTE if it's not one of the Stage 2 - * supported hugepage sizes or the corresponding level - * doesn't exist - */ - vma_pagesize = PAGE_SIZE; - - /* - * Pages belonging to memslots that don't have the same - * alignment for userspace and IPA cannot be mapped using - * block descriptors even if the pages belong to a THP for - * the process, because the stage-2 block descriptor will - * cover more than a single THP and we loose atomicity for - * unmapping, updates, and splits of the THP or other pages - * in the stage-2 block range. - */ - if ((memslot->userspace_addr & ~PMD_MASK) != - ((memslot->base_gfn << PAGE_SHIFT) & ~PMD_MASK)) - force_pte = true; } up_read(¤t->mm->mmap_sem);
@@ -1703,7 +1746,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * should not be mapped with huge pages (it introduces churn * and performance degradation), so force a pte mapping. */ - force_pte = true; flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
/*
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit 58466766cd35754a061414c0c93225db2962948e
category: feature
feature: Add ARM_EXCEPTION_IS_TRAP macro
-------------------------------------------------
32bit and 64bit use different symbols to identify the traps: 32bit has a fine-grained approach (prefetch abort, data abort and HVC), while 64bit is pretty happy with just "trap".
This has been fine so far, except that we now need to decode some of that in tracepoints that are common to both architectures.
Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols and make the tracepoint use it.
Acked-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_asm.h | 4 ++++ arch/arm64/include/asm/kvm_asm.h | 1 + virt/kvm/arm/trace.h | 2 +- 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h index 231e87a..35491af 100644 --- a/arch/arm/include/asm/kvm_asm.h +++ b/arch/arm/include/asm/kvm_asm.h @@ -23,6 +23,10 @@
#define ARM_EXIT_WITH_ABORT_BIT 31 #define ARM_EXCEPTION_CODE(x) ((x) & ~(1U << ARM_EXIT_WITH_ABORT_BIT)) +#define ARM_EXCEPTION_IS_TRAP(x) \ + (ARM_EXCEPTION_CODE((x)) == ARM_EXCEPTION_PREF_ABORT || \ + ARM_EXCEPTION_CODE((x)) == ARM_EXCEPTION_DATA_ABORT || \ + ARM_EXCEPTION_CODE((x)) == ARM_EXCEPTION_HVC) #define ARM_ABORT_PENDING(x) !!((x) & (1U << ARM_EXIT_WITH_ABORT_BIT))
#define ARM_EXCEPTION_RESET 0 diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index b2e12c9..f5b79e9 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -25,6 +25,7 @@
#define ARM_EXIT_WITH_SERROR_BIT 31 #define ARM_EXCEPTION_CODE(x) ((x) & ~(1U << ARM_EXIT_WITH_SERROR_BIT)) +#define ARM_EXCEPTION_IS_TRAP(x) (ARM_EXCEPTION_CODE((x)) == ARM_EXCEPTION_TRAP) #define ARM_SERROR_PENDING(x) !!((x) & (1U << ARM_EXIT_WITH_SERROR_BIT))
#define ARM_EXCEPTION_IRQ 0 diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h index f21f04f..3828bea 100644 --- a/virt/kvm/arm/trace.h +++ b/virt/kvm/arm/trace.h @@ -37,7 +37,7 @@
TP_fast_assign( __entry->ret = ARM_EXCEPTION_CODE(ret); - __entry->esr_ec = (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_TRAP) ? esr_ec : 0; + __entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ? esr_ec : 0; __entry->vcpu_pc = vcpu_pc; ),
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion
from mainline-v5.0-rc1
commit 8c33df1afd86c611da8473dc6fc5f3af3dabe984
category: bugfix
-------------------------------------------------
They were missing, and it turns out that we do need them now.
Acked-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/stage2_pgtable.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index f901716..c4b1d4f 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -73,4 +73,7 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm) return false; }
+#define S2_PMD_MASK PMD_MASK +#define S2_PMD_SIZE PMD_SIZE + #endif /* __ARM_S2_PGTABLE_H_ */
From: Julien Thierry julien.thierry@arm.com
mainline inclusion
from mainline-v5.0-rc6
commit 8fa3adb8c6beee4af079ac90b9575ab92951de3f
category: feature
feature: vgic lock enhancement
-------------------------------------------------
vgic_irq->irq_lock must always be taken with interrupts disabled as it is used in interrupt context.
For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible.
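The mechanical shape of the conversion, as applied throughout the diff below (a sketch):

    raw_spinlock_t irq_lock;                    /* was: spinlock_t */

    raw_spin_lock_init(&irq_lock);              /* was: spin_lock_init() */

    unsigned long flags;
    raw_spin_lock_irqsave(&irq_lock, flags);    /* a true spinning lock, even on RT */
    /* critical section, usable from hard-IRQ context */
    raw_spin_unlock_irqrestore(&irq_lock, flags);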
Signed-off-by: Julien Thierry julien.thierry@arm.com Acked-by: Christoffer Dall christoffer.dall@arm.com Acked-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_vgic.h | 2 +- virt/kvm/arm/vgic/vgic-debug.c | 4 +-- virt/kvm/arm/vgic/vgic-init.c | 4 +-- virt/kvm/arm/vgic/vgic-its.c | 14 ++++---- virt/kvm/arm/vgic/vgic-mmio-v2.c | 14 ++++---- virt/kvm/arm/vgic/vgic-mmio-v3.c | 12 +++---- virt/kvm/arm/vgic/vgic-mmio.c | 34 +++++++++---------- virt/kvm/arm/vgic/vgic-v2.c | 4 +-- virt/kvm/arm/vgic/vgic-v3.c | 8 ++--- virt/kvm/arm/vgic/vgic.c | 71 ++++++++++++++++++++-------------------- 10 files changed, 83 insertions(+), 84 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 561fefc..97fd4be 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -100,7 +100,7 @@ enum vgic_irq_config { };
struct vgic_irq { - spinlock_t irq_lock; /* Protects the content of the struct */ + raw_spinlock_t irq_lock; /* Protects the content of the struct */ struct list_head lpi_list; /* Used to link all LPIs together */ struct list_head ap_list;
diff --git a/virt/kvm/arm/vgic/vgic-debug.c b/virt/kvm/arm/vgic/vgic-debug.c index 07aa900..1f62f2b 100644 --- a/virt/kvm/arm/vgic/vgic-debug.c +++ b/virt/kvm/arm/vgic/vgic-debug.c @@ -251,9 +251,9 @@ static int vgic_debug_show(struct seq_file *s, void *v) return 0; }
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); print_irq_state(s, irq, vcpu); - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(kvm, irq); return 0; diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index cd75df2..78013a5 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -172,7 +172,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
irq->intid = i + VGIC_NR_PRIVATE_IRQS; INIT_LIST_HEAD(&irq->ap_list); - spin_lock_init(&irq->irq_lock); + raw_spin_lock_init(&irq->irq_lock); irq->vcpu = NULL; irq->target_vcpu = vcpu0; kref_init(&irq->refcount); @@ -223,7 +223,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
INIT_LIST_HEAD(&irq->ap_list); - spin_lock_init(&irq->irq_lock); + raw_spin_lock_init(&irq->irq_lock); irq->intid = i; irq->vcpu = NULL; irq->target_vcpu = vcpu; diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 7a9f47e..ffc1e40 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -65,7 +65,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
INIT_LIST_HEAD(&irq->lpi_list); INIT_LIST_HEAD(&irq->ap_list); - spin_lock_init(&irq->irq_lock); + raw_spin_lock_init(&irq->irq_lock);
irq->config = VGIC_CONFIG_EDGE; kref_init(&irq->refcount); @@ -287,7 +287,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq, if (ret) return ret;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (!filter_vcpu || filter_vcpu == irq->target_vcpu) { irq->priority = LPI_PROP_PRIORITY(prop); @@ -299,7 +299,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq, } }
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
if (irq->hw) return its_prop_update_vlpi(irq->host_irq, prop, needs_inv); @@ -352,9 +352,9 @@ static int update_affinity(struct vgic_irq *irq, struct kvm_vcpu *vcpu) int ret = 0; unsigned long flags;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->target_vcpu = vcpu; - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
if (irq->hw) { struct its_vlpi_map map; @@ -455,7 +455,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu) }
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]); - spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->pending_latch = pendmask & (1U << bit_nr); vgic_queue_irq_unlock(vcpu->kvm, irq, flags); vgic_put_irq(vcpu->kvm, irq); @@ -612,7 +612,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its, return irq_set_irqchip_state(irq->host_irq, IRQCHIP_STATE_PENDING, true);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->pending_latch = true; vgic_queue_irq_unlock(kvm, irq, flags);
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c index 738b65d..b535fff 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c @@ -147,7 +147,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
irq = vgic_get_irq(source_vcpu->kvm, vcpu, intid);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->pending_latch = true; irq->source |= 1U << source_vcpu->vcpu_id;
@@ -191,13 +191,13 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu, struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid + i); int target;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->targets = (val >> (i * 8)) & cpu_mask; target = irq->targets ? __ffs(irq->targets) : 0; irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); } } @@ -230,13 +230,13 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu, for (i = 0; i < len; i++) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->source &= ~((val >> (i * 8)) & 0xff); if (!irq->source) irq->pending_latch = false;
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); } } @@ -252,7 +252,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu, for (i = 0; i < len; i++) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->source |= (val >> (i * 8)) & 0xff;
@@ -260,7 +260,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu, irq->pending_latch = true; vgic_queue_irq_unlock(vcpu->kvm, irq, flags); } else { - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); } vgic_put_irq(vcpu->kvm, irq); } diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index b3d1f09..4a12322 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -169,13 +169,13 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu, if (!irq) return;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
/* We only care about and preserve Aff0, Aff1 and Aff2. */ irq->mpidr = val & GENMASK(23, 0); irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); }
@@ -281,7 +281,7 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu, for (i = 0; i < len * 8; i++) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); if (test_bit(i, &val)) { /* * pending_latch is set irrespective of irq type @@ -292,7 +292,7 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu, vgic_queue_irq_unlock(vcpu->kvm, irq, flags); } else { irq->pending_latch = false; - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); }
vgic_put_irq(vcpu->kvm, irq); @@ -957,7 +957,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
/* * An access targetting Group0 SGIs can only generate @@ -968,7 +968,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1) irq->pending_latch = true; vgic_queue_irq_unlock(vcpu->kvm, irq, flags); } else { - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); }
vgic_put_irq(vcpu->kvm, irq); diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c index 762f819..35d76e4 100644 --- a/virt/kvm/arm/vgic/vgic-mmio.c +++ b/virt/kvm/arm/vgic/vgic-mmio.c @@ -77,7 +77,7 @@ void vgic_mmio_write_group(struct kvm_vcpu *vcpu, gpa_t addr, for (i = 0; i < len * 8; i++) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->group = !!(val & BIT(i)); vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
@@ -120,7 +120,7 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu, for_each_set_bit(i, &val, len * 8) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->enabled = true; vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
@@ -139,11 +139,11 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu, for_each_set_bit(i, &val, len * 8) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->enabled = false;
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); } } @@ -160,10 +160,10 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu, struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); unsigned long flags;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); if (irq_is_pending(irq)) value |= (1U << i); - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(vcpu->kvm, irq); } @@ -227,7 +227,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu, continue; }
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); if (irq->hw) vgic_hw_irq_spending(vcpu, irq, is_uaccess); else @@ -280,14 +280,14 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu, continue; }
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (irq->hw) vgic_hw_irq_cpending(vcpu, irq, is_uaccess); else irq->pending_latch = false;
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); } } @@ -329,7 +329,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq, unsigned long flags; struct kvm_vcpu *requester_vcpu = vgic_get_mmio_requester_vcpu();
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (irq->hw) { vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu); @@ -360,7 +360,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq, if (irq->active) vgic_queue_irq_unlock(vcpu->kvm, irq, flags); else - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); }
/* @@ -503,10 +503,10 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu, for (i = 0; i < len; i++) { struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); /* Narrow the priority range to what we actually support */ irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS); - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(vcpu->kvm, irq); } @@ -552,14 +552,14 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu, continue;
irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); - spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (test_bit(i * 2 + 1, &val)) irq->config = VGIC_CONFIG_EDGE; else irq->config = VGIC_CONFIG_LEVEL;
- spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq); } } @@ -608,12 +608,12 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid, * restore irq config before line level. */ new_level = !!(val & (1U << i)); - spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->line_level = new_level; if (new_level) vgic_queue_irq_unlock(vcpu->kvm, irq, flags); else - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(vcpu->kvm, irq); } diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c index 91b14df..25be282 100644 --- a/virt/kvm/arm/vgic/vgic-v2.c +++ b/virt/kvm/arm/vgic/vgic-v2.c @@ -84,7 +84,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
- spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
/* Always preserve the active bit */ irq->active = !!(val & GICH_LR_ACTIVE_BIT); @@ -127,7 +127,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) vgic_irq_set_phys_active(irq, false); }
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); }
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c index 62a3a2f..30b3b82 100644 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -76,7 +76,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) if (!irq) /* An LPI could have been unmapped. */ continue;
- spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
/* Always preserve the active bit */ irq->active = !!(val & ICH_LR_ACTIVE_BIT); @@ -119,7 +119,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) vgic_irq_set_phys_active(irq, false); }
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); }
@@ -350,9 +350,9 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
status = val & (1 << bit_nr);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); if (irq->target_vcpu != vcpu) { - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); goto retry; } irq->pending_latch = status; diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index a998bf7..a56839d 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -251,8 +251,8 @@ static int vgic_irq_cmp(void *priv, struct list_head *a, struct list_head *b) if (unlikely(irqa == irqb)) return 0;
- spin_lock(&irqa->irq_lock); - spin_lock_nested(&irqb->irq_lock, SINGLE_DEPTH_NESTING); + raw_spin_lock(&irqa->irq_lock); + raw_spin_lock_nested(&irqb->irq_lock, SINGLE_DEPTH_NESTING);
if (irqa->active || irqb->active) { ret = (int)irqb->active - (int)irqa->active; @@ -270,8 +270,8 @@ static int vgic_irq_cmp(void *priv, struct list_head *a, struct list_head *b) /* Both pending and enabled, sort by priority */ ret = irqa->priority - irqb->priority; out: - spin_unlock(&irqb->irq_lock); - spin_unlock(&irqa->irq_lock); + raw_spin_unlock(&irqb->irq_lock); + raw_spin_unlock(&irqa->irq_lock); return ret; }
@@ -332,7 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, * not need to be inserted into an ap_list and there is also * no more work for us to do. */ - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
/* * We have to kick the VCPU here, because we could be @@ -354,12 +354,12 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, * We must unlock the irq lock to take the ap_list_lock where * we are going to insert this new pending interrupt. */ - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
/* someone can do stuff here, which we re-check below */
spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags); - spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
/* * Did something change behind our backs? @@ -374,10 +374,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, */
if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) { - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); goto retry; }
@@ -389,7 +389,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head); irq->vcpu = vcpu;
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu); @@ -437,11 +437,11 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid, if (!irq) return -EINVAL;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (!vgic_validate_injection(irq, level, owner)) { /* Nothing to see here, move along... */ - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(kvm, irq); return 0; } @@ -501,9 +501,9 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
BUG_ON(!irq);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); ret = kvm_vgic_map_irq(vcpu, irq, host_irq, get_input_level); - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq);
return ret; @@ -526,11 +526,11 @@ void kvm_vgic_reset_mapped_irq(struct kvm_vcpu *vcpu, u32 vintid) if (!irq->hw) goto out;
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->active = false; irq->pending_latch = false; irq->line_level = false; - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); out: vgic_put_irq(vcpu->kvm, irq); } @@ -546,9 +546,9 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid) irq = vgic_get_irq(vcpu->kvm, vcpu, vintid); BUG_ON(!irq);
- spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); kvm_vgic_unmap_irq(irq); - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq);
return 0; @@ -578,12 +578,12 @@ int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner) return -EINVAL;
irq = vgic_get_irq(vcpu->kvm, vcpu, intid); - spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); if (irq->owner && irq->owner != owner) ret = -EEXIST; else irq->owner = owner; - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
return ret; } @@ -610,7 +610,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB; bool target_vcpu_needs_kick = false;
- spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
BUG_ON(vcpu != irq->vcpu);
@@ -623,7 +623,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) */ list_del(&irq->ap_list); irq->vcpu = NULL; - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock);
/* * This vgic_put_irq call matches the @@ -638,13 +638,13 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
if (target_vcpu == vcpu) { /* We're on the right CPU */ - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); continue; }
/* This interrupt looks like it has to be migrated. */
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); spin_unlock(&vgic_cpu->ap_list_lock);
/* @@ -662,7 +662,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock); spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock, SINGLE_DEPTH_NESTING); - spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
/* * If the affinity has been preserved, move the @@ -682,7 +682,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) target_vcpu_needs_kick = true; }
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock); spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
@@ -748,10 +748,10 @@ static int compute_ap_list_depth(struct kvm_vcpu *vcpu, list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { int w;
- spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock); /* GICv2 SGIs can count for more than one... */ w = vgic_irq_get_lr_count(irq); - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock);
count += w; *multi_sgi |= (w > 1); @@ -777,7 +777,7 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu) count = 0;
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { - spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock);
/* * If we have multi-SGIs in the pipeline, we need to * guarantee that they are all seen before any IRQ of * lower priority. In that case, we need to filter out * these interrupts by exiting early. This is easy as * the AP list has been sorted already. */ if (multi_sgi && irq->priority > prio) { - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock); break; }
@@ -798,7 +798,7 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu) prio = irq->priority; }
- spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock);
if (count == kvm_vgic_global_state.nr_lr) { if (!list_is_last(&irq->ap_list, @@ -939,11 +939,11 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { - spin_lock(&irq->irq_lock); + raw_spin_lock(&irq->irq_lock); pending = irq_is_pending(irq) && irq->enabled && !irq->active && irq->priority < vmcr.pmr; - spin_unlock(&irq->irq_lock); + raw_spin_unlock(&irq->irq_lock);
if (pending) break; @@ -981,11 +981,10 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid) return false;
irq = vgic_get_irq(vcpu->kvm, vcpu, vintid); - spin_lock_irqsave(&irq->irq_lock, flags); + raw_spin_lock_irqsave(&irq->irq_lock, flags); map_is_active = irq->hw && irq->active; - spin_unlock_irqrestore(&irq->irq_lock, flags); + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); vgic_put_irq(vcpu->kvm, irq);
return map_is_active; } -
From: Julien Thierry julien.thierry@arm.com
mainline inclusion
from mainline-v5.0-rc6
commit e08d8d296079e8fd7eefd53f73dcafebd3a5bf9f
category: feature
feature: vgic lock enhancement
-------------------------------------------------
vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context.
For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible.
Signed-off-by: Julien Thierry julien.thierry@arm.com Acked-by: Christoffer Dall christoffer.dall@arm.com Acked-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_vgic.h | 2 +- virt/kvm/arm/vgic/vgic-init.c | 2 +- virt/kvm/arm/vgic/vgic.c | 37 +++++++++++++++++++------------------ 3 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 97fd4be..3272869 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -307,7 +307,7 @@ struct vgic_cpu { unsigned int used_lrs; struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
- spinlock_t ap_list_lock; /* Protects the ap_list */ + raw_spinlock_t ap_list_lock; /* Protects the ap_list */
/* * List of IRQs that this VCPU should consider because they are either diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index 78013a5..54a6ee6 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -213,7 +213,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) vgic_cpu->sgi_iodev.base_addr = VGIC_ADDR_UNDEF;
INIT_LIST_HEAD(&vgic_cpu->ap_list_head); - spin_lock_init(&vgic_cpu->ap_list_lock); + raw_spin_lock_init(&vgic_cpu->ap_list_lock);
/* * Enable and configure all SGIs to be edge-triggered and diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index a56839d..16b9ae1 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -54,11 +54,11 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = { * When taking more than one ap_list_lock at the same time, always take the * lowest numbered VCPU's ap_list_lock first, so: * vcpuX->vcpu_id < vcpuY->vcpu_id: - * spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock); - * spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock); + * raw_spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock); + * raw_spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock); * * Since the VGIC must support injecting virtual interrupts from ISRs, we have - * to use the spin_lock_irqsave/spin_unlock_irqrestore versions of outer + * to use the raw_spin_lock_irqsave/raw_spin_unlock_irqrestore versions of outer * spinlocks for any lock that may be taken while injecting an interrupt. */
@@ -358,7 +358,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
/* someone can do stuff here, which we re-check below */
- spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags); + raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags); raw_spin_lock(&irq->irq_lock);
/* @@ -375,7 +375,8 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) { raw_spin_unlock(&irq->irq_lock); - spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags); + raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, + flags);
raw_spin_lock_irqsave(&irq->irq_lock, flags); goto retry; @@ -390,7 +391,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, irq->vcpu = vcpu;
raw_spin_unlock(&irq->irq_lock); - spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags); + raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu); kvm_vcpu_kick(vcpu); @@ -604,7 +605,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
retry: - spin_lock(&vgic_cpu->ap_list_lock); + raw_spin_lock(&vgic_cpu->ap_list_lock);
list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) { struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB; @@ -645,7 +646,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) /* This interrupt looks like it has to be migrated. */
raw_spin_unlock(&irq->irq_lock); - spin_unlock(&vgic_cpu->ap_list_lock); + raw_spin_unlock(&vgic_cpu->ap_list_lock);
/* * Ensure locking order by always locking the smallest @@ -659,9 +660,9 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) vcpuB = vcpu; }
- spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock); - spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock, - SINGLE_DEPTH_NESTING); + raw_spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock); + raw_spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock, + SINGLE_DEPTH_NESTING); raw_spin_lock(&irq->irq_lock);
/* @@ -683,8 +684,8 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) }
raw_spin_unlock(&irq->irq_lock); - spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock); - spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock); + raw_spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock); + raw_spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
if (target_vcpu_needs_kick) { kvm_make_request(KVM_REQ_IRQ_PENDING, target_vcpu); @@ -694,7 +695,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu) goto retry; }
- spin_unlock(&vgic_cpu->ap_list_lock); + raw_spin_unlock(&vgic_cpu->ap_list_lock); }
static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu) @@ -879,9 +880,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
- spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); + raw_spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); vgic_flush_lr_state(vcpu); - spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock); + raw_spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
if (can_access_vgic_from_kernel()) vgic_restore_state(vcpu); @@ -936,7 +937,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
vgic_get_vmcr(vcpu, &vmcr);
- spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags); + raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { raw_spin_lock(&irq->irq_lock); @@ -949,7 +950,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu) break; }
- spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags); + raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
return pending; }
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion
from mainline-v5.0-rc6
commit 309a205688060fbb000e9402078cf53cebde0793
category: bugfix
-------------------------------------------------
Fixup 32bit by providing the now required helper.
Cc: Suzuki Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/stage2_pgtable.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index c4b1d4f..de20895 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -76,4 +76,9 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm) #define S2_PMD_MASK PMD_MASK #define S2_PMD_SIZE PMD_SIZE
+static inline bool kvm_stage2_has_pmd(struct kvm *kvm) +{ + return true; +} + #endif /* __ARM_S2_PGTABLE_H_ */
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion
from mainline-v5.0-rc6
commit 280cebfd05c8e381a392c662006dfaa6377feefc
category: bugfix
-------------------------------------------------
We restrict mapping the PUD huge pages in stage2 to only when the stage2 has a 4 level page table, leaving the feature unused with the default IPA size. But we could use it even with a 3 level page table, i.e., when the PUD level is folded into the PGD, just like at stage 1. Relax the condition to allow using the PUD huge page mappings at stage2 when it is possible.
Cc: Christoffer Dall christoffer.dall@arm.com Reviewed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5b4ff3a..4a9177d 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1701,11 +1701,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
vma_pagesize = vma_kernel_pagesize(vma); /* - * PUD level may not exist for a VM but PMD is guaranteed to - * exist. + * The stage2 has a minimum of 2 level table (For arm64 see + * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can + * use PMD_SIZE huge mappings (even when the PMD is folded into PGD). + * As for PUD huge maps, we must make sure that we have at least + * 3 levels, i.e, PMD is not folded. */ if ((vma_pagesize == PMD_SIZE || - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) && + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && !force_pte) { gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; }
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion
from mainline-v5.1-rc1
commit 32f139551954512bfdf9d558341af453bb8b12b4
category: feature
feature: MPIDR save/restore enhancement
-------------------------------------------------
We currently eagerly save/restore MPIDR. It turns out to be slightly pointless:
- On the host, this value is known as soon as we're scheduled on a physical CPU
- In the guest, this value cannot change, as it is set by KVM (and this is a read-only register)
The result of the above is that we can perfectly avoid the eager saving of MPIDR_EL1, and only keep the restore. We just have to set up the host contexts appropriately at boot time.
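A sketch of where the new helper slots in at init time (loop shape taken from the arm.c hunk below; cpu_logical_map() returns the physical MPIDR of a logical CPU):

    int cpu;

    for_each_possible_cpu(cpu) {
        kvm_cpu_context_t *ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu);

        /* Record the immutable host MPIDR once; no per-exit save needed. */
        kvm_init_host_cpu_context(ctxt, cpu);
    }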
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Acked-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com [yu: resolve conflicts] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_host.h | 8 ++++++++ arch/arm/kvm/hyp/cp15-sr.c | 1 - arch/arm64/include/asm/kvm_host.h | 9 +++++++++ arch/arm64/kvm/hyp/sysreg-sr.c | 1 - virt/kvm/arm/arm.c | 1 + 5 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 6b5a519..4228115 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -26,6 +26,7 @@ #include <asm/kvm_asm.h> #include <asm/kvm_mmio.h> #include <asm/fpstate.h> +#include <asm/smp_plat.h> #include <kvm/arm_arch_timer.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED @@ -148,6 +149,13 @@ struct kvm_cpu_context {
typedef struct kvm_cpu_context kvm_cpu_context_t;
+static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt, + int cpu) +{ + /* The host's MPIDR is immutable, so let's set it up at boot time */ + cpu_ctxt->cp15[c0_MPIDR] = cpu_logical_map(cpu); +} + struct vcpu_reset_state { unsigned long pc; unsigned long r0; diff --git a/arch/arm/kvm/hyp/cp15-sr.c b/arch/arm/kvm/hyp/cp15-sr.c index c478281..8bf895e 100644 --- a/arch/arm/kvm/hyp/cp15-sr.c +++ b/arch/arm/kvm/hyp/cp15-sr.c @@ -27,7 +27,6 @@ static u64 *cp15_64(struct kvm_cpu_context *ctxt, int idx)
void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) { - ctxt->cp15[c0_MPIDR] = read_sysreg(VMPIDR); ctxt->cp15[c0_CSSELR] = read_sysreg(CSSELR); ctxt->cp15[c1_SCTLR] = read_sysreg(SCTLR); ctxt->cp15[c1_CPACR] = read_sysreg(CPACR); diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index effd524..d406292 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -31,6 +31,7 @@ #include <asm/kvm.h> #include <asm/kvm_asm.h> #include <asm/kvm_mmio.h> +#include <asm/smp_plat.h> #include <asm/thread_info.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED @@ -403,6 +404,14 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
void __kvm_enable_ssbs(void);
+static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt, + int cpu) +{ + /* The host's MPIDR is immutable, so let's set it up at boot time */ + cpu_ctxt->sys_regs[MPIDR_EL1] = cpu_logical_map(cpu); +} + + static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, unsigned long hyp_stack_ptr, unsigned long vector_ptr) diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c index b426e2c..c52a845 100644 --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -53,7 +53,6 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt) { - ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2); ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1); ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr); ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1); diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 8171125..1a5d0f2 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1581,6 +1581,7 @@ static int init_hyp_mode(void) kvm_cpu_context_t *cpu_ctxt;
cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); + kvm_init_host_cpu_context(cpu_ctxt, cpu); err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
if (err) {
From: Christoffer Dall christoffer.dall@arm.com
mainline inclusion from mainline-v5.1-rc1 commit accb99bcd0ca6d3ee412557b0c3f583a3abc0eb6 category: feature feature: bg_timer programming enhancement
-------------------------------------------------
Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path.
This has the distinct advantage that we no longer race between load/put and schedule/unschedule, and that programming and canceling of the bg_timer always happen when the timer state is not loaded.
Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking.
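The resulting ordering can be sketched as follows (an illustrative summary of the call flow after this patch, not code from it):

/*
 * vcpu thread blocks in kvm_vcpu_block() and schedules out
 *   -> kvm_timer_vcpu_put() sees the vcpu on its waitqueue
 *      -> kvm_timer_blocking() arms bg_timer with the earliest
 *         guest timer expiry
 *
 * vcpu thread is woken and scheduled back in
 *   -> kvm_timer_vcpu_load()
 *      -> kvm_timer_unblocking() cancels bg_timer
 *
 * The bg_timer is thus only ever programmed while the timer state
 * is not loaded, which removes the old schedule/unschedule races.
 */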
Reported-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_arch_timer.h | 3 --- virt/kvm/arm/arch_timer.c | 35 ++++++++++++++--------------------- virt/kvm/arm/arm.c | 2 -- 3 files changed, 14 insertions(+), 26 deletions(-)
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h index 3377135..d6e6a45 100644 --- a/include/kvm/arm_arch_timer.h +++ b/include/kvm/arm_arch_timer.h @@ -76,9 +76,6 @@ struct arch_timer_cpu {
bool kvm_timer_is_pending(struct kvm_vcpu *vcpu);
-void kvm_timer_schedule(struct kvm_vcpu *vcpu); -void kvm_timer_unschedule(struct kvm_vcpu *vcpu); - u64 kvm_phys_timer_read(void);
void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index da261f5..0f8cfc9 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -349,22 +349,12 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu) * thread is removed from its waitqueue and made runnable when there's a timer * interrupt to handle. */ -void kvm_timer_schedule(struct kvm_vcpu *vcpu) +static void kvm_timer_blocking(struct kvm_vcpu *vcpu) { struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
- vtimer_save_state(vcpu); - - /* - * No need to schedule a background timer if any guest timer has - * already expired, because kvm_vcpu_block will return before putting - * the thread to sleep. - */ - if (kvm_timer_should_fire(vtimer) || kvm_timer_should_fire(ptimer)) - return; - /* * If both timers are not capable of raising interrupts (disabled or * masked), then there's no more work for us to do. @@ -373,12 +363,19 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu) return;
/* - * The guest timers have not yet expired, schedule a background timer. + * At least one guest time will expire. Schedule a background timer. * Set the earliest expiration time among the guest timers. */ soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu)); }
+static void kvm_timer_unblocking(struct kvm_vcpu *vcpu) +{ + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; + + soft_timer_cancel(&timer->bg_timer); +} + static void vtimer_restore_state(struct kvm_vcpu *vcpu) { struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; @@ -401,15 +398,6 @@ static void vtimer_restore_state(struct kvm_vcpu *vcpu) local_irq_restore(flags); }
-void kvm_timer_unschedule(struct kvm_vcpu *vcpu) -{ - struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; - - vtimer_restore_state(vcpu); - - soft_timer_cancel(&timer->bg_timer); -} - static void set_cntvoff(u64 cntvoff) { u32 low = lower_32_bits(cntvoff); @@ -485,6 +473,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) /* Set the background timer for the physical timer emulation. */ phys_timer_emulate(vcpu);
+ kvm_timer_unblocking(vcpu); + /* If the timer fired while we weren't running, inject it now */ if (kvm_timer_should_fire(ptimer) != ptimer->irq.level) kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer); @@ -527,6 +517,9 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu) */ soft_timer_cancel(&timer->phys_timer);
+ if (swait_active(kvm_arch_vcpu_wq(vcpu))) + kvm_timer_blocking(vcpu); + /* * The kernel may decide to run userspace after calling vcpu_put, so * we reset cntvoff to 0 to ensure a consistent read between user diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 1a5d0f2..72c2ca9 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -338,7 +338,6 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) { - kvm_timer_schedule(vcpu); /* * If we're about to block (most likely because we've just hit a * WFI), we need to sync back the state of the GIC CPU interface @@ -355,7 +354,6 @@ void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) { - kvm_timer_unschedule(vcpu); kvm_vgic_v4_disable_doorbell(vcpu); }
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion from mainline-v5.1-rc3 commit ca71228b42a96908eca7658861eafacd227856c9 category: feature feature: GICv4 VLPI perf improvement
-------------------------------------------------
The normal interrupt flow is not to enable the vgic when no virtual interrupt is to be injected (i.e. the LRs are empty). But when a guest is likely to use GICv4 for LPIs, we absolutely need to switch it on at all times. Otherwise, VLPIs only get delivered when there is something in the LRs, which doesn't happen very often.
Reported-by: Nianyao Tang tangnianyao@huawei.com Tested-by: Shameerali Kolothum Thodi shameerali.kolothum.thodi@huawei.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/hyp/vgic-v3-sr.c | 4 ++-- virt/kvm/arm/vgic/vgic.c | 14 ++++++++++---- 2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c index 616e5a4..e20e797 100644 --- a/virt/kvm/arm/hyp/vgic-v3-sr.c +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c @@ -222,7 +222,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) } }
- if (used_lrs) { + if (used_lrs || cpu_if->its_vpe.its_vm) { int i; u32 elrsr;
@@ -247,7 +247,7 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu) u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs; int i;
- if (used_lrs) { + if (used_lrs || cpu_if->its_vpe.its_vm) { write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
for (i = 0; i < used_lrs; i++) diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index 16b9ae1..6e8b17f 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -874,15 +874,21 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu) * either observe the new interrupt before or after doing this check, * and introducing additional synchronization mechanism doesn't change * this. + * + * Note that we still need to go through the whole thing if anything + * can be directly injected (GICv4). */ - if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) + if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head) && + !vgic_supports_direct_msis(vcpu->kvm)) return;
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
- raw_spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); - vgic_flush_lr_state(vcpu); - raw_spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock); + if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) { + raw_spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock); + vgic_flush_lr_state(vcpu); + raw_spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock); + }
if (can_access_vgic_from_kernel()) vgic_restore_state(vcpu);
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion from mainline-v5.1-rc3 commit a80868f398554842b14d07060012c06efb57c456 category: bugfix
-------------------------------------------------
commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings stricter. However, it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not resetting vma_pagesize to PAGE_SIZE.
Also, the checks do not cover PUD huge pages, which were under review during the same period. This patch fixes both issues.
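To make the alignment rule concrete, here is a small standalone sketch (illustrative values only) of the generalised check that the fix below applies for any block size:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * A stage-2 block mapping of map_size bytes is only safe when the
 * host VA and the guest IPA share the same offset within a block;
 * otherwise a block entry would map the wrong pages.
 */
static bool same_block_offset(uint64_t uaddr_start, uint64_t gpa_start,
                              uint64_t map_size)
{
        return (gpa_start & (map_size - 1)) == (uaddr_start & (map_size - 1));
}

int main(void)
{
        const uint64_t pmd_size = 1ULL << 21;   /* 2MiB block */

        /* same offset within a 2MiB block: block mapping allowed (1) */
        printf("%d\n", same_block_offset(0x400200000ULL, 0x80200000ULL, pmd_size));
        /* offsets differ by 4KiB: must fall back to PTE mappings (0) */
        printf("%d\n", same_block_offset(0x400200000ULL, 0x80201000ULL, pmd_size));
        return 0;
}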
Fixes: 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") Reported-by: Zenghui Yu yuzenghui@huawei.com Cc: Zenghui Yu yuzenghui@huawei.com Cc: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 4a9177d..ead6245 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1601,8 +1601,9 @@ static void kvm_send_hwpoison_signal(unsigned long address, send_sig_info(SIGBUS, &info, current); }
-static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, - unsigned long hva) +static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, + unsigned long hva, + unsigned long map_size) { gpa_t gpa_start, gpa_end; hva_t uaddr_start, uaddr_end; @@ -1618,34 +1619,34 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot,
/* * Pages belonging to memslots that don't have the same alignment - * within a PMD for userspace and IPA cannot be mapped with stage-2 - * PMD entries, because we'll end up mapping the wrong pages. + * within a PMD/PUD for userspace and IPA cannot be mapped with stage-2 + * PMD/PUD entries, because we'll end up mapping the wrong pages. * * Consider a layout like the following: * * memslot->userspace_addr: * +-----+--------------------+--------------------+---+ - * |abcde|fgh Stage-1 PMD | Stage-1 PMD tv|xyz| + * |abcde|fgh Stage-1 block | Stage-1 block tv|xyz| * +-----+--------------------+--------------------+---+ * * memslot->base_gfn << PAGE_SIZE: * +---+--------------------+--------------------+-----+ - * |abc|def Stage-2 PMD | Stage-2 PMD |tvxyz| + * |abc|def Stage-2 block | Stage-2 block |tvxyz| * +---+--------------------+--------------------+-----+ * - * If we create those stage-2 PMDs, we'll end up with this incorrect + * If we create those stage-2 blocks, we'll end up with this incorrect * mapping: * d -> f * e -> g * f -> h */ - if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK)) + if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1))) return false;
/* * Next, let's make sure we're not trying to map anything not covered - * by the memslot. This means we have to prohibit PMD size mappings - * for the beginning and end of a non-PMD aligned and non-PMD sized + * by the memslot. This means we have to prohibit block size mappings + * for the beginning and end of a non-block aligned and non-block sized * memory slot (illustrated by the head and tail parts of the * userspace view above containing pages 'abcde' and 'xyz', * respectively). @@ -1654,8 +1655,8 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, * userspace_addr or the base_gfn, as both are equally aligned (per * the check above) and equally sized. */ - return (hva & S2_PMD_MASK) >= uaddr_start && - (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end; + return (hva & ~(map_size - 1)) >= uaddr_start && + (hva & ~(map_size - 1)) + map_size <= uaddr_end; }
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, @@ -1684,12 +1685,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; }
- if (!fault_supports_stage2_pmd_mappings(memslot, hva)) - force_pte = true; - - if (logging_active) - force_pte = true; - /* Let's check if we will get back a huge page backed by hugetlbfs */ down_read(¤t->mm->mmap_sem); vma = find_vma_intersection(current->mm, hva, hva + 1); @@ -1700,6 +1695,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, }
vma_pagesize = vma_kernel_pagesize(vma); + if (logging_active || + !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) { + force_pte = true; + vma_pagesize = PAGE_SIZE; + } + /* * The stage2 has a minimum of 2 level table (For arm64 see * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can @@ -1707,11 +1708,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * As for PUD huge maps, we must make sure that we have at least * 3 levels, i.e, PMD is not folded. */ - if ((vma_pagesize == PMD_SIZE || - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && - !force_pte) { + if (vma_pagesize == PMD_SIZE || + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - } up_read(¤t->mm->mmap_sem);
/* We need minimum second+third level pages */
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion from mainline-v5.1-rc3 commit 3c3736cd32bf5197aed1410ae826d2d254a5b277 category: bugfix
-------------------------------------------------
We rely on the mmu_notifier callbacks to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission.
However, we miss a case where the block mapping is split for dirty logging and could later be made a block mapping again if dirty logging is cancelled. This not only creates inconsistent TLB entries for the pages in the block, but also leaks the table pages at PMD level.
Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range.
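The problematic sequence can be spelled out as follows (an illustrative timeline, not code from the patch):

/*
 * 1. Dirty logging is enabled on the memslot: the block mapping is
 *    split, and faults repopulate the range with PTE mappings hanging
 *    off a new PMD-level table page.
 * 2. Dirty logging is cancelled: the next fault wants a block mapping
 *    for the same range again.
 * 3. Before this fix, the stage-2 code would install the block entry
 *    over the existing table entry, leaking the table page and leaving
 *    stale PTE-level TLB entries for the range.
 *
 * The fix unmaps the PTE-level range first and then restarts the walk,
 * as the unmap may have freed the upper-level table.
 */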
Fixes: ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang zhengxiang9@huawei.com Cc: Zheng Xiang zhengxiang9@huawei.com Cc: Zenghui Yu yuzenghui@huawei.com Cc: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/stage2_pgtable.h | 2 ++ virt/kvm/arm/mmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++---------------- 2 files changed, 45 insertions(+), 16 deletions(-)
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de20895..9e11dce 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -75,6 +75,8 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm)
#define S2_PMD_MASK PMD_MASK #define S2_PMD_SIZE PMD_SIZE +#define S2_PUD_MASK PUD_MASK +#define S2_PUD_SIZE PUD_SIZE
static inline bool kvm_stage2_has_pmd(struct kvm *kvm) { diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ead6245..27ef0c4 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache { pmd_t *pmd, old_pmd;
+retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd);
old_pmd = *pmd; + /* + * Multiple vcpus faulting on the same PMD entry, can + * lead to them sequentially updating the PMD with the + * same value. Following the break-before-make + * (pmd_clear() followed by tlb_flush()) process can + * hinder forward progress due to refaults generated + * on missing translations. + * + * Skip updating the page table if the entry is + * unchanged. + */ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + if (pmd_present(old_pmd)) { /* - * Multiple vcpus faulting on the same PMD entry, can - * lead to them sequentially updating the PMD with the - * same value. Following the break-before-make - * (pmd_clear() followed by tlb_flush()) process can - * hinder forward progress due to refaults generated - * on missing translations. + * If we already have PTE level mapping for this block, + * we must unmap it to avoid inconsistent TLB state and + * leaking the table page. We could end up in this situation + * if the memory slot was marked for dirty logging and was + * reverted, leaving PTE level mappings for the pages accessed + * during the period. So, unmap the PTE level mapping for this + * block and retry, as we could have released the upper level + * table in the process. * - * Skip updating the page table if the entry is - * unchanged. + * Normal THP split/merge follows mmu_notifier callbacks and do + * get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + } /* * Mapping in huge pages should only happen through a * fault. If a page is merged into a transparent huge @@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * should become splitting first, unmapped, merged, * and mapped back in on-demand. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); - + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -1107,6 +1124,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac { pud_t *pudp, old_pud;
+retry: pudp = stage2_get_pud(kvm, cache, addr); VM_BUG_ON(!pudp);
@@ -1114,14 +1132,23 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
/* * A large number of vcpus faulting on the same stage 2 entry, - * can lead to a refault due to the - * stage2_pud_clear()/tlb_flush(). Skip updating the page - * tables if there is no change. + * can lead to a refault due to the stage2_pud_clear()/tlb_flush(). + * Skip updating the page tables if there is no change. */ if (pud_val(old_pud) == pud_val(*new_pudp)) return 0;
if (stage2_pud_present(kvm, old_pud)) { + /* + * If we already have table level mapping for this block, unmap + * the range for this block and retry. + */ + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); + goto retry; + } + + WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp)); stage2_pud_clear(kvm, pudp); kvm_tlb_flush_vmid_ipa(kvm, addr); } else {
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion from mainline-v5.1-rc7 commit 96085b949672dca19773495813b577eb3bedf06e category: bugfix
-------------------------------------------------
When disabling LPIs (for example on reset) at the redistributor level, it is expected that LPIs that were pending in the CPU interface are eventually retired.
Currently, this is not what is happening, and these LPIs will stay in the ap_list, eventually being acknowledged by the vcpu (which didn't quite expect this behaviour).
The fix is thus to retire these LPIs from the list of pending interrupts as we disable LPIs.
Reported-by: Heyi Guo guoheyi@huawei.com Tested-by: Heyi Guo guoheyi@huawei.com Fixes: 0e4e82f154e3 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller") Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-mmio-v3.c | 3 +++ virt/kvm/arm/vgic/vgic.c | 21 +++++++++++++++++++++ virt/kvm/arm/vgic/vgic.h | 1 + 3 files changed, 25 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index 4a12322..9f4843f 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -200,6 +200,9 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
vgic_cpu->lpis_enabled = val & GICR_CTLR_ENABLE_LPIS;
+ if (was_enabled && !vgic_cpu->lpis_enabled) + vgic_flush_pending_lpis(vcpu); + if (!was_enabled && vgic_cpu->lpis_enabled) vgic_enable_lpis(vcpu); } diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index 6e8b17f..07b9f67 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -151,6 +151,27 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq) kfree(irq); }
+void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu) +{ + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + struct vgic_irq *irq, *tmp; + unsigned long flags; + + raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags); + + list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) { + if (irq->intid >= VGIC_MIN_LPI) { + raw_spin_lock(&irq->irq_lock); + list_del(&irq->ap_list); + irq->vcpu = NULL; + raw_spin_unlock(&irq->irq_lock); + vgic_put_irq(vcpu->kvm, irq); + } + } + + raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags); +} + void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending) { WARN_ON(irq_set_irqchip_state(irq->host_irq, diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h index d5e4542..d974508 100644 --- a/virt/kvm/arm/vgic/vgic.h +++ b/virt/kvm/arm/vgic/vgic.h @@ -240,6 +240,7 @@ static inline void vgic_get_irq_kref(struct vgic_irq *irq) bool vgic_has_its(struct kvm *kvm); int kvm_vgic_register_its_device(void); void vgic_enable_lpis(struct kvm_vcpu *vcpu); +void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu); int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi); int vgic_v3_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr); int vgic_v3_dist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
From: Suzuki K Poulose suzuki.poulose@arm.com
mainline inclusion from mainline-v5.1-rc7 commit 2e8010bb71b39ff18aac9fb209b3c3093f4c4783 category: bugfix
-------------------------------------------------
With commit a80868f398554842b14, we no longer ensure that the THP page is properly aligned in the guest IPA. Skip the stage2 huge mapping for unaligned IPA backed by transparent hugepages.
Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed") Reported-by: Eric Auger eric.auger@redhat.com Cc: Marc Zyngier marc.zyngier@arm.com Cc: Christoffer Dall christoffer.dall@arm.com Cc: Zenghui Yu yuzenghui@huawei.com Cc: Zheng Xiang zhengxiang9@huawei.com Cc: Andrew Murray andrew.murray@arm.com Cc: Eric Auger eric.auger@redhat.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/mmu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27ef0c4..f74d1a7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1794,8 +1794,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * Only PMD_SIZE transparent hugepages(THP) are * currently supported. This code will need to be * updated to support other THP sizes. + * + * Make sure the host VA and the guest IPA are sufficiently + * aligned and that the block is contained within the memslot. */ - if (transparent_hugepage_adjust(&pfn, &fault_ipa)) + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) && + transparent_hugepage_adjust(&pfn, &fault_ipa)) vma_pagesize = PMD_SIZE; }
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 21bb0ebf5d78c669ed887f30889f2d28cf6bf7db category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
The armv8pmu_enable_event_counter function issues an isb instruction after enabling a pair of counters; this doesn't provide any value and is inconsistent with armv8pmu_disable_event_counter.
In any case, armv8pmu_enable_event_counter is always called with the PMU stopped. Starting the PMU with armv8pmu_start results in an isb instruction being issued prior to the write to PMCR_EL0.
Let's remove the unnecessary isb instruction.
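For reference, the barrier this relies on lives in the driver's PMCR write helper, paraphrased below as a sketch (see the tree for the exact contemporary form):

static inline void armv8pmu_pmcr_write(u32 val)
{
        /*
         * This isb orders all prior PMU sysreg programming before the
         * counters are started via PMCR_EL0, which is what makes the
         * per-counter isb in armv8pmu_enable_event_counter redundant.
         */
        val &= ARMV8_PMU_PMCR_MASK;
        isb();
        write_sysreg(val, pmcr_el0);
}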
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kernel/perf_event.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index da62116..b609875 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -763,7 +763,6 @@ static inline void armv8pmu_enable_event_counter(struct perf_event *event) armv8pmu_enable_counter(idx); if (armv8pmu_event_is_chained(event)) armv8pmu_enable_counter(idx - 1); - isb(); }
static inline int armv8pmu_disable_counter(int idx)
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 630a16854d2d28d13e96ff27ab43cc5caa4609fc category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
The virt/arm core allocates a kvm_cpu_context_t percpu; at present this is a typedef to kvm_cpu_context and is used to store the host cpu context. The kvm_cpu_context structure is also used elsewhere to hold vcpu context. In order to use the percpu to hold additional future host information, we encapsulate kvm_cpu_context in a new structure and rename the typedef and percpu to match.
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_host.h | 10 +++++++--- arch/arm64/include/asm/kvm_asm.h | 3 ++- arch/arm64/include/asm/kvm_host.h | 16 ++++++++++------ arch/arm64/kernel/asm-offsets.c | 1 + virt/kvm/arm/arm.c | 14 ++++++++------ 5 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 4228115..7cd6280 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -147,9 +147,13 @@ struct kvm_cpu_context { u32 cp15[NR_CP15_REGS]; };
-typedef struct kvm_cpu_context kvm_cpu_context_t; +struct kvm_host_data { + struct kvm_cpu_context host_ctxt; +}; + +typedef struct kvm_host_data kvm_host_data_t;
-static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt, +static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt, int cpu) { /* The host's MPIDR is immutable, so let's set it up at boot time */ @@ -179,7 +183,7 @@ struct kvm_vcpu_arch { struct kvm_vcpu_fault_info fault;
/* Host FP context */ - kvm_cpu_context_t *host_cpu_context; + struct kvm_cpu_context *host_cpu_context;
/* VGIC state */ struct vgic_cpu vgic_cpu; diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index f5b79e9..ff73f54 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -108,7 +108,8 @@ .endm
.macro get_host_ctxt reg, tmp - hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp + hyp_adr_this_cpu \reg, kvm_host_data, \tmp + add \reg, \reg, #HOST_DATA_CONTEXT .endm
.macro get_vcpu_ptr vcpu, ctxt diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index d406292..d9d4a848 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -209,7 +209,11 @@ struct kvm_cpu_context { struct kvm_vcpu *__hyp_running_vcpu; };
-typedef struct kvm_cpu_context kvm_cpu_context_t; +struct kvm_host_data { + struct kvm_cpu_context host_ctxt; +}; + +typedef struct kvm_host_data kvm_host_data_t;
struct vcpu_reset_state { unsigned long pc; @@ -252,7 +256,7 @@ struct kvm_vcpu_arch { struct kvm_guest_debug_arch external_debug_state;
/* Pointer to host CPU context */ - kvm_cpu_context_t *host_cpu_context; + struct kvm_cpu_context *host_cpu_context;
struct thread_info *host_thread_info; /* hyp VA */ struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */ @@ -400,11 +404,11 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
-DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);
void __kvm_enable_ssbs(void);
-static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt, +static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt, int cpu) { /* The host's MPIDR is immutable, so let's set it up at boot time */ @@ -421,8 +425,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, * kernel's mapping to the linear mapping, and store it in tpidr_el2 * so that we can use adr_l to access per-cpu variables in EL2. */ - u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_cpu_state) - - (u64)kvm_ksym_ref(kvm_host_cpu_state)); + u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_data) - + (u64)kvm_ksym_ref(kvm_host_data));
/* * Call initialization code, and switch to the full blown HYP code. diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 5ee9f39..f777649 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -145,6 +145,7 @@ int main(void) DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2])); DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu)); + DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data, host_ctxt)); #endif #ifdef CONFIG_CPU_PM DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx)); diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 72c2ca9..daba0c2 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -56,7 +56,7 @@ __asm__(".arch_extension virt"); #endif
-DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); +DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data); static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
/* Per-CPU variable containing the currently running vcpu. */ @@ -374,8 +374,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { int *last_ran; + kvm_host_data_t *cpu_data;
last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran); + cpu_data = this_cpu_ptr(&kvm_host_data);
/* * We might get preempted before the vCPU actually runs, but @@ -387,7 +389,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) }
vcpu->cpu = cpu; - vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state); + vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
kvm_arm_set_running_vcpu(vcpu); kvm_vgic_load(vcpu); @@ -1576,11 +1578,11 @@ static int init_hyp_mode(void) }
for_each_possible_cpu(cpu) { - kvm_cpu_context_t *cpu_ctxt; + kvm_host_data_t *cpu_data;
- cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); - kvm_init_host_cpu_context(cpu_ctxt, cpu); - err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP); + cpu_data = per_cpu_ptr(&kvm_host_data, cpu); + kvm_init_host_cpu_context(&cpu_data->host_ctxt, cpu); + err = create_hyp_mappings(cpu_data, cpu_data + 1, PAGE_HYP);
if (err) { kvm_err("Cannot map host CPU state: %d\n", err);
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit eb41238cf19fda694e3a99c1f4f58bd88479a5ee category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
In order to efficiently switch the events_{guest,host} perf counters at guest entry/exit, we add bitfields for guest and host events to the new kvm_host_data structure, as well as accessors for updating them.
A function is also provided which allows the PMU driver to determine if a counter should start counting when it is enabled. With exclude_host, we may only start counting when entering the guest.
Signed-off-by: Andrew Murray andrew.murray@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 17 +++++++++++++++ arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/pmu.c | 45 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kvm/pmu.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index d9d4a848..b4ec4a8 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -209,8 +209,14 @@ struct kvm_cpu_context { struct kvm_vcpu *__hyp_running_vcpu; };
+struct kvm_pmu_events { + u32 events_host; + u32 events_guest; +}; + struct kvm_host_data { struct kvm_cpu_context host_ctxt; + struct kvm_pmu_events pmu_events; };
typedef struct kvm_host_data kvm_host_data_t; @@ -486,11 +492,22 @@ static inline void __cpu_init_stage2(void) {} void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) +{ + return attr->exclude_host; +} + #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) { return kvm_arch_vcpu_run_map_fp(vcpu); } + +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u32 clr); +#else +static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u32 clr) {} #endif
static inline void kvm_arm_vhe_guest_enter(void) diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 0f2a135..f34cb49 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c new file mode 100644 index 00000000..5414b13 --- /dev/null +++ b/arch/arm64/kvm/pmu.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2019 Arm Limited + * Author: Andrew Murray Andrew.Murray@arm.com + */ +#include <linux/kvm_host.h> +#include <linux/perf_event.h> + +/* + * Given the exclude_{host,guest} attributes, determine if we are going + * to need to switch counters at guest entry/exit. + */ +static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) +{ + /* Only switch if attributes are different */ + return (attr->exclude_host != attr->exclude_guest); +} + +/* + * Add events to track that we may want to switch at guest entry/exit + * time. + */ +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data); + + if (!kvm_pmu_switch_needed(attr)) + return; + + if (!attr->exclude_host) + ctx->pmu_events.events_host |= set; + if (!attr->exclude_guest) + ctx->pmu_events.events_guest |= set; +} + +/* + * Stop tracking events + */ +void kvm_clr_pmu_events(u32 clr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data); + + ctx->pmu_events.events_host &= ~clr; + ctx->pmu_events.events_guest &= ~clr; +}
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit d1947bc4bc63e56014bb4d812e0db89944ed4a0f category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
Add support for the :G and :H attributes in perf by handling the exclude_host/exclude_guest event attributes.
We notify KVM of counters that we wish to be enabled or disabled on guest entry/exit, and thus defer the starting or stopping of events based on their event attributes.
With !VHE we switch the counters between host/guest at EL2. When using :G we can stop counters from counting host events at the boundaries of guest entry/exit by filtering out EL2 for exclude_host. When using !exclude_hv there is a small blackout window at guest entry/exit where host events are not captured.
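For orientation, the :G and :H modifiers in the perf tool map onto the exclude_guest and exclude_host bits of struct perf_event_attr. A minimal userspace sketch (illustrative only, not part of the patch) requesting a host-only cycle counter:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Count CPU cycles in the host only (perf's cycles:H). With this
 * series, exclude_guest causes KVM to stop the counter across guest
 * entry/exit rather than leaving it running.
 */
static int open_host_only_cycles(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.exclude_guest = 1; /* :H, do not count while in a guest */

        /* measure the calling task on any CPU */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}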
Signed-off-by: Andrew Murray andrew.murray@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kernel/perf_event.c | 43 +++++++++++++++++++++++++++++++++++------- 1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index b609875..c618bfe 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -26,6 +26,7 @@
#include <linux/acpi.h> #include <linux/clocksource.h> +#include <linux/kvm_host.h> #include <linux/of.h> #include <linux/perf/arm_pmu.h> #include <linux/platform_device.h> @@ -758,11 +759,21 @@ static inline int armv8pmu_enable_counter(int idx)
static inline void armv8pmu_enable_event_counter(struct perf_event *event) { + struct perf_event_attr *attr = &event->attr; int idx = event->hw.idx; + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
- armv8pmu_enable_counter(idx); if (armv8pmu_event_is_chained(event)) - armv8pmu_enable_counter(idx - 1); + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1)); + + kvm_set_pmu_events(counter_bits, attr); + + /* We rely on the hypervisor switch code to enable guest counters */ + if (!kvm_pmu_counter_deferred(attr)) { + armv8pmu_enable_counter(idx); + if (armv8pmu_event_is_chained(event)) + armv8pmu_enable_counter(idx - 1); + } }
static inline int armv8pmu_disable_counter(int idx) @@ -775,11 +786,21 @@ static inline int armv8pmu_disable_counter(int idx) static inline void armv8pmu_disable_event_counter(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; + struct perf_event_attr *attr = &event->attr; int idx = hwc->idx; + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
if (armv8pmu_event_is_chained(event)) - armv8pmu_disable_counter(idx - 1); - armv8pmu_disable_counter(idx); + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1)); + + kvm_clr_pmu_events(counter_bits); + + /* We rely on the hypervisor switch code to disable guest counters */ + if (!kvm_pmu_counter_deferred(attr)) { + if (armv8pmu_event_is_chained(event)) + armv8pmu_disable_counter(idx - 1); + armv8pmu_disable_counter(idx); + } }
static inline int armv8pmu_enable_intens(int idx) @@ -1078,11 +1099,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, if (!attr->exclude_kernel) config_base |= ARMV8_PMU_INCLUDE_EL2; } else { - if (attr->exclude_kernel) - config_base |= ARMV8_PMU_EXCLUDE_EL1; - if (!attr->exclude_hv) + if (!attr->exclude_hv && !attr->exclude_host) config_base |= ARMV8_PMU_INCLUDE_EL2; } + + /* + * Filter out !VHE kernels and guest kernels + */ + if (attr->exclude_kernel) + config_base |= ARMV8_PMU_EXCLUDE_EL1; + if (attr->exclude_user) config_base |= ARMV8_PMU_EXCLUDE_EL0;
@@ -1112,6 +1138,9 @@ static void armv8pmu_reset(void *info) armv8pmu_disable_intens(idx); }
+ /* Clear the counters we flip at guest entry/exit */ + kvm_clr_pmu_events(U32_MAX); + /* * Initialize & Reset PMNC. Request overflow interrupt for * 64 bit cycle counter but cheat in armv8pmu_write_counter().
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 3d91befbb3a0fcec6e1eebde45c8074b88cc9441 category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
Enable/disable event counters as appropriate when entering and exiting the guest to support guest-only or host-only event counting.
For both VHE and non-VHE we switch the counters between host/guest at EL2.
The PMU may be on when we change which counters are enabled; however, we avoid adding an isb and instead rely on existing context synchronisation events: the eret to enter the guest (__guest_enter) and the eret in kvm_call_hyp for __kvm_vcpu_run_nvhe on returning.
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 3 +++ arch/arm64/kvm/hyp/switch.c | 6 ++++++ arch/arm64/kvm/pmu.c | 39 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 48 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index b4ec4a8..eabc756 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -505,6 +505,9 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); void kvm_clr_pmu_events(u32 clr); + +void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt); +bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt); #else static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} static inline void kvm_clr_pmu_events(u32 clr) {} diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index ce355b7..5a05003 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -540,6 +540,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu) { struct kvm_cpu_context *host_ctxt; struct kvm_cpu_context *guest_ctxt; + bool pmu_switch_needed; u64 exit_code;
/* @@ -559,6 +560,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu) host_ctxt->__hyp_running_vcpu = vcpu; guest_ctxt = &vcpu->arch.ctxt;
+ pmu_switch_needed = __pmu_switch_to_guest(host_ctxt); + __sysreg_save_state_nvhe(host_ctxt);
__activate_traps(vcpu); @@ -605,6 +608,9 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu) */ __debug_switch_to_host(vcpu);
+ if (pmu_switch_needed) + __pmu_switch_to_host(host_ctxt); + /* Returning to host will clear PSR.I, remask PMR if needed */ if (system_uses_irq_prio_masking()) gic_write_pmr(GIC_PRIO_IRQOFF); diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 5414b13..599e6d3 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -5,6 +5,7 @@ */ #include <linux/kvm_host.h> #include <linux/perf_event.h> +#include <asm/kvm_hyp.h>
/* * Given the exclude_{host,guest} attributes, determine if we are going @@ -43,3 +44,41 @@ void kvm_clr_pmu_events(u32 clr) ctx->pmu_events.events_host &= ~clr; ctx->pmu_events.events_guest &= ~clr; } + +/** + * Disable host events, enable guest events + */ +bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt) +{ + struct kvm_host_data *host; + struct kvm_pmu_events *pmu; + + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + pmu = &host->pmu_events; + + if (pmu->events_host) + write_sysreg(pmu->events_host, pmcntenclr_el0); + + if (pmu->events_guest) + write_sysreg(pmu->events_guest, pmcntenset_el0); + + return (pmu->events_host || pmu->events_guest); +} + +/** + * Disable guest events, enable host events + */ +void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) +{ + struct kvm_host_data *host; + struct kvm_pmu_events *pmu; + + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + pmu = &host->pmu_events; + + if (pmu->events_guest) + write_sysreg(pmu->events_guest, pmcntenclr_el0); + + if (pmu->events_host) + write_sysreg(pmu->events_host, pmcntenset_el0); +}
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 435e53fb5e21ad1820c5c69f208304c0e5623d01 category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
With VHE, different exception levels are used between the host (EL2) and guest (EL1), with a shared exception level for userspace (EL0). We can take advantage of this and use the PMU's exception-level filtering to avoid enabling/disabling counters in the world-switch code. Instead we just modify the counter type to include or exclude EL0 at vcpu_{load,put} time.
We also ensure that trapped PMU system register writes do not re-enable EL0 when reconfiguring the backing perf events.
This approach completely avoids blackout windows seen with !VHE.
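Concretely, for a guest-only (:G) event on a VHE system, the resulting filter states look like this (an illustrative summary of the scheme, not patch code):

/*
 * Programmed once via armv8pmu_set_event_filter():
 *   exclude_host set -> ARMV8_PMU_EXCLUDE_EL0 set, EL2 not included,
 *   so the host kernel and host userspace are filtered out while the
 *   guest kernel (EL1) remains counted.
 *
 * Toggled at vcpu_{load,put} via kvm_vcpu_pmu_restore_{guest,host}():
 *   vcpu_load: clear EXCLUDE_EL0 -> guest EL0 counted
 *   vcpu_put:  set   EXCLUDE_EL0 -> host  EL0 not counted
 *
 * No counter is enabled or disabled in the world switch itself,
 * hence no blackout window.
 */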
Suggested-by: Christoffer Dall christoffer.dall@arm.com Signed-off-by: Andrew Murray andrew.murray@arm.com Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_host.h | 3 ++ arch/arm64/include/asm/kvm_host.h | 5 ++- arch/arm64/kernel/perf_event.c | 6 ++- arch/arm64/kvm/pmu.c | 88 ++++++++++++++++++++++++++++++++++++++- arch/arm64/kvm/sys_regs.c | 3 ++ virt/kvm/arm/arm.c | 2 + 6 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 7cd6280..c0d39e4 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -339,6 +339,9 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} +static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} + static inline void kvm_arm_vhe_guest_enter(void) {} static inline void kvm_arm_vhe_guest_exit(void) {}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index eabc756..af4f326 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -494,7 +494,7 @@ static inline void __cpu_init_stage2(void) {}
static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) { - return attr->exclude_host; + return (!has_vhe() && attr->exclude_host); }
#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ @@ -508,6 +508,9 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt); bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt); + +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); #else static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} static inline void kvm_clr_pmu_events(u32 clr) {} diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index c618bfe..41aa9db 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -1096,8 +1096,12 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, * with other architectures (x86 and Power). */ if (is_kernel_in_hyp_mode()) { - if (!attr->exclude_kernel) + if (!attr->exclude_kernel && !attr->exclude_host) config_base |= ARMV8_PMU_INCLUDE_EL2; + if (attr->exclude_guest) + config_base |= ARMV8_PMU_EXCLUDE_EL1; + if (attr->exclude_host) + config_base |= ARMV8_PMU_EXCLUDE_EL0; } else { if (!attr->exclude_hv && !attr->exclude_host) config_base |= ARMV8_PMU_INCLUDE_EL2; diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 599e6d3..3f99a09 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -8,11 +8,19 @@ #include <asm/kvm_hyp.h>
/* - * Given the exclude_{host,guest} attributes, determine if we are going - * to need to switch counters at guest entry/exit. + * Given the perf event attributes and system type, determine + * if we are going to need to switch counters at guest entry/exit. */ static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) { + /** + * With VHE the guest kernel runs at EL1 and the host at EL2, + * where user (EL0) is excluded then we have no reason to switch + * counters. + */ + if (has_vhe() && attr->exclude_user) + return false; + /* Only switch if attributes are different */ return (attr->exclude_host != attr->exclude_guest); } @@ -82,3 +90,79 @@ void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) if (pmu->events_host) write_sysreg(pmu->events_host, pmcntenset_el0); } + +/* + * Modify ARMv8 PMU events to include EL0 counting + */ +static void kvm_vcpu_pmu_enable_el0(unsigned long events) +{ + u64 typer; + u32 counter; + + for_each_set_bit(counter, &events, 32) { + write_sysreg(counter, pmselr_el0); + isb(); + typer = read_sysreg(pmxevtyper_el0) & ~ARMV8_PMU_EXCLUDE_EL0; + write_sysreg(typer, pmxevtyper_el0); + isb(); + } +} + +/* + * Modify ARMv8 PMU events to exclude EL0 counting + */ +static void kvm_vcpu_pmu_disable_el0(unsigned long events) +{ + u64 typer; + u32 counter; + + for_each_set_bit(counter, &events, 32) { + write_sysreg(counter, pmselr_el0); + isb(); + typer = read_sysreg(pmxevtyper_el0) | ARMV8_PMU_EXCLUDE_EL0; + write_sysreg(typer, pmxevtyper_el0); + isb(); + } +} + +/* + * On VHE ensure that only guest events have EL0 counting enabled + */ +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) +{ + struct kvm_cpu_context *host_ctxt; + struct kvm_host_data *host; + u32 events_guest, events_host; + + if (!has_vhe()) + return; + + host_ctxt = vcpu->arch.host_cpu_context; + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + events_guest = host->pmu_events.events_guest; + events_host = host->pmu_events.events_host; + + kvm_vcpu_pmu_enable_el0(events_guest); + kvm_vcpu_pmu_disable_el0(events_host); +} + +/* + * On VHE ensure that only host events have EL0 counting enabled + */ +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) +{ + struct kvm_cpu_context *host_ctxt; + struct kvm_host_data *host; + u32 events_guest, events_host; + + if (!has_vhe()) + return; + + host_ctxt = vcpu->arch.host_cpu_context; + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + events_guest = host->pmu_events.events_guest; + events_host = host->pmu_events.events_host; + + kvm_vcpu_pmu_enable_el0(events_host); + kvm_vcpu_pmu_disable_el0(events_guest); +} diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index b76ecdc..f33feba 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -678,6 +678,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, val |= p->regval & ARMV8_PMU_PMCR_MASK; __vcpu_sys_reg(vcpu, PMCR_EL0) = val; kvm_pmu_handle_pmcr(vcpu, val); + kvm_vcpu_pmu_restore_guest(vcpu); } else { /* PMCR.P & PMCR.C are RAZ */ val = __vcpu_sys_reg(vcpu, PMCR_EL0) @@ -833,6 +834,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, if (p->is_write) { kvm_pmu_set_counter_event_type(vcpu, p->regval, idx); __vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK; + kvm_vcpu_pmu_restore_guest(vcpu); } else { p->regval = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK; } @@ -858,6 +860,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p, /* accessing PMCNTENSET_EL0 */ __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val; kvm_pmu_enable_counter(vcpu, val); + kvm_vcpu_pmu_restore_guest(vcpu); } else { /* accessing PMCNTENCLR_EL0 */ __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val; diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index daba0c2..e0718f1 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -396,6 +396,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) kvm_timer_vcpu_load(vcpu); kvm_vcpu_load_sysregs(vcpu); kvm_arch_vcpu_load_fp(vcpu); + kvm_vcpu_pmu_restore_guest(vcpu);
if (single_task_running()) vcpu_clear_wfe_traps(vcpu); @@ -409,6 +410,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_vcpu_put_sysregs(vcpu); kvm_timer_vcpu_put(vcpu); kvm_vgic_put(vcpu); + kvm_vcpu_pmu_restore_host(vcpu);
vcpu->cpu = -1;
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 39e3406a090a54e700a7c0820c8258af1196b0c2 category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
Upon entering or exiting a guest we may modify multiple PMU counters to enable or disable EL0 filtering. We presently do this via the indirect PMXEVTYPER_EL0 system register (where the counter we modify is selected by PMSELR). With this approach it is necessary to order the writes via isb instructions such that we select the correct counter before modifying it.
Let's avoid potentially expensive instruction barriers by using the direct PMEVTYPER<n>_EL0 registers instead.
As the change to counter type relates only to EL0 filtering we can rely on the implicit instruction barrier which occurs when we transition from EL2 to EL1 on entering the guest. On returning to userspace we can, at the latest, rely on the implicit barrier between EL2 and EL0. We can also depend on the explicit isb in armv8pmu_select_counter to order our write against any other kernel changes by the PMU driver to the type register as a result of preemption.
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kvm/pmu.c | 84 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 74 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 3f99a09..cd49db8 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -91,6 +91,74 @@ void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) write_sysreg(pmu->events_host, pmcntenset_el0); }
+#define PMEVTYPER_READ_CASE(idx) \ + case idx: \ + return read_sysreg(pmevtyper##idx##_el0) + +#define PMEVTYPER_WRITE_CASE(idx) \ + case idx: \ + write_sysreg(val, pmevtyper##idx##_el0); \ + break + +#define PMEVTYPER_CASES(readwrite) \ + PMEVTYPER_##readwrite##_CASE(0); \ + PMEVTYPER_##readwrite##_CASE(1); \ + PMEVTYPER_##readwrite##_CASE(2); \ + PMEVTYPER_##readwrite##_CASE(3); \ + PMEVTYPER_##readwrite##_CASE(4); \ + PMEVTYPER_##readwrite##_CASE(5); \ + PMEVTYPER_##readwrite##_CASE(6); \ + PMEVTYPER_##readwrite##_CASE(7); \ + PMEVTYPER_##readwrite##_CASE(8); \ + PMEVTYPER_##readwrite##_CASE(9); \ + PMEVTYPER_##readwrite##_CASE(10); \ + PMEVTYPER_##readwrite##_CASE(11); \ + PMEVTYPER_##readwrite##_CASE(12); \ + PMEVTYPER_##readwrite##_CASE(13); \ + PMEVTYPER_##readwrite##_CASE(14); \ + PMEVTYPER_##readwrite##_CASE(15); \ + PMEVTYPER_##readwrite##_CASE(16); \ + PMEVTYPER_##readwrite##_CASE(17); \ + PMEVTYPER_##readwrite##_CASE(18); \ + PMEVTYPER_##readwrite##_CASE(19); \ + PMEVTYPER_##readwrite##_CASE(20); \ + PMEVTYPER_##readwrite##_CASE(21); \ + PMEVTYPER_##readwrite##_CASE(22); \ + PMEVTYPER_##readwrite##_CASE(23); \ + PMEVTYPER_##readwrite##_CASE(24); \ + PMEVTYPER_##readwrite##_CASE(25); \ + PMEVTYPER_##readwrite##_CASE(26); \ + PMEVTYPER_##readwrite##_CASE(27); \ + PMEVTYPER_##readwrite##_CASE(28); \ + PMEVTYPER_##readwrite##_CASE(29); \ + PMEVTYPER_##readwrite##_CASE(30) + +/* + * Read a value direct from PMEVTYPER<idx> + */ +static u64 kvm_vcpu_pmu_read_evtype_direct(int idx) +{ + switch (idx) { + PMEVTYPER_CASES(READ); + default: + WARN_ON(1); + } + + return 0; +} + +/* + * Write a value direct to PMEVTYPER<idx> + */ +static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val) +{ + switch (idx) { + PMEVTYPER_CASES(WRITE); + default: + WARN_ON(1); + } +} + /* * Modify ARMv8 PMU events to include EL0 counting */ @@ -100,11 +168,9 @@ static void kvm_vcpu_pmu_enable_el0(unsigned long events) u32 counter;
for_each_set_bit(counter, &events, 32) { - write_sysreg(counter, pmselr_el0); - isb(); - typer = read_sysreg(pmxevtyper_el0) & ~ARMV8_PMU_EXCLUDE_EL0; - write_sysreg(typer, pmxevtyper_el0); - isb(); + typer = kvm_vcpu_pmu_read_evtype_direct(counter); + typer &= ~ARMV8_PMU_EXCLUDE_EL0; + kvm_vcpu_pmu_write_evtype_direct(counter, typer); } }
@@ -117,11 +183,9 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events) u32 counter;
for_each_set_bit(counter, &events, 32) { - write_sysreg(counter, pmselr_el0); - isb(); - typer = read_sysreg(pmxevtyper_el0) | ARMV8_PMU_EXCLUDE_EL0; - write_sysreg(typer, pmxevtyper_el0); - isb(); + typer = kvm_vcpu_pmu_read_evtype_direct(counter); + typer |= ARMV8_PMU_EXCLUDE_EL0; + kvm_vcpu_pmu_write_evtype_direct(counter, typer); } }
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit a9bf3130ebfe0378df449a1b13997c47a58e661a category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
The interaction between the exclude_{host,guest} flags, exclude_{user,kernel,hv} flags and presence of VHE can result in different exception levels being filtered by the ARMv8 PMU. As this can be confusing, let's document how they work on arm64.
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- Documentation/arm64/perf.txt | 85 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) create mode 100644 Documentation/arm64/perf.txt
diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt new file mode 100644 index 00000000..0d6a7d8 --- /dev/null +++ b/Documentation/arm64/perf.txt @@ -0,0 +1,85 @@ +Perf Event Attributes +===================== + +Author: Andrew Murray andrew.murray@arm.com +Date: 2019-03-06 + +exclude_user +------------ + +This attribute excludes userspace. + +Userspace always runs at EL0 and thus this attribute will exclude EL0. + + +exclude_kernel +-------------- + +This attribute excludes the kernel. + +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run +at EL1. + +For the host this attribute will exclude EL1 and additionally EL2 on a VHE +system. + +For the guest this attribute will exclude EL1. Please note that EL2 is +never counted within a guest. + + +exclude_hv +---------- + +This attribute excludes the hypervisor. + +For a VHE host this attribute is ignored as we consider the host kernel to +be the hypervisor. + +For a non-VHE host this attribute will exclude EL2 as we consider the +hypervisor to be any code that runs at EL2 which is predominantly used for +guest/host transitions. + +For the guest this attribute has no effect. Please note that EL2 is +never counted within a guest. + + +exclude_host / exclude_guest +---------------------------- + +These attributes exclude the KVM host and guest, respectively. + +The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE +kernel or non-VHE hypervisor). + +The KVM guest may run at EL0 (userspace) and EL1 (kernel). + +Due to the overlapping exception levels between host and guests we cannot +exclusively rely on the PMU's hardware exception filtering - therefore we +must enable/disable counting on the entry and exit to the guest. This is +performed differently on VHE and non-VHE systems. + +For non-VHE systems we exclude EL2 for exclude_host - upon entering and +exiting the guest we disable/enable the event as appropriate based on the +exclude_host and exclude_guest attributes. + +For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 +for exclude_host. Upon entering and exiting the guest we modify the event +to include/exclude EL0 as appropriate based on the exclude_host and +exclude_guest attributes. + +The statements above also apply when these attributes are used within a +non-VHE guest however please note that EL2 is never counted within a guest. + + +Accuracy +-------- + +On non-VHE hosts we enable/disable counters on the entry/exit of host/guest +transition at EL2 - however there is a period of time between +enabling/disabling the counters and entering/exiting the guest. We are +able to eliminate counters counting host events on the boundaries of guest +entry/exit when counting guest events by filtering out EL2 for +exclude_host. However when using !exclude_hv there is a small blackout +window at the guest entry/exit where host events are not captured. + +On VHE systems there are no blackout windows.
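As a usage illustration (not part of the patch), the attributes documented above surface in perf's event modifier syntax, where :G restricts counting to the guest and :H to the host. For example:

    # count cycles in the guest and in the host separately
    perf stat -e cycles:G -e cycles:H -a sleep 1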
From: Andrew Murray andrew.murray@arm.com
mainline inclusion from mainline-v5.2-rc1 commit 21137301de5a5bd8be7b0de7896c3f139b7eb859 category: feature feature: Support perf event modifiers :G and :H
-------------------------------------------------
The kvm_vcpu_pmu_{read,write}_evtype_direct functions do not handle the cycle counter use-case; this leads to inaccurate counts and a WARN message when using perf with the cycle counter (-e cycle).
Let's fix this by adding a case for pmccfiltr_el0, the cycle counter's filter register.
Fixes: 39e3406a090a ("arm64: KVM: Avoid isb's by using direct pmxevtyper sysreg") Reported-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Andrew Murray andrew.murray@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kvm/pmu.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index cd49db8..3da94a5 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -134,12 +134,15 @@ void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) PMEVTYPER_##readwrite##_CASE(30)
/* - * Read a value direct from PMEVTYPER<idx> + * Read a value direct from PMEVTYPER<idx> where idx is 0-30 + * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31). */ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx) { switch (idx) { PMEVTYPER_CASES(READ); + case ARMV8_PMU_CYCLE_IDX: + return read_sysreg(pmccfiltr_el0); default: WARN_ON(1); } @@ -148,12 +151,16 @@ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx) }
/* - * Write a value direct to PMEVTYPER<idx> + * Write a value direct to PMEVTYPER<idx> where idx is 0-30 + * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31). */ static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val) { switch (idx) { PMEVTYPER_CASES(WRITE); + case ARMV8_PMU_CYCLE_IDX: + write_sysreg(val, pmccfiltr_el0); + break; default: WARN_ON(1); }
From: James Morse james.morse@arm.com
mainline inclusion from mainline-v5.2-rc2 commit b7c50fab66ab66e2d3e00f809a09578d78232836 category: bugfix
-------------------------------------------------
KVM's pmu.c contains the __hyp_text needed to switch the pmu registers between host and guest. Because this isn't covered by the 'hyp' Makefile, it can be built with kasan and friends when these are enabled in Kconfig.
When starting a guest, this results in: | Kernel panic - not syncing: HYP panic: | PS:a00003c9 PC:000083000028ada0 ESR:86000007 | FAR:000083000028ada0 HPFAR:0000000029df5300 PAR:0000000000000000 | VCPU:000000004e10b7d6 | CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026 | Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat | Call trace: | dump_backtrace+0x0/0x200 | show_stack+0x20/0x30 | dump_stack+0xec/0x158 | panic+0x1ec/0x420 | panic+0x0/0x420 | SMP: stopping secondary CPUs | Kernel Offset: disabled | CPU features: 0x002,25006082 | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic:
This is caused by functions in pmu.c calling the instrumented code, which isn't mapped to hyp. From objdump -r: | RELOCATION RECORDS FOR [.hyp.text]: | OFFSET TYPE VALUE | 0000000000000010 R_AARCH64_CALL26 __sanitizer_cov_trace_pc | 0000000000000018 R_AARCH64_CALL26 __asan_load4_noabort | 0000000000000024 R_AARCH64_CALL26 __asan_load4_noabort
Move the affected code to a new file under 'hyp's Makefile.
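For reference, 'hyp's Makefile opts out of instrumentation wholesale, so anything built there is safe to execute at EL2. An illustrative excerpt of the relevant kbuild knobs (from the era's arch/arm64/kvm/hyp/Makefile; not quoted from this patch):

    GCOV_PROFILE    := n
    KASAN_SANITIZE  := n
    UBSAN_SANITIZE  := n
    KCOV_INSTRUMENT := n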
Fixes: 3d91befbb3a0 ("arm64: KVM: Enable !VHE support for :G/:H perf event modifiers") Cc: Andrew Murray Andrew.Murray@arm.com Signed-off-by: James Morse james.morse@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/include/asm/kvm_host.h | 3 --- arch/arm64/kvm/hyp/switch.c | 39 +++++++++++++++++++++++++++++++++++++++ arch/arm64/kvm/pmu.c | 38 -------------------------------------- 3 files changed, 39 insertions(+), 41 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index af4f326..ab2ba6e 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -506,9 +506,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); void kvm_clr_pmu_events(u32 clr);
-void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt); -bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt); - void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); #else diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index 5a05003..cc32ef0 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -16,6 +16,7 @@ */
#include <linux/arm-smccc.h> +#include <linux/kvm_host.h> #include <linux/types.h> #include <linux/jump_label.h> #include <uapi/linux/psci.h> @@ -490,6 +491,44 @@ static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu) #endif }
+/** + * Disable host events, enable guest events + */ +static bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt) +{ + struct kvm_host_data *host; + struct kvm_pmu_events *pmu; + + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + pmu = &host->pmu_events; + + if (pmu->events_host) + write_sysreg(pmu->events_host, pmcntenclr_el0); + + if (pmu->events_guest) + write_sysreg(pmu->events_guest, pmcntenset_el0); + + return (pmu->events_host || pmu->events_guest); +} + +/** + * Disable guest events, enable host events + */ +static void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) +{ + struct kvm_host_data *host; + struct kvm_pmu_events *pmu; + + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + pmu = &host->pmu_events; + + if (pmu->events_guest) + write_sysreg(pmu->events_guest, pmcntenclr_el0); + + if (pmu->events_host) + write_sysreg(pmu->events_host, pmcntenset_el0); +} + /* Switch to the guest for VHE systems running in EL2 */ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 3da94a5..e71d00b 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -53,44 +53,6 @@ void kvm_clr_pmu_events(u32 clr) ctx->pmu_events.events_guest &= ~clr; }
-/** - * Disable host events, enable guest events - */ -bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt) -{ - struct kvm_host_data *host; - struct kvm_pmu_events *pmu; - - host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); - pmu = &host->pmu_events; - - if (pmu->events_host) - write_sysreg(pmu->events_host, pmcntenclr_el0); - - if (pmu->events_guest) - write_sysreg(pmu->events_guest, pmcntenset_el0); - - return (pmu->events_host || pmu->events_guest); -} - -/** - * Disable guest events, enable host events - */ -void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) -{ - struct kvm_host_data *host; - struct kvm_pmu_events *pmu; - - host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); - pmu = &host->pmu_events; - - if (pmu->events_guest) - write_sysreg(pmu->events_guest, pmcntenclr_el0); - - if (pmu->events_host) - write_sysreg(pmu->events_host, pmcntenset_el0); -} - #define PMEVTYPER_READ_CASE(idx) \ case idx: \ return read_sysreg(pmevtyper##idx##_el0)
From: Marc Zyngier marc.zyngier@arm.com
mainline inclusion from mainline-v5.3-rc1 commit 1e0cf16cdad1ba53e9eeee8746fe57de42f20c97 category: bugfix
-------------------------------------------------
As part of setting up the host context, we populate its MPIDR by using cpu_logical_map(). It turns out that contrary to arm64, cpu_logical_map() on 32bit ARM doesn't return the *full* MPIDR, but a truncated version.
This leaves the host MPIDR slightly corrupted after the first run of a VM, since we won't correctly restore the MPIDR on exit. Oops.
Since we cannot trust cpu_logical_map(), let's adopt a different strategy. We move the initialization of the host CPU context as part of the per-CPU initialization (which, in retrospect, makes a lot of sense), and directly read the MPIDR from the HW. This is guaranteed to work on both arm and arm64.
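Reading the register directly avoids cpu_logical_map() entirely; on arm64 the helper boils down to the following (illustrative, matching the era's arch/arm64/include/asm/cputype.h):

    static inline u64 read_cpuid_mpidr(void)
    {
            return read_cpuid(MPIDR_EL1);   /* mrs <xN>, mpidr_el1 */
    }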
Reported-by: Andre Przywara Andre.Przywara@arm.com Tested-by: Andre Przywara Andre.Przywara@arm.com Fixes: 32f139551954 ("arm/arm64: KVM: Statically configure the host's view of MPIDR") Signed-off-by: Marc Zyngier marc.zyngier@arm.com [yu: resolve conflicts] Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_host.h | 6 ++---- arch/arm64/include/asm/kvm_host.h | 7 +++---- virt/kvm/arm/arm.c | 3 ++- 3 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index c0d39e4..5027f97 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -26,7 +26,6 @@ #include <asm/kvm_asm.h> #include <asm/kvm_mmio.h> #include <asm/fpstate.h> -#include <asm/smp_plat.h> #include <kvm/arm_arch_timer.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED @@ -153,11 +152,10 @@ struct kvm_host_data {
typedef struct kvm_host_data kvm_host_data_t;
-static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt, - int cpu) +static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt) { /* The host's MPIDR is immutable, so let's set it up at boot time */ - cpu_ctxt->cp15[c0_MPIDR] = cpu_logical_map(cpu); + cpu_ctxt->cp15[c0_MPIDR] = read_cpuid_mpidr(); }
struct vcpu_reset_state { diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index ab2ba6e..e57cc19 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -26,12 +26,12 @@ #include <linux/kvm_types.h> #include <asm/arch_gicv3.h> #include <asm/cpufeature.h> +#include <asm/cputype.h> #include <asm/daifflags.h> #include <asm/fpsimd.h> #include <asm/kvm.h> #include <asm/kvm_asm.h> #include <asm/kvm_mmio.h> -#include <asm/smp_plat.h> #include <asm/thread_info.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED @@ -414,11 +414,10 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
void __kvm_enable_ssbs(void);
-static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt, - int cpu) +static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt) { /* The host's MPIDR is immutable, so let's set it up at boot time */ - cpu_ctxt->sys_regs[MPIDR_EL1] = cpu_logical_map(cpu); + cpu_ctxt->sys_regs[MPIDR_EL1] = read_cpuid_mpidr(); }
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index e0718f1..e0ef9a9 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1336,6 +1336,8 @@ static void cpu_hyp_reset(void)
static void cpu_hyp_reinit(void) { + kvm_init_host_cpu_context(&this_cpu_ptr(&kvm_host_data)->host_ctxt); + cpu_hyp_reset();
if (is_kernel_in_hyp_mode()) { @@ -1583,7 +1585,6 @@ static int init_hyp_mode(void) kvm_host_data_t *cpu_data;
cpu_data = per_cpu_ptr(&kvm_host_data, cpu); - kvm_init_host_cpu_context(&cpu_data->host_ctxt, cpu); err = create_hyp_mappings(cpu_data, cpu_data + 1, PAGE_HYP);
if (err) {
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 24cab82c34aa6f3ede3de1d8621624cb5ef33feb category: feature feature: ITS translation cache
-------------------------------------------------
Add the basic data structure that expresses an MSI to LPI translation as well as the allocation/release hooks.
The size of the cache is arbitrarily defined as 16*nr_vcpus; a 4-vCPU guest, for instance, gets 64 cache entries.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_vgic.h | 3 +++ virt/kvm/arm/vgic/vgic-init.c | 5 +++++ virt/kvm/arm/vgic/vgic-its.c | 49 +++++++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/vgic/vgic.h | 2 ++ 4 files changed, 59 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 3272869..eac0150 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -260,6 +260,9 @@ struct vgic_dist { struct list_head lpi_list_head; int lpi_list_count;
+ /* LPI translation cache */ + struct list_head lpi_translation_cache; + /* used by vgic-debug */ struct vgic_state_iter *iter;
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index 54a6ee6..04bd2e4 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -65,6 +65,7 @@ void kvm_vgic_early_init(struct kvm *kvm) struct vgic_dist *dist = &kvm->arch.vgic;
INIT_LIST_HEAD(&dist->lpi_list_head); + INIT_LIST_HEAD(&dist->lpi_translation_cache); raw_spin_lock_init(&dist->lpi_list_lock); }
@@ -315,6 +316,7 @@ int vgic_init(struct kvm *kvm) }
if (vgic_has_its(kvm)) { + vgic_lpi_translation_cache_init(kvm); ret = vgic_v4_init(kvm); if (ret) goto out; @@ -356,6 +358,9 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm) INIT_LIST_HEAD(&dist->rd_regions); }
+ if (vgic_has_its(kvm)) + vgic_lpi_translation_cache_destroy(kvm); + if (vgic_supports_direct_msis(kvm)) vgic_v4_teardown(kvm); } diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index ffc1e40..7b4b50b 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -149,6 +149,14 @@ struct its_ite { u32 event_id; };
+struct vgic_translation_cache_entry { + struct list_head entry; + phys_addr_t db; + u32 devid; + u32 eventid; + struct vgic_irq *irq; +}; + /** * struct vgic_its_abi - ITS abi ops and settings * @cte_esz: collection table entry size @@ -1668,6 +1676,45 @@ static int vgic_register_its_iodev(struct kvm *kvm, struct vgic_its *its, return ret; }
+/* Default is 16 cached LPIs per vcpu */ +#define LPI_DEFAULT_PCPU_CACHE_SIZE 16 + +void vgic_lpi_translation_cache_init(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + unsigned int sz; + int i; + + if (!list_empty(&dist->lpi_translation_cache)) + return; + + sz = atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE; + + for (i = 0; i < sz; i++) { + struct vgic_translation_cache_entry *cte; + + /* An allocation failure is not fatal */ + cte = kzalloc(sizeof(*cte), GFP_KERNEL); + if (WARN_ON(!cte)) + break; + + INIT_LIST_HEAD(&cte->entry); + list_add(&cte->entry, &dist->lpi_translation_cache); + } +} + +void vgic_lpi_translation_cache_destroy(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_translation_cache_entry *cte, *tmp; + + list_for_each_entry_safe(cte, tmp, + &dist->lpi_translation_cache, entry) { + list_del(&cte->entry); + kfree(cte); + } +} + #define INITIAL_BASER_VALUE \ (GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb) | \ GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner) | \ @@ -1696,6 +1743,8 @@ static int vgic_its_create(struct kvm_device *dev, u32 type) kfree(its); return ret; } + + vgic_lpi_translation_cache_init(dev->kvm); }
mutex_init(&its->its_lock); diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h index d974508..465d0a8 100644 --- a/virt/kvm/arm/vgic/vgic.h +++ b/virt/kvm/arm/vgic/vgic.h @@ -318,6 +318,8 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size) int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, u32 devid, u32 eventid, struct vgic_irq **irq); struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi); +void vgic_lpi_translation_cache_init(struct kvm *kvm); +void vgic_lpi_translation_cache_destroy(struct kvm *kvm);
bool vgic_supports_direct_msis(struct kvm *kvm); int vgic_v4_init(struct kvm *kvm);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 1bb3691d8330f09a0699c124fcb71991b5fd105b category: feature feature: ITS translation cache
-------------------------------------------------
Our LPI translation cache needs to be able to drop the refcount on an LPI whilst already holding the lpi_list_lock.
Let's add a new primitive for this.
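The intended calling convention is that the caller already holds lpi_list_lock, as the reworked vgic_put_irq() in the diff below shows:

    raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
    __vgic_put_lpi_locked(kvm, irq);    /* drop the refcount; frees on zero */
    raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);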
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic.c | 26 +++++++++++++++++--------- virt/kvm/arm/vgic/vgic.h | 1 + 2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c index 07b9f67..ee29c90 100644 --- a/virt/kvm/arm/vgic/vgic.c +++ b/virt/kvm/arm/vgic/vgic.c @@ -130,6 +130,22 @@ static void vgic_irq_release(struct kref *ref) { }
+/* + * Drop the refcount on the LPI. Must be called with lpi_list_lock held. + */ +void __vgic_put_lpi_locked(struct kvm *kvm, struct vgic_irq *irq) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + + if (!kref_put(&irq->refcount, vgic_irq_release)) + return; + + list_del(&irq->lpi_list); + dist->lpi_list_count--; + + kfree(irq); +} + void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq) { struct vgic_dist *dist = &kvm->arch.vgic; @@ -139,16 +155,8 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq) return;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); - if (!kref_put(&irq->refcount, vgic_irq_release)) { - raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); - return; - }; - - list_del(&irq->lpi_list); - dist->lpi_list_count--; + __vgic_put_lpi_locked(kvm, irq); raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); - - kfree(irq); }
void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu) diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h index 465d0a8..77e1d76 100644 --- a/virt/kvm/arm/vgic/vgic.h +++ b/virt/kvm/arm/vgic/vgic.h @@ -172,6 +172,7 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr, gpa_t addr, int len); struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 intid); +void __vgic_put_lpi_locked(struct kvm *kvm, struct vgic_irq *irq); void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq); bool vgic_get_phys_line_level(struct vgic_irq *irq); void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 7d825fd6eaa7467c9b737fd5cf46e45f298d1c20 category: feature feature: ITS translation cache
-------------------------------------------------
There are a number of cases where we need to invalidate the cached translations, so let's add basic support for that.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 23 +++++++++++++++++++++++ virt/kvm/arm/vgic/vgic.h | 1 + 2 files changed, 24 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 7b4b50b..fa854e5 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -546,6 +546,29 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm, return 0; }
+void vgic_its_invalidate_cache(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_translation_cache_entry *cte; + unsigned long flags; + + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); + + list_for_each_entry(cte, &dist->lpi_translation_cache, entry) { + /* + * If we hit a NULL entry, there is nothing after this + * point. + */ + if (!cte->irq) + break; + + __vgic_put_lpi_locked(kvm, cte->irq); + cte->irq = NULL; + } + + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); +} + int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, u32 devid, u32 eventid, struct vgic_irq **irq) { diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h index 77e1d76..f5ae753 100644 --- a/virt/kvm/arm/vgic/vgic.h +++ b/virt/kvm/arm/vgic/vgic.h @@ -321,6 +321,7 @@ int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi); void vgic_lpi_translation_cache_init(struct kvm *kvm); void vgic_lpi_translation_cache_destroy(struct kvm *kvm); +void vgic_its_invalidate_cache(struct kvm *kvm);
bool vgic_supports_direct_msis(struct kvm *kvm); int vgic_v4_init(struct kvm *kvm);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 0c14484866194382c0631d6e9bb14cc2fe83e814 category: feature feature: ITS translation cache
-------------------------------------------------
The LPI translation cache needs to be discarded when an ITS command may affect the translation of an LPI (DISCARD, MAPC and MAPD with V=0) or the routing of an LPI to a redistributor with disabled LPIs (MOVI, MOVALL).
We decide to perform a full invalidation of the cache, irrespective of the LPI that is affected. Commands are supposed to be rare enough that it doesn't matter.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index fa854e5..aecf55f 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -733,6 +733,8 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its, * don't bother here since we clear the ITTE anyway and the * pending state is a property of the ITTE struct. */ + vgic_its_invalidate_cache(kvm); + its_free_ite(kvm, ite); return 0; } @@ -768,6 +770,8 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its, ite->collection = collection; vcpu = kvm_get_vcpu(kvm, collection->target_addr);
+ vgic_its_invalidate_cache(kvm); + return update_affinity(ite->irq, vcpu); }
@@ -996,6 +1000,8 @@ static void vgic_its_free_device(struct kvm *kvm, struct its_device *device) list_for_each_entry_safe(ite, temp, &device->itt_head, ite_list) its_free_ite(kvm, ite);
+ vgic_its_invalidate_cache(kvm); + list_del(&device->dev_list); kfree(device); } @@ -1101,6 +1107,7 @@ static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
if (!valid) { vgic_its_free_collection(its, coll_id); + vgic_its_invalidate_cache(kvm); } else { collection = find_collection(its, coll_id);
@@ -1249,6 +1256,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its, vgic_put_irq(kvm, irq); }
+ vgic_its_invalidate_cache(kvm); + kfree(intids); return 0; }
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit b4931afcde1ffccd4a406009aef33c14bc6c6cb8 category: feature feature: ITS translation cache
-------------------------------------------------
If a vcpu disables LPIs at its redistributor level, we need to make sure we won't pend more interrupts. For this, we need to invalidate the LPI translation cache.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-mmio-v3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c index 9f4843f..2aea9f4 100644 --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c @@ -200,8 +200,10 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
vgic_cpu->lpis_enabled = val & GICR_CTLR_ENABLE_LPIS;
- if (was_enabled && !vgic_cpu->lpis_enabled) + if (was_enabled && !vgic_cpu->lpis_enabled) { vgic_flush_pending_lpis(vcpu); + vgic_its_invalidate_cache(vcpu->kvm); + }
if (!was_enabled && vgic_cpu->lpis_enabled) vgic_enable_lpis(vcpu);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 363518f37a86acc515defae6d1ba91a6c7617de9 category: feature feature: ITS translation cache
-------------------------------------------------
If an ITS gets disabled, we need to make sure that further interrupts won't hit in the cache. For that, we invalidate the translation cache when the ITS is disabled.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index aecf55f..d5b2406 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1608,6 +1608,8 @@ static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its, goto out;
its->enabled = !!(val & GITS_CTLR_ENABLE); + if (!its->enabled) + vgic_its_invalidate_cache(kvm);
/* * Try to process any pending commands. This function bails out early
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit cbfda481d87e92ce635e426099946cd413b251be category: feature feature: ITS translation cache
-------------------------------------------------
In order to avoid leaking vgic_irq structures on teardown, we need to drop all references to LPIs before deallocating the cache itself.
This is done by invalidating the cache on vgic teardown.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index d5b2406..a7d42bb 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1742,6 +1742,8 @@ void vgic_lpi_translation_cache_destroy(struct kvm *kvm) struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_translation_cache_entry *cte, *tmp;
+ vgic_its_invalidate_cache(kvm); + list_for_each_entry_safe(cte, tmp, &dist->lpi_translation_cache, entry) { list_del(&cte->entry);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 89489ee9ced8924f64f99c0470eae38e9e4e204b category: feature feature: ITS translation cache
-------------------------------------------------
On a successful translation, preserve the parameters in the LPI translation cache. Each translation reuses the last slot in the list, naturally evicting the least recently used entry.
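In list terms, a hit promotes the entry to the head (most recently used) and a new translation recycles the tail entry (least recently used). Condensed from the diff below:

    /* cache hit: promote the entry to the head of the list */
    list_move(&cte->entry, &dist->lpi_translation_cache);

    /* new translation: reuse the tail, i.e. the LRU slot */
    cte = list_last_entry(&dist->lpi_translation_cache,
                          typeof(*cte), entry);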
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 97 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index a7d42bb..a7c7270 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -546,6 +546,101 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm, return 0; }
+/** + * list_is_first -- tests whether @list is the first entry in list @head + * @list: the entry to test + * @head: the head of the list + */ +static inline int list_is_first(const struct list_head *list, + const struct list_head *head) +{ + return list->prev == head; +} + +static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist, + phys_addr_t db, + u32 devid, u32 eventid) +{ + struct vgic_translation_cache_entry *cte; + + list_for_each_entry(cte, &dist->lpi_translation_cache, entry) { + /* + * If we hit a NULL entry, there is nothing after this + * point. + */ + if (!cte->irq) + break; + + if (cte->db != db || cte->devid != devid || + cte->eventid != eventid) + continue; + + /* + * Move this entry to the head, as it is the most + * recently used. + */ + if (!list_is_first(&cte->entry, &dist->lpi_translation_cache)) + list_move(&cte->entry, &dist->lpi_translation_cache); + + return cte->irq; + } + + return NULL; +} + +static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its, + u32 devid, u32 eventid, + struct vgic_irq *irq) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_translation_cache_entry *cte; + unsigned long flags; + phys_addr_t db; + + /* Do not cache a directly injected interrupt */ + if (irq->hw) + return; + + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); + + if (unlikely(list_empty(&dist->lpi_translation_cache))) + goto out; + + /* + * We could have raced with another CPU caching the same + * translation behind our back, so let's check it is not in + * already + */ + db = its->vgic_its_base + GITS_TRANSLATER; + if (__vgic_its_check_cache(dist, db, devid, eventid)) + goto out; + + /* Always reuse the last entry (LRU policy) */ + cte = list_last_entry(&dist->lpi_translation_cache, + typeof(*cte), entry); + + /* + * Caching the translation implies having an extra reference + * to the interrupt, so drop the potential reference on what + * was in the cache, and increment it on the new interrupt. + */ + if (cte->irq) + __vgic_put_lpi_locked(kvm, cte->irq); + + vgic_get_irq_kref(irq); + + cte->db = db; + cte->devid = devid; + cte->eventid = eventid; + cte->irq = irq; + + /* Move the new translation to the head of the list */ + list_move(&cte->entry, &dist->lpi_translation_cache); + +out: + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); +} + void vgic_its_invalidate_cache(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; @@ -589,6 +684,8 @@ int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, if (!vcpu->arch.vgic_cpu.lpis_enabled) return -EBUSY;
+ vgic_its_cache_translation(kvm, its, devid, eventid, ite->irq); + *irq = ite->irq; return 0; }
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 86a7dae884f38e11f3af8137ddf1fb969cd2699c category: feature feature: ITS translation cache
-------------------------------------------------
When performing an MSI injection, let's first check if the translation is already in the cache. If so, let's inject it quickly without going through the whole translation process.
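The hot path thus becomes a cheap cache lookup in front of the existing translation machinery, abridged from vgic_its_inject_msi() in the diff below:

    if (!vgic_its_inject_cached_translation(kvm, msi))
            return 1;                       /* cache hit: LPI already injected */

    its = vgic_msi_to_its(kvm, msi);        /* miss: full translation path */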
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 36 ++++++++++++++++++++++++++++++++++++ virt/kvm/arm/vgic/vgic.h | 1 + 2 files changed, 37 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index a7c7270..b9ee395 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -588,6 +588,20 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist, return NULL; }
+static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db, + u32 devid, u32 eventid) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_irq *irq; + unsigned long flags; + + raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); + irq = __vgic_its_check_cache(dist, db, devid, eventid); + raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + + return irq; +} + static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its, u32 devid, u32 eventid, struct vgic_irq *irq) @@ -747,6 +761,25 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its, return 0; }
+int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi) +{ + struct vgic_irq *irq; + unsigned long flags; + phys_addr_t db; + + db = (u64)msi->address_hi << 32 | msi->address_lo; + irq = vgic_its_check_cache(kvm, db, msi->devid, msi->data); + + if (!irq) + return -1; + + raw_spin_lock_irqsave(&irq->irq_lock, flags); + irq->pending_latch = true; + vgic_queue_irq_unlock(kvm, irq, flags); + + return 0; +} + /* * Queries the KVM IO bus framework to get the ITS pointer from the given * doorbell address. @@ -758,6 +791,9 @@ int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi) struct vgic_its *its; int ret;
+ if (!vgic_its_inject_cached_translation(kvm, msi)) + return 1; + its = vgic_msi_to_its(kvm, msi); if (IS_ERR(its)) return PTR_ERR(its); diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h index f5ae753..04e2f23 100644 --- a/virt/kvm/arm/vgic/vgic.h +++ b/virt/kvm/arm/vgic/vgic.h @@ -319,6 +319,7 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size) int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, u32 devid, u32 eventid, struct vgic_irq **irq); struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi); +int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi); void vgic_lpi_translation_cache_init(struct kvm *kvm); void vgic_lpi_translation_cache_destroy(struct kvm *kvm); void vgic_its_invalidate_cache(struct kvm *kvm);
From: Marc Zyngier maz@kernel.org
mainline inclusion from mainline-v5.4-rc1 commit 41108170d98069b34837c4512f93ef2ce651b812 category: feature feature: ITS translation cache
-------------------------------------------------
Now that we have a cache of MSI->LPI translations, it is pretty easy to implement kvm_arch_set_irq_inatomic (this cache can be parsed without sleeping).
Hopefully, this will improve some LPI-heavy workloads.
Tested-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-irqfd.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-irqfd.c b/virt/kvm/arm/vgic/vgic-irqfd.c index 99e026d..9f203ed 100644 --- a/virt/kvm/arm/vgic/vgic-irqfd.c +++ b/virt/kvm/arm/vgic/vgic-irqfd.c @@ -77,6 +77,15 @@ int kvm_set_routing_entry(struct kvm *kvm, return r; }
+static void kvm_populate_msi(struct kvm_kernel_irq_routing_entry *e, + struct kvm_msi *msi) +{ + msi->address_lo = e->msi.address_lo; + msi->address_hi = e->msi.address_hi; + msi->data = e->msi.data; + msi->flags = e->msi.flags; + msi->devid = e->msi.devid; +} /** * kvm_set_msi: inject the MSI corresponding to the * MSI routing entry @@ -90,21 +99,36 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, { struct kvm_msi msi;
- msi.address_lo = e->msi.address_lo; - msi.address_hi = e->msi.address_hi; - msi.data = e->msi.data; - msi.flags = e->msi.flags; - msi.devid = e->msi.devid; - if (!vgic_has_its(kvm)) return -ENODEV;
if (!level) return -1;
+ kvm_populate_msi(e, &msi); return vgic_its_inject_msi(kvm, &msi); }
+/** + * kvm_arch_set_irq_inatomic: fast-path for irqfd injection + * + * Currently only direct MSI injection is supported. + */ +int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e, + struct kvm *kvm, int irq_source_id, int level, + bool line_status) +{ + if (e->type == KVM_IRQ_ROUTING_MSI && vgic_has_its(kvm) && level) { + struct kvm_msi msi; + + kvm_populate_msi(e, &msi); + if (!vgic_its_inject_cached_translation(kvm, &msi)) + return 0; + } + + return -EWOULDBLOCK; +} + int kvm_vgic_setup_default_irq_routing(struct kvm *kvm) { struct kvm_irq_routing_entry *entries;
From: honghao honghao5@huawei.com
euleros inclusion category: feature feature: Hisi CPU type probe
-------------------------------------------------
Parse the ACPI tables (or the DTB) to determine which HiSilicon CPU model the hypervisor is running on.
Reviewed-by: zhanghailiang zhang.zhanghailiang@huawei.com Signed-off-by: honghao honghao5@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/hisi_cpu_model.h | 16 +++++++ arch/arm/include/asm/kvm_host.h | 1 + arch/arm64/include/asm/hisi_cpu_model.h | 19 ++++++++ arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/Makefile | 1 + virt/kvm/arm/arm.c | 6 +++ virt/kvm/arm/hisi_cpu_model.c | 82 +++++++++++++++++++++++++++++++++ 7 files changed, 126 insertions(+) create mode 100644 arch/arm/include/asm/hisi_cpu_model.h create mode 100644 arch/arm64/include/asm/hisi_cpu_model.h create mode 100644 virt/kvm/arm/hisi_cpu_model.c
diff --git a/arch/arm/include/asm/hisi_cpu_model.h b/arch/arm/include/asm/hisi_cpu_model.h new file mode 100644 index 00000000..16aea01 --- /dev/null +++ b/arch/arm/include/asm/hisi_cpu_model.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2019 Huawei Technologies Co., Ltd + */ + +#ifndef __HISI_CPU_MODEL_H__ +#define __HISI_CPU_MODEL_H__ + +enum hisi_cpu_type { + UNKNOWN_HI_TYPE +}; + +extern enum hisi_cpu_type hi_cpu_type; + +void probe_hisi_cpu_type(void); +#endif /* __HISI_CPU_MODEL_H__ */ diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 5027f97..b736a45 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -27,6 +27,7 @@ #include <asm/kvm_mmio.h> #include <asm/fpstate.h> #include <kvm/arm_arch_timer.h> +#include <asm/hisi_cpu_model.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
diff --git a/arch/arm64/include/asm/hisi_cpu_model.h b/arch/arm64/include/asm/hisi_cpu_model.h new file mode 100644 index 00000000..f686a75 --- /dev/null +++ b/arch/arm64/include/asm/hisi_cpu_model.h @@ -0,0 +1,19 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2019 Huawei Technologies Co., Ltd + */ + +#ifndef __HISI_CPU_MODEL_H__ +#define __HISI_CPU_MODEL_H__ + +enum hisi_cpu_type { + HI_1612, + HI_1616, + HI_1620, + UNKNOWN_HI_TYPE +}; + +extern enum hisi_cpu_type hi_cpu_type; + +void probe_hisi_cpu_type(void); +#endif /* __HISI_CPU_MODEL_H__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e57cc19..f013151 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -33,6 +33,7 @@ #include <asm/kvm_asm.h> #include <asm/kvm_mmio.h> #include <asm/thread_info.h> +#include <asm/hisi_cpu_model.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index f34cb49..fd930cd 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -34,6 +34,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hisi_cpu_model.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index e0ef9a9..264aba2 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -70,6 +70,9 @@
static bool vgic_present;
+/* Hisi cpu type enum */ +enum hisi_cpu_type hi_cpu_type = UNKNOWN_HI_TYPE; + static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu) @@ -1672,6 +1675,9 @@ int kvm_arch_init(void *opaque) int ret, cpu; bool in_hyp_mode;
+ /* Probe the Hisi CPU type */ + probe_hisi_cpu_type(); + if (!is_hyp_mode_available()) { kvm_info("HYP mode not available\n"); return -ENODEV; diff --git a/virt/kvm/arm/hisi_cpu_model.c b/virt/kvm/arm/hisi_cpu_model.c new file mode 100644 index 00000000..fb06e0b --- /dev/null +++ b/virt/kvm/arm/hisi_cpu_model.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2019 Huawei Technologies Co., Ltd + */ + +#include <linux/acpi.h> +#include <linux/of.h> +#include <linux/init.h> +#include <linux/kvm_host.h> + +#ifdef CONFIG_ACPI + +/* ACPI Hisi oem table id str */ +const char *oem_str[] = { + "HIP06", /* Hisi 1612 */ + "HIP07", /* Hisi 1616 */ + "HIP08" /* Hisi 1620 */ +}; + +/* + * Get Hisi oem table id. + */ +static void acpi_get_hw_cpu_type(void) +{ + struct acpi_table_header *table; + acpi_status status; + int i, str_size = ARRAY_SIZE(oem_str); + + /* Get oem table id from ACPI table header */ + status = acpi_get_table(ACPI_SIG_DSDT, 0, &table); + if (ACPI_FAILURE(status)) { + pr_err("Failed to get ACPI table: %s\n", + acpi_format_exception(status)); + return; + } + + for (i = 0; i < str_size; ++i) { + if (!strncmp(oem_str[i], table->oem_table_id, 5)) { + hi_cpu_type = i; + return; + } + } +} + +#else +static void acpi_get_hw_cpu_type(void) {} +#endif + +/* of Hisi cpu model str */ +const char *of_model_str[] = { + "Hi1612", + "Hi1616" +}; + +static void of_get_hw_cpu_type(void) +{ + const char *cpu_type; + int ret, i, str_size = ARRAY_SIZE(of_model_str); + + ret = of_property_read_string(of_root, "model", &cpu_type); + if (ret < 0) { + pr_err("Failed to get Hisi cpu model by OF.\n"); + return; + } + + for (i = 0; i < str_size; ++i) { + if (strstr(cpu_type, of_model_str[i])) { + hi_cpu_type = i; + return; + } + } +} + +void probe_hisi_cpu_type(void) +{ + if (!acpi_disabled) + acpi_get_hw_cpu_type(); + else + of_get_hw_cpu_type(); + + WARN_ON(hi_cpu_type == UNKNOWN_HI_TYPE); +}
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion category: feature feature: Hisi non-cacheable snoopy support
-------------------------------------------------
Kunpeng 920 offers the HHA ncsnp capability, with which the hypervisor doesn't need to perform as much cache maintenance as before (in case the guest has some non-cacheable Stage-1 mappings). Currently we apply this hardware capability when:
- VCPU switching MMU+caches on/off
- creating Stage-2 mappings for Daborts
Reviewed-by: zhanghailiang zhang.zhanghailiang@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/hisi_cpu_model.h | 2 ++ arch/arm64/include/asm/hisi_cpu_model.h | 2 ++ virt/kvm/arm/arm.c | 2 ++ virt/kvm/arm/hisi_cpu_model.c | 34 +++++++++++++++++++++++++++++++++ virt/kvm/arm/mmu.c | 4 ++-- 5 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/hisi_cpu_model.h b/arch/arm/include/asm/hisi_cpu_model.h index 16aea01..54cc3c2 100644 --- a/arch/arm/include/asm/hisi_cpu_model.h +++ b/arch/arm/include/asm/hisi_cpu_model.h @@ -11,6 +11,8 @@ enum hisi_cpu_type { };
extern enum hisi_cpu_type hi_cpu_type; +extern bool kvm_ncsnp_support;
void probe_hisi_cpu_type(void); +void probe_hisi_ncsnp_support(void); #endif /* __HISI_CPU_MODEL_H__ */ diff --git a/arch/arm64/include/asm/hisi_cpu_model.h b/arch/arm64/include/asm/hisi_cpu_model.h index f686a75..e0da0ef 100644 --- a/arch/arm64/include/asm/hisi_cpu_model.h +++ b/arch/arm64/include/asm/hisi_cpu_model.h @@ -14,6 +14,8 @@ enum hisi_cpu_type { };
extern enum hisi_cpu_type hi_cpu_type; +extern bool kvm_ncsnp_support;
void probe_hisi_cpu_type(void); +void probe_hisi_ncsnp_support(void); #endif /* __HISI_CPU_MODEL_H__ */ diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 264aba2..cc21446 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -72,6 +72,7 @@
/* Hisi cpu type enum */ enum hisi_cpu_type hi_cpu_type = UNKNOWN_HI_TYPE; +bool kvm_ncsnp_support;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
@@ -1677,6 +1678,7 @@ int kvm_arch_init(void *opaque)
/* Probe the Hisi CPU type */ probe_hisi_cpu_type(); + probe_hisi_ncsnp_support();
if (!is_hyp_mode_available()) { kvm_info("HYP mode not available\n"); diff --git a/virt/kvm/arm/hisi_cpu_model.c b/virt/kvm/arm/hisi_cpu_model.c index fb06e0b..b0f7e0d 100644 --- a/virt/kvm/arm/hisi_cpu_model.c +++ b/virt/kvm/arm/hisi_cpu_model.c @@ -80,3 +80,37 @@ void probe_hisi_cpu_type(void)
WARN_ON(hi_cpu_type == UNKNOWN_HI_TYPE); } + +#define NCSNP_MMIO_BASE 0x20107E238 + +/* + * We have the fantastic HHA ncsnp capability on Kunpeng 920, + * with which hypervisor doesn't need to perform a lot of cache + * maintenance like before (in case the guest has non-cacheable + * Stage-1 mappings). + */ +void probe_hisi_ncsnp_support(void) +{ + void __iomem *base; + unsigned int high; + + kvm_ncsnp_support = false; + + if (hi_cpu_type != HI_1620) + goto out; + + base = ioremap(NCSNP_MMIO_BASE, 4); + if (!base) { + pr_err("Unable to map MMIO region when probing ncsnp!\n"); + goto out; + } + + high = readl_relaxed(base) >> 28; + iounmap(base); + if (high != 0x1) + kvm_ncsnp_support = true; + +out: + kvm_info("Hisi ncsnp: %s\n", kvm_ncsnp_support ? "enabled" : + "disabled"); +} diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index f74d1a7..3a280b3 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1806,7 +1806,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) kvm_set_pfn_dirty(pfn);
- if (fault_status != FSC_PERM) + if (fault_status != FSC_PERM && !kvm_ncsnp_support) clean_dcache_guest_page(pfn, vma_pagesize);
if (exec_fault) @@ -2465,7 +2465,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled) * If switching it off, need to clean the caches. * Clean + invalidate does the trick always. */ - if (now_enabled != was_enabled) + if (now_enabled != was_enabled && !kvm_ncsnp_support) stage2_flush_vm(vcpu->kvm);
/* Caches are now on, stop trapping VM ops (until a S/W op) */
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion category: feature feature: perf kvm stat on arm64
picked from https://patchwork.kernel.org/patch/10989099/.
-------------------------------------------------
Currently, we use trace_kvm_exit() to report exception type (e.g., "IRQ", "TRAP") and exception class (ESR_ELx's bit[31:26]) together. But hardware only saves the exit class to ESR_ELx on synchronous exceptions, not on asynchronous exceptions. When the guest exits due to external interrupts, we will get tracing output like:
"kvm_exit: IRQ: HSR_EC: 0x0000 (UNKNOWN), PC: 0xffff87259e30"
Obviously, "HSR_EC" here is meaningless.
This patch splits "exit" and "trap" events by adding two tracepoints explicitly in handle_trap_exceptions(). Let trace_kvm_exit() report VM exit events, and trace_kvm_trap_enter()/trace_kvm_trap_exit() report VM trap events.
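With the split tracepoints (see the TP_printk formats in the diff below), the same external interrupt exit would be reported along the lines of (vcpu id and PC values hypothetical):

    kvm_exit: VCPU 0: exit_type=IRQ, PC=0xffff87259e30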
These tracepoints are also adjusted in preparation for supporting 'perf kvm stat' on arm64.
Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm64/kvm/handle_exit.c | 3 +++ arch/arm64/kvm/trace.h | 35 +++++++++++++++++++++++++++++++++++ virt/kvm/arm/arm.c | 4 ++-- virt/kvm/arm/trace.h | 21 +++++++++++---------- 4 files changed, 51 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index 35a81beb..2349c26 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -226,7 +226,10 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run) exit_handle_fn exit_handler;
exit_handler = kvm_get_exit_handler(vcpu); + trace_kvm_trap_enter(vcpu->vcpu_id, + kvm_vcpu_trap_get_class(vcpu)); handled = exit_handler(vcpu, run); + trace_kvm_trap_exit(vcpu->vcpu_id); }
/* diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h index eab91ad..6014c73 100644 --- a/arch/arm64/kvm/trace.h +++ b/arch/arm64/kvm/trace.h @@ -8,6 +8,41 @@ #undef TRACE_SYSTEM #define TRACE_SYSTEM kvm
+TRACE_EVENT(kvm_trap_enter, + TP_PROTO(unsigned int vcpu_id, unsigned int esr_ec), + TP_ARGS(vcpu_id, esr_ec), + + TP_STRUCT__entry( + __field(unsigned int, vcpu_id) + __field(unsigned int, esr_ec) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + __entry->esr_ec = esr_ec; + ), + + TP_printk("VCPU %u: HSR_EC=0x%04x (%s)", + __entry->vcpu_id, + __entry->esr_ec, + __print_symbolic(__entry->esr_ec, kvm_arm_exception_class)) +); + +TRACE_EVENT(kvm_trap_exit, + TP_PROTO(unsigned int vcpu_id), + TP_ARGS(vcpu_id), + + TP_STRUCT__entry( + __field(unsigned int, vcpu_id) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu_id; + ), + + TP_printk("VCPU %u", __entry->vcpu_id) +); + TRACE_EVENT(kvm_wfx_arm64, TP_PROTO(unsigned long vcpu_pc, bool is_wfe), TP_ARGS(vcpu_pc, is_wfe), diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index cc21446..5611d6c 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -788,7 +788,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) /************************************************************** * Enter the guest */ - trace_kvm_entry(*vcpu_pc(vcpu)); + trace_kvm_entry(vcpu->vcpu_id, *vcpu_pc(vcpu)); guest_enter_irqoff();
if (has_vhe()) { @@ -852,7 +852,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) * guest time. */ guest_exit(); - trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu)); + trace_kvm_exit(vcpu->vcpu_id, ret, *vcpu_pc(vcpu));
/* Exit types that need handling before we can be preempted */ handle_exit_early(vcpu, run, ret); diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h index 3828bea..e32fee7 100644 --- a/virt/kvm/arm/trace.h +++ b/virt/kvm/arm/trace.h @@ -11,40 +11,41 @@ * Tracepoints for entry/exit to guest */ TRACE_EVENT(kvm_entry, - TP_PROTO(unsigned long vcpu_pc), - TP_ARGS(vcpu_pc), + TP_PROTO(unsigned int vcpu_id, unsigned long vcpu_pc), + TP_ARGS(vcpu_id, vcpu_pc),
TP_STRUCT__entry( + __field( unsigned int, vcpu_id ) __field( unsigned long, vcpu_pc ) ),
TP_fast_assign( + __entry->vcpu_id = vcpu_id; __entry->vcpu_pc = vcpu_pc; ),
- TP_printk("PC: 0x%08lx", __entry->vcpu_pc) + TP_printk("VCPU %u: PC=0x%08lx",__entry->vcpu_id, __entry->vcpu_pc) );
TRACE_EVENT(kvm_exit, - TP_PROTO(int ret, unsigned int esr_ec, unsigned long vcpu_pc), - TP_ARGS(ret, esr_ec, vcpu_pc), + TP_PROTO(unsigned int vcpu_id, int ret, unsigned long vcpu_pc), + TP_ARGS(vcpu_id, ret, vcpu_pc),
TP_STRUCT__entry( + __field( unsigned int, vcpu_id ) __field( int, ret ) - __field( unsigned int, esr_ec ) __field( unsigned long, vcpu_pc ) ),
TP_fast_assign( + __entry->vcpu_id = vcpu_id; __entry->ret = ARM_EXCEPTION_CODE(ret); - __entry->esr_ec = ARM_EXCEPTION_IS_TRAP(ret) ? esr_ec : 0; __entry->vcpu_pc = vcpu_pc; ),
- TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx", + TP_printk("VCPU %u: exit_type=%s, PC=0x%08lx", + __entry->vcpu_id, __print_symbolic(__entry->ret, kvm_arm_exception_type), - __entry->esr_ec, - __print_symbolic(__entry->esr_ec, kvm_arm_exception_class), __entry->vcpu_pc) );
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion category: feature feature: perf kvm stat on arm64
picked from https://patchwork.kernel.org/patch/10989123/.
-------------------------------------------------
Implement get_cpuid(), which returns the MIDR string of the first CPU.
Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/perf/arch/arm64/util/header.c | 74 +++++++++++++++++++++++-------------- 1 file changed, 47 insertions(+), 27 deletions(-)
diff --git a/tools/perf/arch/arm64/util/header.c b/tools/perf/arch/arm64/util/header.c index 534cd25..5c17b86 100644 --- a/tools/perf/arch/arm64/util/header.c +++ b/tools/perf/arch/arm64/util/header.c @@ -9,17 +9,56 @@ #define MIDR_VARIANT_SHIFT 20 #define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
-char *get_cpuid_str(struct perf_pmu *pmu) +/* Return value of midr_el1 if success, else return 0. */ +static int read_midr_el1(char *buf, int cpu_nr) { - char *buf = NULL; char path[PATH_MAX]; const char *sysfs = sysfs__mountpoint(); + u64 midr = 0; + FILE *file; + + if (!sysfs) + return 0; + + scnprintf(path, PATH_MAX, "%s/devices/system/cpu/cpu%d"MIDR, + sysfs, cpu_nr); + + file = fopen(path, "r"); + if (!file) { + pr_debug("fopen failed for file %s\n", path); + return 0; + } + + if (!fgets(buf, MIDR_SIZE, file)) { + fclose(file); + return 0; + } + fclose(file); + + /* Ignore/clear Variant[23:20] and Revision[3:0] of MIDR */ + midr = strtoul(buf, NULL, 16); + midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK)); + scnprintf(buf, MIDR_SIZE, "0x%016lx", midr); + + return midr; +} + +int get_cpuid(char *buffer, size_t sz __maybe_unused) +{ + if (read_midr_el1(buffer, 0)) + return 0; + + return -1; +} + +char *get_cpuid_str(struct perf_pmu *pmu) +{ + char *buf = NULL; int cpu; u64 midr = 0; struct cpu_map *cpus; - FILE *file;
- if (!sysfs || !pmu || !pmu->cpus) + if (!pmu || !pmu->cpus) return NULL;
buf = malloc(MIDR_SIZE); @@ -29,29 +68,10 @@ char *get_cpuid_str(struct perf_pmu *pmu) /* read midr from list of cpus mapped to this pmu */ cpus = cpu_map__get(pmu->cpus); for (cpu = 0; cpu < cpus->nr; cpu++) { - scnprintf(path, PATH_MAX, "%s/devices/system/cpu/cpu%d"MIDR, - sysfs, cpus->map[cpu]); - - file = fopen(path, "r"); - if (!file) { - pr_debug("fopen failed for file %s\n", path); - continue; - } - - if (!fgets(buf, MIDR_SIZE, file)) { - fclose(file); - continue; - } - fclose(file); - - /* Ignore/clear Variant[23:20] and - * Revision[3:0] of MIDR - */ - midr = strtoul(buf, NULL, 16); - midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK)); - scnprintf(buf, MIDR_SIZE, "0x%016lx", midr); - /* got midr break loop */ - break; + midr = read_midr_el1(buf, cpus->map[cpu]); + if (midr) + /* got midr break loop */ + break; }
if (!midr) {
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion
category: feature
feature: perf kvm stat on arm64
picked from https://patchwork.kernel.org/patch/10989119/.
-------------------------------------------------
'perf kvm stat report/record' generates a statistical analysis of KVM events and can be used to analyze guest exit reasons. This patch tries to add stat support on arm64.
We have a mapping between the guest's "exit_code" and "exit_reason" which already exists under arch/arm64/include/asm/ (kvm_arm_exception_type), and we've used it to report the guest's exit type through trace_kvm_exit(). Copy kvm_arm_exception_type into aarch64_guest_exits.h, thus exporting it to userspace.
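As a rough model (a simplified sketch, not the actual perf implementation) of the lookup table that define_exit_reasons_table builds from those {code, "name"} pairs:

#include <stdio.h>

struct exit_reasons_table { unsigned long exit_code; const char *reason; };

#define ARM_EXCEPTION_IRQ        0
#define ARM_EXCEPTION_EL1_SERROR 1
#define ARM_EXCEPTION_TRAP       2

static struct exit_reasons_table arm64_exit_reasons[] = {
	{ ARM_EXCEPTION_IRQ,        "IRQ"    },
	{ ARM_EXCEPTION_EL1_SERROR, "SERROR" },
	{ ARM_EXCEPTION_TRAP,       "TRAP"   },
	{ 0, NULL },
};

static const char *exit_reason(unsigned long code)
{
	struct exit_reasons_table *t;

	/* linear scan, stopping at the NULL sentinel */
	for (t = arm64_exit_reasons; t->reason; t++)
		if (t->exit_code == code)
			return t->reason;
	return "UNKNOWN";
}

int main(void)
{
	printf("exit_code 2 -> %s\n", exit_reason(2)); /* TRAP */
	return 0;
}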
It records the two available KVM tracepoints for arm64, "kvm:kvm_entry" and "kvm:kvm_exit", and reports statistical data which includes event handling time, samples, and so on.
A simple test goes below:
# pgrep qemu
6039
9937
# ./tools/perf/perf kvm stat record -p 6039
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 15.629 MB perf.data.guest (199063 samples) ]
# ./tools/perf/perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:
     VM-EXIT    Samples  Samples%     Time%   Min Time     Max Time      Avg time

        TRAP      49040    97.15%   100.00%     2.60us    4072.98us    3431.60us ( +- 0.17% )
         IRQ       1437     2.85%     0.00%     0.90us      24.56us       2.06us ( +- 1.37% )
Total Samples:50477, Total events handled time:168288630.04us.
Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/perf/arch/arm64/Makefile | 2 + tools/perf/arch/arm64/util/Build | 1 + tools/perf/arch/arm64/util/aarch64_guest_exits.h | 27 ++++++++++++++ tools/perf/arch/arm64/util/kvm-stat.c | 47 ++++++++++++++++++++++++ 4 files changed, 77 insertions(+) create mode 100644 tools/perf/arch/arm64/util/aarch64_guest_exits.h create mode 100644 tools/perf/arch/arm64/util/kvm-stat.c
diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile index dbef716..172146e 100644 --- a/tools/perf/arch/arm64/Makefile +++ b/tools/perf/arch/arm64/Makefile @@ -2,6 +2,8 @@ ifndef NO_DWARF PERF_HAVE_DWARF_REGS := 1 endif + +HAVE_KVM_STAT_SUPPORT := 1 PERF_HAVE_JITDUMP := 1 PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build index 9962187..393b989 100644 --- a/tools/perf/arch/arm64/util/Build +++ b/tools/perf/arch/arm64/util/Build @@ -1,6 +1,7 @@ libperf-y += header.o libperf-y += tsc.o libperf-y += sym-handling.o +libperf-y += kvm-stat.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o diff --git a/tools/perf/arch/arm64/util/aarch64_guest_exits.h b/tools/perf/arch/arm64/util/aarch64_guest_exits.h new file mode 100644 index 00000000..aec2e6e --- /dev/null +++ b/tools/perf/arch/arm64/util/aarch64_guest_exits.h @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(c) 2019 Huawei Technologies Co., Ltd + */ + +#ifndef ARCH_PERF_AARCH64_GUEST_EXITS_H +#define ARCH_PERF_AARCH64_GUEST_EXITS_H + +/* virt.h */ +/* Error returned when an invalid stub number is passed into x0 */ +#define HVC_STUB_ERR 0xbadca11 + +/* kvm_asm.h */ +#define ARM_EXCEPTION_IRQ 0 +#define ARM_EXCEPTION_EL1_SERROR 1 +#define ARM_EXCEPTION_TRAP 2 +#define ARM_EXCEPTION_IL 3 +/* The hyp-stub will return this for any kvm_call_hyp() call */ +#define ARM_EXCEPTION_HYP_GONE HVC_STUB_ERR + +#define kvm_arm_exception_type \ + {ARM_EXCEPTION_IRQ, "IRQ" }, \ + {ARM_EXCEPTION_EL1_SERROR, "SERROR" }, \ + {ARM_EXCEPTION_TRAP, "TRAP" }, \ + {ARM_EXCEPTION_HYP_GONE, "HYP_GONE" } + +#endif /* ARCH_PERF_AARCH64_GUEST_EXITS_H */ diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c new file mode 100644 index 00000000..b60b0ba --- /dev/null +++ b/tools/perf/arch/arm64/util/kvm-stat.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Arch specific functions for perf kvm stat. + * Copyright(c) 2019 Huawei Technologies Co., Ltd + */ +#include <errno.h> +#include "../../util/kvm-stat.h" +#include "../../util/evsel.h" +#include "aarch64_guest_exits.h" + +define_exit_reasons_table(arm64_exit_reasons, kvm_arm_exception_type); + +static struct kvm_events_ops exit_events = { + .is_begin_event = exit_event_begin, + .is_end_event = exit_event_end, + .decode_key = exit_event_decode_key, + .name = "VM-EXIT" +}; + +const char *vcpu_id_str = "vcpu_id"; +const int decode_str_len = 20; +const char *kvm_exit_reason = "ret"; +const char *kvm_entry_trace = "kvm:kvm_entry"; +const char *kvm_exit_trace = "kvm:kvm_exit"; + +const char *kvm_events_tp[] = { + "kvm:kvm_entry", + "kvm:kvm_exit", + NULL, +}; + +struct kvm_reg_events_ops kvm_reg_events_ops[] = { + { .name = "vmexit", .ops = &exit_events }, + { NULL, NULL }, +}; + +const char * const kvm_skip_events[] = { + NULL, +}; + +int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused) +{ + kvm->exit_reasons = arm64_exit_reasons; + kvm->exit_reasons_isa = "aarch64"; + + return 0; +}
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion
category: feature
feature: perf kvm stat on arm64
picked from https://patchwork.kernel.org/patch/10989121/.
-------------------------------------------------
When a guest exits due to "TRAP", we can analyze the guest exit reasons in more depth. Enhance perf kvm stat to record and analyze VM TRAP events.
There is a mapping between the guest's "trap_code" (ESR_ELx's bits[31:26]) and "trap_reason" - kvm_arm_exception_class. Copy it from the kernel into aarch64_guest_exits.h and export it to userspace.
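For reference, a small sketch of how the EC field is pulled out of an ESR_ELx value; the shift and mask match the arm64 definitions (EC lives in bits [31:26]), while the sample ESR value is made up:

#include <stdio.h>
#include <stdint.h>

#define ESR_ELx_EC_SHIFT 26
#define ESR_ELx_EC_MASK  (0x3fULL << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC(esr)  (((esr) & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT)

int main(void)
{
	uint64_t esr = 0x58000000; /* made-up ESR whose EC is 0x16 (HVC64) */

	printf("EC = 0x%02llx\n", (unsigned long long)ESR_ELx_EC(esr));
	return 0;
}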
This patch records two new KVM tracepoints: "kvm:kvm_trap_enter" and "kvm:kvm_trap_exit", and reports statistical data between these two tracepoints.
A simple test goes below:
# ./tools/perf/perf kvm stat record -p 20763
[ perf record: Woken up 92 times to write data ]
[ perf record: Captured and wrote 203.727 MB perf.data.guest (2601786 samples) ]
# ./tools/perf/perf kvm stat report --event=vmexit
Analyze events for all VMs, all VCPUs:
     VM-EXIT    Samples  Samples%     Time%   Min Time      Max Time      Avg time

        TRAP     640931    97.12%   100.00%     2.44us    14683.86us    3446.49us ( +- 0.05% )
         IRQ      19019     2.88%     0.00%     0.90us      461.94us       2.12us ( +- 2.09% )
Total Samples:659950, Total events handled time:2209005391.30us.
# ./tools/perf/perf kvm stat report --event=trap
Analyze events for all VMs, all VCPUs:
  TRAP-EVENT    Samples  Samples%    Time%     Min Time      Max Time       Avg time

         WFx     601194    93.80%   99.98%       0.90us     4294.04us      3671.01us ( +- 0.03% )
       SYS64      33714     5.26%    0.01%       1.10us       41.34us         5.68us ( +- 0.18% )
    DABT_LOW       6014     0.94%    0.00%       1.12us       18.04us         2.57us ( +- 0.91% )
    IABT_LOW         12     0.00%    0.01%   12597.76us    14679.96us     12893.61us ( +- 1.34% )
Total Samples:640934, Total events handled time:2207353434.56us.
Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- tools/perf/arch/arm64/util/aarch64_guest_exits.h | 68 ++++++++++++++++++++++++ tools/perf/arch/arm64/util/kvm-stat.c | 65 ++++++++++++++++++++++ 2 files changed, 133 insertions(+)
diff --git a/tools/perf/arch/arm64/util/aarch64_guest_exits.h b/tools/perf/arch/arm64/util/aarch64_guest_exits.h index aec2e6e..59fc256 100644 --- a/tools/perf/arch/arm64/util/aarch64_guest_exits.h +++ b/tools/perf/arch/arm64/util/aarch64_guest_exits.h @@ -24,4 +24,72 @@ {ARM_EXCEPTION_TRAP, "TRAP" }, \ {ARM_EXCEPTION_HYP_GONE, "HYP_GONE" }
+/* esr.h */ +#define ESR_ELx_EC_UNKNOWN (0x00) +#define ESR_ELx_EC_WFx (0x01) +/* Unallocated EC: 0x02 */ +#define ESR_ELx_EC_CP15_32 (0x03) +#define ESR_ELx_EC_CP15_64 (0x04) +#define ESR_ELx_EC_CP14_MR (0x05) +#define ESR_ELx_EC_CP14_LS (0x06) +#define ESR_ELx_EC_FP_ASIMD (0x07) +#define ESR_ELx_EC_CP10_ID (0x08) /* EL2 only */ +#define ESR_ELx_EC_PAC (0x09) /* EL2 and above */ +/* Unallocated EC: 0x0A - 0x0B */ +#define ESR_ELx_EC_CP14_64 (0x0C) +/* Unallocated EC: 0x0d */ +#define ESR_ELx_EC_ILL (0x0E) +/* Unallocated EC: 0x0F - 0x10 */ +#define ESR_ELx_EC_SVC32 (0x11) +#define ESR_ELx_EC_HVC32 (0x12) /* EL2 only */ +#define ESR_ELx_EC_SMC32 (0x13) /* EL2 and above */ +/* Unallocated EC: 0x14 */ +#define ESR_ELx_EC_SVC64 (0x15) +#define ESR_ELx_EC_HVC64 (0x16) /* EL2 and above */ +#define ESR_ELx_EC_SMC64 (0x17) /* EL2 and above */ +#define ESR_ELx_EC_SYS64 (0x18) +#define ESR_ELx_EC_SVE (0x19) +/* Unallocated EC: 0x1A - 0x1E */ +#define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */ +#define ESR_ELx_EC_IABT_LOW (0x20) +#define ESR_ELx_EC_IABT_CUR (0x21) +#define ESR_ELx_EC_PC_ALIGN (0x22) +/* Unallocated EC: 0x23 */ +#define ESR_ELx_EC_DABT_LOW (0x24) +#define ESR_ELx_EC_DABT_CUR (0x25) +#define ESR_ELx_EC_SP_ALIGN (0x26) +/* Unallocated EC: 0x27 */ +#define ESR_ELx_EC_FP_EXC32 (0x28) +/* Unallocated EC: 0x29 - 0x2B */ +#define ESR_ELx_EC_FP_EXC64 (0x2C) +/* Unallocated EC: 0x2D - 0x2E */ +#define ESR_ELx_EC_SERROR (0x2F) +#define ESR_ELx_EC_BREAKPT_LOW (0x30) +#define ESR_ELx_EC_BREAKPT_CUR (0x31) +#define ESR_ELx_EC_SOFTSTP_LOW (0x32) +#define ESR_ELx_EC_SOFTSTP_CUR (0x33) +#define ESR_ELx_EC_WATCHPT_LOW (0x34) +#define ESR_ELx_EC_WATCHPT_CUR (0x35) +/* Unallocated EC: 0x36 - 0x37 */ +#define ESR_ELx_EC_BKPT32 (0x38) +/* Unallocated EC: 0x39 */ +#define ESR_ELx_EC_VECTOR32 (0x3A) /* EL2 only */ +/* Unallocted EC: 0x3B */ +#define ESR_ELx_EC_BRK64 (0x3C) +/* Unallocated EC: 0x3D - 0x3F */ +#define ESR_ELx_EC_MAX (0x3F) + +/* kvm_arm.h */ +#define ECN(x) { ESR_ELx_EC_##x, #x } + +#define kvm_arm_exception_class \ + ECN(UNKNOWN), ECN(WFx), ECN(CP15_32), ECN(CP15_64), ECN(CP14_MR), \ + ECN(CP14_LS), ECN(FP_ASIMD), ECN(CP10_ID), ECN(CP14_64), ECN(SVC64), \ + ECN(HVC64), ECN(SMC64), ECN(SYS64), ECN(IMP_DEF), ECN(IABT_LOW), \ + ECN(IABT_CUR), ECN(PC_ALIGN), ECN(DABT_LOW), ECN(DABT_CUR), \ + ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \ + ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \ + ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \ + ECN(BKPT32), ECN(VECTOR32), ECN(BRK64) + #endif /* ARCH_PERF_AARCH64_GUEST_EXITS_H */ diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c index b60b0ba..7e70d93 100644 --- a/tools/perf/arch/arm64/util/kvm-stat.c +++ b/tools/perf/arch/arm64/util/kvm-stat.c @@ -9,6 +9,7 @@ #include "aarch64_guest_exits.h"
define_exit_reasons_table(arm64_exit_reasons, kvm_arm_exception_type); +define_exit_reasons_table(arm64_trap_reasons, kvm_arm_exception_class);
static struct kvm_events_ops exit_events = { .is_begin_event = exit_event_begin, @@ -23,14 +24,78 @@ const char *kvm_entry_trace = "kvm:kvm_entry"; const char *kvm_exit_trace = "kvm:kvm_exit";
+const char *kvm_trap_reason = "esr_ec"; +const char *kvm_trap_enter_trace = "kvm:kvm_trap_enter"; +const char *kvm_trap_exit_trace = "kvm:kvm_trap_exit"; + +static void trap_event_get_key(struct perf_evsel *evsel, + struct perf_sample *sample, + struct event_key *key) +{ + key->info = 0; + key->key = perf_evsel__intval(evsel, sample, kvm_trap_reason); +} + +static const char *get_trap_reason(u64 exit_code) +{ + struct exit_reasons_table *tbl = arm64_trap_reasons; + + while (tbl->reason != NULL) { + if (tbl->exit_code == exit_code) + return tbl->reason; + tbl++; + } + + pr_err("Unknown kvm trap exit code: %lld on aarch64\n", + (unsigned long long)exit_code); + return "UNKNOWN"; +} + +static bool trap_event_end(struct perf_evsel *evsel, + struct perf_sample *sample __maybe_unused, + struct event_key *key __maybe_unused) +{ + return (!strcmp(evsel->name, kvm_trap_exit_trace)); +} + +static bool trap_event_begin(struct perf_evsel *evsel, + struct perf_sample *sample, struct event_key *key) +{ + if (!strcmp(evsel->name, kvm_trap_enter_trace)) { + trap_event_get_key(evsel, sample, key); + return true; + } + + return false; +} + +static void trap_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused, + struct event_key *key, + char *decode) +{ + const char *trap_reason = get_trap_reason(key->key); + + scnprintf(decode, decode_str_len, "%s", trap_reason); +} + +static struct kvm_events_ops trap_events = { + .is_begin_event = trap_event_begin, + .is_end_event = trap_event_end, + .decode_key = trap_event_decode_key, + .name = "TRAP-EVENT", +}; + const char *kvm_events_tp[] = { "kvm:kvm_entry", "kvm:kvm_exit", + "kvm:kvm_trap_enter", + "kvm:kvm_trap_exit", NULL, };
struct kvm_reg_events_ops kvm_reg_events_ops[] = { { .name = "vmexit", .ops = &exit_events }, + { .name = "trap", .ops = &trap_events }, { NULL, NULL }, };
From: Feng Tiantian fengtiantian@huawei.com
euleros inclusion
category: bugfix
picked from https://lore.kernel.org/patchwork/patch/1099831/.
-------------------------------------------------
While running "top" in a CentOS guest's VNC client and continuously pressing "Shift+PgUp", the guest kernel panics. The backtrace is attached below. We tested it on 5.2.0, and the issue remains.
[ 66.946362] Unable to handle kernel paging request at virtual address ffff00000e240840 [ 66.946363] Mem abort info: [ 66.946364] Exception class = DABT (current EL), IL = 32 bits [ 66.946365] SET = 0, FnV = 0 [ 66.946366] EA = 0, S1PTW = 0 [ 66.946367] Data abort info: [ 66.946368] ISV = 0, ISS = 0x00000047 [ 66.946368] CM = 0, WnR = 1 [ 66.946370] swapper pgtable: 64k pages, 48-bit VAs, pgd = ffff000009660000 [ 66.946372] [ffff00000e240840] *pgd=000000023ffe0003, *pud=000000023ffe0003, *pmd=000000023ffd0003, *pte=0000000000000000 [ 66.946378] Internal error: Oops: 96000047 [#1] SMP [ 66.946379] Modules linked in: vfat fat crc32_ce ghash_ce sg sha2_ce sha256_arm64 virtio_balloon virtio_net sha1_ce ip_tables ext4 mbcache jbd2 virtio_gpu drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i2c_core virtio_scsi virtio_pci virtio_mmio virtio_ring virtio [ 66.946403] CPU: 0 PID: 1035 Comm: top Not tainted 4.14.0-49.el7a.aarch64 #1 [ 66.946404] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 66.946405] task: ffff8001c18fdc00 task.stack: ffff00000d4e0000 [ 66.946409] PC is at sys_imageblit+0x40c/0x10000 [sysimgblt] [ 66.946431] LR is at drm_fb_helper_sys_imageblit+0x28/0x4c [drm_kms_helper] [ 66.946433] pc : [<ffff0000020a040c>] lr : [<ffff000002202e74>] pstate: 00000005 [ 66.946433] sp : ffff00000d4ef7f0 [ 66.946434] x29: ffff00000d4ef7f0 x28: 00000000000001ff [ 66.946436] x27: ffff8001c1c88100 x26: 0000000000000001 [ 66.946438] x25: 00000000000001f0 x24: 0000000000000018 [ 66.946440] x23: 0000000000000000 x22: ffff00000d4ef978 [ 66.946442] x21: ffff00000e240840 x20: 0000000000000000 [ 66.946444] x19: ffff8001c98c9000 x18: 0000fffff9d56670 [ 66.946445] x17: 0000000000000000 x16: 0000000000000000 [ 66.946447] x15: 0000000000000008 x14: 1b20202020202020 [ 66.946449] x13: 00000000000001f0 x12: 000000000000003e [ 66.946450] x11: 000000000000000f x10: ffff8001c8400000 [ 66.946452] x9 : 0000000000aaaaaa x8 : 0000000000000001 [ 66.946454] x7 : ffff0000020b0090 x6 : 0000000000000001 [ 66.946456] x5 : 0000000000000000 x4 : 0000000000000000 [ 66.946457] x3 : ffff8001c8400000 x2 : ffff00000e240840 [ 66.946459] x1 : 00000000000001ef x0 : 0000000000000007 [ 66.946461] Process top (pid: 1035, stack limit = 0xffff00000d4e0000) [ 66.946462] Call trace: [ 66.946464] Exception stack(0xffff00000d4ef6b0 to 0xffff00000d4ef7f0) [ 66.946465] f6a0: 0000000000000007 00000000000001ef [ 66.946467] f6c0: ffff00000e240840 ffff8001c8400000 0000000000000000 0000000000000000 [ 66.946468] f6e0: 0000000000000001 ffff0000020b0090 0000000000000001 0000000000aaaaaa [ 66.946470] f700: ffff8001c8400000 000000000000000f 000000000000003e 00000000000001f0 [ 66.946471] f720: 1b20202020202020 0000000000000008 0000000000000000 0000000000000000 [ 66.946472] f740: 0000fffff9d56670 ffff8001c98c9000 0000000000000000 ffff00000e240840 [ 66.946474] f760: ffff00000d4ef978 0000000000000000 0000000000000018 00000000000001f0 [ 66.946475] f780: 0000000000000001 ffff8001c1c88100 00000000000001ff ffff00000d4ef7f0 [ 66.946476] f7a0: ffff000002202e74 ffff00000d4ef7f0 ffff0000020a040c 0000000000000005 [ 66.946478] f7c0: ffff00000d4ef7e0 ffff0000080ea614 0001000000000000 ffff000008152f08 [ 66.946479] f7e0: ffff00000d4ef7f0 ffff0000020a040c [ 66.946481] [<ffff0000020a040c>] sys_imageblit+0x40c/0x10000 [sysimgblt] [ 66.946501] [<ffff000002202e74>] drm_fb_helper_sys_imageblit+0x28/0x4c [drm_kms_helper] [ 66.946510] [<ffff0000022a12dc>] virtio_gpu_3d_imageblit+0x2c/0x78 [virtio_gpu] [ 66.946515] [<ffff00000847f458>] 
bit_putcs+0x288/0x49c [ 66.946517] [<ffff00000847ad24>] fbcon_putcs+0x114/0x148 [ 66.946519] [<ffff0000084fe92c>] do_update_region+0x118/0x19c [ 66.946521] [<ffff00000850413c>] do_con_trol+0x114c/0x1314 [ 66.946523] [<ffff0000085044dc>] do_con_write.part.22+0x1d8/0x890 [ 66.946525] [<ffff000008504c88>] con_write+0x84/0x8c [ 66.946527] [<ffff0000084ec7f0>] n_tty_write+0x19c/0x408 [ 66.946529] [<ffff0000084e9120>] tty_write+0x150/0x270 [ 66.946532] [<ffff00000829d558>] __vfs_write+0x58/0x180 [ 66.946534] [<ffff00000829d880>] vfs_write+0xa8/0x1a0 [ 66.946536] [<ffff00000829db40>] SyS_write+0x60/0xc0 [ 66.946537] Exception stack(0xffff00000d4efec0 to 0xffff00000d4f0000) [ 66.946539] fec0: 0000000000000001 0000000000457958 0000000000000800 0000000000000000 [ 66.946540] fee0: 00000000fbad2885 0000000000000bd0 0000ffff8556add4 0000000000000000 [ 66.946541] ff00: 0000000000000040 0000000000000000 0000000000434a88 0000000000000012 [ 66.946543] ff20: 0000000100000000 0000fffff9d564f0 0000fffff9d564a0 0000000000000008 [ 66.946544] ff40: 0000000000000000 0000ffff85593b1c 0000fffff9d56670 0000000000000800 [ 66.946546] ff60: 0000000000457958 0000ffff856a1158 0000000000000800 0000ffff85720000 [ 66.946547] ff80: 0000000000000000 0000ffff856f604c 0000000000000000 0000000000436000 [ 66.946548] ffa0: 000000001c90a160 0000fffff9d56f20 0000ffff855965f4 0000fffff9d56f20 [ 66.946549] ffc0: 0000ffff855f12c8 0000000020000000 0000000000000001 0000000000000040 [ 66.946551] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 66.946554] [<ffff00000808359c>] __sys_trace_return+0x0/0x4 [ 66.946556] Code: 0a080084 b86478e4 0a040124 4a050084 (b9000044) [ 66.946561] ---[ end trace 32d49c68b19c4796 ]--- [ 66.946562] Kernel panic - not syncing: Fatal exception [ 66.946564] SMP: stopping secondary CPUs [ 66.946596] Kernel Offset: disabled [ 66.946598] CPU features: 0x1802008 [ 66.946598] Memory Limit: none [ 67.092353] ---[ end Kernel panic - not syncing: Fatal exception
From our (non-expert) analysis, the fbcon ypos can sometimes go out of bounds, and fbcon_putcs() then accesses an invalid VGA framebuffer address. Modify real_y() to make sure the fbcon ypos is always less than rows.
Reported-by: Zengruan Ye yezengruan@huawei.com Signed-off-by: Feng Tiantian fengtiantian@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/video/fbdev/core/fbcon.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fbcon.h b/drivers/video/fbdev/core/fbcon.h index 21912a3..13ae6cd 100644 --- a/drivers/video/fbdev/core/fbcon.h +++ b/drivers/video/fbdev/core/fbcon.h @@ -230,7 +230,10 @@ static inline int real_y(struct display *p, int ypos) int rows = p->vrows;
ypos += p->yscroll; - return ypos < rows ? ypos : ypos - rows; + if (rows == 0) + return ypos; + else + return ypos < rows ? ypos : ypos % rows; }
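To see why the modulo matters, here is a small user-space sketch (illustrative values, not driver code) comparing the old and new wrap logic: the old subtraction only handles a single wrap-around, while the modulo bounds any overflow.

#include <stdio.h>

static int real_y_old(int rows, int ypos)
{
	return ypos < rows ? ypos : ypos - rows;   /* wraps only once */
}

static int real_y_new(int rows, int ypos)
{
	if (rows == 0)
		return ypos;
	return ypos < rows ? ypos : ypos % rows;   /* wraps any overflow */
}

int main(void)
{
	int rows = 50, ypos = 120;  /* yscroll pushed ypos past 2*rows */

	printf("old: %d (still >= rows)\n", real_y_old(rows, ypos)); /* 70 */
	printf("new: %d (always < rows)\n", real_y_new(rows, ypos)); /* 20 */
	return 0;
}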
From: Marc Zyngier maz@kernel.org
mainline inclusion
from mainline-v5.4-rc1
commit 07ab0f8d9a129add914aff3e988da4033471dfc0
category: feature
feature: guest VLPI during the halt poll window can be immediately recognised
-------------------------------------------------
When a vcpu is about to block by calling kvm_vcpu_block, we call back into the arch code to allow any form of synchronization that may be required at this point (SVM stops the AVIC, ARM synchronises the VMCR and enables GICv4 doorbells). But this synchronization comes in quite late, as we've potentially waited for halt_poll_ns to expire.
Instead, let's move kvm_arch_vcpu_blocking() to the beginning of kvm_vcpu_block(), which on ARM has several benefits:
- VMCR gets synchronised early, meaning that any interrupt delivered
  during the polling window will be evaluated with the correct guest
  PMR

- GICv4 doorbells are enabled, which means that any guest interrupt
  directly injected during that window will be immediately recognised
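A stubbed user-space sketch of the reordering (simplified control flow, not the actual kvm_main.c code):

#include <stdio.h>

static void kvm_arch_vcpu_blocking(void)   { puts("sync VMCR, enable GICv4 doorbells"); }
static void halt_poll(void)                { puts("poll for up to halt_poll_ns"); }
static void wait_for_wakeup(void)          { puts("swait until woken"); }
static void kvm_arch_vcpu_unblocking(void) { puts("undo the blocking work"); }

int main(void)
{
	kvm_arch_vcpu_blocking();   /* moved: used to run only after polling */
	halt_poll();                /* doorbells/VMCR are now live while polling */
	wait_for_wakeup();
	kvm_arch_vcpu_unblocking(); /* moved after the 'out' label */
	return 0;
}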
Tang Nianyao ran some tests on a GICv4 machine to evaluate such change, and reported up to a 10% improvement for netperf:
<quote>
netperf result:
D06 as server, intel 8180 server as client
with change:
package 512 bytes - 5500 Mbits/s
package 64 bytes - 760 Mbits/s
without change:
package 512 bytes - 5000 Mbits/s
package 64 bytes - 710 Mbits/s
</quote>
Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/kvm_main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d7b7418..9c17696 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2212,6 +2212,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) bool waited = false; u64 block_ns;
+ kvm_arch_vcpu_blocking(vcpu); + start = cur = ktime_get(); if (vcpu->halt_poll_ns) { ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns); @@ -2232,8 +2234,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) } while (single_task_running() && ktime_before(cur, stop)); }
- kvm_arch_vcpu_blocking(vcpu); - for (;;) { prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
@@ -2247,8 +2247,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) finish_swait(&vcpu->wq, &wait); cur = ktime_get();
- kvm_arch_vcpu_unblocking(vcpu); out: + kvm_arch_vcpu_unblocking(vcpu); block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
if (!vcpu_valid_wakeup(vcpu))
From: Marc Zyngier maz@kernel.org
mainline inclusion
from mainline-v5.5-rc1
commit 11635fa26dc7a715f3fc1c351846859e90985ae1
category: bugfix
-------------------------------------------------
The VLPI map is currently protected by a mutex, and that's a bad idea as this lock can be taken in non-preemptible contexts. Convert it to a raw spinlock and, since we can no longer sleep while holding it, make the memory allocation of the VLPI map atomic (GFP_ATOMIC).
Reported-by: Heyi Guo guoheyi@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20191108165805.3071-12-maz@kernel.org Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index ed26f3d..79f370eb 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -142,7 +142,7 @@ struct event_lpi_map { u16 *col_map; irq_hw_number_t lpi_base; int nr_lpis; - struct mutex vlpi_lock; + raw_spinlock_t vlpi_lock; struct its_vm *vm; struct its_vlpi_map *vlpi_maps; int nr_vlpis; @@ -1291,13 +1291,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info) if (!info->map) return -EINVAL;
- mutex_lock(&its_dev->event_map.vlpi_lock); + raw_spin_lock(&its_dev->event_map.vlpi_lock);
if (!its_dev->event_map.vm) { struct its_vlpi_map *maps;
maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps), - GFP_KERNEL); + GFP_ATOMIC); if (!maps) { ret = -ENOMEM; goto out; @@ -1340,7 +1340,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info) }
out: - mutex_unlock(&its_dev->event_map.vlpi_lock); + raw_spin_unlock(&its_dev->event_map.vlpi_lock); return ret; }
@@ -1350,7 +1350,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info) u32 event = its_get_event_id(d); int ret = 0;
- mutex_lock(&its_dev->event_map.vlpi_lock); + raw_spin_lock(&its_dev->event_map.vlpi_lock);
if (!its_dev->event_map.vm || !its_dev->event_map.vlpi_maps[event].vm) { @@ -1362,7 +1362,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info) *info->map = its_dev->event_map.vlpi_maps[event];
out: - mutex_unlock(&its_dev->event_map.vlpi_lock); + raw_spin_unlock(&its_dev->event_map.vlpi_lock); return ret; }
@@ -1372,7 +1372,7 @@ static int its_vlpi_unmap(struct irq_data *d) u32 event = its_get_event_id(d); int ret = 0;
- mutex_lock(&its_dev->event_map.vlpi_lock); + raw_spin_lock(&its_dev->event_map.vlpi_lock);
if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) { ret = -EINVAL; @@ -1402,7 +1402,7 @@ static int its_vlpi_unmap(struct irq_data *d) }
out: - mutex_unlock(&its_dev->event_map.vlpi_lock); + raw_spin_unlock(&its_dev->event_map.vlpi_lock); return ret; }
@@ -2463,7 +2463,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, dev->event_map.col_map = col_map; dev->event_map.lpi_base = lpi_base; dev->event_map.nr_lpis = nr_lpis; - mutex_init(&dev->event_map.vlpi_lock); + raw_spin_lock_init(&dev->event_map.vlpi_lock); dev->device_id = dev_id; INIT_LIST_HEAD(&dev->entry);
From: Heyi Guo guoheyi@huawei.com
euleros inclusion
category: bugfix
This patch has already been queued in linux-next and will soon land in the mainline tree.
commit 52ff679c98731d3d3ae435e29c0620ae3c04a87d
-------------------------------------------------
There is no special reason to set the virtual LPI pending table as non-shareable. If we choose to hard code the shareability without probing, inner-shareable will be a better choice, as all the other ITS/GICR tables prefer to be inner-shareable.
What's more, on the Hisilicon hip08, mixing different shareabilities triggers some kind of bus warning.
Signed-off-by: Heyi Guo guoheyi@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20191130073849.38378-1-guoheyi@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/irqchip/irq-gic-v3-its.c | 2 +- include/linux/irqchip/arm-gic-v3.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 79f370eb..21dfe22 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -2839,7 +2839,7 @@ static void its_vpe_schedule(struct its_vpe *vpe) val = virt_to_phys(page_address(vpe->vpt_page)) & GENMASK_ULL(51, 16); val |= GICR_VPENDBASER_RaWaWb; - val |= GICR_VPENDBASER_NonShareable; + val |= GICR_VPENDBASER_InnerShareable; /* * There is no good way of finding out if the pending table is * empty as we can race against the doorbell interrupt very diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 16389db..83e47a2 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -274,6 +274,9 @@ #define GICR_VPENDBASER_NonShareable \ GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
+#define GICR_VPENDBASER_InnerShareable \ + GIC_BASER_SHAREABILITY(GICR_VPENDBASER, InnerShareable) + #define GICR_VPENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nCnB) #define GICR_VPENDBASER_nC GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nC) #define GICR_VPENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWt)
From: Heyi Guo guoheyi@huawei.com
euleros inclusion
category: bugfix
-------------------------------------------------
Pending LPIs may block newly allocated LPIs in the kdump secondary kernel, so flush the pending LPIs of all VCPUs when the guest invalidates the device table. We only do this for guest accesses (not userspace ones) to limit the impact of the change.
Signed-off-by: Heyi Guo guoheyi@huawei.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 41 +++++++++++++++++++++++++++++++++++------ 1 file changed, 35 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index b9ee395..01bfa57 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1654,10 +1654,10 @@ static unsigned long vgic_mmio_read_its_baser(struct kvm *kvm, }
#define GITS_BASER_RO_MASK (GENMASK_ULL(52, 48) | GENMASK_ULL(58, 56)) -static void vgic_mmio_write_its_baser(struct kvm *kvm, - struct vgic_its *its, - gpa_t addr, unsigned int len, - unsigned long val) +static void vgic_mmio_write_its_baser_common(struct kvm *kvm, + struct vgic_its *its, + gpa_t addr, unsigned int len, + unsigned long val, bool uaccess) { const struct vgic_its_abi *abi = vgic_its_get_abi(its); u64 entry_size, table_type; @@ -1694,10 +1694,21 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm, *regptr = reg;
if (!(reg & GITS_BASER_VALID)) { + struct kvm_vcpu *vcpu; + int c; + /* Take the its_lock to prevent a race with a save/restore */ mutex_lock(&its->its_lock); switch (table_type) { case GITS_BASER_TYPE_DEVICE: + if (!uaccess) { + /* Fix kdump irq missing issue */ + pr_debug("%s: flush pending LPIs for all VCPUs.\n", + __func__); + kvm_for_each_vcpu(c, vcpu, kvm) + vgic_flush_pending_lpis(vcpu); + } + vgic_its_free_device_list(kvm, its); break; case GITS_BASER_TYPE_COLLECTION: @@ -1708,6 +1719,23 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm, } }
+static void vgic_mmio_write_its_baser(struct kvm *kvm, + struct vgic_its *its, + gpa_t addr, unsigned int len, + unsigned long val) +{ + vgic_mmio_write_its_baser_common(kvm, its, addr, len, val, false); +} + +static int vgic_mmio_uaccess_write_its_baser(struct kvm *kvm, + struct vgic_its *its, + gpa_t addr, unsigned int len, + unsigned long val) +{ + vgic_mmio_write_its_baser_common(kvm, its, addr, len, val, true); + return 0; +} + static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu, struct vgic_its *its, gpa_t addr, unsigned int len) @@ -1800,8 +1828,9 @@ static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its, vgic_mmio_read_its_creadr, its_mmio_write_wi, vgic_mmio_uaccess_write_its_creadr, 8, VGIC_ACCESS_64bit | VGIC_ACCESS_32bit), - REGISTER_ITS_DESC(GITS_BASER, - vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, 0x40, + REGISTER_ITS_DESC_UACCESS(GITS_BASER, + vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, + vgic_mmio_uaccess_write_its_baser, 0x40, VGIC_ACCESS_64bit | VGIC_ACCESS_32bit), REGISTER_ITS_DESC(GITS_IDREGS_BASE, vgic_mmio_read_its_idregs, its_mmio_write_wi, 0x30,
From: Andrew Murray andrew.murray@arm.com
mainline inclusion
from mainline-v5.3-rc1
commit 30d97754b2d1bc4fd20f27c25fed92fc7ce39ce3
category: feature
feature: PMU emulation improvements
-------------------------------------------------
The perf event sample_period is currently set based upon the current counter value when PMXEVTYPER is written to and the perf event is created. However the user may choose to write the type before the counter value, in which case sample_period will be set incorrectly. Let's instead decouple event creation from PMXEVTYPER and (re)create the event in either situation.
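A toy illustration of the ordering problem (made-up numbers and a simplified 32-bit overflow model, not the actual KVM PMU arithmetic): the period baked in at event-creation time goes stale as soon as the guest writes the counter, unless the event is recreated.

#include <stdio.h>
#include <stdint.h>

/* Toy model: a 32-bit counter overflows after (2^32 - counter) increments,
 * which is what the perf event's sample_period has to reflect. */
static uint64_t sample_period(uint32_t counter)
{
	return 0x100000000ULL - counter;
}

int main(void)
{
	uint32_t counter = 0;

	/* Guest writes PMXEVTYPER first: event created while counter == 0 */
	uint64_t stale = sample_period(counter);

	/* Guest then writes the counter; unless the event is recreated,
	 * the stale period stays in effect */
	counter = 0xfffffff0;
	uint64_t fresh = sample_period(counter);

	printf("stale period: 0x%llx\n", (unsigned long long)stale);
	printf("fresh period: 0x%llx\n", (unsigned long long)fresh);
	return 0;
}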
Signed-off-by: Andrew Murray andrew.murray@arm.com Reviewed-by: Julien Thierry julien.thierry@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/pmu.c | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index d78df14..31a283d 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -25,6 +25,7 @@ #include <kvm/arm_pmu.h> #include <kvm/arm_vgic.h>
+static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx); /** * kvm_pmu_get_counter_value - get PMU counter value * @vcpu: The vcpu pointer @@ -63,6 +64,9 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) reg = (select_idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); + + /* Recreate the perf event to reflect the updated sample_period */ + kvm_pmu_create_perf_event(vcpu, select_idx); }
/** @@ -406,23 +410,21 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) }
/** - * kvm_pmu_set_counter_event_type - set selected counter to monitor some event + * kvm_pmu_create_perf_event - create a perf event for a counter * @vcpu: The vcpu pointer - * @data: The data guest writes to PMXEVTYPER_EL0 * @select_idx: The number of selected counter - * - * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an - * event with given hardware event number. Here we call perf_event API to - * emulate this action and create a kernel perf event for it. */ -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, - u64 select_idx) +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) { struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_pmc *pmc = &pmu->pmc[select_idx]; struct perf_event *event; struct perf_event_attr attr; - u64 eventsel, counter; + u64 eventsel, counter, reg, data; + + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; + data = __vcpu_sys_reg(vcpu, reg);
kvm_pmu_stop_counter(vcpu, pmc); eventsel = data & ARMV8_PMU_EVTYPE_EVENT; @@ -459,6 +461,28 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, pmc->perf_event = event; }
+/** + * kvm_pmu_set_counter_event_type - set selected counter to monitor some event + * @vcpu: The vcpu pointer + * @data: The data guest writes to PMXEVTYPER_EL0 + * @select_idx: The number of selected counter + * + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an + * event with given hardware event number. Here we call perf_event API to + * emulate this action and create a kernel perf event for it. + */ +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, + u64 select_idx) +{ + u64 reg, event_type = data & ARMV8_PMU_EVTYPE_MASK; + + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; + + __vcpu_sys_reg(vcpu, reg) = event_type; + kvm_pmu_create_perf_event(vcpu, select_idx); +} + bool kvm_arm_support_pmu_v3(void) { /*
From: Zenghui Yu yuzenghui@huawei.com
euleros inclusion
category: bugfix
-------------------------------------------------
Commit 976d34e2dab1 ("KVM: arm/arm64: Skip updating PTE entry if no change") pointed out one refaulting scenario caused by break-before-make (BBM), but failed to take the following scenario into account:
logging is enabled: vcpu-0 permission fault - write to a RO address vcpu-1 translation fault - caused by vcpu-0's BBM
specifically:
   vcpu-0                               vcpu-1

   *permission fault*
   kvm_set_pte(pte, __pte(0))
   kvm_tlb_flush_vmid_ipa(kvm, addr)
                                        *translation fault*
   *RO -> RW*
                                        *RW -> RO*
KVM will be busy doing this "RO->RW->RO" page table entry switch if we have *many* vcpus running during the logging process. We should instead avoid handling a stage 2 translation fault (e.g., in stage2_set_pte) if old_pte is already *valid*; rework the skipping logic to fix the issue explained above.
Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- arch/arm/include/asm/kvm_arm.h | 1 + arch/arm64/include/asm/kvm_arm.h | 1 + virt/kvm/arm/mmu.c | 50 +++++++++++++++++++++++++++------------- 3 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index c3f1f9b..289ed7b 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -185,6 +185,7 @@ #define FSC_FAULT (0x04) #define FSC_ACCESS (0x08) #define FSC_PERM (0x0c) +#define FSC_IGNORE (-1) #define FSC_SEA (0x10) #define FSC_SEA_TTW0 (0x14) #define FSC_SEA_TTW1 (0x15) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index cac7eb0..fd85e54 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -301,6 +301,7 @@ #define FSC_FAULT ESR_ELx_FSC_FAULT #define FSC_ACCESS ESR_ELx_FSC_ACCESS #define FSC_PERM ESR_ELx_FSC_PERM +#define FSC_IGNORE (-1) #define FSC_SEA ESR_ELx_FSC_EXTABT #define FSC_SEA_TTW0 (0x14) #define FSC_SEA_TTW1 (0x15) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 3a280b3..f377352 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1056,7 +1056,8 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache }
static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache - *cache, phys_addr_t addr, const pmd_t *new_pmd) + *cache, phys_addr_t addr, const pmd_t *new_pmd, + unsigned long fault_status) { pmd_t *pmd, old_pmd;
@@ -1072,12 +1073,15 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * (pmd_clear() followed by tlb_flush()) process can * hinder forward progress due to refaults generated * on missing translations. - * - * Skip updating the page table if the entry is - * unchanged. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; + if (fault_status == FSC_PERM) { + /* Skip updating the page table if the entry is unchanged. */ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + } else if (fault_status == FSC_FAULT) { + if (pmd_present(old_pmd) && !pmd_table(old_pmd)) + return 0; + }
if (pmd_present(old_pmd)) { /* @@ -1120,7 +1124,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache }
static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, - phys_addr_t addr, const pud_t *new_pudp) + phys_addr_t addr, const pud_t *new_pudp, + unsigned long fault_status) { pud_t *pudp, old_pud;
@@ -1135,8 +1140,13 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac * can lead to a refault due to the stage2_pud_clear()/tlb_flush(). * Skip updating the page tables if there is no change. */ - if (pud_val(old_pud) == pud_val(*new_pudp)) - return 0; + if (fault_status == FSC_PERM) { + if (pud_val(old_pud) == pud_val(*new_pudp)) + return 0; + } else if (fault_status == FSC_FAULT) { + if (pud_present(old_pud) && !pud_table(old_pud)) + return 0; + }
if (stage2_pud_present(kvm, old_pud)) { /* @@ -1223,7 +1233,7 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, phys_addr_t addr, const pte_t *new_pte, - unsigned long flags) + unsigned long flags, unsigned long fault_status) { pud_t *pud; pmd_t *pmd; @@ -1290,11 +1300,16 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
/* Create 2nd stage page table mapping - Level 3 */ old_pte = *pte; - if (pte_present(old_pte)) { + if (fault_status == FSC_PERM) { /* Skip page table update if there is no change */ if (pte_val(old_pte) == pte_val(*new_pte)) return 0; + } else if (fault_status == FSC_FAULT) { + if (pte_present(old_pte)) + return 0; + }
+ if (pte_present(old_pte)) { kvm_set_pte(pte, __pte(0)); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -1363,7 +1378,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, goto out; spin_lock(&kvm->mmu_lock); ret = stage2_set_pte(kvm, &cache, addr, &pte, - KVM_S2PTE_FLAG_IS_IOMAP); + KVM_S2PTE_FLAG_IS_IOMAP, FSC_IGNORE); spin_unlock(&kvm->mmu_lock); if (ret) goto out; @@ -1833,7 +1848,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (needs_exec) new_pud = kvm_s2pud_mkexec(new_pud);
- ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud); + ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud, + fault_status); } else if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
@@ -1845,7 +1861,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd);
- ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); + ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd, + fault_status); } else { pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
@@ -1857,7 +1874,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte);
- ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags); + ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags, + fault_status); }
out_unlock: @@ -2083,7 +2101,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data * therefore stage2_set_pte() never needs to clear out a huge PMD * through this calling path. */ - stage2_set_pte(kvm, NULL, gpa, pte, 0); + stage2_set_pte(kvm, NULL, gpa, pte, 0, FSC_IGNORE); return 0; }
From: Xiangyou Xie xiexiangyou@huawei.com
euleros inclusion
category: feature
feature: ITS translation cache improvements
-------------------------------------------------
Because dist->lpi_list_lock is a per-VM lock, it becomes a performance bottleneck when a virtual machine is configured with multiple virtual NIC devices that receive network packets at the same time.
This patch increases the number of LPI translation caches to eight, hashes the id of the CPU that executes irqfd_wakeup, and uses the result to choose which cache to use.
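A trivial demo of the selection scheme (the constant matches the patch; everything else is illustrative):

#include <stdio.h>

#define LPI_TRANS_CACHES_NUM 8   /* matches the patch */

int main(void)
{
	int cpu;

	/* CPUs map onto the eight per-VM caches by simple modulo hashing */
	for (cpu = 0; cpu < 16; cpu++)
		printf("cpu %2d -> cache %d\n", cpu, cpu % LPI_TRANS_CACHES_NUM);
	return 0;
}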
I tested the impact on virtual interrupt injection latency. Run the iperf command to send UDP packets to the VM:

iperf -c $IP -u -b 40m -l 64 -t 6000&

The VM just receives UDP traffic. When multiple NICs are configured, each NIC receives the above iperf UDP traffic; this reflects the performance impact of shared-resource contention, such as locking.
Observe the delay of virtual interrupt injection: the time spent in the irqfd_wakeup and irqfd_inject functions, plus the kworker context switch. The lower, the better.
The ITS translation cache greatly reduces the delay of interrupt injection compared to the kworker thread, because it eliminates the wakeup and its uncertain scheduling delay:

           kworker     ITS translation cache   improved
1 NIC      6.692 us    1.766 us                73.6%
10 NICs    7.536 us    2.574 us                65.8%
Increasing the number of LPI translation caches reduces lock contention, and concurrent injection of multiple interrupts performs better:

           ITS translation cache   with patch   improved
1 NIC      1.766 us                1.694 us     4.1%
10 NICs    2.574 us                1.848 us     28.2%
Signed-off-by: Xiangyou Xie xiexiangyou@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- include/kvm/arm_vgic.h | 13 ++- virt/kvm/arm/vgic/vgic-init.c | 8 +- virt/kvm/arm/vgic/vgic-its.c | 211 +++++++++++++++++++++++++----------------- 3 files changed, 146 insertions(+), 86 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index eac0150..88142c1 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -44,6 +44,9 @@ #define irq_is_spi(irq) ((irq) >= VGIC_NR_PRIVATE_IRQS && \ (irq) <= VGIC_MAX_SPI)
+/*The number of lpi translation cache lists*/ +#define LPI_TRANS_CACHES_NUM 8 + enum vgic_type { VGIC_V2, /* Good ol' GICv2 */ VGIC_V3, /* New fancy GICv3 */ @@ -173,6 +176,12 @@ struct vgic_io_device { struct kvm_io_device dev; };
+struct its_trans_cache { + /* LPI translation cache */ + struct list_head lpi_cache; + raw_spinlock_t lpi_cache_lock; +}; + struct vgic_its { /* The base address of the ITS control register frame */ gpa_t vgic_its_base; @@ -260,8 +269,8 @@ struct vgic_dist { struct list_head lpi_list_head; int lpi_list_count;
- /* LPI translation cache */ - struct list_head lpi_translation_cache; + /* LPI translation cache array*/ + struct its_trans_cache lpi_translation_cache[LPI_TRANS_CACHES_NUM];
/* used by vgic-debug */ struct vgic_state_iter *iter; diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c index 04bd2e4..683ae45 100644 --- a/virt/kvm/arm/vgic/vgic-init.c +++ b/virt/kvm/arm/vgic/vgic-init.c @@ -63,9 +63,15 @@ void kvm_vgic_early_init(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; + raw_spinlock_t *lpi_lock; + int i;
INIT_LIST_HEAD(&dist->lpi_list_head); - INIT_LIST_HEAD(&dist->lpi_translation_cache); + for (i = 0; i < LPI_TRANS_CACHES_NUM; i++) { + lpi_lock = &dist->lpi_translation_cache[i].lpi_cache_lock; + INIT_LIST_HEAD(&dist->lpi_translation_cache[i].lpi_cache); + raw_spin_lock_init(lpi_lock); + } raw_spin_lock_init(&dist->lpi_list_lock); }
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 01bfa57..d5ec21d 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -557,13 +557,21 @@ static inline int list_is_first(const struct list_head *list, return list->prev == head; }
+/* Default is 16 cached LPIs per vcpu */ +#define LPI_DEFAULT_PCPU_CACHE_SIZE 16 + static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist, phys_addr_t db, - u32 devid, u32 eventid) + u32 devid, u32 eventid, + int cacheid) { struct vgic_translation_cache_entry *cte; + struct vgic_irq *irq = NULL; + struct list_head *cache_head; + int pos = 0;
- list_for_each_entry(cte, &dist->lpi_translation_cache, entry) { + cache_head = &dist->lpi_translation_cache[cacheid].lpi_cache; + list_for_each_entry(cte, cache_head, entry) { /* * If we hit a NULL entry, there is nothing after this * point. @@ -571,21 +579,25 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist, if (!cte->irq) break;
- if (cte->db != db || cte->devid != devid || - cte->eventid != eventid) - continue; + pos++;
- /* - * Move this entry to the head, as it is the most - * recently used. - */ - if (!list_is_first(&cte->entry, &dist->lpi_translation_cache)) - list_move(&cte->entry, &dist->lpi_translation_cache); + if (cte->devid == devid && + cte->eventid == eventid && + cte->db == db) { + /* + * Move this entry to the head if the entry at the + * position behind the LPI_DEFAULT_PCPU_CACHE_SIZE * 2 + * of the LRU list, as it is the most recently used. + */ + if (pos > LPI_DEFAULT_PCPU_CACHE_SIZE * 2) + list_move(&cte->entry, cache_head);
- return cte->irq; + irq = cte->irq; + break; + } }
- return NULL; + return irq; }
static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db, @@ -593,11 +605,15 @@ static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db, { struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_irq *irq; - unsigned long flags; + int cpu; + int cacheid;
- raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); - irq = __vgic_its_check_cache(dist, db, devid, eventid); - raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + cpu = smp_processor_id(); + cacheid = cpu % LPI_TRANS_CACHES_NUM; + + raw_spin_lock(&dist->lpi_translation_cache[cacheid].lpi_cache_lock); + irq = __vgic_its_check_cache(dist, db, devid, eventid, cacheid); + raw_spin_unlock(&dist->lpi_translation_cache[cacheid].lpi_cache_lock);
return irq; } @@ -610,49 +626,58 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its, struct vgic_translation_cache_entry *cte; unsigned long flags; phys_addr_t db; + raw_spinlock_t *lpi_lock; + struct list_head *cache_head; + int cacheid;
/* Do not cache a directly injected interrupt */ if (irq->hw) return;
- raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); - - if (unlikely(list_empty(&dist->lpi_translation_cache))) - goto out; - - /* - * We could have raced with another CPU caching the same - * translation behind our back, so let's check it is not in - * already - */ - db = its->vgic_its_base + GITS_TRANSLATER; - if (__vgic_its_check_cache(dist, db, devid, eventid)) - goto out; - - /* Always reuse the last entry (LRU policy) */ - cte = list_last_entry(&dist->lpi_translation_cache, - typeof(*cte), entry); + for (cacheid = 0; cacheid < LPI_TRANS_CACHES_NUM; cacheid++) { + lpi_lock = &dist->lpi_translation_cache[cacheid].lpi_cache_lock; + cache_head = &dist->lpi_translation_cache[cacheid].lpi_cache; + raw_spin_lock_irqsave(lpi_lock, flags); + if (unlikely(list_empty(cache_head))) { + raw_spin_unlock_irqrestore(lpi_lock, flags); + break; + }
- /* - * Caching the translation implies having an extra reference - * to the interrupt, so drop the potential reference on what - * was in the cache, and increment it on the new interrupt. - */ - if (cte->irq) - __vgic_put_lpi_locked(kvm, cte->irq); + /* + * We could have raced with another CPU caching the same + * translation behind our back, so let's check it is not in + * already + */ + db = its->vgic_its_base + GITS_TRANSLATER; + if (__vgic_its_check_cache(dist, db, devid, eventid, cacheid)) { + raw_spin_unlock_irqrestore(lpi_lock, flags); + continue; + }
- vgic_get_irq_kref(irq); + /* Always reuse the last entry (LRU policy) */ + cte = list_last_entry(cache_head, typeof(*cte), entry);
- cte->db = db; - cte->devid = devid; - cte->eventid = eventid; - cte->irq = irq; + /* + * Caching the translation implies having an extra reference + * to the interrupt, so drop the potential reference on what + * was in the cache, and increment it on the new interrupt. + */ + if (cte->irq) { + raw_spin_lock(&dist->lpi_list_lock); + __vgic_put_lpi_locked(kvm, cte->irq); + raw_spin_unlock(&dist->lpi_list_lock); + } + vgic_get_irq_kref(irq);
- /* Move the new translation to the head of the list */ - list_move(&cte->entry, &dist->lpi_translation_cache); + cte->db = db; + cte->devid = devid; + cte->eventid = eventid; + cte->irq = irq;
-out: - raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); + /* Move the new translation to the head of the list */ + list_move(&cte->entry, cache_head); + raw_spin_unlock_irqrestore(lpi_lock, flags); + } }
void vgic_its_invalidate_cache(struct kvm *kvm) @@ -660,22 +685,29 @@ void vgic_its_invalidate_cache(struct kvm *kvm) struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_translation_cache_entry *cte; unsigned long flags; + raw_spinlock_t *lpi_lock; + struct list_head *cache_head; + int i;
- raw_spin_lock_irqsave(&dist->lpi_list_lock, flags); - - list_for_each_entry(cte, &dist->lpi_translation_cache, entry) { - /* - * If we hit a NULL entry, there is nothing after this - * point. - */ - if (!cte->irq) - break; - - __vgic_put_lpi_locked(kvm, cte->irq); - cte->irq = NULL; + for (i = 0; i < LPI_TRANS_CACHES_NUM; i++) { + lpi_lock = &dist->lpi_translation_cache[i].lpi_cache_lock; + cache_head = &dist->lpi_translation_cache[i].lpi_cache; + raw_spin_lock_irqsave(lpi_lock, flags); + list_for_each_entry(cte, cache_head, entry) { + /* + * If we hit a NULL entry, there is nothing after this + * point. + */ + if (!cte->irq) + break; + + raw_spin_lock(&dist->lpi_list_lock); + __vgic_put_lpi_locked(kvm, cte->irq); + raw_spin_unlock(&dist->lpi_list_lock); + cte->irq = NULL; + } + raw_spin_unlock_irqrestore(lpi_lock, flags); } - - raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags); }
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its, @@ -1872,30 +1904,34 @@ static int vgic_register_its_iodev(struct kvm *kvm, struct vgic_its *its, return ret; }
-/* Default is 16 cached LPIs per vcpu */ -#define LPI_DEFAULT_PCPU_CACHE_SIZE 16 - void vgic_lpi_translation_cache_init(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; unsigned int sz; + struct list_head *cache_head; int i; + int cacheid;
- if (!list_empty(&dist->lpi_translation_cache)) - return; + for (cacheid = 0; cacheid < LPI_TRANS_CACHES_NUM; cacheid++) { + cache_head = &dist->lpi_translation_cache[cacheid].lpi_cache; + if (!list_empty(cache_head)) + return; + }
sz = atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
- for (i = 0; i < sz; i++) { - struct vgic_translation_cache_entry *cte; - - /* An allocation failure is not fatal */ - cte = kzalloc(sizeof(*cte), GFP_KERNEL); - if (WARN_ON(!cte)) - break; - - INIT_LIST_HEAD(&cte->entry); - list_add(&cte->entry, &dist->lpi_translation_cache); + for (cacheid = 0; cacheid < LPI_TRANS_CACHES_NUM; cacheid++) { + cache_head = &dist->lpi_translation_cache[cacheid].lpi_cache; + for (i = 0; i < sz; i++) { + struct vgic_translation_cache_entry *cte; + + /* An allocation failure is not fatal */ + cte = kzalloc(sizeof(*cte), GFP_KERNEL); + if (WARN_ON(!cte)) + break; + INIT_LIST_HEAD(&cte->entry); + list_add(&cte->entry, cache_head); + } } }
@@ -1903,13 +1939,22 @@ void vgic_lpi_translation_cache_destroy(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_translation_cache_entry *cte, *tmp; + unsigned long flags; + raw_spinlock_t *lpi_lock; + struct list_head *cache_head; + int cacheid;
vgic_its_invalidate_cache(kvm);
- list_for_each_entry_safe(cte, tmp, - &dist->lpi_translation_cache, entry) { - list_del(&cte->entry); - kfree(cte); + for (cacheid = 0; cacheid < LPI_TRANS_CACHES_NUM; cacheid++) { + lpi_lock = &dist->lpi_translation_cache[cacheid].lpi_cache_lock; + cache_head = &dist->lpi_translation_cache[cacheid].lpi_cache; + raw_spin_lock_irqsave(lpi_lock, flags); + list_for_each_entry_safe(cte, tmp, cache_head, entry) { + list_del(&cte->entry); + kfree(cte); + } + raw_spin_unlock_irqrestore(lpi_lock, flags); } }
From: Xiangyou Xie xiexiangyou@huawei.com
euleros inclusion
category: feature
feature: ITS translation cache improvements
-------------------------------------------------
It is not necessary to invalidate the LPI translation cache when the virtual machine executes the MOVI command to adjust the affinity of an interrupt. irqbalance adjusts interrupt affinity at short intervals to achieve interrupt load balancing, but this does not affect the contents of the LPI translation cache.
Signed-off-by: Xiangyou Xie xiexiangyou@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Xiangyou Xie xiexiangyou@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index d5ec21d..0ee0986 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -935,7 +935,8 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its, ite->collection = collection; vcpu = kvm_get_vcpu(kvm, collection->target_addr);
- vgic_its_invalidate_cache(kvm); + if (!vcpu->arch.vgic_cpu.lpis_enabled) + vgic_its_invalidate_cache(kvm);
return update_affinity(ite->irq, vcpu); }
From: Yiwen Jiang jiangyiwen@huawei.com
euleros inclusion
category: bugfix
-------------------------------------------------
lpi_cache_lock can be taken in irq context, so it must be acquired with the irqsave spinlock variants; otherwise an interrupt arriving on a CPU that already holds the lock could try to take it again and deadlock.
Signed-off-by: Yiwen Jiang jiangyiwen@huawei.com Reviewed-by: Hailiang Zhang zhang.zhanghailiang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- virt/kvm/arm/vgic/vgic-its.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 0ee0986..e9a5716 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -605,15 +605,16 @@ static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db, { struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_irq *irq; + unsigned long flags; int cpu; int cacheid;
cpu = smp_processor_id(); cacheid = cpu % LPI_TRANS_CACHES_NUM;
- raw_spin_lock(&dist->lpi_translation_cache[cacheid].lpi_cache_lock); + raw_spin_lock_irqsave(&dist->lpi_translation_cache[cacheid].lpi_cache_lock, flags); irq = __vgic_its_check_cache(dist, db, devid, eventid, cacheid); - raw_spin_unlock(&dist->lpi_translation_cache[cacheid].lpi_cache_lock); + raw_spin_unlock_irqrestore(&dist->lpi_translation_cache[cacheid].lpi_cache_lock, flags);
 	return irq;
 }
From: Zenghui Yu <yuzenghui@huawei.com>
euleros inclusion
category: feature
feature: Hisi non-cacheable snoopy support
-------------------------------------------------
Currently we don't expose the HiSi CPU type or any hardware capabilities to the guest. Don't try to probe them if we're running in a guest kernel. The problem shows up as a WARN_ON() when probing the CPU type in a guest...
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 virt/kvm/arm/arm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5611d6c..6d66a8a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1676,15 +1676,15 @@ int kvm_arch_init(void *opaque)
 	int ret, cpu;
 	bool in_hyp_mode;
-	/* Probe the Hisi CPU type */
-	probe_hisi_cpu_type();
-	probe_hisi_ncsnp_support();
-
 	if (!is_hyp_mode_available()) {
 		kvm_info("HYP mode not available\n");
 		return -ENODEV;
 	}
+	/* Probe the Hisi CPU type */
+	probe_hisi_cpu_type();
+	probe_hisi_ncsnp_support();
+
 	if (!kvm_arch_check_sve_has_vhe()) {
 		kvm_pr_unimpl("SVE system without VHE unsupported. Broken cpu?");
 		return -ENODEV;
From: Eric Auger <eric.auger@redhat.com>
mainline inclusion
from mainline-v5.4-rc1
commit 3109741a8d773b91eec4a1f7764c97a1176ec32d
category: feature
feature: >256U kvm guest
-------------------------------------------------
At the moment we use 2 IO devices per GICv3 redistributor: one for the RD_base frame and one for the SGI_base frame.
Instead we can use a single IO device per redistributor (the 2 frames are contiguous). This saves slots on the KVM_MMIO_BUS which is currently limited to NR_IOBUS_DEVS (1000).
This change allows up to 512 redistributors to be instantiated and may speed up guest boot with a large number of vCPUs.
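As a rough sketch of why a single device suffices (illustrative only: struct reg_region and its table here are made-up stand-ins for the kernel's vgic_register_region, but the SZ_64K offsets mirror the merged vgic_v3_rd_registers[] table in the diff below):

  #include <stdint.h>
  #include <stdio.h>

  #define SZ_64K 0x10000u

  struct reg_region {
          uint32_t reg_offset;    /* offset from RD_base */
          const char *name;
  };

  /* RD_base registers keep their architectural offsets; SGI_base
   * registers are entered with an extra SZ_64K. */
  static const struct reg_region regions[] = {
          { 0x0000,          "GICR_CTLR"       },  /* RD_base frame  */
          { SZ_64K + 0x0100, "GICR_ISENABLER0" },  /* SGI_base frame */
  };

  int main(void)
  {
          uint32_t offset = SZ_64K + 0x0100;      /* access in SGI frame */
          size_t i;

          /* One device window of 2 * SZ_64K: dispatch is a single table
           * walk, no second KVM_MMIO_BUS slot per redistributor. */
          for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++)
                  if (regions[i].reg_offset == offset)
                          printf("0x%x -> %s (%s frame)\n", (unsigned)offset,
                                 regions[i].name,
                                 offset >= SZ_64K ? "SGI_base" : "RD_base");
          return 0;
  }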
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 include/kvm/arm_vgic.h           |  1 -
 virt/kvm/arm/vgic/vgic-init.c    |  1 -
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 81 ++++++++++++----------------------------
 3 files changed, 24 insertions(+), 59 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 88142c1..e3d93c8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -334,7 +334,6 @@ struct vgic_cpu {
 	 * parts of the redistributor.
 	 */
 	struct vgic_io_device rd_iodev;
-	struct vgic_io_device sgi_iodev;
 	struct vgic_redist_region *rdreg;
 	/* Contains the attributes and gpa of the LPI pending tables. */
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 683ae45..cd99139 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -217,7 +217,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	int i;
 	vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
-	vgic_cpu->sgi_iodev.base_addr = VGIC_ADDR_UNDEF;
 	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
 	raw_spin_lock_init(&vgic_cpu->ap_list_lock);
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 2aea9f4..11ba1b0 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -525,7 +525,8 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
 		VGIC_ACCESS_32bit),
 };
-static const struct vgic_register_region vgic_v3_rdbase_registers[] = {
+static const struct vgic_register_region vgic_v3_rd_registers[] = {
+	/* RD_base registers */
 	REGISTER_DESC_WITH_LENGTH(GICR_CTLR,
 		vgic_mmio_read_v3r_ctlr, vgic_mmio_write_v3r_ctlr, 4,
 		VGIC_ACCESS_32bit),
@@ -550,44 +551,42 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
 	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
 		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
 		VGIC_ACCESS_32bit),
-};
-
-static const struct vgic_register_region vgic_v3_sgibase_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GICR_IGROUPR0,
+	/* SGI_base registers */
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGROUPR0,
 		vgic_mmio_read_group, vgic_mmio_write_group, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ISENABLER0,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ISENABLER0,
 		vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICENABLER0,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICENABLER0,
 		vgic_mmio_read_enable, vgic_mmio_write_cenable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ISPENDR0,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISPENDR0,
 		vgic_mmio_read_pending, vgic_mmio_write_spending,
 		vgic_v3_uaccess_read_pending, vgic_v3_uaccess_write_pending, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ICPENDR0,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICPENDR0,
 		vgic_mmio_read_pending, vgic_mmio_write_cpending,
 		vgic_mmio_read_raz, vgic_mmio_uaccess_write_wi, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ISACTIVER0,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISACTIVER0,
 		vgic_mmio_read_active, vgic_mmio_write_sactive,
 		NULL, vgic_mmio_uaccess_write_sactive, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ICACTIVER0,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICACTIVER0,
 		vgic_mmio_read_active, vgic_mmio_write_cactive,
 		NULL, vgic_mmio_uaccess_write_cactive, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IPRIORITYR0,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IPRIORITYR0,
 		vgic_mmio_read_priority, vgic_mmio_write_priority, 32,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICFGR0,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICFGR0,
 		vgic_mmio_read_config, vgic_mmio_write_config, 8,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IGRPMODR0,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGRPMODR0,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_NSACR,
+	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_NSACR,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
 };
@@ -617,9 +616,8 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
 	struct vgic_dist *vgic = &kvm->arch.vgic;
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_io_device *rd_dev = &vcpu->arch.vgic_cpu.rd_iodev;
-	struct vgic_io_device *sgi_dev = &vcpu->arch.vgic_cpu.sgi_iodev;
 	struct vgic_redist_region *rdreg;
-	gpa_t rd_base, sgi_base;
+	gpa_t rd_base;
 	int ret;
 	if (!IS_VGIC_ADDR_UNDEF(vgic_cpu->rd_iodev.base_addr))
@@ -641,52 +639,31 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
 	vgic_cpu->rdreg = rdreg;
 	rd_base = rdreg->base + rdreg->free_index * KVM_VGIC_V3_REDIST_SIZE;
-	sgi_base = rd_base + SZ_64K;
 	kvm_iodevice_init(&rd_dev->dev, &kvm_io_gic_ops);
 	rd_dev->base_addr = rd_base;
 	rd_dev->iodev_type = IODEV_REDIST;
-	rd_dev->regions = vgic_v3_rdbase_registers;
-	rd_dev->nr_regions = ARRAY_SIZE(vgic_v3_rdbase_registers);
+	rd_dev->regions = vgic_v3_rd_registers;
+	rd_dev->nr_regions = ARRAY_SIZE(vgic_v3_rd_registers);
 	rd_dev->redist_vcpu = vcpu;
 	mutex_lock(&kvm->slots_lock);
 	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, rd_base,
-				      SZ_64K, &rd_dev->dev);
+				      2 * SZ_64K, &rd_dev->dev);
 	mutex_unlock(&kvm->slots_lock);
 	if (ret)
 		return ret;
-	kvm_iodevice_init(&sgi_dev->dev, &kvm_io_gic_ops);
-	sgi_dev->base_addr = sgi_base;
-	sgi_dev->iodev_type = IODEV_REDIST;
-	sgi_dev->regions = vgic_v3_sgibase_registers;
-	sgi_dev->nr_regions = ARRAY_SIZE(vgic_v3_sgibase_registers);
-	sgi_dev->redist_vcpu = vcpu;
-
-	mutex_lock(&kvm->slots_lock);
-	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, sgi_base,
-				      SZ_64K, &sgi_dev->dev);
-	if (ret) {
-		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
-					  &rd_dev->dev);
-		goto out;
-	}
-
 	rdreg->free_index++;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return ret;
+	return 0;
 }
 static void vgic_unregister_redist_iodev(struct kvm_vcpu *vcpu)
 {
 	struct vgic_io_device *rd_dev = &vcpu->arch.vgic_cpu.rd_iodev;
-	struct vgic_io_device *sgi_dev = &vcpu->arch.vgic_cpu.sgi_iodev;
 	kvm_io_bus_unregister_dev(vcpu->kvm, KVM_MMIO_BUS, &rd_dev->dev);
-	kvm_io_bus_unregister_dev(vcpu->kvm, KVM_MMIO_BUS, &sgi_dev->dev);
 }
 static int vgic_register_all_redist_iodevs(struct kvm *kvm)
@@ -836,8 +813,8 @@ int vgic_v3_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr)
 		iodev.base_addr = 0;
 		break;
 	case KVM_DEV_ARM_VGIC_GRP_REDIST_REGS:{
-		iodev.regions = vgic_v3_rdbase_registers;
-		iodev.nr_regions = ARRAY_SIZE(vgic_v3_rdbase_registers);
+		iodev.regions = vgic_v3_rd_registers;
+		iodev.nr_regions = ARRAY_SIZE(vgic_v3_rd_registers);
 		iodev.base_addr = 0;
 		break;
 	}
@@ -995,21 +972,11 @@ int vgic_v3_redist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
 			   int offset, u32 *val)
 {
 	struct vgic_io_device rd_dev = {
-		.regions = vgic_v3_rdbase_registers,
-		.nr_regions = ARRAY_SIZE(vgic_v3_rdbase_registers),
+		.regions = vgic_v3_rd_registers,
+		.nr_regions = ARRAY_SIZE(vgic_v3_rd_registers),
 	};
-	struct vgic_io_device sgi_dev = {
-		.regions = vgic_v3_sgibase_registers,
-		.nr_regions = ARRAY_SIZE(vgic_v3_sgibase_registers),
-	};
-
-	/* SGI_base is the next 64K frame after RD_base */
-	if (offset >= SZ_64K)
-		return vgic_uaccess(vcpu, &sgi_dev, is_write, offset - SZ_64K,
-				    val);
-	else
-		return vgic_uaccess(vcpu, &rd_dev, is_write, offset, val);
+	return vgic_uaccess(vcpu, &rd_dev, is_write, offset, val);
 }
int vgic_v3_line_level_info_uaccess(struct kvm_vcpu *vcpu, bool is_write,
From: Marc Zyngier <maz@kernel.org>
mainline inclusion
from mainline-v5.4-rc1
commit 92f35b751c71d14250a401246f2c792e3aa5b386
category: feature
feature: >256U kvm guest
-------------------------------------------------
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited.
One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field.
Since the irq_type subfield (8 bits wide) currently only takes the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier:
vcpu_id = 256 * vcpu2_index + vcpu_index
With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't.
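For illustration, a userspace helper might encode the irq word like this (kvm_arm_irq_word is a hypothetical name; the mask and shift values are the ones defined in the uapi diff below):

  #include <stdint.h>

  #define KVM_ARM_IRQ_VCPU2_SHIFT 28
  #define KVM_ARM_IRQ_VCPU2_MASK  0xf
  #define KVM_ARM_IRQ_TYPE_SHIFT  24
  #define KVM_ARM_IRQ_TYPE_MASK   0xf
  #define KVM_ARM_IRQ_VCPU_SHIFT  16
  #define KVM_ARM_IRQ_VCPU_MASK   0xff
  #define KVM_ARM_IRQ_NUM_SHIFT   0

  /* Hypothetical helper: build the KVM_IRQ_LINE irq word. */
  static uint32_t kvm_arm_irq_word(uint32_t vcpu_id, uint32_t irq_type,
                                   uint32_t irq_num)
  {
          uint32_t vcpu_index  = vcpu_id & KVM_ARM_IRQ_VCPU_MASK; /* low 8 bits */
          uint32_t vcpu2_index = vcpu_id >> 8;    /* multiples of 256 */

          return (vcpu2_index << KVM_ARM_IRQ_VCPU2_SHIFT) |
                 ((irq_type & KVM_ARM_IRQ_TYPE_MASK) << KVM_ARM_IRQ_TYPE_SHIFT) |
                 (vcpu_index << KVM_ARM_IRQ_VCPU_SHIFT) |
                 (irq_num << KVM_ARM_IRQ_NUM_SHIFT);
  }

  /* e.g. vcpu 300 = 1 * 256 + 44 -> vcpu2_index = 1, vcpu_index = 44 */

The kernel side below reverses this exactly: vcpu_idx += vcpu2_index * (KVM_ARM_IRQ_VCPU_MASK + 1).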
Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditioned by KVM_CAP_IRQCHIP.
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 Documentation/virtual/kvm/api.txt | 12 ++++++++++--
 arch/arm/include/uapi/asm/kvm.h   |  4 +++-
 arch/arm64/include/uapi/asm/kvm.h |  4 +++-
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/arm/arm.c                |  2 ++
 5 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4a8e699..3072aaf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -728,8 +728,8 @@ in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to use
 PPIs designated for specific cpus.  The irq field is interpreted like this:
-  bits:  | 31 ... 24 | 23 ... 16 | 15 ... 0 |
-  field: | irq_type  | vcpu_index |  irq_id  |
+  bits:  | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 0 |
+  field: | vcpu2_index | irq_type | vcpu_index |  irq_id  |
 The irq_type field has the following values:
 - irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
@@ -741,6 +741,14 @@ The irq_type field has the following values:
In both cases, level is used to assert/deassert the line.
+When KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, the target vcpu is
+identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index
+must be zero.
+
+Note that on arm/arm64, the KVM_CAP_IRQCHIP capability only conditions
+injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always
+be used for a userspace interrupt controller.
+
 struct kvm_irq_level {
 	union {
 		__u32 irq;     /* GSI */
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 4602464..86db092 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -254,8 +254,10 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_ITS_CTRL_RESET	4
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT		28
+#define KVM_ARM_IRQ_VCPU2_MASK		0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
-#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_TYPE_MASK		0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT		16
 #define KVM_ARM_IRQ_VCPU_MASK		0xff
 #define KVM_ARM_IRQ_NUM_SHIFT		0
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 97c3478..307d136 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -265,8 +265,10 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER	1
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT		28
+#define KVM_ARM_IRQ_VCPU2_MASK		0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
-#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_TYPE_MASK		0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT		16
 #define KVM_ARM_IRQ_VCPU_MASK		0xff
 #define KVM_ARM_IRQ_NUM_SHIFT		0
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 062e5fd..b4430d3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -963,6 +963,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
 #define KVM_CAP_MSR_PLATFORM_INFO 159
 #define KVM_CAP_ARM_VM_IPA_SIZE 165 /* returns maximum IPA bits for a VM */
+#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 6d66a8a..a54c221 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -217,6 +217,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_READONLY_MEM:
 	case KVM_CAP_MP_STATE:
 	case KVM_CAP_IMMEDIATE_EXIT:
+	case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -919,6 +920,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 	irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
 	vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+	vcpu_idx += ((irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK) * (KVM_ARM_IRQ_VCPU_MASK + 1);
 	irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);