[PATCH openEuler-5.10-LTS 1/6] KVM: x86: do not report a vCPU as preempted outside instruction boundaries

From: Paolo Bonzini <pbonzini@redhat.com> mainline inclusion from mainline-v5.19-rc2 commit 6cd88243c7e03845a450795e134b488fc2afb736 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H CVE: CVE-2022-39189 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------- If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> conflict: arch/x86/kvm/x86.c Signed-off-by: Guo Mengqi <guomengqi3@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: yezengruan <yezengruan@huawei.com> Reviewed-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/svm/svm.c | 2 ++ arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++ 4 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d90792da5b9e..01544554a4c2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -551,6 +551,7 @@ struct kvm_vcpu_arch { u64 ia32_misc_enable_msr; u64 smbase; u64 smi_count; + bool at_instruction_boundary; bool tpr_access_reporting; bool xsaves_enabled; u64 ia32_xss; @@ -1072,6 +1073,8 @@ struct kvm_vcpu_stat { u64 utime; u64 stime; u64 gtime; + u64 preemption_reported; + u64 preemption_other; u64 preemption_timer_exits; }; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 2124fe54abfb..5231f40e8312 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3992,6 +3992,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu) { + if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR) + vcpu->arch.at_instruction_boundary = true; } static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 79889d27aa5b..96693ac066c2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6532,6 +6532,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) return; handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); + vcpu->arch.at_instruction_boundary = true; } static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 24c198e70f8e..dac43d92eb81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -232,6 +232,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { VCPU_STAT("l1d_flush", l1d_flush), VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns), VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns), + VCPU_STAT("preemption_reported", preemption_reported), + VCPU_STAT("preemption_other", preemption_other), VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped), VM_STAT("mmu_pte_write", mmu_pte_write), VM_STAT("mmu_pde_zapped", mmu_pde_zapped), @@ -286,6 +288,8 @@ struct dfx_kvm_stats_debugfs_item dfx_debugfs_entries[] = { DFX_STAT("stime", stime), DFX_STAT("gtime", gtime), DFX_STAT("preemption_timer_exits", preemption_timer_exits), + DFX_STAT("preemption_reported", preemption_reported), + DFX_STAT("preemption_other", preemption_other), { NULL } }; @@ -4078,6 +4082,19 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu) struct kvm_host_map map; struct kvm_steal_time *st; + /* + * The vCPU can be marked preempted if and only if the VM-Exit was on + * an instruction boundary and will not trigger guest emulation of any + * kind (see vcpu_run). Vendor specific code controls (conservatively) + * when this is true, for example allowing the vCPU to be marked + * preempted if and only if the VM-Exit was due to a host interrupt. + */ + if (!vcpu->arch.at_instruction_boundary) { + vcpu->stat.preemption_other++; + return; + } + + vcpu->stat.preemption_reported++; if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return; @@ -9306,6 +9323,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu) vcpu->arch.l1tf_flush_l1d = true; for (;;) { + /* + * If another guest vCPU requests a PV TLB flush in the middle + * of instruction emulation, the rest of the emulation could + * use a stale page translation. Assume that any code after + * this point can start executing an instruction. + */ + vcpu->arch.at_instruction_boundary = false; if (kvm_vcpu_running(vcpu)) { r = vcpu_enter_guest(vcpu); } else { -- 2.20.1

From: Chunguang Xu <brookxu@tencent.com> mainline inclusion from mainline-v5.14-rc1 commit d80c228d44640f0b47b57a2ca4afa26ef87e16b0 category: bugfix bugzilla: 187475, https://gitee.com/openeuler/kernel/issues/I5ME0J CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- On the IO submission path, blk_account_io_start() may interrupt the system interruption. When the interruption returns, the value of part->stamp may have been updated by other cores, so the time value collected before the interruption may be less than part-> stamp. So when this happens, we should do nothing to make io_ticks more accurate? For kernels less than 5.0, this may cause io_ticks to become smaller, which in turn may cause abnormal ioutil values. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.... Signed-off-by: Jens Axboe <axboe@kernel.dk> conflict: block/blk-core.c Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 109fb2750453..0b496dabc5ac 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1258,7 +1258,7 @@ void update_io_ticks(struct hd_struct *part, unsigned long now, bool end) unsigned long stamp; again: stamp = READ_ONCE(part->stamp); - if (unlikely(stamp != now)) { + if (unlikely(time_after(now, stamp))) { if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) __part_stat_add(part, io_ticks, end ? now - stamp : 1); } -- 2.20.1

From: Zheyu Ma <zheyuma97@gmail.com> mainline inclusion from mainline-v5.18-rc5 commit 15cf0b82271b1823fb02ab8c377badba614d95d5 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OVRU CVE: CVE-2022-3061 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... --------------------------- The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of 'pixclock', it may cause divide error. Fix this by checking whether 'pixclock' is zero in the function i740fb_check_var(). The following log reveals it: divide error: 0000 [#1] PREEMPT SMP KASAN PTI RIP: 0010:i740fb_decode_var drivers/video/fbdev/i740fb.c:444 [inline] RIP: 0010:i740fb_set_par+0x272f/0x3bb0 drivers/video/fbdev/i740fb.c:739 Call Trace: fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1036 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1112 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1191 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Xia Longlong <xialonglong1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- drivers/video/fbdev/i740fb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c index 52cce0db8bd3..b595437a5752 100644 --- a/drivers/video/fbdev/i740fb.c +++ b/drivers/video/fbdev/i740fb.c @@ -657,6 +657,9 @@ static int i740fb_decode_var(const struct fb_var_screeninfo *var, static int i740fb_check_var(struct fb_var_screeninfo *var, struct fb_info *info) { + if (!var->pixclock) + return -EINVAL; + switch (var->bits_per_pixel) { case 8: var->red.offset = var->green.offset = var->blue.offset = 0; -- 2.20.1

From: Zhang Zekun <zhangzekun11@huawei.com> hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5E461 CVE: NA ------------------------ Enable ACPI HMAT and memory hot remove feature on arm64 by default. For ACPI_HMAT: ACPI HMAT describe the memory attributes, such as bandwidth and latency details, related to the System Physical Address(SPA) Memory Ranges. HMAT is especially useful when software wants to get some information about a certain special memory's memory attributes, such as PMEM and HBM. For MEMORY_HOT_REMOTE: Add support for memory hot remove feature. Some special memory, such as HBM, can be power consuming, and will only be used in some aimed scenarios. With memory hot remove feature, User can offline the idle memory for energy saving purpose when this special memory is unused. As PMEM and HBM are getting more popular, ACPI_HMAT and memory hot remove feature should be enabled as default. The following configs should be opened with CONFIG_ACPI_HMAT by default: 1.CONFIG_EFI_SOFT_RESERVE=y 2.CONFIG_HMEM_REPORTING=y 3.CONFIG_DEV_DAX_HMEM=m 4.CONFIG_DEV_DAX_HMEM_DEVICES=y Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Kai Liu <kai.liu@suse.com> Reviewed-by: Chao Liu <liuchao173@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- arch/arm64/configs/openeuler_defconfig | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 78a63cbc3db6..d246fd508ef6 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -709,7 +709,12 @@ CONFIG_ACPI_REDUCED_HARDWARE_ONLY=y CONFIG_ACPI_NFIT=m # CONFIG_NFIT_SECURITY_DEBUG is not set CONFIG_ACPI_NUMA=y -# CONFIG_ACPI_HMAT is not set +CONFIG_ACPI_HMAT=y +CONFIG_EFI_SOFT_RESERVE=y +# CONFIG_ZONE_DEVICE is not set +CONFIG_HMEM_REPORTING=y +CONFIG_DEV_DAX_HMEM=m +CONFIG_DEV_DAX_HMEM_DEVICES=y CONFIG_HAVE_ACPI_APEI=y CONFIG_ACPI_APEI=y CONFIG_ACPI_APEI_GHES=y @@ -1046,7 +1051,7 @@ CONFIG_COHERENT_DEVICE=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG_SPARSE=y CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y -# CONFIG_MEMORY_HOTREMOVE is not set +CONFIG_MEMORY_HOTREMOVE=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_MEMORY_BALLOON=y CONFIG_BALLOON_COMPACTION=y -- 2.20.1

From: David Leadbeater <dgl@dgl.cx> mainline inclusion from mainline-v6.0-rc6 commit e8d5dfd1d8747b56077d02664a8838c71ced948e category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OWZ7 CVE: CVE-2022-2663 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=e8... --------------------------- CTCP messages should only be at the start of an IRC message, not anywhere within it. While the helper only decodes packes in the ORIGINAL direction, its possible to make a client send a CTCP message back by empedding one into a PING request. As-is, thats enough to make the helper believe that it saw a CTCP message. Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port") Signed-off-by: David Leadbeater <dgl@dgl.cx> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- net/netfilter/nf_conntrack_irc.c | 34 ++++++++++++++++++++++++++------ 1 file changed, 28 insertions(+), 6 deletions(-) diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c index e40988a2f22f..c745dd19d42f 100644 --- a/net/netfilter/nf_conntrack_irc.c +++ b/net/netfilter/nf_conntrack_irc.c @@ -148,15 +148,37 @@ static int help(struct sk_buff *skb, unsigned int protoff, data = ib_ptr; data_limit = ib_ptr + skb->len - dataoff; - /* strlen("\1DCC SENT t AAAAAAAA P\1\n")=24 - * 5+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=14 */ - while (data < data_limit - (19 + MINMATCHLEN)) { - if (memcmp(data, "\1DCC ", 5)) { + /* Skip any whitespace */ + while (data < data_limit - 10) { + if (*data == ' ' || *data == '\r' || *data == '\n') + data++; + else + break; + } + + /* strlen("PRIVMSG x ")=10 */ + if (data < data_limit - 10) { + if (strncasecmp("PRIVMSG ", data, 8)) + goto out; + data += 8; + } + + /* strlen(" :\1DCC SENT t AAAAAAAA P\1\n")=26 + * 7+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=26 + */ + while (data < data_limit - (21 + MINMATCHLEN)) { + /* Find first " :", the start of message */ + if (memcmp(data, " :", 2)) { data++; continue; } + data += 2; + + /* then check that place only for the DCC command */ + if (memcmp(data, "\1DCC ", 5)) + goto out; data += 5; - /* we have at least (19+MINMATCHLEN)-5 bytes valid data left */ + /* we have at least (21+MINMATCHLEN)-(2+5) bytes valid data left */ iph = ip_hdr(skb); pr_debug("DCC found in master %pI4:%u %pI4:%u\n", @@ -172,7 +194,7 @@ static int help(struct sk_buff *skb, unsigned int protoff, pr_debug("DCC %s detected\n", dccprotos[i]); /* we have at least - * (19+MINMATCHLEN)-5-dccprotos[i].matchlen bytes valid + * (21+MINMATCHLEN)-7-dccprotos[i].matchlen bytes valid * data left (== 14/13 bytes) */ if (parse_dcc(data, data_limit, &dcc_ip, &dcc_port, &addr_beg_p, &addr_end_p)) { -- 2.20.1

From: Pablo Neira Ayuso <pablo@netfilter.org> stable inclusion from stable-v5.10.140 commit c08a104a8bce832f6e7a4e8d9ac091777b9982ea category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PEDR?from=project-issue CVE: CVE-2022-39190 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=... -------------------------------- [ Upstream commit e02f0d3970404bfea385b6edb86f2d936db0ea2b ] Update nft_data_init() to report EINVAL if chain is already bound. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 6a26a9cfdb58..44b6d7de8100 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8658,6 +8658,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, return PTR_ERR(chain); if (nft_is_base_chain(chain)) return -EOPNOTSUPP; + if (nft_chain_is_bound(chain)) + return -EINVAL; chain->use++; data->verdict.chain = chain; -- 2.20.1
participants (1)
-
Zheng Zengkai