From: Paolo Bonzini pbonzini@redhat.com
mainline inclusion from mainline-v5.19-rc2 commit 6cd88243c7e03845a450795e134b488fc2afb736 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H CVE: CVE-2022-39189
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
----------------------------------------
If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry.
To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts.
It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com
conflict: arch/x86/kvm/x86.c
Signed-off-by: Guo Mengqi guomengqi3@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: yezengruan yezengruan@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/svm/svm.c | 2 ++ arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++ 4 files changed, 30 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d90792da5b9e..01544554a4c2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -551,6 +551,7 @@ struct kvm_vcpu_arch { u64 ia32_misc_enable_msr; u64 smbase; u64 smi_count; + bool at_instruction_boundary; bool tpr_access_reporting; bool xsaves_enabled; u64 ia32_xss; @@ -1072,6 +1073,8 @@ struct kvm_vcpu_stat { u64 utime; u64 stime; u64 gtime; + u64 preemption_reported; + u64 preemption_other; u64 preemption_timer_exits; };
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 2124fe54abfb..5231f40e8312 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3992,6 +3992,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu) { + if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR) + vcpu->arch.at_instruction_boundary = true; }
static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 79889d27aa5b..96693ac066c2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6532,6 +6532,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) return;
handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); + vcpu->arch.at_instruction_boundary = true; }
static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 24c198e70f8e..dac43d92eb81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -232,6 +232,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { VCPU_STAT("l1d_flush", l1d_flush), VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns), VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns), + VCPU_STAT("preemption_reported", preemption_reported), + VCPU_STAT("preemption_other", preemption_other), VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped), VM_STAT("mmu_pte_write", mmu_pte_write), VM_STAT("mmu_pde_zapped", mmu_pde_zapped), @@ -286,6 +288,8 @@ struct dfx_kvm_stats_debugfs_item dfx_debugfs_entries[] = { DFX_STAT("stime", stime), DFX_STAT("gtime", gtime), DFX_STAT("preemption_timer_exits", preemption_timer_exits), + DFX_STAT("preemption_reported", preemption_reported), + DFX_STAT("preemption_other", preemption_other), { NULL } };
@@ -4078,6 +4082,19 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu) struct kvm_host_map map; struct kvm_steal_time *st;
+ /* + * The vCPU can be marked preempted if and only if the VM-Exit was on + * an instruction boundary and will not trigger guest emulation of any + * kind (see vcpu_run). Vendor specific code controls (conservatively) + * when this is true, for example allowing the vCPU to be marked + * preempted if and only if the VM-Exit was due to a host interrupt. + */ + if (!vcpu->arch.at_instruction_boundary) { + vcpu->stat.preemption_other++; + return; + } + + vcpu->stat.preemption_reported++; if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return;
@@ -9306,6 +9323,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu) vcpu->arch.l1tf_flush_l1d = true;
for (;;) { + /* + * If another guest vCPU requests a PV TLB flush in the middle + * of instruction emulation, the rest of the emulation could + * use a stale page translation. Assume that any code after + * this point can start executing an instruction. + */ + vcpu->arch.at_instruction_boundary = false; if (kvm_vcpu_running(vcpu)) { r = vcpu_enter_guest(vcpu); } else {
From: Chunguang Xu brookxu@tencent.com
mainline inclusion from mainline-v5.14-rc1 commit d80c228d44640f0b47b57a2ca4afa26ef87e16b0 category: bugfix bugzilla: 187475, https://gitee.com/openeuler/kernel/issues/I5ME0J CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
On the IO submission path, blk_account_io_start() may interrupt the system interruption. When the interruption returns, the value of part->stamp may have been updated by other cores, so the time value collected before the interruption may be less than part-> stamp. So when this happens, we should do nothing to make io_ticks more accurate? For kernels less than 5.0, this may cause io_ticks to become smaller, which in turn may cause abnormal ioutil values.
Signed-off-by: Chunguang Xu brookxu@tencent.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.... Signed-off-by: Jens Axboe axboe@kernel.dk
conflict: block/blk-core.c
Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c index 109fb2750453..0b496dabc5ac 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1258,7 +1258,7 @@ void update_io_ticks(struct hd_struct *part, unsigned long now, bool end) unsigned long stamp; again: stamp = READ_ONCE(part->stamp); - if (unlikely(stamp != now)) { + if (unlikely(time_after(now, stamp))) { if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) __part_stat_add(part, io_ticks, end ? now - stamp : 1); }
From: Zheyu Ma zheyuma97@gmail.com
mainline inclusion from mainline-v5.18-rc5 commit 15cf0b82271b1823fb02ab8c377badba614d95d5 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OVRU CVE: CVE-2022-3061
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of 'pixclock', it may cause divide error.
Fix this by checking whether 'pixclock' is zero in the function i740fb_check_var().
The following log reveals it:
divide error: 0000 [#1] PREEMPT SMP KASAN PTI RIP: 0010:i740fb_decode_var drivers/video/fbdev/i740fb.c:444 [inline] RIP: 0010:i740fb_set_par+0x272f/0x3bb0 drivers/video/fbdev/i740fb.c:739 Call Trace: fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1036 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1112 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1191 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline]
Signed-off-by: Zheyu Ma zheyuma97@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Xia Longlong xialonglong1@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/video/fbdev/i740fb.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c index 52cce0db8bd3..b595437a5752 100644 --- a/drivers/video/fbdev/i740fb.c +++ b/drivers/video/fbdev/i740fb.c @@ -657,6 +657,9 @@ static int i740fb_decode_var(const struct fb_var_screeninfo *var,
static int i740fb_check_var(struct fb_var_screeninfo *var, struct fb_info *info) { + if (!var->pixclock) + return -EINVAL; + switch (var->bits_per_pixel) { case 8: var->red.offset = var->green.offset = var->blue.offset = 0;
From: Zhang Zekun zhangzekun11@huawei.com
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5E461 CVE: NA
------------------------ Enable ACPI HMAT and memory hot remove feature on arm64 by default.
For ACPI_HMAT: ACPI HMAT describe the memory attributes, such as bandwidth and latency details, related to the System Physical Address(SPA) Memory Ranges. HMAT is especially useful when software wants to get some information about a certain special memory's memory attributes, such as PMEM and HBM.
For MEMORY_HOT_REMOTE: Add support for memory hot remove feature. Some special memory, such as HBM, can be power consuming, and will only be used in some aimed scenarios. With memory hot remove feature, User can offline the idle memory for energy saving purpose when this special memory is unused.
As PMEM and HBM are getting more popular, ACPI_HMAT and memory hot remove feature should be enabled as default.
The following configs should be opened with CONFIG_ACPI_HMAT by default: 1.CONFIG_EFI_SOFT_RESERVE=y 2.CONFIG_HMEM_REPORTING=y 3.CONFIG_DEV_DAX_HMEM=m 4.CONFIG_DEV_DAX_HMEM_DEVICES=y
Signed-off-by: Zhang Zekun zhangzekun11@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Kai Liu kai.liu@suse.com Reviewed-by: Chao Liu liuchao173@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/configs/openeuler_defconfig | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 78a63cbc3db6..d246fd508ef6 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -709,7 +709,12 @@ CONFIG_ACPI_REDUCED_HARDWARE_ONLY=y CONFIG_ACPI_NFIT=m # CONFIG_NFIT_SECURITY_DEBUG is not set CONFIG_ACPI_NUMA=y -# CONFIG_ACPI_HMAT is not set +CONFIG_ACPI_HMAT=y +CONFIG_EFI_SOFT_RESERVE=y +# CONFIG_ZONE_DEVICE is not set +CONFIG_HMEM_REPORTING=y +CONFIG_DEV_DAX_HMEM=m +CONFIG_DEV_DAX_HMEM_DEVICES=y CONFIG_HAVE_ACPI_APEI=y CONFIG_ACPI_APEI=y CONFIG_ACPI_APEI_GHES=y @@ -1046,7 +1051,7 @@ CONFIG_COHERENT_DEVICE=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG_SPARSE=y CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y -# CONFIG_MEMORY_HOTREMOVE is not set +CONFIG_MEMORY_HOTREMOVE=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_MEMORY_BALLOON=y CONFIG_BALLOON_COMPACTION=y
From: David Leadbeater dgl@dgl.cx
mainline inclusion from mainline-v6.0-rc6 commit e8d5dfd1d8747b56077d02664a8838c71ced948e category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OWZ7 CVE: CVE-2022-2663
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=e8...
---------------------------
CTCP messages should only be at the start of an IRC message, not anywhere within it.
While the helper only decodes packes in the ORIGINAL direction, its possible to make a client send a CTCP message back by empedding one into a PING request. As-is, thats enough to make the helper believe that it saw a CTCP message.
Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port") Signed-off-by: David Leadbeater dgl@dgl.cx Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/netfilter/nf_conntrack_irc.c | 34 ++++++++++++++++++++++++++------ 1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c index e40988a2f22f..c745dd19d42f 100644 --- a/net/netfilter/nf_conntrack_irc.c +++ b/net/netfilter/nf_conntrack_irc.c @@ -148,15 +148,37 @@ static int help(struct sk_buff *skb, unsigned int protoff, data = ib_ptr; data_limit = ib_ptr + skb->len - dataoff;
- /* strlen("\1DCC SENT t AAAAAAAA P\1\n")=24 - * 5+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=14 */ - while (data < data_limit - (19 + MINMATCHLEN)) { - if (memcmp(data, "\1DCC ", 5)) { + /* Skip any whitespace */ + while (data < data_limit - 10) { + if (*data == ' ' || *data == '\r' || *data == '\n') + data++; + else + break; + } + + /* strlen("PRIVMSG x ")=10 */ + if (data < data_limit - 10) { + if (strncasecmp("PRIVMSG ", data, 8)) + goto out; + data += 8; + } + + /* strlen(" :\1DCC SENT t AAAAAAAA P\1\n")=26 + * 7+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=26 + */ + while (data < data_limit - (21 + MINMATCHLEN)) { + /* Find first " :", the start of message */ + if (memcmp(data, " :", 2)) { data++; continue; } + data += 2; + + /* then check that place only for the DCC command */ + if (memcmp(data, "\1DCC ", 5)) + goto out; data += 5; - /* we have at least (19+MINMATCHLEN)-5 bytes valid data left */ + /* we have at least (21+MINMATCHLEN)-(2+5) bytes valid data left */
iph = ip_hdr(skb); pr_debug("DCC found in master %pI4:%u %pI4:%u\n", @@ -172,7 +194,7 @@ static int help(struct sk_buff *skb, unsigned int protoff, pr_debug("DCC %s detected\n", dccprotos[i]);
/* we have at least - * (19+MINMATCHLEN)-5-dccprotos[i].matchlen bytes valid + * (21+MINMATCHLEN)-7-dccprotos[i].matchlen bytes valid * data left (== 14/13 bytes) */ if (parse_dcc(data, data_limit, &dcc_ip, &dcc_port, &addr_beg_p, &addr_end_p)) {
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.140 commit c08a104a8bce832f6e7a4e8d9ac091777b9982ea category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PEDR?from=project-issue CVE: CVE-2022-39190
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit e02f0d3970404bfea385b6edb86f2d936db0ea2b ]
Update nft_data_init() to report EINVAL if chain is already bound.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung exsociety@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org
Conflicts: net/netfilter/nf_tables_api.c
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 6a26a9cfdb58..44b6d7de8100 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8658,6 +8658,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, return PTR_ERR(chain); if (nft_is_base_chain(chain)) return -EOPNOTSUPP; + if (nft_chain_is_bound(chain)) + return -EINVAL;
chain->use++; data->verdict.chain = chain;