您好!
Kernel SIG 邀请您参加 2022-09-16 14:00 召开的Zoom会议(自动录制)
会议主题:openEuler Kernel SIG例会
会议内容:
1.调度系列技术分享第三期 -- Linux调度器框架
2.openEuler开源之夏特性 -- 轻量级死锁检测实现
3.openEuler-22.09龙芯支持特性简介
会议链接:https://us06web.zoom.us/j/87690719217?pwd=cDN6V3Bhc05DU3lqTmhHYzA2L0RBQT09
会议纪要:https://etherpad.openeuler.org/p/Kernel-meetings
温馨提醒:建议接入会议后修改参会人的姓名,也可以使用您在gitee.com的ID
更多资讯尽在:https://openeuler.org/zh/
Hello!
openEuler Kernel SIG invites you to attend the Zoom conference(auto recording) will be held at 2022-09-16 14:00,
The subject of the conference is openEuler Kernel SIG例会,
Summary:
1.调度系列技术分享第三期 -- Linux调度器框架
2.openEuler开源之夏特性 -- 轻量级死锁检测实现
3.openEuler-22.09龙芯支持特性简介
You can join the meeting at https://us06web.zoom.us/j/87690719217?pwd=cDN6V3Bhc05DU3lqTmhHYzA2L0RBQT09.
Add topics at https://etherpad.openeuler.org/p/Kernel-meetings.
Note: You are advised to change the participant name after joining the conference or use your ID at gitee.com.
More information: https://openeuler.org/en/
From: Hyunwoo Kim <imv4bel(a)gmail.com>
mainline inclusion
from mainline-v5.19-rc4
commit a09d2d00af53b43c6f11e6ab3cb58443c2cac8a7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PRMO
CVE: CVE-2022-39842
---------------------
In pxa3xx_gcu_write, a count parameter of type size_t is passed to words of
type int. Then, copy_from_user() may cause a heap overflow because it is used
as the third argument of copy_from_user().
Signed-off-by: Hyunwoo Kim <imv4bel(a)gmail.com>
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Zhao Wenhui <zhaowenhui8(a)huawei.com>
Reviewed-by: Chen Hui <judy.chenhui(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
drivers/video/fbdev/pxa3xx-gcu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/pxa3xx-gcu.c b/drivers/video/fbdev/pxa3xx-gcu.c
index 69cfb337c857..6fe5573e2008 100644
--- a/drivers/video/fbdev/pxa3xx-gcu.c
+++ b/drivers/video/fbdev/pxa3xx-gcu.c
@@ -394,7 +394,7 @@ pxa3xx_gcu_write(struct file *file, const char *buff,
struct pxa3xx_gcu_batch *buffer;
struct pxa3xx_gcu_priv *priv = to_pxa3xx_gcu_priv(file);
- int words = count / 4;
+ size_t words = count / 4;
/* Does not need to be atomic. There's a lock in user space,
* but anyhow, this is just for statistics. */
--
2.25.1
From: Paolo Bonzini <pbonzini(a)redhat.com>
mainline inclusion
from mainline-v5.19-rc2
commit 6cd88243c7e03845a450795e134b488fc2afb736
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H
CVE: CVE-2022-39189
----------------------------------------
If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access. A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.
To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary. A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.
It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed. However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses. Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
conflict:
arch/x86/include/asm/kvm_host.h
arch/x86/kvm/svm.c
arch/x86/kvm/vmx.c
arch/x86/kvm/x86.c
Signed-off-by: Guo Mengqi <guomengqi3(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Reviewed-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm.c | 3 +++
arch/x86/kvm/vmx.c | 1 +
arch/x86/kvm/x86.c | 25 +++++++++++++++++++++++++
4 files changed, 32 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6549c20c958c..7afb2cb2d42c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -520,6 +520,7 @@ struct kvm_vcpu_arch {
u64 ia32_misc_enable_msr;
u64 smbase;
u64 smi_count;
+ bool at_instruction_boundary;
bool tpr_access_reporting;
u64 ia32_xss;
u64 microcode_version;
@@ -935,6 +936,8 @@ struct kvm_vcpu_stat {
u64 utime;
u64 stime;
u64 gtime;
+ u64 preemption_reported;
+ u64 preemption_other;
};
struct x86_instruction_info;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 46567343680f..f611ecdfc145 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6205,6 +6205,9 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
{
+ if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR)
+ vcpu->arch.at_instruction_boundary = true;
+
local_irq_enable();
/*
* We must have an instruction with interrupts enabled, so
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7ab18b5a8e96..4164016cf105 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10597,6 +10597,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
[cs]"i"(__KERNEL_CS)
);
}
+ vcpu->arch.at_instruction_boundary = true;
}
STACK_FRAME_NON_STANDARD(vmx_handle_external_intr);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5da2811219ec..c61cab768d2a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -186,6 +186,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "halt_successful_poll", VCPU_STAT(halt_successful_poll) },
{ "halt_attempted_poll", VCPU_STAT(halt_attempted_poll) },
{ "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) },
+ { "preemption_reported", VCPU_STAT(preemption_reported) },
+ { "preemption_other", VCPU_STAT(preemption_other) },
{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
{ "hypercalls", VCPU_STAT(hypercalls) },
{ "request_irq", VCPU_STAT(request_irq_exits) },
@@ -231,6 +233,8 @@ struct dfx_kvm_stats_debugfs_item dfx_debugfs_entries[] = {
DFX_STAT("halt_exits", halt_exits),
DFX_STAT("halt_successful_poll", halt_successful_poll),
DFX_STAT("halt_attempted_poll", halt_attempted_poll),
+ DFX_STAT("preemption_reported", preemption_reported),
+ DFX_STAT("preemption_other", preemption_other),
DFX_STAT("halt_wakeup", halt_wakeup),
DFX_STAT("request_irq", request_irq_exits),
DFX_STAT("irq_exits", irq_exits),
@@ -3314,6 +3318,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
+ /*
+ * The vCPU can be marked preempted if and only if the VM-Exit was on
+ * an instruction boundary and will not trigger guest emulation of any
+ * kind (see vcpu_run). Vendor specific code controls (conservatively)
+ * when this is true, for example allowing the vCPU to be marked
+ * preempted if and only if the VM-Exit was due to a host interrupt.
+ */
+ if (!vcpu->arch.at_instruction_boundary) {
+ vcpu->stat.preemption_other++;
+ return;
+ }
+
+ vcpu->stat.preemption_reported++;
+
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -7920,6 +7938,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.l1tf_flush_l1d = true;
for (;;) {
+ /*
+ * If another guest vCPU requests a PV TLB flush in the middle
+ * of instruction emulation, the rest of the emulation could
+ * use a stale page translation. Assume that any code after
+ * this point can start executing an instruction.
+ */
+ vcpu->arch.at_instruction_boundary = false;
if (kvm_vcpu_running(vcpu)) {
r = vcpu_enter_guest(vcpu);
} else {
--
2.25.1
From: Andrew Murray <andrew.murray(a)arm.com>
mainline inclusion
from mainline-v5.6
commit 4942dc6638b07b5326b6d2faa142635c559e7cd5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QR6C
CVE: NA
-------------
On VHE systems arch.mdcr_el2 is written to mdcr_el2 at vcpu_load time to
set options for self-hosted debug and the performance monitors
extension.
Unfortunately the value of arch.mdcr_el2 is not calculated until
kvm_arm_setup_debug() in the run loop after the vcpu has been loaded.
This means that the initial brief iterations of the run loop use a zero
value of mdcr_el2 - until the vcpu is preempted. This also results in a
delay between changes to vcpu->guest_debug taking effect.
Fix this by writing to mdcr_el2 in kvm_arm_setup_debug() on VHE systems
when a change to arch.mdcr_el2 has been detected.
Fixes: d5a21bcc2995 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions")
Cc: <stable(a)vger.kernel.org> # 4.17.x-
Suggested-by: James Morse <james.morse(a)arm.com>
Acked-by: Will Deacon <will(a)kernel.org>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Andrew Murray <andrew.murray(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Zengruan Ye <yezengruan(a)huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1(a)huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin(a)huawei.com>
---
arch/arm64/kvm/debug.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 00d422336a45..4e722d73a3c3 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -112,7 +112,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
{
bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
- unsigned long mdscr;
+ unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
@@ -208,6 +208,10 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
+ /* Write mdcr_el2 changes since vcpu_load on VHE systems */
+ if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
+ write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+
trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
}
--
2.25.1
From: David Leadbeater <dgl(a)dgl.cx>
mainline inclusion
from mainline-v6.0-rc6
commit e8d5dfd1d8747b56077d02664a8838c71ced948e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OWZ7
CVE: CVE-2022-2663
---------------------------
CTCP messages should only be at the start of an IRC message, not
anywhere within it.
While the helper only decodes packes in the ORIGINAL direction, its
possible to make a client send a CTCP message back by empedding one into
a PING request. As-is, thats enough to make the helper believe that it
saw a CTCP message.
Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Signed-off-by: David Leadbeater <dgl(a)dgl.cx>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Liu Jian <liujian56(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
net/netfilter/nf_conntrack_irc.c | 34 ++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c
index 4099f4d79bae..819fc0df233f 100644
--- a/net/netfilter/nf_conntrack_irc.c
+++ b/net/netfilter/nf_conntrack_irc.c
@@ -150,15 +150,37 @@ static int help(struct sk_buff *skb, unsigned int protoff,
data = ib_ptr;
data_limit = ib_ptr + skb->len - dataoff;
- /* strlen("\1DCC SENT t AAAAAAAA P\1\n")=24
- * 5+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=14 */
- while (data < data_limit - (19 + MINMATCHLEN)) {
- if (memcmp(data, "\1DCC ", 5)) {
+ /* Skip any whitespace */
+ while (data < data_limit - 10) {
+ if (*data == ' ' || *data == '\r' || *data == '\n')
+ data++;
+ else
+ break;
+ }
+
+ /* strlen("PRIVMSG x ")=10 */
+ if (data < data_limit - 10) {
+ if (strncasecmp("PRIVMSG ", data, 8))
+ goto out;
+ data += 8;
+ }
+
+ /* strlen(" :\1DCC SENT t AAAAAAAA P\1\n")=26
+ * 7+MINMATCHLEN+strlen("t AAAAAAAA P\1\n")=26
+ */
+ while (data < data_limit - (21 + MINMATCHLEN)) {
+ /* Find first " :", the start of message */
+ if (memcmp(data, " :", 2)) {
data++;
continue;
}
+ data += 2;
+
+ /* then check that place only for the DCC command */
+ if (memcmp(data, "\1DCC ", 5))
+ goto out;
data += 5;
- /* we have at least (19+MINMATCHLEN)-5 bytes valid data left */
+ /* we have at least (21+MINMATCHLEN)-(2+5) bytes valid data left */
iph = ip_hdr(skb);
pr_debug("DCC found in master %pI4:%u %pI4:%u\n",
@@ -174,7 +196,7 @@ static int help(struct sk_buff *skb, unsigned int protoff,
pr_debug("DCC %s detected\n", dccprotos[i]);
/* we have at least
- * (19+MINMATCHLEN)-5-dccprotos[i].matchlen bytes valid
+ * (21+MINMATCHLEN)-7-dccprotos[i].matchlen bytes valid
* data left (== 14/13 bytes) */
if (parse_dcc(data, data_limit, &dcc_ip,
&dcc_port, &addr_beg_p, &addr_end_p)) {
--
2.25.1
From: Li Zhijian <lizhijian(a)cn.fujitsu.com>
stable inclusion
from stable-v4.19.207
commit 4c48bd3aa1a33cb3e8a0eca8330b8f100f43d895
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QR5E
CVE: NA
--------------------------------
commit 2d82d73da35b72b53fe0d96350a2b8d929d07e42
0Day robot observed that it's easily timeout on a heavy load host.
-------------------
# selftests: bpf: test_maps
# Fork 1024 tasks to 'test_update_delete'
# Fork 1024 tasks to 'test_update_delete'
# Fork 100 tasks to 'test_hashmap'
# Fork 100 tasks to 'test_hashmap_percpu'
# Fork 100 tasks to 'test_hashmap_sizes'
# Fork 100 tasks to 'test_hashmap_walk'
# Fork 100 tasks to 'test_arraymap'
# Fork 100 tasks to 'test_arraymap_percpu'
# Failed sockmap unexpected timeout
not ok 3 selftests: bpf: test_maps # exit=1
# selftests: bpf: test_lru_map
# nr_cpus:8
-------------------
Since this test will be scheduled by 0Day to a random host that could have
only a few cpus(2-8), enlarge the timeout to avoid a false NG report.
In practice, i tried to pin it to only one cpu by 'taskset 0x01 ./test_maps',
and knew 10S is likely enough, but i still perfer to a larger value 30.
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com
Signed-off-by: tangbin <tangbin(a)cmss.chinamobile.com>
---
tools/testing/selftests/bpf/test_maps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 9b552c0fc47d..e5e62cb57ff7 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -796,7 +796,7 @@ static void test_sockmap(int tasks, void *data)
FD_ZERO(&w);
FD_SET(sfd[3], &w);
- to.tv_sec = 1;
+ to.tv_sec = 30;
to.tv_usec = 0;
s = select(sfd[3] + 1, &w, NULL, NULL, &to);
if (s == -1) {
--
2.18.4
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
stable inclusion
from stable-v5.10.137
commit 1a4b18b1ff11ba26f9a852019d674fde9d1d1cff
category: bugfix
bugzilla: 187457, https://gitee.com/src-openeuler/kernel/issues/I5MEZD
CVE: CVE-2022-2586
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
commit 470ee20e069a6d05ae549f7d0ef2bdbcee6a81b2 upstream.
When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.
Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.
When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.
This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.
Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lu Wei <luwei32(a)huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Yue Haibing <yuehaibing(a)huawei.com>
---
net/netfilter/nf_tables_api.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 560a93aad5b6..7b0497da2d94 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3638,6 +3638,7 @@ static struct nft_set *nft_set_lookup_byhandle(const struct nft_table *table,
}
static struct nft_set *nft_set_lookup_byid(const struct net *net,
+ const struct nft_table *table,
const struct nlattr *nla, u8 genmask)
{
struct nft_trans *trans;
@@ -3648,6 +3649,7 @@ static struct nft_set *nft_set_lookup_byid(const struct net *net,
struct nft_set *set = nft_trans_set(trans);
if (id == nft_trans_set_id(trans) &&
+ set->table == table &&
nft_active_genmask(set, genmask))
return set;
}
@@ -3668,7 +3670,7 @@ struct nft_set *nft_set_lookup_global(const struct net *net,
if (!nla_set_id)
return set;
- set = nft_set_lookup_byid(net, nla_set_id, genmask);
+ set = nft_set_lookup_byid(net, table, nla_set_id, genmask);
}
return set;
}
--
2.20.1