Kernel

kernel@openeuler.org

  • 2 participants
  • 23341 discussions
[PATCH OLK-6.6 0/1] cpufreq: conservative: Reset requested_freq on limits change
by Lifeng Zheng 29 Apr '26

From: Hongye Lin <linhongye(a)h-partners.com>

mainline inclusion
from mainline-v7.0-rc6
commit 6a28fb8cb28b9eb39a392e531d938a889eacafc5
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/8993
CVE: NA

Reference: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm…

----------------------------------------------------------------------

A recently reported issue highlighted that the cached requested_freq
is not guaranteed to stay in sync with policy->cur. If the platform
changes the actual CPU frequency after the governor sets one (e.g.
due to platform-specific frequency scaling) and a re-sync occurs
later, policy->cur may diverge from requested_freq.

This can lead to incorrect behavior in the conservative governor.
For example, the governor may assume the CPU is already running at
the maximum frequency and skip further increases even though there
is still headroom.

Avoid this by resetting the cached requested_freq to policy->cur on
detecting a change in policy limits.

Viresh Kumar (1):
  cpufreq: conservative: Reset requested_freq on limits change

 drivers/cpufreq/cpufreq_conservative.c | 12 ++++++++++++
 drivers/cpufreq/cpufreq_governor.c     |  3 +++
 drivers/cpufreq/cpufreq_governor.h     |  1 +
 3 files changed, 16 insertions(+)

--
2.33.0
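
The failure mode described in this cover letter can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the struct and function names (policy_model, governor_model, step_up, limits_changed) are hypothetical and are not the kernel's data structures or the code of the patch. It shows how a stale cached request makes the "already at maximum" early exit fire, and how resyncing the cache to the current frequency restores the headroom.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's cpufreq structures. */
struct policy_model {
	unsigned int min;	/* lowest allowed frequency, kHz */
	unsigned int max;	/* highest allowed frequency, kHz */
	unsigned int cur;	/* frequency the CPU is actually running at */
};

struct governor_model {
	unsigned int requested_freq;	/* cached value of the last request */
};

/* One "load is high" step; returns true if an increase was issued. */
static bool step_up(struct governor_model *g, struct policy_model *p,
		    unsigned int freq_step)
{
	assert(p->min <= p->max);

	/* Early exit that misfires when the cached value is stale. */
	if (g->requested_freq == p->max)
		return false;

	g->requested_freq += freq_step;
	if (g->requested_freq > p->max)
		g->requested_freq = p->max;
	p->cur = g->requested_freq;
	return true;
}

/* Idea of the fix: on a limits change, resync the cache to reality. */
static void limits_changed(struct governor_model *g,
			   const struct policy_model *p)
{
	g->requested_freq = p->cur;
}
```

In this model, if the platform lowers cur while requested_freq still equals max, step_up() does nothing until limits_changed() is called; after the resync the next step_up() raises the frequency again.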
[PATCH OLK-5.10] NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd
by Li Lingfeng 29 Apr '26

From: Chuck Lever <chuck.lever(a)oracle.com>

stable inclusion
from stable-v5.10.253
commit 76740c28050dc6db2f5550f1325b00a11bbb3255
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14129
CVE: CVE-2026-31403

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit e7fcf179b82d3a3730fd8615da01b087cc654d0b ]

The /proc/fs/nfs/exports proc entry is created at module init
and persists for the module's lifetime. exports_proc_open()
captures the caller's current network namespace and stores
its svc_export_cache in seq->private, but takes no reference
on the namespace. If the namespace is subsequently torn down
(e.g. container destruction after the opener does setns() to a
different namespace), nfsd_net_exit() calls nfsd_export_shutdown()
which frees the cache. Subsequent reads on the still-open fd
dereference the freed cache_detail, walking a freed hash table.

Hold a reference on the struct net for the lifetime of the open
file descriptor. This prevents nfsd_net_exit() from running --
and thus prevents nfsd_export_shutdown() from freeing the cache
-- while any exports fd is open. cache_detail already stores
its net pointer (cd->net, set by cache_create_net()), so
exports_release() can retrieve it without additional per-file
storage.

Reported-by: Misbah Anjum N <misanjum(a)linux.ibm.com>
Closes: https://lore.kernel.org/linux-nfs/dcd371d3a95815a84ba7de52cef447b8@linux.ib…
Fixes: 96d851c4d28d ("nfsd: use proper net while reading "exports" file")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Reviewed-by: NeilBrown <neil(a)brown.name>
Tested-by: Olga Kornievskaia <okorniev(a)redhat.com>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	fs/nfsd/nfsctl.c
[Commit 97a32539b956 ("proc: convert everything to "struct proc_ops"")
converted exports_proc_operations to exports_proc_ops.]
Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com>
---
 fs/nfsd/nfsctl.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e84c8cf8e2c0..2bfd200bcd05 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -149,9 +149,19 @@ static int exports_net_open(struct net *net, struct file *file)

 	seq = file->private_data;
 	seq->private = nn->svc_export_cache;
+	get_net(net);
 	return 0;
 }

+static int exports_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct cache_detail *cd = seq->private;
+
+	put_net(cd->net);
+	return seq_release(inode, file);
+}
+
 static int exports_proc_open(struct inode *inode, struct file *file)
 {
 	return exports_net_open(current->nsproxy->net_ns, file);
@@ -161,7 +171,7 @@ static const struct proc_ops exports_proc_ops = {
 	.proc_open	= exports_proc_open,
 	.proc_read	= seq_read,
 	.proc_lseek	= seq_lseek,
-	.proc_release	= seq_release,
+	.proc_release	= exports_release,
 };

 static int exports_nfsd_open(struct inode *inode, struct file *file)
@@ -173,7 +183,7 @@ static const struct file_operations exports_nfsd_operations = {
 	.open		= exports_nfsd_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release,
+	.release	= exports_release,
 };

 static int export_features_show(struct seq_file *m, void *v)
--
2.52.0
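
The lifetime rule this patch enforces (take a reference at open, drop it at release) can be sketched as a small userspace model. Everything here is an assumption made for illustration: ns_model, open_file_model and the *_model functions are hypothetical names, and the "teardown" is reduced to clearing a flag when the last reference goes away. It is not the nfsd code, only the reference-counting pattern the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace and its export cache. */
struct ns_model {
	int refs;		/* outstanding references */
	bool cache_valid;	/* false once teardown has freed the cache */
};

/* Hypothetical stand-in for an open exports file descriptor. */
struct open_file_model {
	struct ns_model *ns;
};

static void ns_get(struct ns_model *ns)
{
	ns->refs++;
}

static void ns_put(struct ns_model *ns)
{
	assert(ns->refs > 0);
	if (--ns->refs == 0)
		ns->cache_valid = false;	/* last reference gone: teardown */
}

/* Open pins the namespace for as long as the descriptor exists. */
static void exports_open_model(struct open_file_model *f, struct ns_model *ns)
{
	f->ns = ns;
	ns_get(ns);
}

/* A read is only safe while the cache has not been torn down. */
static bool exports_read_model(const struct open_file_model *f)
{
	return f->ns->cache_valid;
}

/* Release drops the reference taken at open. */
static void exports_release_model(struct open_file_model *f)
{
	ns_put(f->ns);
	f->ns = NULL;
}
```

In this model, dropping the container's own reference while a descriptor is still open leaves the cache valid; the cache is only torn down once exports_release_model() has run.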
[PATCH OLK-6.6] KVM: x86: Use scratch field in MMIO fragment to hold small write values
by Xinyu Zheng 29 Apr '26

From: Sean Christopherson <seanjc(a)google.com>

mainline inclusion
from mainline-v7.1-rc1
commit 0b16e69d17d8c35c5c9d5918bf596c75a44655d3
category: bugfix
bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14416
CVE: CVE-2026-31588

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

When exiting to userspace to service an emulated MMIO write, copy the
to-be-written value to a scratch field in the MMIO fragment if the size
of the data payload is 8 bytes or less, i.e. can fit in a single chunk,
instead of pointing the fragment directly at the source value.

This fixes a class of use-after-free bugs that occur when the emulator
initiates a write using an on-stack, local variable as the source, the
write splits a page boundary, *and* both pages are MMIO pages. Because
KVM's ABI only allows for physically contiguous MMIO requests, accesses
that split MMIO pages are separated into two fragments, and are sent to
userspace one at a time. When KVM attempts to complete userspace MMIO in
response to KVM_RUN after the first fragment, KVM will detect the second
fragment and generate a second userspace exit, and reference the on-stack
variable.

The issue is most visible if the second KVM_RUN is performed by a separate
task, in which case the stack of the initiating task can show up as truly
freed data.

  ==================================================================
  BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
  Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984

  CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
  Call Trace:
  dump_stack+0xbe/0xfd
  print_address_description.constprop.0+0x19/0x170
  __kasan_report.cold+0x6c/0x84
  kasan_report+0x3a/0x50
  check_memory_region+0xfd/0x1f0
  memcpy+0x20/0x60
  complete_emulated_mmio+0x305/0x420
  kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
  kvm_vcpu_ioctl+0x413/0xb20
  __se_sys_ioctl+0x111/0x160
  do_syscall_64+0x30/0x40
  entry_SYSCALL_64_after_hwframe+0x67/0xd1
  RIP: 0033:0x42477d
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d
  RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
  RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c
  R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720

  The buggy address belongs to the page:
  page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37
  flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
  raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                 ^
   ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ==================================================================

The bug can also be reproduced with a targeted KVM-Unit-Test by hacking
KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by
overwriting the data value with garbage.

Limit the use of the scratch fields to 8-byte or smaller accesses, and to
just writes, as larger accesses and reads are not affected thanks to
implementation details in the emulator, but add a sanity check to ensure
those details don't change in the future. Specifically, KVM never uses
on-stack variables for accesses larger than 8 bytes, e.g. uses an operand
in the emulator context, and *all* reads are buffered through the mem_read
cache.

Note! Using the scratch field for reads is not only unnecessary, it's
also extremely difficult to handle correctly. As above, KVM buffers all
reads through the mem_read cache, and heavily relies on that behavior when
re-emulating the instruction after a userspace MMIO read exit. If a read
splits a page, the first page is NOT an MMIO page, and the second page IS
an MMIO page, then the MMIO fragment needs to point at _just_ the second
chunk of the destination, i.e. its position in the mem_read cache. Taking
the "obvious" approach of copying the fragment value into the destination
when re-emulating the instruction would clobber the first chunk of the
destination, i.e. would clobber the data that was read from guest memory.

Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO")
Suggested-by: Yashu Zhang <zhangjiaji1(a)huawei.com>
Reported-by: Yashu Zhang <zhangjiaji1(a)huawei.com>
Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com
Cc: stable(a)vger.kernel.org
Tested-by: Tom Lendacky <thomas.lendacky(a)gmail.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Xinyu Zheng <zhengxinyu6(a)huawei.com>
---
 arch/x86/kvm/x86.c       | 14 +++++++++++++-
 include/linux/kvm_host.h |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 947dae594ddb..18b733c8bb5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7814,7 +7814,13 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS);
 	frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
 	frag->gpa = gpa;
-	frag->data = val;
+	if (write && bytes <= 8u) {
+		frag->val = 0;
+		frag->data = &frag->val;
+		memcpy(&frag->val, val, bytes);
+	} else {
+		frag->data = val;
+	}
 	frag->len = bytes;
 	return X86EMUL_CONTINUE;
 }
@@ -7829,6 +7835,9 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt,
 	gpa_t gpa;
 	int rc;

+	if (WARN_ON_ONCE((bytes > 8u || !ops->write) && object_is_on_stack(val)))
+		return X86EMUL_UNHANDLEABLE;
+
 	if (ops->read_write_prepare &&
 		  ops->read_write_prepare(vcpu, val, bytes))
 		return X86EMUL_CONTINUE;
@@ -11272,6 +11281,9 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 		frag++;
 		vcpu->mmio_cur_fragment++;
 	} else {
+		if (WARN_ON_ONCE(frag->data == &frag->val))
+			return -EIO;
+
 		/* Go forward to the next mmio piece. */
 		frag->data += len;
 		frag->gpa += len;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 480196639480..ae7053b044f7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -317,7 +317,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
 struct kvm_mmio_fragment {
 	gpa_t gpa;
 	void *data;
-	unsigned len;
+	u64 val;
+	unsigned int len;
 };

 struct kvm_vcpu {
--
2.34.1
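
The core of the change (copy a small write payload into storage owned by the fragment instead of keeping a pointer to the caller's stack) can be tried out in userspace. The following is a minimal sketch, not KVM code: frag_model, frag_init and queue_small_write are hypothetical names, and uint64_t stands in for the kernel's gpa_t and u64 types. Only the branch on "write && bytes <= 8" mirrors the hunk in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct kvm_mmio_fragment. */
struct frag_model {
	uint64_t gpa;
	void *data;		/* where the payload lives */
	uint64_t val;		/* scratch storage owned by the fragment */
	unsigned int len;
};

/*
 * Record one pending access. Small writes are copied into the scratch
 * field so that the fragment no longer depends on the lifetime of src.
 */
static void frag_init(struct frag_model *frag, uint64_t gpa,
		      void *src, unsigned int bytes, bool write)
{
	frag->gpa = gpa;
	if (write && bytes <= 8u) {
		frag->val = 0;
		frag->data = &frag->val;
		memcpy(&frag->val, src, bytes);
	} else {
		/* Larger accesses and reads keep pointing at the caller's buffer. */
		frag->data = src;
	}
	frag->len = bytes;
}

/* A caller whose source value is a local that dies on return. */
static void queue_small_write(struct frag_model *frag)
{
	uint32_t local = 0xdeadbeefu;

	frag_init(frag, 0x1000, &local, sizeof(local), true);
}
```

After queue_small_write() returns, frag.data points at frag.val rather than at the dead local, so a later consumer can still read the four payload bytes from the fragment itself.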
[PATCH OLK-5.10] KVM: x86: Use scratch field in MMIO fragment to hold small write values
by Xinyu Zheng 29 Apr '26

[PATCH OLK-5.10] [Backport] KVM: x86: Use scratch field in MMIO fragment to hold small write values
by Xinyu Zheng 29 Apr '26

[PATCH OLK-6.6] [Backport] KVM: x86: Use scratch field in MMIO fragment to hold small write values
by Xinyu Zheng 29 Apr '26

[PATCH OLK-5.10] NFSD: Defer sub-object cleanup in export put callbacks
by Li Lingfeng 29 Apr '26

From: Chuck Lever <chuck.lever(a)oracle.com> mainline inclusion from mainline-v7.0-rc5 commit 48db892356d6cb80f6942885545de4a6dd8d2a29 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14130 CVE: CVE-2026-31404 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- svc_export_put() calls path_put() and auth_domain_put() immediately when the last reference drops, before the RCU grace period. RCU readers in e_show() and c_show() access both ex_path (via seq_path/d_path) and ex_client->name (via seq_escape) without holding a reference. If cache_clean removes the entry and drops the last reference concurrently, the sub-objects are freed while still in use, producing a NULL pointer dereference in d_path. Commit 2530766492ec ("nfsd: fix UAF when access ex_uuid or ex_stats") moved kfree of ex_uuid and ex_stats into the call_rcu callback, but left path_put() and auth_domain_put() running before the grace period because both may sleep and call_rcu callbacks execute in softirq context. Replace call_rcu/kfree_rcu with queue_rcu_work(), which defers the callback until after the RCU grace period and executes it in process context where sleeping is permitted. This allows path_put() and auth_domain_put() to be moved into the deferred callback alongside the other resource releases. Apply the same fix to expkey_put(), which has the identical pattern with ek_path and ek_client. A dedicated workqueue scopes the shutdown drain to only NFSD export release work items; flushing the shared system_unbound_wq would stall on unrelated work from other subsystems. nfsd_export_shutdown() uses rcu_barrier() followed by flush_workqueue() to ensure all deferred release callbacks complete before the export caches are destroyed. 
Reported-by: Misbah Anjum N <misanjum(a)linux.ibm.com> Closes: https://lore.kernel.org/linux-nfs/dcd371d3a95815a84ba7de52cef447b8@linux.ib… Fixes: c224edca7af0 ("nfsd: no need get cache ref when protected by rcu") Fixes: 1b10f0b603c0 ("SUNRPC: no need get cache ref when protected by rcu") Cc: stable(a)vger.kernel.org Reviewed-by: Jeff Layton <jlayton(a)kernel.org> Reviewed-by: NeilBrown <neil(a)brown.name> Tested-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> Conflicts: fs/nfsd/export.c fs/nfsd/export.h fs/nfsd/nfsctl.c [Commit 20ad856e4732 ("nfsd: report per-export stats") adds EXP_STATS_COUNTERS_NUM in export_stats_reset(); commit 20ad856e4732 ("nfsd: report per-export stats") adds "percpu_counter.h" in fs/nfsd/export.h; commit f7fb730cac9a ("NFSD: fix race between nfsd registration and exports_proc") moves create_proc_exports_entry() down in init_nfsd().] Signed-off-by: Li Lingfeng <lilingfeng3(a)huawei.com> --- fs/nfsd/export.c | 63 +++++++++++++++++++++++++++++++++++++++++------- fs/nfsd/export.h | 7 ++++-- fs/nfsd/nfsctl.c | 8 +++++- 3 files changed, 66 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 0e1caf71296e..5eba1cd9a72e 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -36,19 +36,30 @@ * second map contains a reference to the entry in the first map. 
*/ +static struct workqueue_struct *nfsd_export_wq; + #define EXPKEY_HASHBITS 8 #define EXPKEY_HASHMAX (1 << EXPKEY_HASHBITS) #define EXPKEY_HASHMASK (EXPKEY_HASHMAX -1) -static void expkey_put(struct kref *ref) +static void expkey_release(struct work_struct *work) { - struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref); + struct svc_expkey *key = container_of(to_rcu_work(work), + struct svc_expkey, ek_rwork); if (test_bit(CACHE_VALID, &key->h.flags) && !test_bit(CACHE_NEGATIVE, &key->h.flags)) path_put(&key->ek_path); auth_domain_put(key->ek_client); - kfree_rcu(key, ek_rcu); + kfree(key); +} + +static void expkey_put(struct kref *ref) +{ + struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref); + + INIT_RCU_WORK(&key->ek_rwork, expkey_release); + queue_rcu_work(nfsd_export_wq, &key->ek_rwork); } static int expkey_upcall(struct cache_detail *cd, struct cache_head *h) @@ -331,11 +342,13 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc) fsloc->locations = NULL; } -static void svc_export_release(struct rcu_head *rcu_head) +static void svc_export_release(struct work_struct *work) { - struct svc_export *exp = container_of(rcu_head, struct svc_export, - ex_rcu); + struct svc_export *exp = container_of(to_rcu_work(work), + struct svc_export, ex_rwork); + path_put(&exp->ex_path); + auth_domain_put(exp->ex_client); nfsd4_fslocs_free(&exp->ex_fslocs); kfree(exp->ex_uuid); kfree(exp); @@ -345,9 +358,8 @@ static void svc_export_put(struct kref *ref) { struct svc_export *exp = container_of(ref, struct svc_export, h.ref); - path_put(&exp->ex_path); - auth_domain_put(exp->ex_client); - call_rcu(&exp->ex_rcu, svc_export_release); + INIT_RCU_WORK(&exp->ex_rwork, svc_export_release); + queue_rcu_work(nfsd_export_wq, &exp->ex_rwork); } static int svc_export_upcall(struct cache_detail *cd, struct cache_head *h) @@ -1268,6 +1280,36 @@ const struct seq_operations nfs_exports_op = { .show = e_show, }; +/** + * nfsd_export_wq_init - 
allocate the export release workqueue + * + * Called once at module load. The workqueue runs deferred svc_export and + * svc_expkey release work scheduled by queue_rcu_work() in the cache put + * callbacks. + * + * Return values: + * %0: workqueue allocated + * %-ENOMEM: allocation failed + */ +int nfsd_export_wq_init(void) +{ + nfsd_export_wq = alloc_workqueue("nfsd_export", WQ_UNBOUND, 0); + if (!nfsd_export_wq) + return -ENOMEM; + return 0; +} + +/** + * nfsd_export_wq_shutdown - drain and free the export release workqueue + * + * Called once at module unload. Per-namespace teardown in + * nfsd_export_shutdown() has already drained all deferred work. + */ +void nfsd_export_wq_shutdown(void) +{ + destroy_workqueue(nfsd_export_wq); +} + /* * Initialize the exports module. */ @@ -1329,6 +1371,9 @@ nfsd_export_shutdown(struct net *net) cache_unregister_net(nn->svc_expkey_cache, net); cache_unregister_net(nn->svc_export_cache, net); + /* Drain deferred export and expkey release work. */ + rcu_barrier(); + flush_workqueue(nfsd_export_wq); cache_destroy_net(nn->svc_expkey_cache, net); cache_destroy_net(nn->svc_export_cache, net); svcauth_unix_purge(net); diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index e7daa1f246f0..d87fd4b57ea2 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -6,6 +6,7 @@ #define NFSD_EXPORT_H #include <linux/sunrpc/cache.h> +#include <linux/workqueue.h> #include <uapi/linux/nfsd/export.h> #include <linux/nfs4.h> @@ -61,7 +62,7 @@ struct svc_export { u32 ex_layout_types; struct nfsd4_deviceid_map *ex_devid_map; struct cache_detail *cd; - struct rcu_head ex_rcu; + struct rcu_work ex_rwork; }; /* an "export key" (expkey) maps a filehandlefragement to an @@ -76,7 +77,7 @@ struct svc_expkey { u32 ek_fsid[6]; struct path ek_path; - struct rcu_head ek_rcu; + struct rcu_work ek_rwork; }; #define EX_ISSYNC(exp) (!((exp)->ex_flags & NFSEXP_ASYNC)) @@ -89,6 +90,8 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp); /* * 
Function declarations */ +int nfsd_export_wq_init(void); +void nfsd_export_wq_shutdown(void); int nfsd_export_init(struct net *); void nfsd_export_shutdown(struct net *); void nfsd_export_flush(struct net *); diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index e84c8cf8e2c0..7ede258033a1 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1529,9 +1529,12 @@ static int __init init_nfsd(void) if (retval) goto out_free_stat; nfsd_lockd_init(); /* lockd->nfsd callbacks */ - retval = create_proc_exports_entry(); + retval = nfsd_export_wq_init(); if (retval) goto out_free_lockd; + retval = create_proc_exports_entry(); + if (retval) + goto out_free_export_wq; retval = register_pernet_subsys(&nfsd_net_ops); if (retval < 0) goto out_free_exports; @@ -1549,6 +1552,8 @@ static int __init init_nfsd(void) out_free_exports: remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); +out_free_export_wq: + nfsd_export_wq_shutdown(); out_free_lockd: nfsd_lockd_shutdown(); nfsd_drc_slab_free(); @@ -1566,6 +1571,7 @@ static void __exit exit_nfsd(void) unregister_filesystem(&nfsd_fs_type); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); + nfsd_export_wq_shutdown(); nfsd_drc_slab_free(); remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); -- 2.52.0
[PATCH OLK-6.6] bpf: Fix the use of prog->aux->sleepable
by Luo Gengkun 28 Apr '26

hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- After commit 137a3f0724c8 ("bpf: move sleepable flag from bpf_prog_aux to bpf_prog"), the sleepable variable is no longer usable. However, commit 038a41427a20 ("bpf: Reject sleepable kprobe_multi programs at attach time") uses it accidentally. Fix it by using prog->sleepable. Fixes: 137a3f0724c8 ("bpf: move sleepable flag from bpf_prog_aux to bpf_prog") Signed-off-by: Luo Gengkun <luogengkun2(a)huawei.com> --- kernel/trace/bpf_trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 52cb76dd27c8..768159fad93c 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2900,7 +2900,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr return -EOPNOTSUPP; /* kprobe_multi is not allowed to be sleepable. */ - if (prog->aux->sleepable) + if (prog->sleepable) return -EINVAL; if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI) -- 2.34.1
[PATCH OLK-6.6] bpf: Fix the use of prog->aux->sleepable
by Luo Gengkun 28 Apr '26

hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- After commit 137a3f0724c8 ("bpf: move sleepable flag from bpf_prog_aux to bpf_prog"), the sleepable variable is no longer usable. However, commit 038a41427a20 ("bpf: Reject sleepable kprobe_multi programs at attach time") uses it accidentally. Fix it by using prog->sleepable. Signed-off-by: Luo Gengkun <luogengkun2(a)huawei.com> --- kernel/trace/bpf_trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 52cb76dd27c8..768159fad93c 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2900,7 +2900,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr return -EOPNOTSUPP; /* kprobe_multi is not allowed to be sleepable. */ - if (prog->aux->sleepable) + if (prog->sleepable) return -EINVAL; if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI) -- 2.34.1
[PATCH openEuler-1.0-LTS] KVM: x86: Use scratch field in MMIO fragment to hold small write values
by Xinyu Zheng 28 Apr '26

From: Sean Christopherson <seanjc(a)google.com> mainline inclusion from mainline-v7.1-rc1 commit 0b16e69d17d8c35c5c9d5918bf596c75a44655d3 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14416 CVE: CVE-2026-31588 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- When exiting to userspace to service an emulated MMIO write, copy the to-be-written value to a scratch field in the MMIO fragment if the size of the data payload is 8 bytes or less, i.e. can fit in a single chunk, instead of pointing the fragment directly at the source value. This fixes a class of use-after-free bugs that occur when the emulator initiates a write using an on-stack, local variable as the source, the write splits a page boundary, *and* both pages are MMIO pages. Because KVM's ABI only allows for physically contiguous MMIO requests, accesses that split MMIO pages are separated into two fragments, and are sent to userspace one at a time. When KVM attempts to complete userspace MMIO in response to KVM_RUN after the first fragment, KVM will detect the second fragment and generate a second userspace exit, and reference the on-stack variable. The issue is most visible if the second KVM_RUN is performed by a separate task, in which case the stack of the initiating task can show up as truly freed data. 
================================================================== BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420 Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984 CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 check_memory_region+0xfd/0x1f0 memcpy+0x20/0x60 complete_emulated_mmio+0x305/0x420 kvm_arch_vcpu_ioctl_run+0x63f/0x6d0 kvm_vcpu_ioctl+0x413/0xb20 __se_sys_ioctl+0x111/0x160 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 RIP: 0033:0x42477d Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720 The buggy address belongs to the page: page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37980: ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ================================================================== The bug can also be reproduced with a targeted KVM-Unit-Test by hacking KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by overwriting the data value with garbage. Limit the use of the scratch fields to 8-byte or smaller accesses, and to just writes, as larger accesses and reads are not affected thanks to implementation details in the emulator, but add a sanity check to ensure those details don't change in the future. Specifically, KVM never uses on-stack variables for accesses larger than 8 bytes, e.g. uses an operand in the emulator context, and *all* reads are buffered through the mem_read cache. Note! Using the scratch field for reads is not only unnecessary, it's also extremely difficult to handle correctly. As above, KVM buffers all reads through the mem_read cache, and heavily relies on that behavior when re-emulating the instruction after a userspace MMIO read exit. If a read splits a page, the first page is NOT an MMIO page, and the second page IS an MMIO page, then the MMIO fragment needs to point at _just_ the second chunk of the destination, i.e. its position in the mem_read cache. Taking the "obvious" approach of copying the fragment value into the destination when re-emulating the instruction would clobber the first chunk of the destination, i.e. would clobber the data that was read from guest memory. 
Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO") Suggested-by: Yashu Zhang <zhangjiaji1(a)huawei.com> Reported-by: Yashu Zhang <zhangjiaji1(a)huawei.com> Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com Cc: stable(a)vger.kernel.org Tested-by: Tom Lendacky <thomas.lendacky(a)gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com> Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> Conflicts: include/linux/kvm_host.h [context conflicts] Signed-off-by: Xinyu Zheng <zhengxinyu6(a)huawei.com> --- arch/x86/kvm/x86.c | 14 +++++++++++++- include/linux/kvm_host.h | 3 ++- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83d215d43eb3..c4de9cc7bfce 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5425,7 +5425,13 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS); frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++]; frag->gpa = gpa; - frag->data = val; + if (write && bytes <= 8u) { + frag->val = 0; + frag->data = &frag->val; + memcpy(&frag->val, val, bytes); + } else { + frag->data = val; + } frag->len = bytes; return X86EMUL_CONTINUE; } @@ -5440,6 +5446,9 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt, gpa_t gpa; int rc; + if (WARN_ON_ONCE((bytes > 8u || !ops->write) && object_is_on_stack(val))) + return X86EMUL_UNHANDLEABLE; + if (ops->read_write_prepare && ops->read_write_prepare(vcpu, val, bytes)) return X86EMUL_CONTINUE; @@ -8086,6 +8095,9 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu) frag++; vcpu->mmio_cur_fragment++; } else { + if (WARN_ON_ONCE(frag->data == &frag->val)) + return -EIO; + /* Go forward to the next mmio piece. 
*/ frag->data += len; frag->gpa += len; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f34f0989e453..23731239a228 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -234,7 +234,8 @@ enum { struct kvm_mmio_fragment { gpa_t gpa; void *data; - unsigned len; + u64 val; + unsigned int len; }; struct kvm_vcpu { -- 2.34.1
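For readers who want to see the scratch-field idea outside the kernel, here is a minimal user-space sketch of the hunk in emulator_read_write_onepage(). It is an illustration under stated assumptions, not the kernel code: struct mmio_fragment, fragment_fill(), queue_small_write() and small_write_survives() are hypothetical names invented for this example, and uint64_t stands in for the kernel's gpa_t and u64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for struct kvm_mmio_fragment after the
 * patch; uint64_t replaces the kernel's gpa_t and u64.
 */
struct mmio_fragment {
	uint64_t gpa;
	void *data;
	uint64_t val;		/* scratch copy for small writes */
	unsigned int len;
};

/* Mirrors the patched branch: small writes are copied, the rest aliased. */
void fragment_fill(struct mmio_fragment *frag, uint64_t gpa,
		   void *val, unsigned int bytes, int write)
{
	frag->gpa = gpa;
	if (write && bytes <= 8u) {
		frag->val = 0;
		frag->data = &frag->val;
		memcpy(&frag->val, val, bytes);
	} else {
		frag->data = val;
	}
	frag->len = bytes;
}

/* Queues a 4-byte write whose source is a local that dies on return. */
void queue_small_write(struct mmio_fragment *frag)
{
	uint32_t local = 0xdeadbeefu;

	fragment_fill(frag, 0x1000, &local, sizeof(local), 1);
}

/* Returns 1 if the payload is still readable after the source is gone. */
int small_write_survives(void)
{
	struct mmio_fragment frag;
	uint32_t seen = 0;

	queue_small_write(&frag);
	assert(frag.data == &frag.val);	/* never points at the dead local */
	memcpy(&seen, frag.data, frag.len);
	return seen == 0xdeadbeefu;
}
```

In this model the fragment owns its payload for small writes, so a later consumer (in the kernel, complete_emulated_mmio() on a subsequent KVM_RUN) never dereferences the caller's stack. Reads and accesses larger than 8 bytes keep pointing at the caller's buffer, which matches the patch's decision to leave those paths alone.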