mailweb.openeuler.org

Kernel

kernel@openeuler.org

  • 33 participants
  • 18991 discussions
[PATCH openEuler-22.03-LTS-SP1 0/2] fix CVE-2024-53197
by Tengda Wu 30 Dec '24

Fix CVE-2024-53197.

Benoît Sevens (1):
  ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices

Dan Carpenter (1):
  ALSA: usb-audio: Fix a DMA to stack memory bug

 sound/usb/quirks.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

-- 
2.34.1
[PATCH OLK-6.6 0/2] LoongArch: compile loongson drm driver as module
by Hongchen Zhang 30 Dec '24

Hongchen Zhang (2):
  LoongArch: cleanup loongson3 defconfig
  drm/loongson: compile loongson drm driver as module

 arch/loongarch/configs/loongson3_defconfig | 13 ++++++-------
 drivers/gpu/drm/Makefile                   |  2 +-
 2 files changed, 7 insertions(+), 8 deletions(-)

-- 
2.33.0
[PATCH OLK-5.10] io_uring: check if iowq is killed before queuing
by Yifan Qiao 30 Dec '24

From: Pavel Begunkov <asml.silence(a)gmail.com>

stable inclusion
from stable-v6.6.68
commit 2ca94c8de36091067b9ce7527ae8db3812d38781
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEG43
CVE: CVE-2024-56709

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit dbd2ca9367eb19bc5e269b8c58b0b1514ada9156 upstream.

task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.

Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.

Cc: stable(a)vger.kernel.org
Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd")
Fixes: 773af69121ecc ("io_uring: always reissue from task_work context")
Reported-by: Will <willsroot(a)protonmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.17346379…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 io_uring/io_uring.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6e5e00a7692c..3a1eee5bac77 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1096,6 +1096,7 @@ static struct file *io_file_get(struct io_ring_ctx *ctx,
 			     unsigned int issue_flags);
 static void __io_queue_sqe(struct io_kiocb *req);
 static void io_rsrc_put_work(struct work_struct *work);
+static void io_req_task_queue_fail(struct io_kiocb *req, int ret);
 
 static void io_req_task_queue(struct io_kiocb *req);
 static void io_submit_flush_completions(struct io_ring_ctx *ctx);
@@ -1465,7 +1466,11 @@ static void io_queue_async_work(struct io_kiocb *req, bool *locked)
 	locked = NULL;
 
 	BUG_ON(!tctx);
-	BUG_ON(!tctx->io_wq);
+
+	if ((current->flags & PF_KTHREAD) || !tctx->io_wq) {
+		io_req_task_queue_fail(req, -ECANCELED);
+		return;
+	}
 
 	/* init ->work of the whole link before punting */
 	io_prep_async_link(req);
-- 
2.39.2
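The guard this backport adds to io_queue_async_work() (io_queue_iowq() upstream) can be modeled in userspace roughly as below. This is a hypothetical sketch, not kernel code: struct task_ctx, struct io_wq_sim, struct request, request_fail(), queue_async_work() and PF_KTHREAD_SIM are illustrative stand-ins. It shows the shape of the fix: instead of BUG_ON() on a NULL io_wq, the request is completed with -ECANCELED, and kernel threads take the same failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects touched by the patch. */
#define PF_KTHREAD_SIM 0x1u /* models PF_KTHREAD in current->flags */

struct io_wq_sim { int queued; };             /* models the per-task io-wq */
struct task_ctx { struct io_wq_sim *io_wq; }; /* models struct io_uring_task */
struct request { int result; bool completed; };

/* Models io_req_task_queue_fail(): complete the request with an error. */
static void request_fail(struct request *req, int ret)
{
	req->result = ret;
	req->completed = true;
}

/*
 * Models the patched queueing path: refuse to queue when the worker
 * queue has already been torn down (io_wq == NULL) or when running in
 * a kernel thread, instead of crashing on BUG_ON(!tctx->io_wq).
 * Returns true if the request was handed to the worker queue.
 */
static bool queue_async_work(struct task_ctx *tctx, unsigned int cur_flags,
			     struct request *req)
{
	assert(tctx != NULL); /* the patch keeps BUG_ON(!tctx) */

	if ((cur_flags & PF_KTHREAD_SIM) || tctx->io_wq == NULL) {
		request_fail(req, -ECANCELED);
		return false;
	}
	tctx->io_wq->queued++;
	return true;
}
```

In this model a live queue accepts the request, while a NULL io_wq or a kernel-thread caller completes it with -ECANCELED and leaves the queue untouched, which is the behavior the commit message describes.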
[PATCH OLK-5.10] virtiofs: use pages instead of pointer for kernel direct IO
by Yifan Qiao 30 Dec '24

From: Hou Tao <houtao1(a)huawei.com>

mainline inclusion
from mainline-v6.13-rc1
commit 41748675c0bf252b3c5f600a95830f0936d366c1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAGA
CVE: CVE-2024-53219

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ......
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ......
  Call Trace:
   <TASK>
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.

So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.

After switching to pages for KVEC_IO data, these pages will be used for
DMA through virtio-fs. If these pages are backed by vmalloc(),
{flush|invalidate}_kernel_vmap_range() are necessary to flush or
invalidate the cache before the DMA operation. So add two new fields in
fuse_args_pages to record the base address of vmalloc area and the
condition indicating whether invalidation is needed. Perform the flush
in fuse_get_user_pages() for write operations and the invalidation in
fuse_release_user_pages() for read operations.

It may seem necessary to introduce another field in fuse_conn to
indicate that these KVEC_IO pages are used for DMA, However, considering
that virtio-fs is currently the only user of use_pages_for_kvec_io, just
reuse use_pages_for_kvec_io to indicate that these pages will be used
for DMA.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Tested-by: Jingbo Xu <jefflexu(a)linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 fs/fuse/fuse_i.h    |  6 +++++
 fs/fuse/file.c      | 62 +++++++++++++++++++++++++++++++--------------
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 11cc437fe078..749003487776 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -273,9 +273,12 @@ struct fuse_args {
 #ifndef __GENKSYMS__
 	bool user_pages:1;
 #endif
+	bool invalidate_vmap:1;
 	struct fuse_in_arg in_args[3];
 	struct fuse_arg out_args[2];
 	void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error);
+	/* Used for kvec iter backed by vmalloc address */
+	void *vmap_base;
 };
 
 struct fuse_args_pages {
@@ -766,6 +769,9 @@ struct fuse_conn {
 	/* Auto-mount submounts announced by the server */
 	unsigned int auto_submounts:1;
 
+	/* Use pages instead of pointer for kernel I/O */
+	unsigned int use_pages_for_kvec_io:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d3e798b5a560..e0aac6019cdb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -615,7 +615,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
 	args->out_args[0].size = count;
 }
 
-static void fuse_release_user_pages(struct fuse_args_pages *ap,
+static void fuse_release_user_pages(struct fuse_args_pages *ap, ssize_t nres,
 				    bool should_dirty)
 {
 	unsigned int i;
@@ -625,6 +625,9 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
 			set_page_dirty_lock(ap->pages[i]);
 		put_page(ap->pages[i]);
 	}
+
+	if (nres > 0 && ap->args.invalidate_vmap)
+		invalidate_kernel_vmap_range(ap->args.vmap_base, nres);
 }
 
 static void fuse_io_release(struct kref *kref)
@@ -723,25 +726,29 @@ static void fuse_aio_complete_req(struct fuse_mount *fm, struct fuse_args *args,
 	struct fuse_io_args *ia = container_of(args, typeof(*ia), ap.args);
 	struct fuse_io_priv *io = ia->io;
 	ssize_t pos = -1;
-
-	fuse_release_user_pages(&ia->ap, io->should_dirty);
+	size_t nres;
 
 	if (err) {
 		/* Nothing */
 	} else if (io->write) {
 		if (ia->write.out.size > ia->write.in.size) {
 			err = -EIO;
-		} else if (ia->write.in.size != ia->write.out.size) {
-			pos = ia->write.in.offset - io->offset +
-				ia->write.out.size;
+		} else {
+			nres = ia->write.out.size;
+			if (ia->write.in.size != ia->write.out.size)
+				pos = ia->write.in.offset - io->offset +
+				      ia->write.out.size;
 		}
 	} else {
 		u32 outsize = args->out_args[0].size;
 
+		nres = outsize;
 		if (ia->read.in.size != outsize)
 			pos = ia->read.in.offset - io->offset + outsize;
 	}
 
+	fuse_release_user_pages(&ia->ap, err ?: nres, io->should_dirty);
+
 	fuse_aio_complete(io, err, pos);
 	fuse_io_free(ia);
 }
@@ -1394,24 +1401,37 @@ static inline size_t fuse_get_frag_size(const struct iov_iter *ii,
 
 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			       size_t *nbytesp, int write,
-			       unsigned int max_pages)
+			       unsigned int max_pages,
+			       bool use_pages_for_kvec_io)
 {
+	bool flush_or_invalidate = false;
 	size_t nbytes = 0;  /* # bytes already packed in req */
 	ssize_t ret = 0;
 
-	/* Special case for kernel I/O: can copy directly into the buffer */
+	/* Special case for kernel I/O: can copy directly into the buffer.
+	 * However if the implementation of fuse_conn requires pages instead of
+	 * pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+	 */
 	if (iov_iter_is_kvec(ii)) {
-		unsigned long user_addr = fuse_get_user_addr(ii);
-		size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
+		void *user_addr = (void *)fuse_get_user_addr(ii);
 
-		if (write)
-			ap->args.in_args[1].value = (void *) user_addr;
-		else
-			ap->args.out_args[0].value = (void *) user_addr;
+		if (!use_pages_for_kvec_io) {
+			size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
 
-		iov_iter_advance(ii, frag_size);
-		*nbytesp = frag_size;
-		return 0;
+			if (write)
+				ap->args.in_args[1].value = user_addr;
+			else
+				ap->args.out_args[0].value = user_addr;
+
+			iov_iter_advance(ii, frag_size);
+			*nbytesp = frag_size;
+			return 0;
+		}
+
+		if (is_vmalloc_addr(user_addr)) {
+			ap->args.vmap_base = user_addr;
+			flush_or_invalidate = true;
+		}
 	}
 
 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
@@ -1438,6 +1458,10 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
 	}
 
+	if (write && flush_or_invalidate)
+		flush_kernel_vmap_range(ap->args.vmap_base, nbytes);
+
+	ap->args.invalidate_vmap = !write && flush_or_invalidate;
 	ap->args.user_pages = true;
 	if (write)
 		ap->args.in_pages = true;
@@ -1489,7 +1513,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		size_t nbytes = min(count, nmax);
 
 		err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
-					  max_pages);
+					  max_pages, fc->use_pages_for_kvec_io);
 		if (err && !nbytes)
 			break;
 
@@ -1503,7 +1527,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		}
 
 		if (!io->async || nres < 0) {
-			fuse_release_user_pages(&ia->ap, io->should_dirty);
+			fuse_release_user_pages(&ia->ap, nres, io->should_dirty);
 			fuse_io_free(ia);
 		}
 		ia = NULL;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 17d30577340a..4ef123691a83 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1464,6 +1464,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	fc->release = fuse_free_conn;
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
+	fc->use_pages_for_kvec_io = true;
 
 	/* Tell FUSE to split requests that exceed the virtqueue's size */
 	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
-- 
2.39.2
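Step 4 above is the crux: a ~10MB kmalloc() falls through to the page allocator, whose order check can never pass for that size, so the kworker retry in step 5 cannot succeed. The arithmetic can be worked out with the sketch below. It assumes 4 KiB pages and MAX_PAGE_ORDER = 10 (a common x86-64 configuration, not something the patch states); alloc_order() and kmalloc_would_fail() are hypothetical helpers that mirror the rounding done by the kernel's get_order() and the quoted WARN_ON_ONCE_GFP() check, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages and MAX_PAGE_ORDER = 10,
 * i.e. the largest buddy block is 2^10 pages (4 MiB). */
#define SIM_PAGE_SIZE      4096UL
#define SIM_MAX_PAGE_ORDER 10U

/* Smallest order such that (PAGE_SIZE << order) >= size, the same
 * rounding the kernel's get_order() performs for size > 0. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	while ((SIM_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Mirrors "if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) return NULL;":
 * such a request fails on every attempt, regardless of free memory. */
static bool kmalloc_would_fail(size_t size)
{
	return alloc_order(size) > SIM_MAX_PAGE_ORDER;
}
```

Under these assumptions a 10 MiB buffer rounds up to 4096 pages, i.e. order 12, which exceeds 10; anything up to 4 MiB (order 10) still fits. That is why the fix avoids the bounce buffer for kvec I/O entirely instead of retrying the allocation.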
[PATCH OLK-6.6] virtiofs: use pages instead of pointer for kernel direct IO
by Yifan Qiao 30 Dec '24

From: Hou Tao <houtao1(a)huawei.com>

mainline inclusion
from mainline-v6.13-rc1
commit 41748675c0bf252b3c5f600a95830f0936d366c1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAGA
CVE: CVE-2024-53219

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ......
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ......
  Call Trace:
   <TASK>
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.

So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.

After switching to pages for KVEC_IO data, these pages will be used for
DMA through virtio-fs. If these pages are backed by vmalloc(),
{flush|invalidate}_kernel_vmap_range() are necessary to flush or
invalidate the cache before the DMA operation. So add two new fields in
fuse_args_pages to record the base address of vmalloc area and the
condition indicating whether invalidation is needed. Perform the flush
in fuse_get_user_pages() for write operations and the invalidation in
fuse_release_user_pages() for read operations.

It may seem necessary to introduce another field in fuse_conn to
indicate that these KVEC_IO pages are used for DMA, However, considering
that virtio-fs is currently the only user of use_pages_for_kvec_io, just
reuse use_pages_for_kvec_io to indicate that these pages will be used
for DMA.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Tested-by: Jingbo Xu <jefflexu(a)linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 fs/fuse/fuse_i.h    |  6 +++++
 fs/fuse/file.c      | 62 +++++++++++++++++++++++++++++++--------------
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 3d5734ed99cf..1bb136bcbe9e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -286,9 +286,12 @@ struct fuse_args {
 	bool page_replace:1;
 	bool may_block:1;
 	bool is_ext:1;
+	bool invalidate_vmap:1;
 	struct fuse_in_arg in_args[3];
 	struct fuse_arg out_args[2];
 	void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error);
+	/* Used for kvec iter backed by vmalloc address */
+	void *vmap_base;
 };
 
 struct fuse_args_pages {
@@ -823,6 +826,9 @@ struct fuse_conn {
 	/* Is statx not implemented by fs? */
 	unsigned int no_statx:1;
 
+	/* Use pages instead of pointer for kernel I/O */
+	unsigned int use_pages_for_kvec_io:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ceb9f7d23038..fca2be898336 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -625,7 +625,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
 	args->out_args[0].size = count;
 }
 
-static void fuse_release_user_pages(struct fuse_args_pages *ap,
+static void fuse_release_user_pages(struct fuse_args_pages *ap, ssize_t nres,
 				    bool should_dirty)
 {
 	unsigned int i;
@@ -635,6 +635,9 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
 			set_page_dirty_lock(ap->pages[i]);
 		put_page(ap->pages[i]);
 	}
+
+	if (nres > 0 && ap->args.invalidate_vmap)
+		invalidate_kernel_vmap_range(ap->args.vmap_base, nres);
 }
 
 static void fuse_io_release(struct kref *kref)
@@ -733,25 +736,29 @@ static void fuse_aio_complete_req(struct fuse_mount *fm, struct fuse_args *args,
 	struct fuse_io_args *ia = container_of(args, typeof(*ia), ap.args);
 	struct fuse_io_priv *io = ia->io;
 	ssize_t pos = -1;
-
-	fuse_release_user_pages(&ia->ap, io->should_dirty);
+	size_t nres;
 
 	if (err) {
 		/* Nothing */
 	} else if (io->write) {
 		if (ia->write.out.size > ia->write.in.size) {
 			err = -EIO;
-		} else if (ia->write.in.size != ia->write.out.size) {
-			pos = ia->write.in.offset - io->offset +
-				ia->write.out.size;
+		} else {
+			nres = ia->write.out.size;
+			if (ia->write.in.size != ia->write.out.size)
+				pos = ia->write.in.offset - io->offset +
+				      ia->write.out.size;
 		}
 	} else {
 		u32 outsize = args->out_args[0].size;
 
+		nres = outsize;
 		if (ia->read.in.size != outsize)
 			pos = ia->read.in.offset - io->offset + outsize;
 	}
 
+	fuse_release_user_pages(&ia->ap, err ?: nres, io->should_dirty);
+
 	fuse_aio_complete(io, err, pos);
 	fuse_io_free(ia);
 }
@@ -1368,24 +1375,37 @@ static inline size_t fuse_get_frag_size(const struct iov_iter *ii,
 
 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			       size_t *nbytesp, int write,
-			       unsigned int max_pages)
+			       unsigned int max_pages,
+			       bool use_pages_for_kvec_io)
 {
+	bool flush_or_invalidate = false;
 	size_t nbytes = 0;  /* # bytes already packed in req */
 	ssize_t ret = 0;
 
-	/* Special case for kernel I/O: can copy directly into the buffer */
+	/* Special case for kernel I/O: can copy directly into the buffer.
+	 * However if the implementation of fuse_conn requires pages instead of
+	 * pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+	 */
 	if (iov_iter_is_kvec(ii)) {
-		unsigned long user_addr = fuse_get_user_addr(ii);
-		size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
+		void *user_addr = (void *)fuse_get_user_addr(ii);
 
-		if (write)
-			ap->args.in_args[1].value = (void *) user_addr;
-		else
-			ap->args.out_args[0].value = (void *) user_addr;
+		if (!use_pages_for_kvec_io) {
+			size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
 
-		iov_iter_advance(ii, frag_size);
-		*nbytesp = frag_size;
-		return 0;
+			if (write)
+				ap->args.in_args[1].value = user_addr;
+			else
+				ap->args.out_args[0].value = user_addr;
+
+			iov_iter_advance(ii, frag_size);
+			*nbytesp = frag_size;
+			return 0;
+		}
+
+		if (is_vmalloc_addr(user_addr)) {
+			ap->args.vmap_base = user_addr;
+			flush_or_invalidate = true;
+		}
 	}
 
 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
@@ -1411,6 +1431,10 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
 	}
 
+	if (write && flush_or_invalidate)
+		flush_kernel_vmap_range(ap->args.vmap_base, nbytes);
+
+	ap->args.invalidate_vmap = !write && flush_or_invalidate;
 	ap->args.user_pages = true;
 	if (write)
 		ap->args.in_pages = true;
@@ -1478,7 +1502,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		size_t nbytes = min(count, nmax);
 
 		err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
-					  max_pages);
+					  max_pages, fc->use_pages_for_kvec_io);
 		if (err && !nbytes)
 			break;
 
@@ -1492,7 +1516,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		}
 
 		if (!io->async || nres < 0) {
-			fuse_release_user_pages(&ia->ap, io->should_dirty);
+			fuse_release_user_pages(&ia->ap, nres, io->should_dirty);
 			fuse_io_free(ia);
 		}
 		ia = NULL;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index d84dacbdce2c..5779c7ba1e3d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1458,6 +1458,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
 	fc->sync_fs = true;
+	fc->use_pages_for_kvec_io = true;
 
 	/* Tell FUSE to split requests that exceed the virtqueue's size */
 	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
-- 
2.39.2
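The cache-maintenance rule described in this commit (flush a vmalloc-backed buffer before the device reads it on writes; invalidate it after the device has written it on reads, and only for bytes actually transferred) can be modeled as plain bookkeeping. This is a hypothetical userspace sketch: struct kvec_dma_state, prepare_pages(), release_pages() and release_count() are illustrative names, and the flush/invalidate calls are replaced by recording byte counts. release_count() spells out the patch's GNU "err ?: nres" as the portable "err ? err : nres".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical model of the per-request state added by the patch. */
struct kvec_dma_state {
	bool invalidate_vmap;     /* models fuse_args.invalidate_vmap */
	size_t flushed_bytes;     /* bytes a flush_kernel_vmap_range() would cover */
	size_t invalidated_bytes; /* bytes an invalidate_kernel_vmap_range() would cover */
};

/* Models the tail of fuse_get_user_pages(): writes flush the vmalloc
 * alias before DMA; reads only record that an invalidation is due. */
static void prepare_pages(struct kvec_dma_state *st, bool write,
			  bool is_vmalloc, size_t nbytes)
{
	bool flush_or_invalidate = is_vmalloc;

	st->flushed_bytes = 0;
	st->invalidated_bytes = 0;
	if (write && flush_or_invalidate)
		st->flushed_bytes = nbytes;
	st->invalidate_vmap = !write && flush_or_invalidate;
}

/* Models fuse_release_user_pages(): nres is a negative error or the
 * number of bytes actually transferred; only the latter is invalidated. */
static void release_pages(struct kvec_dma_state *st, ssize_t nres)
{
	if (nres > 0 && st->invalidate_vmap)
		st->invalidated_bytes = (size_t)nres;
}

/* Portable spelling of "err ?: nres" from fuse_aio_complete_req(). */
static ssize_t release_count(int err, size_t nres)
{
	return err ? (ssize_t)err : (ssize_t)nres;
}
```

The design point the model makes visible is ordering: the flush can happen at setup time because the data is already in the buffer, but the invalidation has to wait until completion, when the reply size (nres) is known.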
[PATCH OLK-5.10] arm64: stacktrace: Handle 'lr' in interrupt context
by Zheng Yejian 30 Dec '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBEMJK
CVE: NA

--------------------------------

Suppose existing call chain: P() -> A() -> B(), B() don't construct stack
frame in which fp and lr are saved for call stack unwinding, then if task1
is interrupted as running at address 'B2' and then preempted by task2,
and task2 unwind the call stack of task1, it expect to see P->A->B, but
actually P->B, A disappeared!

  A():
    A1: stp fp, lr, ...  <-- suppose fp_P and lr_P saved
    A2: mov fp, sp       <-- suppose fp_A saved in 'fp' register
    A3: bl B             <-- call to B()
    A4: mov ...          <-- 'A4' saved in 'lr' register

  B():
    B1: mov ...
    B2: mov ...          <-- interrupt comes, then run into el1_irq()
    B3: mov ...          <-- 'B3' is saved in 'elr_el1' register

  el1_irq():
    ...                  <-- save registers then construct stack frame
    Cm: bl arm64_preempt_schedule_irq  <-- Can be preempted here
    Cn: ...

In this case, at the time interrupt comes, the address 'A4' will be saved
In 'lr' register, then in interrupt entry, 'lr' register will be saved in
Stack memory as struct pt_regs.

See following stack memory layout, as call stack unwinding, if address
'Cn' is found , we know that fp_C is point to pt_regs.stackframe[0],
then we can found the 'A4' in pt_regs.regs[30], then we can know that
B() is currently called by A().

Stack memory (High address downto Low address):

       <High address>
       |-----------------|
       |      lr_P       |
       |-----------------|
       |      fp_P       |
    -> |-----------------|
    |  |       ...       |
    |  |-----------------|
    |  |       B3        |
    |  |-----------------|
    -- |      fp_A       |
    -> |-----------------|  <-- pt_regs.stackframe[0]
    |  |                 |
    |  | X0... fp lr(A4) |  <-- pt_regs.regs[]
    |  |-----------------|
    |  |       ...       |
    |  |-----------------|
    |  |       Cn        |  <-- 'Cn' is return address of
    |  |-----------------|      arm64_preempt_schedule_irq()
    -- |      fp_C       |
       |-----------------|
       <Low address>

Fixes: e429c61d12bf ("livepatch/arm64: Support livepatch without ftrace")
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
 arch/arm64/include/asm/stacktrace.h |  3 ++
 arch/arm64/kernel/entry.S           |  2 +
 arch/arm64/kernel/stacktrace.c      | 66 +++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..dda36f917263 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -171,4 +171,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	frame->prev_type = STACK_TYPE_UNKNOWN;
 }
 
+#ifdef CONFIG_PREEMPTION
+extern void preempt_schedule_irq_ret_addr(void);
+#endif
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index eb4ba8308397..1290f36c8371 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -523,6 +523,8 @@ alternative_else_nop_endif
 #endif
 	cbnz	x24, 1f				// preempt count != 0 || NMI return path
 	bl	arm64_preempt_schedule_irq	// irq en/disable is done inside
+.global preempt_schedule_irq_ret_addr
+preempt_schedule_irq_ret_addr:
 1:
 #endif
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 2073a3a7fe75..93ac3c74fb41 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -129,6 +129,72 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 
 		if (!fn(data, frame->pc))
 			break;
+#ifdef CONFIG_PREEMPTION
+		/*
+		 * Suppose existing call chain: P() -> A() -> B(), B() don't construct stack
+		 * frame in which fp and lr are saved for call stack unwinding, then if task1
+		 * is interrupted as running at address 'B2' and then preempted by task2,
+		 * and task2 unwind the call stack of task1, it expect to see P->A->B, but
+		 * actually P->B, A disappeared!
+		 *
+		 * A():
+		 *   A1: stp fp, lr, ...  <-- suppose fp_P and lr_P saved
+		 *   A2: mov fp, sp       <-- suppose fp_A saved in 'fp' register
+		 *   A3: bl B             <-- call to B()
+		 *   A4: mov ...          <-- 'A4' saved in 'lr' register
+		 *
+		 * B():
+		 *   B1: mov ...
+		 *   B2: mov ...          <-- interrupt comes, then run into el1_irq()
+		 *   B3: mov ...          <-- 'B3' is saved in 'elr_el1' register
+		 *
+		 * el1_irq():
+		 *   ...                  <-- save registers then construct stack frame
+		 *   Cm: bl arm64_preempt_schedule_irq  <-- Can be preempted here
+		 *   Cn: ...
+		 *
+		 * In this case, at the time interrupt comes, the address 'A4' will be saved
+		 * In 'lr' register, then in interrupt entry, 'lr' register will be saved in
+		 * Stack memory as struct pt_regs.
+		 *
+		 * See following stack memory layout, as call stack unwinding, if address
+		 * 'Cn' is found , we know that fp_C is point to pt_regs.stackframe[0],
+		 * then we can found the 'A4' in pt_regs.regs[30], then we can know that
+		 * B() is currently called by A().
+		 *
+		 * Stack memory (High address downto Low address):
+		 *
+		 *        <High address>
+		 *        |-----------------|
+		 *        |      lr_P       |
+		 *        |-----------------|
+		 *        |      fp_P       |
+		 *     -> |-----------------|
+		 *     |  |       ...       |
+		 *     |  |-----------------|
+		 *     |  |       B3        |
+		 *     |  |-----------------|
+		 *     -- |      fp_A       |
+		 *     -> |-----------------|  <-- pt_regs.stackframe[0]
+		 *     |  |                 |
+		 *     |  | X0... fp lr(A4) |  <-- pt_regs.regs[]
+		 *     |  |-----------------|
+		 *     |  |       ...       |
+		 *     |  |-----------------|
+		 *     |  |       Cn        |  <-- 'Cn' is return address of
+		 *     |  |-----------------|      arm64_preempt_schedule_irq()
+		 *     -- |      fp_C       |
+		 *        |-----------------|
+		 *        <Low address>
+		 */
+		if (frame->pc == (unsigned long)preempt_schedule_irq_ret_addr) {
+			struct pt_regs *reg = container_of((u64 *)frame->fp,
+							   struct pt_regs, stackframe[0]);
+
+			if (!fn(data, reg->regs[30]))
+				break;
+		}
+#endif
 		ret = unwind_frame(tsk, frame);
 		if (ret < 0)
 			break;
-- 
2.25.1
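The unwinder step added in walk_stackframe() boils down to: when the current pc is the label right after the call to arm64_preempt_schedule_irq(), treat fp as a pointer to pt_regs.stackframe[0], step back to the enclosing pt_regs with container_of(), and also report regs[30] (the interrupted lr, 'A4' in the example). The sketch below models only that step in userspace. struct sim_pt_regs is a hypothetical, trimmed stand-in for arm64's struct pt_regs (just the two fields the patch uses), report_frame() is an illustrative name, and the addresses in the usage are made-up sentinels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local container_of(), same idea as the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed model of arm64 struct pt_regs: regs[30] holds
 * the interrupted lr, stackframe[] is the frame record {fp, lr}
 * built on exception entry. */
struct sim_pt_regs {
	uint64_t regs[31];
	uint64_t stackframe[2];
};

/* Models the added step: report pc; if pc equals the return address
 * of arm64_preempt_schedule_irq() (ret_addr_marker), fp points at
 * pt_regs.stackframe[0], so also report the saved lr from regs[30].
 * Returns how many addresses were written to out (1 or 2). */
static int report_frame(uint64_t pc, uint64_t fp, uint64_t ret_addr_marker,
			uint64_t out[2])
{
	int n = 0;

	out[n++] = pc;
	if (pc == ret_addr_marker) {
		struct sim_pt_regs *regs =
			container_of((uint64_t *)(uintptr_t)fp,
				     struct sim_pt_regs, stackframe[0]);
		out[n++] = regs->regs[30];
	}
	return n;
}
```

With a fake frame whose regs[30] is set to a sentinel standing in for 'A4', the marker pc yields two entries (the 'Cn' address and the sentinel), while any other pc yields only itself. That extra entry is how A() reappears in the P->A->B trace.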
[PATCH openEuler-1.0-LTS] printk: Skip log flush in NMI context when logbuf_lock is held
by Xiaomeng Zhang 30 Dec '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB963V
CVE: NA

--------------------------------

In nmi_trigger_cpumask_backtrace(), printk_safe_flush() is called after
sending NMI to flush the logs. When logbuf_lock is already held and the
current CPU is in printk-safe context (e.g., NMI context), attempting to
acquire the lock again can lead to deadlock.

Modify the function to return early when detecting logbuf_lock is held
and current CPU is in printk-safe context. This prevents deadlock
scenarios where CPU0 holds the lock while other CPUs try to acquire it
in NMI context.

Fixes: 099f1c84c005 ("printk: introduce per-cpu safe_print seq buffer")
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13(a)huawei.com>
---
 kernel/printk/printk_safe.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 809f92492ec7..c97845688fe1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -256,6 +256,10 @@ void printk_safe_flush(void)
 {
 	int cpu;
 
+	if (raw_spin_is_locked(&logbuf_lock) &&
+	    (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK))
+		return;
+
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
 		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-- 
2.34.1
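The early-return condition added to printk_safe_flush() can be modeled as below. This is a hypothetical sketch: struct sim_printk_state and sim_printk_safe_flush() are illustrative, the lock and per-CPU context are plain fields instead of raw_spin_is_locked(&logbuf_lock) and this_cpu_read(printk_context), and SIM_PRINTK_SAFE_CONTEXT_MASK is an assumed value used only to exercise the bit test.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mask value; stands in for PRINTK_SAFE_CONTEXT_MASK. */
#define SIM_PRINTK_SAFE_CONTEXT_MASK 0x3fffffffu

struct sim_printk_state {
	bool logbuf_locked;   /* models raw_spin_is_locked(&logbuf_lock) */
	unsigned int context; /* models this CPU's printk_context */
	int flushes;          /* per-CPU buffers flushed so far */
};

/* Models the patched printk_safe_flush(): skip the flush instead of
 * spinning on logbuf_lock from printk-safe (e.g. NMI) context. */
static void sim_printk_safe_flush(struct sim_printk_state *st, int nr_cpus)
{
	if (st->logbuf_locked &&
	    (st->context & SIM_PRINTK_SAFE_CONTEXT_MASK))
		return;

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		st->flushes++; /* stands in for __printk_safe_flush() */
}
```

One property the model makes visible: raw_spin_is_locked() reports whether any CPU holds the lock, not whether the current CPU does, so the check also skips the flush when another CPU is the holder. In this model the flush proceeds whenever the lock is free or the caller is not in printk-safe context.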
[PATCH openEuler-22.03-LTS-SP1] io_uring: check if iowq is killed before queuing
by Yifan Qiao 30 Dec '24
From: Pavel Begunkov <asml.silence(a)gmail.com>

stable inclusion
from stable-v6.6.68
commit 2ca94c8de36091067b9ce7527ae8db3812d38781
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEG43
CVE: CVE-2024-56709

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit dbd2ca9367eb19bc5e269b8c58b0b1514ada9156 upstream.

task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.

Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.

Cc: stable(a)vger.kernel.org
Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd")
Fixes: 773af69121ecc ("io_uring: always reissue from task_work context")
Reported-by: Will <willsroot(a)protonmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.17346379…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 io_uring/io_uring.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4d69fb4cf803..bb37e8f08ae5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1090,6 +1090,7 @@ static struct file *io_file_get(struct io_ring_ctx *ctx,
 				    unsigned int issue_flags);
 static void __io_queue_sqe(struct io_kiocb *req);
 static void io_rsrc_put_work(struct work_struct *work);
+static void io_req_task_queue_fail(struct io_kiocb *req, int ret);
 
 static void io_req_task_queue(struct io_kiocb *req);
 static void io_submit_flush_completions(struct io_ring_ctx *ctx);
@@ -1459,7 +1460,11 @@ static void io_queue_async_work(struct io_kiocb *req, bool *locked)
 	locked = NULL;
 
 	BUG_ON(!tctx);
-	BUG_ON(!tctx->io_wq);
+
+	if ((current->flags & PF_KTHREAD) || !tctx->io_wq) {
+		io_req_task_queue_fail(req, -ECANCELED);
+		return;
+	}
 
 	/* init ->work of the whole link before punting */
 	io_prep_async_link(req);
-- 
2.39.2
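The core of this fix is replacing a BUG_ON() on a torn-down worker pool with a graceful -ECANCELED completion. A minimal userspace sketch of that check-and-fail pattern follows; struct toy_task_ctx, struct toy_wq, struct toy_req and the in_kthread flag are hypothetical stand-ins for io_uring_task, io_wq, io_kiocb and the PF_KTHREAD test, not io_uring APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for io_wq, io_uring_task and io_kiocb. */
struct toy_wq {
	int queued;             /* number of requests handed to workers */
};

struct toy_task_ctx {
	struct toy_wq *io_wq;   /* NULL once the task's io_uring is torn down */
};

struct toy_req {
	bool completed;
	int result;             /* negative errno on failure */
};

static void toy_req_fail(struct toy_req *req, int err)
{
	req->completed = true;
	req->result = err;
}

/* Mirrors the patched logic: instead of crashing on a NULL io_wq, complete
 * the request with -ECANCELED. The kernel-thread case is also refused,
 * which the commit message explains closes a race with task exit. */
static void toy_queue_async_work(struct toy_req *req, struct toy_task_ctx *tctx,
				 bool in_kthread)
{
	if (in_kthread || tctx->io_wq == NULL) {
		toy_req_fail(req, -ECANCELED);
		return;
	}
	tctx->io_wq->queued++;
}
```

Failing the single request keeps the error local to the late task work, whereas the original BUG_ON(!tctx->io_wq) would have taken down the whole kernel for a condition that exit paths can legitimately reach.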
[PATCH OLK-5.10] io_uring: check if iowq is killed before queuing
by Yifan Qiao 30 Dec '24
From: Pavel Begunkov <asml.silence(a)gmail.com>

stable inclusion
from stable-v6.6.68
commit 2ca94c8de36091067b9ce7527ae8db3812d38781
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEG43
CVE: CVE-2024-56709

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit dbd2ca9367eb19bc5e269b8c58b0b1514ada9156 upstream.

task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.

Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.

Cc: stable(a)vger.kernel.org
Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd")
Fixes: 773af69121ecc ("io_uring: always reissue from task_work context")
Reported-by: Will <willsroot(a)protonmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.17346379…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 io_uring/io_uring.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6e5e00a7692c..3a1eee5bac77 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1096,6 +1096,7 @@ static struct file *io_file_get(struct io_ring_ctx *ctx,
 				    unsigned int issue_flags);
 static void __io_queue_sqe(struct io_kiocb *req);
 static void io_rsrc_put_work(struct work_struct *work);
+static void io_req_task_queue_fail(struct io_kiocb *req, int ret);
 
 static void io_req_task_queue(struct io_kiocb *req);
 static void io_submit_flush_completions(struct io_ring_ctx *ctx);
@@ -1465,7 +1466,11 @@ static void io_queue_async_work(struct io_kiocb *req, bool *locked)
 	locked = NULL;
 
 	BUG_ON(!tctx);
-	BUG_ON(!tctx->io_wq);
+
+	if ((current->flags & PF_KTHREAD) || !tctx->io_wq) {
+		io_req_task_queue_fail(req, -ECANCELED);
+		return;
+	}
 
 	/* init ->work of the whole link before punting */
 	io_prep_async_link(req);
-- 
2.39.2