From: Chen Zhongjin chenzhongjin@huawei.com
stable inclusion from stable-v5.10.163 commit 740c537f52c1f54aff9094744483d1515c7c8b7b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6GCCV
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 672e4268b2863d7e4978dfed29552b31c2f9bd4e upstream.
ovl_dentry_revalidate_common() can be called in rcu-walk mode. As document said, "in rcu-walk mode, d_parent and d_inode should not be used without care".
Check inode here to protect access under rcu-walk mode.
Fixes: bccece1ead36 ("ovl: allow remote upper") Reported-and-tested-by: syzbot+a4055c78774bbf3498bb@syzkaller.appspotmail.com Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/overlayfs/super.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 45c596dfe3a3..e3cd5a00f880 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -138,11 +138,16 @@ static int ovl_dentry_revalidate_common(struct dentry *dentry, unsigned int flags, bool weak) { struct ovl_entry *oe = dentry->d_fsdata; + struct inode *inode = d_inode_rcu(dentry); struct dentry *upper; unsigned int i; int ret = 1;
- upper = ovl_dentry_upper(dentry); + /* Careful in RCU mode */ + if (!inode) + return -ECHILD; + + upper = ovl_i_dentry_upper(inode); if (upper) ret = ovl_revalidate_real(upper, flags, weak);
From: Keith Busch kbusch@kernel.org
mainline inclusion from mainline-v6.1-rc1 commit 8237c01f1696bc53c470493bf1fe092a107648a6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6GI6F CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id...
--------------------------------
The hctx's run_work may be racing with the elevator switch when reinitializing hardware queues. The queue is merely frozen in this context, but that only prevents requests from allocating and doesn't stop the hctx work from running. The work may get an elevator pointer that's being torn down, and can result in use-after-free errors and kernel panics (example below). Use the quiesced elevator switch instead, and make the previous one static since it is now only used locally.
nvme nvme0: resetting controller nvme nvme0: 32/0/0 default/read/poll queues BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0 Oops: 0000 [#1] SMP PTI Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:kyber_has_work+0x29/0x70
...
Call Trace: __blk_mq_do_dispatch_sched+0x83/0x2b0 __blk_mq_sched_dispatch_requests+0x12e/0x170 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x2b/0x50 process_one_work+0x1ef/0x380 worker_thread+0x2d/0x3e0
Signed-off-by: Keith Busch kbusch@kernel.org Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com Signed-off-by: Jens Axboe axboe@kernel.dk
Conflicts: block/blk-mq.c block/blk.h Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- block/blk-mq.c | 6 +++--- block/blk.h | 3 +-- block/elevator.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index c02e42071615..663c9f5d6556 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3873,14 +3873,14 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
mutex_lock(&q->sysfs_lock); /* - * After elevator_switch_mq, the previous elevator_queue will be + * After elevator_switch, the previous elevator_queue will be * released by elevator_release. The reference of the io scheduler * module get by elevator_get will also be put. So we need to get * a reference of the io scheduler module here to prevent it to be * removed. */ __module_get(qe->type->elevator_owner); - elevator_switch_mq(q, NULL); + elevator_switch(q, NULL); mutex_unlock(&q->sysfs_lock);
return true; @@ -3905,7 +3905,7 @@ static void blk_mq_elv_switch_back(struct list_head *head, kfree(qe);
mutex_lock(&q->sysfs_lock); - elevator_switch_mq(q, t); + elevator_switch(q, t); mutex_unlock(&q->sysfs_lock); }
diff --git a/block/blk.h b/block/blk.h index 5a9fb24c2ed9..a644910678ac 100644 --- a/block/blk.h +++ b/block/blk.h @@ -201,8 +201,7 @@ void blk_account_io_done(struct request *req, u64 now); void blk_insert_flush(struct request *rq);
void elevator_init_mq(struct request_queue *q); -int elevator_switch_mq(struct request_queue *q, - struct elevator_type *new_e); +int elevator_switch(struct request_queue *q, struct elevator_type *new_e); void __elevator_exit(struct request_queue *, struct elevator_queue *); int elv_register_queue(struct request_queue *q, bool uevent); void elv_unregister_queue(struct request_queue *q); diff --git a/block/elevator.c b/block/elevator.c index b66afb571e15..6f7de2ffad0e 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -577,7 +577,7 @@ void elv_unregister(struct elevator_type *e) } EXPORT_SYMBOL_GPL(elv_unregister);
-int elevator_switch_mq(struct request_queue *q, +static int elevator_switch_mq(struct request_queue *q, struct elevator_type *new_e) { int ret; @@ -716,7 +716,7 @@ void elevator_init_mq(struct request_queue *q) * need for the new one. this way we have a chance of going back to the old * one, if the new one fails init for some reason. */ -static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) +int elevator_switch(struct request_queue *q, struct elevator_type *new_e) { int err;
From: Mukesh Ojha quic_mojha@quicinc.com
mainline inclusion from mainline-v6.3-rc1 commit 8843e06f67b14f71c044bf6267b2387784c7e198 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6HWOV CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
It seems a data race between ring_buffer writing and integrity check. That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG was cleared when doing integrity check rb_check_pages():
rb_check_pages() rb_handle_head_page(): -------- -------- rb_head_page_deactivate() rb_head_page_set_normal() rb_head_page_activate()
We do intergrity test of the list to check if the list is corrupted and it is still worth doing it. So, let's refactor rb_check_pages() such that we no longer clear and set flag during the list sanity checking.
[1] and [2] are the test to reproduce and the crash report respectively.
1: ``` read_trace.sh while true; do # the "trace" file is closed after read head -1 /sys/kernel/tracing/trace > /dev/null done ``` ``` repro.sh sysctl -w kernel.panic_on_warn=1 # function tracer will writing enough data into ring_buffer echo function > /sys/kernel/tracing/current_tracer ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ```
2: ------------[ cut here ]------------ WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653 rb_move_tail+0x450/0x470 Modules linked in: CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G W 6.2.0-rc6+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:rb_move_tail+0x450/0x470 Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24 83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83 f8 02 0f 84 ce fb ff ff e9 db RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203 RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18 RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400 RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2 R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000 R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108 FS: 0000000000000000(0000) GS:ffff9db3bdcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0 Call Trace: <TASK> ring_buffer_lock_reserve+0x136/0x360 ? __do_softirq+0x287/0x2df ? __pfx_rcu_softirq_qs+0x10/0x10 trace_function+0x21/0x110 ? __pfx_rcu_softirq_qs+0x10/0x10 ? __do_softirq+0x287/0x2df function_trace_call+0xf6/0x120 0xffffffffc038f097 ? rcu_softirq_qs+0x5/0x140 rcu_softirq_qs+0x5/0x140 __do_softirq+0x287/0x2df run_ksoftirqd+0x2a/0x30 smpboot_thread_fn+0x188/0x220 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0xe7/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]---
[ crash report and test reproducer credit goes to Zheng Yejian]
Link: https://lore.kernel.org/linux-trace-kernel/1676376403-16462-1-git-send-email...
Cc: mhiramat@kernel.org Cc: stable@vger.kernel.org Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") Reported-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Mukesh Ojha quic_mojha@quicinc.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Reviewed-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- kernel/trace/ring_buffer.c | 42 +++++++++----------------------------- 1 file changed, 10 insertions(+), 32 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 6deac666ba3e..4a9b16dfb5cd 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1376,19 +1376,6 @@ static int rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer, return 0; }
-/** - * rb_check_list - make sure a pointer to a list has the last bits zero - */ -static int rb_check_list(struct ring_buffer_per_cpu *cpu_buffer, - struct list_head *list) -{ - if (RB_WARN_ON(cpu_buffer, rb_list_head(list->prev) != list->prev)) - return 1; - if (RB_WARN_ON(cpu_buffer, rb_list_head(list->next) != list->next)) - return 1; - return 0; -} - /** * rb_check_pages - integrity check of buffer pages * @cpu_buffer: CPU buffer with pages to test @@ -1398,36 +1385,27 @@ static int rb_check_list(struct ring_buffer_per_cpu *cpu_buffer, */ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) { - struct list_head *head = cpu_buffer->pages; - struct buffer_page *bpage, *tmp; + struct list_head *head = rb_list_head(cpu_buffer->pages); + struct list_head *tmp;
- /* Reset the head page if it exists */ - if (cpu_buffer->head_page) - rb_set_head_page(cpu_buffer); - - rb_head_page_deactivate(cpu_buffer); - - if (RB_WARN_ON(cpu_buffer, head->next->prev != head)) - return -1; - if (RB_WARN_ON(cpu_buffer, head->prev->next != head)) + if (RB_WARN_ON(cpu_buffer, + rb_list_head(rb_list_head(head->next)->prev) != head)) return -1;
- if (rb_check_list(cpu_buffer, head)) + if (RB_WARN_ON(cpu_buffer, + rb_list_head(rb_list_head(head->prev)->next) != head)) return -1;
- list_for_each_entry_safe(bpage, tmp, head, list) { + for (tmp = rb_list_head(head->next); tmp != head; tmp = rb_list_head(tmp->next)) { if (RB_WARN_ON(cpu_buffer, - bpage->list.next->prev != &bpage->list)) + rb_list_head(rb_list_head(tmp->next)->prev) != tmp)) return -1; + if (RB_WARN_ON(cpu_buffer, - bpage->list.prev->next != &bpage->list)) - return -1; - if (rb_check_list(cpu_buffer, &bpage->list)) + rb_list_head(rb_list_head(tmp->prev)->next) != tmp)) return -1; }
- rb_head_page_activate(cpu_buffer); - return 0; }
From: Pietro Borrello borrello@diag.uniroma1.it
mainline inclusion from mainline-v6.3-rc1 commit 584f3742890e966d2f0a1f3c418c9ead70b2d99e category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7UC CVE: CVE-2023-1076
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add sock_init_data_uid() to explicitly initialize the socket uid. To initialise the socket uid, sock_init_data() assumes a the struct socket* sock is always embedded in a struct socket_alloc, used to access the corresponding inode uid. This may not be true. Examples are sockets created in tun_chr_open() and tap_open().
Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net
Conflicts: net/core/sock.c
Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/net/sock.h | 7 ++++++- net/core/sock.c | 15 ++++++++++++--- 2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index 74dbbf0e30c0..117f1126e45a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1808,7 +1808,12 @@ void sk_common_release(struct sock *sk); * Default socket callbacks and setup code */
-/* Initialise core socket variables */ +/* Initialise core socket variables using an explicit uid. */ +void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid); + +/* Initialise core socket variables. + * Assumes struct socket *sock is embedded in a struct socket_alloc. + */ void sock_init_data(struct socket *sock, struct sock *sk);
/* diff --git a/net/core/sock.c b/net/core/sock.c index 56a927b9b372..d8d42ff15d20 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2967,7 +2967,7 @@ void sk_stop_timer_sync(struct sock *sk, struct timer_list *timer) } EXPORT_SYMBOL(sk_stop_timer_sync);
-void sock_init_data(struct socket *sock, struct sock *sk) +void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid) { sk_init_common(sk); sk->sk_send_head = NULL; @@ -2986,13 +2986,12 @@ void sock_init_data(struct socket *sock, struct sock *sk) sk->sk_type = sock->type; RCU_INIT_POINTER(sk->sk_wq, &sock->wq); sock->sk = sk; - sk->sk_uid = SOCK_INODE(sock)->i_uid; sk->sk_gid = SOCK_INODE(sock)->i_gid; } else { RCU_INIT_POINTER(sk->sk_wq, NULL); - sk->sk_uid = make_kuid(sock_net(sk)->user_ns, 0); sk->sk_gid = make_kgid(sock_net(sk)->user_ns, 0); } + sk->sk_uid = uid;
rwlock_init(&sk->sk_callback_lock); if (sk->sk_kern_sock) @@ -3050,6 +3049,16 @@ void sock_init_data(struct socket *sock, struct sock *sk) refcount_set(&sk->sk_refcnt, 1); atomic_set(&sk->sk_drops, 0); } +EXPORT_SYMBOL(sock_init_data_uid); + +void sock_init_data(struct socket *sock, struct sock *sk) +{ + kuid_t uid = sock ? + SOCK_INODE(sock)->i_uid : + make_kuid(sock_net(sk)->user_ns, 0); + + sock_init_data_uid(sock, sk, uid); +} EXPORT_SYMBOL(sock_init_data);
void lock_sock_nested(struct sock *sk, int subclass)
From: Pietro Borrello borrello@diag.uniroma1.it
mainline inclusion from mainline-v6.3-rc1 commit a096ccca6e503a5c575717ff8a36ace27510ab0a category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7UC CVE: CVE-2023-1076
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
sock_init_data() assumes that the `struct socket` passed in input is contained in a `struct socket_alloc` allocated with sock_alloc(). However, tun_chr_open() passes a `struct socket` embedded in a `struct tun_file` allocated with sk_alloc(). This causes a type confusion when issuing a container_of() with SOCK_INODE() in sock_init_data() which results in assigning a wrong sk_uid to the `struct sock` in input. On default configuration, the type confused field overlaps with the high 4 bytes of `struct tun_struct __rcu *tun` of `struct tun_file`, NULL at the time of call, which makes the uid of all tun sockets 0, i.e., the root one. Fix the assignment by using sock_init_data_uid().
Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/net/tun.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index f8f15f2f7cc8..e387cd7a7873 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -3446,7 +3446,7 @@ static int tun_chr_open(struct inode *inode, struct file * file) tfile->socket.file = file; tfile->socket.ops = &tun_socket_ops;
- sock_init_data(&tfile->socket, &tfile->sk); + sock_init_data_uid(&tfile->socket, &tfile->sk, inode->i_uid);
tfile->sk.sk_write_space = tun_sock_write_space; tfile->sk.sk_sndbuf = INT_MAX;
From: Pietro Borrello borrello@diag.uniroma1.it
mainline inclusion from mainline-v6.3-rc1 commit 66b2c338adce580dfce2199591e65e2bab889cff category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7UC CVE: CVE-2023-1076
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
sock_init_data() assumes that the `struct socket` passed in input is contained in a `struct socket_alloc` allocated with sock_alloc(). However, tap_open() passes a `struct socket` embedded in a `struct tap_queue` allocated with sk_alloc(). This causes a type confusion when issuing a container_of() with SOCK_INODE() in sock_init_data() which results in assigning a wrong sk_uid to the `struct sock` in input. On default configuration, the type confused field overlaps with padding bytes between `int vnet_hdr_sz` and `struct tap_dev __rcu *tap` in `struct tap_queue`, which makes the uid of all tap sockets 0, i.e., the root one. Fix the assignment by using sock_init_data_uid().
Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/net/tap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 8f7bb15206e9..d9018d9fe310 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -523,7 +523,7 @@ static int tap_open(struct inode *inode, struct file *file) q->sock.state = SS_CONNECTED; q->sock.file = file; q->sock.ops = &tap_socket_ops; - sock_init_data(&q->sock, &q->sk); + sock_init_data_uid(&q->sock, &q->sk, inode->i_uid); q->sk.sk_write_space = tap_sock_write_space; q->sk.sk_destruct = tap_sock_destruct; q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
From: Pietro Borrello borrello@diag.uniroma1.it
stable inclusion from stable-v5.10.168 commit c53f34ec3fbf3e9f67574118a6bb35ae1146f7ca category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7UF CVE: CVE-2023-1078
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
[ Upstream commit f753a68980cf4b59a80fe677619da2b1804f526d ]
rds_rm_zerocopy_callback() uses list_entry() on the head of a list causing a type confusion. Use list_first_entry() to actually access the first element of the rs_zcookie_queue list.
Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniro... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/rds/message.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rds/message.c b/net/rds/message.c index 799034e0f513..b363ef13c75e 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -104,9 +104,9 @@ static void rds_rm_zerocopy_callback(struct rds_sock *rs, spin_lock_irqsave(&q->lock, flags); head = &q->zcookie_head; if (!list_empty(head)) { - info = list_entry(head, struct rds_msg_zcopy_info, - rs_zcookie_next); - if (info && rds_zcookie_add(info, cookie)) { + info = list_first_entry(head, struct rds_msg_zcopy_info, + rs_zcookie_next); + if (rds_zcookie_add(info, cookie)) { spin_unlock_irqrestore(&q->lock, flags); kfree(rds_info_from_znotifier(znotif)); /* caller invokes rds_wake_sk_sleep() */
From: Pavel Begunkov asml.silence@gmail.com
mainline inclusion from mainline-v5.12-rc1 commit 0f1d344feb534555a0dcd0beafb7211a37c5355e category: bugfix bugzilla: 188445,https://gitee.com/openeuler/kernel/issues/I6J5NZ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
iter_file_splice_write() may spawn bvec segments with zero-length. In preparation for prohibiting them, filter out by hand at splice level.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Yang Erkun yangerkun@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/splice.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c index 866d5c2367b2..474fb8b5562a 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -662,12 +662,14 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
/* build the vector */ left = sd.total_len; - for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++, n++) { + for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++) { struct pipe_buffer *buf = &pipe->bufs[tail & mask]; size_t this_len = buf->len;
- if (this_len > left) - this_len = left; + /* zero-length bvecs are not supported, skip them */ + if (!this_len) + continue; + this_len = min(this_len, left);
ret = pipe_buf_confirm(pipe, buf); if (unlikely(ret)) { @@ -680,6 +682,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, array[n].bv_len = this_len; array[n].bv_offset = buf->offset; left -= this_len; + n++; }
iov_iter_bvec(&from, WRITE, array, n, sd.total_len - left);
From: Hawkins Jiawei yin31149@gmail.com
stable inclusion from stable-v5.10.155 commit 6322dda483344abe47d17335809f7bbb730bd88b category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6HWOS CVE: CVE-2023-26607
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=t...
--------------------------------
commit 36a4d82dddbbd421d2b8e79e1cab68c8126d5075 upstream.
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To ensure access on these ATTR_RECORDs are within bounds, kernel will do some checking during iteration.
The problem is that during checking whether ATTR_RECORD's name is within bounds, kernel will dereferences the ATTR_RECORD name_offset field, before checking this ATTR_RECORD strcture is within bounds. This problem may result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller:
================================================================== BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
[...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193 ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845 ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854 mount_bdev+0x34d/0x410 fs/super.c:1400 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK>
The buggy address belongs to the physical page: page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350 head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140 raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
This patch solves it by moving the ATTR_RECORD strcture's bounds checking earlier, then checking whether ATTR_RECORD's name is within bounds. What's more, this patch also add some comments to improve its maintainability.
Link: https://lkml.kernel.org/r/20220831160935.3409-3-yin31149@gmail.com Link: https://lore.kernel.org/all/1636796c-c85e-7f47-e96f-e074fee3c7d3@huawei.com/ Link: https://groups.google.com/g/syzkaller-bugs/c/t_XdeKPGTR4/m/LECAuIGcBgAJ Signed-off-by: chenxiaosong (A) chenxiaosong2@huawei.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Reported-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com Tested-by: syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com Cc: Anton Altaparmakov anton@tuxera.com Cc: syzkaller-bugs syzkaller-bugs@googlegroups.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/ntfs/attrib.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c index 914e99173130..f171563e5106 100644 --- a/fs/ntfs/attrib.c +++ b/fs/ntfs/attrib.c @@ -594,11 +594,23 @@ static int ntfs_attr_find(const ATTR_TYPE type, const ntfschar *name, for (;; a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))) { u8 *mrec_end = (u8 *)ctx->mrec + le32_to_cpu(ctx->mrec->bytes_allocated); - u8 *name_end = (u8 *)a + le16_to_cpu(a->name_offset) + - a->name_length * sizeof(ntfschar); - if ((u8*)a < (u8*)ctx->mrec || (u8*)a > mrec_end || - name_end > mrec_end) + u8 *name_end; + + /* check whether ATTR_RECORD wrap */ + if ((u8 *)a < (u8 *)ctx->mrec) + break; + + /* check whether Attribute Record Header is within bounds */ + if ((u8 *)a > mrec_end || + (u8 *)a + sizeof(ATTR_RECORD) > mrec_end) break; + + /* check whether ATTR_RECORD's name is within bounds */ + name_end = (u8 *)a + le16_to_cpu(a->name_offset) + + a->name_length * sizeof(ntfschar); + if (name_end > mrec_end) + break; + ctx->attr = a; if (unlikely(le32_to_cpu(a->type) > le32_to_cpu(type) || a->type == AT_END))
From: Miaoqian Lin linmq006@gmail.com
mainline inclusion from mainline-v5.17-rc1 commit fa0ef93868a6062babe1144df2807a8b1d4924d2 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IEH4 CVE: CVE-2023-22995
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add the missing platform_device_put() before return from dwc3_qcom_acpi_register_core in the error handling case.
Signed-off-by: Miaoqian Lin linmq006@gmail.com Link: https://lore.kernel.org/r/20211231113641.31474-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 2bc02355f8ba ("usb: dwc3: qcom: Add support for booting with ACPI") [Fix conflict due to lack of 8dc6e6dd1bee39cd65a232a17d51240fc65a0f4a] Conflict: drivers/usb/dwc3/dwc3-qcom.c Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Reviewed-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/usb/dwc3/dwc3-qcom.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c index ca3a35fd8f74..b4ba16037631 100644 --- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -606,8 +606,10 @@ static int dwc3_qcom_acpi_register_core(struct platform_device *pdev) qcom->dwc3->dev.coherent_dma_mask = dev->coherent_dma_mask;
child_res = kcalloc(2, sizeof(*child_res), GFP_KERNEL); - if (!child_res) + if (!child_res) { + platform_device_put(qcom->dwc3); return -ENOMEM; + }
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) { @@ -643,10 +645,15 @@ static int dwc3_qcom_acpi_register_core(struct platform_device *pdev) }
ret = platform_device_add(qcom->dwc3); - if (ret) + if (ret) { dev_err(&pdev->dev, "failed to add device\n"); + goto out; + } + kfree(child_res); + return 0;
out: + platform_device_put(qcom->dwc3); kfree(child_res); return ret; }
From: Pietro Borrello borrello@diag.uniroma1.it
mainline inclusion from mainline-v6.2-rc7 commit ffe2a22562444720b05bdfeb999c03e810d84cbb category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7U2 CVE: CVE-2023-1075
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
tls_is_tx_ready() checks that list_first_entry() does not return NULL. This condition can never happen. For empty lists, list_first_entry() returns the list_entry() of the head, which is a type confusion. Use list_first_entry_or_null() which returns NULL in case of empty lists.
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Link: https://lore.kernel.org/r/20230128-list-entry-null-check-tls-v1-1-525bbfe6f0... Signed-off-by: Jakub Kicinski kuba@kernel.org Conflicts: net/tls/tls_sw.c Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/net/tls.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/tls.h b/include/net/tls.h index 27737c7953f3..c837ef871564 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -442,7 +442,7 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx) { struct tls_rec *rec;
- rec = list_first_entry(&ctx->tx_list, struct tls_rec, list); + rec = list_first_entry_or_null(&ctx->tx_list, struct tls_rec, list); if (!rec) return false;
From: Paolo Abeni pabeni@redhat.com
stable inclusion from stable-v5.10.163 commit f8ed0a93b5d576bbaf01639ad816473bdfd1dcb0 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6H3MB CVE: CVE-2023-0461
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 2c02d41d71f90a5168391b6a5f2954112ba2307c upstream.
When an ULP-enabled socket enters the LISTEN status, the listener ULP data pointer is copied inside the child/accepted sockets by sk_clone_lock().
The relevant ULP can take care of de-duplicating the context pointer via the clone() operation, but only MPTCP and SMC implement such op.
Other ULPs may end-up with a double-free at socket disposal time.
We can't simply clear the ULP data at clone time, as TLS replaces the socket ops with custom ones assuming a valid TLS ULP context is available.
Instead completely prevent clone-less ULP sockets from entering the LISTEN status.
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Reported-by: slipper slipper.alive@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.167274060... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/ipv4/inet_connection_sock.c | 16 +++++++++++++++- net/ipv4/tcp_ulp.c | 4 ++++ 2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 4d9713324003..8411b2407c20 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -912,11 +912,25 @@ void inet_csk_prepare_forced_close(struct sock *sk) } EXPORT_SYMBOL(inet_csk_prepare_forced_close);
+static int inet_ulp_can_listen(const struct sock *sk) +{ + const struct inet_connection_sock *icsk = inet_csk(sk); + + if (icsk->icsk_ulp_ops && !icsk->icsk_ulp_ops->clone) + return -EINVAL; + + return 0; +} + int inet_csk_listen_start(struct sock *sk, int backlog) { struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet = inet_sk(sk); - int err = -EADDRINUSE; + int err; + + err = inet_ulp_can_listen(sk); + if (unlikely(err)) + return err;
reqsk_queue_alloc(&icsk->icsk_accept_queue);
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c index 7c27aa629af1..b5d707a5a31b 100644 --- a/net/ipv4/tcp_ulp.c +++ b/net/ipv4/tcp_ulp.c @@ -136,6 +136,10 @@ static int __tcp_set_ulp(struct sock *sk, const struct tcp_ulp_ops *ulp_ops) if (icsk->icsk_ulp_ops) goto out_err;
+ err = -EINVAL; + if (!ulp_ops->clone && sk->sk_state == TCP_LISTEN) + goto out_err; + err = ulp_ops->init(sk); if (err) goto out_err;
From: Paolo Abeni pabeni@redhat.com
stable inclusion from stable-v5.10.165 commit f6c201b4382d1536f44b922b8f16dcb4772cc82c category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6H3MB CVE: CVE-2023-0461
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
commit 8ccc99362b60c6f27bb46f36fdaaccf4ef0303de upstream.
The referenced commit changed the error code returned by the kernel when preventing a non-established socket from attaching the ktls ULP. Before to such a commit, the user-space got ENOTCONN instead of EINVAL.
The existing self-tests depend on such error code, and the change caused a failure:
RUN global.non_established ... tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107) non_established: Test failed at step #3 FAIL global.non_established
In the unlikely event existing applications do the same, address the issue by restoring the prior error code in the above scenario.
Note that the only other ULP performing similar checks at init time - smc_ulp_ops - also fails with ENOTCONN when trying to attach the ULP to a non-established socket.
Reported-by: Sabrina Dubroca sd@queasysnail.net Fixes: 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.167404268... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/ipv4/tcp_ulp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c index b5d707a5a31b..8e135af0d4f7 100644 --- a/net/ipv4/tcp_ulp.c +++ b/net/ipv4/tcp_ulp.c @@ -136,7 +136,7 @@ static int __tcp_set_ulp(struct sock *sk, const struct tcp_ulp_ops *ulp_ops) if (icsk->icsk_ulp_ops) goto out_err;
- err = -EINVAL; + err = -ENOTCONN; if (!ulp_ops->clone && sk->sk_state == TCP_LISTEN) goto out_err;
From: Kuniyuki Iwashima kuniyu@amazon.com
mainline inclusion from mainline-v6.3-rc1 commit be9832c2e9cc4c15906a77baddcd906fb4bb864b category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6H3MB CVE: CVE-2023-0461
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") guarantees that all ULP listeners have clone() op, so we no longer need to test it in inet_clone_ulp().
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/ipv4/inet_connection_sock.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 8411b2407c20..35aa98aa8f22 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -814,8 +814,7 @@ static void inet_clone_ulp(const struct request_sock *req, struct sock *newsk, if (!icsk->icsk_ulp_ops) return;
- if (icsk->icsk_ulp_ops->clone) - icsk->icsk_ulp_ops->clone(req, newsk, priority); + icsk->icsk_ulp_ops->clone(req, newsk, priority); }
/**
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v5.15.95 commit fdaf88531cfd17b2a710cceb3141ef6f9085ff40 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6H3MB CVE: CVE-2023-0461
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
---------------------------
When we backport dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status"), we have accidentally backported a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start().
Thus, listen() no longer returns -EADDRINUSE even if ->get_port() failed as reported in [0].
We set -EADDRINUSE to err just before ->get_port() to fix the regression.
[0]: https://lore.kernel.org/stable/EF8A45D0-768A-4CD5-9A8A-0FA6E610ABF7@winter.c...
Reported-by: Winter winter@winter.cafe Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/ipv4/inet_connection_sock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 35aa98aa8f22..2c4843c281d2 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -941,6 +941,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog) * It is OK, because this socket enters to hash table only * after validation is complete. */ + err = -EADDRINUSE; inet_sk_state_store(sk, TCP_LISTEN); if (!sk->sk_prot->get_port(sk, inet->inet_num)) { inet->inet_sport = htons(inet->inet_num);
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JN04 CVE: NA
--------------------------------
handle_read_error() will resumit r10_bio by raid10_read_request(), which will call bio_start_io_acct() again, while bio_end_io_acct() will only be called once.
Fix the problem by don't account io again from handle_read_error().
Fixes: 528bc2cf2fcc ("md/raid10: enable io accounting") Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/raid10.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index b13dcbe00c38..1109811fd7d2 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1117,7 +1117,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf, }
static void raid10_read_request(struct mddev *mddev, struct bio *bio, - struct r10bio *r10_bio) + struct r10bio *r10_bio, bool handle_error) { struct r10conf *conf = mddev->private; struct bio *read_bio; @@ -1187,7 +1187,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio, } slot = r10_bio->read_slot;
- if (blk_queue_io_stat(bio->bi_disk->queue)) + if (!handle_error && blk_queue_io_stat(bio->bi_disk->queue)) r10_bio->start_time = bio_start_io_acct(bio); read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set);
@@ -1503,7 +1503,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors) memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * conf->copies);
if (bio_data_dir(bio) == READ) - raid10_read_request(mddev, bio, r10_bio); + raid10_read_request(mddev, bio, r10_bio, false); else raid10_write_request(mddev, bio, r10_bio); } @@ -2597,7 +2597,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio) rdev_dec_pending(rdev, mddev); allow_barrier(conf); r10_bio->state = 0; - raid10_read_request(mddev, r10_bio->master_bio, r10_bio); + raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true); }
static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
From: Logan Gunthorpe logang@deltatee.com
mainline inclusion from mainline-v6.0-rc1 commit eac58d08d4937d2eab8f71c663d98d0759845bde category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JN1I CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Comments in the code document special values used for mddev->curr_resync. Make this clearer by using an enum to label these values.
The only functional change is a couple places use the wrong comparison operator that implied 3 is another special value. They are all fixed to imply that 3 or greater is an active resync.
Signed-off-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Song Liu song@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/md.c | 40 ++++++++++++++++++---------------------- drivers/md/md.h | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 84f7db691335..447b9da3aeb9 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5043,7 +5043,7 @@ static ssize_t sync_speed_show(struct mddev *mddev, char *page) { unsigned long resync, dt, db; - if (mddev->curr_resync == 0) + if (mddev->curr_resync == MD_RESYNC_NONE) return sprintf(page, "none\n"); resync = mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active); dt = (jiffies - mddev->resync_mark) / HZ; @@ -5062,8 +5062,8 @@ sync_completed_show(struct mddev *mddev, char *page) if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) return sprintf(page, "none\n");
- if (mddev->curr_resync == 1 || - mddev->curr_resync == 2) + if (mddev->curr_resync == MD_RESYNC_YIELDED || + mddev->curr_resync == MD_RESYNC_DELAYED) return sprintf(page, "delayed\n");
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) || @@ -8069,7 +8069,7 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) max_sectors = mddev->dev_sectors;
resync = mddev->curr_resync; - if (resync <= 3) { + if (resync < MD_RESYNC_ACTIVE) { if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) /* Still cleaning up */ resync = max_sectors; @@ -8078,7 +8078,7 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) else resync -= atomic_read(&mddev->recovery_active);
- if (resync == 0) { + if (resync == MD_RESYNC_NONE) { if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) { struct md_rdev *rdev;
@@ -8102,7 +8102,7 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) } return 0; } - if (resync < 3) { + if (resync < MD_RESYNC_ACTIVE) { seq_printf(seq, "\tresync=DELAYED"); return 1; } @@ -8740,13 +8740,7 @@ void md_do_sync(struct md_thread *thread)
mddev->last_sync_action = action ?: desc;
- /* we overload curr_resync somewhat here. - * 0 == not engaged in resync at all - * 2 == checking that there is no conflict with another sync - * 1 == like 2, but have yielded to allow conflicting resync to - * commence - * other == active in resync - this many blocks - * + /* * Before starting a resync we must have set curr_resync to * 2, and then checked that every "conflicting" array has curr_resync * less than ours. When we find one that is the same or higher @@ -8758,7 +8752,7 @@ void md_do_sync(struct md_thread *thread)
do { int mddev2_minor = -1; - mddev->curr_resync = 2; + mddev->curr_resync = MD_RESYNC_DELAYED;
try_again: if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) @@ -8770,12 +8764,14 @@ void md_do_sync(struct md_thread *thread) && mddev2->curr_resync && match_mddev_units(mddev, mddev2)) { DEFINE_WAIT(wq); - if (mddev < mddev2 && mddev->curr_resync == 2) { + if (mddev < mddev2 && + mddev->curr_resync == MD_RESYNC_DELAYED) { /* arbitrarily yield */ - mddev->curr_resync = 1; + mddev->curr_resync = MD_RESYNC_YIELDED; wake_up(&resync_wait); } - if (mddev > mddev2 && mddev->curr_resync == 1) + if (mddev > mddev2 && + mddev->curr_resync == MD_RESYNC_YIELDED) /* no need to wait here, we can wait the next * time 'round when curr_resync == 2 */ @@ -8803,7 +8799,7 @@ void md_do_sync(struct md_thread *thread) finish_wait(&resync_wait, &wq); } } - } while (mddev->curr_resync < 2); + } while (mddev->curr_resync < MD_RESYNC_DELAYED);
j = 0; if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { @@ -8887,7 +8883,7 @@ void md_do_sync(struct md_thread *thread) desc, mdname(mddev)); mddev->curr_resync = j; } else - mddev->curr_resync = 3; /* no longer delayed */ + mddev->curr_resync = MD_RESYNC_ACTIVE; /* no longer delayed */ mddev->curr_resync_completed = j; sysfs_notify_dirent_safe(mddev->sysfs_completed); md_new_event(mddev); @@ -9022,14 +9018,14 @@ void md_do_sync(struct md_thread *thread)
if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && - mddev->curr_resync > 3) { + mddev->curr_resync >= MD_RESYNC_ACTIVE) { mddev->curr_resync_completed = mddev->curr_resync; sysfs_notify_dirent_safe(mddev->sysfs_completed); } mddev->pers->sync_request(mddev, max_sectors, &skipped);
if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) && - mddev->curr_resync > 3) { + mddev->curr_resync >= MD_RESYNC_ACTIVE) { if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { if (mddev->curr_resync >= mddev->recovery_cp) { @@ -9094,7 +9090,7 @@ void md_do_sync(struct md_thread *thread) } else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) mddev->resync_min = mddev->curr_resync_completed; set_bit(MD_RECOVERY_DONE, &mddev->recovery); - mddev->curr_resync = 0; + mddev->curr_resync = MD_RESYNC_NONE; spin_unlock(&mddev->lock);
wake_up(&resync_wait); diff --git a/drivers/md/md.h b/drivers/md/md.h index 20c16cfdd6bb..dbc566d309d6 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -284,6 +284,21 @@ struct serial_info { sector_t _subtree_last; /* highest sector in subtree of rb node */ };
+/* + * mddev->curr_resync stores the current sector of the resync but + * also has some overloaded values. + */ +enum { + /* No resync in progress */ + MD_RESYNC_NONE = 0, + /* Yielded to allow another conflicting resync to commence */ + MD_RESYNC_YIELDED = 1, + /* Delayed to check that there is no conflict with another sync */ + MD_RESYNC_DELAYED = 2, + /* Any value greater than or equal to this is in an active resync */ + MD_RESYNC_ACTIVE = 3, +}; + struct mddev { void *private; struct md_personality *pers;
From: Logan Gunthorpe logang@deltatee.com
mainline inclusion from mainline-v6.0-rc1 commit b368856aab02c8fcaabb809aad401b2cf96504f2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JN1I CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The 07layouts test in mdadm fails on some systems. The failure presents itself as the backup file not being removed before the next layout is grown into:
mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup: File exists
This is because the background mdadm process, which is responsible for cleaning up this backup file gets into an infinite loop waiting for the reshape to start. mdadm checks the mdstat file if a reshape is going and, if it is not, it waits for an event on the file or times out in 5 seconds. On faster machines, the reshape may complete before the 5 seconds times out, and thus the background mdadm process loops waiting for a reshape to start that has already occurred.
mdadm reads the mdstat file to start, but mdstat does not report that the reshape has begun, even though it has indeed begun. So the mdstat_wait() call (in mdadm) which polls on the mdstat file won't ever return until timing out.
The reason mdstat reports the reshape has started is due to an issue in status_resync(). recovery_active is subtracted from curr_resync which will result in a value of zero for the first chunk of reshaped data, and the resulting read will report no reshape in progress.
To fix this, if "resync - recovery_active" is an overloaded value, force the value to be MD_RESYNC_ACTIVE so the code reports a resync in progress.
Signed-off-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Song Liu song@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/md.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 447b9da3aeb9..a6fa4872faf3 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8073,10 +8073,20 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) /* Still cleaning up */ resync = max_sectors; - } else if (resync > max_sectors) + } else if (resync > max_sectors) { resync = max_sectors; - else + } else { resync -= atomic_read(&mddev->recovery_active); + if (resync < MD_RESYNC_ACTIVE) { + /* + * Resync has started, but the subtraction has + * yielded one of the special values. Force it + * to active to ensure the status reports an + * active resync. + */ + resync = MD_RESYNC_ACTIVE; + } + }
if (resync == MD_RESYNC_NONE) { if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {
From: Hou Tao houtao1@huawei.com
mainline inclusion from mainline-v6.3-rc1 commit 1d1f25bfda432a6b61bd0205d426226bbbd73504 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JN1I CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Don't update recovery_cp when curr_resync is MD_RESYNC_ACTIVE, otherwise md may skip the resync of the first 3 sectors if the resync procedure is interrupted before the first calling of ->sync_request() as shown below:
md_do_sync thread control thread // setup resync mddev->recovery_cp = 0 j = 0 mddev->curr_resync = MD_RESYNC_ACTIVE
// e.g., set array as idle set_bit(MD_RECOVERY_INTR, &&mddev_recovery) // resync loop // check INTR before calling sync_request !test_bit(MD_RECOVERY_INTR, &mddev->recovery
// resync interrupted // update recovery_cp from 0 to 3 // the resync of three 3 sectors will be skipped mddev->recovery_cp = 3
Fixes: eac58d08d493 ("md: Use enum for overloaded magic numbers used by mddev->curr_resync") Cc: stable@vger.kernel.org # 6.0+ Signed-off-by: Hou Tao houtao1@huawei.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/md.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index a6fa4872faf3..bf3a632176e8 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9035,7 +9035,7 @@ void md_do_sync(struct md_thread *thread) mddev->pers->sync_request(mddev, max_sectors, &skipped);
if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) && - mddev->curr_resync >= MD_RESYNC_ACTIVE) { + mddev->curr_resync > MD_RESYNC_ACTIVE) { if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { if (mddev->curr_resync >= mddev->recovery_cp) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JN1I CVE: NA
--------------------------------
status_resync() will calculate 'curr_resync - recovery_active' to show user a progress bar like following:
[============>........] resync = 61.4%
'curr_resync' and 'recovery_active' is updated in md_do_sync(), and status_resync() can read them concurrently, hence it's possible that 'curr_resync - recovery_active' can overflow to a huge number. In this case status_resync() will be stuck in the loop to print a large amount of '=', which will end up soft lockup.
Fix the problem by setting 'resync' to MD_RESYNC_ACTIVE in this case, this way resync in progress will be reported to user.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/md.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index bf3a632176e8..e072ccb08735 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8076,16 +8076,16 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) } else if (resync > max_sectors) { resync = max_sectors; } else { - resync -= atomic_read(&mddev->recovery_active); - if (resync < MD_RESYNC_ACTIVE) { - /* - * Resync has started, but the subtraction has - * yielded one of the special values. Force it - * to active to ensure the status reports an - * active resync. - */ + res = atomic_read(&mddev->recovery_active); + /* + * Resync has started, but the subtraction has overflowed or + * yielded one of the special values. Force it to active to + * ensure the status reports an active resync. + */ + if (resync < res || resync - res < MD_RESYNC_ACTIVE) resync = MD_RESYNC_ACTIVE; - } + else + resync -= res; }
if (resync == MD_RESYNC_NONE) {
From: Yu Kuai yukuai3@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6JZ3F CVE: NA
--------------------------------
raid10_sync_request() will add 'r10bio->remaining' for both rdev and replacement rdev. However, if the read io failed, recovery_request_write() will return without issuring the write io, in this case, end_sync_request() is only called once and 'remaining' is leaked, which will cause io hang.
Fix the probleming by decreasing 'remaining' according to if 'bio' and 'repl_bio' is valid.
Fixes: 24afd80d99f8 ("md/raid10: handle recovery of replacement devices.") Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/md/raid10.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 1109811fd7d2..beaada3e87ba 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2218,11 +2218,22 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio) { struct r10conf *conf = mddev->private; int d; - struct bio *wbio, *wbio2; + struct bio *wbio = r10_bio->devs[1].bio; + struct bio *wbio2 = r10_bio->devs[1].repl_bio; + + /* Need to test wbio2->bi_end_io before we call + * submit_bio_noacct as if the former is NULL, + * the latter is free to free wbio2. + */ + if (wbio2 && !wbio2->bi_end_io) + wbio2 = NULL;
if (!test_bit(R10BIO_Uptodate, &r10_bio->state)) { fix_recovery_read_error(r10_bio); - end_sync_request(r10_bio); + if (wbio->bi_end_io) + end_sync_request(r10_bio); + if (wbio2) + end_sync_request(r10_bio); return; }
@@ -2231,14 +2242,6 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio) * and submit the write request */ d = r10_bio->devs[1].devnum; - wbio = r10_bio->devs[1].bio; - wbio2 = r10_bio->devs[1].repl_bio; - /* Need to test wbio2->bi_end_io before we call - * submit_bio_noacct as if the former is NULL, - * the latter is free to free wbio2. - */ - if (wbio2 && !wbio2->bi_end_io) - wbio2 = NULL; if (wbio->bi_end_io) { atomic_inc(&conf->mirrors[d].rdev->nr_pending); md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(wbio));
From: Tom Lendacky thomas.lendacky@amd.com
stable inclusion from stable-v5.15.94 commit 8f12dcab90e886d0169a9cd372a8bb35339cfc19 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C CVE: CVE-2022-27672
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit be8de49bea505e7777a69ef63d60e02ac1712683 upstream.
Certain AMD processors are vulnerable to a cross-thread return address predictions bug. When running in SMT mode and one of the sibling threads transitions out of C0 state, the other sibling thread could use return target predictions from the sibling thread that transitioned out of C0.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB when context switching to the idle thread. However, KVM allows a VMM to prevent exiting guest mode when transitioning out of C0. A guest could act maliciously in this situation, so create a new x86 BUG that can be used to detect if the processor is vulnerable.
Reviewed-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Message-Id: 91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Guo Mengqi <guomengqi3@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/common.c | 9 +++++++-- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 3cb7e1793c25..949325c8af52 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -471,5 +471,6 @@ #define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */ #define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */ #define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */ +#define X86_BUG_SMT_RSB X86_BUG(29) /* CPU is vulnerable to Cross-Thread Return Address Predictions */
#endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 9c7708e98b32..12931233fdd7 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1124,6 +1124,8 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { #define MMIO_SBDS BIT(2) /* CPU is affected by RETbleed, speculating where you would not expect it */ #define RETBLEED BIT(3) +/* CPU is affected by SMT (cross-thread) return predictions */ +#define SMT_RSB BIT(4)
static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = { VULNBL_INTEL_STEPPINGS(IVYBRIDGE, X86_STEPPING_ANY, SRBDS), @@ -1155,8 +1157,8 @@ static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_AMD(0x15, RETBLEED), VULNBL_AMD(0x16, RETBLEED), - VULNBL_AMD(0x17, RETBLEED), - VULNBL_HYGON(0x18, RETBLEED), + VULNBL_AMD(0x17, RETBLEED | SMT_RSB), + VULNBL_HYGON(0x18, RETBLEED | SMT_RSB), {} };
@@ -1274,6 +1276,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) !(ia32_cap & ARCH_CAP_PBRSB_NO)) setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
+ if (cpu_matches(cpu_vuln_blacklist, SMT_RSB)) + setup_force_cpu_bug(X86_BUG_SMT_RSB); + if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN)) return;
From: Tom Lendacky thomas.lendacky@amd.com
stable inclusion from stable-v5.15.94 commit 5122e0e44363e3d837592b78bc04222b9d289868 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C CVE: CVE-2022-27672
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 6f0f2d5ef895d66a3f2b32dd05189ec34afa5a55 upstream.
By default, KVM/SVM will intercept attempts by the guest to transition out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change this behavior. To mitigate the cross-thread return address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to override the default behavior to intercept C0 transitions.
Use a module parameter to control the mitigation on processors that are vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug, KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Message-Id: 4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Guo Mengqi guomengqi3@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/x86/kvm/x86.c | 43 ++++++++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 52605740b1da..53ef53d5b414 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -174,6 +174,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO); int __read_mostly pi_inject_timer = -1; module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
+/* Enable/disable SMT_RSB bug mitigation */ +bool __read_mostly mitigate_smt_rsb; +module_param(mitigate_smt_rsb, bool, 0444); + /* * Restoring the host value for MSRs that are only consumed when running in * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU @@ -3945,10 +3949,15 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_CLOCK_TSC_STABLE; break; case KVM_CAP_X86_DISABLE_EXITS: - r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE | - KVM_X86_DISABLE_EXITS_CSTATE; - if(kvm_can_mwait_in_guest()) - r |= KVM_X86_DISABLE_EXITS_MWAIT; + r = KVM_X86_DISABLE_EXITS_PAUSE; + + if (!mitigate_smt_rsb) { + r |= KVM_X86_DISABLE_EXITS_HLT | + KVM_X86_DISABLE_EXITS_CSTATE; + + if (kvm_can_mwait_in_guest()) + r |= KVM_X86_DISABLE_EXITS_MWAIT; + } break; case KVM_CAP_X86_SMM: /* SMBASE is usually relocated above 1M on modern chipsets, @@ -5491,15 +5500,26 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS) break;
- if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) && - kvm_can_mwait_in_guest()) - kvm->arch.mwait_in_guest = true; - if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT) - kvm->arch.hlt_in_guest = true; if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE) kvm->arch.pause_in_guest = true; - if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE) - kvm->arch.cstate_in_guest = true; + +#define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \ + "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests." + + if (!mitigate_smt_rsb) { + if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() && + (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE)) + pr_warn_once(SMT_RSB_MSG); + + if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) && + kvm_can_mwait_in_guest()) + kvm->arch.mwait_in_guest = true; + if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT) + kvm->arch.hlt_in_guest = true; + if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE) + kvm->arch.cstate_in_guest = true; + } + r = 0; break; case KVM_CAP_MSR_PLATFORM_INFO: @@ -11698,6 +11718,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request); static int __init kvm_x86_init(void) { kvm_mmu_x86_module_init(); + mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible(); return 0; } module_init(kvm_x86_init);
From: Tom Lendacky thomas.lendacky@amd.com
stable inclusion from stable-v5.15.94 commit 17170acdc7c8b8585501bb443b4f196168ae9890 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C CVE: CVE-2022-27672
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 493a2c2d23ca91afba96ac32b6cbafb54382c2a3 upstream.
Add the admin guide for the Cross-Thread Return Predictions vulnerability.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Message-Id: 60f9c0b4396956ce70499ae180cb548720b25c7e.1675956146.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Guo Mengqi guomengqi3@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- .../admin-guide/hw-vuln/cross-thread-rsb.rst | 92 +++++++++++++++++++ Documentation/admin-guide/hw-vuln/index.rst | 1 + 2 files changed, 93 insertions(+) create mode 100644 Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst
diff --git a/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst new file mode 100644 index 000000000000..ec6e9f5bcf9e --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst @@ -0,0 +1,92 @@ + +.. SPDX-License-Identifier: GPL-2.0 + +Cross-Thread Return Address Predictions +======================================= + +Certain AMD and Hygon processors are subject to a cross-thread return address +predictions vulnerability. When running in SMT mode and one sibling thread +transitions out of C0 state, the other sibling thread could use return target +predictions from the sibling thread that transitioned out of C0. + +The Spectre v2 mitigations protect the Linux kernel, as it fills the return +address prediction entries with safe targets when context switching to the idle +thread. However, KVM does allow a VMM to prevent exiting guest mode when +transitioning out of C0. This could result in a guest-controlled return target +being consumed by the sibling thread. + +Affected processors +------------------- + +The following CPUs are vulnerable: + + - AMD Family 17h processors + - Hygon Family 18h processors + +Related CVEs +------------ + +The following CVE entry is related to this issue: + + ============== ======================================= + CVE-2022-27672 Cross-Thread Return Address Predictions + ============== ======================================= + +Problem +------- + +Affected SMT-capable processors support 1T and 2T modes of execution when SMT +is enabled. In 2T mode, both threads in a core are executing code. For the +processor core to enter 1T mode, it is required that one of the threads +requests to transition out of the C0 state. This can be communicated with the +HLT instruction or with an MWAIT instruction that requests non-C0. +When the thread re-enters the C0 state, the processor transitions back +to 2T mode, assuming the other thread is also still in C0 state. + +In affected processors, the return address predictor (RAP) is partitioned +depending on the SMT mode. For instance, in 2T mode each thread uses a private +16-entry RAP, but in 1T mode, the active thread uses a 32-entry RAP. Upon +transition between 1T/2T mode, the RAP contents are not modified but the RAP +pointers (which control the next return target to use for predictions) may +change. This behavior may result in return targets from one SMT thread being +used by RET predictions in the sibling thread following a 1T/2T switch. In +particular, a RET instruction executed immediately after a transition to 1T may +use a return target from the thread that just became idle. In theory, this +could lead to information disclosure if the return targets used do not come +from trustworthy code. + +Attack scenarios +---------------- + +An attack can be mounted on affected processors by performing a series of CALL +instructions with targeted return locations and then transitioning out of C0 +state. + +Mitigation mechanism +-------------------- + +Before entering idle state, the kernel context switches to the idle thread. The +context switch fills the RAP entries (referred to as the RSB in Linux) with safe +targets by performing a sequence of CALL instructions. + +Prevent a guest VM from directly putting the processor into an idle state by +intercepting HLT and MWAIT instructions. + +Both mitigations are required to fully address this issue. + +Mitigation control on the kernel command line +--------------------------------------------- + +Use existing Spectre v2 mitigations that will fill the RSB on context switch. + +Mitigation control for KVM - module parameter +--------------------------------------------- + +By default, the KVM hypervisor mitigates this issue by intercepting guest +attempts to transition out of C0. A VMM can use the KVM_CAP_X86_DISABLE_EXITS +capability to override those interceptions, but since this is not common, the +mitigation that covers this path is not enabled by default. + +The mitigation for the KVM_CAP_X86_DISABLE_EXITS capability can be turned on +using the boolean module parameter mitigate_smt_rsb, e.g.: + kvm.mitigate_smt_rsb=1 diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst index 9b688b1990c6..083e80718306 100644 --- a/Documentation/admin-guide/hw-vuln/index.rst +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -17,3 +17,4 @@ are configurable at compile, boot or run time. special-register-buffer-data-sampling.rst core-scheduling.rst processor_mmio_stale_data.rst + cross-thread-rsb.rst
From: Paolo Bonzini pbonzini@redhat.com
mainline inclusion from mainline-v6.2 commit 971cecb9591a7b8ceae658252bf15240d7078a45 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C CVE: CVE-2022-27672
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The following warning:
Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst:92: ERROR: Unexpected indentation.
was introduced by commit 493a2c2d23ca. Fix it by placing everything in the same paragraph and also use a monospace font.
Fixes: 493a2c2d23ca ("Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions") Reported-by: Stephen Rothwell <sfr@canb@auug.org.au> Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Guo Mengqi guomengqi3@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst index ec6e9f5bcf9e..875616d675fe 100644 --- a/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst +++ b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst @@ -88,5 +88,4 @@ capability to override those interceptions, but since this is not common, the mitigation that covers this path is not enabled by default.
The mitigation for the KVM_CAP_X86_DISABLE_EXITS capability can be turned on -using the boolean module parameter mitigate_smt_rsb, e.g.: - kvm.mitigate_smt_rsb=1 +using the boolean module parameter mitigate_smt_rsb, e.g. ``kvm.mitigate_smt_rsb=1``.
From: Julian Anastasov ja@ssi.bg
mainline inclusion from mainline-v6.2-rc8 commit c1d2ecdf5e38e3489ce8328238b558b3b2866fe1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6I3JU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Entries can linger in cache without timer for days, thanks to the gc_thresh1 limit. As result, without traffic, the confirmed time can be outdated and to appear to be in the future. Later, on traffic, NUD_STALE entries can switch to NUD_DELAY and start the timer which can see the invalid confirmed time and wrongly switch to NUD_REACHABLE state instead of NUD_PROBE. As result, timer is set many days in the future. This is more visible on 32-bit platforms, with higher HZ value.
Why this is a problem? While we expect unused entries to expire, such entries stay in REACHABLE state for too long, locked in cache. They are not expired normally, only when cache is full.
Problem and the wrong state change reported by Zhang Changzhong:
172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
350520 seconds have elapsed since this entry was last updated, but it is still in the REACHABLE state (base_reachable_time_ms is 30000), preventing lladdr from being updated through probe.
Fix it by ensuring timer is started with valid used/confirmed times. Considering the valid time range is LONG_MAX jiffies, we try not to go too much in the past while we are in DELAY/PROBE state. There are also places that need used/updated times to be validated while timer is not running.
Reported-by: Zhang Changzhong zhangchangzhong@huawei.com Signed-off-by: Julian Anastasov ja@ssi.bg Tested-by: Zhang Changzhong zhangchangzhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Zhang Changzhong zhangchangzhong@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/core/neighbour.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 434c5aab83ea..e833e7e85a4b 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -242,7 +242,7 @@ static int neigh_forced_gc(struct neigh_table *tbl) (n->nud_state == NUD_NOARP) || (tbl->is_multicast && tbl->is_multicast(n->primary_key)) || - time_after(tref, n->updated)) + !time_in_range(n->updated, tref, jiffies)) remove = true; write_unlock(&n->lock);
@@ -262,7 +262,17 @@ static int neigh_forced_gc(struct neigh_table *tbl)
static void neigh_add_timer(struct neighbour *n, unsigned long when) { + /* Use safe distance from the jiffies - LONG_MAX point while timer + * is running in DELAY/PROBE state but still show to user space + * large times in the past. + */ + unsigned long mint = jiffies - (LONG_MAX - 86400 * HZ); + neigh_hold(n); + if (!time_in_range(n->confirmed, mint, jiffies)) + n->confirmed = mint; + if (time_before(n->used, n->confirmed)) + n->used = n->confirmed; if (unlikely(mod_timer(&n->timer, when))) { printk("NEIGH: BUG, double timer add, state is %x\n", n->nud_state); @@ -948,12 +958,14 @@ static void neigh_periodic_work(struct work_struct *work) goto next_elt; }
- if (time_before(n->used, n->confirmed)) + if (time_before(n->used, n->confirmed) && + time_is_before_eq_jiffies(n->confirmed)) n->used = n->confirmed;
if (refcount_read(&n->refcnt) == 1 && (state == NUD_FAILED || - time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) { + !time_in_range_open(jiffies, n->used, + n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) { *np = n->next; neigh_mark_dead(n); write_unlock(&n->lock);
From: Miaoqian Lin linmq006@gmail.com
mainline inclusion from mainline-v5.17-rc1 commit 045a31b95509c8f25f5f04ec5e0dec5cd09f2c5f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IXQP CVE: CVE-2023-23000
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
callers of tegra_xusb_find_port_node() function only do NULL checking for the return value. return NULL instead of ERR_PTR(-ENOMEM) to keep consistent.
Signed-off-by: Miaoqian Lin linmq006@gmail.com Acked-by: Thierry Reding treding@nvidia.com Link: https://lore.kernel.org/r/20211213020507.1458-1-linmq006@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Wang Yufen wangyufen@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/phy/tegra/xusb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c index 181a1be5f491..02da8c0a14ff 100644 --- a/drivers/phy/tegra/xusb.c +++ b/drivers/phy/tegra/xusb.c @@ -449,7 +449,7 @@ tegra_xusb_find_port_node(struct tegra_xusb_padctl *padctl, const char *type, name = kasprintf(GFP_KERNEL, "%s-%u", type, index); if (!name) { of_node_put(ports); - return ERR_PTR(-ENOMEM); + return NULL; } np = of_get_child_by_name(ports, name); kfree(name);
From: Pietro Borrello borrello@diag.uniroma1.it
stable inclusion from stable-v5.10.166 commit 5dc3469a1170dd1344d262a332b26994214eeb58 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KCIO CVE: CVE-2023-1073
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Add a check for empty report_list in hid_validate_values(). The missing check causes a type confusion when issuing a list_entry() on an empty report_list. The problem is caused by the assumption that the device must have valid report_list. While this will be true for all normal HID devices, a suitably malicious device can violate the assumption.
Fixes: 1b15d2e5b807 ("HID: core: fix validation of report id 0") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/hid/hid-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index 5550c943f985..2f512814a111 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -988,8 +988,8 @@ struct hid_report *hid_validate_values(struct hid_device *hid, * Validating on id 0 means we should examine the first * report in the list. */ - report = list_entry( - hid->report_enum[type].report_list.next, + report = list_first_entry_or_null( + &hid->report_enum[type].report_list, struct hid_report, list); } else { report = hid->report_enum[type].report_id_hash[id];
From: Pietro Borrello borrello@diag.uniroma1.it
stable inclusion from stable-v5.10.166 commit 20fd4598762e2d717deb64ef028e6f5f587ac2a6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KCIO CVE: CVE-2023-1073
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
Add a check for empty report_list in bigben_probe(). The missing check causes a type confusion when issuing a list_entry() on an empty report_list. The problem is caused by the assumption that the device must have valid report_list. While this will be true for all normal HID devices, a suitably malicious device can violate the assumption.
Fixes: 256a90ed9e46 ("HID: hid-bigbenff: driver for BigBen Interactive PS3OFMINIPAD gamepad") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Lin Yujun linyujun809@huawei.com Reviewed-by: Liao Chang liaochang1@huawei.com Reviewed-by: Zhang Jianhua chris.zjh@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/hid/hid-bigbenff.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/hid/hid-bigbenff.c b/drivers/hid/hid-bigbenff.c index e8c5e3ac9fff..e8b16665860d 100644 --- a/drivers/hid/hid-bigbenff.c +++ b/drivers/hid/hid-bigbenff.c @@ -344,6 +344,11 @@ static int bigben_probe(struct hid_device *hid, }
report_list = &hid->report_enum[HID_OUTPUT_REPORT].report_list; + if (list_empty(report_list)) { + hid_err(hid, "no output report found\n"); + error = -ENODEV; + goto error_hw_stop; + } bigben->report = list_entry(report_list->next, struct hid_report, list);
From: Duoming Zhou duoming@zju.edu.cn
mainline inclusion from mainline-v6.3-rc1 commit 29b0589a865b6f66d141d79b2dd1373e4e50fe17 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6IW01 CVE: CVE-2023-1118
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h...
--------------------------------
When the ene device is detaching, function ene_remove() will be called. But there is no function to cancel tx_sim_timer in ene_remove(), the timer handler ene_tx_irqsim() could race with ene_remove(). As a result, the UAF bugs could happen, the process is shown below.
(cleanup routine) | (timer routine) | mod_timer(&dev->tx_sim_timer, ..) ene_remove() | (wait a time) | ene_tx_irqsim() | dev->hw_lock //USE | ene_tx_sample(dev) //USE
Fix by adding del_timer_sync(&dev->tx_sim_timer) in ene_remove(), The tx_sim_timer could stop before ene device is deallocated.
What's more, The rc_unregister_device() and del_timer_sync() should be called first in ene_remove() and the deallocated functions such as free_irq(), release_region() and so on should be called behind them. Because the rc_unregister_device() is well synchronized. Otherwise, race conditions may happen. The situations that may lead to race conditions are shown below.
Firstly, the rx receiver is disabled with ene_rx_disable() before rc_unregister_device() in ene_remove(), which means it can be enabled again if a process opens /dev/lirc0 between ene_rx_disable() and rc_unregister_device().
Secondly, the irqaction descriptor is freed by free_irq() before the rc device is unregistered, which means irqaction descriptor may be accessed again after it is deallocated.
Thirdly, the timer can call ene_tx_sample() that can write to the io ports, which means the io ports could be accessed again after they are deallocated by release_region().
Therefore, the rc_unregister_device() and del_timer_sync() should be called first in ene_remove().
Suggested by: Sean Young sean@mess.org
Fixes: 9ea53b74df9c ("V4L/DVB: STAGING: remove lirc_ene0100 driver") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: Sean Young sean@mess.org Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Ren Zhijie renzhijie2@huawei.com Reviewed-by: songping yu yusongping@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Reviewed-by: chenhui judy.chenhui@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/media/rc/ene_ir.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/rc/ene_ir.c b/drivers/media/rc/ene_ir.c index 6049e5c95394..5aa3953cab82 100644 --- a/drivers/media/rc/ene_ir.c +++ b/drivers/media/rc/ene_ir.c @@ -1106,6 +1106,8 @@ static void ene_remove(struct pnp_dev *pnp_dev) struct ene_device *dev = pnp_get_drvdata(pnp_dev); unsigned long flags;
+ rc_unregister_device(dev->rdev); + del_timer_sync(&dev->tx_sim_timer); spin_lock_irqsave(&dev->hw_lock, flags); ene_rx_disable(dev); ene_rx_restore_hw_buffer(dev); @@ -1113,7 +1115,6 @@ static void ene_remove(struct pnp_dev *pnp_dev)
free_irq(dev->irq, dev); release_region(dev->hw_io, ENE_IO_SIZE); - rc_unregister_device(dev->rdev); kfree(dev); }